How can I ensemble multiple text/image queries?
#7
by flavourabbit - opened
Hello,
I am wondering how I can ensemble multiple text/image queries.
I assume there are two possibilities.
1. Averaging the probabilities (= sigmoid of the logits) of the different queries
2. Plugging an averaged token into the model
From my perspective,
text ensembling should use option 1 and image ensembling should use option 2
(because I think averaging text tokens could mess up the embedding).
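To make the two options concrete, here is a minimal sketch in plain Python. It is not the model's actual code path: the function names `average_probs` and `average_embedding` are hypothetical, the numbers are toy values rather than real outputs, and the L2 normalization in the second function assumes that queries are matched by cosine similarity.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def average_probs(logits_per_query):
    # Option 1: run each query separately, then average the per-box
    # probabilities (sigmoid of each logit) across queries.
    num_queries = len(logits_per_query)
    return [
        sum(sigmoid(logit) for logit in box_logits) / num_queries
        for box_logits in zip(*logits_per_query)
    ]

def average_embedding(query_embeds):
    # Option 2: average the query embeddings into a single vector, then
    # L2-normalize it (assumption: matching uses cosine similarity).
    num_queries = len(query_embeds)
    mean = [sum(dim) / num_queries for dim in zip(*query_embeds)]
    norm = math.sqrt(sum(v * v for v in mean))
    if norm == 0.0:
        raise ValueError("averaged embedding has zero norm")
    return [v / norm for v in mean]

# Toy values, not real model outputs: 2 queries x 3 candidate boxes.
logits = [[2.0, -1.0, 0.0], [1.0, -3.0, 0.0]]
probs = average_probs(logits)

# Two hypothetical 2-dimensional query embeddings merged into one query.
merged = average_embedding([[1.0, 0.0], [0.0, 1.0]])
```

With option 1 the model is run once per query and only the scores are combined, while with option 2 the merged vector stands in for a single query. How that vector would be passed to the model is not shown here.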
Please share your opinion regarding this!