Instructions for using google/pix2struct-large with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use google/pix2struct-large with Transformers:
```python
# Use a pipeline as a high-level helper
# Warning: the "image-to-text" pipeline type is no longer supported in transformers v5.
# You must load the model directly (see below) or downgrade to v4.x with:
#   pip install "transformers<5.0.0"
from transformers import pipeline

pipe = pipeline("image-to-text", model="google/pix2struct-large")
```

```python
# Load model directly
from transformers import AutoProcessor, AutoModelForImageTextToText

processor = AutoProcessor.from_pretrained("google/pix2struct-large")
model = AutoModelForImageTextToText.from_pretrained("google/pix2struct-large")
```

- Notebooks
- Google Colab
- Kaggle
Processor defaults to VQA
#1
by JakeMXT - opened
The image processor of this model has VQA enabled by default:
```
Pix2StructProcessor:
- image_processor: Pix2StructImageProcessor {
    "do_convert_rgb": true,
    "do_normalize": true,
    "image_processor_type": "Pix2StructImageProcessor",
    "is_vqa": true,
    "max_patches": 4096,
    "patch_size": {
      "height": 16,
      "width": 16
    },
    "processor_class": "Pix2StructProcessor"
  }
```
I have to set it to `False` before fine-tuning:

```python
processor.image_processor.is_vqa = False
```

whereas the base model (google/pix2struct-base) already has this set to `False` by default.
Is this expected?
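For reference, the workaround above can be sketched end to end: load the processor, check the flag, and flip it before fine-tuning. This is a minimal sketch assuming `transformers` is installed; the default value asserted here is the one reported in this thread for this checkpoint.

```python
# Sketch of the workaround: disable the VQA preprocessing path before
# fine-tuning (assumes transformers is installed; needs hub access).
from transformers import AutoProcessor

processor = AutoProcessor.from_pretrained("google/pix2struct-large")

# On this checkpoint the flag ships as True (per the config dump above).
print("is_vqa before:", processor.image_processor.is_vqa)

# Flip it to match the base checkpoint's behavior before fine-tuning.
processor.image_processor.is_vqa = False
```

With `is_vqa` left as `True`, the image processor expects a question and renders it as a header onto the image, so disabling it is needed for plain image-to-text fine-tuning.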