Error while loading the model

#12

by AL58763 - opened Jan 18, 2024

Discussion

AL58763

Jan 18, 2024

I'm coming across an error while trying to load the idefics-9b-instruct model. Even after downloading the model weights and loading them with model = IdeficsForVisionText2Text.from_pretrained(model_path, torch_dtype=torch.bfloat16).to("cuda:1"), I get this error:

OSError: Error no file named pytorch_model.bin, tf_model.h5, model.ckpt.index or flax_model.msgpack found in directory model_dir/idefics_9b/.

I haven't changed the names of the weight files after downloading, so I'm not sure what's causing this error.

AL58763 changed discussion title from Is this model still maintained? I'm seeing no replies on the questions to Error while loading the model Jan 18, 2024

AL58763

Jan 18, 2024

I got it working by combining the weights into a single pytorch_model.bin file. I'm now trying to run the model on a V100 GPU, but I get a CUDA out-of-memory error. I have two V100 GPUs available and I'm loading the model on a single GPU. Is it possible to split the weights across the two GPUs?
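One possible way to spread the weights over both GPUs, sketched here under assumptions rather than taken from the thread: `from_pretrained` accepts `device_map="auto"` (which requires the `accelerate` package) and an optional `max_memory` dictionary of per-device caps. The helper names `build_max_memory` and `load_across_gpus`, the 14 GiB cap, and the switch to `torch.float16` (on the assumption that V100 GPUs lack native bfloat16 support) are all illustrative choices, not something confirmed in this discussion.

```python
def build_max_memory(num_gpus: int, gib_per_gpu: int) -> dict:
    """Build per-device memory caps in the form taken by the `max_memory` argument."""
    return {i: f"{gib_per_gpu}GiB" for i in range(num_gpus)}


def load_across_gpus(model_path: str, num_gpus: int = 2, gib_per_gpu: int = 14):
    """Load the checkpoint with its layers placed across several GPUs.

    Hypothetical helper: the default cap of 14 GiB per GPU is an assumption
    and should be adjusted to the memory actually available.
    """
    # Imported lazily so build_max_memory() works without torch installed.
    import torch
    from transformers import IdeficsForVisionText2Text

    return IdeficsForVisionText2Text.from_pretrained(
        model_path,
        torch_dtype=torch.float16,  # assumption: float16 instead of bfloat16 on V100
        device_map="auto",          # let accelerate place layers on the available GPUs
        max_memory=build_max_memory(num_gpus, gib_per_gpu),
    )
```

With `device_map="auto"`, the model should not be moved afterwards with `.to("cuda:1")`, since the placement is already decided at load time.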

VictorSanh

Jan 18, 2024

Hi @AL58763
Can you ls the contents of your model_dir/idefics_9b/ folder? I am assuming that model_path = model_dir/idefics_9b/.

AL58763

Jan 19, 2024

edited Jan 19, 2024

> Hi @AL58763
> can you ls the content of your model_dir/idefics_9b/ folder? i am assuming that model_path = model_dir/idefics_9b/

Thanks for the reply. Yes, model_path = model_dir/idefics_9b/. Here are the contents of the folder:

config.json
model-00002-of-00002.safetensors
pytorch_model-00002-of-00002.bin
tokenizer.json
generation_config.json
preprocessor_config.json
pytorch_model.bin
tokenizer.model
model-00001-of-00002.safetensors
pytorch_model-00001-of-00002.bin
tokenizer_config.json
urls.txt

As I said above, I combined both PyTorch .bin files into one pytorch_model.bin.

AL58763 changed discussion status to closed Jan 19, 2024

AL58763 changed discussion status to open Jan 19, 2024

AL58763

Jan 19, 2024

@VictorSanh I can provide local paths of images in the prompts as well, right? I provided a local image and then checked inputs['pixel_values']; it's just black pixels.

VictorSanh

Jan 19, 2024

I think you are missing some files in your folder. For instance, model.safetensors.index.json is the file that maps each weight to a specific shard. Without this file, the loading logic does not know where to find the weights (i.e. in model-00002-of-00002.safetensors or model-00001-of-00002.safetensors), which is why you had to create a merged pytorch_model.bin. There might be other files missing, but that's the first one that came to mind.
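A quick way to check a local folder for this problem is sketched below. It assumes the usual layout of a sharded checkpoint index: a JSON file with a `weight_map` object whose values are shard file names. The function name `check_sharded_checkpoint` is made up for illustration.

```python
import json
from pathlib import Path


def check_sharded_checkpoint(model_dir, index_name="model.safetensors.index.json"):
    """Return (index_found, missing_shards) for a sharded checkpoint folder."""
    model_dir = Path(model_dir)
    index_path = model_dir / index_name
    if not index_path.is_file():
        # Without the index, the loader cannot map weights to shard files.
        return False, []
    with index_path.open() as f:
        weight_map = json.load(f)["weight_map"]
    # Each value names the shard file that stores one weight tensor.
    shards = sorted(set(weight_map.values()))
    missing = [name for name in shards if not (model_dir / name).is_file()]
    return True, missing
```

Calling `check_sharded_checkpoint("model_dir/idefics_9b/")` on the folder listed earlier would be expected to report that the index is absent, since model.safetensors.index.json does not appear in that listing.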

VictorSanh

Jan 19, 2024

> @VictorSanh i can provide local paths of images in the prompts as well right?

Off the top of my head, I think you need to load the image yourself (into a PIL object) if it's local.
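A minimal sketch of that suggestion, assuming Pillow is installed and that the prompt is a list mixing strings and PIL images. The helper names `load_local_image` and `build_prompt` are hypothetical, and the exact prompt layout is an assumption to adapt as needed.

```python
from PIL import Image


def load_local_image(path):
    """Open a local image file and return it as an RGB PIL image."""
    with Image.open(path) as img:
        # convert() reads the pixel data and returns a new in-memory image,
        # so the result stays usable after the file is closed.
        return img.convert("RGB")


def build_prompt(image_path, question):
    """Build one prompt with the local image passed as a PIL object, not a path string."""
    return [
        f"User: {question}",
        load_local_image(image_path),
        "<end_of_utterance>",
        "\nAssistant:",
    ]
```

The list returned by `build_prompt` can then be wrapped in an outer list and passed to the processor in place of a prompt that contains the path as a plain string.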

AL58763

Jan 22, 2024

Thanks @VictorSanh it worked!

VictorSanh

Jan 22, 2024

Great! I'll close this discussion. Feel free to re-open it (or create another one) if you have other questions!

VictorSanh changed discussion status to closed Jan 22, 2024
