Converted from nvidia/NVLM-D-72B to NF4 (with double quantization) using BitsAndBytes. The model belongs to NVIDIA and is licensed under Creative Commons Attribution Non Commercial 4.0 (CC BY-NC 4.0).
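For reference, here is a minimal sketch of what an NF4 conversion with double quantization typically looks like in Transformers. It is not the exact script used for this repo. The `NF4_SETTINGS` name and both helper functions are made up for illustration, and loading through `AutoModel` with `device_map="auto"` is an assumption. The heavy imports are kept inside the functions so the settings can be inspected without the libraries installed.

```python
# Settings named in this model card: 4-bit NF4 with double quantization.
NF4_SETTINGS = {
    "load_in_4bit": True,
    "bnb_4bit_quant_type": "nf4",
    "bnb_4bit_use_double_quant": True,
}


def build_quantization_config():
    """Build a BitsAndBytesConfig from NF4_SETTINGS.

    Needs transformers and bitsandbytes installed.
    """
    from transformers import BitsAndBytesConfig  # lazy: heavy dependency

    return BitsAndBytesConfig(**NF4_SETTINGS)


def load_quantized(model_id="nvidia/NVLM-D-72B"):
    """Hypothetical loader: quantize the base model while loading it.

    ASSUMPTION: AutoModel with trust_remote_code=True and device_map="auto";
    the actual conversion may have used different arguments.
    """
    from transformers import AutoModel  # lazy: heavy dependency

    return AutoModel.from_pretrained(
        model_id,
        quantization_config=build_quantization_config(),
        trust_remote_code=True,
        device_map="auto",
    )
```

Calling `load_quantized()` downloads the full base model, so it needs a lot of disk space and memory.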
This quantization seems to work fine for text-only prompts, but I haven't been able to get coherent responses when an image is included. This is a work in progress, and I could use some help figuring it out.
I made a slight modification to modeling_intern_vit.py, replacing a few occurrences of torch.matmul(x, linearmodule.weight.t()) + linearmodule.bias with linearmodule(x). I'm not sure why these linear layers were applied that way. The two forms are equivalent for a regular linear module, but the first fails when the module is quantized because it reads the weight directly instead of calling the module. With this change the model "works" in the sense that it runs without errors, but I still haven't been able to get coherent outputs when sending images.
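To illustrate why reading the weight directly breaks under quantization, here is a toy sketch in plain Python with no torch or bitsandbytes. `ToyLinear`, `ToyQuantLinear`, and `manual_linear` are made-up names, and the "quantization" is just rounding to integer codes stored in a flat list. It only mimics the general idea that a quantized module keeps packed codes in `weight` and dequantizes inside its own forward call.

```python
class ToyLinear:
    """Plain linear layer: y = W x + b, with W stored as rows of floats."""

    def __init__(self, weight, bias):
        self.weight = weight  # shape (out_features, in_features)
        self.bias = bias

    def __call__(self, x):
        return [sum(w * v for w, v in zip(row, x)) + b
                for row, b in zip(self.weight, self.bias)]


class ToyQuantLinear:
    """Toy quantized layer: `weight` holds flat integer codes, not a float matrix."""

    def __init__(self, linear, scale=0.25):
        self.in_features = len(linear.weight[0])
        self.scale = scale
        # Flatten and round to integer codes; the 2-D float layout is gone.
        self.weight = [round(w / scale) for row in linear.weight for w in row]
        self.bias = linear.bias

    def _dequantize(self):
        n = self.in_features
        flat = [code * self.scale for code in self.weight]
        return [flat[i:i + n] for i in range(0, len(flat), n)]

    def __call__(self, x):
        # The module knows how to unpack its own storage.
        return ToyLinear(self._dequantize(), self.bias)(x)


def manual_linear(module, x):
    """The replaced pattern: use module.weight directly as a float matrix."""
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(module.weight, module.bias)]


dense = ToyLinear([[0.5, -0.25], [1.0, 0.75]], [1.0, -1.0])
quant = ToyQuantLinear(dense)
x = [2.0, 4.0]

print(manual_linear(dense, x) == dense(x))  # same result on the plain layer
print(quant(x) == dense(x))                 # calling the module still works
try:
    manual_linear(quant, x)
except TypeError as err:
    print("direct weight access failed:", err)
```

In the toy, direct access fails because the stored codes no longer have the shape the manual matmul expects, while calling the module keeps working.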
It might have something to do with how the QKV modules are packed, which may not play well with quantization. I'll look into splitting them into regular Q, K, and V tensors later. Or maybe someone else would like to help.
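As a starting point for that split, here is a rough sketch of slicing a fused QKV projection into three separate projections, written with plain lists so it runs anywhere. `split_packed_qkv` is a hypothetical helper, and the row order (all Q rows, then K, then V) is an assumption that would need to be checked against the real modeling code. Whether this actually fixes the image outputs is untested.

```python
def split_packed_qkv(weight, bias=None):
    """Split a fused QKV projection into separate Q, K, V parts.

    `weight` is a list of rows with shape (3 * dim, dim).
    ASSUMPTION: rows are stored as [Q rows, K rows, V rows].
    """
    rows = len(weight)
    if rows % 3 != 0:
        raise ValueError("expected 3 * dim rows, got %d" % rows)
    dim = rows // 3
    parts = {}
    for i, name in enumerate(("q", "k", "v")):
        w = weight[i * dim:(i + 1) * dim]
        b = bias[i * dim:(i + 1) * dim] if bias is not None else None
        parts[name] = (w, b)
    return parts


# Tiny example with dim = 2: the fused weight has 6 rows.
packed_w = [[1, 0], [0, 1], [2, 0], [0, 2], [3, 0], [0, 3]]
packed_b = [1, 1, 2, 2, 3, 3]
parts = split_packed_qkv(packed_w, packed_b)
print(parts["k"])
```

With real tensors the split would presumably need to happen on the full-precision weights before quantization, since the packed 4-bit storage cannot be sliced by row in the same way.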
I also modified the generate call in modeling_nvlm_d.py slightly so that it no longer forces use_cache=True, because that left cache tensors on the wrong GPU when I tried to use the model more than once.
Requires at least 48 GB of VRAM. Even then, 48 GB probably doesn't leave room for very long contexts.
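A back-of-the-envelope estimate shows roughly where the memory goes. This sketch assumes all 72 billion nominal parameters are stored in NF4 with a block size of 64 and double-quantized scaling constants (8-bit constants per block, plus one 32-bit constant per 256 blocks). Those defaults and the `nf4_weight_gb` helper are assumptions for illustration. The real footprint differs because some layers stay unquantized, and activations and the KV cache come on top.

```python
def nf4_weight_gb(n_params, block_size=64, double_quant_group=256):
    """Approximate weight storage in GB (1e9 bytes) for NF4 with double quantization."""
    bits_per_param = (
        4.0                                          # the 4-bit NF4 code itself
        + 8.0 / block_size                           # 8-bit scale per block
        + 32.0 / (block_size * double_quant_group)   # 32-bit constant per group of scales
    )
    return n_params * bits_per_param / 8 / 1e9


estimate = nf4_weight_gb(72e9)
print(f"{estimate:.1f} GB")  # prints "37.1 GB"
```

Under these assumptions the weights alone take about 37 GB, which leaves roughly 10 GB of a 48 GB budget for everything else and is consistent with long contexts not fitting.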