Integrate with Sentence Transformers v5.4

#10
by tomaarsen HF Staff - opened
Nomic AI org
•
edited 5 days ago

Hello!

Pull Request overview

  • Integrate nomic-embed-vision-v1.5 with Sentence Transformers v5.4+

Details

The integration uses a Transformer -> Pooling(cls) -> Normalize pipeline with modality_config set to {"image": {"method": "forward", "method_output_name": "last_hidden_state"}} so the model accepts image inputs. A processor_config.json was added to ensure AutoProcessor loads the CLIPImageProcessor instead of falling back to a tokenizer (since model_type: nomic_bert would otherwise resolve to BertTokenizer).
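For reference, a standard Sentence Transformers modules.json for such a pipeline looks roughly like the following. This is a sketch of the usual format, not the exact file in this PR; paths and entries may differ:

```json
[
  {"idx": 0, "name": "0", "path": "", "type": "sentence_transformers.models.Transformer"},
  {"idx": 1, "name": "1", "path": "1_Pooling", "type": "sentence_transformers.models.Pooling"},
  {"idx": 2, "name": "2", "path": "2_Normalize", "type": "sentence_transformers.models.Normalize"}
]
```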

Note: this model requires https://huggingface.co/nomic-ai/nomic-bert-2048/discussions/23 to fix three transformers v5 compatibility issues in the shared modeling code:

  1. Adding self.post_init() to NomicVisionModel.__init__ (required for all_tied_weights_keys)
  2. Lazy recomputation of rotary position embeddings in NomicVisionRotaryEmbeddingCat.get_embed (non-persistent buffers are not materialized when from_pretrained initializes on torch.device("meta") in v5)
  3. Replacing the self.norm_factor buffer with inline math.sqrt(self.head_dim) in NomicAttentionPooling and NomicBertAttention (same meta-device issue)
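Fixes 2 and 3 share the same root cause. Here is a minimal sketch of the pattern behind fix 3, using hypothetical class names rather than the actual NomicBert code: a non-persistent buffer registered in __init__ is never materialized when the model is constructed on the meta device, so reading it after from_pretrained can yield garbage, while computing the value inline avoids the buffer entirely.

```python
import math

import torch


class AttentionBefore(torch.nn.Module):
    """Hypothetical 'before' version: stores the scale as a buffer."""

    def __init__(self, head_dim: int):
        super().__init__()
        # Non-persistent buffers are skipped by from_pretrained when the
        # model is initialized on torch.device("meta") in transformers v5,
        # so this value may never be filled in.
        self.register_buffer(
            "norm_factor", torch.tensor(math.sqrt(head_dim)), persistent=False
        )

    def forward(self, scores: torch.Tensor) -> torch.Tensor:
        return scores / self.norm_factor


class AttentionAfter(torch.nn.Module):
    """Hypothetical 'after' version: computes the scale inline."""

    def __init__(self, head_dim: int):
        super().__init__()
        self.head_dim = head_dim

    def forward(self, scores: torch.Tensor) -> torch.Tensor:
        # No buffer to materialize; works regardless of how the module
        # was initialized.
        return scores / math.sqrt(self.head_dim)
```

The lazy recomputation in fix 2 follows the same idea: instead of trusting a precomputed buffer, the rotary embeddings are rebuilt on first use if the cached tensors are missing or stale.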

Added files:

  • modules.json: Defines the Transformer -> Pooling -> Normalize pipeline
  • config_sentence_transformers.json: ST model config with cosine similarity
  • sentence_bert_config.json: Transformer config with image modality_config
  • 1_Pooling/config.json: CLS pooling mode, 768-dim embeddings
  • processor_config.json: Ensures AutoProcessor loads CLIPImageProcessor
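As an illustration, a CLS-mode pooling config in the standard Sentence Transformers format looks roughly like this. This is a sketch based on the usual 1_Pooling/config.json layout; the actual file in this PR may contain additional keys:

```json
{
  "word_embedding_dimension": 768,
  "pooling_mode_cls_token": true,
  "pooling_mode_mean_tokens": false,
  "pooling_mode_max_tokens": false
}
```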

Modified files:

  • config.json: Fixed n_inner from 2048.0 (float) to 2048 (int) for transformers v5 strict validation
  • README.md: Added sentence-transformers library tag, and a "Using Sentence Transformers" usage section

Here's a script that uses both this PR and the companion PR:

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("nomic-ai/nomic-embed-vision-v1.5", revision="refs/pr/10", model_kwargs={"code_revision": "refs/pr/23"}, trust_remote_code=True)

embeddings = model.encode("http://images.cocodataset.org/val2017/000000039769.jpg")
print(embeddings.shape)
# (768,)

Once both PRs are merged, the revision and model_kwargs arguments can be omitted.

Note that none of the existing behaviour is affected or changed; this PR only adds an additional way to run the model in a familiar and common format.

  • Tom Aarsen
tomaarsen changed pull request status to open

I am getting NaN values when I try to replicate this with sentence-transformers==5.4.0.

Nomic AI org
•
edited 2 days ago

Hello!

This is what I get, for reference:

>>> from sentence_transformers import SentenceTransformer
>>> model = SentenceTransformer("nomic-ai/nomic-embed-vision-v1.5", revision="refs/pr/10", model_kwargs={"code_revision": "refs/pr/23"}, trust_remote_code=True)
>>> embeddings = model.encode("http://images.cocodataset.org/val2017/000000039769.jpg")
>>> print(embeddings)
Loading weights: 100%|████████████████████| 211/211 [00:00<00:00, 5870.40it/s]
[ 4.71330713e-03 -2.53534522e-02  6.63616322e-03 -2.95666978e-02
 -4.34983559e-02 -1.22364080e-02  2.38989759e-03 -3.60762812e-02
...
 -4.39791530e-02 -3.05440221e-02 -1.93784963e-02 -1.76065695e-02
 -3.54587808e-02 -4.97163460e-02  7.33873341e-03 -3.87372449e-02]

Can you share your torch and transformers versions?
I'm using torch 2.10.0+cu128 and transformers 5.5.0. I know some older torch versions had issues producing NaN, although you shouldn't need a version as new as 2.10.0.
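For anyone hitting the same problem, a quick diagnostic sketch: print the relevant library versions and check whether the returned embedding actually contains NaNs. The has_nan helper is just for illustration, not part of either library:

```python
import numpy as np
import torch
import transformers

# Version info is the first thing to compare when outputs differ.
print("torch:", torch.__version__)
print("transformers:", transformers.__version__)


def has_nan(embedding: np.ndarray) -> bool:
    """Return True if any element of the embedding is NaN."""
    return bool(np.isnan(embedding).any())
```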

  • Tom Aarsen
Ready to merge
This branch is ready to get merged automatically.
