F2P Decoder

Hugging Face AutoModel wrapper for the SigLIP2 feature-to-pixel decoder used in this repository.

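```python
import torch
from transformers import AutoModel

# trust_remote_code=True lets transformers load the decoder class shipped in the repo.
model = AutoModel.from_pretrained(
    "toilaluan/f2p_decoder",
    trust_remote_code=True,
).eval()

# Dummy SigLIP2 features: (batch, 1 CLS + 256 patch tokens, feature dim 1152).
features = torch.randn(1, 257, 1152)

with torch.no_grad():  # inference only, no gradients needed
    reconstruction = model(features)

print(reconstruction.shape)  # torch.Size([1, 3, 224, 224])
```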

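The model expects SigLIP2 patch features with a CLS token, for example from google/siglip2-so400m-patch14-224. For that encoder the input is a tensor of shape (batch, 257, 1152). Here 257 is one CLS token plus the 16 × 16 = 256 patch tokens of a 224-pixel image at patch size 14, and 1152 is the feature dimension. The output is an image tensor of shape (batch, 3, 224, 224) in the decoder's reconstructed pixel space.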
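A quick way to check the expected token count before calling the model is sketched below. It is a stdlib-only illustration, not part of the wrapper. The helper names `expected_input_shape` and `check_features_shape` are hypothetical, and the defaults assume the google/siglip2-so400m-patch14-224 geometry (224-pixel input, patch size 14, 1152-dim features, one CLS token).

```python
# Hypothetical helpers, not part of the f2p_decoder wrapper.
# Defaults assume google/siglip2-so400m-patch14-224 plus one CLS token.

def expected_input_shape(batch_size: int,
                         image_size: int = 224,
                         patch_size: int = 14,
                         hidden_size: int = 1152,
                         num_cls_tokens: int = 1) -> tuple:
    """Return the (batch, tokens, hidden) shape the decoder example uses."""
    if image_size % patch_size != 0:
        raise ValueError("image_size must be divisible by patch_size")
    patches_per_side = image_size // patch_size      # 224 // 14 = 16
    num_patches = patches_per_side ** 2              # 16 * 16 = 256
    return (batch_size, num_cls_tokens + num_patches, hidden_size)


def check_features_shape(shape) -> None:
    """Raise ValueError unless shape is (B, 257, 1152) under the defaults.

    `shape` may be a plain tuple or a torch.Size (a tuple subclass).
    """
    shape = tuple(shape)
    if len(shape) != 3:
        raise ValueError(f"expected 3 dims (B, T, D), got {shape}")
    expected = expected_input_shape(shape[0])
    if shape != expected:
        raise ValueError(f"expected {expected}, got {shape}")


print(expected_input_shape(1))         # (1, 257, 1152)
check_features_shape((4, 257, 1152))   # passes silently
```

Because `torch.Size` is a tuple subclass, `check_features_shape(features.shape)` also works directly on a feature tensor.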

Source .pt checkpoint: nyu-visionx/siglip2_decoder/model.pt.

Weights: Safetensors, 0.4B params, tensor type F32.
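A rough estimate of the memory needed just to hold the weights follows. It assumes the listed 0.4B parameter count, which is itself rounded, and 4 bytes per F32 value. Activations and framework overhead are not included.

```python
# Approximate weight memory from the listed size (0.4B params, stored as F32).
# 0.4B is a rounded figure, so the result is only an estimate.
NUM_PARAMS = 0.4e9
BYTES_PER_F32 = 4

weight_bytes = NUM_PARAMS * BYTES_PER_F32
print(f"~{weight_bytes / 1e9:.1f} GB")  # ~1.6 GB
```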