Video-Text-to-Text
Transformers
Safetensors
English
qwen2_5_vl
image-text-to-text
video-understanding
reasoning
multimodal
reinforcement-learning
question-answering
text-generation-inference
Instructions to use Falconss1/VideoThinker-R1-3B with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use Falconss1/VideoThinker-R1-3B with Transformers:
# Load model directly from transformers import AutoProcessor, AutoModelForImageTextToText processor = AutoProcessor.from_pretrained("Falconss1/VideoThinker-R1-3B") model = AutoModelForImageTextToText.from_pretrained("Falconss1/VideoThinker-R1-3B") - Notebooks
- Google Colab
- Kaggle
Improve model card and link to paper
#1
by nielsr HF Staff - opened
This PR improves the model card for VideoThinker-R1-3B. Key changes include:
- Linked the model to the corresponding research paper on Hugging Face: Beyond Perceptual Shortcuts: Causal-Inspired Debiasing Optimization for Generalizable Video Reasoning in Lightweight MLLMs.
- Replaced the full paper abstract with a concise summary of the framework and results to improve readability.
- Maintained and organized metadata tags for better discoverability.
- Provided clear links to the official code repository.
Falconss1 changed pull request status to merged