Yuting6/geoqa-r1v-8k-mixup
The model was presented in the paper Vision Matters: Simple Visual Perturbations Can Boost Multimodal Math Reasoning (arXiv:2506.09736).
Vision-Matters is a simple visual perturbation framework that integrates into existing post-training pipelines, including SFT, DPO, and GRPO. Our findings highlight the critical role of visual perturbation: better reasoning begins with better seeing.
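This card does not spell out how a perturbation is applied. As a rough illustration only, the sketch below shows one way a mixup-style image perturbation (suggested by the "mixup" suffix in the dataset name) could be written. The function name `mixup_perturb`, the `alpha` parameter, and its default value are hypothetical choices for this example and are not taken from the authors' code.

```python
import numpy as np


def mixup_perturb(image, distractor, alpha=0.8):
    """Blend `distractor` into `image` while keeping `image` dominant.

    Hypothetical helper for illustration; not the authors' implementation.
    Both inputs are uint8 arrays of identical shape (H, W, C).
    """
    if image.shape != distractor.shape:
        raise ValueError("image and distractor must have the same shape")
    if not 0.5 < alpha <= 1.0:
        raise ValueError("alpha must be in (0.5, 1.0] so the original stays dominant")
    # Weighted pixel-wise average, computed in float to avoid uint8 overflow.
    mixed = alpha * image.astype(np.float32) + (1.0 - alpha) * distractor.astype(np.float32)
    return np.clip(np.rint(mixed), 0, 255).astype(np.uint8)


# Toy example with random arrays standing in for a training image and a distractor.
rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(4, 4, 3), dtype=np.uint8)
distractor = rng.integers(0, 256, size=(4, 4, 3), dtype=np.uint8)
perturbed = mixup_perturb(image, distractor, alpha=0.8)
```

In a post-training pipeline, a function like this would presumably be applied to the image of each training example before it is passed to the model, leaving the question and answer text unchanged.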
Base model
Qwen/Qwen2.5-VL-7B-Instruct