Yi Ding
Tuwhy
Sherlock
Model series from the paper "Sherlock: Self-Correcting Reasoning in Vision-Language Models"
Tuwhy/Llama-3.2V-11B-Sherlock-SFT
Image-Text-to-Text • 11B • Updated • 4
Tuwhy/Llama-3.2V-11B-Sherlock-Offline
Image-Text-to-Text • 11B • Updated • 3
Tuwhy/Llama-3.2V-11B-Sherlock-iter1
Image-Text-to-Text • 11B • Updated • 3
Tuwhy/Llama-3.2V-11B-Sherlock-iter2
Image-Text-to-Text • 11B • Updated • 11 • 2
Octopus
RL checkpoints of Octopus-8B and baselines from the paper "Learning Self-Correction in Vision–Language Models via Rollout Augmentation"
models 17
Tuwhy/Octopus-8B
Image-Text-to-Text • 9B • Updated • 50
Tuwhy/Qwen3-VL-8B-GRPO-n16
9B • Updated
Tuwhy/Qwen3-VL-8B-DAPO-n16
9B • Updated • 2
Tuwhy/Qwen3-VL-8B-SRPO-n16
9B • Updated
Tuwhy/Qwen3-VL-8B-SCPO-random
9B • Updated • 2
Tuwhy/Qwen3-VL-8B-SRPO-n8
9B • Updated • 4
Tuwhy/Qwen3-VL-8B-GSPO-n8
9B • Updated • 1
Tuwhy/Qwen3-VL-8B-SCPO-no-aug
9B • Updated
Tuwhy/Qwen3-VL-8B-GSPO-n16
9B • Updated • 7
Tuwhy/Qwen3-VL-8B-GRPO-n8
9B • Updated
datasets 7
Tuwhy/mathverse_testmini
Viewer • Updated • 3.94k • 5
Tuwhy/mmmu_val
Viewer • Updated • 900 • 8
Tuwhy/mathvision
Viewer • Updated • 304 • 15
Tuwhy/mathvista_testmini
Viewer • Updated • 1k • 6
Tuwhy/virl_39k
Viewer • Updated • 36.6k • 3
Tuwhy/MIS_Test
Updated • 94
Tuwhy/MIS_Train
Viewer • Updated • 3.93k • 14