arxiv:2605.12500
Zhongang Cai
AI & ML interests
Multimodal, Video Reasoning, Spatial Intelligence, Virtual Humans.
Recent Activity
upvoted a paper 8 days ago
From Pixels to Words -- Towards Native One-Vision Models at Scale
liked a dataset 22 days ago
sensenova/SenseNova-SI-8M