From Pixels to Words -- Towards Native Vision-Language Primitives at Scale
Haiwen Diao
Paranioar
AI & ML interests
Vision-and-Language, Parameter-efficient Transfer Learning, Multi-modal Large Language Model
Recent Activity
liked a model 2 days ago: sensenova/SenseNova-U1-8B-MoT-Infographic
authored a paper 4 days ago: DynamicVLA: A Vision-Language-Action Model for Dynamic Object Manipulation
authored a paper 4 days ago: VISTA-Bench: Do Vision-Language Models Really Understand Visualized Text as Well as Pure Text?