From Plans to Pixels: Learning to Plan and Orchestrate for Open-Ended Image Editing Paper • 2605.15181 • Published 11 days ago • 12
SMART Collection Your Single-Vector Embedding Model is SMARTer Than You Think • 4 items • Updated 17 days ago
Exploration and Exploitation Errors Are Measurable for Language Model Agents Paper • 2604.13151 • Published Apr 14 • 25
Unified Spatio-Temporal Token Scoring for Efficient Video VLMs Paper • 2603.18004 • Published Mar 18 • 14
Molmo2: Open Weights and Data for Vision-Language Models with Video Understanding and Grounding Paper • 2601.10611 • Published Jan 15 • 35
Contamination Detection for VLMs using Multi-Modal Semantic Perturbation Paper • 2511.03774 • Published Nov 5, 2025 • 13