OScaR: The Occam's Razor for Extreme KV Cache Quantization in LLMs and Beyond Paper • 2605.19660 • Published 4 days ago
World Action Models: The Next Frontier in Embodied AI Paper • 2605.12090 • Published 11 days ago
BandPO: Bridging Trust Regions and Ratio Clipping via Probability-Aware Bounds for LLM Reinforcement Learning Paper • 2603.04918 • Published Mar 5
R3-RAG: Learning Step-by-Step Reasoning and Retrieval for LLMs via Reinforcement Learning Paper • 2505.23794 • Published May 26, 2025
Thinking with Video: Video Generation as a Promising Multimodal Reasoning Paradigm Paper • 2511.04570 • Published Nov 6, 2025