ACC: Compiling Agent Trajectories for Long-Context Training • Paper • 2605.21850 • Published 9 days ago • 59
Beyond Accuracy: Unveiling Inefficiency Patterns in Tool-Integrated Reasoning • Paper • 2604.05404 • Published Apr 7 • 43
Lean and Mean: Decoupled Value Policy Optimization with Global Value Guidance • Paper • 2502.16944 • Published Feb 24, 2025 • 10