Combining On-Policy Optimization and Distillation for Long-Context Reasoning in Large Language Models
Paper • 2605.12227 • Published 7 days ago