Article • Distribution Matching Prevents Mode Collapse in Training Reasoning Models • Mar 17 • 2
Featured • The Smol Training Playbook 📚 • The secrets to building world-class LLMs • 3.13k
SPIRAL: Self-Play on Zero-Sum Games Incentivizes Reasoning via Multi-Agent Multi-Turn Reinforcement Learning Paper • 2506.24119 • Published Jun 30, 2025 • 51
The Ultra-Scale Playbook 🌌 • The ultimate guide to training LLM on large GPU Clusters • 3.82k