SWE-Marathon: Can Agents Autonomously Complete Ultra-Long-Horizon Software Work? Paper • 2606.07682 • Published 17 days ago • 2
REAL: Benchmarking Autonomous Agents on Deterministic Simulations of Real Websites Paper • 2504.11543 • Published Apr 15, 2025 • 2
Awesome SFT datasets Collection A curated list of interesting datasets to fine-tune language models with. • 41 items • Updated Mar 2 • 154
System 2 Attention (is something you might need too) Paper • 2311.11829 • Published Nov 20, 2023 • 43