Open CoT Leaderboard 🥇: Track, rank and evaluate open LLMs' CoT quality
BigCodeBench Leaderboard 🥇: Explore code-generation model leaderboards and task details