SGI-Bench Leaderboard
Scientific General Intelligence of LLMs/vLLMs
Scaling the Horizon, Not the Parameters: Reaching Trillion-Parameter Performance with a 35B Agent
ResearchClawBench: A Benchmark for End-to-End Autonomous Scientific Research
Open, science-focused leaderboards benchmarking LLMs and VLMs
Lightweight harness for tool-using LLM agents.
Submit and validate a ResearchClawBench task ZIP