view article Article AI evals are becoming the new compute bottleneck evaleval • 18 days ago • 26
Externalization in LLM Agents: A Unified Review of Memory, Skills, Protocols and Harness Engineering Paper • 2604.08224 • Published Apr 9 • 51
view article Article A New Framework for Evaluating Voice Agents (EVA) ServiceNow-AI • Mar 24 • 93
OpenResearcher: A Fully Open Pipeline for Long-Horizon Deep Research Trajectory Synthesis Paper • 2603.20278 • Published Mar 17 • 98
view article Article GGML and llama.cpp join HF to ensure the long-term progress of Local AI +4 ggerganov, ngxson, allozaur, lysandre, victor, julien-c • Feb 20 • 505
StockBench: Can LLM Agents Trade Stocks Profitably In Real-world Markets? Paper • 2510.02209 • Published Oct 2, 2025 • 57
A Survey of Data Agents: Emerging Paradigm or Overstated Hype? Paper • 2510.23587 • Published Oct 27, 2025 • 67
DataFlow: An LLM-Driven Framework for Unified Data Preparation and Workflow Automation in the Era of Data-Centric AI Paper • 2512.16676 • Published Dec 18, 2025 • 222
Chaining the Evidence: Robust Reinforcement Learning for Deep Search Agents with Citation-Aware Rubric Rewards Paper • 2601.06021 • Published Jan 9 • 48
view article Article Groq on Hugging Face Inference Providers 🔥 +3 benank-groq, hozen, celinah, Wauplin, sbrandeis • Jun 16, 2025 • 44
view article Article Building the Hugging Face MCP Server +2 evalstate, julien-c, coyotte508, abidlabs • Jul 10, 2025 • 67