WBench: A Comprehensive Multi-turn Benchmark for Interactive Video World Model Evaluation Paper • 2605.25874 • Published 5 days ago • 97
LARYBench Collection Models trained in LARYBench. https://huggingface.co/papers/2604.11689 • 2 items • Updated Apr 21
General365: Benchmarking General Reasoning in Large Language Models Across Diverse and Challenging Tasks Paper • 2604.11778 • Published Apr 13 • 9
EvalTalker: Learning to Evaluate Real-Portrait-Driven Multi-Subject Talking Humans Paper • 2512.01340 • Published Dec 1, 2025
LARY: A Latent Action Representation Yielding Benchmark for Generalizable Vision-to-Action Alignment Paper • 2604.11689 • Published Apr 13 • 21
LongCat-Flash-Prover: Advancing Native Formal Reasoning via Agentic Tool-Integrated Reinforcement Learning Paper • 2603.21065 • Published Mar 22 • 78