Tmax Collection Data and models associated with "Tmax: A simple recipe for terminal agents". paper: https://arxiv.org/abs/2606.23321 • 23 items • Updated about 1 hour ago • 5
Beyond the Current Observation: Evaluating Multimodal Large Language Models in Controllable Non-Markov Games Paper • 2606.19338 • Published 6 days ago • 46
EfficientRollout: System-Aware Self-Speculative Decoding for RL Rollouts Paper • 2606.18967 • Published 6 days ago • 24
The Reward Was in Your Data All Along: Correcting Flow Matching with Discriminator-Guided RL Paper • 2606.19162 • Published 6 days ago • 20
Learning from the Self-future: On-policy Self-distillation for dLLMs Paper • 2606.18195 • Published 7 days ago • 73
TRIAGE: Dialectical Reasoning for Explainable Risk Prediction on Irregularly Sampled Medical Time Series with LLMs Paper • 2606.09030 • Published 15 days ago • 30
OPD-Evolver: Cultivating Holistic Agent Evolver via On-Policy Distillation Paper • 2606.17628 • Published 7 days ago • 27
GameCraft-Bench: Can Agents Build Playable Games End-to-End in a Real Game Engine? Paper • 2606.17861 • Published 7 days ago • 53
LoopCoder-v2: Only Loop Once for Efficient Test-Time Computation Scaling Paper • 2606.18023 • Published 7 days ago • 200
Zone of Proximal Policy Optimization: Teacher in Prompts, Not Gradients Paper • 2606.18216 • Published 7 days ago • 58
DavidAU/Qwen3.6-40B-Claude-4.6-Opus-Deckard-Heretic-Uncensored-Thinking-NEO-CODE-Di-IMatrix-MAX-GGUF Image-Text-to-Text • 39B • Updated 12 days ago • 585k • 429