Reproducing, Analyzing, and Detecting Reward Hacking in Rubric-Based Reinforcement Learning Paper • 2606.04923 • Published 4 days ago • 37
Guiding LLM Post-training Data Engineering with Model Internals from Sparse Autoencoders Paper • 2605.27354 • Published 12 days ago • 15