Small Vectors, Big Effects: A Mechanistic Study of RL-Induced Reasoning via Steering Vectors Paper • 2509.06608 • Published Sep 8, 2025
Train One Sparse Autoencoder Across Multiple Sparsity Budgets to Preserve Interpretability and Accuracy Paper • 2505.24473 • Published May 30, 2025
Unstable Features, Reproducible Subspaces: Understanding Seed Dependence in Sparse Autoencoders Paper • 2606.12138 • Published 8 days ago • 6
Interpreting and Steering a Text-to-Speech Language Model with Sparse Autoencoders Paper • 2606.10029 • Published 9 days ago • 12
Trust-Region Behavior Blending for On-Policy Distillation Paper • 2605.31159 • Published 20 days ago • 66
Next Embedding Prediction Makes World Models Stronger Paper • 2603.02765 • Published Mar 3 • 21
Chasing the Counting Manifold in Open LLMs • Running • Featured • 25 • Counting manifolds in open LLMs from behavior to SAEs.
ESSA: Evolutionary Strategies for Scalable Alignment Paper • 2507.04453 • Published Jul 6, 2025 • 5
F-GRPO: Don't Let Your Policy Learn the Obvious and Forget the Rare Paper • 2602.06717 • Published Feb 6 • 75
T-pro 2.0: An Efficient Russian Hybrid-Reasoning Model and Playground Paper • 2512.10430 • Published Dec 11, 2025 • 119
Unveiling Intrinsic Dimension of Texts: from Academic Abstract to Creative Story Paper • 2511.15210 • Published Nov 19, 2025 • 91
Enhancing Vision-Language Model Training with Reinforcement Learning in Synthetic Worlds for Real-World Success Paper • 2508.04280 • Published Aug 6, 2025 • 35
Teach Old SAEs New Domain Tricks with Boosting Paper • 2507.12990 • Published Jul 17, 2025 • 12
Train Sparse Autoencoders Efficiently by Utilizing Features Correlation Paper • 2505.22255 • Published May 28, 2025 • 24