RLPR: Extrapolating RLVR to General Domains without Verifiers Paper • 2506.18254 • Published Jun 23, 2025 • 34
Reinforcement-aware Knowledge Distillation for LLM Reasoning Paper • 2602.22495 • Published Feb 26 • 5
Good SFT Optimizes for SFT, Better SFT Prepares for Reinforcement Learning Paper • 2602.01058 • Published Feb 1 • 44
LLM Embeddings Explained: A Visual and Intuitive Guide • 342 • How Language Models Turn Text into Meaning, From Traditional