Rethinking On-Policy Distillation of Large Language Models: Phenomenology, Mechanism, and Recipe Paper • 2604.13016 • Published 4 days ago • 75
Memory Transfer Learning: How Memories are Transferred Across Domains in Coding Agents Paper • 2604.14004 • Published 3 days ago • 24
Sangsang/feedback_asymmetric_kl_fixed_ema_Qwen3-14B_bw0p5_fw0p5_ema0p999_ep30 Text Generation • Updated 3 days ago • 12
Sangsang/grpo_Qwen3-0.6B_bs16_g16_mb128_lr1e-6_b1e-3_clip0p2_temp0p7_ep30 Text Generation • Updated 5 days ago • 11
Sangsang/feedback_asymmetric_fixed_ema_Llama-3.1-8B-Instruct_bw0p5_fw0p5_ema0p999_ep30_v2 Text Generation • Updated 6 days ago • 18