Debiased Model-based Representations for Sample-efficient Continuous Control Paper • 2605.11711 • Published 3 days ago
GenPRM: Scaling Test-Time Compute of Process Reward Models via Generative Reasoning Paper • 2504.00891 • Published Apr 1, 2025
SEABO: A Simple Search-Based Method for Offline Imitation Learning Paper • 2402.03807 • Published Feb 6, 2024
Using Human Feedback to Fine-tune Diffusion Models without Any Reward Model Paper • 2311.13231 • Published Nov 22, 2023