LLMs Can Leak Training Data But Do They Want To? A Propensity-Aware Evaluation of Memorization in LLMs Paper • 2606.06286 • Published 6 days ago • 8
Confidence and Calibration of Activation Oracles for Reliable Interpretation of Language Model Internals Paper • 2605.26045 • Published 16 days ago • 12