PsychoSafe: Eliciting Psychologically-Informed Refusals in Large Language Models Paper • 2606.09697 • Published 2 days ago • 5
BrainSurgery: Reproducible and Reliable Declarative Weight Manipulations for Model Editing and Upcycling Paper • 2606.09707 • Published 2 days ago • 6
LLMs Can Leak Training Data But Do They Want To? A Propensity-Aware Evaluation of Memorization in LLMs Paper • 2606.06286 • Published 7 days ago • 8
DaLA: Danish Linguistic Acceptability Evaluation Guided by Real World Errors Paper • 2512.04799 • Published Dec 4, 2025
SommBench: Assessing Sommelier Expertise of Language Models Paper • 2603.12117 • Published Mar 12 • 1
Confidence and Calibration of Activation Oracles for Reliable Interpretation of Language Model Internals Paper • 2605.26045 • Published 17 days ago • 12