Long Context Pre-Training with Lighthouse Attention Paper • 2605.06554 • Published 24 days ago
Efficient Pre-Training with Token Superposition Paper • 2605.06546 • Published 24 days ago