khtsly

AI & ML interests

None yet

Organizations

None yet

upvoted a paper 3 days ago

LLMs as Noisy Channels: A Shannon Perspective on Model Capacity and Scaling Laws

Paper • 2605.23901 • Published 7 days ago • 11

upvoted a paper 5 days ago

Anti-Self-Distillation for Reasoning RL via Pointwise Mutual Information

Paper • 2605.11609 • Published 17 days ago • 193

upvoted 2 papers 6 days ago

HRM-Text: Efficient Pretraining Beyond Scaling

Paper • 2605.20613 • Published 9 days ago • 89

Gated DeltaNet-2: Decoupling Erase and Write in Linear Attention

Paper • 2605.22791 • Published 8 days ago • 30

upvoted a paper 7 days ago

Generative Recursive Reasoning

Paper • 2605.19376 • Published 9 days ago • 29

upvoted a paper about 1 month ago

Attention Sink in Transformers: A Survey on Utilization, Interpretation, and Mitigation

Paper • 2604.10098 • Published Apr 11 • 82

upvoted 2 papers 2 months ago

MSA: Memory Sparse Attention for Efficient End-to-End Memory Model Scaling to 100M Tokens

Paper • 2603.23516 • Published Mar 6 • 50

Recursive Language Models Meet Uncertainty: The Surprising Effectiveness of Self-Reflective Program Search for Long Context

Paper • 2603.15653 • Published Mar 7 • 12

upvoted 2 collections 3 months ago

Qwen3.5-Abliterated-Opus-4.6-Distilled

Qwen3.5-Abliterated • 0 items • Updated Apr 26 • 1

Qwen3.5-Opus-4.6-Distilled

0 items • Updated Apr 26 • 2