Hunyuan-TurboS: Advancing Large Language Models through Mamba-Transformer Synergy and Adaptive Chain-of-Thought Paper • 2505.15431 • Published May 21, 2025 • 2
OneVL: One-Step Latent Reasoning and Planning with Vision-Language Explanation Paper • 2604.18486 • Published Apr 20 • 95
TransMamba: Flexibly Switching between Transformer and Mamba Paper • 2503.24067 • Published Mar 31, 2025 • 21
Article Open-R1: a fully open reproduction of DeepSeek-R1 +1 eliebak, lvwerra, lewtun • Jan 28, 2025 • 889
Scaling Laws for Floating Point Quantization Training Paper • 2501.02423 • Published Jan 5, 2025 • 26