Rethinking the Multilingual Reasoning Gap with Layer Swap
Abstract
Large-scale multilingual reasoning experiments show that the native reasoning gap shrinks under comparable fine-tuning, and that swapping in the English specialist's middle layers closes most of the remaining gap while keeping the chain-of-thought in the target language.
Recent reasoning Large Language Models produce a chain-of-thought (CoT) predominantly in English, even when prompted in non-English languages. Prior work suggests that forcing the CoT to remain in the input language (native reasoning) substantially degrades performance relative to allowing the model to reason in English before answering in the input language (English-pivoted reasoning). However, most studies of this native reasoning gap rely on inference-time interventions or limited native-language training data. We revisit this comparison at a larger scale and under comparable supervision. We construct long multilingual reasoning datasets across six languages (English, French, German, Spanish, Chinese, and Swahili), fine-tune specialists in both native and English-pivoted regimes on top of Qwen/Qwen3-8B-Base, and evaluate across mathematics, science, general knowledge, and code. In this setting, the average native reasoning gap shrinks to 1.9–3.5% across the five non-English languages, considerably smaller than previously reported. Weight-space analysis of the native specialists reveals that fine-tuning updates are aligned in the middle layers and diverge in the outer layers. This points to a largely language-agnostic reasoning core surrounded by language-specific layers. Exploiting this structure, we introduce Layer Swap, which transfers the English specialist's stronger reasoning mid-layers into each native specialist. Layer Swap closes most of the native reasoning gap across the five non-English languages while preserving CoT in the target language. We release all models and datasets.
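The two weight-space ideas in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes checkpoints are flat dictionaries from parameter names to lists of floats, that transformer blocks are named with a `layers.<index>.` pattern, and that the swapped range (`first` to `last`) is chosen by the user. The abstract does not state which layers count as "middle", so the range used below is purely hypothetical, as are the helper names `layer_index`, `update_cosine_by_layer`, and `layer_swap`.

```python
import math
import re

# Assumption: block parameters are named like "model.layers.<i>.<...>".
LAYER_RE = re.compile(r"\blayers\.(\d+)\.")


def layer_index(name):
    """Return the block index encoded in a parameter name, or None."""
    match = LAYER_RE.search(name)
    return int(match.group(1)) if match else None


def update_cosine_by_layer(base, spec_a, spec_b):
    """Per-layer cosine similarity between two specialists' updates.

    An update is (specialist weight - base weight); all parameters that
    belong to the same block are pooled into one vector.
    """
    dots, sq_a, sq_b = {}, {}, {}
    for name, w0 in base.items():
        idx = layer_index(name)
        if idx is None:  # embeddings, final norm, etc. are skipped
            continue
        for x0, xa, xb in zip(w0, spec_a[name], spec_b[name]):
            da, db = xa - x0, xb - x0
            dots[idx] = dots.get(idx, 0.0) + da * db
            sq_a[idx] = sq_a.get(idx, 0.0) + da * da
            sq_b[idx] = sq_b.get(idx, 0.0) + db * db
    result = {}
    for idx in dots:
        denom = math.sqrt(sq_a[idx] * sq_b[idx])
        result[idx] = dots[idx] / denom if denom > 0 else 0.0
    return result


def layer_swap(native, english, first, last):
    """Copy blocks first..last (inclusive) from `english` into `native`.

    Everything else (outer blocks, embeddings) stays from `native`.
    """
    merged = {}
    for name, weights in native.items():
        idx = layer_index(name)
        take_english = idx is not None and first <= idx <= last
        merged[name] = list(english[name] if take_english else weights)
    return merged


# Toy 4-block checkpoints: updates agree in blocks 1-2, differ elsewhere.
base = {f"model.layers.{i}.w": [0.0, 0.0] for i in range(4)}
base["model.embed_tokens.w"] = [0.0, 0.0]
english = {name: [1.0, 0.0] for name in base}
native = {
    name: [1.0, 0.0] if layer_index(name) in (1, 2) else [0.0, 1.0]
    for name in base
}

cosines = update_cosine_by_layer(base, english, native)
merged = layer_swap(native, english, first=1, last=2)  # hypothetical range
```

In this toy setup the per-layer cosine is high only for the middle blocks, which is the kind of pattern that would motivate swapping those blocks and leaving the outer, language-specific ones untouched. With real checkpoints the same logic would operate on tensor state dicts rather than Python lists.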
Models citing this paper 16
lightonai/Qwen3-8B-EN
Datasets citing this paper 5
lightonai/Dolci-Think-SFT-32B-Multilingual
lightonai/gpqa_diamond_multilingual