arxiv:2601.22040

Leviathan: Decoupling Input and Output Representations in Language Models

Published on May 7

Authors:

Abstract

Leviathan is a Transformer architecture that decouples token representation from vocabulary discrimination through learned embedding vectorization, improving language modeling performance, especially on rare tokens.

Generated by Qwen/Qwen2.5-Coder-32B-Instruct

Modern language models typically use a single matrix for both input embedding and output projection. This couples two distinct objectives: representing tokens and discriminating among them over a vocabulary. This work introduces Leviathan, a Transformer architecture that replaces the input embedding matrix with learned embedding vectorization (LEV), a compact continuous mapping from token indices to embeddings. Leviathan's output head remains untied, and the total parameter count increases by as little as 0.2%. Under controlled comparisons with identical Transformer backbones, Leviathan consistently improves language modeling performance over standard tied-embedding baselines across the 200M-1.2B parameter regime on The Pile, with gains that grow during training. At the 1.2B scale, Leviathan reduces validation perplexity by 9%, requires 2.1× fewer training tokens to reach the tied baseline's final loss, and improves on all six downstream benchmarks evaluated, including a 30% reduction in LAMBADA perplexity. Frequency-stratified analysis shows that the gains are concentrated in rare tokens, where continuous parameterization reduces perplexity by 81%; the improvement falls to near zero for the most frequent tokens.
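The relative perplexity reductions quoted above can be restated as changes in per-token cross-entropy loss, which is often easier to compare against training curves. The sketch below assumes the standard definition of perplexity as the exponential of mean per-token cross-entropy in nats; the abstract does not say which convention the paper uses, and the helper name `loss_delta_from_ppl_reduction` is hypothetical.

```python
import math


def loss_delta_from_ppl_reduction(reduction: float) -> float:
    """Return the change in per-token cross-entropy (nats) implied by a
    relative perplexity reduction.

    Assumes PPL = exp(loss), so PPL_new / PPL_old = 1 - reduction and
    loss_new - loss_old = ln(1 - reduction).
    """
    if not 0.0 <= reduction < 1.0:
        raise ValueError("reduction must be in [0, 1)")
    return math.log(1.0 - reduction)


# Relative reductions taken from the abstract; the labels are informal.
reported = {
    "validation (1.2B)": 0.09,
    "LAMBADA": 0.30,
    "rare tokens": 0.81,
}

for label, reduction in reported.items():
    delta = loss_delta_from_ppl_reduction(reduction)
    print(f"{label}: {reduction:.0%} lower perplexity ~ {delta:+.3f} nats/token")
```

Under this assumption, a 9% perplexity reduction corresponds to a loss decrease of a little under 0.1 nats per token, while the 81% reduction on rare tokens corresponds to well over 1 nat per token.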
