AGILLM-4.1 — code + compatible checkpoints (self-contained)

This bucket holds the AGILLM-4.1 model: the exact runtime code (code/) and the checkpoints (ckpts/) that load under it, kept together so there is no version skew.

This supersedes the legacy OpenTransformer/AGILLM-4 model repo, which mixes older agillm4 code with older checkpoints that are not guaranteed to load under the current 4.1 architecture. Use this bucket for 4.1; treat AGILLM-4 as archival.

Architecture (from the live checkpoint cfg)

  • layers: 28 (4 DiffusionBlocks x 7), d_model: 1280, heads: 20, rank: 160
  • MoE FFN: 2 experts, top-1, mult 4 · tied weights · sublinear attention
  • vocab: 129280 · tokenizer: deepseek-ai/DeepSeek-V4-Pro
  • objective: AR / SAT / NAT (stochastic) · DiffusionBlocks EDM block-wise training
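The numbers above can be cross-checked with a small sketch. The dataclass below restates the listed hyperparameters and derives a few quantities from them. The field names are hypothetical (they are not taken from the checkpoint's cfg), "4 DiffusionBlocks x 7" is read as four blocks of seven layers, and "mult 4" is assumed to mean an expert hidden width of 4 × d_model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgiLLM41Config:
    # Values copied from the README; field names are hypothetical and may not
    # match the keys stored in the checkpoint's cfg.
    diffusion_blocks: int = 4
    layers_per_block: int = 7
    d_model: int = 1280
    n_heads: int = 20
    rank: int = 160  # listed in the README; what it is the rank of is not specified there
    n_experts: int = 2
    top_k: int = 1
    ffn_mult: int = 4
    vocab_size: int = 129280
    tie_weights: bool = True

    @property
    def n_layers(self) -> int:
        # Assumption: "4 DiffusionBlocks x 7" means 4 blocks of 7 layers each.
        return self.diffusion_blocks * self.layers_per_block

    @property
    def head_dim(self) -> int:
        # Per-head width; d_model must split evenly across the heads.
        assert self.d_model % self.n_heads == 0
        return self.d_model // self.n_heads

    @property
    def expert_hidden(self) -> int:
        # Assumption: "mult 4" sets each expert's hidden width to ffn_mult * d_model.
        return self.ffn_mult * self.d_model

    def params_saved_by_tying(self) -> int:
        # Tying the output head to the input embedding avoids storing a second
        # vocab_size x d_model matrix.
        return self.vocab_size * self.d_model if self.tie_weights else 0


cfg = AgiLLM41Config()
print(cfg.n_layers, cfg.head_dim, cfg.expert_hidden, cfg.params_saved_by_tying())
# 28 64 5120 165478400
```

With tied weights the output head reuses the 129280 × 1280 embedding matrix, so roughly 165M parameters are not stored twice.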

Loading

The runtime is the single file code/agillm41.py (mainline @ 1e7f963). Checkpoints in ckpts/pretrain_step*.pt are plain torch.save dicts with keys cfg, step, tokenizer_id, tie_weights, and the model tensors. Pull with the hf client:

pip install "huggingface_hub[hf_xet]>=1.18"
hf buckets cp hf://buckets/OpenTransformer/agillm41-checkpoints/ckpts/<file>.pt ./
hf buckets cp hf://buckets/OpenTransformer/agillm41-checkpoints/code/agillm41.py ./
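A minimal sketch for picking and opening a downloaded checkpoint, assuming the `*` in `pretrain_step*.pt` is an integer step (so names must be compared numerically, not lexically) and that every key other than the four documented metadata keys holds model state. The helper names are hypothetical; how the tensors are fed into the model is defined by code/agillm41.py and is not reproduced here.

```python
import re
from pathlib import Path

# Keys the README lists alongside the model tensors in every checkpoint.
METADATA_KEYS = ("cfg", "step", "tokenizer_id", "tie_weights")
# Assumption: the wildcard in ckpts/pretrain_step*.pt is a plain integer step.
STEP_RE = re.compile(r"^pretrain_step(\d+)\.pt$")


def pick_newest(names):
    """Return the checkpoint name with the highest step, comparing steps as integers."""
    steps = {}
    for name in names:
        m = STEP_RE.match(name)
        if m:
            steps[name] = int(m.group(1))
    return max(steps, key=steps.get) if steps else None


def newest_checkpoint(ckpt_dir):
    """Path of the highest-step checkpoint in a local directory, or None."""
    name = pick_newest(p.name for p in Path(ckpt_dir).glob("pretrain_step*.pt"))
    return None if name is None else Path(ckpt_dir) / name


def split_checkpoint(ckpt):
    """Separate the documented metadata keys from everything else in the dict."""
    missing = [k for k in METADATA_KEYS if k not in ckpt]
    if missing:
        raise KeyError(f"checkpoint is missing expected keys: {missing}")
    meta = {k: ckpt[k] for k in METADATA_KEYS}
    state = {k: v for k, v in ckpt.items() if k not in METADATA_KEYS}
    return meta, state


def load_checkpoint(path):
    import torch  # imported here so the helpers above run without PyTorch

    # cfg may be an arbitrary Python object, hence weights_only=False.
    # Unpickling can execute arbitrary code: only load files you trust.
    ckpt = torch.load(path, map_location="cpu", weights_only=False)
    return split_checkpoint(ckpt)


# Example usage after downloading into ./ckpts:
#   meta, state = load_checkpoint(newest_checkpoint("ckpts"))
#   print(meta["step"], meta["tokenizer_id"], len(state))
```

If cfg turns out to be a plain dict, the default `weights_only=True` of recent PyTorch releases may be enough and is the safer choice.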

Backed by HF Storage Buckets (mutable, with Xet chunk-level deduplication), so each backup uploads only the chunks that changed. The bucket is synced from the live trainer, and latest.json names the newest checkpoint.
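To illustrate the chunk-level deduplication behind these incremental backups, here is a toy sketch of content-defined chunking with a gear rolling hash. It is not Xet's implementation: the hash table, chunk-size parameters, and function names are illustrative assumptions. What it shows is that cut points depend on local content rather than file offsets, so inserting a few bytes changes only the chunks around the edit, and only those need to be uploaded.

```python
import hashlib
import random

# Fixed pseudo-random table for the gear rolling hash (seeded so runs repeat).
_rng = random.Random(0)
_GEAR = [_rng.getrandbits(64) for _ in range(256)]
_MASK64 = (1 << 64) - 1


def cdc_chunks(data: bytes, mask_bits: int = 12, min_size: int = 256,
               max_size: int = 16384) -> list[bytes]:
    """Cut wherever the low mask_bits bits of the rolling hash are all zero.

    The cut decision depends only on recent bytes, so boundaries re-align
    shortly after an edit. Average chunk size is roughly min_size + 2**mask_bits.
    """
    mask = (1 << mask_bits) - 1
    chunks, start, h = [], 0, 0
    for i, byte in enumerate(data):
        h = ((h << 1) + _GEAR[byte]) & _MASK64
        size = i + 1 - start
        if (size >= min_size and (h & mask) == 0) or size >= max_size:
            chunks.append(data[start:i + 1])
            start, h = i + 1, 0
    if start < len(data):
        chunks.append(data[start:])  # trailing partial chunk
    return chunks


def chunk_ids(data: bytes) -> list[str]:
    """Content address (SHA-256) of each chunk, in file order."""
    return [hashlib.sha256(c).hexdigest() for c in cdc_chunks(data)]


def chunks_to_upload(stored_ids: set[str], data: bytes) -> list[str]:
    """IDs of chunks in data that the store does not already hold (deduplicated)."""
    return [cid for cid in dict.fromkeys(chunk_ids(data)) if cid not in stored_ids]


rng = random.Random(1)
v1 = rng.randbytes(200_000)                  # stand-in for a stored file
v2 = v1[:100_000] + b"edit" + v1[100_000:]   # small insertion mid-file
stored = set(chunk_ids(v1))
print(len(chunk_ids(v2)), len(chunks_to_upload(stored, v2)))
```

A fixed-size chunker would be simpler, but an insertion would shift every later chunk boundary and force most of the file to be re-sent.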

Total size: 13.1 GB
Files: 34
Last updated: Jun 8
Pre-warmed CDN: US, EU
