Buckets: OpenTransformer/agillm41-checkpoints

13.1 GB

34 files

Updated 4 minutes ago


Name	Size	Uploaded	Xet hash
ckpts		about 1 hour ago	18 items
code		4 minutes ago	15 items
README.md	1.52 kB xet	about 11 hours ago	57e32ce8

README.md

AGILLM-4.1 — code + compatible checkpoints (self-contained)

This bucket holds the AGILLM-4.1 model: the exact runtime code (code/) and the checkpoints (ckpts/) that load under it, kept together so there is no version skew.

This bucket supersedes the legacy OpenTransformer/AGILLM-4 model repo, which mixes older agillm4 code with older checkpoints that are not guaranteed to load under the current 4.1 architecture. Use this bucket for 4.1; treat AGILLM-4 as archival.

Architecture (from the live checkpoint cfg)

layers: 28 (4 DiffusionBlocks x 7), d_model: 1280, heads: 20, rank: 160
MoE FFN: 2 experts, top-1, mult 4 · tied weights · sublinear attention
vocab: 129280 · tokenizer: deepseek-ai/DeepSeek-V4-Pro
objective: AR / SAT / NAT (stochastic) · DiffusionBlocks EDM block-wise training
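As a quick sanity check on the numbers above, here is a minimal sketch that records them in a dataclass and derives a few quantities. The class and field names are hypothetical and are not taken from code/agillm41.py. The per-head width assumes the conventional even split of d_model across heads, which the sublinear attention variant may not follow.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Agillm41Config:
    """Values copied from the architecture summary; field names are hypothetical."""

    layers: int = 28
    diffusion_blocks: int = 4
    d_model: int = 1280
    heads: int = 20
    rank: int = 160
    experts: int = 2
    top_k: int = 1
    ffn_mult: int = 4
    vocab: int = 129280
    tie_weights: bool = True

    @property
    def layers_per_block(self) -> int:
        # 28 layers grouped into 4 DiffusionBlocks.
        if self.layers % self.diffusion_blocks != 0:
            raise ValueError("layers must divide evenly into DiffusionBlocks")
        return self.layers // self.diffusion_blocks

    @property
    def head_dim(self) -> int:
        # Assumption: d_model is split evenly across heads.
        if self.d_model % self.heads != 0:
            raise ValueError("d_model must be divisible by heads")
        return self.d_model // self.heads

    @property
    def embedding_params(self) -> int:
        # Size of the token embedding matrix. With tied weights the output
        # projection reuses this matrix, so it is counted once.
        return self.vocab * self.d_model


cfg = Agillm41Config()
print(cfg.layers_per_block, cfg.head_dim, cfg.embedding_params)
# prints: 7 64 165478400
```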

Loading

The runtime is the single file code/agillm41.py (mainline @ 1e7f963). Checkpoints in ckpts/pretrain_step*.pt are plain torch.save dicts with keys cfg, step, tokenizer_id, tie_weights, and the model tensors. Pull with the hf client:

pip install "huggingface_hub[hf_xet]>=1.18"
hf buckets cp hf://buckets/OpenTransformer/agillm41-checkpoints/ckpts/<file>.pt ./
hf buckets cp hf://buckets/OpenTransformer/agillm41-checkpoints/code/agillm41.py ./
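Once a checkpoint is downloaded, it can be inspected before any model is built. The sketch below assumes only the four metadata keys named above. The helper names are hypothetical, and the layout of the remaining entries (the model tensors) is not specified here, so they are only listed. `weights_only=True` assumes cfg is stored as plain Python types. If it is not, the load will fail and the file would need `weights_only=False`, which should only be used on trusted checkpoints.

```python
from typing import Any, Dict

# Metadata keys named in this README; all other entries are assumed to be
# model tensors (or containers of them).
METADATA_KEYS = ("cfg", "step", "tokenizer_id", "tie_weights")


def load_checkpoint(path: str) -> Dict[str, Any]:
    """Load a checkpoint dict onto the CPU."""
    import torch  # imported lazily so describe_checkpoint works without PyTorch

    # weights_only=True restricts unpickling to tensors and basic containers.
    return torch.load(path, map_location="cpu", weights_only=True)


def describe_checkpoint(ckpt: Dict[str, Any]) -> Dict[str, Any]:
    """Split a loaded checkpoint into its metadata and the names of other entries."""
    missing = [key for key in METADATA_KEYS if key not in ckpt]
    if missing:
        raise KeyError(f"checkpoint is missing expected keys: {missing}")
    metadata = {key: ckpt[key] for key in METADATA_KEYS}
    other_keys = sorted(key for key in ckpt if key not in METADATA_KEYS)
    return {"metadata": metadata, "other_keys": other_keys}
```

Typical use would be `describe_checkpoint(load_checkpoint("<file>.pt"))`, with `<file>` replaced by the name of a downloaded checkpoint.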

The bucket is backed by HF Storage Buckets (Xet dedup, mutable), so each backup ships only changed chunks. It is synced from the live trainer, and latest.json names the newest checkpoint.
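The schema of latest.json is not described here, so the sketch below does not parse it. As a fallback, it picks the newest checkpoint from a list of file names by step number. It assumes the wildcard in pretrain_step*.pt is an integer step count. The function name and the file names in the example are hypothetical.

```python
import re
from typing import Iterable, Optional

# Assumption: the wildcard in pretrain_step*.pt is an integer step count.
_STEP_RE = re.compile(r"^pretrain_step(\d+)\.pt$")


def newest_checkpoint(names: Iterable[str]) -> Optional[str]:
    """Return the file name with the highest step, or None if none match."""
    best_name: Optional[str] = None
    best_step = -1
    for name in names:
        match = _STEP_RE.match(name)
        if match is None:
            continue  # skip files that are not pretrain checkpoints
        step = int(match.group(1))
        if step > best_step:
            best_name, best_step = name, step
    return best_name


# Hypothetical listing of ckpts/, for illustration only.
listing = ["pretrain_step100.pt", "pretrain_step2000.pt", "latest.json"]
print(newest_checkpoint(listing))
# prints: pretrain_step2000.pt
```

Comparing parsed integers rather than raw strings avoids ranking step100 above step2000 when names are not zero-padded.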


Last updated: Jun 8

Pre-warmed CDN: US EU US EU
