Buckets:
| Name | Size | Uploaded | Xet hash |
|---|---|---|---|
| ckpts | 18 items | | |
| code | 15 items | | |
| README.md | 1.52 kB | | 57e32ce8 |
AGILLM-4.1 — code + compatible checkpoints (self-contained)
This bucket holds the AGILLM-4.1 model: the exact runtime code (code/) and the
checkpoints (ckpts/) that load under it, kept together so there is no version skew.
This supersedes the legacy OpenTransformer/AGILLM-4 model repo, which mixes older agillm4 code with older checkpoints that are not guaranteed to load under the current 4.1 architecture. Use this bucket for 4.1; treat AGILLM-4 as archival.
Architecture (from the live checkpoint cfg)
- layers: 28 (4 DiffusionBlocks x 7), d_model: 1280, heads: 20, rank: 160
- MoE FFN: 2 experts, top-1, mult 4 · tied weights · sublinear attention
- vocab: 129280 · tokenizer: deepseek-ai/DeepSeek-V4-Pro
- objective: AR / SAT / NAT (stochastic) · DiffusionBlocks EDM block-wise training
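The numbers in the list above imply a few derived sizes that are handy when sanity-checking a loaded checkpoint. The sketch below only restates that arithmetic. `AgillmCfg` and its field names are hypothetical and are not the keys of the real `cfg` dict. It also assumes that "mult 4" is the FFN hidden-width multiplier and that "tied weights" means one embedding matrix shared between input and output.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgillmCfg:
    """Hypothetical container; field names are illustrative, not the real cfg keys."""
    layers: int = 28
    diffusion_blocks: int = 4
    d_model: int = 1280
    heads: int = 20
    rank: int = 160
    experts: int = 2
    top_k: int = 1
    ffn_mult: int = 4
    vocab: int = 129280

    @property
    def layers_per_block(self) -> int:
        # 28 layers split evenly across 4 DiffusionBlocks
        return self.layers // self.diffusion_blocks

    @property
    def head_dim(self) -> int:
        # per-head width when d_model is divided evenly across heads
        return self.d_model // self.heads

    @property
    def ffn_hidden(self) -> int:
        # assumption: "mult 4" scales d_model to the expert hidden width
        return self.ffn_mult * self.d_model

    @property
    def embedding_params(self) -> int:
        # assumption: tied weights, so a single vocab x d_model matrix
        return self.vocab * self.d_model


cfg = AgillmCfg()
print(cfg.layers_per_block, cfg.head_dim, cfg.ffn_hidden, cfg.embedding_params)
# prints: 7 64 5120 165478400
```

If the real `cfg` disagrees with any of these derived values, trust the checkpoint and treat the corresponding assumption here as wrong.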
Loading
The runtime is the single file code/agillm41.py (mainline @ 1e7f963). Checkpoints in
ckpts/pretrain_step*.pt are plain torch.save dicts with keys cfg, step,
tokenizer_id, tie_weights, and the model tensors. Pull with the hf client:
pip install "huggingface_hub[hf_xet]>=1.18"
hf buckets cp hf://buckets/OpenTransformer/agillm41-checkpoints/ckpts/<file>.pt ./
hf buckets cp hf://buckets/OpenTransformer/agillm41-checkpoints/code/agillm41.py ./
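Once a checkpoint is on disk, a minimal way to inspect it is to load the dict on CPU and separate the four metadata keys from everything else. This is a sketch, not the loader in code/agillm41.py. It assumes the model tensors sit at the top level of the dict next to the metadata, and `split_checkpoint` and `load_checkpoint` are hypothetical helper names.

```python
from typing import Any

# Metadata keys named in the README; everything else is treated as model state.
META_KEYS = ("cfg", "step", "tokenizer_id", "tie_weights")


def split_checkpoint(ckpt: dict[str, Any]) -> tuple[dict[str, Any], dict[str, Any]]:
    """Return (metadata, remaining entries) and fail loudly on a missing key."""
    missing = [k for k in META_KEYS if k not in ckpt]
    if missing:
        raise KeyError(f"checkpoint is missing expected keys: {missing}")
    meta = {k: ckpt[k] for k in META_KEYS}
    rest = {k: v for k, v in ckpt.items() if k not in META_KEYS}
    return meta, rest


def load_checkpoint(path: str) -> tuple[dict[str, Any], dict[str, Any]]:
    """Load a torch.save dict on CPU and split it (requires PyTorch)."""
    import torch  # imported lazily so split_checkpoint works without PyTorch

    # weights_only=True restricts unpickling to tensors and plain containers.
    # Assumption: cfg holds only such plain values; if it does not, this call
    # raises, and weights_only=False should be used only on trusted files.
    ckpt = torch.load(path, map_location="cpu", weights_only=True)
    return split_checkpoint(ckpt)
```

Calling `meta, rest = load_checkpoint("./<file>.pt")` and printing `meta["step"]` and `len(rest)` is a quick check that the download is intact before handing the file to the runtime.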
Backed by HF Storage Buckets (Xet dedup, mutable): each backup ships only changed
chunks. Synced from the live trainer; latest.json names the newest checkpoint.
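The schema of latest.json is not described here, so the sketch below does not parse it. Instead it picks the newest checkpoint from a list of file names, which can serve as a cross-check against latest.json. It assumes the `*` in `pretrain_step*.pt` is a plain decimal step number; `step_of` and `newest_checkpoint` are hypothetical helpers, and the file names in the usage note are made up.

```python
import re
from typing import Iterable, Optional

# Assumption: the step is written as decimal digits right after "pretrain_step".
_STEP_RE = re.compile(r"pretrain_step(\d+)\.pt$")


def step_of(name: str) -> Optional[int]:
    """Extract the training step from a checkpoint file name, or None."""
    m = _STEP_RE.search(name)
    return int(m.group(1)) if m else None


def newest_checkpoint(names: Iterable[str]) -> Optional[str]:
    """Return the name with the highest step, comparing steps numerically."""
    stepped = [(s, n) for n in names if (s := step_of(n)) is not None]
    return max(stepped)[1] if stepped else None
```

For example, `newest_checkpoint(["ckpts/pretrain_step900.pt", "ckpts/pretrain_step1000.pt"])` returns the step-1000 name, because steps are compared as integers rather than as strings.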
- Total size: 13.1 GB
- Files: 34
- Last updated: Jun 8
- Pre-warmed CDN: US, EU