Training Agents Inside of Scalable World Models
Paper: arXiv:2509.24527
Nicklas Hansen · Xiaolong Wang · UC San Diego
The world model follows the architecture and two-stage training recipe of Dreamer 4, adapted for large-scale multi-task continuous control, and is trained on MMBench2, a 427-hour, 210-task dataset for visual world modeling (see the dataset repository). Each variant is a (`tokenizer.pt`, `dynamics.pt`) pair at 224×224 resolution:
| Variant | Description |
|---|---|
| `base` | Pretrained world model (200 tasks) |
| `coverage_aware` | Coverage-aware finetuned world model (200 tasks) |
| `combined` | `coverage_aware` finetuned with all targeted data collection sources (210 tasks) |

    base/
        tokenizer.pt
        dynamics.pt
    coverage_aware/
        tokenizer.pt
        dynamics.pt
    combined/
        tokenizer.pt
        dynamics.pt

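
Once a variant has been downloaded, it can be useful to confirm that both files of the pair are present before launching anything. The following is a minimal sketch, not part of the code release: the helper name `missing_checkpoint_files` is hypothetical, and it assumes the local directory mirrors the repository layout shown above (one subdirectory per variant, each holding `tokenizer.pt` and `dynamics.pt`).

```python
from pathlib import Path

# Variant and file names are taken from the table and layout above.
VARIANTS = ("base", "coverage_aware", "combined")
REQUIRED_FILES = ("tokenizer.pt", "dynamics.pt")


def missing_checkpoint_files(root, variants=VARIANTS):
    """Return {variant: [missing file names]} for every incomplete variant.

    Hypothetical helper: it only checks that the files exist under
    <root>/<variant>/ and does not open or validate the checkpoints.
    """
    root = Path(root)
    missing = {}
    for variant in variants:
        absent = [
            name for name in REQUIRED_FILES
            if not (root / variant / name).is_file()
        ]
        if absent:
            missing[variant] = absent
    return missing


if __name__ == "__main__":
    # ./checkpoints is assumed to be the local download directory.
    problems = missing_checkpoint_files("./checkpoints", variants=("combined",))
    print(problems or "combined checkpoint pair is complete")
```

An empty result means every requested variant has both files; anything else lists exactly what still needs to be downloaded.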
Using the accompanying code release:

    cd dreamer4
    python download_checkpoints.py --variant combined  # or: base | coverage_aware | all
    ./run_interactive.sh combined                      # launch the interactive interface

`download_checkpoints.py` fetches the (`tokenizer.pt`, `dynamics.pt`) pair into `./checkpoints/<variant>/`. Alternatively, download directly with the Hugging Face CLI:

    hf download nicklashansen/mmbench2-models --include "combined/*" --local-dir ./checkpoints

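
The same download can also be scripted from Python. The sketch below is an assumption-laden equivalent of the CLI command above, not something shipped with the code release: it uses `snapshot_download` from the `huggingface_hub` package, the helper names `allow_patterns_for` and `download_variant` are hypothetical, and treating `all` as "every variant in the table" is a guess at what `download_checkpoints.py` does with that option.

```python
VARIANTS = ("base", "coverage_aware", "combined")
REPO_ID = "nicklashansen/mmbench2-models"  # repo id from the CLI command above


def allow_patterns_for(variant):
    """Map a variant name to the glob patterns selecting its files.

    Hypothetical helper; "all" is assumed to mean every listed variant.
    """
    if variant == "all":
        return [f"{name}/*" for name in VARIANTS]
    if variant not in VARIANTS:
        raise ValueError(f"unknown variant: {variant!r}")
    return [f"{variant}/*"]


def download_variant(variant, local_dir="./checkpoints"):
    """Download one variant (or all) into local_dir and return the local path."""
    # Imported lazily so the pattern helper works without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(
        repo_id=REPO_ID,
        allow_patterns=allow_patterns_for(variant),
        local_dir=local_dir,
    )


if __name__ == "__main__":
    print(download_variant("combined"))
```

Restricting the download with `allow_patterns` plays the same role as `--include` in the CLI call, so only the selected variant's files are fetched.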
See the paper and the code release for architecture details, training recipes, and the hallucination detection and mitigation methods.
Released under the MIT License.
@article{Hansen2026Hallucination,
title={Hallucination in World Models is Predictable and Preventable},
author={Nicklas Hansen and Xiaolong Wang},
year={2026},
}