---
tags:
  - robotics
  - vla
  - rl-token
---

# pi05-so101-armnetbench-tool-insert-rlt-v50

RL Token (RLT) encoder-decoder trained on the SO101 tool insertion task, on top of the **step_24999** checkpoint from [lorenzouttini/pi05-so101-armnetbench-tool-insert-isambard-v50](https://huggingface.co/lorenzouttini/pi05-so101-armnetbench-tool-insert-isambard-v50).

## What is this?

This model is a lightweight transformer encoder-decoder that takes its inputs from a frozen Pi-05 VLA backbone. The encoder compresses the VLA's final-layer prefix embeddings into a single RL token via a learned query. The decoder autoregressively reconstructs the original embeddings from this token alone, which forces the token to act as an information bottleneck. See [Xu et al. (2026), Precise Manipulation with Efficient Online RL](https://www.pi.website/research/rlt) for the method.
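
To make the "single token via a learned query" idea concrete, here is a minimal, dependency-free sketch of attention pooling. It is not the code used to train this model: it uses one head, no projection matrices, no transformer layers and no decoder, and the names `attention_pool`, `query` and `prefix` are hypothetical. A random vector stands in for the learned query, and random vectors stand in for the VLA prefix embeddings.

```python
import math
import random


def attention_pool(query, embeddings):
    """Pool a sequence of embeddings into one vector using a single query.

    query: list[float] of length d (stand-in for the learned RL-token query).
    embeddings: list of list[float], each of length d (stand-in for the
        VLA prefix embeddings).
    Returns (pooled, weights): the pooled vector and the attention weights.
    """
    d = len(query)
    # Scaled dot-product score between the query and every embedding.
    scores = [
        sum(q * e for q, e in zip(query, emb)) / math.sqrt(d)
        for emb in embeddings
    ]
    # Numerically stable softmax over the sequence positions.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [x / total for x in exps]
    # The pooled token is the attention-weighted average of the embeddings.
    pooled = [
        sum(w * emb[i] for w, emb in zip(weights, embeddings))
        for i in range(d)
    ]
    return pooled, weights


rng = random.Random(0)
dim, seq_len = 8, 5  # toy sizes, far smaller than the real model
query = [rng.gauss(0, 1) for _ in range(dim)]
prefix = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(seq_len)]

rl_token, attn = attention_pool(query, prefix)
print(len(rl_token), len(attn))
```

However long the input sequence is, the output is one vector of the embedding dimension, which is what makes the token a bottleneck for the decoder.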

## Training

- **Config:** `pi05_rlt_armnetbench_tool_insert`
- **VLA backbone:** `lorenzouttini/pi05-so101-armnetbench-tool-insert-isambard-v50` step_24999 (frozen, `rl_vla_loss_weight=0.0`)
- **Encoder-decoder:** 2-layer transformer, 8 heads, 8192 MLP dim, 2048 embedding dim
- **Dataset:** `villekuosmanen/armnetbench_tool_insert`
- **Batch size:** 32
- **LR:** 2.5e-5 cosine (1k warmup, 20k decay)
- **Steps:** 20,000
- **Runtime:** ~4h14m on 4x GH200 (Isambard)

No validation split was used because the dataset is too small for a held-out eval split.
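
For readers who want to see the shape of the learning-rate curve, the sketch below implements a generic linear-warmup plus cosine-decay schedule with the peak, warmup and decay values listed above. Several details are assumptions, not facts from this card: the initial and final learning rates (`init_lr`, `end_lr`, both set to 0 here) and the convention that the 20k decay steps are counted from step 0 and include the warmup. The function name `warmup_cosine_lr` is hypothetical.

```python
import math


def warmup_cosine_lr(
    step,
    peak_lr=2.5e-5,      # from the LR entry above
    warmup_steps=1_000,  # from the LR entry above
    decay_steps=20_000,  # from the LR entry above
    init_lr=0.0,         # assumption: not stated in this card
    end_lr=0.0,          # assumption: not stated in this card
):
    """Linear warmup to peak_lr, then cosine decay to end_lr."""
    if step < warmup_steps:
        # Linear ramp from init_lr to peak_lr.
        frac = step / warmup_steps
        return init_lr + frac * (peak_lr - init_lr)
    # Cosine decay over the remaining steps, clamped once decay_steps is reached.
    progress = min((step - warmup_steps) / (decay_steps - warmup_steps), 1.0)
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    return end_lr + (peak_lr - end_lr) * cosine


for s in (0, 500, 1_000, 10_000, 20_000):
    print(s, warmup_cosine_lr(s))
```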

## Loss progression (train)

| Step | Train Loss |
|------|-----------|
| 0 | 10754.3 |
| 1,000 | 873.7 |
| 5,000 | 552.0 |
| 10,000 | 430.0 |
| 15,000 | 377.6 |
| 19,900 | 356.4 |

## Checkpoints

| Step | Recommended | Params SHA256 |
|------|-------------|---------------|
| 19999 | ✓ | `d9ddbbbefc07b3700f7df5dd161c14de3291bfcf805f71e39ababb902e1501b2` |

### Verifying checkpoint hashes

```bash
cd checkpoints/19999 && find params -type f | sort | xargs sha256sum | sha256sum
```
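
On systems without `sha256sum`, the same digest can in principle be computed in Python. The sketch below mirrors the pipeline above under stated assumptions: file names contain no spaces or other characters that `xargs` or `sha256sum` would treat specially, `sort` orders names bytewise (as under `LC_ALL=C`), and `sha256sum` prints each line as the hash, two spaces, then the path. The helper names `file_sha256` and `params_digest` are hypothetical.

```python
import hashlib
from pathlib import Path


def file_sha256(path, chunk_size=1 << 20):
    """Hash one file in chunks so large weight files are not loaded at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def params_digest(checkpoint_dir, subdir="params"):
    """Hash the sorted list of '<file hash>  <relative path>' lines."""
    root = Path(checkpoint_dir)
    # Paths relative to the checkpoint directory, e.g. "params/...",
    # matching what `find params -type f` prints after `cd`.
    rel_paths = sorted(
        p.relative_to(root).as_posix()
        for p in (root / subdir).rglob("*")
        if p.is_file()
    )
    outer = hashlib.sha256()
    for rel in rel_paths:
        line = f"{file_sha256(root / rel)}  {rel}\n"
        outer.update(line.encode())
    return outer.hexdigest()
```

If those assumptions hold, `params_digest("checkpoints/19999")` run from the repo root should return the value in the Checkpoints table. If it does not, check the shell command first, since that is the reference method.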

## Repo layout

```
assets/                         # Norm stats, valid indices
checkpoints/19999/params/       # Step 19999 model weights (recommended)
```

## W&B

Training curves: https://wandb.ai/pravsels/pi05_rlt_armnetbench_tool_insert/runs/jbdqrmu0

## Usage

```python
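import numpy as np

import openpi.models.model as _model
import openpi.training.config as _config

# Look up the training config by name.
config = _config.get_config("pi05_rlt_armnetbench_tool_insert")
# Restore the step-19999 weights as NumPy arrays (path relative to the repo root).
params = _model.restore_params("checkpoints/19999/params", restore_type=np.ndarray)
# Build the model from the restored parameters.
model = config.model.load(params)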
```