--- tags: - robotics - vla - rl-token --- # pi05-so101-armnetbench-tool-insert-rlt-v50 RL Token (RLT) encoder-decoder trained on the SO101 tool insertion task, on top of the **step_24999** checkpoint from [lorenzouttini/pi05-so101-armnetbench-tool-insert-isambard-v50](https://huggingface.co/lorenzouttini/pi05-so101-armnetbench-tool-insert-isambard-v50). ## What is this? This model is a lightweight transformer encoder-decoder which takes inputs from a frozen Pi-05 VLA backbone. The encoder compresses the VLA final-layer prefix embeddings into a single RL token via a learned query. The decoder autoregressively reconstructs the original embeddings from only this token, forcing it to act as an information bottleneck. See [Xu et al. (2026), Precise Manipulation with Efficient Online RL](https://www.pi.website/research/rlt) for the method. ## Training - **Config:** `pi05_rlt_armnetbench_tool_insert` - **VLA backbone:** `lorenzouttini/pi05-so101-armnetbench-tool-insert-isambard-v50` step_24999 (frozen, `rl_vla_loss_weight=0.0`) - **Encoder-decoder:** 2-layer transformer, 8 heads, 8192 MLP dim, 2048 embedding dim - **Dataset:** `villekuosmanen/armnetbench_tool_insert` - **Batch size:** 32 - **LR:** 2.5e-5 cosine (1k warmup, 20k decay) - **Steps:** 20,000 - **Runtime:** ~4h14m on 4x GH200 (Isambard) No validation split was used — the dataset is too small for a held-out eval split. ## Loss progression (train) | Step | Train Loss | |------|-----------| | 0 | 10754.3 | | 1,000 | 873.7 | | 5,000 | 552.0 | | 10,000 | 430.0 | | 15,000 | 377.6 | | 19,900 | 356.4 | ## Checkpoints | Step | Recommended | Params SHA256 | |------|-------------|---------------| | 19999 | ✓ | `d9ddbbbefc07b3700f7df5dd161c14de3291bfcf805f71e39ababb902e1501b2` | ### Verifying checkpoint hashes ```bash cd checkpoints/19999 && find params -type f | sort | xargs sha256sum | sha256sum ``` ## Repo layout ``` assets/ # Norm stats, valid indices checkpoints/19999/params/ # Step 19999 model weights (recommended) ``` ## W&B Training curves: https://wandb.ai/pravsels/pi05_rlt_armnetbench_tool_insert/runs/jbdqrmu0 ## Usage ```python import openpi.models.model as _model import openpi.training.config as _config config = _config.get_config("pi05_rlt_armnetbench_tool_insert") params = _model.restore_params("checkpoints/19999/params", restore_type=np.ndarray) model = config.model.load(params) ```