Abstract
SkillOpt introduces a systematic text-space optimizer for agent skills. It trains skills as external agent state with stable updates and zero inference overhead at deployment, and it achieves best-or-tied results across multiple benchmarks, target models, and execution environments.
Agent skills today are hand-crafted, generated one-shot, or evolved through loosely controlled self-revision. None of these approaches behaves like a deep-learning optimizer for the skill, and none reliably improves over its starting point under feedback. We argue the skill should instead be trained as the external state of a frozen agent, with the same discipline that makes weight-space optimization reproducible. SkillOpt is, to our knowledge, the first systematic, controllable text-space optimizer for agent skills: a separate optimizer model turns scored rollouts into bounded add/delete/replace edits on a single skill document, and an edit is accepted only when it strictly improves a held-out validation score. A textual learning-rate budget, a rejected-edit buffer, and an epoch-wise slow/meta update keep skill training stable while adding zero inference-time model calls at deployment. Across six benchmarks, seven target models, and three execution harnesses (direct chat, Codex, Claude Code), SkillOpt is best or tied on all 52 evaluated (model, benchmark, harness) cells and beats every per-cell competitor among human, one-shot LLM, Trace2Skill, TextGrad, GEPA, and EvoSkill skills. On GPT-5.5 it raises average accuracy over the no-skill baseline by 23.5 points in direct chat, by 24.8 points inside the Codex agentic loop, and by 19.1 points inside Claude Code. Transfer experiments further show that optimized skill artifacts retain value when moved across model scales, between the Codex and Claude Code execution environments, and to a nearby math benchmark without further optimization.
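The bounded edit step described above can be pictured with a minimal Python sketch, assuming the skill document is a list of lines and each edit targets a single line. The `Edit` record, the `apply_edits` function, and the integer `lr_budget` (a hypothetical stand-in for the textual learning-rate budget, capping how many edits one step may apply) are illustrations only, not SkillOpt's actual interface. The acceptance test against a held-out validation score is deliberately left out of this sketch.

```python
from dataclasses import dataclass
from typing import List, Literal, Optional


# Hypothetical edit record: the abstract names add/delete/replace operations on one
# skill document; treating the document as a list of lines is an assumption.
@dataclass(frozen=True)
class Edit:
    op: Literal["add", "delete", "replace"]
    index: int                  # target line position in the ORIGINAL document
    text: Optional[str] = None  # new line for "add"/"replace"; unused for "delete"


def apply_edits(skill: List[str], edits: List[Edit], lr_budget: int) -> List[str]:
    """Return a new document with at most `lr_budget` edits applied.

    `lr_budget` is a hypothetical "textual learning rate": it caps how much the
    skill can change in one step. Edits beyond the budget are dropped.
    """
    doc = list(skill)  # never mutate the caller's skill
    # Apply from the highest index downward so lower indices keep pointing at the
    # original lines; sorted() is stable, so ties keep the proposer's order.
    for e in sorted(edits[:lr_budget], key=lambda e: e.index, reverse=True):
        if e.op == "add":
            doc.insert(e.index, e.text)
        elif e.op == "delete":
            del doc[e.index]
        elif e.op == "replace":
            doc[e.index] = e.text
    return doc


# Toy usage: three proposed edits, but the budget only admits the first two.
skill = ["Read the task.", "Answer directly."]
edits = [
    Edit("replace", 1, "Show intermediate steps, then answer."),
    Edit("add", 0, "Check tool output before answering."),
    Edit("delete", 0),
]
new_skill = apply_edits(skill, edits, lr_budget=2)
print(new_skill)
```

With `lr_budget=2` the third proposed edit (the deletion) is simply dropped, which mimics a small learning rate: each step can move the document only a bounded distance from where it started.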
Community
SkillOpt: Train agent skills like neural networks, without touching model weights.
What if an agent could improve not by finetuning the LLM, but by self-optimizing its own skill document?
SkillOpt treats a natural-language skill as the agent's trainable external state:
rollout → reflect → edit → validate → improve
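The rollout → reflect → edit → validate → improve cycle can be sketched as a greedy loop, assuming rollouts and reflection are folded into a single proposer call. `train_skill`, `toy_propose`, and `toy_val_score` are hypothetical names; a real setup would call a separate optimizer model and score a frozen agent's rollouts on held-out tasks. The sketch keeps only two ideas from the abstract: a candidate replaces the current skill only when it strictly improves the validation score, and rejected candidates go into a buffer the proposer can consult. The epoch-wise slow/meta update is not modeled.

```python
from typing import Callable, List, Tuple


def train_skill(
    skill: str,
    propose: Callable[[str, List[str]], str],  # hypothetical optimizer: (skill, rejected) -> candidate
    val_score: Callable[[str], float],         # hypothetical held-out validation scorer
    steps: int,
) -> Tuple[str, List[str]]:
    """Greedy skill training with a strict-improvement acceptance gate.

    Returns the final skill and the buffer of rejected candidates.
    """
    best = val_score(skill)
    rejected: List[str] = []  # rejected-edit buffer, passed back to the proposer
    for _ in range(steps):
        candidate = propose(skill, rejected)  # rollout + reflect + edit, collapsed
        score = val_score(candidate)          # validate on held-out data
        if score > best:                      # accept only on STRICT improvement
            skill, best = candidate, score
        else:
            rejected.append(candidate)
    return skill, rejected


# Toy scorer (hypothetical): two helpful tips add a point each, one harmful tip costs one.
def toy_val_score(s: str) -> int:
    return ("check units" in s) + ("verify tool output" in s) - ("be verbose" in s)


# Toy proposer (hypothetical): appends the first tip not already in the skill whose
# resulting candidate has not been rejected before.
def toy_propose(skill: str, rejected: List[str]) -> str:
    for tip in ["be verbose", "check units", "verify tool output"]:
        candidate = f"{skill}; {tip}"
        if tip not in skill and candidate not in rejected:
            return candidate
    return skill  # nothing new to try


final_skill, rejected = train_skill("solve step by step", toy_propose, toy_val_score, steps=4)
print(final_skill)
print(len(rejected))
```

In this toy run the harmful "be verbose" candidate is rejected twice, and the skill only grows when the validation score rises, which is the property the strict-improvement gate is meant to guarantee.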
We evaluate SkillOpt across 6 benchmarks and 7 models, under both direct model calls and real agent execution loops with Codex + Claude Code. SkillOpt achieves best or tied-best results in 52/52 settings.
The key idea is simple but powerful:
as AI moves from assistant to worker, the bottleneck is no longer just knowledge but procedural capability: tool use, intermediate-state inspection, domain conventions, and recovery from failure.
We believe optimized, reusable, and inspectable skills could become a new adaptation layer for future agents.
Project: https://aka.ms/skillopt
Paper: https://arxiv.org/pdf/2605.23904
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- SkillEvolver: Skill Learning as a Meta-Skill (2026)
- SkillGen: Verified Inference-Time Agent Skill Synthesis (2026)
- SkillMAS: Skill Co-Evolution with LLM-based Multi-Agent System (2026)
- Skills on the Fly: Test-Time Adaptive Skill Synthesis for LLM Agents (2026)
- SkillFlow: Benchmarking Lifelong Skill Discovery and Evolution for Autonomous Agents (2026)
- PACE: Two-Timescale Self-Evolution for Small Language Model Agents (2026)
- ClawTrace: Cost-Aware Tracing for LLM Agent Skill Distillation (2026)