RL environment for training agents to find OWASP API vulnerabilities
1.5B LoRA monitor vs. frontier attacker: an RL gym
Qwen 1.5B baseline vs. GRPO-trained LoRA monitor.