山田蒼
jwilson8
AI & ML interests
Research on LLM agents and evaluation.
Recent Activity
liked a dataset about 13 hours ago
wop/XXXXXL-chain-of-thought
upvoted a paper 4 days ago
WildClawBench: A Benchmark for Real-World, Long-Horizon Agent Evaluation
liked a dataset 5 days ago
nvidia/PhysicalAI-Autonomous-Vehicles
Organizations
None yet