Benchmark data in "Beyond Ideal Instruction: A Comprehensive Framework for Evaluating LLMs in Realistic Interactions".
Xuan Yang
TorresYang
AI & ML interests
LLM reasoning, agent
Recent Activity
updated a collection 1 day ago: RUT-Bench
new activity 1 day ago: Miaow-Lab/RUT-Bench: Add task categories and link to paper