π-Bench: Evaluating Proactive Personal Assistant Agents in Long-Horizon Workflows • Paper • 2605.14678 • Published 14 days ago • 102
NaturalBench: Evaluating Vision-Language Models on Natural Adversarial Samples • Paper • 2410.14669 • Published Oct 18, 2024 • 39
liuhaotian/llava-v1-0719-336px-lora-merge-vicuna-13b-v1.3 • Text Generation • Updated Jul 19, 2023 • 69 • 9
perceptiveshawty/compositional-bert-large-uncased • Sentence Similarity • Updated Jul 19, 2024 • 10.4k • 2