arxiv:2606.08415

CoVEBench: Can Video Editing Models Handle Complex Instructions?

Published on Jun 7 · Submitted by Jiaming Wang on Jun 9

Abstract

A new benchmark called CoVEBench is introduced to evaluate compositional video editing capabilities, addressing limitations of existing models in handling complex, multi-step editing tasks while preserving spatiotemporal content.

While recent text-guided video editing models excel at elementary tasks (e.g., style transfer, object insertion), real-world user requests are highly compositional. A single prompt often demands multiple coupled edits, such as modifying subjects, actions, and camera views, while strictly preserving unrelated spatiotemporal content. Existing benchmarks, heavily constrained by isolated edits and coarse global metrics, fail to diagnose how models handle such complex workflows. To address this gap, we introduce CoVEBench, a compositional video editing benchmark comprising 416 curated source videos, 626 multi-point editing instructions, and 9,990 fine-grained checklist items. Covering diverse editing dimensions, CoVEBench evaluates models via MLLM-judged instruction compliance and video fidelity, alongside automated metrics for video quality. Extensive experiments reveal that compositional editing remains a profound challenge: current models frequently omit edits, violate preservation constraints, or introduce artifacts when handling multiple operations simultaneously. CoVEBench provides a challenging, diagnostic testbed to advance video editing toward realistic user workflows.
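To give a sense of the benchmark's granularity, the sketch below derives average densities from the counts stated in the abstract and shows one way per-item verdicts could be rolled up into a compliance rate. The `ChecklistItem` schema, the `"edit"`/`"preserve"` labels, and the unweighted mean in `compliance_rate` are assumptions made for illustration; the abstract does not specify how CoVEBench stores its checklist items or aggregates MLLM judgments.

```python
from dataclasses import dataclass

# Counts reported in the abstract.
NUM_VIDEOS = 416
NUM_INSTRUCTIONS = 626
NUM_CHECKLIST_ITEMS = 9_990


def benchmark_ratios() -> dict[str, float]:
    """Average densities implied by the reported counts."""
    return {
        "instructions_per_video": NUM_INSTRUCTIONS / NUM_VIDEOS,
        "items_per_instruction": NUM_CHECKLIST_ITEMS / NUM_INSTRUCTIONS,
    }


@dataclass
class ChecklistItem:
    """Hypothetical schema for one checklist item (not the paper's format)."""
    description: str
    kind: str      # assumed labels: "edit" (requested change) or "preserve"
    passed: bool   # verdict a judge model would return for this item


def compliance_rate(items: list[ChecklistItem]) -> float:
    """Fraction of items judged as satisfied (assumed unweighted mean)."""
    if not items:
        raise ValueError("checklist must contain at least one item")
    return sum(item.passed for item in items) / len(items)


if __name__ == "__main__":
    for name, value in benchmark_ratios().items():
        print(f"{name}: {value:.2f}")

    # Made-up verdicts for a single multi-point instruction.
    example = [
        ChecklistItem("subject is replaced as requested", "edit", True),
        ChecklistItem("camera view is changed as requested", "edit", False),
        ChecklistItem("background is left unchanged", "preserve", True),
    ]
    print(f"compliance: {compliance_rate(example):.2f}")
```

On these counts, each source video carries about 1.5 instructions and each instruction about 16 checklist items, which is what lets a single prompt be scored on several coupled edits and preservation constraints at once.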

Community

Paper submitter

We introduce CoVEBench, a fine-grained checklist benchmark to evaluate text-guided video editing models on complex, compositional instructions.


