Granary: Speech Recognition and Translation Dataset in 25 European Languages Paper • 2505.13404 • Published May 19, 2025 • 4
DMGD: Train-Free Dataset Distillation with Semantic-Distribution Matching in Diffusion Models Paper • 2605.03877 • Published May 5, 2026 • 3
ConceptScope: Characterizing Dataset Bias via Disentangled Visual Concepts Paper • 2510.26186 • Published Oct 30, 2025 • 3
OCTScenes: A Versatile Real-World Dataset of Tabletop Scenes for Object-Centric Learning Paper • 2306.09682 • Published Jun 16, 2023 • 7
FLAIR: a Country-Scale Land Cover Semantic Segmentation Dataset From Multi-Source Optical Imagery Paper • 2310.13336 • Published Oct 20, 2023 • 7
Seeing Faces in Things: A Model and Dataset for Pareidolia Paper • 2409.16143 • Published Sep 24, 2024 • 18
Thinking Like an Annotator: Generation of Dataset Labeling Instructions Paper • 2306.14035 • Published Jun 24, 2023 • 10
Heavy Labels Out! Dataset Distillation with Label Space Lightening Paper • 2408.08201 • Published Aug 15, 2024 • 22
Grounding and Enhancing Informativeness and Utility in Dataset Distillation Paper • 2601.21296 • Published Jan 29, 2026 • 21
D3: A Massive Dataset of Scholarly Metadata for Analyzing the State of Computer Science Research Paper • 2204.13384 • Published Apr 28, 2022 • 8
Overview of AuTexTification at IberLEF 2023: Detection and Attribution of Machine-Generated Text in Multiple Domains Paper • 2309.11285 • Published Sep 20, 2023 • 1