ERNIE-Image-Aes: Robust Image Aesthetics Scoring with Balanced Category Generalization

[📄 Paper]

🌟 Highlights

ERNIE-Image-Aes is an 8B vision-language model for image aesthetic scoring, initialized from ArtiMuse and fine-tuned on a diverse, professionally annotated dataset. It generalizes across diverse image categories substantially better than existing aesthetic predictors (LAION-AES, ArtiMuse, UniPercept).

Key advantages:

Balanced predictions across photography, anime, design, everyday snapshots, and film photography
No systematic bias toward specific image types (e.g., AI-generated content or black-and-white photos)
Swiss-tournament based pairwise annotation for high-quality training labels
Achieves 0.7445 SRCC and 0.7598 PLCC on the ERIA-1K benchmark

🔍 Motivation

Off-the-shelf aesthetic predictors exhibit systematic biases:

Model	Bias
LAION-Aesthetic	Disproportionately high scores for AI-generated/anime content
ArtiMuse	Overscores black-and-white photography and casual everyday snapshots
UniPercept	Strong preference for monochrome images; overscores casual snapshots

ERNIE-Image-Aes addresses these failure modes through a purpose-built annotation pipeline with explicit category balance.

📊 Results on ERIA-1K Benchmark

Model	SRCC	PLCC
LAION-AES	0.2944	0.3138
ArtiMuse	0.4277	0.4704
UniPercept	0.4533	0.4748
ERNIE-Image-Aes	0.7445	0.7598
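
For readers unfamiliar with the two metrics in the table, the following is a minimal sketch of how SRCC (Spearman rank correlation) and PLCC (Pearson linear correlation) can be computed between human scores and model predictions. It uses only the Python standard library. The function names and the toy scores are illustrative assumptions and are not taken from the ERNIE-Image-Aes evaluation code or the ERIA-1K data.

```python
from math import sqrt


def pearson(x, y):
    """PLCC: Pearson linear correlation between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("inputs must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """SRCC: Pearson correlation computed on the ranks of x and y."""
    return pearson(average_ranks(x), average_ranks(y))


if __name__ == "__main__":
    # Toy values for illustration only (not benchmark data).
    human_scores = [3, 7, 5, 9, 1]
    model_scores = [2.5, 6.0, 6.5, 8.0, 1.0]
    print("SRCC:", spearman(human_scores, model_scores))
    print("PLCC:", pearson(human_scores, model_scores))
```

SRCC depends only on the ordering of the predictions, while PLCC also rewards predictions that are linearly related to the human scores, which is why the two are usually reported together.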

Annotation Protocol:

Pairwise Swiss-system tournament for stable and reproducible rankings
Tier labels from 1 to 10
Annotators recruited from professional backgrounds (Central Academy of Fine Arts, Sichuan Fine Arts Institute, Communication University of China, etc.)
All annotators passed aesthetic calibration screening prior to participation
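
The protocol above can be illustrated with a small sketch of a Swiss-system tournament over pairwise judgments. This is not the authors' pipeline: the card states only that rankings come from a pairwise Swiss-system tournament and that tier labels run from 1 to 10. The pairing rule (adjacent items by current win count, avoiding rematches where possible), the number of rounds, the handling of an odd item out, and the equal-sized mapping from rank to tier are all assumptions, and `swiss_rounds`, `assign_tiers`, and `judge` are hypothetical names.

```python
import random


def swiss_rounds(items, judge, num_rounds, seed=0):
    """Run a simple Swiss-system tournament and return win counts.

    items: hashable identifiers (e.g. image IDs).
    judge: callable judge(a, b) returning the preferred item (assumed interface).
    """
    rng = random.Random(seed)
    wins = {item: 0 for item in items}
    played = set()  # unordered pairs that have already been compared
    for _ in range(num_rounds):
        # Shuffle, then stable-sort by wins so equal scores are ordered randomly.
        unpaired = list(items)
        rng.shuffle(unpaired)
        unpaired.sort(key=lambda it: wins[it], reverse=True)
        while len(unpaired) >= 2:
            a = unpaired.pop(0)
            # Assumption: take the closest-ranked opponent not yet met,
            # falling back to the nearest one if every candidate is a rematch.
            idx = next(
                (i for i, b in enumerate(unpaired)
                 if frozenset((a, b)) not in played),
                0,
            )
            b = unpaired.pop(idx)
            played.add(frozenset((a, b)))
            wins[judge(a, b)] += 1
        # With an odd number of items, the leftover item sits out this round.
    return wins


def assign_tiers(wins, num_tiers=10):
    """Map items to tiers 1..num_tiers by rank (assumed equal-sized buckets)."""
    ranked = sorted(wins, key=lambda it: wins[it])  # fewest wins first
    n = len(ranked)
    return {item: 1 + (pos * num_tiers) // n for pos, item in enumerate(ranked)}


if __name__ == "__main__":
    # Toy setup: 16 items whose "true" quality equals their ID,
    # and a noiseless judge that always prefers the higher ID.
    image_ids = list(range(16))
    win_counts = swiss_rounds(image_ids, judge=max, num_rounds=4)
    print(assign_tiers(win_counts))
```

A Swiss system needs far fewer comparisons than a full round-robin because each round pairs items with similar records, so comparisons concentrate where the ranking is still uncertain.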

⚙️ Setup

Please follow the setup instructions in the ArtiMuse repository.

🙏 Acknowledgements

Our work builds upon ArtiMuse and InternVL-3. We sincerely thank the authors for their excellent contributions to the community.

✒️ Citation

If you find this work useful, please consider citing:
