- WF-VAE: Enhancing Video VAE by Wavelet-Driven Energy Flow for Latent Video Diffusion Model
  Paper • 2411.17459 • Published • 12
- MAGVIT: Masked Generative Video Transformer
  Paper • 2212.05199 • Published
- Language Model Beats Diffusion -- Tokenizer is Key to Visual Generation
  Paper • 2310.05737 • Published • 6
- Finite Scalar Quantization: VQ-VAE Made Simple
  Paper • 2309.15505 • Published • 24
Nuo Xu
Norm
AI & ML interests
Video Diffusion; Large Language Model; Object Detection; OCR
Recent Activity
upvoted an article 4 days ago
GLM-5.2: Built for Long-Horizon Tasks
authored a paper 10 days ago
MMDuet2: Enhancing Proactive Interaction of Video MLLMs with Multi-Turn Reinforcement Learning
liked a dataset 15 days ago
facebook/wearable-ai