We downsample high-resolution images so that the shorter side is 1024 pixels (MetaQuery_Instruct_2.4M) or 512 pixels (MetaQuery_Instruct_2.4M_512res).
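The resizing rule above can be sketched as a small size computation that preserves aspect ratio. This is a minimal sketch, not the dataset's actual preprocessing script. Two details the card does not specify are treated as assumptions. First, images whose shorter side is already at or below the target are left unchanged (downsampling only). Second, the longer side is rounded with Python's built-in `round`. The function name `downsample_size` is hypothetical.

```python
from __future__ import annotations


def downsample_size(width: int, height: int, short_side: int = 1024) -> tuple[int, int]:
    """Return (new_width, new_height) so that the shorter side equals short_side.

    Hypothetical helper. Assumptions (not stated in the dataset card):
    - images whose shorter side is already <= short_side are not upsampled;
    - the longer side is scaled by the same factor and rounded to an integer.
    """
    shorter = min(width, height)
    if shorter <= short_side:
        # Already small enough: keep the original resolution.
        return width, height
    scale = short_side / shorter
    if width <= height:
        # Width is the shorter side (or the image is square).
        return short_side, round(height * scale)
    # Height is the shorter side.
    return round(width * scale), short_side


# 1024-pixel variant (MetaQuery_Instruct_2.4M) vs. 512-pixel variant (..._512res)
print(downsample_size(4000, 3000, 1024))
print(downsample_size(3000, 4000, 512))
```

The returned pair could then be passed to an image library's resize call, for example Pillow's `Image.resize`. The interpolation filter used for the released data is not stated here.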