Chat2Workflow: A Benchmark for Generating Executable Visual Workflows with Natural Language Paper • 2604.19667 • Published 4 days ago • 19
Article Welcome Gemma 4: Frontier multimodal intelligence on device +5 23 days ago • 877
RealMaster: Lifting Rendered Scenes into Photorealistic Video Paper • 2603.23462 • Published Mar 24 • 33
MinerU-Diffusion: Rethinking Document OCR as Inverse Rendering via Diffusion Decoding Paper • 2603.22458 • Published Mar 23 • 135
The Pulse of Motion: Measuring Physical Frame Rate from Visual Dynamics Paper • 2603.14375 • Published Mar 15 • 19
MACRO: Advancing Multi-Reference Image Generation with Structured Long-Context Data Paper • 2603.25319 • Published 30 days ago • 32
Calibri: Enhancing Diffusion Transformers via Parameter-Efficient Calibration Paper • 2603.24800 • Published about 1 month ago • 68
RealRestorer: Towards Generalizable Real-World Image Restoration with Large-Scale Image Editing Models Paper • 2603.25502 • Published 30 days ago • 57
PixelSmile: Toward Fine-Grained Facial Expression Editing Paper • 2603.25728 • Published 29 days ago • 117
Evo Collection Evo 2 is a genomic foundation model capable of generalist prediction and design tasks across DNA, RNA, and proteins. • 13 items • Updated Feb 25 • 4
Helios Collection Helios: 14B Real-Time Long Video Generation Model can be Cheaper, Faster but Keep Stronger than 1.3B ones • 7 items • Updated Mar 15 • 24
Kiwi-Edit: Versatile Video Editing via Instruction and Reference Guidance Paper • 2603.02175 • Published Mar 2 • 24
Article GGML and llama.cpp join HF to ensure the long-term progress of Local AI +4 Feb 20 • 503