LLaDA2.0-Uni: Unifying Multimodal Understanding and Generation with Diffusion Large Language Model • arXiv:2604.20796 • Published Apr 2026
VideoZeroBench: Probing the Limits of Video MLLMs with Spatio-Temporal Evidence Verification • arXiv:2604.01569 • Published Apr 2026
Molmo2: Open Weights and Data for Vision-Language Models with Video Understanding and Grounding • arXiv:2601.10611 • Published Jan 15, 2026
Watching, Reasoning, and Searching: A Video Deep Research Benchmark on Open Web for Agentic Video Reasoning • arXiv:2601.06943 • Published Jan 11, 2026
Vision-Language-Action Models for Autonomous Driving: Past, Present, and Future • arXiv:2512.16760 • Published Dec 18, 2025
LLaDA2.0: Scaling Up Diffusion Language Models to 100B • arXiv:2512.15745 • Published Dec 10, 2025
RecTok: Reconstruction Distillation along Rectified Flow • arXiv:2512.13421 • Published Dec 15, 2025
Does Hearing Help Seeing? Investigating Audio-Video Joint Denoising for Video Generation • arXiv:2512.02457 • Published Dec 2, 2025
Can World Simulators Reason? Gen-ViRe: A Generative Visual Reasoning Benchmark • arXiv:2511.13853 • Published Nov 17, 2025
MMaDA-Parallel: Multimodal Large Diffusion Language Models for Thinking-Aware Editing and Generation • arXiv:2511.09611 • Published Nov 12, 2025
PairUni: Pairwise Training for Unified Multimodal Language Models • arXiv:2510.25682 • Published Oct 29, 2025
Are Video Models Ready as Zero-Shot Reasoners? An Empirical Study with the MME-CoF Benchmark • arXiv:2510.26802 • Published Oct 30, 2025
Emu3.5: Native Multimodal Models are World Learners • arXiv:2510.26583 • Published Oct 30, 2025
MERIT: Multilingual Semantic Retrieval with Interleaved Multi-Condition Query • arXiv:2506.03144 • Published Jun 3, 2025
Open-o3 Video: Grounded Video Reasoning with Explicit Spatio-Temporal Evidence • arXiv:2510.20579 • Published Oct 23, 2025