Rui Sun PRO

ThreeSR

·

https://threesr.github.io/

AI & ML interests

Vision and Language Multimodal Learning, CV, NLP, LLM

Recent Activity

upvoted 4 papers 3 days ago

SenseNova-U1: Unifying Multimodal Understanding and Generation with NEO-unify Architecture

Paper • 2605.12500 • Published 8 days ago • 181

HyperEyes: Dual-Grained Efficiency-Aware Reinforcement Learning for Parallel Multimodal Search Agents

Paper • 2605.07177 • Published 12 days ago • 62

Teaching Language Models to Think in Code

Paper • 2605.07237 • Published 9 days ago • 30

From Web to Pixels: Bringing Agentic Search into Visual Perception

Paper • 2605.12497 • Published 8 days ago • 14

upvoted a paper 19 days ago

Image Generators are Generalist Vision Learners

Paper • 2604.20329 • Published 28 days ago • 20

upvoted 2 papers 20 days ago

Co-Director: Agentic Generative Video Storytelling

Paper • 2604.24842 • Published 23 days ago • 16

AutoResearchBench: Benchmarking AI Agents on Complex Scientific Literature Discovery

Paper • 2604.25256 • Published 22 days ago • 29

upvoted a paper 27 days ago

Mind DeepResearch Technical Report

Paper • 2604.14518 • Published Apr 17 • 23

upvoted 11 papers about 1 month ago

DR³-Eval: Towards Realistic and Reproducible Deep Research Evaluation

Paper • 2604.14683 • Published Apr 16 • 36

WildDet3D: Scaling Promptable 3D Detection in the Wild

Paper • 2604.08626 • Published Apr 9 • 245

OpenSpatial: A Principled Data Engine for Empowering Spatial Intelligence

Paper • 2604.07296 • Published Apr 8 • 40

MolmoWeb: Open Visual Web Agent and Open Data for the Open Web

Paper • 2604.08516 • Published Apr 9 • 43

Act Wisely: Cultivating Meta-Cognitive Tool Use in Agentic Multimodal Models

Paper • 2604.08545 • Published Apr 9 • 41

LPM 1.0: Video-based Character Performance Model

Paper • 2604.07823 • Published Apr 9 • 80

Externalization in LLM Agents: A Unified Review of Memory, Skills, Protocols and Harness Engineering

Paper • 2604.08224 • Published Apr 9 • 51

KnowU-Bench: Towards Interactive, Proactive, and Personalized Mobile Agent Evaluation

Paper • 2604.08455 • Published Apr 9 • 47

DMax: Aggressive Parallel Decoding for dLLMs

Paper • 2604.08302 • Published Apr 9 • 52

OpenVLThinkerV2: A Generalist Multimodal Reasoning Model for Multi-domain Visual Tasks

Paper • 2604.08539 • Published Apr 9 • 49

Think in Strokes, Not Pixels: Process-Driven Image Generation via Interleaved Reasoning

Paper • 2604.04746 • Published Apr 8 • 72

upvoted a paper about 2 months ago

FIPO: Eliciting Deep Reasoning with Future-KL Influenced Policy Optimization

Paper • 2603.19835 • Published Mar 20 • 351