In a Training Loop 🔄

Zinan Tang

Word2Li

https://zinantang.works/

AI & ML interests

NLP, LLM, Data4LLM, LLM4Data

Recent Activity

commented on a paper 1 day ago

Domain-Specific Data Synthesis for LLMs via Minimal Sufficient Representation Learning

upvoted a paper 1 day ago

Domain-Specific Data Synthesis for LLMs via Minimal Sufficient Representation Learning

upvoted a paper 1 day ago

MIRA: Mid-training Rubric Anchoring for Source-Aware Data Selection

Organizations

None yet

upvoted 7 papers 1 day ago

Domain-Specific Data Synthesis for LLMs via Minimal Sufficient Representation Learning

Paper • 2605.30039 • Published 19 days ago • 20

MaxProof: Scaling Mathematical Proof with Generative-Verifier RL and Population-Level Test-Time Scaling

Paper • 2606.13473 • Published 6 days ago • 87

upvoted a paper 2 months ago

Tracing the Roots: A Multi-Agent Framework for Uncovering Data Lineage in Post-Training LLMs

Paper • 2604.10480 • Published Apr 12 • 20

upvoted a collection 2 months ago

Nemotron-Post-Training-v3

Collection

Collection of datasets used in the post-training phase of Nemotron Nano, Super, and Ultra v3. • 50 items • Updated 5 days ago • 157

upvoted 2 papers 3 months ago

Code2Math: Can Your Code Agent Effectively Evolve Math Problems Through Exploration?

Paper • 2603.03202 • Published Mar 3 • 18

SciDER: Scientific Data-centric End-to-end Researcher

Paper • 2603.01421 • Published Mar 2 • 6

upvoted 2 papers 4 months ago

CoDiQ: Test-Time Scaling for Controllable Difficult Question Generation

Paper • 2602.01660 • Published Feb 2 • 8

Closing the Data Loop: Using OpenDataArena to Engineer Superior Training Datasets

Paper • 2601.09733 • Published Dec 30, 2025 • 9

upvoted a paper 6 months ago

OpenDataArena: A Fair and Open Arena for Benchmarking Post-Training Dataset Value

Paper • 2512.14051 • Published Dec 16, 2025 • 47

upvoted a paper 8 months ago

MinerU2.5: A Decoupled Vision-Language Model for Efficient High-Resolution Document Parsing

Paper • 2509.22186 • Published Sep 26, 2025 • 166

upvoted a collection 8 months ago

Qwen3-VL

Collection

37 items • Updated Dec 31, 2025 • 743

upvoted a paper 8 months ago

Scaling Code-Assisted Chain-of-Thoughts and Instructions for Model Reasoning

Paper • 2510.04081 • Published Oct 5, 2025 • 23

upvoted a paper 9 months ago

DeepScientist: Advancing Frontier-Pushing Scientific Findings Progressively

Paper • 2509.26603 • Published Sep 30, 2025 • 18

upvoted a collection 9 months ago

Qwen3

Collection

84 items • Updated Dec 31, 2025 • 1.81k

upvoted a paper 9 months ago

Middo: Model-Informed Dynamic Data Optimization for Enhanced LLM Fine-Tuning via Closed-Loop Learning

Paper • 2508.21589 • Published Aug 29, 2025 • 3
