Zhihan Liu

ZHLiu627

AI & ML interests

LLMs

Organizations

None yet

models 739

ZHLiu627/sokoban-GRPO-from-sft-Llama-3.1-8B-Instruct-window-1-nothink-30step

Updated Oct 23, 2025

ZHLiu627/sokoban-GRPO-from-sft-Llama-3.1-8B-Instruct-window-1-nothink-15step

Updated Oct 23, 2025

ZHLiu627/aug_verl_agent_webshop-GRPO-kl0.01-from-webshop-20step-v2-Llama-3.1-8B-Instruct-info40-150step

Updated Oct 23, 2025

ZHLiu627/aug_verl_agent_webshop-GRPO-kl0.01-from-webshop-20step-v2-Llama-3.1-8B-Instruct-info40-135step

Updated Oct 23, 2025

ZHLiu627/aug_verl_agent_webshop-GRPO-kl0.01-from-webshop-20step-v2-Llama-3.1-8B-Instruct-info40-120step

Updated Oct 23, 2025

ZHLiu627/aug_verl_agent_webshop-GRPO-kl0.01-from-webshop-20step-v2-Llama-3.1-8B-Instruct-info40-105step

Updated Oct 23, 2025

ZHLiu627/aug_verl_agent_webshop-GRPO-kl0.01-from-webshop-20step-v2-Llama-3.1-8B-Instruct-info40-90step

Updated Oct 23, 2025

ZHLiu627/aug_verl_agent_webshop-GRPO-kl0.01-from-webshop-20step-v2-Llama-3.1-8B-Instruct-info40-75step

Updated Oct 23, 2025

ZHLiu627/aug_verl_agent_webshop-GRPO-kl0.01-from-webshop-20step-v2-Llama-3.1-8B-Instruct-info40-60step

Updated Oct 23, 2025

ZHLiu627/aug_verl_agent_webshop-GRPO-kl0.01-from-webshop-20step-v2-Llama-3.1-8B-Instruct-info40-45step

Updated Oct 23, 2025

datasets 20

ZHLiu627/logger-a100

Updated Aug 1, 2025 • 393

ZHLiu627/logger-h100

Updated Aug 1, 2025 • 106

ZHLiu627/warm_start_sft_v2

Preview • Updated Aug 1, 2025 • 5

ZHLiu627/sciworld_dataset

Preview • Updated Aug 1, 2025 • 4

ZHLiu627/updated_qwen2.5_code_1.5b_grpo_iter0_full_data_miao_0212__self_correction_iter1_v1

Viewer • Updated Feb 27, 2025 • 29.3k • 7

ZHLiu627/dataset_qwen2.5_code_1.5b_grpo_iter0_full_data_miao_0212_2_global_step_70filtered_v1_v1

Viewer • Updated Feb 27, 2025 • 29.3k • 21 • 1

ZHLiu627/dataset_qwen2.5_code_1.5b_grpo_iter0_full_data_miao_0212_2_global_step_70filtered_v1

Viewer • Updated Feb 22, 2025 • 29.3k • 11

ZHLiu627/updated-code-qwen7-edufiltered

Viewer • Updated Feb 21, 2025 • 43k • 9

ZHLiu627/updated-code-qwen7-edu

Viewer • Updated Feb 21, 2025 • 75.6k • 30

ZHLiu627/updated_qwen2.5_code_1.5b_grpo_iter0_full_data_miao_0212__self_correction_iter1_v2filtered

Viewer • Updated Feb 19, 2025 • 28.9k • 13
