AI & ML interests
LLMs
Organizations
None yet
models 739
ZHLiu627/sokoban-GRPO-from-sft-Llama-3.1-8B-Instruct-window-1-nothink-30step
Updated
ZHLiu627/sokoban-GRPO-from-sft-Llama-3.1-8B-Instruct-window-1-nothink-15step
Updated
ZHLiu627/aug_verl_agent_webshop-GRPO-kl0.01-from-webshop-20step-v2-Llama-3.1-8B-Instruct-info40-150step
Updated
ZHLiu627/aug_verl_agent_webshop-GRPO-kl0.01-from-webshop-20step-v2-Llama-3.1-8B-Instruct-info40-135step
Updated
ZHLiu627/aug_verl_agent_webshop-GRPO-kl0.01-from-webshop-20step-v2-Llama-3.1-8B-Instruct-info40-120step
Updated
ZHLiu627/aug_verl_agent_webshop-GRPO-kl0.01-from-webshop-20step-v2-Llama-3.1-8B-Instruct-info40-105step
Updated
ZHLiu627/aug_verl_agent_webshop-GRPO-kl0.01-from-webshop-20step-v2-Llama-3.1-8B-Instruct-info40-90step
Updated
ZHLiu627/aug_verl_agent_webshop-GRPO-kl0.01-from-webshop-20step-v2-Llama-3.1-8B-Instruct-info40-75step
Updated
ZHLiu627/aug_verl_agent_webshop-GRPO-kl0.01-from-webshop-20step-v2-Llama-3.1-8B-Instruct-info40-60step
Updated
ZHLiu627/aug_verl_agent_webshop-GRPO-kl0.01-from-webshop-20step-v2-Llama-3.1-8B-Instruct-info40-45step
Updated
datasets 20
Updated • 393
Updated • 106
ZHLiu627/warm_start_sft_v2
Preview
• Updated • 5
ZHLiu627/sciworld_dataset
Preview
• Updated • 4
ZHLiu627/updated_qwen2.5_code_1.5b_grpo_iter0_full_data_miao_0212__self_correction_iter1_v1
Viewer
• Updated • 29.3k • 7
ZHLiu627/dataset_qwen2.5_code_1.5b_grpo_iter0_full_data_miao_0212_2_global_step_70filtered_v1_v1
Viewer
• Updated • 29.3k • 21 • 1
ZHLiu627/dataset_qwen2.5_code_1.5b_grpo_iter0_full_data_miao_0212_2_global_step_70filtered_v1
Viewer
• Updated • 29.3k • 11
ZHLiu627/updated-code-qwen7-edufiltered
Viewer
• Updated • 43k • 9
ZHLiu627/updated-code-qwen7-edu
Viewer
• Updated • 75.6k • 30
ZHLiu627/updated_qwen2.5_code_1.5b_grpo_iter0_full_data_miao_0212__self_correction_iter1_v2filtered
Viewer
• Updated • 28.9k • 13