Pythia models fine-tuned with supervised fine-tuning (SFT) and with Direct Preference Optimization (DPO) on the full Anthropic hh-rlhf dataset for 1 epoch.
Laura O'Mahony
lomahony
AI & ML interests
PhD student
Organizations
None yet
models 42
lomahony/pythia-1.4b-helpful-sft
Text Generation • 1B • Updated • 16
lomahony/pythia-410m-helpful-sft
Text Generation • 0.4B • Updated • 12
lomahony/pythia-70m-helpful-sft
Text Generation • 70.4M • Updated • 2.41k
lomahony/pythia-1b-helpful-sft
Text Generation • 1B • Updated • 14
lomahony/pythia-160m-helpful-sft
Text Generation • 0.2B • Updated • 12
lomahony/pythia-1b-helpful-dpo
Text Generation • Updated • 11
lomahony/pythia-70m-helpful-dpo
Text Generation • Updated • 111
lomahony/pythia-160m-helpful-dpo
Text Generation • Updated • 21
lomahony/pythia-1.4b-helpful-dpo
Text Generation • Updated • 31
lomahony/pythia-2.8b-helpful-dpo
Text Generation • Updated • 10
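The checkpoints listed above can presumably be loaded like any other text-generation model on the Hub. The following is a minimal sketch, not the author's documented usage. It assumes the `transformers` library and PyTorch are installed, that each repository ships tokenizer files, and that the models load through `AutoModelForCausalLM`. The helper names `repo_id` and `generate` are hypothetical, and the "Human: ... Assistant:" prompt layout is an assumption based on the dialogue format of the hh-rlhf data.

```python
# Sizes copied from the listing above. The profile reports 42 models in total,
# so other sizes may exist; only the ones shown here are accepted.
SFT_SIZES = ["70m", "160m", "410m", "1b", "1.4b"]
DPO_SIZES = ["70m", "160m", "1b", "1.4b", "2.8b"]


def repo_id(size: str, stage: str) -> str:
    """Build a Hub repo id such as 'lomahony/pythia-70m-helpful-sft'.

    Hypothetical helper: it only mirrors the naming pattern seen in the listing.
    """
    listed = {"sft": SFT_SIZES, "dpo": DPO_SIZES}
    if stage not in listed:
        raise ValueError(f"stage must be 'sft' or 'dpo', got {stage!r}")
    if size not in listed[stage]:
        raise ValueError(f"size {size!r} is not in the listing for stage {stage!r}")
    return f"lomahony/pythia-{size}-helpful-{stage}"


def generate(size: str, stage: str, prompt: str, max_new_tokens: int = 40) -> str:
    """Download a listed checkpoint and return the decoded prompt plus continuation.

    Assumption: the repo contains tokenizer files and a causal-LM checkpoint.
    """
    # Imported lazily so that repo_id() works without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    name = repo_id(size, stage)
    tokenizer = AutoTokenizer.from_pretrained(name)
    model = AutoModelForCausalLM.from_pretrained(name)
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)


if __name__ == "__main__":
    # Assumed prompt layout, modeled on the hh-rlhf dialogue format.
    prompt = "\n\nHuman: How do I boil an egg?\n\nAssistant:"
    print(generate("70m", "sft", prompt))
```

Running the script downloads the smallest SFT checkpoint and prints one continuation. Swapping `"sft"` for `"dpo"` gives a quick side-by-side comparison of the two fine-tuning stages at the same model size.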
datasets 0
None public yet