ldp72/Wolof-Qwen3-8B-it-v2-fc-v2-conv-v1-t5-torch28 Text Generation • 8B • Updated 1 day ago • 7
The Ultra-Scale Playbook 🌌 Running • 3.89k • The ultimate guide to training LLMs on large GPU clusters
DivMerge: A divergence-based model merging method for multi-tasking Paper • 2509.02108 • Published Sep 2, 2025 • 26