North-Mini-Code-1.0-NVFP4
NVFP4 (4-bit) quant of CohereLabs/North-Mini-Code-1.0, made with llm-compressor (nvfp4-pack-quantized). ~17 GB. Experts quantized; router/gate/lm_head kept high-precision.
Benchmarks (DGX Spark / GB10, vLLM):
- HumanEval pass@1: 90.2% — matches the FP8 build (90.2%), within one problem of bf16 (90.9%).
- Decode: ~58 tok/s single-request, reasoning off (spark-arena nightly image) — ≈1.67× the FP8 build, at ~17 GB vs ~28 GB.
Serve with vLLM (needs Cohere2MoeForCausalLM support + cohere_melody):
vllm serve XanuNetworks/North-Mini-Code-1.0-NVFP4 \
--max-model-len 262144 \
--enable-auto-tool-choice \
--tool-call-parser cohere_command4 \
--reasoning-parser cohere_command4
Apache-2.0, inherited from the base model. All credit to Cohere for North Mini Code.
- Downloads last month
- 2,038
Model tree for XanuNetworks/North-Mini-Code-1.0-NVFP4
Base model
CohereLabs/North-Mini-Code-1.0