North-Mini-Code-1.0-NVFP4

NVFP4 (4-bit) quant of CohereLabs/North-Mini-Code-1.0, made with llm-compressor (nvfp4-pack-quantized). ~17 GB. Experts quantized; router/gate/lm_head kept high-precision.

Benchmarks (DGX Spark / GB10, vLLM):

  • HumanEval pass@1: 90.2% — matches the FP8 build (90.2%), within one problem of bf16 (90.9%).
  • Decode: ~58 tok/s single-request, reasoning off (spark-arena nightly image) — ≈1.67× the FP8 build, at ~17 GB vs ~28 GB.

Serve with vLLM (needs Cohere2MoeForCausalLM support + cohere_melody):

vllm serve XanuNetworks/North-Mini-Code-1.0-NVFP4 \
  --max-model-len 262144 \
  --enable-auto-tool-choice \
  --tool-call-parser cohere_command4 \
  --reasoning-parser cohere_command4

Apache-2.0, inherited from the base model. All credit to Cohere for North Mini Code.

Downloads last month
2,038
Safetensors
Model size
17B params
Tensor type
F32
·
BF16
·
F8_E4M3
·
U8
·
Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for XanuNetworks/North-Mini-Code-1.0-NVFP4

Quantized
(34)
this model