North-Mini-Code-1.0-NVFP4

NVFP4 (4-bit) quant of CohereLabs/North-Mini-Code-1.0, made with llm-compressor (nvfp4-pack-quantized). ~17 GB. Experts quantized; router/gate/lm_head kept high-precision.

Benchmarks (DGX Spark / GB10, vLLM):

HumanEval pass@1: 90.2% — matches the FP8 build (90.2%), within one problem of bf16 (90.9%).
Decode: ~58 tok/s single-request, reasoning off (spark-arena nightly image) — ≈1.67× the FP8 build, at ~17 GB vs ~28 GB.

Serve with vLLM (needs Cohere2MoeForCausalLM support + cohere_melody):

vllm serve XanuNetworks/North-Mini-Code-1.0-NVFP4 \
  --max-model-len 262144 \
  --enable-auto-tool-choice \
  --tool-call-parser cohere_command4 \
  --reasoning-parser cohere_command4

Apache-2.0, inherited from the base model. All credit to Cohere for North Mini Code.

Downloads last month: 2,038

Safetensors

Model size

17B params

Tensor type

F32

BF16

F8_E4M3

Model tree for XanuNetworks/North-Mini-Code-1.0-NVFP4

Base model

CohereLabs/North-Mini-Code-1.0

Quantized

(34)

this model