Quark-50m-Instruct
Quark-50m-Instruct is a small (≈56M parameters) decoder-only language model, fine-tuned for instruction following. It uses the same architecture as the SmolLM family and was pretrained from scratch on 5 billion tokens from HuggingFaceTB/smollm-corpus.
- Model type: Causal Language Model (LLaMA-style decoder)
- Architecture: GQA · SwiGLU · RMSNorm · RoPE · Weight tying
- Pretraining tokens: 5 B
- Fine-tuning: Instruction-tuned (details below)
- Creators: OvercastLab (research & development lab for ML/AI)
- Release date: 22 April 2026
Model Summary
Quark-50m-Instruct is designed to be an efficient assistant that can run on consumer GPUs (e.g., RTX 3070 with 8 GB VRAM) and even on CPU for light workloads. It is not competitive with large models on knowledge-intensive tasks, but it excels at:
- Simple conversational tasks
- Code generation and explanation (Python)
- Short text rewriting and summarisation
- On-device / edge inference
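To put the hardware claim in perspective, here is a rough estimate of the memory needed for the weights alone. It is a sketch that assumes the ≈56M parameter count quoted in the introduction; activations, the KV cache, and framework overhead come on top and are not modelled here.

```python
# Approximate parameter count quoted in the introduction (an assumption, not a measured value).
N_PARAMS = 56_000_000

# Bytes used to store one parameter in common numeric formats.
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "bfloat16": 2, "int8": 1}


def weight_memory_mib(n_params: int, dtype: str) -> float:
    """Memory required for the weights alone, in MiB."""
    return n_params * BYTES_PER_PARAM[dtype] / (1024 ** 2)


for dtype in BYTES_PER_PARAM:
    print(f"{dtype:>8}: {weight_memory_mib(N_PARAMS, dtype):6.1f} MiB")
```

Even in float32 the weights come to roughly 214 MiB, which is a small fraction of an 8 GB card.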
The architecture closely follows the efficient small-LM blueprint popularised by SmolLM:
| Component | Details |
|---|---|
| Vocab size | 49,152 |
| Hidden size | 384 |
| Layers | 24 |
| Attention | Grouped Query (6 Q heads, 2 KV heads) |
| FFN | SwiGLU with 1,024 intermediate |
| Position | RoPE (θ = 10,000) |
| Normalisation | RMSNorm (pre-block) |
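The grouped-query setting in the table (6 query heads, 2 KV heads) means each KV head is shared by three query heads, which shrinks the KV cache accordingly. The sketch below illustrates this with the table's values. It assumes a head dimension of hidden size divided by query heads (64), 2-byte cache values, the 2,048-token context listed under Limitations, and the common convention that consecutive query heads share a KV head; none of these are confirmed by the model's actual configuration file.

```python
# Values taken from the architecture table.
HIDDEN_SIZE = 384
N_LAYERS = 24
N_Q_HEADS = 6
N_KV_HEADS = 2
# Assumption: hidden_size = n_q_heads * head_dim, giving 64.
HEAD_DIM = HIDDEN_SIZE // N_Q_HEADS


def kv_head_for(q_head: int) -> int:
    """Index of the KV head that a given query head attends with."""
    group_size = N_Q_HEADS // N_KV_HEADS  # 3 query heads per KV head
    return q_head // group_size


def kv_cache_bytes(seq_len: int, n_kv_heads: int, bytes_per_value: int = 2) -> int:
    """KV-cache size for one sequence: keys and values across all layers."""
    return 2 * N_LAYERS * n_kv_heads * HEAD_DIM * seq_len * bytes_per_value


print("query head -> KV head:", [kv_head_for(h) for h in range(N_Q_HEADS)])

gqa = kv_cache_bytes(2048, N_KV_HEADS)  # grouped-query attention, as in the table
mha = kv_cache_bytes(2048, N_Q_HEADS)   # hypothetical full multi-head baseline
print(f"GQA cache: {gqa / 2**20:.0f} MiB, MHA cache: {mha / 2**20:.0f} MiB")
```

Under these assumptions the cache at full context is 24 MiB with GQA versus 72 MiB if every query head had its own KV head.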
Total trainable parameters: ≈48 M (with weight tying).
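As a sanity check, the parameter count can be re-derived from the table. The sketch below assumes a standard LLaMA-style layout: no bias terms, separate gate/up/down projections in the SwiGLU block, two RMSNorm weight vectors per layer plus one final norm, and a head dimension of 64. These are assumptions about the implementation, not values read from the released checkpoint.

```python
# Values taken from the architecture table.
VOCAB = 49_152
HIDDEN = 384
LAYERS = 24
Q_HEADS = 6
KV_HEADS = 2
FFN = 1_024
HEAD_DIM = HIDDEN // Q_HEADS  # assumed: 64


def count_params(tie_embeddings: bool = True) -> int:
    """Parameter count for a bias-free LLaMA-style decoder (assumed layout)."""
    embed = VOCAB * HIDDEN
    attn = (
        HIDDEN * Q_HEADS * HEAD_DIM          # q_proj
        + 2 * HIDDEN * KV_HEADS * HEAD_DIM   # k_proj and v_proj
        + Q_HEADS * HEAD_DIM * HIDDEN        # o_proj
    )
    ffn = 3 * HIDDEN * FFN                   # gate, up and down projections
    norms = 2 * HIDDEN                       # two RMSNorm weights per layer
    total = embed + LAYERS * (attn + ffn + norms) + HIDDEN  # + final norm
    if not tie_embeddings:
        total += VOCAB * HIDDEN              # separate output head
    return total


print(f"tied:   {count_params(True):,}")
print(f"untied: {count_params(False):,}")
```

Under these assumptions the tied count comes to 56,641,920, which lines up with the ≈56M quoted in the introduction. The lower total stated just above may reflect a different counting convention.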
Benchmark Evaluation Metrics
| Category | Benchmark | Metric | Score / Value | Status |
|---|---|---|---|---|
| Linguistics & Grammar | BLiMP | Accuracy | 68.12% | Success |
| Commonsense & Reasoning | PIQA | Normalized Accuracy | 57.83% | Success |
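| Commonsense & Reasoning | COPA | Accuracy | 57.00% | Success |
| Commonsense & Reasoning | BoolQ | Accuracy | 52.17% | Success |
| Commonsense & Reasoning | WinoGrande | Accuracy | 47.36% | Success |
| Commonsense & Reasoning | HellaSwag | Normalized Accuracy | 28.49% | Success |
| Commonsense & Reasoning | RACE | Accuracy | 26.41% | Success |
| Commonsense & Reasoning | CommonsenseQA | Accuracy | 20.31% | Success |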
| Academic & Knowledge | SciQ | Normalized Accuracy | 49.00% | Success |
| Academic & Knowledge | ARC-Easy | Normalized Accuracy | 36.49% | Success |
| Academic & Knowledge | MMLU | Accuracy | 25.64% | Success |
| Academic & Knowledge | ARC-Challenge | Normalized Accuracy | 25.17% | Success |
| Academic & Knowledge | OpenBookQA | Normalized Accuracy | 25.40% | Success |
| Language Modeling | LAMBADA | Accuracy | 15.87% | Success |
| Language Modeling | WikiText-2 | Word Perplexity | 251.76 | Success |
Note: The Arithmetic benchmark failed because its task script (arithmetic.py) is no longer supported, and SocialIQA failed because of a task registration error (siqa). The other 15 tasks completed successfully.
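When reading the accuracy figures, it helps to compare each one with the score expected from uniform random guessing. The sketch below does that for the multiple-choice tasks. The number of answer options per benchmark is an assumption based on each task's usual format (ARC, for example, has a few questions with three or five options), and LAMBADA and WikiText-2 are left out because they are not multiple-choice.

```python
# (benchmark, reported accuracy in %, assumed number of answer options)
RESULTS = [
    ("BLiMP", 68.12, 2),
    ("PIQA", 57.83, 2),
    ("COPA", 57.00, 2),
    ("BoolQ", 52.17, 2),
    ("WinoGrande", 47.36, 2),
    ("HellaSwag", 28.49, 4),
    ("RACE", 26.41, 4),
    ("CommonsenseQA", 20.31, 5),
    ("SciQ", 49.00, 4),
    ("ARC-Easy", 36.49, 4),
    ("MMLU", 25.64, 4),
    ("ARC-Challenge", 25.17, 4),
    ("OpenBookQA", 25.40, 4),
]


def margin_over_chance(score: float, n_options: int) -> float:
    """Percentage points above (or below) uniform random guessing."""
    return round(score - 100.0 / n_options, 2)


# Print the tasks from largest to smallest margin.
for name, score, n in sorted(RESULTS, key=lambda r: -margin_over_chance(r[1], r[2])):
    print(f"{name:<14} {score:6.2f}%  chance {100 / n:5.1f}%  margin {margin_over_chance(score, n):+6.2f}")
```

On this view the clearest signal is on SciQ, BLiMP and ARC-Easy, while MMLU, ARC-Challenge, OpenBookQA and CommonsenseQA sit within about one point of chance.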
Uses
Direct Use
The model can be used via the 🤗 Transformers library for standard text generation. It expects chat-formatted input (see example below).
Downstream Use
Because of the open Apache-2.0 license, you may fine-tune Quark-50m-Instruct on your own data for domain-specific tasks, for instance a customer-support bot, a code reviewer, or a story writer.
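A first step in most fine-tuning workflows is converting your own data into the same chat-message structure the model expects at inference time. The sketch below turns question/answer pairs into JSON Lines records with a `messages` list. The system prompt, the sample pair, and the helper names are hypothetical, and the exact record layout your training library expects should be checked against its documentation.

```python
import json

# Hypothetical system prompt for a customer-support fine-tune.
SYSTEM_PROMPT = "You are a helpful customer-support assistant."


def to_chat_example(question: str, answer: str, system_prompt: str = SYSTEM_PROMPT) -> dict:
    """Wrap one question/answer pair as a chat-formatted training record."""
    return {
        "messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": question},
            {"role": "assistant", "content": answer},
        ]
    }


def to_jsonl(pairs) -> str:
    """Serialise (question, answer) pairs as one JSON object per line."""
    return "\n".join(json.dumps(to_chat_example(q, a)) for q, a in pairs)


# Illustrative data only.
pairs = [
    ("How do I reset my password?", "Open Settings, choose Security, then select Reset password."),
]
print(to_jsonl(pairs))
```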
Limitations
- Limited world knowledge (pretraining data ends in mid-2025).
- Short context window (2,048 tokens).
- Small size means it can make more factual mistakes than larger models.
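Because of the 2,048-token window, long prompts need trimming so that the prompt plus the generated tokens still fit. The helper below is one simple way to do this on a list of token IDs, keeping the most recent tokens. The function name and the keep-the-tail policy are illustrative choices; note that trimming from the left can cut off a system prompt or the start of the chat template, so a real application may prefer to drop whole earlier turns instead.

```python
CONTEXT_LEN = 2048  # context window stated in the limitations above


def fit_to_context(token_ids, max_new_tokens: int = 128, context_len: int = CONTEXT_LEN):
    """Keep only the most recent tokens so prompt + generation fit the window."""
    budget = context_len - max_new_tokens
    if budget <= 0:
        raise ValueError("max_new_tokens leaves no room for the prompt")
    # Slicing from the end keeps the tail; shorter prompts are returned unchanged.
    return token_ids[-budget:]


long_prompt = list(range(3000))  # stand-in for real token IDs
trimmed = fit_to_context(long_prompt)
print(len(trimmed), trimmed[0])
```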
How to Get Started
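from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "ThingAI/Quark-50m-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_name)
# device_map="auto" places the model on a GPU when one is available (requires accelerate).
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto")

# Chat-formatted input: a system prompt followed by the user turn.
messages = [
    {"role": "system", "content": "You are Quark, a helpful assistant."},
    {"role": "user", "content": "Explain grouped-query attention in one sentence."},
]

# Apply the chat template and tokenize; return_dict=True yields input_ids and attention_mask.
inputs = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=128)
# Decode only the newly generated tokens, skipping the prompt.
prompt_len = inputs["input_ids"].shape[-1]
print(tokenizer.decode(outputs[0][prompt_len:], skip_special_tokens=True))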