SPM-70M-Vertigo
SPM-70M-Vertigo is the first model in the SPM (Strategic Pretrained Model) family: a ~71M-parameter English foundation model trained from scratch.
Model Architecture
| Parameter | Value |
|---|---|
| Architecture | Decoder-only Transformer (Llama-style) |
| Parameters | ~71M |
| Hidden Size | 512 |
| Layers | 12 |
| Attention Heads | 8 |
| KV Heads | 4 (grouped-query attention) |
| Vocab Size | 32000 |
| Max Sequence Length | 2048 |
| Positional Encoding | RoPE |
| Normalization | RMSNorm |
| Activation | SwiGLU |
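The table does not list the MLP intermediate size or whether the input and output embeddings are tied, so the sketch below treats both as assumptions. It shows how the listed values add up under a standard bias-free Llama layout. With 8 query heads and 4 KV heads (grouped-query attention, each KV head shared by 2 query heads), the K and V projections output 256 features instead of 512. The `intermediate_size=1536` and untied embeddings used here are hypothetical choices that land near the stated ~71M; other combinations could as well.

```python
def llama_param_count(
    vocab_size: int,
    hidden_size: int,
    num_layers: int,
    num_heads: int,
    num_kv_heads: int,
    intermediate_size: int,
    tie_embeddings: bool,
) -> int:
    """Count weights in a bias-free Llama-style decoder (RoPE adds no parameters)."""
    head_dim = hidden_size // num_heads
    kv_dim = num_kv_heads * head_dim  # GQA: K/V projections are narrower than Q
    attn = 2 * hidden_size * hidden_size + 2 * hidden_size * kv_dim  # q, o + k, v
    mlp = 3 * hidden_size * intermediate_size  # SwiGLU: gate, up, down projections
    norms = 2 * hidden_size  # two RMSNorm weight vectors per layer
    per_layer = attn + mlp + norms
    embed = vocab_size * hidden_size
    lm_head = 0 if tie_embeddings else vocab_size * hidden_size
    final_norm = hidden_size
    return embed + num_layers * per_layer + final_norm + lm_head


# Values from the table; intermediate_size and untied embeddings are ASSUMPTIONS.
total = llama_param_count(32000, 512, 12, 8, 4, intermediate_size=1536, tie_embeddings=False)
print(f"{total:,}")  # 70,529,536
```

Note that the two 32000 × 512 embedding matrices alone account for about 32.8M of that total; with tied embeddings the same layout would come to roughly 54.1M.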
Training
- Dataset: FineWeb (sample-10BT)
- Tokenizer: SentencePiece BPE (vocab=32000)
- Precision: BF16
- Effective Batch Size: 128
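The card does not say how the effective batch of 128 was split across devices or what sequence length was used in training. Assuming the usual definition (per-device micro-batch × gradient-accumulation steps × number of devices) and fully packed 2048-token sequences, the sketch below estimates tokens per optimizer step and the steps needed for one pass over roughly 10B tokens. The 16 × 4 × 2 split is purely hypothetical. FineWeb's sample-10BT is sized in GPT-2 tokens, so the count under this model's 32000-entry SentencePiece vocabulary will differ somewhat.

```python
def effective_batch_size(micro_batch: int, grad_accum_steps: int, num_devices: int) -> int:
    # Sequences contributing to one optimizer update.
    return micro_batch * grad_accum_steps * num_devices


def tokens_per_step(batch_size: int, seq_len: int) -> int:
    # Assumes every sequence is packed to the full context length (no padding).
    return batch_size * seq_len


def steps_for_tokens(total_tokens: int, batch_size: int, seq_len: int) -> int:
    # Ceiling division: optimizer steps needed to consume total_tokens once.
    per_step = tokens_per_step(batch_size, seq_len)
    return -(-total_tokens // per_step)


# Hypothetical split: 16 sequences per device x 4 accumulation steps x 2 GPUs.
bs = effective_batch_size(16, 4, 2)
print(bs)  # 128
print(tokens_per_step(bs, 2048))  # 262144
print(steps_for_tokens(10_000_000_000, bs, 2048))  # 38147
```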
Usage
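```python
from transformers import LlamaTokenizer, LlamaForCausalLM

# Hub repo ID (usually "<owner>/SPM-70M-Vertigo") or a local checkpoint directory.
model_id = "SPM-70M-Vertigo"
model = LlamaForCausalLM.from_pretrained(model_id)
tokenizer = LlamaTokenizer.from_pretrained(model_id)  # needs the sentencepiece package

inputs = tokenizer("The meaning of life is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50)  # up to 50 new tokens
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```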
License
Apache 2.0