SPM-70M-Vertigo

SPM-70M-Vertigo is the first model in the SPM (Strategic Pretrained Model) family. It is an English foundation model with roughly 71M parameters, trained from scratch.

Model Architecture

Parameter             Value
Architecture          Decoder-only Transformer (Llama-style)
Parameters            ~71M
Hidden Size           512
Layers                12
Attention Heads       8
KV Heads              4
Vocab Size            32000
Max Sequence Length   2048
Positional Encoding   RoPE
Normalization         RMSNorm
Activation            SwiGLU
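
As a rough sanity check on the ~71M figure, the parameter count of a Llama-style decoder can be tallied from the values in the table. This is only a sketch: the feed-forward (intermediate) size and whether the input and output embeddings are tied are not stated in this card, so `intermediate=1536` and untied embeddings are assumptions chosen for illustration, and the helper `llama_param_count` is hypothetical.

```python
def llama_param_count(vocab, hidden, layers, heads, kv_heads,
                      intermediate, tie_embeddings=False):
    """Count weights of a bias-free Llama-style decoder (illustrative helper)."""
    head_dim = hidden // heads
    kv_dim = kv_heads * head_dim            # grouped-query attention: fewer K/V heads
    attn = 2 * hidden * hidden              # query and output projections
    attn += 2 * hidden * kv_dim             # key and value projections
    mlp = 3 * hidden * intermediate         # SwiGLU: gate, up, and down projections
    norms = 2 * hidden                      # two RMSNorm weight vectors per layer
    embed = vocab * hidden                  # token embedding table
    lm_head = 0 if tie_embeddings else vocab * hidden
    final_norm = hidden
    return embed + lm_head + layers * (attn + mlp + norms) + final_norm


# Values from the table above; intermediate=1536 and untied embeddings are ASSUMED.
total = llama_param_count(vocab=32000, hidden=512, layers=12,
                          heads=8, kv_heads=4, intermediate=1536)
print(f"{total:,} parameters")
```

Under these assumptions the tally comes to 70,529,536, which is consistent with the ~71M figure. With 4 KV heads against 8 attention heads, the key and value projections are half the width of the query projection, which also halves the KV cache relative to full multi-head attention.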

Training

  • Dataset: FineWeb (sample-10BT)
  • Tokenizer: SentencePiece BPE (vocab=32000)
  • Precision: BF16
  • Effective Batch Size: 128
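
To give a feel for the scale these settings imply, the arithmetic below estimates tokens per optimizer step and steps per pass over the data. It is a sketch under stated assumptions: that the effective batch size counts sequences, that every sequence is packed to the full 2048-token context, and that the sample holds about 10 billion tokens (the actual count depends on the tokenizer). The card does not state the number of training steps or epochs.

```python
# Assumptions (not confirmed by the card): batch size is measured in sequences,
# each packed to the maximum sequence length, and the sample is ~10B tokens.
seqs_per_step = 128
seq_len = 2048
dataset_tokens = 10_000_000_000

tokens_per_step = seqs_per_step * seq_len
steps_per_pass = dataset_tokens // tokens_per_step

print(f"tokens per optimizer step: {tokens_per_step:,}")
print(f"approx. steps for one pass: {steps_per_pass:,}")
```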

Usage

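from transformers import LlamaTokenizer, LlamaForCausalLM

# Model name as given in this card; prepend the Hub namespace or use a local path as needed.
model_id = "SPM-70M-Vertigo"
model = LlamaForCausalLM.from_pretrained(model_id)
tokenizer = LlamaTokenizer.from_pretrained(model_id)

# Tokenize a prompt and generate up to 50 new tokens.
inputs = tokenizer("The meaning of life is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))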

License

Apache 2.0
