SPM-70M-Vertigo

SPM-70M-Vertigo is the first model in the SPM (Strategic Pretrained Model) family: a ~71M-parameter English foundation model trained from scratch.

Model Architecture

Parameter	Value
Architecture	Decoder-only Transformer (Llama-style)
Parameters	~71M
Hidden Size	512
Layers	12
Attention Heads	8
KV Heads	4
Vocab Size	32000
Max Sequence Length	2048
Positional Encoding	RoPE
Normalization	RMSNorm
Activation	SwiGLU
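As a rough consistency check, the architecture values above can be turned into a parameter count. The sketch below uses a hypothetical helper, `llama_param_count`, which is not part of any library. It assumes a bias-free Llama-style layout and an untied output head. It also assumes a SwiGLU intermediate size of 1536, which is what Llama's reference sizing rule (two thirds of 4 × hidden, rounded up to a multiple of 256) gives for a 512-wide model. The table states neither the intermediate size nor whether embeddings are tied, so both are assumptions.

```python
def llama_param_count(
    vocab_size: int,
    hidden_size: int,
    num_layers: int,
    num_heads: int,
    num_kv_heads: int,
    intermediate_size: int,
    tie_embeddings: bool = False,
) -> int:
    """Count weights in a bias-free Llama-style decoder (hypothetical helper)."""
    head_dim = hidden_size // num_heads
    kv_dim = num_kv_heads * head_dim  # grouped-query attention: K/V use fewer heads

    embeddings = vocab_size * hidden_size
    # q_proj and o_proj are hidden x hidden; k_proj and v_proj are hidden x kv_dim
    attention = 2 * hidden_size * hidden_size + 2 * hidden_size * kv_dim
    # SwiGLU MLP: gate_proj, up_proj and down_proj
    mlp = 3 * hidden_size * intermediate_size
    # Two RMSNorm weight vectors per layer (pre-attention and pre-MLP)
    norms = 2 * hidden_size
    per_layer = attention + mlp + norms

    final_norm = hidden_size
    lm_head = 0 if tie_embeddings else vocab_size * hidden_size
    return embeddings + num_layers * per_layer + final_norm + lm_head


# Values from the table; intermediate_size=1536 is an assumption
total = llama_param_count(32000, 512, 12, 8, 4, intermediate_size=1536)
print(f"{total:,}")  # 70,529,536
```

Under these assumptions the count is 70,529,536, in line with the ~71M figure and the 70.5M reported for the checkpoint. Tying the input and output embeddings would bring it down to about 54.1M.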

Training

Dataset: FineWeb (sample-10BT)
Tokenizer: SentencePiece BPE (vocab=32000)
Precision: BF16
Effective Batch Size: 128
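The card gives only the effective batch size. The sketch below shows how that number is usually composed and what it implies per optimizer step. The per-device batch, accumulation steps, and device count are hypothetical because the card does not state them. The token figure also assumes every sequence is packed to the full 2048-token context.

```python
def effective_batch_size(per_device_batch: int, grad_accum_steps: int, num_devices: int) -> int:
    # Number of sequences contributing to one optimizer step
    return per_device_batch * grad_accum_steps * num_devices


def tokens_per_step(batch_size: int, seq_len: int) -> int:
    # Assumes every sequence is packed to exactly seq_len tokens
    return batch_size * seq_len


# Hypothetical split: 16 sequences per device x 8 accumulation steps x 1 device
batch = effective_batch_size(per_device_batch=16, grad_accum_steps=8, num_devices=1)
tokens = tokens_per_step(batch, seq_len=2048)
print(batch, tokens)  # 128 262144
```

Under that packing assumption, one pass over roughly 10B tokens would take on the order of 38,000 optimizer steps. The exact figure depends on how many tokens the SentencePiece tokenizer produces for the sample.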

Usage

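from transformers import LlamaForCausalLM, LlamaTokenizer

# Local path or Hub repo ID of this checkpoint (prefix with the owner's namespace when loading from the Hub)
model_id = "SPM-70M-Vertigo"

model = LlamaForCausalLM.from_pretrained(model_id)
tokenizer = LlamaTokenizer.from_pretrained(model_id)  # requires the sentencepiece package

# Tokenize the prompt and generate up to 50 new tokens
inputs = tokenizer("The meaning of life is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))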

License

Apache 2.0

Checkpoint

Format: Safetensors
Model size: 70.5M params
Tensor type: F32
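The checkpoint is stored as F32 although training used BF16. A quick estimate of the raw weight memory in each format, using the reported 70.5M parameter count, is shown below. The helper name `weight_memory_mb` is illustrative, and activations, KV cache, and framework overhead are not included.

```python
BYTES_PER_ELEMENT = {"float32": 4, "bfloat16": 2}


def weight_memory_mb(num_params: int, dtype: str) -> float:
    # Raw weight storage only, in megabytes (10**6 bytes)
    return num_params * BYTES_PER_ELEMENT[dtype] / 1e6


params = 70_500_000  # reported model size
for dtype in ("float32", "bfloat16"):
    print(dtype, weight_memory_mb(params, dtype))
# float32 282.0
# bfloat16 141.0
```

In Transformers, passing `torch_dtype=torch.bfloat16` to `from_pretrained` casts the weights on load, roughly halving this footprint.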
