🔥 Pyroton
A lightweight Python code generation model fine-tuned from Qwen2.5-Coder-0.5B-Instruct.
Overview
Pyroton is a lightweight Python-focused code generation model fine-tuned from Qwen/Qwen2.5-Coder-0.5B-Instruct using supervised fine-tuning (SFT) on Python instruction-style datasets.
The goal is a small, efficient model that handles easy-to-medium Python tasks while remaining practical for free-tier GPUs and lightweight deployment, including on mobile phones.
Model Variants
Base adapter
- shohuu/pyroton: general Python instruction-tuned adapter
Prime-fix patched adapter
- shohuu/pyroton-primefix-v3: targeted repair finetuning for correctness bugs (recommended)
Quick Start
Load latest patched adapter (recommended)
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

base = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-Coder-0.5B-Instruct",
    dtype=torch.bfloat16,
    device_map="auto",
)
model = PeftModel.from_pretrained(base, "shohuu/pyroton-primefix-v3")
tokenizer = AutoTokenizer.from_pretrained("shohuu/pyroton-primefix-v3")
tokenizer.pad_token = tokenizer.eos_token

prompt = "### Instruction:\nWrite a Python function to reverse a string\n\n### Response:\n"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=220,
    do_sample=False,
    repetition_penalty=1.1,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
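The prompt in the Quick Start follows an Alpaca-style `### Instruction:` / `### Response:` layout. Below is a minimal sketch of two helpers for that layout; `build_prompt` and `extract_response` are hypothetical names introduced here for illustration, and the sketch assumes the decoded output echoes the prompt before the generated code, as `tokenizer.decode(outputs[0], ...)` does above.

```python
INSTRUCTION_HEADER = "### Instruction:\n"
RESPONSE_HEADER = "\n\n### Response:\n"


def build_prompt(instruction: str) -> str:
    """Wrap a plain-language task in the instruction/response layout."""
    return f"{INSTRUCTION_HEADER}{instruction.strip()}{RESPONSE_HEADER}"


def extract_response(decoded: str) -> str:
    """Return the text after the first '### Response:' marker.

    If the marker is absent, the decoded text is returned unchanged
    apart from surrounding whitespace.
    """
    _, found, tail = decoded.partition("### Response:")
    return tail.strip() if found else decoded.strip()
```

With these, `prompt = build_prompt("Write a Python function to reverse a string")` reproduces the prompt string used above, and `extract_response(...)` can be applied to the decoded output to drop the echoed instruction.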
Run GGUF locally (Ollama / LM Studio / PocketPal)
# Download pyroton-q4.gguf from Files tab
# Then load in your preferred local LLM app
Recommended Inference Settings
Correctness mode (recommended)
Most reliable for code generation:
outputs = model.generate(
    **inputs,
    max_new_tokens=220,
    do_sample=False,
    repetition_penalty=1.1,
)
Sampled mode (more variety, less stable)
outputs = model.generate(
    **inputs,
    max_new_tokens=220,
    do_sample=True,
    temperature=0.1,
    top_p=0.9,
    repetition_penalty=1.2,
)
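If both modes are used in the same script, the two sets of settings can be kept as named presets. This is only a convenience sketch: `generation_kwargs` and the mode names `"correctness"` and `"sampled"` are hypothetical, while the values are copied from the two settings blocks in this section.

```python
# Values copied from the two recommended settings blocks.
_PRESETS = {
    "correctness": {
        "max_new_tokens": 220,
        "do_sample": False,
        "repetition_penalty": 1.1,
    },
    "sampled": {
        "max_new_tokens": 220,
        "do_sample": True,
        "temperature": 0.1,
        "top_p": 0.9,
        "repetition_penalty": 1.2,
    },
}


def generation_kwargs(mode: str = "correctness") -> dict:
    """Return a fresh copy of the keyword arguments for the chosen mode."""
    if mode not in _PRESETS:
        raise ValueError(f"unknown mode {mode!r}; expected one of {sorted(_PRESETS)}")
    return dict(_PRESETS[mode])
```

A call would then look like `model.generate(**inputs, **generation_kwargs("sampled"))`.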
Example Output
Prompt:
Write a Python function to check if a number is prime
Pyroton Output:
import math

def is_prime(n):
    """Check if the given integer n is prime."""
    if n <= 1:
        return False
    for i in range(2, int(math.sqrt(n)) + 1):
        if n % i == 0:
            return False
    return True
Training Details
| Setting | Value |
|---|---|
| Base Model | Qwen2.5-Coder-0.5B-Instruct |
| Datasets | python_code_instructions_18k_alpaca, CodeAlpaca-20k, code_instructions_122k_alpaca_style |
| Total Samples | ~95,362 (Python-filtered) |
| Training Strategy | Chunked SFT (5 chunks) |
| LoRA Rank | 16 |
| LoRA Alpha | 32 |
| Batch Size | 2 |
| Gradient Accumulation | 8 |
| Learning Rate | 1e-4 |
| Precision | BFloat16 |
| Max Length | 512 |
| Final Training Loss | ~0.712 |
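A few quantities follow directly from the table. The sketch below works them out under stated assumptions: training on a single device (the table does not give a device count), the standard LoRA scaling of alpha divided by rank, and the approximate sample count taken at face value. The function names are hypothetical.

```python
import math


def effective_batch_size(batch_size: int, grad_accum: int, num_devices: int = 1) -> int:
    """Samples contributing to each optimizer step.

    num_devices=1 is an assumption; the table does not state it.
    """
    return batch_size * grad_accum * num_devices


def lora_scaling(alpha: int, rank: int) -> float:
    """Standard LoRA scaling factor applied to the low-rank update."""
    return alpha / rank


def optimizer_steps_per_pass(total_samples: int, effective_batch: int) -> int:
    """Optimizer steps needed to see every sample once."""
    return math.ceil(total_samples / effective_batch)


batch = effective_batch_size(batch_size=2, grad_accum=8)  # 16
scale = lora_scaling(alpha=32, rank=16)                   # 2.0
steps = optimizer_steps_per_pass(95_362, batch)           # 5961
```

Under these assumptions, one pass over the ~95,362 samples takes roughly 5,961 optimizer steps, spread across the 5 chunks.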
Repair finetuning
After main training, targeted repair finetuning was applied to fix:
- Missing import math / math.sqrt issues
- Incorrect handling of edge cases (negative numbers, 0, 1)
Latest patched adapter: shohuu/pyroton-primefix-v3
Evaluation
Tested with an execution-based harness on is_prime() using the inputs -1, 0, 1, 2, 3, 4, 6, 9, 17, 49:
- Greedy decoding: 5/5 passing
- Sampled decoding: improved but less stable
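The harness itself is not shown on this card. The following is a minimal sketch of what an execution-based check over the listed inputs could look like, not the author's actual harness: `check_generated` and `reference_is_prime` are hypothetical names, and the reference implementation is an assumption used only to produce expected values. Because it calls `exec` on model output, it should only be run in a sandboxed environment.

```python
import math

TEST_INPUTS = [-1, 0, 1, 2, 3, 4, 6, 9, 17, 49]


def reference_is_prime(n: int) -> bool:
    """Trusted reference used to compute the expected answers."""
    if n < 2:
        return False
    return all(n % d for d in range(2, math.isqrt(n) + 1))


def check_generated(source: str, func_name: str = "is_prime") -> list:
    """Execute generated code and return a list of failing cases.

    Each failure is (input, expected, got_or_error). An empty list means
    every test input passed.
    """
    namespace = {}
    exec(source, namespace)  # untrusted code: run in a sandbox only
    candidate = namespace[func_name]
    failures = []
    for n in TEST_INPUTS:
        expected = reference_is_prime(n)
        try:
            got = candidate(n)
        except Exception as exc:  # e.g. NameError from a missing import
            failures.append((n, expected, repr(exc)))
            continue
        if got != expected:
            failures.append((n, expected, got))
    return failures
```

A generation that forgets `import math` fails on every input that reaches the `math.sqrt` call, which is the kind of bug the repair finetuning targets.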
GGUF / Mobile Deployment
Pyroton is available as a GGUF file for local deployment:
| File | Quantization | Size |
|---|---|---|
| pyroton-q4.gguf | Q4_K_M | ~397MB |
Compatible apps:
- PocketPal AI (Android/iOS): search for shohuu/Pyroton
- LM Studio (Desktop)
- Ollama (Desktop)
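The GGUF file can also be loaded from Python with llama-cpp-python. The sketch below is hedged on two points: it assumes the GGUF model responds to the same `### Instruction:` / `### Response:` prompt layout used with the adapter (the card does not say which prompt format the GGUF expects), and the stop string is an illustrative choice. `format_prompt` and `generate_with_gguf` are hypothetical helper names; calling `generate_with_gguf` downloads the model file from the Hub on first use.

```python
def format_prompt(task: str) -> str:
    """Instruction/response layout (assumed to apply to the GGUF as well)."""
    return f"### Instruction:\n{task.strip()}\n\n### Response:\n"


def generate_with_gguf(task: str, max_tokens: int = 220) -> str:
    """Run one greedy completion against pyroton-q4.gguf."""
    # Imported lazily so the helper above works without the package installed.
    from llama_cpp import Llama  # pip install llama-cpp-python

    llm = Llama.from_pretrained(
        repo_id="shohuu/Pyroton",
        filename="pyroton-q4.gguf",
        verbose=False,
    )
    result = llm.create_completion(
        prompt=format_prompt(task),
        max_tokens=max_tokens,
        temperature=0.0,            # greedy, matching the correctness mode
        stop=["### Instruction:"],  # assumption: cut off any follow-on turn
    )
    return result["choices"][0]["text"]
```

For example, `print(generate_with_gguf("Write a Python function to check if a number is prime"))`.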
Known Limitations
- 0.5B model: can degrade on harder tasks or complex reasoning
- Greedy decoding is more reliable than sampling for correctness
- Most thoroughly tested on short Python coding tasks
- Broader evaluation across more libraries still needed
GitHub
Requirements
transformers
datasets
trl
peft
bitsandbytes
accelerate
torchao
License
Apache 2.0. See LICENSE for details.
Base model (Qwen2.5-Coder) is also Apache 2.0. Attribution to Alibaba Cloud / Qwen Team.
Acknowledgements
- Qwen Team for the base model
- iamtarun for the original dataset
- Hugging Face for the training ecosystem
- My friend Yumi for the name 🔥