Instructions for using BrainboxAI/code-il-E4B with libraries, inference providers, notebooks, and local apps.

Libraries

Transformers

How to use BrainboxAI/code-il-E4B with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

# The message below contains an image, so the image-text-to-text task is used
# (it matches the AutoModelForImageTextToText class in the direct-load example).
pipe = pipeline("image-text-to-text", model="BrainboxAI/code-il-E4B")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
pipe(text=messages)

# Load model directly
from transformers import AutoProcessor, AutoModelForImageTextToText

processor = AutoProcessor.from_pretrained("BrainboxAI/code-il-E4B")
model = AutoModelForImageTextToText.from_pretrained("BrainboxAI/code-il-E4B")
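
The direct-load snippet stops once the processor and model are in memory. A minimal sketch of the remaining generation step follows, assuming the repository ships a chat template that the processor can apply. The helper names `build_messages` and `generate_reply` are hypothetical and are not part of Transformers.

```python
def build_messages(image_url: str, question: str) -> list[dict]:
    """Build a single-turn chat with one image part and one text part."""
    return [
        {
            "role": "user",
            "content": [
                {"type": "image", "url": image_url},
                {"type": "text", "text": question},
            ],
        }
    ]


def generate_reply(messages: list[dict],
                   model_id: str = "BrainboxAI/code-il-E4B",
                   max_new_tokens: int = 64) -> str:
    """Run one generation pass (downloads the weights on first use)."""
    # Imported lazily so build_messages works without transformers installed.
    from transformers import AutoModelForImageTextToText, AutoProcessor

    processor = AutoProcessor.from_pretrained(model_id)
    model = AutoModelForImageTextToText.from_pretrained(model_id)

    # Assumption: the processor provides a chat template for this model.
    inputs = processor.apply_chat_template(
        messages,
        add_generation_prompt=True,
        tokenize=True,
        return_dict=True,
        return_tensors="pt",
    ).to(model.device)

    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Drop the prompt tokens and decode only the newly generated ones.
    new_tokens = output_ids[0][inputs["input_ids"].shape[-1]:]
    return processor.decode(new_tokens, skip_special_tokens=True)
```

Calling `generate_reply(build_messages(url, "What animal is on the candy?"))` would load the model and return the decoded answer. Nothing is downloaded until that call is made.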

llama-cpp-python

How to use BrainboxAI/code-il-E4B with llama-cpp-python:

# !pip install llama-cpp-python

from llama_cpp import Llama

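llm = Llama.from_pretrained(
	repo_id="BrainboxAI/code-il-E4B",
	# NOTE: the "mmproj" suffix suggests this file is the multimodal projector rather than
	# the main model weights. If loading fails, point `filename` at the main model GGUF in the repo.
	filename="gemma-4-e4b-it.BF16-mmproj.gguf",
)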

llm.create_chat_completion(
	messages = [
		{
			"role": "user",
			"content": "What is the capital of France?"
		}
	]
)
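
`create_chat_completion` returns an OpenAI-style dictionary rather than a plain string. The sketch below shows one way to pull the assistant text out of it. The helper `extract_reply` is a hypothetical name, and `sample` is a hand-written stand-in for a real response, not actual model output.

```python
def extract_reply(response: dict) -> str:
    """Return the assistant text from an OpenAI-style chat completion dict."""
    choices = response.get("choices") or []
    if not choices:
        raise ValueError("response contains no choices")
    return choices[0]["message"]["content"]


# Hand-written stand-in for a real response, for illustration only.
sample = {
    "choices": [
        {"message": {"role": "assistant", "content": "Paris"}}
    ]
}
print(extract_reply(sample))
```

With a loaded model, the same helper could be applied directly to the return value of `llm.create_chat_completion(...)`.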

Local Apps

llama.cpp

How to use BrainboxAI/code-il-E4B with llama.cpp:

Install from brew

brew install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf BrainboxAI/code-il-E4B:BF16
# Run inference directly in the terminal:
llama-cli -hf BrainboxAI/code-il-E4B:BF16

Install from WinGet (Windows)

winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf BrainboxAI/code-il-E4B:BF16
# Run inference directly in the terminal:
llama-cli -hf BrainboxAI/code-il-E4B:BF16

Use pre-built binary

# Download pre-built binary from:
# https://github.com/ggml-org/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf BrainboxAI/code-il-E4B:BF16
# Run inference directly in the terminal:
./llama-cli -hf BrainboxAI/code-il-E4B:BF16

Build from source code

git clone https://github.com/ggml-org/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf BrainboxAI/code-il-E4B:BF16
# Run inference directly in the terminal:
./build/bin/llama-cli -hf BrainboxAI/code-il-E4B:BF16

Use Docker

docker model run hf.co/BrainboxAI/code-il-E4B:BF16

vLLM

How to use BrainboxAI/code-il-E4B with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "BrainboxAI/code-il-E4B"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "BrainboxAI/code-il-E4B",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
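
The curl call above can also be issued from Python using only the standard library. This is a minimal sketch, assuming the server is reachable at the same address as in the curl example. `build_chat_request` and `send_chat_request` are hypothetical helper names, not part of vLLM.

```python
import json
import urllib.request


def build_chat_request(base_url: str, model: str, prompt: str) -> urllib.request.Request:
    """Build a POST request for an OpenAI-compatible /chat/completions endpoint."""
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_chat_request(request: urllib.request.Request, timeout: float = 60.0) -> str:
    """Send the request and return the assistant text (needs a running server)."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        payload = json.load(resp)
    return payload["choices"][0]["message"]["content"]


# Building the request does not touch the network; sending it does.
request = build_chat_request(
    "http://localhost:8000/v1",
    "BrainboxAI/code-il-E4B",
    "What is the capital of France?",
)
```

Because the request is built separately from being sent, the same helper should work against any other OpenAI-compatible server by changing only `base_url`.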

Use Docker

docker model run hf.co/BrainboxAI/code-il-E4B:BF16

SGLang

How to use BrainboxAI/code-il-E4B with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "BrainboxAI/code-il-E4B" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "BrainboxAI/code-il-E4B",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "BrainboxAI/code-il-E4B" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "BrainboxAI/code-il-E4B",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Ollama
How to use BrainboxAI/code-il-E4B with Ollama:
```
ollama run hf.co/BrainboxAI/code-il-E4B:BF16
```

Unsloth Studio

How to use BrainboxAI/code-il-E4B with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for BrainboxAI/code-il-E4B to start chatting

Install Unsloth Studio (Windows)

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for BrainboxAI/code-il-E4B to start chatting

Use Hugging Face Spaces for Unsloth

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for BrainboxAI/code-il-E4B to start chatting

Pi

How to use BrainboxAI/code-il-E4B with Pi:

Start the llama.cpp server

# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama-server -hf BrainboxAI/code-il-E4B:BF16

Configure the model in Pi

# Install Pi:
npm install -g @mariozechner/pi-coding-agent
# Add to ~/.pi/agent/models.json:
{
  "providers": {
    "llama-cpp": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        {
          "id": "BrainboxAI/code-il-E4B:BF16"
        }
      ]
    }
  }
}
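
Editing the JSON by hand can clobber providers that are already configured. A minimal sketch of merging the same `llama-cpp` entry into an existing file is shown below, assuming the file layout matches the JSON above. `add_llama_cpp_provider` is a hypothetical helper, not part of Pi.

```python
import json
from pathlib import Path


def add_llama_cpp_provider(config_path: Path, model_id: str,
                           base_url: str = "http://localhost:8080/v1") -> dict:
    """Merge a llama-cpp provider entry into a models.json-style file."""
    # Start from the existing config if there is one, so other providers survive.
    config = json.loads(config_path.read_text()) if config_path.exists() else {}
    providers = config.setdefault("providers", {})
    providers["llama-cpp"] = {
        "baseUrl": base_url,
        "api": "openai-completions",
        "apiKey": "none",
        "models": [{"id": model_id}],
    }
    # Create the parent directory on first use, then write the merged config.
    config_path.parent.mkdir(parents=True, exist_ok=True)
    config_path.write_text(json.dumps(config, indent=2) + "\n")
    return config
```

For the setup described here, the call would be `add_llama_cpp_provider(Path.home() / ".pi/agent/models.json", "BrainboxAI/code-il-E4B:BF16")`.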

Run Pi

# Start Pi in your project directory:
pi

Hermes Agent

How to use BrainboxAI/code-il-E4B with Hermes Agent:

Start the llama.cpp server

# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama-server -hf BrainboxAI/code-il-E4B:BF16

Configure Hermes

# Install Hermes:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup
# Point Hermes at the local server:
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default BrainboxAI/code-il-E4B:BF16

Run Hermes

hermes

Docker Model Runner
How to use BrainboxAI/code-il-E4B with Docker Model Runner:
```
docker model run hf.co/BrainboxAI/code-il-E4B:BF16
```

Lemonade

How to use BrainboxAI/code-il-E4B with Lemonade:

Pull the model

# Download Lemonade from https://lemonade-server.ai/
lemonade pull BrainboxAI/code-il-E4B:BF16

Run and chat with the model

lemonade run user.code-il-E4B-BF16

List all available models

lemonade list

# BrainboxAI Coder - Modelfile for Ollama
# ===========================================
# Built by BrainboxAI, founded by Netanel Elyasi.
# https://huggingface.co/BrainboxAI/code-il-E4B
#
# Usage:
#   ollama create brainbox-coder -f Modelfile
#   ollama run brainbox-coder

FROM hf.co/BrainboxAI/code-il-E4B:Q4_K_M

# --- Sampling ---------------------------------------------------------------
# A low temperature keeps code-focused output close to deterministic.
# top_p is 0.9 (a common default). A mild repeat_penalty discourages loops without
# suppressing legitimate token repetition (variable names, indentation).
PARAMETER temperature 0.2
PARAMETER top_p 0.9
PARAMETER top_k 40
PARAMETER repeat_penalty 1.05
PARAMETER num_ctx 8192

# Stop on end-of-turn markers. Keep this tight — the model knows when to stop.
PARAMETER stop "<end_of_turn>"
PARAMETER stop "<start_of_turn>"

# --- System prompt (structured template) ------------------------------------
# Built with DEFINITIONS / PREMISES / REQUIREMENTS / EDGE_CASES / OUTPUT_FORMAT /
# VERIFICATION so a 4B model can follow it reliably.

SYSTEM """You are BrainboxAI Coder, a local coding assistant built by BrainboxAI and trained by Netanel Elyasi. You specialize in Python and TypeScript.

DEFINITIONS:
  success: The user's question is answered AND any code you produce is syntactically valid, imports what it needs, and matches the behaviour implied by the question.
  scope: Python 3.10+, TypeScript 5+, and surrounding tools (pytest, Jest, React, Next.js, FastAPI, Flask, Django, Node.js, npm/pnpm/bun).
  user_code: Code the user pasted. Never silently modify their formatting or naming; only change what the task requires.

PREMISES:
  - One message per turn. The user sees your full reply at once.
  - If the user writes in Hebrew, answer in Hebrew. If in English, answer in English. Code itself stays in English (identifiers, comments, errors).
  - You do not have internet, shell, or file access. You only see what the user pastes.
  - Do not fabricate library APIs, function signatures, test results, or version numbers. If unsure, say so.

REQUIREMENTS:
  1. Before generating code, identify the language (Python or TypeScript). If ambiguous, ask one short clarifying question and stop.
  2. When the task is multi-step, think through the approach in 1-3 short sentences BEFORE the code, not after.
  3. New files must be complete and runnable: include obvious imports, type hints where natural, and __main__ guards only if explicitly needed.
  4. When writing tests, match the current implementation's behaviour unless the user explicitly asked you to change it.
  5. For debugging, quote the exact error line, explain the root cause, and show the minimal fix.
  6. Keep explanations short. Code first, prose second. If the user only asks for code, give only code.
  7. When uncertain, say "I am not sure" and suggest how the user could verify (run a test, check the docs, etc.).

EDGE_CASES:
  - User asks "who are you?" -> answer with your identity: BrainboxAI Coder, built by BrainboxAI, trained by Netanel Elyasi.
  - User pastes a huge file -> focus on the specific problem area, do not re-emit the entire file unless asked.
  - User asks for a language outside your scope (Rust, Go, etc.) -> attempt it, but flag that Python / TypeScript are your strengths.
  - User requests something unsafe (hardcoded secrets, insecure crypto, SSRF, SQL injection) -> refuse briefly and show the safe alternative.
  - User's code has a bug that also appears in their tests -> fix both, and say so explicitly.
  - User asks the same question twice with no new info -> answer the same way; do not invent differences.

OUTPUT_FORMAT:
  - Code blocks use fenced triple-backticks with a language tag: ```python or ```typescript.
  - One code block per logical file; label multi-file answers with filenames as bold headers.
  - Final answers are concise (1-5 short paragraphs). No filler, no "I hope this helps", no emojis.
  - Lists only when there are 3+ parallel items; otherwise use sentences.

VERIFICATION (self-check before sending):
  - Does the code compile / parse? (Match brackets, commas, types.)
  - Did I include all imports the code uses?
  - Did I change only what the task required?
  - Did I match the user's language (Hebrew / English) in prose?
  - If I claimed a test passes, is there a concrete reason, or did I assume?
"""

# --- Chat template (Gemma-4 style) ------------------------------------------
TEMPLATE """{{- range $i, $_ := .Messages }}
{{- $last := eq (len (slice $.Messages $i)) 1 -}}
{{- if eq .Role "system" }}<start_of_turn>user
{{ .Content }}<end_of_turn>
{{ else if eq .Role "user" }}<start_of_turn>user
{{ .Content }}<end_of_turn>
{{ else if eq .Role "assistant" }}<start_of_turn>model
{{ .Content }}{{ if not $last }}<end_of_turn>
{{ end }}
{{- end }}
{{- if and (ne .Role "assistant") $last }}<start_of_turn>model
{{ end }}
{{- end -}}"""
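
To make the turn structure of the chat template easier to inspect, here is a Python sketch that mirrors its logic as read from the template text. It is an approximation for illustration, not Ollama's renderer, and exact whitespace may differ from what Ollama emits. `render_gemma_prompt` is a hypothetical name.

```python
def render_gemma_prompt(messages: list[dict]) -> str:
    """Approximate the prompt string produced by the TEMPLATE block."""
    parts: list[str] = []
    last_index = len(messages) - 1
    for i, message in enumerate(messages):
        role, content = message["role"], message["content"]
        is_last = i == last_index
        if role in ("system", "user"):
            # System messages are emitted as ordinary user turns.
            parts.append(f"<start_of_turn>user\n{content}<end_of_turn>\n")
        elif role == "assistant":
            parts.append(f"<start_of_turn>model\n{content}")
            # A trailing assistant message is left open so the model can continue it.
            if not is_last:
                parts.append("<end_of_turn>\n")
        if is_last and role != "assistant":
            # Open the model turn so generation starts with the reply.
            parts.append("<start_of_turn>model\n")
    return "".join(parts)


prompt = render_gemma_prompt([
    {"role": "system", "content": "You are BrainboxAI Coder."},
    {"role": "user", "content": "Reverse a list in Python."},
])
print(prompt)
```

Under this reading, the system prompt and the user message each become a separate `user` turn, and the prompt ends with an open `model` turn. That is consistent with `<end_of_turn>` and `<start_of_turn>` being configured as stop sequences.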