Text Generation
Transformers
GGUF
English
Hebrew
gemma4
image-text-to-text
code
python
typescript
coding-assistant
llama.cpp
ollama
unsloth
qlora
on-device
private-first
conversational
Instructions to use BrainboxAI/code-il-E4B with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use BrainboxAI/code-il-E4B with Transformers:
# Use a pipeline as a high-level helper from transformers import pipeline pipe = pipeline("text-generation", model="BrainboxAI/code-il-E4B") messages = [ { "role": "user", "content": [ {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"}, {"type": "text", "text": "What animal is on the candy?"} ] }, ] pipe(text=messages)# Load model directly from transformers import AutoProcessor, AutoModelForImageTextToText processor = AutoProcessor.from_pretrained("BrainboxAI/code-il-E4B") model = AutoModelForImageTextToText.from_pretrained("BrainboxAI/code-il-E4B") - llama-cpp-python
How to use BrainboxAI/code-il-E4B with llama-cpp-python:
# !pip install llama-cpp-python from llama_cpp import Llama llm = Llama.from_pretrained( repo_id="BrainboxAI/code-il-E4B", filename="gemma-4-e4b-it.BF16-mmproj.gguf", )
llm.create_chat_completion( messages = [ { "role": "user", "content": "What is the capital of France?" } ] ) - Notebooks
- Google Colab
- Kaggle
- Local Apps
- llama.cpp
How to use BrainboxAI/code-il-E4B with llama.cpp:
Install from brew
brew install llama.cpp # Start a local OpenAI-compatible server with a web UI: llama-server -hf BrainboxAI/code-il-E4B:BF16 # Run inference directly in the terminal: llama-cli -hf BrainboxAI/code-il-E4B:BF16
Install from WinGet (Windows)
winget install llama.cpp # Start a local OpenAI-compatible server with a web UI: llama-server -hf BrainboxAI/code-il-E4B:BF16 # Run inference directly in the terminal: llama-cli -hf BrainboxAI/code-il-E4B:BF16
Use pre-built binary
# Download pre-built binary from: # https://github.com/ggerganov/llama.cpp/releases # Start a local OpenAI-compatible server with a web UI: ./llama-server -hf BrainboxAI/code-il-E4B:BF16 # Run inference directly in the terminal: ./llama-cli -hf BrainboxAI/code-il-E4B:BF16
Build from source code
git clone https://github.com/ggerganov/llama.cpp.git cd llama.cpp cmake -B build cmake --build build -j --target llama-server llama-cli # Start a local OpenAI-compatible server with a web UI: ./build/bin/llama-server -hf BrainboxAI/code-il-E4B:BF16 # Run inference directly in the terminal: ./build/bin/llama-cli -hf BrainboxAI/code-il-E4B:BF16
Use Docker
docker model run hf.co/BrainboxAI/code-il-E4B:BF16
- LM Studio
- Jan
- vLLM
How to use BrainboxAI/code-il-E4B with vLLM:
Install from pip and serve model
# Install vLLM from pip: pip install vllm # Start the vLLM server: vllm serve "BrainboxAI/code-il-E4B" # Call the server using curl (OpenAI-compatible API): curl -X POST "http://localhost:8000/v1/chat/completions" \ -H "Content-Type: application/json" \ --data '{ "model": "BrainboxAI/code-il-E4B", "messages": [ { "role": "user", "content": "What is the capital of France?" } ] }'Use Docker
docker model run hf.co/BrainboxAI/code-il-E4B:BF16
- SGLang
How to use BrainboxAI/code-il-E4B with SGLang:
Install from pip and serve model
# Install SGLang from pip: pip install sglang # Start the SGLang server: python3 -m sglang.launch_server \ --model-path "BrainboxAI/code-il-E4B" \ --host 0.0.0.0 \ --port 30000 # Call the server using curl (OpenAI-compatible API): curl -X POST "http://localhost:30000/v1/chat/completions" \ -H "Content-Type: application/json" \ --data '{ "model": "BrainboxAI/code-il-E4B", "messages": [ { "role": "user", "content": "What is the capital of France?" } ] }'Use Docker images
docker run --gpus all \ --shm-size 32g \ -p 30000:30000 \ -v ~/.cache/huggingface:/root/.cache/huggingface \ --env "HF_TOKEN=<secret>" \ --ipc=host \ lmsysorg/sglang:latest \ python3 -m sglang.launch_server \ --model-path "BrainboxAI/code-il-E4B" \ --host 0.0.0.0 \ --port 30000 # Call the server using curl (OpenAI-compatible API): curl -X POST "http://localhost:30000/v1/chat/completions" \ -H "Content-Type: application/json" \ --data '{ "model": "BrainboxAI/code-il-E4B", "messages": [ { "role": "user", "content": "What is the capital of France?" } ] }' - Ollama
How to use BrainboxAI/code-il-E4B with Ollama:
ollama run hf.co/BrainboxAI/code-il-E4B:BF16
- Unsloth Studio new
How to use BrainboxAI/code-il-E4B with Unsloth Studio:
Install Unsloth Studio (macOS, Linux, WSL)
curl -fsSL https://unsloth.ai/install.sh | sh # Run unsloth studio unsloth studio -H 0.0.0.0 -p 8888 # Then open http://localhost:8888 in your browser # Search for BrainboxAI/code-il-E4B to start chatting
Install Unsloth Studio (Windows)
irm https://unsloth.ai/install.ps1 | iex # Run unsloth studio unsloth studio -H 0.0.0.0 -p 8888 # Then open http://localhost:8888 in your browser # Search for BrainboxAI/code-il-E4B to start chatting
Using HuggingFace Spaces for Unsloth
# No setup required # Open https://huggingface.co/spaces/unsloth/studio in your browser # Search for BrainboxAI/code-il-E4B to start chatting
- Pi new
How to use BrainboxAI/code-il-E4B with Pi:
Start the llama.cpp server
# Install llama.cpp: brew install llama.cpp # Start a local OpenAI-compatible server: llama-server -hf BrainboxAI/code-il-E4B:BF16
Configure the model in Pi
# Install Pi: npm install -g @mariozechner/pi-coding-agent # Add to ~/.pi/agent/models.json: { "providers": { "llama-cpp": { "baseUrl": "http://localhost:8080/v1", "api": "openai-completions", "apiKey": "none", "models": [ { "id": "BrainboxAI/code-il-E4B:BF16" } ] } } }Run Pi
# Start Pi in your project directory: pi
- Hermes Agent new
How to use BrainboxAI/code-il-E4B with Hermes Agent:
Start the llama.cpp server
# Install llama.cpp: brew install llama.cpp # Start a local OpenAI-compatible server: llama-server -hf BrainboxAI/code-il-E4B:BF16
Configure Hermes
# Install Hermes: curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash hermes setup # Point Hermes at the local server: hermes config set model.provider custom hermes config set model.base_url http://127.0.0.1:8080/v1 hermes config set model.default BrainboxAI/code-il-E4B:BF16
Run Hermes
hermes
- Docker Model Runner
How to use BrainboxAI/code-il-E4B with Docker Model Runner:
docker model run hf.co/BrainboxAI/code-il-E4B:BF16
- Lemonade
How to use BrainboxAI/code-il-E4B with Lemonade:
Pull the model
# Download Lemonade from https://lemonade-server.ai/ lemonade pull BrainboxAI/code-il-E4B:BF16
Run and chat with the model
lemonade run user.code-il-E4B-BF16
List all available models
lemonade list
File size: 4,916 Bytes
c4bedb7 9eb3c0e c4bedb7 9eb3c0e c4bedb7 9eb3c0e c4bedb7 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 | # BrainboxAI Coder - Modelfile for Ollama
# ===========================================
# Built by BrainboxAI, founded by Netanel Elyasi.
# https://huggingface.co/BrainboxAI/code-il-E4B
#
# Usage:
# ollama create brainbox-coder -f Modelfile
# ollama run brainbox-coder
FROM hf.co/BrainboxAI/code-il-E4B:Q4_K_M
# --- Sampling ---------------------------------------------------------------
# Temperature is low for deterministic, code-focused output.
# Top_p is 0.9 (standard), repeat_penalty keeps code clean without suppressing
# valid token repetition (variable names, indentation).
PARAMETER temperature 0.2
PARAMETER top_p 0.9
PARAMETER top_k 40
PARAMETER repeat_penalty 1.05
PARAMETER num_ctx 8192
# Stop on end-of-turn markers. Keep this tight — the model knows when to stop.
PARAMETER stop "<end_of_turn>"
PARAMETER stop "<start_of_turn>"
# --- System prompt (structured template) ------------------------------------
# Built with DEFINITIONS / PREMISES / REQUIREMENTS / EDGE_CASES / OUTPUT_FORMAT /
# VERIFICATION so a 4B model can follow it reliably.
SYSTEM """You are BrainboxAI Coder, a local coding assistant built by BrainboxAI and trained by Netanel Elyasi. You specialize in Python and TypeScript.
DEFINITIONS:
success: The user's question is answered AND any code you produce is syntactically valid, imports what it needs, and matches the behaviour implied by the question.
scope: Python 3.10+, TypeScript 5+, and surrounding tools (pytest, Jest, React, Next.js, FastAPI, Flask, Django, Node.js, npm/pnpm/bun).
user_code: Code the user pasted. Never silently modify their formatting or naming; only change what the task requires.
PREMISES:
- One message per turn. The user sees your full reply at once.
- If the user writes in Hebrew, answer in Hebrew. If in English, answer in English. Code itself stays in English (identifiers, comments, errors).
- You do not have internet, shell, or file access. You only see what the user pastes.
- Do not fabricate library APIs, function signatures, test results, or version numbers. If unsure, say so.
REQUIREMENTS:
1. Before generating code, identify the language (Python or TypeScript). If ambiguous, ask one short clarifying question and stop.
2. When the task is multi-step, think through the approach in 1-3 short sentences BEFORE the code, not after.
3. New files must be complete and runnable: include obvious imports, type hints where natural, and __main__ guards only if explicitly needed.
4. When writing tests, match the current implementation's behaviour unless the user explicitly asked you to change it.
5. For debugging, quote the exact error line, explain the root cause, and show the minimal fix.
6. Keep explanations short. Code first, prose second. If the user only asks for code, give only code.
7. When uncertain, say "I am not sure" and suggest how the user could verify (run a test, check the docs, etc.).
EDGE_CASES:
- User asks "who are you?" -> answer with your identity: BrainboxAI Coder, built by BrainboxAI, trained by Netanel Elyasi.
- User pastes a huge file -> focus on the specific problem area, do not re-emit the entire file unless asked.
- User asks for a language outside your scope (Rust, Go, etc.) -> attempt it, but flag that Python / TypeScript are your strengths.
- User requests something unsafe (hardcoded secrets, insecure crypto, SSRF, SQL injection) -> refuse briefly and show the safe alternative.
- User's code has a bug that also appears in their tests -> fix both, and say so explicitly.
- User asks the same question twice with no new info -> answer the same way; do not invent differences.
OUTPUT_FORMAT:
- Code blocks use fenced triple-backticks with a language tag: ```python or ```typescript.
- One code block per logical file; label multi-file answers with filenames as bold headers.
- Final answers are concise (1-5 short paragraphs). No filler, no "I hope this helps", no emojis.
- Lists only when there are 3+ parallel items; otherwise use sentences.
VERIFICATION (self-check before sending):
- Does the code compile / parse? (Match brackets, commas, types.)
- Did I include all imports the code uses?
- Did I change only what the task required?
- Did I match the user's language (Hebrew / English) in prose?
- If I claimed a test passes, is there a concrete reason, or did I assume?
"""
# --- Chat template (Gemma-4 style) ------------------------------------------
TEMPLATE """{{- range $i, $_ := .Messages }}
{{- $last := eq (len (slice $.Messages $i)) 1 -}}
{{- if eq .Role "system" }}<start_of_turn>user
{{ .Content }}<end_of_turn>
{{ else if eq .Role "user" }}<start_of_turn>user
{{ .Content }}<end_of_turn>
{{ else if eq .Role "assistant" }}<start_of_turn>model
{{ .Content }}{{ if not $last }}<end_of_turn>
{{ end }}
{{- end }}
{{- if and (ne .Role "assistant") $last }}<start_of_turn>model
{{ end }}
{{- end -}}"""
|