English-Document-OCR-Qwen3.5-0.8B
I built this model as part of my ongoing work in document digitization and archival OCR. My goal was to create a small, locally-runnable model that punches above its weight class. This 0.8B release keeps the same overall direction as my earlier Qwen OCR work, but uses improved dataset samples, stronger formatting targets, and better layout preservation. In practice, it produces better output than my previous 2B model on the kinds of structured document OCR I care about most.
This is an updated smaller release focused on English archival and document OCR. If you try it on your documents, I'd love to hear how it performs, feel free to leave feedback in the Community tab.
License: This model is intended for personal and research use only. If you want to use this model in a product or service, or need to process documents commercially, contact [email protected].
Model Details
- Fine-tuned by: loay
- Base Model:
unsloth/Qwen3.5-0.8B - Task: Document OCR
- Training Data: Improved synthetic English document samples following the same family of dataset styles as my earlier Qwen OCR releases, including faded ink, bleed-through artifacts, skewed layouts, historical serif typefaces, charts, figures, formulas, and more challenging structured page compositions
- Output Format: A markdown-first transcription format that preserves paragraph flow and layout structure, uses HTML for tables, uses LaTeX for formulas, emits
[image]tags for figures/images, and emits[chart: ...]tags when extracting chart content - Language Support: This release is optimized for English. I’m planning to release versions for additional languages soon, including support for right-to-left document OCR. See my OCR finetuned models for future updates and related releases.
Usage
The model does not require a specific prompt. It will perform OCR on any document image by default. To achieve the best results and prevent conversational hallucinations, use the exact instruction the model was fine-tuned on:
Extract all visible text from this document image and return only the transcription in reading order using a markdown-first format. Use HTML only for tables. Use LaTeX only for formulas.
GGUF & Local Inference
Quantized GGUF files are available for use with llama.cpp, LM Studio, Ollama, and similar runtimes.
You must load
mmproj-english-document-ocr-qwen3.5-0.8b-f16.ggufalongside your chosen weight file. Without the multimodal projector, the model cannot process images.
| File | Use Case |
|---|---|
english-document-ocr-qwen3.5-0.8b-f16.gguf |
Full precision, maximum accuracy |
english-document-ocr-qwen3.5-0.8b-q8_0.gguf |
Best quality/size tradeoff for OCR precision |
english-document-ocr-qwen3.5-0.8b-q6_k.gguf |
High quality, lower VRAM |
english-document-ocr-qwen3.5-0.8b-q5_k_m.gguf |
Balanced quality and speed |
english-document-ocr-qwen3.5-0.8b-q4_k_m.gguf |
Fast, efficient local inference |
mmproj-english-document-ocr-qwen3.5-0.8b-f16.gguf |
Required multimodal projector (load with any weight above) |
Example with llama.cpp:
llama-cli --model english-document-ocr-qwen3.5-0.8b-q4_k_m.gguf --mmproj mmproj-english-document-ocr-qwen3.5-0.8b-f16.gguf --image your_document.jpg
Limitations
- Trained exclusively on synthetic data. May degrade on severe real-world scan artifacts outside the training distribution.
- No handwriting support, relies on base model zero-shot for cursive or marginalia.
- Trained to represent figures and embedded images with
[image]tags and to extract chart content using[chart: ...]tags, but performance on complex real-world charts and scientific figures can still be inconsistent. - Supports LaTeX-style formula output as used in the training pipeline, but difficult mathematical layouts may still degrade on dense or low-quality scans.
- Optimized for LTR latin scripts. For Arabic/RTL documents, see my OCR models.
- May hallucinate or break on very long context from dense pages. If your document is text-heavy, consider splitting it into sections before inference.
- Downloads last month
- -
4-bit
5-bit
6-bit
8-bit
16-bit