English-Document-OCR-Qwen3.5-0.8B

I built this model as part of my ongoing work in document digitization and archival OCR. My goal was to create a small, locally-runnable model that punches above its weight class. This 0.8B release keeps the same overall direction as my earlier Qwen OCR work, but uses improved dataset samples, stronger formatting targets, and better layout preservation. In practice, it produces better output than my previous 2B model on the kinds of structured document OCR I care about most.

This is an updated, smaller release focused on English archival and document OCR. If you try it on your documents, I'd love to hear how it performs; feel free to leave feedback in the Community tab.

License: This model is intended for personal and research use only. If you want to use this model in a product or service, or need to process documents commercially, contact [email protected].

Model Details

Fine-tuned by: loay
Base Model: unsloth/Qwen3.5-0.8B
Task: Document OCR
Training Data: Improved synthetic English document samples following the same family of dataset styles as my earlier Qwen OCR releases, including faded ink, bleed-through artifacts, skewed layouts, historical serif typefaces, charts, figures, formulas, and more challenging structured page compositions
Output Format: A markdown-first transcription format that preserves paragraph flow and layout structure, uses HTML for tables, uses LaTeX for formulas, emits [image] tags for figures/images, and emits [chart: ...] tags when extracting chart content
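If you post-process transcriptions, the figure and chart placeholders can be pulled out with a plain `grep`. This is a minimal sketch rather than part of the release. It assumes each tag appears literally as `[image]` or `[chart: ...]` on a single line and that chart content never contains a closing bracket. `list_placeholders` is a hypothetical helper name, and `grep -o` is a widely supported GNU/BSD extension rather than strict POSIX.

```shell
# Hypothetical helper: print every [image] and [chart: ...] placeholder found
# in a transcription, one per line. Reads the named files, or stdin if none.
# Assumes a tag never spans lines and chart text contains no "]".
list_placeholders() {
  grep -oE '\[(image|chart: [^]]*)]' "$@"
}
```

For example, `list_placeholders page.md` lists the tags in reading order. HTML tables and LaTeX formulas are nested structures, so a real parser suits them better than `grep`.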
Language Support: This release is optimized for English. I’m planning to release versions for additional languages soon, including support for right-to-left document OCR. See my fine-tuned OCR models for future updates and related releases.

Usage

The model does not require a specific prompt. It will perform OCR on any document image by default. To achieve the best results and prevent conversational hallucinations, use the exact instruction the model was fine-tuned on:

Extract all visible text from this document image and return only the transcription in reading order using a markdown-first format. Use HTML only for tables. Use LaTeX only for formulas.

GGUF & Local Inference

Quantized GGUF files are available for use with llama.cpp, LM Studio, Ollama, and similar runtimes.

You must load `mmproj-english-document-ocr-qwen3.5-0.8b-f16.gguf` alongside your chosen weight file. Without this multimodal projector, the model cannot process images.

| File | Use case |
|---|---|
| `english-document-ocr-qwen3.5-0.8b-f16.gguf` | Full precision, maximum accuracy |
| `english-document-ocr-qwen3.5-0.8b-q8_0.gguf` | Best quality/size tradeoff for OCR precision |
| `english-document-ocr-qwen3.5-0.8b-q6_k.gguf` | High quality, lower VRAM |
| `english-document-ocr-qwen3.5-0.8b-q5_k_m.gguf` | Balanced quality and speed |
| `english-document-ocr-qwen3.5-0.8b-q4_k_m.gguf` | Fast, efficient local inference |
| `mmproj-english-document-ocr-qwen3.5-0.8b-f16.gguf` | Required multimodal projector (load with any weight file above) |

Example with llama.cpp:

`llama-cli --model english-document-ocr-qwen3.5-0.8b-q4_k_m.gguf --mmproj mmproj-english-document-ocr-qwen3.5-0.8b-f16.gguf --image your_document.jpg`
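To run the recommended instruction from the Usage section over a folder of scans, the example command can be wrapped in a small script. This is a sketch under assumptions, not a tested recipe for every build. It assumes your `llama-cli` accepts `-p` for the prompt together with `--mmproj` and `--image`, and that it answers a single turn when given one. Some llama.cpp builds ship multimodal support as a separate binary such as `llama-mtmd-cli` or need extra flags to stay non-interactive, and anything the binary prints to stdout ends up in the output file. `ocr_dir`, `MODEL`, `MMPROJ`, `LLAMA_CLI`, and `OCR_PROMPT` are illustrative names.

```shell
# Sketch: OCR every .jpg/.png in a directory with the q4_k_m weights plus the
# required projector, saving each transcription next to its image as .md.
# Override MODEL, MMPROJ, or LLAMA_CLI (e.g. llama-mtmd-cli) via the environment.
MODEL=${MODEL:-english-document-ocr-qwen3.5-0.8b-q4_k_m.gguf}
MMPROJ=${MMPROJ:-mmproj-english-document-ocr-qwen3.5-0.8b-f16.gguf}
LLAMA_CLI=${LLAMA_CLI:-llama-cli}
# The exact instruction the model was fine-tuned on.
OCR_PROMPT='Extract all visible text from this document image and return only the transcription in reading order using a markdown-first format. Use HTML only for tables. Use LaTeX only for formulas.'

ocr_dir() {
  dir=$1
  # Without the projector the model cannot read images, so fail early.
  if [ ! -f "$MMPROJ" ]; then
    echo "missing multimodal projector: $MMPROJ" >&2
    return 1
  fi
  for img in "$dir"/*.jpg "$dir"/*.png; do
    [ -e "$img" ] || continue   # skip a glob pattern that matched nothing
    "$LLAMA_CLI" --model "$MODEL" --mmproj "$MMPROJ" \
      --image "$img" -p "$OCR_PROMPT" > "${img%.*}.md"
  done
}
```

For example, `ocr_dir scans` writes `scans/page1.md` for `scans/page1.jpg`.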

Limitations

Trained exclusively on synthetic data, so accuracy may degrade on severe real-world scan artifacts outside the training distribution.
No handwriting support; for cursive or marginalia, the model relies on the base model's zero-shot ability.
Trained to represent figures and embedded images with [image] tags and to extract chart content using [chart: ...] tags, but performance on complex real-world charts and scientific figures can still be inconsistent.
Supports LaTeX-style formula output as used in the training pipeline, but accuracy on difficult mathematical layouts may still degrade on dense or low-quality scans.
Optimized for left-to-right (LTR) Latin scripts. For Arabic and other right-to-left (RTL) documents, see my other OCR models.
May hallucinate or break down on very long contexts from dense pages. If your document is text-heavy, consider splitting it into sections before inference.
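One way to split a dense page is into overlapping horizontal strips that are transcribed one at a time. The helper below is a hypothetical sketch, not part of this release: it only computes ImageMagick crop geometries, and the pixel overlap is an assumption intended to keep a text line cut at a strip boundary whole in at least one strip (it must exceed the line height to do so). Lines duplicated in the overlap have to be removed when you join the transcriptions.

```shell
# Hypothetical helper (not part of this release): print ImageMagick crop
# geometries (WxH+X+Y) that split a WIDTH x HEIGHT page into up to N
# horizontal strips. Every strip after the first starts OVERLAP pixels
# above its nominal boundary.
# Usage: strip_geometries WIDTH HEIGHT N [OVERLAP]
strip_geometries() {
  width=$1 height=$2 n=$3 overlap=${4:-0}
  step=$(( (height + n - 1) / n ))        # ceiling of height / n
  i=0
  while [ "$i" -lt "$n" ]; do
    start=$(( i * step ))                  # nominal top edge of strip i
    if [ "$start" -lt "$height" ]; then    # skip strips past the page end
      top=$(( start - overlap ))
      [ "$top" -lt 0 ] && top=0
      bottom=$(( start + step ))
      [ "$bottom" -gt "$height" ] && bottom=$height
      echo "${width}x$(( bottom - top ))+0+${top}"
    fi
    i=$(( i + 1 ))
  done
}
```

Each geometry can then be passed to ImageMagick, for example `for g in $(strip_geometries 2480 3508 3 60); do magick page.jpg -crop "$g" +repage "strip_${g##*+}.jpg"; done` (use `convert` instead of `magick` on ImageMagick 6). The page size can be read with `magick identify -format '%w %h' page.jpg`.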

GGUF

Model size

0.8B params

Architecture

qwen35

Model tree for loay/English-Document-OCR-Qwen3.5-0.8B

Base model

Qwen/Qwen3.5-0.8B-Base

Finetuned

Qwen/Qwen3.5-0.8B

Quantized

(89)

this model

Collection including loay/English-Document-OCR-Qwen3.5-0.8B

Open Vision, Layout & OCR Models by Loay

Collection

This collection hosts a series of Vision Language Models (VLMs) fine-tuned for Optical Character Recognition (OCR) and Document Processing.