Instructions for using PediaMedAI/CogSense-8B with libraries, inference providers, notebooks, and local apps.
- Libraries
- Transformers
How to use PediaMedAI/CogSense-8B with Transformers:
Use a pipeline as a high-level helper:

```python
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="PediaMedAI/CogSense-8B")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"},
        ],
    },
]
pipe(text=messages)
```

Or load the model directly:

```python
from transformers import AutoProcessor, AutoModelForImageTextToText

processor = AutoProcessor.from_pretrained("PediaMedAI/CogSense-8B")
model = AutoModelForImageTextToText.from_pretrained("PediaMedAI/CogSense-8B")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"},
        ],
    },
]
inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)
outputs = model.generate(**inputs, max_new_tokens=40)
print(processor.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
```

- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use PediaMedAI/CogSense-8B with vLLM:
Install from pip and serve the model:
```shell
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "PediaMedAI/CogSense-8B"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "PediaMedAI/CogSense-8B",
    "messages": [
      {
        "role": "user",
        "content": [
          { "type": "text", "text": "Describe this image in one sentence." },
          { "type": "image_url", "image_url": { "url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg" } }
        ]
      }
    ]
  }'
```

Use Docker:

```shell
docker model run hf.co/PediaMedAI/CogSense-8B
```
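The same OpenAI-compatible request can also be issued from Python with only the standard library. This is a minimal sketch, not part of vLLM itself: the helper names `build_chat_request` and `post_chat` are my own, and it assumes the server is running on the default port 8000 as shown above.

```python
import json
import urllib.request

def build_chat_request(model, prompt, image_url):
    """Build the JSON body for a multimodal /v1/chat/completions call."""
    return {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            }
        ],
    }

def post_chat(base_url, payload):
    """POST the payload to a running OpenAI-compatible server."""
    req = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

payload = build_chat_request(
    "PediaMedAI/CogSense-8B",
    "Describe this image in one sentence.",
    "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg",
)
print(json.dumps(payload, indent=2))
# post_chat("http://localhost:8000", payload)  # uncomment with the server running
```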
- SGLang
How to use PediaMedAI/CogSense-8B with SGLang:
Install from pip and serve the model:
```shell
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
  --model-path "PediaMedAI/CogSense-8B" \
  --host 0.0.0.0 \
  --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "PediaMedAI/CogSense-8B",
    "messages": [
      {
        "role": "user",
        "content": [
          { "type": "text", "text": "Describe this image in one sentence." },
          { "type": "image_url", "image_url": { "url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg" } }
        ]
      }
    ]
  }'
```

Use Docker images:

```shell
docker run --gpus all \
  --shm-size 32g \
  -p 30000:30000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --env "HF_TOKEN=<secret>" \
  --ipc=host \
  lmsysorg/sglang:latest \
  python3 -m sglang.launch_server \
  --model-path "PediaMedAI/CogSense-8B" \
  --host 0.0.0.0 \
  --port 30000

# Then call the server with the same curl command as above.
```

- Docker Model Runner
How to use PediaMedAI/CogSense-8B with Docker Model Runner:
```shell
docker model run hf.co/PediaMedAI/CogSense-8B
```
CogSense-8B
This repository contains the weights for CogSense-8B, a Multimodal Large Language Model (MLLM) introduced in the paper Toward Cognitive Supersensing in Multimodal Large Language Model.
Project Page | Code | Paper
Introduction
CogSense-8B is trained using Cognitive Supersensing, a novel training paradigm that endows MLLMs with human-like visual imagery capabilities. By integrating a Latent Visual Imagery Prediction (LVIP) head, the model learns sequences of visual cognitive latent embeddings and aligns them with answers, forming vision-based internal reasoning chains. This approach aims to bridge the gap between perceptual recognition and complex cognitive understanding.
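As a rough conceptual sketch only (not the paper's actual implementation), an imagery-prediction head of this kind can be pictured as a projection from decoder hidden states to a sequence of visual latents, plus an objective that aligns those latents with an answer embedding. All shapes, the linear head, and the cosine objective below are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def lvip_head(hidden, w_proj):
    """Project decoder hidden states (T, d_model) to visual latents (T, d_vis)."""
    return hidden @ w_proj

def alignment_loss(latents, answer_emb):
    """Mean (1 - cosine similarity) between each latent and the answer embedding."""
    latents_n = latents / np.linalg.norm(latents, axis=-1, keepdims=True)
    answer_n = answer_emb / np.linalg.norm(answer_emb)
    return float(np.mean(1.0 - latents_n @ answer_n))

# Toy dimensions for illustration.
d_model, d_vis, T = 64, 32, 5
w_proj = rng.normal(size=(d_model, d_vis)) / np.sqrt(d_model)
hidden = rng.normal(size=(T, d_model))      # stand-in for decoder states
answer = rng.normal(size=(d_vis,))          # stand-in for the answer embedding

latents = lvip_head(hidden, w_proj)
loss = alignment_loss(latents, answer)
print(latents.shape, loss)
```

During training, minimizing such a loss would push the predicted latent sequence toward the answer representation, forming the vision-based internal reasoning chain the paper describes.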
CogSense-Bench
The model's cognitive capabilities are evaluated on CogSense-Bench, a comprehensive visual question answering (VQA) benchmark assessing five cognitive dimensions:
- Fluid intelligence
- Crystallized intelligence
- Visuospatial cognition
- Mental simulation
- Visual routines
Citation
If you find this work useful, please consider citing:
```bibtex
@misc{li2026cognitivesupersensingmultimodallarge,
  title={Toward Cognitive Supersensing in Multimodal Large Language Model},
  author={Boyi Li and Yifan Shen and Yuanzhe Liu and Yifan Xu and Jiateng Liu and Xinzhuo Li and Zhengyuan Li and Jingyuan Zhu and Yunhan Zhong and Fangzhou Lan and Jianguo Cao and James M. Rehg and Heng Ji and Ismini Lourentzou and Xu Cao},
  year={2026},
  eprint={2602.01541},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2602.01541},
}
```