Inference Optimization (community)
AI & ML interests: None defined yet.
Mixed Precision Models
- meta-llama/Llama-3.1-8B-Instruct
  Text Generation • 8B • Updated • 9.15M • 5.75k
- RedHatAI/Meta-Llama-3.1-8B-Instruct-FP8-dynamic
  Text Generation • 8B • Updated • 47.7k • 9
- RedHatAI/Llama-3.1-8B-Instruct-NVFP4
  Text Generation • 5B • Updated • 19.5k • 1
- inference-optimization/Llama-3.1-8B-Instruct_5_bits_mode_hybrid
  6B • Updated • 11
- meta-llama/Llama-3.2-1B-Instruct
  Text Generation • 1B • Updated • 5.15M • 1.38k
- inference-optimization/Llama-3.2-1B-Instruct-FP8-Dynamic
  1B • Updated • 34
- inference-optimization/Llama-3.2-1B-Instruct-NVFP4
  0.8B • Updated • 49
- inference-optimization/Llama-3.2-1B-Instruct-5-bits-mode-heuristic-per-tensor
  1B • Updated • 21
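Several of the checkpoints above carry an "FP8-Dynamic" / "FP8-dynamic" suffix, which names per-tensor dynamic FP8 quantization: each tensor is scaled at runtime by its own maximum magnitude so it fits the finite E4M3 range (about ±448), with no offline calibration pass. As a minimal illustrative sketch only — not the actual code behind these repositories — the idea looks like this:

```python
import numpy as np

# Hypothetical sketch of per-tensor *dynamic* FP8 (E4M3) quantization.
# Real kernels also round values onto the discrete E4M3 grid, which
# this sketch omits; only the scaling/clamping step is shown.
FP8_E4M3_MAX = 448.0  # largest finite E4M3 value

def quantize_per_tensor_fp8(x: np.ndarray):
    """One runtime scale per tensor: no calibration dataset needed."""
    scale = np.max(np.abs(x)) / FP8_E4M3_MAX
    q = np.clip(x / scale, -FP8_E4M3_MAX, FP8_E4M3_MAX)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q * scale

x = np.array([0.1, -2.5, 3.75, 448.0])
q, scale = quantize_per_tensor_fp8(x)
x_hat = dequantize(q, scale)
```

Because the scale is recomputed from each tensor at inference time, "dynamic" schemes trade a small runtime cost for skipping the calibration step that static quantization requires.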
Models (320)
- inference-optimization/Qwen3.5-9B-quantized.w4a16
  Image-Text-to-Text • 9B • Updated
- inference-optimization/Qwen3-30B-A3B-Instruct-2507-7-bits-mode-noise-per-tensor
  26B • Updated • 16
- inference-optimization/Qwen3-30B-A3B-Instruct-2507-7-bits-mode-hybrid-per-tensor
  27B • Updated • 13
- inference-optimization/Qwen3-30B-A3B-Instruct-2507-7-bits-mode-heuristic-per-tensor
  27B • Updated • 15
- inference-optimization/Qwen3-30B-A3B-Instruct-2507-6.5-bits-mode-noise-per-tensor
  25B • Updated • 15
- inference-optimization/Qwen3-30B-A3B-Instruct-2507-6.5-bits-mode-hybrid-per-tensor
  25B • Updated • 15
- inference-optimization/Qwen3-30B-A3B-Instruct-2507-6.5-bits-mode-heuristic-per-tensor
  25B • Updated • 16
- inference-optimization/Qwen3-30B-A3B-Instruct-2507-6-bits-mode-noise-per-tensor
  23B • Updated • 16
- inference-optimization/Qwen3-30B-A3B-Instruct-2507-6-bits-mode-hybrid-per-tensor
  23B • Updated • 16
- inference-optimization/Qwen3-30B-A3B-Instruct-2507-6-bits-mode-heuristic-per-tensor
  23B • Updated • 14
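The `quantized.w4a16` suffix conventionally denotes 4-bit integer weights with 16-bit floating-point activations, where the weights are quantized in small groups that each carry their own scale. The following is a toy sketch of symmetric group-wise int4 weight quantization under assumed parameters (a group size of 4, chosen only to keep the demo small); it is not the actual recipe behind these repositories.

```python
import numpy as np

# Hypothetical sketch of "w4a16": weights stored as 4-bit integers
# (range [-8, 7]) with one scale per group; activations and the matmul
# stay in 16-bit floats. Production schemes typically use group sizes
# of 64 or 128. Assumes no group is entirely zero.
GROUP_SIZE = 4

def quantize_w4(w: np.ndarray):
    """Symmetric per-group int4 quantization of a weight vector."""
    groups = w.reshape(-1, GROUP_SIZE)
    # Map each group's max magnitude onto +/-7 (symmetric int4).
    scales = np.max(np.abs(groups), axis=1, keepdims=True) / 7.0
    q = np.clip(np.round(groups / scales), -8, 7).astype(np.int8)
    return q, scales

def dequantize_w4(q: np.ndarray, scales: np.ndarray) -> np.ndarray:
    # "a16": dequantize into float16 for the 16-bit compute path.
    return q.astype(np.float16) * scales.astype(np.float16)

w = np.array([0.5, -1.0, 0.25, 0.7, 2.0, -2.0, 1.0, 0.5], dtype=np.float32)
q, scales = quantize_w4(w)
w_hat = dequantize_w4(q, scales).reshape(w.shape)
```

Storing weights at 4 bits cuts their memory roughly 4x versus float16, at the cost of one extra scale per group and some rounding error bounded by half a quantization step.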
Datasets (13)
- inference-optimization/laguna-xs-ultrachat-responses
  Viewer • Updated • 208k • 10
- inference-optimization/laguna-xs-ultrachat-conversations
  Viewer • Updated • 205k • 11
- inference-optimization/laguna-xs-magpie-300k-responses
  Viewer • Updated • 300k • 13
- inference-optimization/laguna-xs-magpie-300k-conversations
  Viewer • Updated • 298k • 11
- inference-optimization/Qwen3-8b-sharegpt-5k
  Preview • Updated • 84
- inference-optimization/speculators_benchmarks_tool_call
  Viewer • Updated • 4.9k • 67
- inference-optimization/speculators-qwen3-30b-a3b-instruct-2507
  Preview • Updated • 32
- inference-optimization/speculators-qwen3-30b-a3b-instruct
  Preview • Updated • 62
- inference-optimization/speculators-qwen3-32b-instruct
  Preview • Updated • 71
- inference-optimization/gpt-oss-20b-nan-hidden-states-repro
  Updated • 52