Milos Bijanic
biki96
AI & ML interests
None yet
Recent Activity
updated a collection about 1 month ago: Text2Image
updated a collection about 2 months ago: Text2Image
updated a collection about 2 months ago: TTS
Organizations
None yet
Face Swap
A2A
Text2Image
IT3D
I2V
diffusion
- DreamGaussian4D: Generative 4D Gaussian Splatting
  Paper • 2312.17142 • Published • 19
- Presto! Distilling Steps and Layers for Accelerating Music Generation
  Paper • 2410.05167 • Published • 18
- OmniBooth: Learning Latent Control for Image Synthesis with Multi-modal Instruction
  Paper • 2410.04932 • Published • 9
- Efficient Diffusion Models: A Comprehensive Survey from Principles to Practices
  Paper • 2410.11795 • Published • 18
I2I
Embedding
TTS
LLM
OCR
- deepseek-ai/DeepSeek-OCR
  Image-Text-to-Text • 3B • Updated • 3.02M • 3.25k
- allenai/olmOCR-2-7B-1025-FP8
  Image-Text-to-Text • 8B • Updated • 264k • 237
- allenai/olmOCR-2-7B-1025
  Image-Text-to-Text • 8B • Updated • 41.3k • 149
- PaddlePaddle/PaddleOCR-VL
  Image-Text-to-Text • 1.0B • Updated • 9.81k • 1.61k
STT
- Open ASR Leaderboard
  Space • Featured • 1.36k • Compare speech‑to‑text models across multiple benchmarks
- nvidia/canary-qwen-2.5b
  Automatic Speech Recognition • 3B • Updated • 75.7k • 431
- nvidia/parakeet-tdt-0.6b-v3
  Automatic Speech Recognition • 0.6B • Updated • 46.1k • 875
- nvidia/parakeet-tdt-0.6b-v2
  Automatic Speech Recognition • Updated • 279k • 1.48k
image-text-to-video