Best open-source Text-to-Speech (TTS) models — SOTA neural voice synthesis, zero-shot cloning, multilingual & expressive speech generation.
Sinapsis AI
community
AI & ML interests
A completely open-source agentic platform in three phases: Agentic, Agents, Tools.
Best Open Source models for Audio Classification (emotion, music genre, language ID, etc.)
- MIT/ast-finetuned-audioset-10-10-0.4593
  Audio Classification • 86.6M • 432k • 352
- speechbrain/emotion-recognition-wav2vec2-IEMOCAP
  Audio Classification • 544k • 184
- laion/clap-htsat-fused
  Audio Classification • 0.2B • 17.6M • 78
- m-a-p/MERT-v1-330M
  Audio Classification • 46k • 83
- BAAI/bge-m3
  Sentence Similarity • 17.6M • 2.94k
- sentence-transformers/all-MiniLM-L6-v2
  Sentence Similarity • 22.7M • 211M • 4.72k
- google/embeddinggemma-300m
  Sentence Similarity • 0.3B • 1.03M • 1.61k
- Qwen/Qwen3-Embedding-8B
  Feature Extraction • 8B • 1.97M • 658
Image to Image
Image To Video
Music & Sound Generation - Best Open Source models (MusicGen, Stable Audio, etc.)
Speech to Text (ASR) - Best Open Source models
- openai/whisper-large-v3
  Automatic Speech Recognition • 2B • 4.87M • 5.62k
- openai/whisper-large-v3-turbo
  Automatic Speech Recognition • 0.8B • 6.88M • 2.96k
- nvidia/parakeet-tdt-0.6b-v2
  Automatic Speech Recognition • 170k • 1.46k
- facebook/seamless-m4t-v2-large
  Automatic Speech Recognition • 2B • 74.9k • 972
Text to Image
Text to Video
This collection gathers interesting open-source models that I can use on the webpage.
- moonshotai/Kimi-K2-Thinking
  Text Generation • 1.1T • 95.2k • 1.7k
- meta-llama/Llama-3.1-8B-Instruct
  Text Generation • 8B • 9.19M • 5.75k
- allenai/Olmo-3-32B-Think
  Text Generation • 1.05M • 5.36k • 171
- allenai/Olmo-3-7B-Instruct
  Text Generation • 528k • 660k • 128
- Qwen/Qwen3-VL-235B-A22B-Thinking
  Image-Text-to-Text • 236B • 228k • 389
- Qwen/Qwen3-235B-A22B-Instruct-2507
  Text Generation • 164k • 774
- Qwen/Qwen-Image-Edit-2509
  Image-to-Image • 186k • 1.11k
- Qwen/Qwen3-VL-8B-Instruct
  Image-Text-to-Text • 9B • 3.94M • 881