Granite 4.1 Language Models Collection Efficient language models for multilingual generation, coding, RAG, and AI assistant workflows. • 6 items • Updated Apr 29 • 54
ReaderLM-v2: Small Language Model for HTML to Markdown and JSON Paper • 2503.01151 • Published Mar 3, 2025 • 5
DiarizationLM: Speaker Diarization Post-Processing with Large Language Models Paper • 2401.03506 • Published Jan 7, 2024 • 16
Speaker Attributed Automatic Speech Recognition Using Speech Aware LLMS Paper • 2604.11269 • Published Apr 13 • 2
M3SD: Multi-modal, Multi-scenario and Multi-language Speaker Diarization Dataset Paper • 2506.14427 • Published Jun 17, 2025 • 1
MOSS Transcribe Diarize: Accurate Transcription with Speaker Diarization Paper • 2601.01554 • Published Jan 4 • 61
SpeakerLM: End-to-End Versatile Speaker Diarization and Recognition with Multimodal Large Language Models Paper • 2508.06372 • Published Aug 8, 2025 • 3
VideoPrism Collection VideoPrism is a foundational video encoder that enables state-of-the-art performance on a large variety of video understanding tasks. • 5 items • Updated Mar 12 • 19
Gemma 4 Collection Gemma 4 is Google's new model family including E2B, E4B, 26B-A4B, and 31B. • 28 items • Updated Apr 22 • 195
Article Introducing Modular Diffusers - Composable Building Blocks for Diffusion Pipelines +2 YiYiXu, OzzyGT, dn6, sayakpaul • Mar 5 • 51
Article Keep the Tokens Flowing: Lessons from 16 Open-Source RL Libraries +7 aminediroHF, qgallouedec, kashif, lewtun, edbeeching, albertvillanova, nouamanetazi, lvwerra, sergiopaniego • Mar 10 • 158