Official model for our INTERSPEECH 2026 paper "A Data-Centric Framework for Addressing Phonetic and Prosodic Challenges in Russian Speech Generative Models" (arXiv:2507.13563). Part of the Balalaika Russian speech data-processing pipeline; code: https://github.com/lab260ru/balalaika. If you use this resource, please cite it.
- Base model: microsoft/wavlm-base-plus
- Best threshold value: 0.2442
```bash
git clone https://huggingface.co/MTUCI/MusicDetection
cd MusicDetection
pip install -r requirements.txt
```
```python
from model import WavLMForMusicDetection
from safetensors import safe_open

model = WavLMForMusicDetection(batch_size=32, device='cuda')

# Load the weights from the safetensors checkpoint into the model.
with safe_open('music_detection.safetensors', framework="pt") as f:
    model.load_state_dict({k: f.get_tensor(k) for k in f.keys()})

probs = model.predict_proba(['audio1.mp3', 'audio2.wav'])  # → tensor([0.88, 0.11])
```
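The scores from `predict_proba` can be turned into yes/no decisions with the threshold reported in this card. The following is a minimal sketch, not part of the released code: the helper `flag_music`, the strict `>` comparison, and the reading that a higher score means music is more likely are all assumptions made for illustration. It works on plain floats so it runs without the model; with the real model, `model.predict_proba(paths).tolist()` should give the same kind of list.

```python
from typing import Iterable, List

# Threshold value reported in this model card.
BEST_THRESHOLD = 0.2442


def flag_music(probs: Iterable[float], threshold: float = BEST_THRESHOLD) -> List[bool]:
    """Return True for each clip whose score exceeds the threshold.

    Assumption: a higher score means music is more likely present.
    """
    return [float(p) > threshold for p in probs]


# Illustrative scores copied from the example output above.
paths = ["audio1.mp3", "audio2.wav"]
scores = [0.88, 0.11]  # with the real model: model.predict_proba(paths).tolist()

flags = flag_music(scores)
# Keep only the clips that were not flagged as containing music.
clean_paths = [p for p, has_music in zip(paths, flags) if not has_music]

print(flags)        # [True, False]
print(clean_paths)  # ['audio2.wav']
```

A lower threshold flags more clips as music, which may be preferable when filtering a speech dataset where missed music is more costly than discarded speech.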
## Contact
- Email: kborodin.research@gmail.com
- Telegram: [@korallll_ai](https://t.me/korallll_ai)
## Citation
If you use this resource, please cite our INTERSPEECH 2026 paper:
```bibtex
@inproceedings{borodin2026balalaika,
title = {A Data-Centric Framework for Addressing Phonetic and Prosodic Challenges in Russian Speech Generative Models},
author = {Borodin, Kirill and Vasiliev, Nikita and Kudryavtsev, Vasiliy and Maslov, Maxim and Gorodnichev, Mikhail and Rogov, Oleg and Mkrtchian, Grach},
booktitle = {Proc. INTERSPEECH 2026},
year = {2026},
note = {arXiv:2507.13563},
url = {https://arxiv.org/abs/2507.13563}
}
```