# Music Detection with WavLM

Official model for our INTERSPEECH 2026 paper "A Data-Centric Framework for Addressing Phonetic and Prosodic Challenges in Russian Speech Generative Models" (arXiv:2507.13563). Part of the Balalaika Russian speech data-processing pipeline; code: https://github.com/lab260ru/balalaika. If you use this resource, please cite it.

Detects whether an audio clip contains music.

EER: 2.5–3% | Based on microsoft/wavlm-base-plus | Best threshold: 0.2442
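
For readers unfamiliar with the metric: the equal error rate (EER) is the error at the operating point where the false-accept rate and the false-reject rate coincide. The card does not describe how the threshold above was chosen, so the following is only a minimal sketch of the usual convention, run on invented toy data. The function names `error_rates` and `equal_error_rate` are hypothetical and not part of this repository.

```python
from typing import Sequence, Tuple


def error_rates(scores: Sequence[float], labels: Sequence[int],
                threshold: float) -> Tuple[float, float]:
    """Return (false_accept_rate, false_reject_rate) at the given threshold."""
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    positives = [s for s, y in zip(scores, labels) if y == 1]
    # A negative clip scored at or above the threshold is a false accept.
    far = sum(s >= threshold for s in negatives) / len(negatives)
    # A positive clip scored below the threshold is a false reject.
    frr = sum(s < threshold for s in positives) / len(positives)
    return far, frr


def equal_error_rate(scores: Sequence[float],
                     labels: Sequence[int]) -> Tuple[float, float]:
    """Scan observed scores as thresholds; return (eer, threshold) where FAR and FRR are closest."""
    best = None
    for t in sorted(set(scores)):
        far, frr = error_rates(scores, labels, t)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2, t)
    return best[1], best[2]


# Toy data invented for illustration only (1 = music, 0 = no music).
toy_scores = [0.1, 0.2, 0.3, 0.7, 0.4, 0.6, 0.8, 0.9]
toy_labels = [0, 0, 0, 0, 1, 1, 1, 1]
eer, threshold = equal_error_rate(toy_scores, toy_labels)
print(eer, threshold)  # 0.25 0.6
```

On real evaluation data the two rates rarely match exactly, which is why the sketch averages them at the closest point.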

## Quick Start

```bash
git clone https://huggingface.co/MTUCI/MusicDetection
cd MusicDetection
pip install -r requirements.txt
```

## Usage

```python
from model import WavLMForMusicDetection  # model.py from the cloned repository
from safetensors import safe_open

# Instantiate the model, then load the released weights into it.
model = WavLMForMusicDetection(batch_size=32, device='cuda')
with safe_open('music_detection.safetensors', framework="pt") as f:
    model.load_state_dict({k: f.get_tensor(k) for k in f.keys()})

# One score per input file.
probs = model.predict_proba(['audio1.mp3', 'audio2.wav'])  # → tensor([0.88, 0.11])
```
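
To turn these scores into yes/no decisions, one option is to compare each score with the threshold reported above (0.2442). The sketch below assumes that a higher score means music is more likely and that a score equal to the threshold counts as music; neither detail is stated in this card. The helper `label_music` is hypothetical and not part of the repository.

```python
from typing import Iterable, List

# Threshold reported in this model card. Treating it as a cut-off on the
# music score (higher = more likely music) is an assumption of this sketch.
BEST_THRESHOLD = 0.2442


def label_music(probs: Iterable[float],
                threshold: float = BEST_THRESHOLD) -> List[bool]:
    """Return True for each clip whose score reaches the threshold."""
    # float() accepts plain Python floats as well as 0-d tensor elements.
    return [float(p) >= threshold for p in probs]


# Scores copied from the usage example's comment.
example_probs = [0.88, 0.11]
labels = label_music(example_probs)
print(labels)  # [True, False]
```

Passing the tensor returned by `predict_proba` directly should also work, since iterating over a 1-D tensor yields elements that `float()` can convert.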

## Contact

- Email: kborodin.research@gmail.com
- Telegram: [@korallll_ai](https://t.me/korallll_ai)

## Citation

If you use this resource, please cite our INTERSPEECH 2026 paper:

```bibtex
@inproceedings{borodin2026balalaika,
  title     = {A Data-Centric Framework for Addressing Phonetic and Prosodic Challenges in Russian Speech Generative Models},
  author    = {Borodin, Kirill and Vasiliev, Nikita and Kudryavtsev, Vasiliy and Maslov, Maxim and Gorodnichev, Mikhail and Rogov, Oleg and Mkrtchian, Grach},
  booktitle = {Proc. INTERSPEECH 2026},
  year      = {2026},
  note      = {arXiv:2507.13563},
  url       = {https://arxiv.org/abs/2507.13563}
}
```