# Music Detection with WavLM

Official model for our INTERSPEECH 2026 paper "A Data-Centric Framework for Addressing Phonetic and Prosodic Challenges in Russian Speech Generative Models" (arXiv:2507.13563). It is part of the Balalaika Russian speech data-processing pipeline; the code is at https://github.com/lab260ru/balalaika. If you use this resource, please cite it.

Detects whether an audio file contains music.
EER: 2.5–3% | Base model: `microsoft/wavlm-base-plus` | Best decision threshold: `0.2442`
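
The equal error rate (EER) is the operating point at which the false-positive rate and the false-negative rate are equal. The following is a minimal sketch of how such a value and its threshold can be found by sweeping candidate thresholds. It is not the evaluation code used for this model: the helper `equal_error_rate`, the toy scores, and the labels are all invented for illustration.

```python
def equal_error_rate(scores, labels):
    """Return (eer, threshold) for binary labels (1 = music, 0 = no music).

    Hypothetical helper: tries every observed score as a threshold and keeps
    the one where false-positive and false-negative rates are closest.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    best = None
    for threshold in sorted(set(scores)):
        # A clip is predicted as "music" when its score reaches the threshold.
        fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
        fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 1)
        fpr, fnr = fp / negatives, fn / positives
        gap = abs(fpr - fnr)
        if best is None or gap < best[0]:
            best = (gap, (fpr + fnr) / 2, threshold)
    return best[1], best[2]


# Toy data, invented for illustration only.
scores = [0.05, 0.10, 0.30, 0.20, 0.60, 0.90, 0.15, 0.80]
labels = [0, 0, 0, 1, 1, 1, 0, 1]
eer, threshold = equal_error_rate(scores, labels)
print(f"EER = {eer:.2%} at threshold {threshold}")
```

On a real validation set, the threshold found this way plays the same role as the `0.2442` value reported above.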

## Quick Start

```bash
git clone https://huggingface.co/MTUCI/MusicDetection
cd MusicDetection
pip install -r requirements.txt
```

## Usage

```python
from model import WavLMForMusicDetection
from safetensors import safe_open

model = WavLMForMusicDetection(batch_size=32, device='cuda')

# Load the weights from the safetensors checkpoint.
with safe_open('music_detection.safetensors', framework="pt") as f:
    model.load_state_dict({k: f.get_tensor(k) for k in f.keys()})

# One score per input file.
probs = model.predict_proba(['audio1.mp3', 'audio2.wav'])  # → tensor([0.88, 0.11])
```
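
To turn the scores into yes/no decisions, they can be compared against the reported threshold of `0.2442`. The sketch below assumes that the scores are probabilities of the "music" class and that a score at or above the threshold counts as music. The helper `label_music` is hypothetical and not part of the repository.

```python
# Decision threshold reported in this model card.
THRESHOLD = 0.2442


def label_music(paths, probs, threshold=THRESHOLD):
    """Map each path to True (music) or False (no music).

    Hypothetical helper. Assumes each value in `probs` is the probability of
    the "music" class and that values at or above the threshold mean music.
    """
    return {path: float(p) >= threshold for path, p in zip(paths, probs)}


# Hard-coded scores matching the usage example; with the real model, the
# output of `model.predict_proba(paths)` would be passed instead.
decisions = label_music(['audio1.mp3', 'audio2.wav'], [0.88, 0.11])
print(decisions)  # {'audio1.mp3': True, 'audio2.wav': False}
```

Raising the threshold reduces false music detections at the cost of missing more music, so a different value may suit data that differs from the authors' evaluation set.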

## Contact

- Email: kborodin.research@gmail.com
- Telegram: [@korallll_ai](https://t.me/korallll_ai)

## Citation

If you use this resource, please cite our INTERSPEECH 2026 paper:

```bibtex
@inproceedings{borodin2026balalaika,
  title     = {A Data-Centric Framework for Addressing Phonetic and Prosodic Challenges in Russian Speech Generative Models},
  author    = {Borodin, Kirill and Vasiliev, Nikita and Kudryavtsev, Vasiliy and Maslov, Maxim and Gorodnichev, Mikhail and Rogov, Oleg and Mkrtchian, Grach},
  booktitle = {Proc. INTERSPEECH 2026},
  year      = {2026},
  note      = {arXiv:2507.13563},
  url       = {https://arxiv.org/abs/2507.13563}
}
```
