Clear: on-device speech enhancement
48 kHz on-device speech enhancement, trained on real Detail team recordings and optimized for a range of microphones. It removes background noise and reverberation and leaves the voice warm and present, closer to a podcast studio than a phone call. Two premium-tier variants ship from this repo.
Variants
| Variant | Character | When to use |
|---|---|---|
| clear-studio | Quiet, studio-like; silences near zero | Default. Works across the full range of input quality: phone audio, laptop mic, untreated rooms, USB / XLR podcast captures |
| clear-natural | Room tone, breath, lip texture preserved | Treated podcast studios, USB / XLR captures, voiceover where the original sound is intentional |
If your source is already clean and you want the model to stay
invisible, pick clear-natural. Otherwise, clear-studio is the
default.
Files
Both variants ship in two formats: Core ML (as an mlpackage and as a precompiled mlmodelc) and ONNX. The architecture and realtime cost are the same; only the weights differ.
| Variant | File | Format | Download size |
|---|---|---|---|
| clear-studio | clear-studio.mlpackage.zip | Core ML mlpackage (fp16) | ~3.8 MB |
| clear-studio | clear-studio.mlmodelc.zip | Core ML mlmodelc (fp16, precompiled) | ~3.8 MB |
| clear-studio | clear-studio.onnx | ONNX (fp32) | ~8.5 MB |
| clear-natural | clear-natural.mlpackage.zip | Core ML mlpackage (fp16) | ~3.8 MB |
| clear-natural | clear-natural.mlmodelc.zip | Core ML mlmodelc (fp16, precompiled) | ~3.8 MB |
| clear-natural | clear-natural.onnx | ONNX (fp32) | ~8.5 MB |
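As a rough illustration of how the variant guidance and the file table fit together, here is a small Python sketch. The function names (`choose_variant`, `model_filename`) and the format keys are hypothetical and not part of this repo; only the variant names and file names come from the tables above.

```python
# Hypothetical helpers: names and format keys are illustrative only.
VARIANTS = ("clear-studio", "clear-natural")

# File suffix per packaging, as listed in the Files table.
SUFFIXES = {
    "mlpackage": ".mlpackage.zip",  # Core ML mlpackage (fp16)
    "mlmodelc": ".mlmodelc.zip",    # Core ML mlmodelc (fp16, precompiled)
    "onnx": ".onnx",                # ONNX (fp32)
}


def choose_variant(source_is_clean: bool, keep_original_sound: bool) -> str:
    """Pick clear-natural only for clean sources that should stay untouched."""
    if source_is_clean and keep_original_sound:
        return "clear-natural"
    return "clear-studio"  # the documented default


def model_filename(variant: str, fmt: str) -> str:
    """Build the file name for a variant in a given packaging."""
    if variant not in VARIANTS:
        raise ValueError(f"unknown variant: {variant!r}")
    if fmt not in SUFFIXES:
        raise ValueError(f"unknown format: {fmt!r}")
    return variant + SUFFIXES[fmt]


if __name__ == "__main__":
    variant = choose_variant(source_is_clean=False, keep_original_sound=False)
    print(model_filename(variant, "onnx"))
```

An app would typically map its platform to a format key first (Core ML on Apple devices, ONNX elsewhere) and then resolve the file name.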
Spec
- Architecture: DeepFilterNet 3 (DFN3-half)
- Sample rate: 48 kHz, mono or stereo (per-channel inference)
- Inference contract: spec/feat_erb/feat_spec → spec_enhanced. STFT, ERB, and ISTFT are done host-side via vDSP (Swift) or pure Kotlin
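The per-channel stereo handling mentioned in the spec can be sketched as follows. This is a minimal Python illustration, not code from this repo: `deinterleave`, `interleave`, and `enhance_per_channel` are hypothetical names, and the `enhance_mono` callback stands in for the whole host-side chain (STFT, feature extraction, model call, ISTFT). It also assumes the input is interleaved PCM.

```python
from typing import Callable, List, Sequence

Channel = List[float]


def deinterleave(samples: Sequence[float], channels: int) -> List[Channel]:
    """Split interleaved PCM (L R L R ...) into one list per channel."""
    if channels < 1 or len(samples) % channels != 0:
        raise ValueError("sample count must be a multiple of the channel count")
    return [list(samples[c::channels]) for c in range(channels)]


def interleave(chans: Sequence[Channel]) -> List[float]:
    """Merge equal-length channels back into one interleaved stream."""
    if len({len(c) for c in chans}) != 1:
        raise ValueError("all channels must have the same length")
    return [sample for frame in zip(*chans) for sample in frame]


def enhance_per_channel(
    samples: Sequence[float],
    channels: int,
    enhance_mono: Callable[[Channel], Channel],
) -> List[float]:
    """Run the mono enhancer once per channel, then re-interleave."""
    return interleave([enhance_mono(ch) for ch in deinterleave(samples, channels)])


if __name__ == "__main__":
    # Placeholder enhancer: halves the gain instead of running a real model.
    halve = lambda ch: [0.5 * x for x in ch]
    print(enhance_per_channel([1.0, 2.0, 3.0, 4.0], 2, halve))
```

Each channel is processed independently, so mono input is simply the one-channel case of the same loop.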
Performance
Both variants share the architecture and run at the same speed. Enhancing a 5-minute clip on the Apple Neural Engine:
| Device | Chip | Mono | Stereo |
|---|---|---|---|
| iPhone 15 Pro | A17 Pro | 4.88 s (61× realtime) | 6.53 s (46×) |
| iPhone 17 Pro | A19 Pro | 3.70 s (81× realtime) | 5.16 s (58×) |
Cold model load is ~0.6 s; later loads are ~100 ms via the system ANE cache.
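The realtime factors in the table are the clip duration divided by the processing time. A short Python sketch of that arithmetic, using only the timings from the table (the function and variable names are illustrative):

```python
CLIP_SECONDS = 5 * 60  # the 5-minute benchmark clip


def realtime_factor(clip_seconds: float, processing_seconds: float) -> float:
    """How many seconds of audio are enhanced per second of compute."""
    if processing_seconds <= 0:
        raise ValueError("processing time must be positive")
    return clip_seconds / processing_seconds


# Processing times in seconds, copied from the table above.
TIMINGS = {
    ("iPhone 15 Pro", "mono"): 4.88,
    ("iPhone 15 Pro", "stereo"): 6.53,
    ("iPhone 17 Pro", "mono"): 3.70,
    ("iPhone 17 Pro", "stereo"): 5.16,
}

if __name__ == "__main__":
    for (device, layout), seconds in TIMINGS.items():
        factor = realtime_factor(CLIP_SECONDS, seconds)
        print(f"{device} {layout}: {round(factor)}x realtime")
```

The same function can be used to estimate processing time for other clip lengths by dividing the clip duration by the factor, assuming throughput stays roughly constant.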
Used in
Built on
- DeepFilterNet 3 by Rikorose (MIT). Fine-tuned on Detail's speech corpus.
License
CC BY-NC 4.0. Free for research, evaluation, and personal use with attribution.
Commercial use requires a separate license; contact paul@detail.co.
Made by Detail Technologies B.V.