Clear: on-device speech enhancement

Clear is a 48 kHz on-device speech enhancement model, trained on real Detail team recordings and optimized for a range of microphones. It removes background noise and reverberation and leaves the voice warm and present, closer to a podcast studio than a phone call. Two premium-tier variants ship from this repo.

Variants

| Variant | Character | When to use |
| --- | --- | --- |
| clear-studio | Quiet, studio-like; silences near zero | Default. Works across the full range of input quality: phone audio, laptop mic, untreated rooms, USB / XLR podcast captures |
| clear-natural | Room tone, breath, lip texture preserved | Treated podcast studios, USB / XLR captures, voiceover where the original sound is intentional |

If your source is already clean and you want the model to stay invisible, pick clear-natural. Otherwise, clear-studio is the default.

Files

Both variants ship in two formats: Core ML (as an mlpackage and as a precompiled mlmodelc) and ONNX. The architecture and realtime cost are the same; only the weights differ.

| Variant | File | Format | Download size |
| --- | --- | --- | --- |
| clear-studio | clear-studio.mlpackage.zip | Core ML mlpackage (fp16) | ~3.8 MB |
| clear-studio | clear-studio.mlmodelc.zip | Core ML mlmodelc (fp16, precompiled) | ~3.8 MB |
| clear-studio | clear-studio.onnx | ONNX (fp32) | ~8.5 MB |
| clear-natural | clear-natural.mlpackage.zip | Core ML mlpackage (fp16) | ~3.8 MB |
| clear-natural | clear-natural.mlmodelc.zip | Core ML mlmodelc (fp16, precompiled) | ~3.8 MB |
| clear-natural | clear-natural.onnx | ONNX (fp32) | ~8.5 MB |
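The variant rule and the file list above can be folded into a small lookup. This is a minimal sketch, not code from the repo: `pick_variant` and `artifact_name` are hypothetical helper names, and only the variant names and file suffixes are taken from the tables above.

```python
# Hypothetical helpers; only the variant names and file suffixes come from the tables above.
VARIANTS = ("clear-studio", "clear-natural")

# File suffix per packaging, as listed in the Files table.
SUFFIXES = {
    "mlpackage": ".mlpackage.zip",  # Core ML mlpackage (fp16)
    "mlmodelc": ".mlmodelc.zip",    # Core ML mlmodelc (fp16, precompiled)
    "onnx": ".onnx",                # ONNX (fp32)
}


def pick_variant(source_is_clean: bool) -> str:
    """Return clear-natural for already-clean sources, else the default clear-studio."""
    return "clear-natural" if source_is_clean else "clear-studio"


def artifact_name(variant: str, packaging: str) -> str:
    """Build the file name for a variant and packaging, rejecting unknown values."""
    if variant not in VARIANTS:
        raise ValueError(f"unknown variant: {variant!r}")
    if packaging not in SUFFIXES:
        raise ValueError(f"unknown packaging: {packaging!r}")
    return variant + SUFFIXES[packaging]
```

For example, `artifact_name(pick_variant(False), "onnx")` yields the default ONNX file, `clear-studio.onnx`.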

Spec

  • Architecture: DeepFilterNet 3 (DFN3-half)
  • Sample rate: 48 kHz, mono or stereo (per-channel inference)
  • Inference contract: spec / feat_erb / feat_spec → spec_enhanced. STFT, ERB, and ISTFT are done host-side via vDSP (Swift) or pure Kotlin.
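The data flow implied by the inference contract can be sketched as follows. This is an illustration only, written in Python rather than the Swift or Kotlin host code: every step is passed in as a hypothetical callable, because the card does not specify the frame size, hop, ERB band count, or how feat_spec is derived. Only the tensor names (spec, feat_erb, feat_spec, spec_enhanced) and the per-channel rule come from the spec above.

```python
from typing import Any, Callable, List, Sequence

Samples = Sequence[float]


def enhance_channel(
    samples: Samples,
    stft: Callable[[Samples], Any],
    erb_features: Callable[[Any], Any],
    spec_features: Callable[[Any], Any],
    model: Callable[[Any, Any, Any], Any],
    istft: Callable[[Any], Samples],
) -> Samples:
    """Run one channel through the host-side steps around the model call."""
    spec = stft(samples)             # host-side STFT
    feat_erb = erb_features(spec)    # host-side ERB features
    feat_spec = spec_features(spec)  # host-side feat_spec input (derivation assumed)
    spec_enhanced = model(spec, feat_erb, feat_spec)  # the only step run by the model
    return istft(spec_enhanced)      # host-side ISTFT back to samples


def enhance(channels: Sequence[Samples], **steps: Callable[..., Any]) -> List[Samples]:
    """Per-channel inference: mono is one run, stereo is two independent runs."""
    return [enhance_channel(channel, **steps) for channel in channels]
```

Keeping the transforms outside the model means the same exported weights can be driven by whichever host-side DSP code the platform provides.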

Performance

Both variants share the architecture and run at the same speed. Enhancing a 5-minute clip on the Apple Neural Engine:

| Device | Chip | Mono | Stereo |
| --- | --- | --- | --- |
| iPhone 15 Pro | A17 Pro | 4.88 s (61× realtime) | 6.53 s (46× realtime) |
| iPhone 17 Pro | A19 Pro | 3.70 s (81× realtime) | 5.16 s (58× realtime) |

Cold model load is ~0.6 s; later loads are ~100 ms via the system ANE cache.
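The realtime factors in the table are the clip duration divided by the processing time. A short sketch of that arithmetic, using only the timings reported above (the function name `realtime_factor` is illustrative):

```python
def realtime_factor(clip_seconds: float, processing_seconds: float) -> float:
    """How many seconds of audio are enhanced per second of processing."""
    return clip_seconds / processing_seconds


CLIP_SECONDS = 5 * 60  # the 5-minute benchmark clip

# Processing times in seconds, copied from the table above.
TIMINGS = {
    ("iPhone 15 Pro", "mono"): 4.88,
    ("iPhone 15 Pro", "stereo"): 6.53,
    ("iPhone 17 Pro", "mono"): 3.70,
    ("iPhone 17 Pro", "stereo"): 5.16,
}

for (device, layout), seconds in TIMINGS.items():
    factor = realtime_factor(CLIP_SECONDS, seconds)
    print(f"{device} {layout}: {factor:.0f}x realtime")
```

The reported load times are not included in these figures; on a short clip the ~0.6 s cold load would account for a noticeable share of the total.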

Used in

  • Detail: iOS and macOS video recording.
  • Subwave: publish audio and video stories.

Built on

  • DeepFilterNet 3 by Rikorose (MIT). Fine-tuned on Detail's speech corpus.

License

CC BY-NC 4.0. Free for research, evaluation, and personal use with attribution. Commercial use requires a separate license; contact paul@detail.co.

Made by Detail Technologies B.V.
