Clear — on-device speech enhancement

48 kHz on-device speech enhancement, trained on real Detail team recordings and optimized for a range of microphones. It removes background noise and reverberation while leaving the voice warm and present, closer to a podcast studio than a phone call. Two premium-tier variants ship from this repo.

Variants

Variant	Character	When to use
`clear-studio`	Quiet, studio-like — silences near zero	Default. Works across the full range of input quality — phone audio, laptop mic, untreated rooms, USB / XLR podcast captures
`clear-natural`	Room tone, breath, lip texture preserved	Treated podcast studios, USB / XLR captures, voiceover where the original sound is intentional

If your source is already clean and you want the model to stay invisible, pick `clear-natural`. Otherwise, use `clear-studio`, the default.

Files

Both variants ship as Core ML (mlpackage and precompiled mlmodelc) and as ONNX. The two variants share the same architecture and the same realtime cost; only the weights differ.

Variant	File	Format	Download size
`clear-studio`	`clear-studio.mlpackage.zip`	Core ML mlpackage (fp16)	~3.8 MB
`clear-studio`	`clear-studio.mlmodelc.zip`	Core ML mlmodelc (fp16, precompiled)	~3.8 MB
`clear-studio`	`clear-studio.onnx`	ONNX (fp32)	~8.5 MB
`clear-natural`	`clear-natural.mlpackage.zip`	Core ML mlpackage (fp16)	~3.8 MB
`clear-natural`	`clear-natural.mlmodelc.zip`	Core ML mlmodelc (fp16, precompiled)	~3.8 MB
`clear-natural`	`clear-natural.onnx`	ONNX (fp32)	~8.5 MB
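
The file names in the table follow a `<variant>.<extension>` pattern, so a caller can derive them rather than hard-code six strings. The following is a minimal Python sketch based only on the table and the variant guidance above. `pick_variant`, `model_filename`, and the format keys are hypothetical names introduced for illustration; they are not part of this repo.

```python
# Extensions copied from the Files table; the dictionary keys are illustrative labels.
EXTENSIONS = {
    "mlpackage": "mlpackage.zip",
    "mlmodelc": "mlmodelc.zip",
    "onnx": "onnx",
}
VARIANTS = ("clear-studio", "clear-natural")


def pick_variant(source_is_clean: bool) -> str:
    """Follow the guidance above: clear-natural for clean sources, else the default."""
    return "clear-natural" if source_is_clean else "clear-studio"


def model_filename(variant: str, fmt: str) -> str:
    """Build a file name as listed in the Files table (hypothetical helper)."""
    if variant not in VARIANTS:
        raise ValueError(f"unknown variant: {variant!r}")
    if fmt not in EXTENSIONS:
        raise ValueError(f"unknown format: {fmt!r}")
    return f"{variant}.{EXTENSIONS[fmt]}"
```

For example, `model_filename(pick_variant(source_is_clean=False), "onnx")` yields the ONNX file for the default variant.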

Spec

Architecture: DeepFilterNet 3 (DFN3-half)
Sample rate: 48 kHz, mono or stereo (per-channel inference)
Inference contract: the model takes spec, feat_erb, and feat_spec and returns spec_enhanced. STFT, ERB feature extraction, and ISTFT run host-side, via vDSP (Swift) or pure Kotlin.
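
The split between host-side signal processing and the model can be sketched as plain orchestration code. This Python sketch shows only the shape of the contract: the tensor names come from the spec above, while `analyze`, `run_model`, `synthesize`, `enhance_channel`, and `enhance` are hypothetical callables and helpers, standing in for whatever STFT/ERB code and inference runtime the host app uses. Tensor shapes, frame sizes, and any recurrent state handling are not covered here.

```python
from typing import Callable, Dict, List, Sequence

Samples = List[float]
Tensors = Dict[str, object]

# Input names taken from the inference contract above.
MODEL_INPUTS = ("spec", "feat_erb", "feat_spec")


def enhance_channel(
    samples: Samples,
    analyze: Callable[[Samples], Tensors],    # host side: STFT + ERB features
    run_model: Callable[[Tensors], Tensors],  # Core ML or ONNX inference
    synthesize: Callable[[object], Samples],  # host side: ISTFT
) -> Samples:
    """Enhance one mono channel using the host/model split."""
    inputs = analyze(samples)
    missing = [name for name in MODEL_INPUTS if name not in inputs]
    if missing:
        raise ValueError(f"analysis step did not produce: {missing}")
    outputs = run_model({name: inputs[name] for name in MODEL_INPUTS})
    return synthesize(outputs["spec_enhanced"])


def enhance(
    channels: Sequence[Samples],
    analyze: Callable[[Samples], Tensors],
    run_model: Callable[[Tensors], Tensors],
    synthesize: Callable[[object], Samples],
) -> List[Samples]:
    """Per-channel inference: stereo is processed as two independent mono runs."""
    return [enhance_channel(ch, analyze, run_model, synthesize) for ch in channels]
```

Passing the three stages in as callables keeps the sketch independent of any particular runtime; a real integration would bind them to vDSP or Kotlin DSP code and to the chosen model format.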

Performance

Both variants share the architecture and run at the same speed. Enhancing a 5-minute clip on the Apple Neural Engine:

Device	Chip	Mono	Stereo
iPhone 15 Pro	A17 Pro	4.88 s (61× realtime)	6.53 s (46×)
iPhone 17 Pro	A19 Pro	3.70 s (81× realtime)	5.16 s (58×)

Cold model load is ~0.6 s; later loads are ~100 ms via the system ANE cache.
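
The realtime factors in the table are consistent with dividing the clip duration (5 minutes, or 300 s) by the measured processing time and rounding to the nearest integer. A short Python check follows; `realtime_factor` is a hypothetical helper name, and the timings are copied from the table.

```python
CLIP_SECONDS = 5 * 60  # the 5-minute benchmark clip

# Processing times in seconds, copied from the Performance table.
TIMINGS = {
    ("iPhone 15 Pro", "mono"): 4.88,
    ("iPhone 15 Pro", "stereo"): 6.53,
    ("iPhone 17 Pro", "mono"): 3.70,
    ("iPhone 17 Pro", "stereo"): 5.16,
}


def realtime_factor(clip_seconds: float, processing_seconds: float) -> float:
    """Return how many seconds of audio are enhanced per second of compute."""
    if processing_seconds <= 0:
        raise ValueError("processing_seconds must be positive")
    return clip_seconds / processing_seconds


for (device, layout), seconds in TIMINGS.items():
    factor = realtime_factor(CLIP_SECONDS, seconds)
    print(f"{device} {layout}: {round(factor)}x realtime")
```

The same function gives a rough processing-time estimate for other clip lengths by rearranging it: processing time is approximately the clip duration divided by the factor, plus the model load time.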

Used in

Detail — iOS and macOS video recording.
Subwave — publish audio and video stories.

Built on

DeepFilterNet 3 by Rikorose — MIT. Fine-tuned on Detail's speech corpus.

License

CC BY-NC 4.0. Free for research, evaluation, and personal use with attribution. Commercial use requires a separate license — contact paul@detail.co.

Made by Detail Technologies B.V.
