Instructions to use TheVortexProject/insectnet with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Scikit-learn
How to use TheVortexProject/insectnet with Scikit-learn:
from huggingface_hub import hf_hub_download import joblib model = joblib.load( hf_hub_download("TheVortexProject/insectnet", "sklearn_model.joblib") ) # only load pickle files from sources you trust # read more about it here https://skops.readthedocs.io/en/stable/persistence.html - Notebooks
- Google Colab
- Kaggle
Confidence Thresholds
Per-class confidence guidance for interpreting InsectNet predictions. Thresholds are class-specific — a universal cutoff produces both false positives and false negatives.
Current Guidance
| Class | Recommended Threshold | Status | Notes |
|---|---|---|---|
| cicada_drone | 0.50 | Confirmed | Natural capture at 83%, AC false positive at 92% — use RMS to disambiguate |
| frog | 0.50 | Confirmed | Validated at 51%; chorus peaks at 80-99%. RMS >0.004 increases confidence |
| cricket_katydid | 0.50 | Tentative | Summer chorus data needed for natural threshold |
| grasshopper | 0.80 | Data-limited | Only 183 training clips; most captures likely noise |
| bee | 0.80 | Data-limited | Only 43 training clips; night detections are false positives |
How Thresholds Were Determined
Cicada (83% natural confirmed)
The first natural cicada capture scored 83% with RMS 0.009. Playback tests hit 99-100%. However, an AC window unit also scores 92% on this class with background <2%. The 80%+ range includes both real cicadas and false positives.
Disambiguation: RMS. Real cicada at 83% had RMS 0.009. AC at 92% had RMS 0.02+. A high-confidence cicada detection with high RMS may be mechanical.
Frog (51% natural confirmed)
A frog detected at only 51% was confirmed real by the user. The frog chorus ranged from 55% (early evening, quiet) to 99.97% (peak chorus, loud).
Guidance: Any capture above 50% frog with RMS >0.004 is worth review. Loud captures (RMS >0.015) at 80-99% are highly likely real.
Bee (no natural data)
The bee class has no known real detections. All captures to date are false positives: weed whacker at 98%, night ambient noise at 50-70%. The true bee threshold cannot be established without natural captures.
Guidance: Discard bee detections at night. Treat day detections below 80% as noise.
Production vs Testing Threshold
| Context | Threshold | Rationale |
|---|---|---|
| Production capture | 0.30 default | Favor recall — a few false positives are cheaper than missed detections |
| Automated decision | 0.80 minimum | Only act on high-confidence predictions without human review |
| Research / scanning | 0.30 | Cast a wide net; review all uncertain captures manually |
The 0.30 default is conservative for capture. It produces some uncertain clips but the cost of missed insects is higher than the cost of reviewing a few extra WAVs.
RMS Noise Floor
The sidecar skips inference for WAVs below the RMS noise floor (default 0.002). This was calibrated from Pine Hollow ambient measurements and corresponds to quiet background with no detectable acoustic activity.
Playback sessions may need a lower floor (0.001) for quiet phone speakers.