
Oh right, turns out I had it downloaded but hadn’t used it (the bird pack was too big to download over data).

The benefit of using the hardware-accelerated ML built into the phone is that it draws much less power. It’s designed for continuous use cases (“Hey Alexa” or “Hey Siri”), so you don’t have to turn the recording on and off and miss the bird call.

I don’t know if continuous monitoring can be used by third-party apps. But having it on all the time, with geolocation, would be amazing. You could set alerts, etc.

If you could use multiple phones to locate the bird in 3D space, that would be neat. Then you could tell people where to point their cameras. Maybe a standalone IoT monitoring device could be placed in forests to count each and every bird. This is the future.
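Locating a call in 3D from several phones would amount to multilateration from time differences of arrival (TDOA). A rough sketch of the idea, assuming synchronized clocks and entirely made-up phone positions; the grid search stands in for a proper least-squares solver:

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, roughly, in air

# Four phones at known 3D positions (metres) -- illustrative values.
phones = np.array([
    [0.0, 0.0, 0.0],
    [10.0, 0.0, 0.0],
    [0.0, 10.0, 0.0],
    [0.0, 0.0, 10.0],
])
true_source = np.array([3.0, 4.0, 2.0])  # the bird, for the simulation

# Simulate arrival-time differences relative to the first phone.
dists = np.linalg.norm(phones - true_source, axis=1)
tdoas = (dists - dists[0]) / SPEED_OF_SOUND

def residuals(candidate):
    """Mismatch between predicted and measured TDOAs for a candidate point."""
    d = np.linalg.norm(phones - candidate, axis=1)
    return (d - d[0]) / SPEED_OF_SOUND - tdoas

# Crude grid search over the volume; a real system would use an
# iterative least-squares solver instead.
grid = np.arange(0.0, 8.0, 0.25)
best, best_err = None, np.inf
for x in grid:
    for y in grid:
        for z in grid:
            err = np.sum(residuals(np.array([x, y, z])) ** 2)
            if err < best_err:
                best, best_err = np.array([x, y, z]), err

print(best)  # should land at/near the simulated source [3, 4, 2]
```

In practice the hard parts are clock synchronization between phones and picking the same call onset out of noisy audio on each device.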



/Maybe a standalone IoT monitoring device could be placed in forests to count each and every bird./

This is actually the 'real' research motivation behind the bird classification work: slap a microphone on the side of a tree, pick it up in a month, and get an accurate picture of which species have been in the area.

Birds are relatively easy to observe, thanks to their vocalizations, which makes them a useful indicator species. We have a good idea of what many species eat, so they end up telling you quite a lot about the surrounding ecosystem.

However, it turns out that the 'soundscape problem', where the microphone is just attached to a tree, is a bit more difficult than identifying only the foreground birds with a device the user can point in the relevant direction.

We've been encouraging further work on the soundscape problem by hosting the BirdCLEF and Kaggle competitions, and have been seeing steady progress. Improvements in the 'hard' soundscape problem have been driving improvements in the 'consumer' identification algorithms.

https://www.kaggle.com/c/birdclef-2021/overview

[source: I've been working with the BirdNet folks on and off for the last few years, and co-host the Kaggle competitions.]
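As a toy illustration of the unattended-recorder setting (this is not the BirdNET pipeline; the windowing, the energy-threshold "model", and the species label are all invented), one could chunk a long recording into fixed windows and aggregate per-window predictions:

```python
import numpy as np

fs = 22050        # sample rate (Hz), illustrative
window_s = 5.0    # analysis window length (s)

def toy_classifier(window):
    """Placeholder 'model': flags a window if any 2-8 kHz bin is loud.
    A real system would run a trained classifier here."""
    spectrum = np.abs(np.fft.rfft(window)) / window.size
    freqs = np.fft.rfftfreq(window.size, 1.0 / fs)
    peak = spectrum[(freqs > 2000) & (freqs < 8000)].max()
    return {"songbird-like"} if peak > 0.01 else set()

def survey(recording):
    """Slide over the recording window by window and union the detections."""
    hop = int(window_s * fs)
    species = set()
    for start in range(0, len(recording) - hop + 1, hop):
        species |= toy_classifier(recording[start:start + hop])
    return species

# Synthetic 15 s 'soundscape': 10 s of quiet noise, then a 4 kHz 'call'.
t = np.arange(int(15 * fs)) / fs
audio = 0.01 * np.random.default_rng(0).standard_normal(t.size)
audio[int(10 * fs):] += np.sin(2 * np.pi * 4000 * t[int(10 * fs):])
print(survey(audio))  # → {'songbird-like'}
```

The hard part the competitions target is exactly what this toy skips: overlapping calls, distant/faint vocalizations, and rejecting the rest of the soundscape (wind, insects, rain).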


/However, it turns out that the 'soundscape problem', where the microphone is just attached to a tree, is a bit more difficult than identifying only the foreground birds with a device the user can point in the relevant direction./

Yes, pointing a directional mic introduces a whole new set of mechanical challenges.

Maybe you could build an irregular grid of omnidirectional microphones and use signal processing to steer the beam digitally (similar to radio beamforming). You’d then need more processing horsepower for the FFTs and phase shifts, although if you assume the bird calls occupy only discrete frequencies, you might save some computation by processing just those bins.

Perhaps a machine learning model could be trained that does all of this for you. Then you get the benefit of hardware acceleration. Some ML chips can handle DSP tasks.


Yeah, one has to think of the 'microphone budget.' For microphone arrays, I think it's probably better overall (from the ecosystem-management angle) to cover a wider non-overlapping region than to get a more comprehensive picture of a single point...

The quality of the single-source classifier is the obvious scientific bottleneck, though; improve it, and everything else will work better. (We've also got plenty* of existing training data for this case.) So that's where we've been focusing most of the energy.

* - depending on species, of course. See also: xeno-canto.org


Saw a talk a couple years ago about similar techniques being used to identify marine mammals. https://research.redhat.com/wp-content/uploads/2020/02/RRQ-V...


Entirely off-topic, but I really like this way of quoting. It's surprisingly satisfying to see the quoted text bounded between a starting and an ending delimiter.



