
Oh right, turns out I had it downloaded but hadn’t used it (the bird pack was too big to download over data).

The benefit of using the hardware-accelerated ML built into the phone is that it draws much less power. It’s designed for continuous use cases (“Hey Alexa” or “Hey Siri”), so you don’t have to turn the recording on and off and miss the bird call.

I don’t know if continuous monitoring can be used by third-party apps. But having it on all the time, with geolocation, would be amazing. You could set alerts, etc.

If you could use multiple phones to locate the bird in 3D space, that would be neat. Then you could tell people where to point their cameras. Maybe a standalone IoT monitoring device could be placed in forests to count each and every bird. This is the future.
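Locating a call in 3D from several phones would amount to multilateration from time differences of arrival (TDOA). A rough sketch of the idea, assuming synchronized clocks and entirely made-up phone positions; the grid search stands in for a proper least-squares solver:

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, roughly, in air

# Four phones at known 3D positions (metres) -- illustrative values.
phones = np.array([
    [0.0, 0.0, 0.0],
    [10.0, 0.0, 0.0],
    [0.0, 10.0, 0.0],
    [0.0, 0.0, 10.0],
])
true_source = np.array([3.0, 4.0, 2.0])  # the bird, for the simulation

# Simulate arrival-time differences relative to the first phone.
dists = np.linalg.norm(phones - true_source, axis=1)
tdoas = (dists - dists[0]) / SPEED_OF_SOUND

def residuals(candidate):
    """Mismatch between predicted and measured TDOAs for a candidate point."""
    d = np.linalg.norm(phones - candidate, axis=1)
    return (d - d[0]) / SPEED_OF_SOUND - tdoas

# Crude grid search over the volume; a real system would use an
# iterative least-squares solver instead.
grid = np.arange(0.0, 8.0, 0.25)
best, best_err = None, np.inf
for x in grid:
    for y in grid:
        for z in grid:
            err = np.sum(residuals(np.array([x, y, z])) ** 2)
            if err < best_err:
                best, best_err = np.array([x, y, z]), err

print(best)  # should land at/near the simulated source [3, 4, 2]
```

In practice the hard parts are clock synchronization between phones and picking the same call onset out of noisy audio on each device.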



/Maybe a standalone IoT monitoring device could be placed in forests to count each and every bird./

This is actually the 'real' research motivation behind the bird classification work: slap a microphone on the side of a tree, pick it up in a month, and get an accurate picture of which species have been in the area.

Birds are relatively easy to observe, thanks to their vocalizations, which makes them a useful indicator species. We have a good idea of what many species eat, so they end up telling you quite a lot about the surrounding ecosystem.

However, it turns out that the 'soundscape problem', where the microphone is just attached to a tree, is a bit more difficult than identifying only the foreground birds with a device the user can point in the relevant direction.

We've been encouraging further work on the soundscape problem by hosting the BirdCLEF and Kaggle competitions, and have been seeing steady progress. Improvements in the 'hard' soundscape problem have been driving improvements in the 'consumer' identification algorithms.

https://www.kaggle.com/c/birdclef-2021/overview

[source: I've been working with the BirdNet folks on and off for the last few years, and co-host the Kaggle competitions.]
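As a toy illustration of the unattended-recorder setting (this is not the BirdNET pipeline; the windowing, the energy-threshold "model", and the species label are all invented), one could chunk a long recording into fixed windows and aggregate per-window predictions:

```python
import numpy as np

fs = 22050        # sample rate (Hz), illustrative
window_s = 5.0    # analysis window length (s)

def toy_classifier(window):
    """Placeholder 'model': flags a window if any 2-8 kHz bin is loud.
    A real system would run a trained classifier here."""
    spectrum = np.abs(np.fft.rfft(window)) / window.size
    freqs = np.fft.rfftfreq(window.size, 1.0 / fs)
    peak = spectrum[(freqs > 2000) & (freqs < 8000)].max()
    return {"songbird-like"} if peak > 0.01 else set()

def survey(recording):
    """Slide over the recording window by window and union the detections."""
    hop = int(window_s * fs)
    species = set()
    for start in range(0, len(recording) - hop + 1, hop):
        species |= toy_classifier(recording[start:start + hop])
    return species

# Synthetic 15 s 'soundscape': 10 s of quiet noise, then a 4 kHz 'call'.
t = np.arange(int(15 * fs)) / fs
audio = 0.01 * np.random.default_rng(0).standard_normal(t.size)
audio[int(10 * fs):] += np.sin(2 * np.pi * 4000 * t[int(10 * fs):])
print(survey(audio))  # → {'songbird-like'}
```

The hard part the competitions target is exactly what this toy skips: overlapping calls, distant/faint vocalizations, and rejecting the rest of the soundscape (wind, insects, rain).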


/However, it turns out that the 'soundscape problem', where the microphone is just attached to a tree, is a bit more difficult than identifying only the foreground birds with a device the user can point in the relevant direction./

Yes, pointing a directional mic introduces a whole new set of mechanical challenges.

Maybe you could build an irregular grid of omnidirectional microphones and use signal processing to steer the beam digitally (similar to radio beamforming). You’d then need more processing horsepower for the FFTs and phase shifts, although if you assume the bird calls occupy only discrete frequencies, you might save some computation by processing just those bins.

Perhaps a machine learning model could be trained that does all of this for you. Then you get the benefit of hardware acceleration. Some ML chips can handle DSP tasks.


Yeah, one has to think of the 'microphone budget.' For microphone arrays, I think it's probably better overall (from the ecosystem-management angle) to cover a wider non-overlapping region than to get a more comprehensive picture of a single point...

The quality of the single-source classifier is the obvious scientific bottleneck, though; improve it, and everything else will work better. (We've also got plenty* of existing training data for this case.) So that's where we've been focusing most of the energy.

* - depending on species, of course. See also: xeno-canto.org


Saw a talk a couple years ago about similar techniques being used to identify marine mammals. https://research.redhat.com/wp-content/uploads/2020/02/RRQ-V...


Entirely off-topic, but I really like this way of quoting. It's surprisingly satisfying to see the quoted text bounded between a starting and an ending delimiter.



