
You can build the next generation bird app. This one requires uploading to the cloud to recognize bird calls.

What if you used the AI hardware in the phone to do the audio recognition?

For efficiency you could even use geolocation and figure out which species are found in a location and download a model just for those. Anything not matched could be uploaded as before.
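
For example, the model selection could be as simple as intersecting the phone's coordinates with coarse species range data. A toy sketch (the range circles, species list, and on-device model handling below are purely illustrative, not any real dataset or API):

    from math import radians, sin, cos, asin, sqrt

    # Toy stand-in for real range maps: (species, lat, lon, radius_km)
    SPECIES_RANGES = [
        ("Northern Cardinal", 40.0, -75.0, 2000),
        ("Common Raven",      45.0, -110.0, 2500),
        ("European Robin",    50.0, 10.0, 2000),
    ]

    def haversine_km(lat1, lon1, lat2, lon2):
        """Great-circle distance between two points, in kilometres."""
        lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
        a = sin((lat2 - lat1) / 2) ** 2 \
            + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
        return 2 * 6371 * asin(sqrt(a))

    def local_species(lat, lon):
        """Species whose (toy) range circle contains the given location."""
        return [name for name, slat, slon, r in SPECIES_RANGES
                if haversine_km(lat, lon, slat, slon) <= r]

    # The app would then fetch a small model trained only on these species
    # and fall back to a cloud upload for anything that model rejects.
    print(local_species(42.4, -76.5))   # e.g. Ithaca, NY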



I work at the Cornell Lab, and we also have an app that is more consumer-oriented, and which DOESN'T require uploading the recordings to the cloud -- see Merlin Bird ID app https://merlin.allaboutbirds.org/

When the Lab’s researchers conceived of BirdNET, there were no reliable bird sound identification tools. BirdNET was built as a rapid prototype, engaging computer science students to build an app for users to test the machine learning algorithms. BirdNET proved to be a research breakthrough, and by 2020 it was performing with far better accuracy than five other apps tested.

That success opened the way to apply computer vision to sound identification in the Lab’s outreach and education app, Merlin.

Merlin offers OFFLINE functionality, and multiple ways to help identify birds, including through a user describing the bird, taking a photo of the bird, and now recording a bird song or call. Merlin Bird ID is integrated with the Lab’s systems and resources, including updated taxonomy, bird information from eBird and Birds of the World, rich media from the Macaulay Library, life list building tools integrated with eBird, and more.
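
To give a flavour of what "computer vision on sound" means in practice: the usual pattern is to render a short audio clip as a (mel) spectrogram and classify it like an image. A minimal sketch of that pattern only -- NOT Merlin's or BirdNET's actual architecture; the sample rate, shapes, and class count are placeholders:

    import torch
    import torch.nn as nn
    import torchaudio

    N_SPECIES = 10   # placeholder class count

    # Audio -> mel spectrogram "image"
    melspec = torchaudio.transforms.MelSpectrogram(
        sample_rate=32000, n_fft=1024, hop_length=320, n_mels=64)

    # A deliberately tiny CNN over that image
    classifier = nn.Sequential(
        nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
        nn.MaxPool2d(2),
        nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
        nn.AdaptiveAvgPool2d(1),   # global average pool over time/frequency
        nn.Flatten(),
        nn.Linear(32, N_SPECIES),
    )

    waveform = torch.randn(1, 3 * 32000)    # 3 s of fake audio
    spec = melspec(waveform).unsqueeze(0)   # (batch, channel, mels, frames)
    logits = classifier(torch.log1p(spec))  # log-compress, then classify
    print(logits.shape)                     # torch.Size([1, 10])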


Thanks for your work! I'm installing right now but won't be able to try it out for a while. How far away are we from distinguishing between different calls from the same species?

Some birds in my locale mostly repeat themselves, but some seem to have 'vocabularies' of 3-5 different calls, and you can hear pitch and timing inflections within those - it might just be random variation, but combined with the different calls it could yield 30-60 'words'. Sometimes I've been sitting under a tree and heard what seemed to start out as a conversation that degenerated into an argument followed by a physical fight.

Even crows seem to have distinct patterns/variations in their cawing, and given what we know about their tool-using abilities I'm curious to know how they use their voices. I've seen remarkable behaviors like a group of crows harassing a falcon to interfere with its pursuit of a smaller songbird.


Oh right, turns out I had it downloaded but hadn’t used it (the bird pack was too big to download over data).

The benefit of using the hardware-accelerated ML built into the phone is that it’s much lower power. It’s designed for continuous use cases (“Hey Alexa” or “Hey Siri”), so you don’t have to turn the recording on and off and miss the bird call.

I don’t know if continuous monitoring can be used by third-party apps. But having it on all the time, with geolocation, would be amazing. You could set alerts etc.

If you could use multiple phones to locate the bird in 3-space that would be neat. Then you could tell people where to point their cameras. Maybe a standalone IoT monitoring device could be placed in forests to count each and every bird. This is the future.
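
For the 3-space idea, the core math is time-difference-of-arrival (TDOA) multilateration: if each phone knows its own position and the relative arrival time of the same call, a small least-squares solve recovers the source. A toy sketch (the positions, timings, and clock synchronization are all idealized here):

    import numpy as np
    from scipy.optimize import least_squares

    SPEED_OF_SOUND = 343.0   # m/s, roughly, at 20 C

    # Known phone positions in metres (x, y, z); at least 4 for a 3-D fix.
    mics = np.array([[0, 0, 0], [10, 0, 0], [0, 10, 0],
                     [0, 0, 10], [10, 10, 2]], dtype=float)

    # Fake a measurement: arrival-time differences relative to the first phone.
    true_source = np.array([4.0, 7.0, 5.0])
    arrival = np.linalg.norm(mics - true_source, axis=1) / SPEED_OF_SOUND
    tdoa = arrival - arrival[0]

    def residuals(p):
        """Predicted minus measured TDOAs for a candidate source position p."""
        dist = np.linalg.norm(mics - p, axis=1)
        return (dist - dist[0]) / SPEED_OF_SOUND - tdoa

    fit = least_squares(residuals, x0=np.array([1.0, 1.0, 1.0]))
    print(fit.x)   # should land near [4, 7, 5]

The hard part in practice would be sample-accurate time sync between the phones, not the solve itself.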


/Maybe a standalone IoT monitoring device could be placed in forests to count each and every bird./

This is actually the 'real' research motivation behind the bird classification work: Slap a microphone to the side of a tree, pick it up in a month, and get some accurate picture of what species have been in the area.

Birds are relatively easy to observe, thanks to their vocalizations, which makes them a good indicator species. We have a good idea of what many species eat, so they end up telling you quite a lot about the surrounding ecosystem.

However, it turns out that the 'soundscape problem' where the microphone is just attached to a tree is a bit more difficult than identifying foreground birds only, using a device that can be pointed in the relevant direction by the user.

We've been encouraging further work on the soundscape problem by hosting the BirdCLEF and Kaggle competitions, and have been seeing steady progress. Improvements in the 'hard' soundscape problem have been driving improvements in the 'consumer' identification algorithms.

https://www.kaggle.com/c/birdclef-2021/overview

[source: I've been working with the BirdNet folks on and off for the last few years, and co-host the Kaggle competitions.]


/“However, it turns out that the 'soundscape problem' where the microphone is just attached to a tree is a bit more difficult than identifying foreground birds only, using a device that can be pointed in the relevant direction by the user.”/

Yes, pointing a directional mic introduces a whole new set of mechanical challenges.

Maybe you could build an irregular grid of omnidirectional microphones and use signal processing to direct the beam digitally (similar to radio beam-forming). Now you’ll need more processing horsepower for the FFTs and phase shifts, although if you assume the bird calls only occupy discrete frequencies you might be able to save some computation by just computing those.

Perhaps a machine learning model could be trained that does all of this for you. Then you get the benefit of hardware acceleration. Some ML chips can handle DSP tasks.
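
A rough sketch of the digital-pointing idea: frequency-domain delay-and-sum beamforming, where each channel is FFT'd, phase-shifted by its steering delay, and summed. The array geometry, sample rate, and plane-wave assumption here are all simplifications I'm making up:

    import numpy as np

    SPEED_OF_SOUND = 343.0   # m/s
    FS = 32000               # sample rate, Hz

    def steer_delays(mic_positions, azimuth_deg):
        """Per-mic delays (s) for a plane wave from the given azimuth (2-D array)."""
        az = np.deg2rad(azimuth_deg)
        direction = np.array([np.cos(az), np.sin(az)])
        return mic_positions @ direction / SPEED_OF_SOUND

    def delay_and_sum(signals, mic_positions, azimuth_deg):
        """signals: (n_mics, n_samples). Returns the beamformed mono signal."""
        n_mics, n_samples = signals.shape
        freqs = np.fft.rfftfreq(n_samples, d=1.0 / FS)
        delays = steer_delays(mic_positions, azimuth_deg)
        spectra = np.fft.rfft(signals, axis=1)
        # Advance each channel by its steering delay (multiply by exp(+j*2*pi*f*tau))
        # so signals arriving from the chosen direction add coherently.
        phased = spectra * np.exp(2j * np.pi * freqs[None, :] * delays[:, None])
        return np.fft.irfft(phased.sum(axis=0), n=n_samples) / n_mics

    # Toy usage: 4 mics on an irregular grid (metres), 1 s of fake recordings.
    mics = np.array([[0.0, 0.0], [0.3, 0.05], [0.1, 0.35], [0.42, 0.4]])
    recordings = np.random.randn(4, FS)
    beam = delay_and_sum(recordings, mics, azimuth_deg=60)
    print(beam.shape)   # (32000,)

Restricting the sum to the handful of frequency bins where the target species actually calls is exactly the shortcut mentioned above.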


Yeah, one has to think of the 'microphone budget.' For microphone arrays, I think it's probably better overall (from the ecosystem management angle) to cover a wider non-overlapping region than to get a more comprehensive picture of a single point...

The quality of the single-source classifier is the obvious scientific bottleneck, though; improve it, and everything else will work better. (We've also got plenty* of existing training data for this case.) So that's where we've been focusing most of the energy.

* - depending on species, of course. See also: xeno-canto.org


Saw a talk a couple years ago about similar techniques being used to identify marine mammals. https://research.redhat.com/wp-content/uploads/2020/02/RRQ-V...


Entirely off-topic, but I really like this way of quoting. It's surprisingly satisfying to see the quoted text bounded between a starting and an ending delimiter.


Love the app, thanks for your work. But can you comment on why the Facebook SDK is part of it? (I've seen a request in a proxy on the sign-in screen to graph.facebook.com.)


Thank you for all your work on Merlin. I heavily use that app when I go backpacking, and in fact just last night it helped me identify a red crossbill in Deschutes National Forest.

It would be amazing if BirdNET eventually supported offline use.


The offline version of this (Merlin) worked great in the Boundary Waters in June. Detected visually-verified grouse, vireos, woodpeckers, trumpeter swan and bald eagles on Nina Moose and Agnes lakes.


Any chance of making the app available via F-Droid or some other non-siloed vector?


For ID by song, does Merlin sacrifice any accuracy by being offline?


I asked a veterinarian friend who specializes in birds (and is a long-time bird watcher), and she knew the app and said that it works pretty well (coming from her, that's definitely high praise).

The only thing she'd improve was exactly what you mentioned: offline recognition! (and also keeping the bird sound recordings to export them later)


See above thread for info on Merlin with offline sound ID capabilities...


Yep, that's great news. Thanks!


Oooh, you could maybe geolocate users who don't give you location permissions by listening to the birds around them.



