I had this idea nearly 10 years ago when my wife and I started dating and she introduced me to (very amateur) birding. I even talked about contacting Cornell for their database. But it was always going to be a hobby project.
So when she read about this a few weeks ago, she literally smacked me for not building it. Now if I tell her it's on the front page of Hacker News AND everyone here loves the idea even more, I'm going to get another, harder smack because she knows how HN is full of others like me!
(yes I'm aware that this community more than others will agree that ideas by themselves are a dime a dozen, but nonetheless, it would've been a really fun project)
I had this idea as well, but as you start to build the app you quickly realize that each bird doesn't just have one sound, but many sounds and trying to do this accurately takes much more effort than you're probably expecting.
Download any of the existing bird apps that help you recognize birds by their sound and you'll see that each bird often has 3-4 distinct sounds, each of them different, and these are just partial examples.
You'd have to be extremely dedicated (and very good at machine learning) to see this idea through to completion.
For the 400+ sounds that Merlin (the user-friendly, offline-functional sister app to BirdNET) can now identify with 80-90% accuracy, it took a team of dozens of bird sound ID experts several years to annotate tens of thousands of individual audio spectrograms. On average, they needed about 1,000 recordings per species. Here is a bit more about how they did it: https://www.macaulaylibrary.org/2021/06/22/behind-the-scenes...
True, but Shazam is a testament to what's achievable even without audio source separation. Even going back about 10 years when I first tried it, I was kind of stupefied that it managed to work in a noisy department store.
But in some sense Shazam has it very easy - it is working from a discrete set of specific audio recordings, not, say, the set of artists, or the set of songs. Shazam is useless for live performances, or covers (by different band) of songs (unless those specific recordings are also in Shazam's database). Birdnet is tackling the problem of trying to get some essential properties of one species' song (and each bird is a new live performance, by a different cover artist).
There used to be an app called Midomi (I think?) that could identify songs by humming or singing them, which was cool, but then I vaguely remember it rebranding and becoming less useful. Does anyone else know of a song-recognition app that is more like BirdNet and less like Shazam?
Google Assistant does this. You can hum or sing a song and it will recognize it with decent accuracy. Though humming Camptown Races led me to this gem... https://www.youtube.com/watch?v=QFFykQIrPCk
As I understand it there are even sub-dialects within each of these bird calls/songs. The example shared with me compared it to English in the U.S., where you would expect to hear stark differences between the southern states and the New England area with a greeting like "hey y'all".
Unfortunately/fortunately I can't get the visual out of my head of a southern-speaking crow looking for trash near my house now...
Plus, so many birds will improvise their songs, pulling parts of other birds' songs into their songs. And many will even outright imitate other birds (not only parrots, but corvids and some others). Many amateur birders like me often cannot tell imitations from the real ones.
A couple of years ago, my wife had been practising a particular piece of music on the violin for some time in preparation for an exam; after a while, she noticed that the blackbird in our garden began using the opening few bars as part of its song.
I tested the app on one of our more interesting mimics (tooth-billed bowerbird, endemic to the area)... without great results. It is however very successful in identifying much of the morning cacophony here. This is in far north Queensland, Australia, where we have a fairly decent array of resident songbirds.
This is actually a pretty difficult problem. I have been trying to learn bird songs for a very long time and it is still hard. Saying that a bird has 3-4 sounds is an extreme simplification.
> I had this idea as well, but as you start to build the app you quickly realize that each bird doesn't just have one sound, but many sounds and trying to do this accurately takes much more effort than you're probably expecting.
Effort? Not really so much (stand on the shoulders of giants, etc.)
You can build the next generation bird app. This one requires uploading to the cloud to recognize bird calls.
What if you used the AI hardware in the phone to do the audio recognition?
For efficiency you could even use geolocation and figure out which species are found in a location and download a model just for those. Anything not matched could be uploaded as before.
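Something like this toy sketch, maybe (the per-region packs, coordinates, and filenames here are all made up for illustration, not how BirdNET or Merlin actually organize their data):

    # Sketch of picking a regional model pack by location (hypothetical registry).
    from math import radians, sin, cos, asin, sqrt

    # Hypothetical packs: (name, centre latitude, centre longitude, model file)
    REGION_PACKS = [
        ("us-northeast", 42.0, -74.0,  "birds_us_northeast.tflite"),
        ("us-pacific",   45.0, -122.0, "birds_us_pacific.tflite"),
        ("europe-west",  50.0, 8.0,    "birds_europe_west.tflite"),
    ]

    def haversine_km(lat1, lon1, lat2, lon2):
        """Great-circle distance between two points in kilometres."""
        lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
        a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
        return 2 * 6371.0 * asin(sqrt(a))

    def pick_pack(lat, lon):
        """Return the pack whose centre is closest to the user's location."""
        return min(REGION_PACKS, key=lambda p: haversine_km(lat, lon, p[1], p[2]))

    print(pick_pack(40.7, -74.0))  # -> the 'us-northeast' pack in this toy registry

Anything with a low score against the local pack could then fall back to the cloud model as before.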
I work at the Cornell Lab, and we also have an app that is more consumer-oriented, and which DOESN'T require uploading the recordings to the cloud -- see Merlin Bird ID app https://merlin.allaboutbirds.org/
When the Lab’s researchers conceived of BirdNET, there were no reliable bird sound identification tools. BirdNET was built as a rapid prototype, engaging computer science students to build an app so that users could test the machine learning algorithms. BirdNET proved to be a research breakthrough and by 2020 was performing with far better accuracy than five other apps tested.
That success opened the way to apply computer vision to sound identification in the Lab’s outreach and education app, Merlin.
Merlin offers OFFLINE functionality, and multiple ways to help identify birds, including through a user describing the bird, taking a photo of the bird, and now recording a bird song or call. Merlin Bird ID is integrated with the Lab’s systems and resources, including updated taxonomy, bird information from eBird and Birds of the World, rich media from the Macaulay Library, life list building tools integrated with eBird, and more.
Thanks for your work! I'm installing right now but won't be able to try it out for a while. How far away are we from distinguishing between different calls from the same species?
Some birds in my locale mostly repeat themselves, but some seem to have 'vocabularies' of 3-5 different calls, and you can hear pitch and timing inflections within those - it might be just random variation, but in combination with the different calls it might yield 30-60 'words'. Sometimes I've been sitting under a tree and heard what seemed to start out as a conversation that degenerated into an argument followed by a physical fight.
Even crows seem to have distinct patterns/variations in their cawing, and given what we know about their tool-using abilities I'm curious to know how they use their voices. I've seen remarkable behaviors like a group of crows harassing a falcon to interfere with its pursuit of a smaller songbird.
Oh right, turns out I had it downloaded but hadn’t used it (the bird pack was too big to download over data).
The benefit of using hardware-accelerated ML built into the phone is that it’s much lower power. It’s designed for continuous use cases (“Hey Alexa” or “Hey Siri”). So you don’t have to turn the recording on and off and miss the bird call.
I don’t know if continuous monitoring can be used by third party apps. But having it on all the time, with geolocation would be amazing. You could set alerts etc.
If you could use multiple phones to locate the bird in 3-space that would be neat. Then you could tell people where to point their cameras. Maybe a standalone IoT monitoring device could be placed in forests to count each and every bird. This is the future.
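Even the localization part isn't crazy: with synchronized recordings you can estimate the pairwise time difference of arrival by cross-correlation. A toy sketch (real multilateration would need three or more phones, decent clock sync, and a geometry solver on top of this):

    import numpy as np

    def tdoa_seconds(sig_a, sig_b, sample_rate):
        """Seconds by which sig_b lags sig_a (positive: the call reached B later)."""
        corr = np.correlate(sig_b, sig_a, mode="full")
        lag = np.argmax(corr) - (len(sig_a) - 1)
        return lag / sample_rate

    # Toy example: the same chirp reaches mic B 80 samples (~10 ms at 8 kHz) later.
    sr = 8000
    t = np.arange(int(0.2 * sr)) / sr
    chirp = np.sin(2 * np.pi * (1000 + 2000 * t) * t)
    a = np.pad(chirp, (0, 80))
    b = np.pad(chirp, (80, 0))
    print(tdoa_seconds(a, b, sr))  # ~ 0.01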
/Maybe a standalone IoT monitoring device could be placed in forests to count each and every bird./
This is actually the 'real' research motivation behind the bird classification work: Slap a microphone to the side of a tree, pick it up in a month, and get some accurate picture of what species have been in the area.
Birds are relatively easy to observe, thanks to their vocalizations, which makes them an indicator species. We have a good idea what many species eat, so they end up telling you quite a lot about the surrounding ecosystem.
However, it turns out that the 'soundscape problem' where the microphone is just attached to a tree is a bit more difficult than identifying foreground birds only, using a device that can be pointed in the relevant direction by the user.
We've been encouraging further work on the soundscape problem by hosting the BirdCLEF and Kaggle competitions, and have been seeing steady progress. Improvements in the 'hard' soundscape problem have been driving improvements in the 'consumer' identification algorithms.
/However, it turns out that the 'soundscape problem' where the microphone is just attached to a tree is a bit more difficult than identifying foreground birds only, using a device that can be pointed in the relevant direction by the user./
Yes pointing a directional mic introduces a whole new set of mechanical challenges.
Maybe you could build an irregular grid of omnidirectional microphones and use signal processing to direct the beam digitally (similar to radio beam-forming). Now you’ll need more processing horsepower to compute FFTs for the phase shifts. Although if you assume the bird calls only occupy discrete frequencies, you might be able to save some computation by computing just those.
Perhaps a machine learning model could be trained that does all of this for you. Then you get the benefit of hardware acceleration. Some ML chips can handle DSP tasks.
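A bare-bones version of the delay-and-sum idea, assuming a known 2D array geometry and far-field sources (a real system would also need windowing, overlap-add, and calibration):

    import numpy as np

    SPEED_OF_SOUND = 343.0  # m/s

    def steer(frames, mic_xy, sample_rate, azimuth_rad):
        """Delay-and-sum the per-mic frames toward a far-field source at azimuth_rad.

        frames : (n_mics, n_samples) array of synchronized audio
        mic_xy : (n_mics, 2) microphone positions in metres
        """
        _, n_samples = frames.shape
        # Unit vector from the array toward the assumed source direction.
        direction = np.array([np.cos(azimuth_rad), np.sin(azimuth_rad)])
        # Arrival time at each mic relative to the array origin
        # (negative for mics nearer the source, which hear the wavefront earlier).
        delays = -(mic_xy @ direction) / SPEED_OF_SOUND
        freqs = np.fft.rfftfreq(n_samples, d=1.0 / sample_rate)
        spectra = np.fft.rfft(frames, axis=1)
        # Phase-shift each channel to undo its own delay, then average coherently.
        shifts = np.exp(2j * np.pi * freqs[None, :] * delays[:, None])
        return np.fft.irfft((spectra * shifts).mean(axis=0), n=n_samples)

Training an ML model to fold the beamforming and classification into one network, as suggested above, would essentially replace this hand-written DSP.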
Yeah, one has to think of the 'microphone budget.' For microphone arrays, I think it's probably better overall (from the ecosystem management angle) to cover a wider non-overlapping region than to get a more comprehensive picture of a single point...
The quality of the single-source classifier is the obvious scientific bottleneck, though; improve it, and everything else will work better. (We've also got plenty* of existing training data for this case.) So that's where we've been focusing most of the energy.
* - depending on species, of course. See also: xeno-canto.org
Entirely Off-topic, but I really like this way of quoting. It's surprisingly satisfying to see the quoted text bounded between a starting and an ending delimiter.
Love the app, thanks for your work. But can you comment on why the Facebook SDK is part of it? (I've seen a request in a proxy on the sign-in screen to graph.facebook.com.)
Thank you for all your work on Merlin. I heavily use that app when I go backpacking, and in fact just last night it helped me identify a red crossbill in Deschutes National Forest.
It would be amazing if BirdNet eventually supported offline.
The off-line version of this (Merlin) worked great in the Boundary Waters in June. Detected visually-verified grouse, vireos, woodpeckers, trumpeter swan and bald eagles on Nina Moose and Agnes lakes.
I asked a veterinarian friend who specializes in birds (and is a long time bird watcher), and she knew the app and said that it works pretty well (coming from her, that's definitely high praise)
The only thing she'd improve was exactly what you mentioned: offline recognition! (and also keeping the bird sound recordings to export them later)
How about doing the reverse? Listening to birds from different parts of the world on-demand?
Listening to bird sounds is now recognized to have a positive impact on mental health[1]. So how about selecting a particular region of the earth and listening to high quality bird sounds from there? There are some good YT playlists[2], but a separate service could be more functional. Tie up with bird zoos to do it live, share a piece of the revenue for conservation, and you'll have my subscription.
My mom is a master naturalist and has listened to (hours of) frog field recordings to determine which types of frogs are at specific locations for our department of natural resources (IIRC). There is a paper that describes how to calculate minimum adult population from audial surveys. If you still need to scratch that itch, I'm sure there are still some interesting applications along these lines.
I also met a fellow student who was working on this years ago. Not sure how far he got. To be honest, I think he was just too reluctant to graduate and get a real job, so he kept finding ways to stay at uni.
That's also what I would have said: "there's no money to be made here – it would be a fun hobby project and I could learn some machine learning, but that's about it"
This is exactly the kind of thing that keeps me coming back to HN year after year. 99% may be a quick, mildly entertaining read, but that 1% tends to be empowering or life changing for me. I've had a continuously growing interest in plant and bird identification as a hobby (animals are a bit easier). I've gone so far as to research apps, put Audubon Society books on my wishlist, and try to look up some specimens I see in my area. Unfortunately it's, frankly, a steep learning curve, and it's not yet a habit for me to take pictures, remember to look at them later, search their characteristics, etc. This will be the perfect tool to help me jumpstart my newfound interest and get more familiar with the flora and fauna around my home.
I know it's weird to mention when you are talking about all these fancy tools, but I was surprised how often I get decent results simply by taking a photo and using Yandex image search. The first time I did it I didn't expect anything but random pictures of grass, but it's actually good, and it's even sort of helpful that I can see similar but different plants in the same search (which is how I found out that a plant I've been putting into my tea since childhood was apparently not what I thought, but its close relative: I just called it what my grand-dad thought it was). For well-known decorative plants it often straight up shows the exact plant variety I'm looking for.
iNaturalist is another good option for plant identification (and other forms of life as well). It has decent machine learning for suggesting IDs and is also backed by its community providing IDs. I've been using it a lot this year and have found it pretty helpful for IDs (though some regions and life forms are more likely to have good identifications than others), as well as for showing me what people are seeing in the area, and feeling like I'm contributing useful data.
You can find some interesting starting points at kaggle.com if you want e.g. a large set of photos of agricultural plants labeled as healthy or diseased (allowing you to immediately start in on building a classification model without all the upfront grunt work).
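For example, a transfer-learning starter over such a folder of labeled photos can be quite small; this is only a sketch, and the directory layout, image size, and epoch count are assumptions, not a recipe from any particular Kaggle dataset:

    import tensorflow as tf

    train_ds = tf.keras.utils.image_dataset_from_directory(
        "plant_photos/train",            # e.g. plant_photos/train/healthy, .../diseased
        image_size=(224, 224), batch_size=32)

    base = tf.keras.applications.MobileNetV2(
        input_shape=(224, 224, 3), include_top=False, weights="imagenet")
    base.trainable = False               # reuse the pretrained features as-is

    model = tf.keras.Sequential([
        tf.keras.layers.Rescaling(1.0 / 127.5, offset=-1),   # MobileNet expects [-1, 1]
        base,
        tf.keras.layers.GlobalAveragePooling2D(),
        tf.keras.layers.Dense(len(train_ds.class_names), activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy", metrics=["accuracy"])
    model.fit(train_ds, epochs=5)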
The original version [1] is Theano based, but the newest one is TF-Lite based [2], probably for supporting mobile.
Unfortunately they don't publish the code of the TF version and only a TF-Lite model is available. Probably that doesn't matter for the experts though, since the paper and original version are both there.
More interesting is that they've been making the dataset available [3] for $20 (even before BirdNET). This can be a great source for training your own BirdNET-like model.
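If you just want to poke at the published TF-Lite model, the interpreter boilerplate is short. A sketch only: the model filename, input shape, sample rate, and preprocessing below are guesses, so check the model's own docs for what it actually expects:

    import numpy as np
    import tensorflow as tf

    interpreter = tf.lite.Interpreter(model_path="birdnet_lite_model.tflite")
    interpreter.allocate_tensors()
    inp = interpreter.get_input_details()[0]
    out = interpreter.get_output_details()[0]

    # Placeholder input of the right shape; in real use this would be the
    # (preprocessed) audio chunk, e.g. a few seconds of mono samples.
    chunk = np.zeros(inp["shape"], dtype=inp["dtype"])

    interpreter.set_tensor(inp["index"], chunk)
    interpreter.invoke()
    scores = interpreter.get_tensor(out["index"])[0]
    print("top class index:", int(np.argmax(scores)))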
BirdNet is great and has helped me to identify some birdsong. If you are also interested in identifying plants, PlantNet is good too. https://identify.plantnet.org/
Any comparison of PlantNet to iNaturalist[0] (regarding quality of the product / size of the community)? I use iNat frequently for identifying native plants and animals around the yard and it's been extremely helpful and active (I typically get at least one verification on each item I post, oftentimes two or three).
The local app here for native plants (Obsidentify) beats PlantNet for me, but I have the impression there might be some 'user experience' at play: I know enough about plants to know what the distinctive features are, I know enough about the app/AI to know that it wants properly cropped pictures of those features, and the other users whose observations have the best chance of being validated and accepted do the same. So the thing is probably very well trained for that, which is less the case for PlantNet. Again, that is just a theory, but when talking with other people the story is similar: the people saying it doesn't work for them are typically uploading non-cropped and/or non-identifying pictures.
Try talking to BirdNet, it'll tell you species = Homo sapiens. Most plant recognition apps will still try to match a plant when you feed them pictures of humans.
That could be one aspect, but I was actually talking about taking pictures of complete persons. Usually turns out you're a beetle, or a worm, or a butterfly :)
These new ML image identifiers are neat and they're sure to become the default method, but I feel like a "20 questions" style of tool could have worked very well for a long time with little technology required.
Just ask a series of questions:
- Is it a plant or animal?
- How large is it?
- Where are you? (location permission)
- Does it have green leaves?
- Does it have woody stems?
- Does it have whorled or alternate leaves? (Show images)
- Does it have yellow flowers?
This is how a lot of field guide books work, but they require a lot of flipping back and forth, scanning tables of contents, and memorizing terminology. An app could just be tap-tap-tap-tap-tap, drilling down very quickly and showing images at every step.
The "only" difficulty is getting the database of attributes - maybe it already exists. Maybe an app like this already exists and I haven't been able to find it?
Having used such books, I don't think it is that simple. I mean, you could make an app just like the books, but it might not add any functionality and as such not be usable by the wider public.
Thing is, the series of questions you show is easy to answer, but it doesn't get you any further than halfway. The problem is in the questions you do not show: there it starts going into details, terminology starts to matter (seriously, if you've never read those words it's like a foreign language) and differences become hard to spot. Part of this could perhaps be alleviated with pictures, but I doubt it; I've seen websites attempt it but none were really good and they all covered only a subset of plants. Probably because it's quite the amount of work to do them all, even for a region.
Yep! I just left a comment about having this idea 10 years ago and my wife being annoyed that I didn't build it. But hard to explain "I would've literally received a PhD working on this" when my interest was mostly on the hobby side.
I am very proud to say that I was successfully able to get "Northern Cardinal: Cardinalis cardinalis - Almost certain" by whistle-copying the cardinal that lives in my backyard.
I've been able to spot over 50 types of birds (including Great Hornbill, Rufous Necked Hornbill, Greater Flameback) in Goa this monsoon with the help of this app (it discovered over 120 unique species, but they were hard to spot), even though it was not made for South Asia
Apart from the usefulness of the app, the interface and usability are great too.
I have the app and do like it, but my phone's camera quality doesn't seem good enough for this to work properly (iPhone 6). I have to get pretty lucky and be able to be super close to the bird so that very little zooming or image enlargement is needed.
BirdNet has been around for several years now, while Merlin just got sound-id recently. Merlin has coverage for about 450 birds of North America, while BirdNet can id around 1000 birds of N Am. and Europe.
With BirdNet you make a recording, highlight the interesting section of the sonogram, and upload that section to the BirdNet servers. With Merlin you start recording and the software ids birds in real time, popping up species as it goes.
My assumption is that, because it runs locally on the device, Merlin is going to be less accurate than whatever BirdNet is able to do on its beefy servers. But it has the advantage of working without a data connection. Merlin can also id from photos and descriptions.
So the one isn't a replacement for the other. It's great to have options.
The apps run two different models, and mobile phones are plenty powerful now to run some pretty complicated algorithms, so the offline accuracy of Merlin is pretty good.
BirdNET and Merlin Sound ID are such different use cases (real-time classification of a rolling window of audio vs classification of a user selection) that we don't have any comparison metrics between the two, and they are really geared for different purposes (BirdNET’s goals align more with research in bioacoustics, while Merlin’s goals align more with outreach and education.)
God I love this thing. Now I know all the birds in my yard by sight and sound and which SOB is the one that starts the racket 1/2 hour before sunrise (looking at you Catbird)
I love it as well. It's become part of my morning coffee ritual where I listen to the birds and learn what's out there.
What I've noticed is that, rather than endlessly scrolling through Reddit etc., I'm actually very present, listening to the calls and their nuances, and getting familiar with the pulse of the nature around me.
This app simply gamifies that experience and lets me play audio pokemon while at the same time tuning out the rest of the world.
I have been using this app and it's awesome. The features are pretty simple but complete. Looking at the list of identified birds makes me think I'm just a two-legged ape living in a bird community, and I like that.
I am also convinced cardinals and blue jays get along great with each other but are "not like other birds" types.
I've been using it for months now and it really makes one more aware of the songs of the birds around. I hope it's properly funded and gains more popularity in the future. Great app!
I have used it, and it's VERY good. It also shows you the birds you can expect in your area and the recent sightings nearby. It's a great introduction to a layer of life I didn't pay attention to before.
I love this app. The only thing I find a bit problematic is that if you have a match and fetch more info on the bird (Wikipedia / Macaulay Library / eBird), that none of these fragments have a "Share"-Button.
I use that "Share"-button a lot (it usually looks like a "<") in order to save interesting stuff I later want to look at on my PC.
There is enough free space on the top right for such a button (I think it's called the ActionBar?)
Those who do this the analogue way, often distinguish between bird song and "bird language." The former focuses on identifying species. The latter focuses on understanding the information birds convey to one another. Since a lot of it relates to predators (watch out, a fox!), this might be augmentable to determine the presence of silent animals too.
I'd be interested in something like this to identify "dialects" in certain species of local birds. The bellbird/korimako[1] is well known for having distinct shared motifs within a given area.
The dialectal song differences are readily noticeable, but mapping the dialect boundaries between populations would be really interesting.
How hard would it be for an AI to identify specific information conveyed by a call? Perhaps the various alerts that birds provide local fauna would be helpful to humans too?
My intuition is that it would be no different than human speech recognition. You'd need sufficient data, but the principle is the same. There aren't that many "words."
Humans understanding bird calls isn't new. We've probably forgotten more than we know. That's not unique to us either. Different species often recognize each others' calls, particularly danger calls.
I tried writing a program to do this ~10 years ago (I called it "Tweeter") using a system like Shazam's. The problem is, Shazam can detect specific songs based on about a dozen different analysis points (found another use for poles and zeroes). But if the song is sped up or muffled, it differs too much from the source recording. Birdsongs are far more variable, even within the same species.
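For the curious, that kind of fingerprinting looks roughly like this - a toy version of the "constellation" idea, not Shazam's actual algorithm or my old code: pick prominent time-frequency peaks from a spectrogram and hash pairs of nearby peaks.

    import numpy as np
    from scipy.signal import spectrogram

    def fingerprints(audio, sample_rate, peaks_per_frame=2, fan_out=3):
        freqs, times, sxx = spectrogram(audio, fs=sample_rate, nperseg=1024)
        peaks = []  # (time_index, freq_bin) of the strongest bins in each frame
        for t_idx in range(sxx.shape[1]):
            top = np.argsort(sxx[:, t_idx])[-peaks_per_frame:]
            peaks.extend((t_idx, int(f)) for f in top)
        hashes = set()
        for i, (t1, f1) in enumerate(peaks):
            for t2, f2 in peaks[i + 1 : i + 1 + fan_out]:
                hashes.add((f1, f2, t2 - t1))  # anchor freq, target freq, time gap
        return hashes

    # Two recordings are then compared by the overlap of their hash sets,
    # which is exactly what breaks down when every "performance" varies.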
Ironically, at the time I discovered Cornell was doing this same thing, but it looks like they finally got to a product by throwing an ML classifier at it. Very cool, ML is perfectly suited for this.
The interesting thing I learned during my study was that there is an entire system of describing bird sounds with nonsense words ("skee-dlees chis chis chis") that goes back over 100 years.
In the Anglo-Saxon world there’s always another system of describing things.
The official system uses Latin and Romance-language words (frequency modulation, intercourse, feces). It’s the jargon found in textbooks and research papers and taught in universities.
The underground system uses Anglo Saxon words and is used by lab techs and people in the field (figuratively). Examples would be (tune, warbling, fuck, shit).
One of these languages is considered respectable. The other is vulgar and suppressed to the point of cultural genocide.
This is the way. It has been this way since feudal times.
I haven’t tried using this particular one. It has its work cut out for it. Bird calls are difficult. A mockingbird or catbird can sound exactly like a sparrow or finch.
I remember, in the 1990s, when everyone had Nokia bricks, that mockingbirds would sometimes copy the ringtones.
The mockingbirds and catbirds I’ve heard are easy to identify, not from a particular sound but from how they string sounds together. It seems to me they copy notes, but not songs.
Yep, exactly. Around here we have a bunch of mockingbirds and blue jays, and you'll hear the mockingbirds adopt some of the harsher sounds of the blue jays, but they usually surround them with melodic trills. It makes for a pretty dynamic contrast - reminds me of a jazz solo.
Every app listed here, and the OP BirdNet, are so ineffectual. They have no idea what they are listening to (unless location is on) and they cannot discern a genuine tweet from a washing machine squeak. The posters saying the apps are verging on incredible are completely deluded. I have used these apps, all of them, across Europe and North America, and they are inept and fail at a rate of 100%.
Who are you people?
And the top remark from the person that 'had this idea 10 years ago'... is this a joke?
I whistled to the app and it identified the sound as human. I haven't gotten a chance to try it with an actual bird yet.
Location (as well as date, time, light, air pressure...) is certainly a factor that should be used when classifying something though.
I think the problem is probably related to data collection / data integrity. At least in my case, I'm using an iPhone that has variously placed microphones and speakers. It isn't clear to me how to hold the tool to collect the sound. If it gets wet or grungy, it's not something that is user-serviceable. I don't have an easy way to tell if it even needs to be serviced. If you're relying on the microphone that follows you literally everywhere to remain clean & untarnished - maybe that's the problem?
My phone rides in my pocket, it picks up dust, it gets moist, it gets dropped, etc. Perhaps I'm projecting. Maybe your phone is untarnished and is working fine? Do you have data to back up the 100% fail rate across Europe and North America? It seems you might be projecting as well?
I work with audio, sometimes bird sounds (for interactive encyclopaedias). Playing the driest, clearest sound into the app, it's often right, but not always. You make a good point about the microphones. Sweeping through a sine wave can broadly tell you something about what the phone mic 'hears'. Most city birdsong ambiences are very complex soundscapes. There are some important characteristics in the higher frequencies of birdsong that will not be 'heard'.
GPS helps to narrow down the potential bird choices, so over time apps will learn and become more accurate. The open source Spleeter technology, which unmixes stereo music, and its ilk may be helpful if run in real time, allowing the app to ignore sound that is not birdsong.
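A much cruder first pass at the same goal - not Spleeter-style source separation, just a band-pass filter over the range where most songbird energy sits (the 1-10 kHz band is a rough assumption, not a universal rule) - already knocks out a lot of low-frequency rumble like traffic and washing machines:

    import numpy as np
    from scipy.signal import butter, sosfiltfilt

    def bandpass(audio, sample_rate, low_hz=1000.0, high_hz=10000.0, order=6):
        sos = butter(order, [low_hz, high_hz], btype="bandpass",
                     fs=sample_rate, output="sos")
        return sosfiltfilt(sos, audio)

    # Example: a 440 Hz hum plus a 4 kHz whistle; the filter keeps mostly the whistle.
    sr = 48000
    t = np.arange(sr) / sr
    noisy = np.sin(2 * np.pi * 440 * t) + 0.5 * np.sin(2 * np.pi * 4000 * t)
    clean = bandpass(noisy, sr)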
FYI for anyone reading at Cornell - the donation page fails with "Fund Code This field requires a value." I wanted to make a contribution because a few months ago I was using the iSeek app[0] while hiking and I thought, "You know, with enough recordings of birds, it should be possible to build a classifier to identify birds based on their bird songs. Someone should make an app for that." And here it is!
Is anyone else weirded out by the fact that the Android app is published from something that looks like a personal account, as opposed to some kind of organization? Not sure how publishing for Android works - is it possible to hand off an app to a different account when you're no longer affiliated with the organization?
We are lucky to have a number of bird types in our area - this app has been amazing for identifying who's who :) I especially like that when a match is made a simple link to Wikipedia is given. As a result I now know far more about bird migratory patterns than I did this time last year!
Thanks to everyone who works on it. We've used the app relentlessly for a couple of years in the UK and when you show it to people they are amazed. People thank us for it and all we did was share it with them. Great work!
I always wanted a Shazam for the outdoors ... something that identifies all the sounds you're hearing while in a forest, by a lake, etc. Also, shows an overlay in your camera (AR) that guesses where/how far each creature is from you.
I've found this to be a lovely app. It doesn't seem to have all the birds I'd like, but the way it organizes information and the general smoothness is exemplary.
Thanks for the great app, been using it for years!
Just this week I was in a place without cell coverage and wanting to use it. I'll have to try out Merlin.
iOS app "Requires iOS 13 or later" :-/ . Too bad I can't use this on my still functional-and-receiving-security-upgrades iPhone 6. I suspect an overlap between the population enjoying birds and the population not buying a new phone every year, so I hope they can reconsider and broaden support.
I contacted the authors at the email on the page, asking them to support iOS 12. Maybe email them too if you're also impacted.
Some bird by me always sings the first note of the Ed Edd and Eddy theme, making me start playing it in my head. Maybe I can figure out what bird it is.
A lot of TV shows and movies use red-tailed hawk sounds when a bald eagle is on screen. The eagle sounds are much cooler but it’s basically like the Wilhelm scream at this point.