I'm wondering if anyone has done something similar, but instead of trying to find similarities in the raw audio, used the tags available from sources like Last.FM, MusicBrainz, Discogs etc.? And the ultimate answer to that is probably "those sources kinda suck". Discogs is like a trainspotter on the spectrum, fascinated by release IDs. MusicBrainz is kinda similar (each song will have a dozen matches of wildly different quality). Last.FM tags are user-generated, which makes some of them amazingly useful and others amazingly detrimental.
I have a human-powered recommendation service that uses my own tags, which I've added to my mp3 library over 25 years. I tag instruments (not all of them, just the ones that stand out, like synth, flute, distortion, violin, piano), vocals (male/female, falsetto, spoken, rap), moods (happy, sad, angry, mellow, dramatic, chillout) and genre (I don't go too deep here, because I hate getting recommendations stuck within some obscure sub-genre). And that's it. I get it to play a random highly rated track matching a keyword or two, then use the tags from the first 10 songs to generate the next. But since, for me, music is a somewhat interactive experience, every 10 songs or so I'll think of something I want on the list (maybe reminded of it by another track that just played).
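Roughly, that loop looks like this. A minimal sketch in Python, assuming the library is just an in-memory dict; the track ids, tags and structure here are all made up:

    import random
    from collections import Counter

    # Hypothetical library: track id -> {tags: my hand-added tags, rating: 0-5}.
    library = {
        "track_001": {"tags": {"synth", "female", "mellow", "pop"}, "rating": 5},
        # ... thousands more
    }

    def seed(keywords, min_rating=4):
        """Pick a random highly rated track matching the seed keywords (a set)."""
        hits = [t for t, info in library.items()
                if info["rating"] >= min_rating and keywords <= info["tags"]]
        return random.choice(hits)

    def next_track(history):
        """Weight candidates by tag overlap with the last 10 plays."""
        profile = Counter(tag for t in history[-10:] for tag in library[t]["tags"])
        pool = [t for t in library if t not in history]
        return max(pool, key=lambda t: sum(profile[tag] for tag in library[t]["tags"]))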
Another thing I think might be useful for recommendations is Last.FM histories. Think about it: there are hundreds of thousands of active listeners "scrobbling" their listening history. You could easily parse that and group songs together that have been played within 5 songs of each other, as long as they're not by the same artist and the time between the songs is around zero (i.e. listened to in order, no pauses). Similarity is highest for songs played right next to each other, and the score drops off with distance.
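A rough sketch of that co-occurrence idea, assuming each user's scrobbles arrive as time-sorted (start_ts, duration, artist, track) tuples; the gap and window thresholds are guesses:

    from collections import defaultdict

    MAX_GAP = 30   # seconds of silence allowed between tracks ("around zero")
    WINDOW = 5     # tracks within 5 positions of each other count as related

    def pair_scores(scrobbles):
        """Return {frozenset({track_a, track_b}): similarity score} for one user."""
        # 1. Cut the history into uninterrupted listening sessions.
        sessions, current = [], []
        for s in scrobbles:
            if current:
                prev_start, prev_dur, _, _ = current[-1]
                if s[0] - (prev_start + prev_dur) > MAX_GAP:
                    sessions.append(current)
                    current = []
            current.append(s)
        if current:
            sessions.append(current)

        # 2. Score pairs within each session; adjacent tracks score highest
        #    and the score decays with distance, as described above.
        scores = defaultdict(float)
        for session in sessions:
            for i, a in enumerate(session):
                for j in range(i + 1, min(i + 1 + WINDOW, len(session))):
                    b = session[j]
                    if a[2] == b[2]:          # same artist: skip
                        continue
                    scores[frozenset((a[3], b[3]))] += 1.0 / (j - i)
        return scores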
In fact, ListenBrainz (a partner project to MusicBrainz) is doing something similar to what you mention about listening histories. We're using the data to generate similarity scores based on which songs are listened to near each other in "listening sessions".
I've tried this with Discogs and found it to work pretty well. Kinda similar to what OP did, except the "embedding" vectors were created from the Genre/Style tags on Discogs. I didn't have a vector database though, so it was very slow. On Discogs those tags are per album, not per track. To create a playlist of, say, 10 songs similar to a seed song, I'd find the ten closest albums, then search for them on last.fm and pick the most popular track from each to add to the playlist.
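The approach can be sketched like this (album ids and tags are made up; the brute-force linear scan is exactly the part a vector database would speed up):

    import numpy as np

    # Hypothetical: Discogs album id -> its Genre/Style tags.
    albums = {
        "album_1": {"Electronic", "Ambient", "Downtempo"},
        "album_2": {"Electronic", "Techno"},
        "album_3": {"Rock", "Psychedelic Rock"},
    }

    vocab = sorted({tag for tags in albums.values() for tag in tags})
    index = {tag: i for i, tag in enumerate(vocab)}

    def embed(tags):
        """Multi-hot "embedding" over the genre/style vocabulary."""
        v = np.zeros(len(vocab))
        for tag in tags:
            v[index[tag]] = 1.0
        return v

    def nearest_albums(seed_tags, n=10):
        """Brute-force cosine similarity over every album."""
        q = embed(seed_tags)
        def cos(tags):
            v = embed(tags)
            return (v @ q) / (np.linalg.norm(v) * np.linalg.norm(q))
        return sorted(albums, key=lambda a: cos(albums[a]), reverse=True)[:n]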
A similar embedding model based on Discogs genre/style data is the Effnet-Discogs model from the Music Technology Group at Universitat Pompeu Fabra: https://replicate.com/mtg/effnet-discogs
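If you'd rather run it locally than on Replicate, MTG also ships the model through their Essentia library. Something like this should work, assuming you've downloaded the model file from https://essentia.upf.edu/models/ (the file and output-node names below are from memory of their docs, so double-check):

    # pip install essentia-tensorflow
    from essentia.standard import MonoLoader, TensorflowPredictEffnetDiscogs

    # The model expects 16 kHz mono audio.
    audio = MonoLoader(filename="song.mp3", sampleRate=16000, resampleQuality=4)()
    model = TensorflowPredictEffnetDiscogs(
        graphFilename="discogs-effnet-bs64-1.pb",
        output="PartitionedCall:1",       # the embedding layer, per MTG's docs
    )
    embeddings = model(audio)             # one vector per audio patch
    track_vector = embeddings.mean(axis=0)  # average into one track embedding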
Per-album metadata is useless for a lot of stuff that I like. It's even useless for a lot of The Beatles' stuff, because they'd often have a range of styles on a single album and bring in weird instruments on individual tracks.
Which is unfortunate, because Discogs does have instrument and vocal tags (on a tiny number of releases). It's just so unreliable. AllMusic is another decent source for tags, but not instruments. It's the age-old problem with ML/AI: data quality. Garbage in, garbage out. If only we could crowd-source this from listeners and get them to tag music from a list of available moods, instruments etc. Oh wait ... that's exactly the feature that recommendation services have been removing for the last 10 years.
User tagging isn't a panacea either, because people tag inconsistently, and people who tag a lot are probably not very representative.
For an extreme example of that, see the boorus. Some machine learning people have become interested in those, since they're huge datasets of extensively tagged material ... or maybe it's the booru people who have become more interested in machine learning. Either way, I'm sure they're great, if you're into waifu anime, porn, or waifu anime porn. Both types, country AND western, as they said in the Blues Brothers movie. Any remotely subjective tag (such as "beautiful", God help you) is going to be extremely coloured by the tastes of an extreme fringe.
At least the boorus, by relying on fanatics to do the work for them, presumably have a handle on simple spam. Commercial recommender services' tagging systems don't have that luxury, and that's probably why they end up getting removed.
This is very true. I'd pay for a metadata-only / playlist service that works with Spotify/Tidal/Apple/local music.
And don't allow free-text tags. Instead, offer a fixed list of available tags: the smallest set needed to describe most tracks. I mean, let people add their own if they want, but ignore those while training the model.
I actually think that instead of tagging a specific mood (e.g. "happy"), some tags should be a sliding scale between two opposites:
happy<-------|--->sad
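As a sketch of what I mean (hypothetical axes and values), each axis becomes a single number between the two poles instead of two independent tags:

    # Hypothetical bipolar mood axes: -1.0 is the left pole, +1.0 the right.
    AXES = ("happy/sad", "calm/angry", "chillout/dramatic")

    moods = {
        "track_001": {"happy/sad": -0.6, "calm/angry": -0.9, "chillout/dramatic": 0.3},
        "track_002": {"happy/sad": 0.4, "calm/angry": -0.2, "chillout/dramatic": 0.8},
    }

    def mood_distance(a, b):
        """Euclidean distance in mood space; smaller means a closer feel."""
        return sum((moods[a].get(x, 0.0) - moods[b].get(x, 0.0)) ** 2
                   for x in AXES) ** 0.5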
Instrument tags are easier to understand. Give a list of instruments (or instrument types, because the user might not know precisely which woodwind or percussion instrument it is) with checkboxes beside each.
Some users will be experts because they play woodwind themselves. Let those users apply for expert status by passing an "identify the instrument" test, and if they pass, give them a half-price subscription as long as they moderate X tunes per week.
rateyourmusic.com is an alternative to allmusic.com, while chosic.com focuses on finding music.
The latter has generated better "similar music" results for me so far. But I welcome every additional project improving music search, which has been so neglected by most services.
I've been a paying subscriber to rateyourmusic for years because one extra feature you get is per-track ratings. Extremely useful if you're in discovery mode.