Repetitiveness and compressibility analysis in song lyrics

6stringmerc · on May 12, 2017

Great chart, unfortunate conclusion with some erroneous allusions. Bear with me here: Starting with ABBA, and fully blossomed in the form of Max Martin, Top 40 Pop has been dominated by Swedish composition techniques.

Taylor Swift, Britney Spears, Kelly Clarkson, NSYNC, Bieber, Katy Perry, Demi Lovato...Max Martin's fingerprints are all over the hits. He has a defined style as well. Balanced lines. It's brilliant. Thus, it's not about the performers if you want to study the composition - you have to go to the actual composer(s).

Just saying that this is a sound technique and approach, but it looks at the data set to the exclusion of pertinent considerations. Revised, it would make for an interesting story.

derefr · on May 12, 2017

You could actually use this analysis technique to fingerprint composers who've ghostwritten for bands, no? Bands pumping out music that is exactly as compressible likely share the same songwriter, whether they acknowledge that or not.

anigbrowl · on May 12, 2017

Composers' identities are rarely a secret, because there are separate royalty streams for performance and composition. OK, pop fans may lean towards the assumption that their idols' songs are always deep personal confessions but there isn't some grand conspiracy to conceal the facts of authorship from the public. Most people just don't think about it very much.

derefr · on May 12, 2017

Rarely a secret, yes. But it's not like there's a public database of composer:song mappings anywhere†, for you to easily feed your ML algorithm; you'd have to do a bunch of original research to build that dataset. If there's some statistical way to infer the data with nearly the same quality, so that you don't need to go to the effort of building the dataset yourself, that'd be nice.

† An assumption on my part—is there, in fact, a place where you can look up who gets each portion of the royalties for a given song?

anigbrowl · on May 12, 2017

I like to fact-check my assertions prior to making them, unless I'm being metaphysical. Not snark; it's just a huge time saver.

There are a bunch of such datasets, some commercial but accessible for a small fee (like ASCAP), others freely available. I personally like the Discogs one best but some of the commercial offerings are better curated.

https://en.wikipedia.org/wiki/List_of_online_music_databases

6stringmerc · on May 13, 2017

Wikipedia does a great job of listing credited song writers, from what I've seen. Just check out the detail in a Beyonce or Maroon 5 album. It's like a public CV.

Edit: I just looked up a Maroon 5 album, "Overexposed" and found a few credits to M. Martin. So I clicked on "Daylight." M. Martin. Max Martin.

gwern · on May 12, 2017

Looks like interesting work, but the main chart doesn't work for me in Firefox or Chromium - it seems to be yoked to your scroll position (why?!) so by the time you've scrolled down to the '2014' paragraph, which makes it chart the full time-series, you can't even see the graph in the first place... Data viz run amok.

danielsf · on May 12, 2017

I worked on this project. Can you share your screen size, device, and browser version?

gwern · on May 12, 2017

FF 53.0.2 (64-bit), Chromium Version 58.0.3029.96 Built on Ubuntu , running on Ubuntu 16.04 (64-bit); using a big Dell in portrait mode. Screenshots: https://imgur.com/a/vFhNc By the time I've scrolled down far enough to activate the animation, the graph has disappeared.

danielsf · on May 12, 2017

ah I think it's due to the high viewport height that we didn't account for (it's rare, but in your case, it broke the code). thanks!

jiaweihli · on May 12, 2017

Like @Asdfbla, I'm on 2560x1440 and have the same issue. The viewport is likely the issue =)

Asdfbla · on May 12, 2017

I have the same problem on Firefox on a 2560x1440 monitor. (Still very nice presentation though.)

marzell · on May 12, 2017

Works for me on Chrome 58 and Firefox 53 on Windows 7. When it displays correctly, the way the animations tie to the text as you scroll is pretty nice.

geluso · on May 12, 2017

Where'd they get the lyric data for this analysis? In my experience this data in bulk is all incredibly locked down!

IgorPartola · on May 12, 2017

Next: lossy lyrics compression where using words that sound the same could yield higher ratios! I wonder how well that would work for Sting where nobody can understand anything anyways.

stuffedBelly · on May 12, 2017

Interesting analysis and great visual presentation. I would also be interested to see an analysis of the repetitiveness of intervals and rhythmic patterns used in popular songs. On many occasions people tend not to care much about lyrics in the presence of addictive grooves/riffs, "Get Lucky" by Daft Punk being a good example.

woliveirajr · on May 12, 2017

I like the use of compression to find out about repetitions.

There is some theory out there called Kolmogorov Complexity [0]. It says that something is as complex as the amount of information you need to express it. In your case, lyrics are as complex as the number of symbols (letters? words? bytes?) you need to represent them.

And one good way to estimate it is as you have done: compress it. If you're using the same compression method for all the lyrics, you'll find that the ones that are simpler (and more repetitive) are the ones that show the greatest reduction in size. In that case, the choice of compression method is somewhat irrelevant. Had you used Bzip, PPMD, etc., the results would probably be similar.
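A minimal sketch of the idea above, assuming Python's standard `zlib`, `bz2` and `lzma` modules as stand-ins for whatever compressor the article actually used. The two sample texts and the `compression_ratio` helper are made up for illustration; they are not taken from the article's data.

```python
import bz2
import lzma
import zlib


def compression_ratio(text: str, compress=zlib.compress) -> float:
    """Return compressed size / original size (lower means more repetitive)."""
    raw = text.encode("utf-8")
    return len(compress(raw)) / len(raw)


# Made-up stand-ins for real lyrics (assumption: not from the article's data set).
repetitive = "la la la la la la la la\n" * 40
less_repetitive = "".join(
    f"line {i}: {i * 7919 % 1000} birds flew over the {i}th hill\n"
    for i in range(40)
)

# Compare the same two texts under three different compressors.
for name, compress in [("zlib", zlib.compress),
                       ("bz2", bz2.compress),
                       ("lzma", lzma.compress)]:
    print(name,
          round(compression_ratio(repetitive, compress), 3),
          round(compression_ratio(less_repetitive, compress), 3))
```

The absolute ratios differ between compressors, but the ordering of the two texts should stay the same, which is the sense in which the choice of method matters little here.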

In case you want to extend your research, for example, as 6stringmerc said, you might consider that the composer matters more than the actual artist.

And, for that, you can use Normalized Compression Distance (NCD) [1]. That way you can measure how similar two lyrics are. Basically, you compress those lyrics together. When they are similar, clues from one are used by the compressor to also compress the second one, so similar lyrics get more compression than lyrics that aren't related.
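A rough sketch of NCD, using the usual definition NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), where C is the compressed size. It assumes `zlib` as the compressor, and the three short texts are invented examples, not real lyrics.

```python
import zlib


def csize(data: bytes) -> int:
    """Compressed size in bytes, using zlib at its highest level."""
    return len(zlib.compress(data, 9))


def ncd(a: str, b: str) -> float:
    """Normalized Compression Distance: near 0 for similar texts, near 1 for unrelated ones."""
    x, y = a.encode("utf-8"), b.encode("utf-8")
    cx, cy, cxy = csize(x), csize(y), csize(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


# Invented sample texts (assumption: for illustration only).
song_a = ("I walk the lonely road tonight\nthe city lights are burning bright\n"
          "and every step I take alone\nis one more step away from home\n")
song_b = ("I walk the lonely road again\nthe city lights are burning dim\n"
          "and every step I take with you\nis one more step away from blue\n")
report = ("Quarterly revenue grew by nine percent\ndriven mostly by strong hardware sales\n"
          "while operating costs fell slightly\nacross all European divisions\n")

print("a vs b:     ", round(ncd(song_a, song_b), 3))
print("a vs report:", round(ncd(song_a, report), 3))
```

The two near-identical songs should come out closer to each other than either is to the unrelated text. On texts this short the compressor's fixed overhead is noticeable, so only the relative ordering is meaningful.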

And by doing that you can even discover who was the composer of the songs, i.e., the authorship of the lyrics, since each person usually has the same writing style... [2]

[0] https://en.wikipedia.org/wiki/Kolmogorov_complexity

[1] https://en.wikipedia.org/wiki/Normalized_compression_distanc...

[2] https://link.springer.com/chapter/10.1007%2F978-3-642-34475-...
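To make the authorship idea concrete, here is a toy nearest-neighbour sketch: an unknown lyric is attributed to whichever known writer's sample it is closest to under NCD. The writer labels, the sample texts and the `guess_writer` helper are all hypothetical, and this is not the method from the paper in [2]; a real attempt would need far more text per writer.

```python
import zlib


def csize(data: bytes) -> int:
    """Compressed size in bytes under zlib."""
    return len(zlib.compress(data, 9))


def ncd(a: str, b: str) -> float:
    """Normalized Compression Distance between two strings."""
    x, y = a.encode("utf-8"), b.encode("utf-8")
    cx, cy, cxy = csize(x), csize(y), csize(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


def guess_writer(unknown: str, known: dict) -> str:
    """Return the label whose known sample has the smallest NCD to `unknown`."""
    return min(known, key=lambda writer: ncd(known[writer], unknown))


# Hypothetical writers with invented samples (assumption: illustration only).
known = {
    "writer_a": ("baby baby hold me tight tonight\nwe dance until the morning light\n"
                 "baby baby never let me go\nhold me tight and take it slow\n"),
    "writer_b": ("rust on the rails and rain on the plain\n"
                 "my grandfather's watch still ticks in the drain\n"
                 "the harvest is late and the river runs dry\n"
                 "crows on the fence post are watching the sky\n"),
}
unknown = ("baby baby take my hand tonight\nwe dance beneath the city light\n"
           "hold me tight and never let me go\n")

print(guess_writer(unknown, known))
```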

marzell · on May 12, 2017

The visualization animations as you scroll through this article are fantastic, and a great way to implement storytelling with data. The content is pretty interesting too; I like the emphasis on aligning data metrics with intuition to really make the point.

twiss · on May 12, 2017

According to the main chart, songs in the top 10 are more likely to be repetitive, and that discrepancy has been growing. That raises the question, is there a causal link between being repetitive and reaching the top 10? If so, the answer to "Who's responsible for this madness?" is: the listeners.

marzell · on May 12, 2017

I think this is partly true. However, there is a significant phenomenon where there is financial/political pressure to play songs (pay to play) which then creates demand for songs since they are then familiar to listeners. So companies/firms can pay stations to play otherwise unremarkable music (perhaps simple songs or those with the highest profit margins/potential available to stakeholders) and in a general sense this influences listeners to want to keep hearing those now familiar tunes. The whole pop-ephemerality of music-as-a-commodity feels like the new opiate of the masses - in general people are more content as long as they have a meaningless tune as the background soundtrack for their day-to-day repetitive tasks.

I wonder what the correlation is between acceptance of repetitive music and the repetitiveness of a listener's daily tasks.

anigbrowl · on May 12, 2017

Well of course it's the listeners. The most popular product in any market is virtually guaranteed to be a lowest common denominator because that's what will have the broadest appeal.

rayuela · on May 12, 2017

What was used to make these visualizations? These are beautiful.

danielsf · on May 12, 2017

sn9 · on May 12, 2017

I would love to see the source code for how this post was created! (Or at least pointers for resources on how to create something similar.)

mynewtb · on May 12, 2017

Ctrl-u

sn9 · on May 13, 2017

Oh right. Thanks.

ashark · on May 12, 2017

Some data messiness in that final chart. Maroon 5 versus Maroon5, Surfin' U.S.A but also Surfin' U.s.a
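One possible way to catch duplicates like these, sketched under the assumption that case, spacing and punctuation are the only differences: build a canonical key for each label and group on it. The `canonical_key` helper is hypothetical and would over-merge names that really do differ only by punctuation.

```python
import re
from collections import defaultdict


def canonical_key(name: str) -> str:
    """Case-fold the label and drop whitespace, punctuation and underscores."""
    return re.sub(r"[\W_]+", "", name.casefold())


labels = ["Maroon 5", "Maroon5", "Surfin' U.S.A", "Surfin' U.s.a"]

# Group the original spellings under their shared canonical key.
groups = defaultdict(list)
for label in labels:
    groups[canonical_key(label)].append(label)

for key, variants in groups.items():
    print(key, variants)
```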

cttet · on May 15, 2017

Music itself is about repetition; without repetition it is just noise.

marzell · on May 12, 2017

Any idea how to do the downres transition at the top? It's really cool.