Great chart, unfortunate conclusion with some erroneous allusions. Bear with me here: Starting with ABBA, and fully blossomed in the form of Max Martin, Top 40 Pop has been dominated by Swedish composition techniques.
Taylor Swift, Britney Spears, Kelly Clarkson, NSYNC, Bieber, Katy Perry, Demi Lovato...Max Martin's fingerprints are all over the hits. He has a defined style as well. Balanced lines. It's brilliant. Thus, it's not about the performers if you want to study the composition - you have to go to the actual composer(s).
Just saying that this is a sound technique and approach, but it looks at the data set to the exclusion of pertinent considerations. Revised, it would make for an interesting story.
You could actually use this analysis technique to fingerprint composers who've ghostwritten for bands, no? Bands pumping out music exactly as compressible are likely the same songwriter, whether they acknowledge that or not.
Composers' identities are rarely a secret, because there are separate royalty streams for performance and composition. OK, pop fans may lean towards the assumption that their idols' songs are always deep personal confessions but there isn't some grand conspiracy to conceal the facts of authorship from the public. Most people just don't think about it very much.
Rarely a secret, yes. But it's not like there's a public database of composer:song mappings anywhere†, for you to easily feed your ML algorithm; you'd have to do a bunch of original research to build that dataset. If there's some statistical way to infer the data with nearly the same quality, so that you don't need to go to the effort of building the dataset yourself, that'd be nice.
† An assumption on my part—is there, in fact, a place where you can look up who gets each portion of the royalties for a given song?
I like to fact-check my assertions prior to making them, unless I'm being metaphysical. Not snark; it's just a huge time saver.
There are a bunch of such datasets, some commercial but accessible for a small fee (like ASCAP), others freely available. I personally like the Discogs one best but some of the commercial offerings are better curated.
Wikipedia does a great job of listing credited song writers, from what I've seen. Just check out the detail in a Beyonce or Maroon 5 album. It's like a public CV.
Edit: I just looked up a Maroon 5 album, "Overexposed" and found a few credits to M. Martin. So I clicked on "Daylight." M. Martin. Max Martin.
Looks like interesting work, but the main chart doesn't work for me in Firefox or Chromium - it seems to be yoked to your scroll position (why?!) so by the time you've scrolled down to the '2014' paragraph, which makes it chart the full time-series, you can't even see the graph in the first place... Data viz run amok.
FF 53.0.2 (64-bit), Chromium Version 58.0.3029.96 Built on Ubuntu, running on Ubuntu 16.04 (64-bit); using a big Dell in portrait mode. Screenshots: https://imgur.com/a/vFhNc By the time I've scrolled down far enough to activate the animation, the graph has disappeared.
Works for me on Chrome 58 and Firefox 53 on Windows 7. When it displays correctly, the way the animations tie to the text as you scroll is pretty nice.
Next: lossy lyrics compression where using words that sound the same could yield higher ratios! I wonder how well that would work for Sting where nobody can understand anything anyways.
Interesting analysis and great visual presentation. Would also be interested to see analysis on repetitiveness of intervals and rhythmic patterns used among popular songs. On many occasions people tend not to care much about lyrics in the presence of addictive grooves/riffs, "Get Lucky" by Daft Punk being a good example.
I like the use of compression to find out about repetitions.
There is some theory out there called Kolmogorov Complexity [0]. It says that something is as complex as the shortest description you need to reproduce it. In your case, lyrics are as complex as the number of symbols (letters? words? bytes?) you need to represent them.
And one good way to approximate it is as you did: compress it. (The true value can't be computed in general, so the compressed size only gives you an upper bound.) If you're using the same compression method for all the lyrics, you'll find that the ones that are simpler (and more repetitive) are the ones that get the greatest reduction in size. In that case, the choice of which compression method you use is somewhat irrelevant. Had you used Bzip, PPMD, etc., the results would probably be similar.
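To make the "pick any compressor" point concrete, here is a minimal sketch using only Python's standard library. The two sample texts are made up for illustration, and zlib, bz2 and lzma are stand-ins I picked because they ship with Python (PPMD doesn't), so don't read this as the article's actual pipeline.

```python
import bz2
import lzma
import zlib


def compression_ratio(text: str, compress=zlib.compress) -> float:
    """Compressed size divided by original size; lower means more repetitive."""
    raw = text.encode("utf-8")
    return len(compress(raw)) / len(raw)


# Hypothetical samples: one chorus-like loop, one sentence with little repetition.
repetitive = "around the world, around the world\n" * 40
varied = (
    "The quick brown fox jumps over a lazy dog while seven wizards "
    "quietly judge my vexing black sphinx of quartz near the pier."
)

for name, compress in [("zlib", zlib.compress),
                       ("bz2", bz2.compress),
                       ("lzma", lzma.compress)]:
    r = compression_ratio(repetitive, compress)
    v = compression_ratio(varied, compress)
    print(f"{name:5s} repetitive={r:.2f} varied={v:.2f}")
```

The absolute numbers differ per compressor (short inputs are dominated by header overhead), but the ordering of the two texts should come out the same, which is all the ranking needs.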
In case you want to extend your research, for example, as 6stringmerc said, you might consider that the composer matters more than the actual artist.
And, for that, you can use Normalized Compression Distance (NCD) [1]. That way you can measure how similar two lyrics are. Basically, you compress those lyrics together. When they are similar, clues from one are used by the compressor to also compress the second one, so similar lyrics get more compression than lyrics that aren't related.
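For anyone who wants to try it, a rough sketch of NCD with zlib follows. The formula is the usual one, NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), where C is the compressed size; the three sample strings are invented, and zlib is just my choice of compressor here.

```python
import zlib


def csize(data: bytes) -> int:
    """Compressed size in bytes, used as a stand-in for complexity."""
    return len(zlib.compress(data, 9))


def ncd(x: str, y: str) -> float:
    """Normalized Compression Distance: near 0 for similar texts, near 1 for unrelated ones."""
    bx, by = x.encode("utf-8"), y.encode("utf-8")
    cx, cy, cxy = csize(bx), csize(by), csize(bx + by)
    return (cxy - min(cx, cy)) / max(cx, cy)


# Hypothetical samples: two near-identical verses and one unrelated text.
verse_a = ("I keep on walking down this empty road tonight, looking for a sign "
           "that you were ever mine, and every streetlight whispers that you are gone")
verse_b = ("I keep on walking down this empty road tonight, looking for a sign "
           "that you were ever mine, and every window whispers that you are home")
report = ("Quarterly revenue grew by seven percent, driven chiefly by logistics "
          "contracts in the northern region and a modest uptick in hardware sales")

print("similar verses:", round(ncd(verse_a, verse_b), 2))
print("unrelated text:", round(ncd(verse_a, report), 2))
```

One caveat: zlib only looks back 32 KB, so for long inputs the second text can no longer borrow from the first and the distance stops being meaningful. Lyrics are short enough that this shouldn't bite.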
And by doing that you can even discover who was the composer of the songs, i.e., the authorship of the lyrics, since each person usually has the same writing style... [2]
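As a toy illustration of that attribution idea: compute the distance from an unattributed lyric to a sample from each known writer and pick the closest. Everything here is hypothetical (the writer labels, the lyrics, the `attribute` helper), and a real attempt would need far more text per writer than this.

```python
import zlib


def ncd(x: str, y: str) -> float:
    """Normalized Compression Distance using zlib as the compressor."""
    bx, by = x.encode("utf-8"), y.encode("utf-8")
    cx = len(zlib.compress(bx, 9))
    cy = len(zlib.compress(by, 9))
    cxy = len(zlib.compress(bx + by, 9))
    return (cxy - min(cx, cy)) / max(cx, cy)


def attribute(unknown: str, known: dict[str, str]) -> str:
    """Return the label whose sample text is nearest to `unknown` by NCD."""
    return min(known, key=lambda writer: ncd(unknown, known[writer]))


# Made-up samples standing in for each writer's back catalogue.
known = {
    "writer_a": ("baby baby hold me tight tonight, oh oh oh, "
                 "never let me go, hold me tight tonight"),
    "writer_b": ("dust on the highway, whiskey in my cup, that old pickup "
                 "truck keeps rolling on, dust on the highway"),
}
unknown = "oh oh oh, baby hold me tight, never let me go tonight"

print(attribute(unknown, known))
```

Nearest-neighbour is the simplest thing that could work; it picks up shared phrases and vocabulary, so it will confuse a writer's style with a genre's cliches unless the samples are chosen carefully.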
The visualization animations as you scroll through this article are fantastic, and a great way to implement storytelling with data. The content is pretty interesting too; I like the emphasis on aligning data metrics with intuition to really make the point.
According to the main chart, songs in the top 10 are more likely to be repetitive, and that discrepancy has been growing. That raises the question, is there a causal link between being repetitive and reaching the top 10? If so, the answer to "Who's responsible for this madness?" is: the listeners.
I think this is partly true. However, there is a significant phenomenon where there is financial/political pressure to play songs (pay to play) which then creates demand for songs since they are then familiar to listeners. So companies/firms can pay stations to play otherwise unremarkable music (perhaps simple songs or those with the highest profit margins/potential available to stakeholders) and in a general sense this influences listeners to want to keep hearing those now familiar tunes. The whole pop-ephemerality of music-as-a-commodity feels like the new opiate of the masses - in general people are more content as long as they have a meaningless tune as the background soundtrack for their day-to-day repetitive tasks.
I wonder what the correlation is between acceptance of repetitive music and the repetitiveness of a listener's daily tasks.
Well of course it's the listeners. The most popular product in any market is virtually guaranteed to be a lowest common denominator because that's what will have the broadest appeal.