IMO, this is a great example for how the policy of “owning your own data” actual...

joshuamorton · on Feb 19, 2020

> If Spotify leveraged my phone to calculate these statistics of my listening history (owned and stored locally), this article would have been written about an app update.

Then the article would be about the challenges of battery life on users' phones, and trying to coordinate listening history on PC vs. phone.

gen220 · on Feb 19, 2020

To be clear, I’m not a data ownership nut; I just find the problem space interesting and underrated. Apologies for the hyperbole in the last paragraph. It was more tongue-in-cheek than serious.

An article on coordinating and compressing listening history (the particular challenges of distributed schema evolution at the “edge”) would have been much more interesting to read, IMO.

Also, I know you probably weren’t very serious about it, but I don’t think that a few SQL queries against “thousands of data points” (temporal rows, reading between the lines) would be a significant battery life drain! It would still have been interesting to see that benchmarked. But “big data” is cooler, I guess. :)

foota · on Feb 19, 2020

FWIW, you can clock around a hundred listens a day, for roughly 30,000 a year or 300,000 over a decade, which is approaching nontrivial levels for a phone, especially if you're doing anything more than an index scan.
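
For anyone who wants to measure this rather than guess, here is a rough sketch of a benchmark using Python's built-in `sqlite3`. The `plays` table, its columns, and the 5,000-song catalogue are assumptions made up for illustration, not anything from the article, and timings on a laptop will not match a phone.

```python
import random
import sqlite3
import time


def build_history(n_rows, n_songs=5_000, seed=0):
    """Create an in-memory table of synthetic plays (hypothetical schema)."""
    rng = random.Random(seed)
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE plays (song_id INTEGER, started_at INTEGER)")
    ten_years = 10 * 365 * 86_400  # seconds in roughly a decade
    rows = (
        (rng.randrange(n_songs), rng.randrange(ten_years))
        for _ in range(n_rows)
    )
    conn.executemany("INSERT INTO plays VALUES (?, ?)", rows)
    conn.commit()
    return conn


def top_songs(conn, limit=10):
    """Aggregate over the whole table; with no index this is a full scan."""
    return conn.execute(
        "SELECT song_id, COUNT(*) AS n FROM plays "
        "GROUP BY song_id ORDER BY n DESC, song_id LIMIT ?",
        (limit,),
    ).fetchall()


if __name__ == "__main__":
    conn = build_history(300_000)  # about a decade at ~80 listens a day
    t0 = time.perf_counter()
    top_songs(conn)
    print(f"top-10 over 300,000 rows: {time.perf_counter() - t0:.3f}s")
```

Swapping the query for joins or window functions would be the more honest test of the "anything more than an index scan" case.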

gen220 · on Feb 19, 2020

Oh for sure. I was just going off the article’s own phrasing, which I agree sounds strange (seems too small). But if you think about it, very few people probably listen to 30k different songs on Spotify in a single year, so maybe it does make sense.

Of course this all depends on the level of detail they want to store. It could be a uuid, a tstzrange, and some booleans about whether the song was liked, downloaded, etc.

Every year (or once you reach some storage threshold) you could “compress” this information by aggregating rows by song and throwing away precision on the timestamps, until you’re just left with a uuid, full/partial play counters, and the dates that the song was liked/unliked, downloaded/removed, etc. You could give users the option to modulate the level of detail in the records, to trade off storage constraints against recommendation UX.
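
A minimal sketch of that compaction step, assuming SQLite on the device. The table and column names are hypothetical. SQLite has no range type, so a single `started_at` timestamp stands in for the tstzrange, and the liked/downloaded flags are left out to keep it short.

```python
import sqlite3

SCHEMA = """
CREATE TABLE plays (
    song_id    TEXT NOT NULL,     -- e.g. a UUID string
    started_at TEXT NOT NULL,     -- ISO-8601 timestamp
    full_play  INTEGER NOT NULL   -- 1 = played through, 0 = partial
);
CREATE TABLE song_stats (
    song_id       TEXT PRIMARY KEY,
    full_plays    INTEGER NOT NULL,
    partial_plays INTEGER NOT NULL,
    first_played  TEXT NOT NULL,  -- date only; time of day is discarded
    last_played   TEXT NOT NULL
);
"""


def compact(conn, cutoff):
    """Fold raw plays older than `cutoff` into per-song counters.

    Returns the number of songs whose rows were aggregated.
    """
    rows = conn.execute(
        """
        SELECT song_id,
               SUM(full_play),
               SUM(1 - full_play),
               MIN(date(started_at)),
               MAX(date(started_at))
        FROM plays
        WHERE started_at < ?
        GROUP BY song_id
        """,
        (cutoff,),
    ).fetchall()
    with conn:  # upsert and delete commit together, or not at all
        conn.executemany(
            """
            INSERT INTO song_stats VALUES (?, ?, ?, ?, ?)
            ON CONFLICT(song_id) DO UPDATE SET
                full_plays    = full_plays + excluded.full_plays,
                partial_plays = partial_plays + excluded.partial_plays,
                first_played  = min(first_played, excluded.first_played),
                last_played   = max(last_played, excluded.last_played)
            """,
            rows,
        )
        conn.execute("DELETE FROM plays WHERE started_at < ?", (cutoff,))
    return len(rows)
```

Calling `compact(conn, "2020-01-01")` would leave the current year's plays at full detail and reduce everything older to one row per song. The upsert syntax needs SQLite 3.24 or newer.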

It’s a set of constraints that differs greatly from a huge ETL job, but my point is that this kind of edge work leads in interesting directions, too :)