
IMO, this is a great example of how the policy of “owning your own data” actually leads to objectively “better” engineering solutions.

If Spotify leveraged my phone to calculate these statistics of my listening history (owned and stored locally), this article would have been written about an app update.

No need for a massive ad-hoc job with high-bandwidth round trips, just a simple app update.

It’s funny to imagine how engineers of the future might look back on our pride in this kind of computing, much as we look back in horror at how wasteful we once were with oil extraction back in the 1910s.



> If Spotify leveraged my phone to calculate these statistics of my listening history (owned and stored locally), this article would have been written about an app update.

Then the article would be about the challenges of battery life on users' phones, and trying to coordinate listening history on PC vs. phone.


To be clear, I’m not a data ownership nut; I just find the problem space interesting and underrated. Apologies for the hyperbole in the last paragraph. It was more tongue in cheek than serious.

An article on coordinating and compressing listening history (the particular challenges of distributed schema evolution at the “edge”) would have been much more interesting to read, IMO.

Also, I know you probably weren’t very serious about it, but I don’t think that a few SQL queries against “thousands of data points” (temporal rows, reading between the lines) would be a significant battery life drain! It would have still been interesting to see that benchmarked. But “big data” is cooler, I guess. :)


Fwiw, at roughly a hundred listens a day you clock about 30,000 a year, or 300,000 over a decade, which is approaching nontrivial levels for a phone, especially if you're doing anything more than an index scan.
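
As a rough back-of-envelope check on those totals, the sketch below converts a few daily listening rates into yearly and decade counts. The 33-byte row size is purely an assumption for illustration (a 16-byte uuid, two 8-byte timestamps, one byte of flags), not anything stated in the article, and it ignores index and page overhead.

```python
DAYS_PER_YEAR = 365

# Assumed raw row layout (hypothetical): uuid + start/end timestamps + flag byte.
ASSUMED_ROW_BYTES = 16 + 8 + 8 + 1


def listens_over(listens_per_day: int, years: int = 1) -> int:
    """Total listens for a constant daily rate over the given number of years."""
    return listens_per_day * DAYS_PER_YEAR * years


def raw_size_mb(rows: int) -> float:
    """Uncompressed size in megabytes under the assumed row layout."""
    return rows * ASSUMED_ROW_BYTES / 1_000_000


if __name__ == "__main__":
    for rate in (82, 100, 300):
        yearly = listens_over(rate)
        decade = listens_over(rate, years=10)
        print(f"{rate}/day: {yearly:,} per year, {decade:,} per decade, "
              f"about {raw_size_mb(decade):.1f} MB raw")
```

Under these assumptions, even a decade of history stays in the tens of megabytes, so the cost on a phone probably depends more on the query shape than on raw storage.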


Oh for sure. I was just going off the article’s own phrasing, which I agree sounds strange (seems too small). But if you think about it, very few people probably listen to 30k different songs on Spotify in a single year, so maybe it does make sense.

Of course, this all depends on the level of detail they want to store. It could be a uuid, a tstzrange, and some booleans about whether the song was liked, downloaded, etc.

Every year (or once you reach some storage threshold) you could “compress” this information by aggregating rows by song, and throwing away precision on the time stamps, until you’re just left with a uuid, full/partial play counters, and dates that the song was liked/unliked, downloaded/removed, etc. You could give users the option to modulate the level of detail in the records, to trade off storage constraints against recommendation UX.
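
A minimal sketch of that kind of compression pass, assuming a local SQLite store (3.24 or newer, for the upsert syntax). The table and column names here are made up for illustration, two text timestamps stand in for a tstzrange, and the liked/downloaded dates are left out to keep it short:

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Create an in-memory store with a raw table and a per-song summary table."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE listens (
            song_id    TEXT NOT NULL,    -- uuid of the track
            started_at TEXT NOT NULL,    -- 'YYYY-MM-DD HH:MM:SS' in UTC
            ended_at   TEXT NOT NULL,
            full_play  INTEGER NOT NULL  -- 1 if played to the end, else 0
        );
        CREATE TABLE song_summary (
            song_id       TEXT PRIMARY KEY,
            full_plays    INTEGER NOT NULL,
            partial_plays INTEGER NOT NULL,
            first_day     TEXT NOT NULL,  -- precision reduced to the date
            last_day      TEXT NOT NULL
        );
    """)
    return conn


def compress_before(conn: sqlite3.Connection, cutoff: str) -> int:
    """Fold raw rows older than `cutoff` into per-song counters, then delete them.

    Returns the number of raw rows removed.
    """
    with conn:  # one transaction: aggregate and delete together
        conn.execute("""
            INSERT INTO song_summary
                (song_id, full_plays, partial_plays, first_day, last_day)
            SELECT song_id,
                   SUM(full_play),
                   SUM(1 - full_play),
                   MIN(date(started_at)),
                   MAX(date(started_at))
            FROM listens
            WHERE started_at < ?
            GROUP BY song_id
            ON CONFLICT(song_id) DO UPDATE SET
                full_plays    = full_plays + excluded.full_plays,
                partial_plays = partial_plays + excluded.partial_plays,
                first_day     = MIN(first_day, excluded.first_day),
                last_day      = MAX(last_day, excluded.last_day)
        """, (cutoff,))
        cur = conn.execute("DELETE FROM listens WHERE started_at < ?", (cutoff,))
        return cur.rowcount
```

Calling something like `compress_before(conn, "2024-01-01")` once a year, or whenever a storage threshold is hit, would keep the raw table bounded while the summary grows only with the number of distinct songs.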

It’s a set of constraints that differs greatly from a huge ETL job, but my point is that this kind of edge work leads in interesting directions, too :)



