The unresolved foreign keys are indeed unfortunate; I wondered about them myself when I got my takeouts in the past. My explanation was that the referenced data isn't actually available in the same datastore to query or join: maybe it's a constant, or it lives in some other system that doesn't hold personal data. Still not nice, of course.
I think the wait time and the many download buttons were discussed extensively in other comments here. With cold storage as the explanation for the duration, and no legal obligation to make the takeout _convenient_, those also have a pretty good explanation, I would say.
Yup. I agree. The wait time doesn't make sense. They should be able to spin up extra servers from the spot market in seconds. Even if they're using Glacier, that should only be a few hours.
I wonder if they execute the 74 data queries in serial to drag it out.
And the multiple downloads are just bogus.
That being said, I agree with the general point that the article is a bit overly dramatic. Amazon does a pretty good job with the request. It just takes too long.
I helped build a system for privacy compliance at a large non-FAANG tech company. Honestly, 19 days seems crazy, but this is what we dealt with:
It’s 2018 and you have to bolt this mass export/delete onto every stateful service in your company. Many of these are “critical” services that are not actively worked on and have a very limited maintenance budget. That is, some team with a lot of existing responsibilities absorbed the service along the way, and they have no bandwidth for this.
In some cases their mechanisms for retrieval/deletion were pretty egregious, so we agreed on a rate limit: we would queue these requests up and handle all of the paperwork. You get 30 days to comply, and if you need another 30, all you have to do is send an update within the first 30.
So, quite possibly, Amazon has a rate limit and a queue on at least a handful of backend services, and it truly does not matter as long as the queue stays under 60 days.
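A setup like that can be sketched as a rate-limited queue sitting in front of the slow backends. Everything here (class name, daily limit, deadline check) is hypothetical and just illustrates why a multi-week turnaround can still be compliant:

```python
import heapq
from datetime import date, timedelta

# Hypothetical sketch: export requests are queued and drained at a fixed
# per-day rate, with a check against the 30+30 day GDPR window.
MAX_REQUESTS_PER_DAY = 50      # rate limit agreed with the backend team
DEADLINE = timedelta(days=60)  # 30 days, plus a 30-day extension

class ExportQueue:
    def __init__(self):
        self._queue = []  # (received_date, user_id), oldest first

    def submit(self, user_id, received: date):
        heapq.heappush(self._queue, (received, user_id))

    def drain(self, today: date):
        """Process up to the daily limit; flag anything past the deadline."""
        batch, overdue = [], []
        for _ in range(min(MAX_REQUESTS_PER_DAY, len(self._queue))):
            received, user_id = heapq.heappop(self._queue)
            if today - received > DEADLINE:
                overdue.append(user_id)  # compliance breach: escalate
            batch.append(user_id)
        return batch, overdue
```

As long as `drain` runs daily and the arrival rate stays below the drain rate, every request clears the queue well inside the 60-day window without any single backend being hammered.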
I've worked at an organization with a similar timeframe for some types of data requests (B2B, not GDPR-style ones). There were many parts of the organization which were mismanaged, but that wasn't one of them. That type of data request ("get all my data") involved walking through all the data we had. It wasn't indexed in a way which made it easy to grab.
This was an expensive batched job we ran monthly. We spun up a cluster of cloud machines. A map-reduce style operation would organize the data by customer. We'd ship it off to all the customers who requested it that month.
Adding appropriate indexes or similar would have been man-years of engineering work. This involved, for example, walking through server logs line-by-line and seeing which ones were associated with which customer.
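The "map" step of a job like that can be sketched as a line-by-line scan that buckets log lines by customer. The log format, regex, and function name below are made up for illustration; real logs would need a parser per service:

```python
import re
from collections import defaultdict

# Hypothetical log format: each line may carry a "customer_id=..." field.
CUSTOMER_RE = re.compile(r"customer_id=(\w+)")

def bucket_by_customer(log_lines, requested_ids):
    """Group raw log lines by customer, keeping only customers
    who actually requested an export this month."""
    buckets = defaultdict(list)
    for line in log_lines:
        m = CUSTOMER_RE.search(line)
        if m and m.group(1) in requested_ids:
            buckets[m.group(1)].append(line)
    return dict(buckets)
```

In a real map-reduce job this would run per log shard, with a reduce step concatenating each customer's buckets into one export archive. That it has to touch every line of every log is exactly why the job is expensive and only worth running once a month.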
There wasn't a compelling business case to do that. For normal operations, once a month was fine. If a customer had a particular need, we could hypothetically do a one-off request out-of-line, but customers used the data for the kinds of analytics where a one-month delay wasn't an issue.
I know of other pipelines with similar delays, for example, due to lack of automation. A person runs a task once a month, and automation would cost more than a person.
I wouldn't chalk this up to dark patterns so much as to speeding things up having zero business value to Amazon. I just walked through the process, and at least the first two steps seemed very normal. Amazon sometimes does outrageous things, but here, I saw nothing to get outraged about.
I wouldn't be entirely surprised if there was a human involved in gathering some of the data. If requests for data are rare enough, it might be more economical to pay someone in a customer support farm to collect some data than to pay for developing and maintaining an automated process. At least in the short term. Otoh, not automating something like this seems out of character for Amazon.
The wait time could be explained by some data requiring manual work. Maybe there is an offline hard disk out there labeled "Users A-D - April 2003".
What I thought were valid points from the article:
- Unclear data: "cryptic strings of numbers like '26,444,740,832,600,000'" for various search queries. This is easily the worst offender IMO.
- A wait time of 19 days
- Separating the download into 74 buttons