I think Amazon uses an internal platform called Dynamo as a KV store. It's different from DynamoDB, so I'm thinking the outage could be either a DNS routing issue or some kind of node deployment problem.
Both of which seem to crop up in postmortems for these widespread outages.
They said the root cause was DNS for DynamoDB. Inside AWS, relying on DynamoDB is highly encouraged, so it's not surprising that a failure there would cascade broadly. The fact that EC2 instance launches are affected is surprising, though. Loops in the service dependency graph are known to be a bad idea.
It's not a direct dependency.
Route 53 is humming along... DynamoDB decided to edit its DNS records that are propagated by Route 53... they were bogus, but Route 53 happily propagated the toxic change to the rest of the universe.
DynamoDB is not going to set up its own DNS service or its own Route 53.
Maybe DynamoDB should have had tooling that tested DNS edits before sending them to Route 53, or Route 53 should have tooling to validate changes before accepting them. I'm sure smart people at AWS are yelling at each other about it right now.
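For what it's worth, a pre-flight check like that doesn't have to be fancy. Here's a rough sketch in Python of the kind of sanity test I mean, run before a change is ever submitted. To be clear, this is purely illustrative: `validate_dns_change` and the example hostname are made up, it uses only the standard library, and it says nothing about how AWS's actual tooling works.

```python
import ipaddress


def validate_dns_change(name, record_type, values):
    """Return a list of problems; an empty list means the change looks sane.

    Hypothetical pre-flight check, not a real Route 53 or AWS API.
    """
    problems = []

    # Require a fully qualified name so the record can't be misread as relative.
    if not name.endswith("."):
        problems.append(f"{name!r} is not a fully qualified name")

    # An empty record set would leave the name unresolvable.
    if not values:
        problems.append(f"{name!r} would be left with no {record_type} records")

    # For address records, every value must parse as the right IP version.
    if record_type in ("A", "AAAA"):
        wanted = 4 if record_type == "A" else 6
        for value in values:
            try:
                addr = ipaddress.ip_address(value)
            except ValueError:
                problems.append(f"{value!r} is not a valid IP address")
                continue
            if addr.version != wanted:
                problems.append(f"{value!r} is not an IPv{wanted} address")

    return problems


if __name__ == "__main__":
    # Hypothetical change that wipes out every address for the name.
    for problem in validate_dns_change("db.example.com.", "A", []):
        print("REJECTED:", problem)
```

The idea is that the caller refuses to submit anything that comes back with a non-empty list. It wouldn't catch a change that is well-formed but points at the wrong place, so it's a floor, not a guarantee.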