> I find that pretty embarrassing for a company like Cloudflare, which powers such relevant parts of the internet.
Bah, who cares about such unimportant details, what's important is that ~dev velocity~ was reaaally high right until that moment!
> We were also far too lax about requiring new products and their associated databases to integrate with the high availability cluster. Cloudflare allows multiple teams to innovate quickly. As such, products often take different paths toward their initial alpha. While, over time, our practice is to migrate the backend for these services to our best practices, we did not formally require that before products were declared generally available (GA). That was a mistake as it meant that the redundancy protections we had in place worked inconsistently depending on the product.
Complete and utter management failure. And customers apparently are sold what Cloudflare internally considers to be alpha quality software?
> Complete and utter management failure. And customers apparently are sold what Cloudflare internally considers to be alpha quality software?
This has been my experience with AWS and GCP as well. Assume anything that's under 3 years old is not really GA quality no matter what they say publicly.
I've been involved with some new service launches at AWS, and it's a strict requirement that everything goes through some rigorous operational and security reviews that cover exactly these issues before the service can be launched as GA. Feature-wise people might consider them "alpha", but when it comes to the resilience and security of the launched features, they are held to much higher standards than what is being described in this post-mortem.
Your operational reviews at AWS must be lacking then (surprise, surprise), because there are so many instances where something will be released in alpha yet the documentation will still be outdated, stale, and incorrect. LOL.
I think you misunderstand what's being talked about in this thread. "Operations" in this context has nothing to do with external-facing documentation; it refers to the resilience of the service, e.g. ensuring it doesn't stop working when a single data center experiences a power outage.
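To make that concrete, here's a minimal sketch (in Python, with made-up endpoint names) of the kind of cross-data-center failover an operational-readiness review like that is checking for. It's an illustration of the concept only, not anything from AWS's actual review process:

```python
import urllib.request
import urllib.error

ENDPOINTS = [
    "https://us-east.example.internal/health",  # primary data center (hypothetical)
    "https://us-west.example.internal/health",  # standby data center (hypothetical)
]

def first_healthy(endpoints=ENDPOINTS, timeout=2.0):
    """Return the first endpoint whose health check answers, so that a
    single data-center outage degrades the service instead of killing it."""
    for url in endpoints:
        try:
            with urllib.request.urlopen(url, timeout=timeout) as resp:
                if resp.status == 200:
                    return url
        except (urllib.error.URLError, OSError):
            continue  # that data center is down or unreachable; try the next
    raise RuntimeError("all data centers unreachable")
```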
"It stopped working because you did XYZ which you shouldn't have done despite it not being documented as something you shouldn't do" isn't different to a customer than a data center going down. For example, I'm sure the EKS UI was really resilient which meant little when random nodes dropped from a cluster due to the utter crap code in the official CNI network driver. My point wasn't that every cloud provider released alpha level software by the same definition but that by a customer's definition they all released alpha level software and label it GA.
> This has been my experience with AWS and GCP as well. Assume anything that's under 3 years old is not really GA quality no matter what they say publicly.
GCP runs multi-year betas of services and features, so I'm doubtful there were still things not ironed out by GA. Do you have some examples?
Having worked at companies with varying degrees of autonomy, I've found that a more flexible structure allows for building systems that are ultimately more resilient. Of course, there are ways to do it poorly, but that doesn’t mean it’s a “complete and utter management failure”.
I’m going to leave out some details, but there was a period of time where you could bypass Cloudflare’s IP whitelisting by using Apple’s iCloud Private Relay service. This was fixed but, to my knowledge, never disclosed.
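Since the details are deliberately left out, here's only the generic failure mode with shared relays: an allowlist keyed on the peer IP extends trust to everyone behind a trusted egress. A minimal sketch, with all addresses hypothetical (drawn from a documentation prefix) and no claim about Cloudflare's actual implementation:

```python
from ipaddress import ip_address, ip_network

# Hypothetical "trusted" range (a documentation prefix, not real data).
ALLOWLIST = [ip_network("203.0.113.0/24")]

def is_allowed(peer_ip: str) -> bool:
    # Naive check: trusts whoever the immediate peer is. If a shared
    # relay's egress IP falls inside an allowlisted range, every user
    # of that relay inherits the trust, not just the intended client.
    addr = ip_address(peer_ip)
    return any(addr in net for net in ALLOWLIST)

# A relay egress that happens to sit inside the allowlisted range lets
# arbitrary clients behind it through the "IP whitelist":
print(is_allowed("203.0.113.42"))  # True, regardless of who the real client is
```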
There was a time when they were dumping encryption keys into search engine caches for weeks, and had the audacity to claim here that the issue was "mostly" solved, until they were called out on it by Google's Project Zero team...
There still exist many bypasses that work in a lot of cases; there are even services for it now. Wouldn't be surprised if that or something similar was the technique employed.
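For context, the standard mitigation against these origin-IP bypasses is to refuse any traffic that didn't arrive through Cloudflare at all, using the ranges Cloudflare publishes at https://www.cloudflare.com/ips/. A sketch of the check (the range list is truncated here, and in practice this is usually enforced at the firewall rather than in application code):

```python
from ipaddress import ip_address, ip_network

# Two of Cloudflare's published IPv4 ranges
# (https://www.cloudflare.com/ips/); the remaining entries are elided.
CLOUDFLARE_RANGES = [
    ip_network("173.245.48.0/20"),
    ip_network("103.21.244.0/22"),
    # ...
]

def came_through_cloudflare(peer_ip: str) -> bool:
    """True if the connecting address belongs to Cloudflare's edge,
    i.e. the request was proxied rather than sent to the origin directly."""
    addr = ip_address(peer_ip)
    return any(addr in net for net in CLOUDFLARE_RANGES)
```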