> all of the things your company should be doing and didn't
Processes need to match the potential risk.
If your company is doing some inconsequential social app or whatever, then sure, go ahead and move fast and break things if that's how you roll.
If you are a company, let's call them Crowdstrike, that has access to push root privileged code to a significant percentage of all machine on the internet, the minimum quality bar is vastly higher.
For this type of code, I would expect a comprehensive test suite that covers everything and a fleet of QA machines representing every possible combination of supported hardware and software (yes, possibly thousands of machines). A build has to pass that and then get rolled into dogfooding usage internally for a while. And then very slowly gets pushed to customers, with monitoring that nothing seems to be regressing.
Anything short of that is highly irresponsible given the access and risk the Crowdstrike code represents.
> A build has to pass that and then get rolled into dogfooding usage internally for a while. And then very slowly gets pushed to customers, with monitoring that nothing seems to be regressing.
That doesn't work in the business they're in. They need to roll out definition updates quickly. Their clients won't be happy if they get compromised while CrowdStrike was still doing the dogfooding or phased rollout of the update that would've prevented it.
> That doesn't work in the business they're in. They need to roll out definition updates quickly.
Well clearly we have incontrovertible evidence now (if it was needed) that YOLO-pushing insufficiently tested updates to everyone at once does not work either.
This is being called in many places (righfully) the largest IT outage in history. How many billions will be the cost? How many people died?
Processes need to match the potential risk.
If your company is doing some inconsequential social app or whatever, then sure, go ahead and move fast and break things if that's how you roll.
If you are a company, let's call them Crowdstrike, that has access to push root privileged code to a significant percentage of all machine on the internet, the minimum quality bar is vastly higher.
For this type of code, I would expect a comprehensive test suite that covers everything and a fleet of QA machines representing every possible combination of supported hardware and software (yes, possibly thousands of machines). A build has to pass that and then get rolled into dogfooding usage internally for a while. And then very slowly gets pushed to customers, with monitoring that nothing seems to be regressing.
Anything short of that is highly irresponsible given the access and risk the Crowdstrike code represents.