They say it was caused by a faulty channel file. I don't know what a channel file is, and they claim not to rely on virus signatures, but antivirus products typically need the latest signatures at all times and probably poll for them once an hour or so. So I'm not surprised that an antivirus product wants to stay hyper-updated and that updates are rolled out immediately to everyone globally.
No, I'm not surprised either. But if you're operating at this kind of scale and with this level of immediate roll-out, I would expect:
* A staggered roll-out process, in which updated machines check in with metrics that say "this new version is OK" (aka "canary deployment"), and the update is paused or rolled back if they don't.
* Basic smoke testing of the files before they're pushed to any customers
* Validation that the file is OK before accepting an update (via a checksum or whatever, matched against the "this update works" automated test checksums)
* Fuzz tests to confirm that broken files don't brick the machine
Literally any of the above would have saved millions and millions of dollars today.
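To make two of those points concrete, here is a rough sketch of checksum validation and a staged roll-out. It is not how any real vendor does it: every name here (`accept_update`, `staged_rollout`, `apply_update`, `is_healthy`, the stage percentages and the failure threshold) is made up for illustration, and a real system would also need signing, timeouts, and an actual rollback path.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Hex SHA-256 digest of an update file's contents."""
    return hashlib.sha256(data).hexdigest()


def accept_update(data: bytes, tested_digests: set) -> bool:
    """Client-side check: accept a file only if its digest matches one
    recorded by the (hypothetical) automated test pipeline."""
    return sha256_hex(data) in tested_digests


def staged_rollout(hosts, apply_update, is_healthy,
                   stages_percent=(1, 10, 100), max_failure_rate=0.01):
    """Push an update to growing slices of the fleet.

    After each stage, the hosts updated in that stage are asked whether
    they are healthy. If too many are not, the roll-out stops.
    Returns (completed, updated_hosts); on failure the caller would
    pause and roll back the hosts listed in updated_hosts.
    """
    updated = []
    done = 0
    for percent in stages_percent:
        # Cumulative number of hosts that should be updated by this stage.
        target = min(len(hosts), max(1, len(hosts) * percent // 100))
        batch = hosts[done:target]
        for host in batch:
            apply_update(host)
        updated.extend(batch)
        done = max(done, target)

        failures = sum(1 for host in batch if not is_healthy(host))
        if batch and failures / len(batch) > max_failure_rate:
            return False, updated
    return True, updated
```

With a fleet of 1000 hosts and these made-up stages, a file that crashes every machine would stop after the first 10 hosts instead of reaching all 1000.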