I love k8s, but bringing back up a single app that crashed is a very different problem from "our k8s is down" - because if you think your k8s won't go down, you're in for a surprise.
You can also view a single k8s cluster as a single host: it will go down at some point (e.g. a botched upgrade, a cloud network partition, or something similar). Those failures are much less frequent, but also much harder to get out of.
Of course, if you have a multi-cloud setup with automatic (and periodically tested!) app migration across clouds, well then... Perhaps that's the answer nowadays... :)
> if you think your k8s won't go down, you're in for a surprise
Kubernetes is a remarkably reliable piece of software. I've administered a (large X) number of clusters, many with several years of lifetime each, with everything upgraded through the relatively frequent Kubernetes release cycle. We definitely needed the occasional maintenance window, but no, Kubernetes never unexpectedly crashed on us. Maybe I just got lucky, who knows. The closest we ever got was the underlying etcd cluster hitting heartbeat timeouts due to insufficient hardware, and etcd healed itself once the nodes were reprovisioned.
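If you want to watch for that kind of etcd trouble before it bites, a minimal sketch against etcd's clientv3 Go package, something like the one below, does the job (the endpoint addresses here are made up and TLS setup is omitted): it polls each member's status, so a lagging or unreachable node stands out.

```go
package main

import (
	"context"
	"fmt"
	"log"
	"time"

	clientv3 "go.etcd.io/etcd/client/v3"
)

func main() {
	// Hypothetical member addresses; a real cluster will also need TLS config.
	endpoints := []string{
		"http://10.0.0.1:2379",
		"http://10.0.0.2:2379",
		"http://10.0.0.3:2379",
	}

	cli, err := clientv3.New(clientv3.Config{
		Endpoints:   endpoints,
		DialTimeout: 5 * time.Second,
	})
	if err != nil {
		log.Fatalf("etcd client: %v", err)
	}
	defer cli.Close()

	// Ask every member for its own view of the cluster. An overloaded or
	// partitioned member shows up as an error or a stale raft index.
	for _, ep := range endpoints {
		ctx, cancel := context.WithTimeout(context.Background(), 3*time.Second)
		st, err := cli.Status(ctx, ep)
		cancel()
		if err != nil {
			fmt.Printf("%s: unreachable (%v)\n", ep, err)
			continue
		}
		fmt.Printf("%s: leader=%x raftTerm=%d raftIndex=%d dbSize=%dB errors=%v\n",
			ep, st.Leader, st.RaftTerm, st.RaftIndex, st.DbSize, st.Errors)
	}
}
```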
There's definitely a whole lotta stuff in the Kubernetes ecosystem that isn't nearly as reliable, but that has to be differentiated from Kubernetes itself (and the internal etcd dependency).
> You can view a single k8s also as a single host, which will go down at some point (e.g. a botched upgrade, cloud network partition, or something similar)
Managed Kubernetes services take care of the whole "botched upgrade" concern. And etcd is designed to tolerate cloud network partitions and recover: as long as a majority of members can still reach each other, the cluster keeps serving, and the partitioned minority simply catches up once the network heals.
Comparing this to sudden hardware loss on a single-VM app is, quite frankly, insane.
If you start using more esoteric features, the reliability of k8s goes down. Guess what happens when you enable the in-place vertical pod scaling feature gate?
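For anyone who hasn't run into it: that's the InPlacePodVerticalScaling gate, which lets the kubelet resize a running container's CPU/memory without recreating the pod. A rough sketch of what it looks like on the API side, using the k8s.io/api Go types; the pod name, image, and values are made up, and the field/constant names are from memory, so verify them against your client library version.

```go
package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/api/resource"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"sigs.k8s.io/yaml"
)

func main() {
	// Hypothetical pod relying on the InPlacePodVerticalScaling feature gate:
	// resizePolicy tells the kubelet it may change CPU in place, while a
	// memory change still restarts the container.
	pod := corev1.Pod{
		ObjectMeta: metav1.ObjectMeta{Name: "resize-demo"},
		Spec: corev1.PodSpec{
			Containers: []corev1.Container{{
				Name:  "app",
				Image: "nginx:1.27", // placeholder image
				ResizePolicy: []corev1.ContainerResizePolicy{
					{ResourceName: corev1.ResourceCPU, RestartPolicy: corev1.NotRequired},
					{ResourceName: corev1.ResourceMemory, RestartPolicy: corev1.RestartContainer},
				},
				Resources: corev1.ResourceRequirements{
					Requests: corev1.ResourceList{
						corev1.ResourceCPU:    resource.MustParse("250m"),
						corev1.ResourceMemory: resource.MustParse("128Mi"),
					},
				},
			}},
		},
	}

	// Print the manifest; actually resizing a live pod means patching
	// spec.containers[].resources on a cluster with the gate enabled.
	out, err := yaml.Marshal(pod)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```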
Because single hosts will always go down. Just a question of when.