> No one has ever explained the point of it to me either.
It makes sense if you use Docker. Docker containers need somewhere to live. If you want two copies of your service alive at all times, K8s is the thing that will watch for crashes, restart the containers, etc.
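Roughly what that looks like, as a sketch (the name my-service and the image are placeholders, not anything real):

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: my-service
    spec:
      replicas: 2                     # keep two copies alive at all times
      selector:
        matchLabels:
          app: my-service
      template:
        metadata:
          labels:
            app: my-service
        spec:
          # crashed containers are restarted automatically; restartPolicy
          # defaults to Always for a Deployment's pods
          containers:
          - name: my-service
            image: example/my-service:1.0   # placeholder image
            ports:
            - containerPort: 8080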
The ecosystem isn't really anything more than the sum of its features.
I already mentioned K8s as an automatic container runner/restarter. But if you run two copies of a service, you need a load balancer to route traffic to them. You can program your own (more work), or download & run someone else's (less work). Or you can see what K8s provides [0] and do even less work than that.
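The built-in option is a Service, which spreads traffic across whatever pods match a label selector. A minimal sketch, reusing the placeholder names from above:

    apiVersion: v1
    kind: Service
    metadata:
      name: my-service
    spec:
      selector:
        app: my-service          # matches both replicas of the Deployment above
      ports:
      - port: 80                 # what callers connect to
        targetPort: 8080         # what the containers listen on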
If your services talk to one another, they could talk by hard-coded IP (maintenance nightmare), or by hostname. If they talk by hostname, then they need DNS to resolve those host names. Again, you can roll DNS yourself, or you can see what K8s gives you [1].
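The nice part is that the same Service object doubles as the DNS entry: cluster DNS resolves its name, so callers never see pod IPs. A rough sketch of a client pod using it (the curl image and URL are just for illustration):

    apiVersion: v1
    kind: Pod
    metadata:
      name: caller
    spec:
      containers:
      - name: caller
        image: curlimages/curl            # placeholder client image
        # the hostname is resolved by cluster DNS; the fully qualified form
        # is my-service.default.svc.cluster.local
        command: ["curl", "http://my-service"]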
And on and on. Firewalls, https, permissions, password/secrets management.
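Secrets are the same story: there's a built-in primitive you can mount or inject instead of rolling your own. A sketch with an obviously fake value:

    apiVersion: v1
    kind: Secret
    metadata:
      name: my-service-secrets
    type: Opaque
    stringData:
      DB_PASSWORD: not-a-real-password    # fake value, for illustration only

    # the Deployment's container spec can then pull it in with:
    #   envFrom:
    #   - secretRef:
    #       name: my-service-secrets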
There's one more thing to say about K8s, which is that it has become a bit of a de facto standard. So you don't need to learn a completely new way of doing this stuff if you decide to switch jobs or cloud providers.
K8s gives you a lot for free, until it doesn't. I'm not saying the old way is better, but where it is better is that it's easier to fix when the shit hits the fan. A bad day on K8s will take you completely offline, while a bad day on a single server may or may not take you completely offline (depending on your backup situation and how good your devops is).
You're not making an apples-to-apples comparison. You can run K8s on a single server or run 1,000 bare metal servers. The number of servers and how you deploy to them are orthogonal concerns, not mutually exclusive choices.
You also seem to be implying that by running a single bare metal server you have eliminated any chance of downtime, which isn't true.
For example, if your process crashes on bare metal, you go down unless you have some kind of supervisor that watches and restarts the process. If you're not using Kubernetes as that supervisor, then you need to set one up with some other tool. At the end of the day you can't eliminate all tooling or downtime.
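For what it's worth, the K8s version of that supervisor is declarative. A rough sketch (placeholder image, and the /healthz endpoint is assumed to exist):

    apiVersion: v1
    kind: Pod
    metadata:
      name: supervised-app
    spec:
      restartPolicy: Always          # plain crashes/exits get restarted
      containers:
      - name: app
        image: example/app:1.0       # placeholder image
        livenessProbe:               # kubelet restarts the container if this keeps failing
          httpGet:
            path: /healthz           # assumed health endpoint
            port: 8080
          periodSeconds: 5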
I was just saying that no matter what, all your eggs are in one basket. K8s is a program that can fail like any other program. If it does fail (etcd getting corrupted, or the process itself crashing for some reason), you can end up with a collection of servers that can't do anything (I'm actually in this position right now). It's exceedingly rare that this happens, but it's also exceedingly rare with regular servers. The difference is cost, right?
If a single server fails, you may be offline, but there are well-trodden paths to come back online. Your material cost is the cost of that single server. If K8s goes down, oh boy. Not only is it very complex, requiring knowledge of how it works to diagnose and recover from, but there can be zero documentation on how to recover. You are now also paying for a cloud of bricks.
A random example from $dayjob: vendors like ESRI ship products that are actually a dozen services spread across five sets of servers, with certificates and load balancers everywhere. My customer has 7 sets of them due to acquisitions, each with dev, test, and prod instances. That's 21 sets of a dozen servers or so. Just keeping up with OS updates and app patching is nearly a full-time job!
Or just apply their official helm chart… and you’re pretty much done. You’ll also get better efficiency because the various environments will all get bin-packed together.
Is it perfect? No, but it's better than doing it yourself!
Consider the alternative in conditions where you need various forms of scalability in a cloud-agnostic way, especially when you have a complicated system of many services.