
For fun, I run a bare metal k8s cluster with my blog and other projects on it. I've spent my last three nights fighting bugs: volumes not attaching, nginx magically configuring itself incorrectly, and a whole bunch of other crap. This all just magically started happening, and crap like this seems to happen at least once a month. It's to the point where I spend at least one night a week babysitting the cluster.

I don't have to pay someone else to handle this, but if I did, I would get rid of k8s in a heartbeat. I've seen a devops team of only a few people manage tens of thousands of traditional servers, but I doubt such a small team could handle a k8s cluster of the same size.

I’m considering moving back to traditional architecture for my blog and other projects. K8s has been fun, but there’s too much magic everywhere.




No one has ever explained the point of it to me either.

I’ve heard it’s supposed to solve the problem of programs running differently on different machines. That’s a problem I’ve never encountered in my 12 years of experience.

But the types of issues you describe are very real and very time consuming.


> No one has ever explained the point of it to me either.

It makes sense if you use Docker. Docker containers need somewhere to live. If you want two copies of your service alive at all times, K8s is the thing that watches for crashes, restarts them, etc.
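To make that concrete, here's a minimal sketch of a Deployment that keeps two copies running (names and image are placeholders):

    # hypothetical Deployment that keeps two replicas of a service alive
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: my-service              # placeholder name
    spec:
      replicas: 2                   # K8s keeps two pods running at all times
      selector:
        matchLabels:
          app: my-service
      template:
        metadata:
          labels:
            app: my-service
        spec:
          containers:
            - name: my-service
              image: my-service:1.0   # placeholder image
              ports:
                - containerPort: 8080

If a container crashes, the kubelet restarts it; if a whole pod or node disappears, the Deployment controller schedules a replacement elsewhere.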


Well, sure, but what’s the point of the whole ecosystem?


The ecosystem isn't really anything more than the sum of its features.

I already mentioned K8s as an automatic container runner/restarter. But if you run two copies of a service, you need a load balancer to route traffic to them. You can program your own (more work), or download & run someone else's (less work). Or you can see what K8s provides [0] and do even less work than that.

If your services talk to one another, they could talk by hard-coded IP (maintenance nightmare), or by hostname. If they talk by hostname, then they need DNS to resolve those host names. Again, you can roll DNS yourself, or you can see what K8s gives you [1].
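To make the last two points concrete: a single Service object gives you both the in-cluster load balancing and the DNS name. A minimal sketch (placeholder names, default namespace assumed):

    # hypothetical Service fronting the replicas of my-service
    apiVersion: v1
    kind: Service
    metadata:
      name: my-service        # resolvable as my-service.default.svc.cluster.local
    spec:
      selector:
        app: my-service       # traffic is spread across all pods with this label
      ports:
        - port: 80            # port clients connect to
          targetPort: 8080    # port the containers listen on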

And on and on. Firewalls, https, permissions, password/secrets management.

There's one more thing to say about K8s which is that it has become a bit of a defacto standard. So you don't need to relearn a completely new way of doing this stuff if you decide to switch jobs / cloud providers.

[0] https://kubernetes.io/docs/concepts/services-networking/ingr... [1] https://kubernetes.io/docs/concepts/services-networking/dns-...


K8s gives you a lot for free, until it doesn't. I'm not saying the old way is better, but where it wins is that it's easier to fix when shit hits the fan. A bad day on k8s will take you completely offline, while a bad day on a single server may or may not take you completely offline (depending on your backup situation and how good your devops is).


You're not making an apples-to-apples comparison. You can run k8s on a single server or run 1,000 bare metal servers. The number of servers and how you deploy to them are orthogonal concerns, not mutually exclusive choices.

You also seem to be implying that by running a single bare metal server you have eliminated any chance of downtime, which isn't true.

For example, if your process crashes on bare metal, you go down unless you have some kind of supervisor that watches and restarts the process. If you're not using Kubernetes as that supervisor, you need to set one up using some other tool. At the end of the day you can't eliminate all tooling/downtime.
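For what it's worth, outside k8s that supervisor can be as small as a restart policy. A minimal sketch with Docker Compose (service and image names are placeholders):

    # hypothetical docker-compose.yml on a single server
    services:
      blog:
        image: my-blog:latest   # placeholder image
        restart: always         # the Docker daemon restarts the container if it exits
        ports:
          - "8080:80"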


I was just saying that no matter what, all your eggs are in one basket. K8s is a program that can fail like any other program. If it does fail (like etcd getting corrupt or even the process itself crashing for some reason) you can end up with a collection of servers that can't do anything (I'm actually in this position right now). It's exceedingly rare that this happens, but it's also exceedingly rare with regular servers. The difference is cost, right?

If a single server fails, you may be offline, but there are well-trodden paths to come back online. Your material cost is the cost of that single server. If k8s goes down, oh boy. Not only is it very complex, requiring knowledge of how it works to diagnose and recover from, but there can be zero documentation on how to recover. You are now also paying for a cloud of bricks.


A random example from $dayjob: vendors like ESRI ship products that are actually a dozen separate components spread across five sets of servers with certificates and load balancers everywhere. My customer has 7 sets of them due to acquisitions, each with dev, test, and prod instances. That's 21 sets of a dozen servers or so. Just keeping up with OS updates and app patching is nearly a full-time job!

Or just apply their official helm chart… and you’re pretty much done. You’ll also get better efficiency because the various environments will all get bin-packed together.

Is it perfect? No, but it's better than doing it yourself!


Consider the alternative in conditions where you need various forms of scalability in a cloud agnostic way. Especially when you have a complicated system of many services.


It makes redundancy, self-healing, scaling, rolling updates, rollbacks, and the like easy, assuming that your services are stateless.
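For instance, the rolling update behavior is just a few lines in a Deployment spec. A minimal sketch (excerpt only, placeholder values):

    # excerpt from a hypothetical Deployment manifest
    spec:
      replicas: 3
      strategy:
        type: RollingUpdate
        rollingUpdate:
          maxUnavailable: 1   # at most one pod down during an update
          maxSurge: 1         # at most one extra pod created during an update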

If you do not need these features, k8s is not the thing to use, unless you already have the skill set anyway.

Things get messy when state and persistence are involved; I'd prefer to have my backend DB not on k8s and link the services against it.


I think some use cases might be: running/testing software on a variety of hardware configurations, and sharing a limited pool of machines among people/projects.


K8s was never meant to be used for running a blog :) It was built to support Google-scale deployment, with probably dozens of engineers just supporting the live clusters as they stumble into various bizarre states.


Well, yeah. I just stuck my blog on there to reduce infrastructure costs for myself. The cluster runs much bigger things besides my blog :)


> I don’t have to pay someone else to handle this, but if did, I would get rid of k8s in a heartbeat. I’ve seen a devops team of only a few people manage tens of thousands of traditional servers, but I doubt such a small team could handle a k8s cluster of the same size.

This has been my experience with a lot of the "we need to be cloud native! containers!" mantra in the enterprise. Some exec gets it in their head that this is a must-do (and probably collects non-trivial "referral agent fees"), and all of the young, hip developer types are happy to cheerlead it.

Two years later OpEx is exploding, most of the processes haven't yet been converted to be in the cloud, and the environment isn't noticeably better or different. It sucks, just sucks in a new and more expensive way that gives you less control of your data.

Seen this at 3 F500 orgs and with multiple cloud providers, including the big 3 plus one of the well-known second-tier providers.


Hey, you can try Nomad. It works nicely for small-to-medium projects.

It works well with Terraform, and extending with Nomad clients is a breeze.

I set up my personal bare metal servers with Nomad infra and never looked back.


How do you solve logging and storage? Those were two issues that caused me to leave it behind. With k8s, there is Longhorn for storage, so I can move databases around and have volumes replicated to deal with disk failures. Is there anything like that for Nomad?
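For reference, on the k8s side that mostly comes down to a StorageClass choice. A minimal sketch (placeholder name and size; "longhorn" is the class a default Longhorn install creates):

    # hypothetical PersistentVolumeClaim backed by Longhorn
    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      name: blog-db-data           # placeholder name
    spec:
      accessModes:
        - ReadWriteOnce
      storageClassName: longhorn   # Longhorn replicates the volume across nodes
      resources:
        requests:
          storage: 10Gi            # arbitrary size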


Do you mean this? https://developer.hashicorp.com/nomad/tutorials/stateful-wor...

Honestly, I haven't gotten into a position where this has become critical, and I've always managed to stay ahead.

So far (2 yrs), so good. I am mostly a one-man show and I found k8s a bit too much; there is always something.

Maybe I was doing something wrong or didn't know how to plan better, idk :)

Edit//typos


Nowadays when I try to scaffold quick ideas, I just start a Cloudflare Worker. You get a URL, cron, key-value store, and an Express-like JS server going with the click of a button. I don't even have to npm i.

I can definitely see the appeal of tinkering with 'advanced tech' as a personal hobby though. Because now I am pretty sure you know more about K8s than me :)


> My last three nights have been fighting bugs

> I spend at least one night a week babysitting the cluster

...

> K8s has been fun

This is why everything sucks now.


Well, for a counter anecdote, my bare metal K8s cluster hasn't bugged out on me for months, living through multiple version upgrades.

To each his own, I guess.


I'm quite jealous. How big is your cluster? I've got several hundred CPUs, nodes just for storage, and multi-region services. It's quite a beast.


I have 80 nodes, totaling ~700 CPU cores, all in the same DC though.


Use Docker Compose or Portainer.



