To be honest, the API versions have been a lot more stable recently, but back in ~2019 when I first used Kube in production, basic APIs were getting deprecated left and right, four times a year. In the end, yes, the problems are "on you", but it's so easy to miss, and the results are so disastrous for a platform whose selling points are toughness, resilience, and self-healing.
Security compliance requires all sorts of "invasive" tooling to ensure your client workstations and servers are "safe". Sadly it's mostly a checkmark, and it often has dated and arbitrary requirements. As far as I know, CrowdStrike was one of the easier ones to set up, albeit expensive.
A. Doing security is expensive and viewed as a cost burden at a lot of non-technically-focused companies. Lots of businesses hedge their bets, hoping that a security incident won't be as expensive or detrimental as maintaining a great security posture. Sadly, they often aren't wrong either.
B. Security compliance standards are dated and opinionated, requiring rigid solutions to complex, ever-changing security threats.
Both of those push organizations toward whatever tooling offers the least resistance to implement while still letting them claim "secure".
Additionally, IT and Operations teams are constantly picking up more duties and can be some of the first teams to get rightsized and viewed as "cost centers" at some companies. I've seen teams reduced 50-80% over the years while expectations went up; security compliance becomes the last item on the list and gets the least energy and attention.
Basically a small OS that props itself up and lets you create a Kubernetes cluster, or adopt nodes into one. It seems to work well in my experience and is pretty easy to get set up.
100% this. There are also exciting projects like Talos, Rancher, and the like that make self-hosting Kubernetes entirely more manageable.
So much saturation in this space comes from people trying to create one-off solutions, which on some level I admire. However, the further off the main path you go, the more you lock yourself into problems you can't troubleshoot or edge cases that aren't supported.
Abstraction these days is alluring, and it's cool! However, you want something well known, well supported (ideally by multiple companies), and well documented. The hate for understanding Kubernetes is just hate for having to understand layers of orchestration, or worse, the layers behind the application.
If it's too complicated, then you might not need it. Any platform you use will have those same layers; it just depends on how much is assumed or exposed to you. If you don't want to see any dials or options, use a managed solution, not a roll-your-own platform tool. That's assuming, of course, that a few virtual machines managed by hand don't satisfy your needs; if they do, you don't need a platform solution (and hopefully it's not production).
Yeah, I agree. It's unfortunate that in most organizations security compliance is often just making things as hard as possible. Oftentimes DevOps gets to be the face of that, and trust me, they sure as hell don't want to have to hold hands even more.
Can you clarify what you mean by "leaks implementation details everywhere"?
I like to think of Kubernetes as a big orchestration platform where you can choose to use what you need. If an ingress and pods work, then use that; otherwise extend it and throw an operator up for what you need (it likely already exists; see the sketch below for the rough shape of one).
Cilium, for instance, is great for that, and so are Istio and the like. They aren't hard; you just have to understand networking... which takes nearly the same energy as running it on another orchestration tool or raw on a network device.
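To give a flavor of what "throwing an operator up" looks like when one doesn't already exist: here's a minimal sketch using controller-runtime. The AppReconciler name is invented for this example, and a real operator would usually watch a Custom Resource rather than Deployments; treat this as the shape of the pattern, not a finished implementation.

```go
// Minimal operator sketch with sigs.k8s.io/controller-runtime.
// Hypothetical example: AppReconciler is an invented name, and real
// operators typically reconcile a Custom Resource, not Deployments.
package main

import (
	"context"

	appsv1 "k8s.io/api/apps/v1"
	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

type AppReconciler struct {
	client.Client
}

// Reconcile is called whenever a watched object changes; its job is to
// drive the observed state of the cluster toward the desired state.
func (r *AppReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
	var deploy appsv1.Deployment
	if err := r.Get(ctx, req.NamespacedName, &deploy); err != nil {
		// The object may have been deleted since the event was queued.
		return ctrl.Result{}, client.IgnoreNotFound(err)
	}
	// ... compare spec vs. status here and create/patch whatever is missing ...
	return ctrl.Result{}, nil
}

func main() {
	mgr, err := ctrl.NewManager(ctrl.GetConfigOrDie(), ctrl.Options{})
	if err != nil {
		panic(err)
	}
	err = ctrl.NewControllerManagedBy(mgr).
		For(&appsv1.Deployment{}). // watch Deployments cluster-wide
		Complete(&AppReconciler{Client: mgr.GetClient()})
	if err != nil {
		panic(err)
	}
	if err := mgr.Start(ctrl.SetupSignalHandler()); err != nil {
		panic(err)
	}
}
```

The nice part of the model is that the control loop is the whole interface: the same reconcile pattern covers everything from certificates (Cert-Manager) to Kafka clusters (Strimzi).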
You shouldn't have to think about all the implementation details of your deployment target. There shouldn't be platform engineers or Kubernetes experts. Nobody should be writing YAML, or getting paid to set up Istio. Nobody should have to learn the Kubernetes architecture or know about the kubelet or eBPF. The tools should be simple enough, with good defaults, that application developers just click a button and have their code run somewhere. Right now platform engineers fill that gap, because the underlying tech doesn't.
IMO you are thinking too much like an engineer if you are saying you "just have to understand networking". Why? It's always better when the problem gets solved in a way that lets you not think about it too much. Right now SREs and the platform team do that for application developers, because Kubernetes only does it halfway.
Infrastructure/DevOps/SRE is purely a means to an end: getting actual applications (the ultimate source of all the value in software) to run. It's an obstacle. Right now the obstacle is bigger than it needs to be.
So there should be magic on every layer except for the application?
That will never happen; the only thing you can do is pay someone like Heroku to take care of that for you. Or, if your project is small and plain enough, you run "serverless", which is just routing to another platform team, as I'm sure you're aware.
It's complicated because there's a lot that goes on, Kubernetes or not.
You literally use magic like that all the time with Linux, compilers, language runtimes. Magic is perfectly good when it works. Being able to do anything you want with a quick “presto” is amazing. Nobody should have to learn arcane spells, study the ancient tomes, and communicate with beings of the ethereal plane - somebody just needs to figure out the next “presto”. And historically speaking, usually somebody does when there’s an incentive to do so.
In other words - Serverless is another platform team in the same way Linux is another OS team or Java is another language team. And for 99.9% of companies a language team or OS team would be absurd. There’s a big incentive for platform work to go the same way.
Ops here. In a simple Kubernetes environment, you don't have to know networking. However, few environments are simple, and abstracting X away becomes an extremely difficult job once business requirements collide with the abstraction.
The obstacle is big most of the time because most applications are not easy to run. DevOps mostly came around because Devs kept flinging balls of mud over the wall, landing on the Ops side with a splat, and then screaming when we couldn't build nets to catch the mud ball and Ops was covered in mud. Sure, it failed, just like DevSecOps fails, because most of the time Devs don't care about anything other than closing Jira tickets and going home.
I think that devs really should have a decent understanding of Kubernetes. It is essentially the operating system for any app that needs more than one computer.
You don't need any of that; EKS works out of the box with Fargate.
But companies don't want that; they want to support both EKS and data centers, which means supporting the "implementation" side of all of the interfaces, which means getting down into the details.
The real problem is that every platform team, deep down, wants to rewrite EKS, and they often do, which I would describe as "a giant money pit".
I think those products are much better than k8s for many use cases, but I don’t think that anybody has really delivered the right thing in that space yet.
Just look at the release notes for every major release. They focus on what's new within Kubernetes architecture instead of what's new that benefits users.
Things like having to remove finalizers appended onto resources when you delete them and they hang (see the sketch below).
Damn near everything about Custom Resources.
This isn't a dig at Kubernetes; I love using it. However, I agree that it leaks implementation details everywhere.
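For anyone who hasn't hit the finalizer thing: the usual escape hatch is to patch the finalizer list to empty so the delete can complete. Here's a rough Go sketch using the dynamic client, where the group/version/resource, namespace, and object name are all placeholders; a `kubectl patch --type=merge` with the same JSON does the same job.

```go
// Hedged sketch: force-clearing finalizers on a stuck resource via the
// dynamic client. The GVR, namespace, and name below are placeholders.
package main

import (
	"context"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/runtime/schema"
	"k8s.io/apimachinery/pkg/types"
	"k8s.io/client-go/dynamic"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Load ~/.kube/config the way kubectl does.
	cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	dyn, err := dynamic.NewForConfig(cfg)
	if err != nil {
		panic(err)
	}

	// Placeholder GVR: substitute the stuck resource's group/version/resource.
	gvr := schema.GroupVersionResource{Group: "example.com", Version: "v1", Resource: "widgets"}

	// Merge-patch the finalizer list to empty so deletion can complete.
	patch := []byte(`{"metadata":{"finalizers":[]}}`)
	_, err = dyn.Resource(gvr).Namespace("default").
		Patch(context.TODO(), "stuck-widget", types.MergePatchType, patch, metav1.PatchOptions{})
	if err != nil {
		panic(err)
	}
}
```

Which is exactly the complaint: to un-stick a delete, you have to know that finalizers exist, what appended them, and that clearing them skips whatever cleanup the controller was waiting on.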
Kubernetes is actually extremely popular all around the world. Chick-fil-A, if I recall correctly, deploys it in every single store!
A lot of big dinosaur corporations are actively implementing it. Unfortunately, VMs or Kubernetes or whatever tooling will still suck if you have shitty people using them.
I'll start by saying that I think knowledge of the layers underneath the application is fading in some circles, and that makes me sad.
I started as a frontend guy some 10+ years ago, moved into network engineering, then infrastructure engineering, and now SRE. The number of people in both the developer circle and the operations circle who do not want to understand what's going on is mind-boggling.
I was around when VMs were hot, when treating them as long-lived pets was just toil that operations dealt with. The collection of shell scripts to make that toil go away was nice. Then came Puppet, Ansible, and the like.
Now we are in the golden age of Kubernetes and orchestration platforms. We have a set of standards for how things can be operated. The terms are obfuscated, sure, but the core concepts are still the same underneath the abstraction.
I agree that platform engineering is a good place to be, and honestly it needs to be understood better by all parties, including executives. They were bought and sold on the idea that the cloud is all managed, but that couldn't be further from the truth; wrinkles will show as scale grows and your use cases progress, in any environment, at home or in the cloud.
Unfortunately, good platform teams often aren't seen. A good platform just works: metrics just exist, logs just work, tracing just works out of the box. Things don't often go down; it's really only visible when things fail. If you do a great job implementing a self-service platform, you're often met with executives wondering why you're there, because the cloud does it all!
Applications are highly visible to all, but so are the layers underneath, and they all work together if done correctly. I wish that were more understood.
For context, I'm currently running multiple Kubernetes environments, on premise and in the cloud. Our team prides itself on using open-source solutions built on the operator model: Prometheus, Thanos, Loki, Tempo, Istio, Cert-Manager, Strimzi Kafka, the Flink operator, the OTel Collector, etc. We do billions of requests a month and TBs of bandwidth with microservices, we have at minimum 4 9's of uptime, and our cost footprint is extremely small. This comes from a four-person platform team that also handles on-call for all applications, security, the cloud budget, and operations. It's not impossible.
I guess I can't emphasize enough that understanding what the orchestration systems, the tooling, and the stack are trying to do makes everything easier. As a developer, you can understand your constraints and limitations and build off of known barriers. As an operations or platform engineer, you can build things that don't require constant babysitting or toil, you can save hundreds of thousands of dollars by not offloading your observability to Datadog or the like, and you can make an impact. The technology is already here.
I'm interested in knowing more about how you guys implemented the operator model and decided on those tools. Was there a book or anything that was helpful in all of this?