First, K8S doesn't force anyone to use YAML. It might be idiomatic, but it's certainly not required. `kubectl apply` has supported JSON since the beginning, IIRC. The API endpoints themselves speak JSON and gRPC, and you can produce JSON or YAML from whatever language you prefer. Jsonnet is quite nice, for example.
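As a concrete sketch (the resource name is just an example), the same manifest people usually write in YAML can be fed to `kubectl apply` as JSON:

```
# Apply a manifest written in JSON rather than YAML.
kubectl apply -f - <<'EOF'
{
  "apiVersion": "v1",
  "kind": "ConfigMap",
  "metadata": { "name": "demo-config" },
  "data": { "greeting": "hello" }
}
EOF
```

YAML is a superset of JSON anyway, so any tool that emits JSON (Jsonnet, or just your language's standard library) produces valid input.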
Second, I'm curious as to why dependencies are a thing in Helm charts and why dependency ordering is being advocated, as though we're still living in a world of dependency ordering and service-start blocking on Linux or Windows. One of the primary idioms in Kubernetes is looping: if the dependency's not available, your app is supposed to treat that as a recoverable error and try again until the dependency becomes available. Or, crash, in which case the ReplicaSet controller will restart the app for you.
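A minimal sketch of that idiom (the function names are invented for illustration, not any real client library): a startup loop that retries until the dependency answers, instead of encoding ordering anywhere:

```python
import time

def wait_for_dependency(check, retries=30, delay=2.0):
    """Retry `check` until it succeeds or we give up and crash.

    `check` is any callable that raises on failure -- e.g. a ping to a
    database. If the dependency never comes up, we raise, the process
    exits, and the ReplicaSet controller restarts the Pod for us.
    """
    for _ in range(retries):
        try:
            return check()
        except Exception:
            time.sleep(delay)  # dependency not ready yet; try again
    raise RuntimeError("dependency never became available")
```

The point is that the app itself owns the waiting: an unavailable dependency is a recoverable condition, and crashing after exhausting retries is also fine, because the controller restarts the Pod.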
You can't have dependency conflicts in charts if you don't have dependencies (cue "think about it" meme here), and you install each chart separately. Helm does let you install multiple versions of a chart if you must, but woe be unto those who do that in a single namespace.
If an app truly depends on another app, one option is to include the dependency in the same Helm chart! Helm charts have always allowed you to have multiple application and service resources.
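For example (the chart name and file names here are purely illustrative), a single chart can simply ship both workloads side by side, and they get installed together:

```
mychart/
  Chart.yaml
  templates/
    web-deployment.yaml
    web-service.yaml
    db-statefulset.yaml   # the "dependency" lives in the same chart
    db-service.yaml
```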
> One of the primary idioms in Kubernetes is looping
Indeed, working with kubernetes I would argue that the primary architectural feature of kubernetes is the "reconciliation loop". Observe the current state, diff it against the desired state, apply the diff. Over and over again. There is no "fail" or "success" state, only what we observe and what we wish to observe. Any difference between the two is iterated away.
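A toy sketch of that loop (all names invented for illustration): observe, diff, apply, repeat, with no terminal success or failure state:

```python
def reconcile(observe, desired, apply_diff):
    """One pass of a reconciliation loop over dict-shaped state.

    `observe` returns the current state, `desired` is what we want,
    and `apply_diff` receives only the keys that differ.
    """
    current = observe()
    diff = {k: v for k, v in desired.items() if current.get(k) != v}
    if diff:
        apply_diff(diff)
    return diff  # empty dict means current state matches desired

def reconcile_until_stable(observe, desired, apply_diff, max_iters=100):
    """Run passes until converged (a real controller loops forever)."""
    for _ in range(max_iters):
        if not reconcile(observe, desired, apply_diff):
            return True
    return False
```

A real controller never stops at "converged"; it keeps watching, so any later drift is just another diff to iterate away.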
I think it's interesting that the dominant "good enough technology" of mechanical control, the PID feedback loop, is quite analogous to this core component of kubernetes.
PID feedback loop, OODA loop, and blackboard systems (an AI design model) are all valid metaphors that k8s embodies, with the first two being well known enough that they were common in presentations/talks about K8s around 1.0.
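To make the analogy concrete, here's a toy proportional-only controller (just the P of PID; the gain and values are arbitrary) doing the same observe/diff/apply dance against a setpoint:

```python
def p_controller_step(measured, setpoint, gain=0.5):
    """One step of a proportional controller: correction = gain * error."""
    error = setpoint - measured          # the "diff"
    return measured + gain * error       # the "apply"

# Drive a value toward the setpoint, like a thermostat nudging a heater:
# each iteration shrinks the remaining error rather than declaring
# success or failure.
value, setpoint = 15.0, 20.0
for _ in range(20):
    value = p_controller_step(value, setpoint)
```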
What you're describing is a Controller[0]. I love the example they give of a room thermostat.
But the principle applies to other things that aren't controllers. For example a common pattern is a workload which waits for a resource (e.g. a database) to be ready before becoming ready itself. In a webserver Pod, for example, you might wait for the db to become available, then check that the required migrations have been applied, then finally start serving requests.
So you're basically progressing from a "wait for db loop" to a "wait for migrations" loop then a "wait for web requests" loop. The final loop will cause the cluster to consider the Pod "ready" which will then progress the Deployment rollout etc.
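In manifest terms, that final loop is a readinessProbe; a sketch with made-up path and port:

```
# Hypothetical webserver Pod snippet: the kubelet polls this endpoint,
# and the Pod only becomes Ready (gating the Deployment rollout) once
# the app has gotten past its db/migration checks and starts serving.
readinessProbe:
  httpGet:
    path: /healthz    # illustrative path
    port: 8080        # illustrative port
  periodSeconds: 5
```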
i developed a system like this (with a reconciliation loop, as you call it) some years ago. there are most definitely failed states (for multiple reasons), but as part of the loop you can have logic to fix them up in order to bring things to the desired state.
we had integrated monitoring/log analysis to correlate failures with "things that happen"
> Or, crash, in which case, the ReplicaSet controller will restart the app for you.
This does not work well enough. Right now I have an issue where Keycloak takes a minute to start, and a dependent service, which crashes on start without Keycloak, takes like 5-10 minutes to start, because the ReplicaSet controller starts to throttle it and it'll wait minutes for nothing, even after Keycloak has started. Eventually it'll work, but I don't want to wait 10 minutes if I could wait 1 minute.
I think the proper way to solve this is to develop an init container which waits for the dependency to be up before passing control to the main container. But I'd prefer for Kubernetes to allow me to explicitly declare start dependencies. My service WILL crash if that dependency is not up, so what's the point of even trying to start it, just to throttle it a few tries later?
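A sketch of that init-container workaround (service name, port, and image are placeholders for whatever the dependency actually is):

```
# Hypothetical Pod spec fragment: the main containers are not started
# until this init container exits successfully.
initContainers:
  - name: wait-for-keycloak
    image: busybox
    command: ["sh", "-c",
      "until nc -z keycloak 8080; do echo waiting; sleep 2; done"]
```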
A dependency is a dependency. You can't just close your eyes and pretend it doesn't exist.
I’d contend that you’re optimizing for initial deployment speed rather than overall resilience. Backing off with increasing delays before retrying a dependent service call is a best practice for production services. The fact that you’re seeing this behavior on initial rollout is inconvenient, but it’s also self healing. It might take a bit longer than you like, but if you’re really that impatient, there are workarounds like the one you described.
Big +1 to dependency failure should be recoverable.
I was part of an outage caused by a fail-closed behavior on a dependency that wasn't actually used and was being turned down.
Dependencies among servers are almost always soft. Just return a 500 if you can't talk to your downstream dependency. Let your load balancer route around unhealthy servers.
You say "supposed to". That's great when building your own software stack in house, but how much software is available that can run on Kubernetes but was created before it existed? Somebody figured out it could run in Docker, and then later someone realized it's not that hard to make it run in Kubernetes because it already runs in Docker.
You can make an opinionated platform that does things how you think is the best way to do them, and people will do it how they want anyway with bad results. Or you can add the features to make it work multiple ways and let people choose how to use it.
The counter argument is that footguns and attractive nuisances are antithetical to resilience. People will use features incorrectly that they may never have needed in the first place; and every new feature is a new opportunity to introduce bugs and ambiguous behaviors.
> One of the primary idioms in Kubernetes is looping: if the dependency's not available, your app is supposed to treat that is a recoverable error and try again until the dependency becomes available.
This is gonna sound stupid, but people see the initial error in their logs and freak out. Or your division's head sees your demo and says "Absolutely love it. Before I demo it though, get rid of that error". Then what are you gonna do? Or people keep opening support tickets saying "I didn't get any errors when I submitted the deployment, but it's not working. If it wasn't gonna work, why did you accept it"
You either do what one of my colleagues does, add some weird-ass logic of "store error logs and only display them if they fire twice (no three, 4? scratch that, 5 times) with a 3 second delay in between, except for the last one, that can take up to 10 seconds; after that, if this was a network error, sleep for another 2 minutes, and at the very end make sure to have a `logger.Info("test1")`".
Or you say "fuck it" and introduce a dependency order. We know that it's stupid, but...
This sounds like an opportunity to educate your colleagues and introduce a higher level of functionality to your deployment mechanisms. There’s a difference between Kubernetes stating a deployment of a given component is successful and the CI/CD pipeline confirming the entire application’s deployment is successful. The former frequently happens before—sometimes long before—the latter does. If the boss is seeing errors, it’s because the deployment hasn’t finished and someone or something is falsely suggesting to them that it is.
Nearly 1/5th of the article is dedicated to criticizing YAML as the de facto language people use to work with it, and implicitly blaming Kubernetes for this fault.
What? It's one of the most often repeated arguments against Kubernetes. Even in the article that this entire thread is about, YAML is mentioned repeatedly.