zbentley · 2026-06-16T01:06:06 1781571966

I think the best supported and most mature pattern on most big cloud providers is precisely

> do stuff in parallel either by hand or by terraform

…specifically by terraform. Making k8s own the provisioning and management of external infrastructure on principle (as opposed to when that makes sense, e.g. load balancers/gateway/CSI providers) is not a good approach. Sure, it feels unified, but the cost of unification is incredibly not worth it.

mschuster91 · 2026-06-16T13:53:03 1781617983

> Sure, it feels unified, but the cost of unification is incredibly not worth it.

That's the cost I was talking about. It is indeed annoying and time-consuming to get it set up once, but once it works... it is amazing for developers to have the ability to spin up an environment completely identical to prod for a hotfix branch to test stuff out, with no involvement from ops or anyone else.

And also, it's much easier IMHO to get a mental image of how a system is constructed when it's one architecture - no matter if it's k8s/helm or Terraform. But as soon as you have both in the mix, you get friction issues, you have to pass stuff from Terraform to Helm or vice versa... and may God have mercy upon you if you also have Ansible in the mess. I had to do that once for a proprietary dependency that the vendor would not have supported anywhere other than a SLES bare-metal server.

JohnMakin · 2026-06-16T15:21:43 1781623303

Yea, I used to believe this too, and still sort of agree - I got so tired of the argument in maintaining k8s infra in terraform that I gave up and wrote what is essentially a terraform wrapper module around helm. The charts break terraform quite a bit sometimes, so you have to keep it simple, and god help you if you want to use CRDs; the HashiCorp providers seem to have the notion that no one actually needs those.

I had dismal hopes of it working for very long but it's remained mostly untouched going on 3 years now which really surprised me, and it's been easy to work with. I think if you run EKS resources like node groups, autoscalers, LB type of resources in the same state file as helm deployments you're going to have a very bad time though.

mschuster91 · 2026-06-16T21:15:07 1781644507

> I think if you run EKS resources like node groups, autoscalers, LB type of resources in the same state file as helm deployments you're going to have a very bad time though.

There's no alternative to that anyway... otherwise even a terraform apply -refresh=false will quickly take well over 10 minutes.

JohnMakin · 2026-06-17T01:23:49 1781659429

Separate applies in different state files? I establish hard, loosely coupled separations here, and it’s been fine as a terraform wrapper around helm. I’d rather run CI jobs around gitops + charts using whatever your preferred flavor is, but current terraform providers seem fine with it as long as you aren’t overly relying on CRDs that like to track state via timestamps; terraform doesn’t like that, but someone might, depending on their use case.

zbentley · 2026-06-16T01:00:19 1781571619

The linked article discusses very different reasons for preferring kube. CTOs and hiring managers like it for reasons totally different from the cargo cult/hype-driven engineers.

zug_zug · 2026-06-16T01:06:07 1781571967

Yeah I read the article and I saw that.

And I do think there is a way to use kubernetes with minimal damage, but it requires making firm rules about not focusing on things that aren't needed yet (e.g. istio) and making firm hiring choices about only people who understand that such optimizations are complete wastes of time for a series A startup.

zbentley · 2026-06-16T00:57:26 1781571446

Spot on. I have a lot of trouble convincing cloud folks that for durable state, you probably don’t want kubernetes. It’s not that e.g. the CSI drivers and operators for clustered databases aren’t top notch—they are; the era of “avoid stateful kube services” is long behind us—it’s that the cloud provider managed services for e.g. blob stores or databases are so much more reliable. The S3s and Auroras of the world are expensive for a reason: no matter how good your kube native database operator is, it still doesn’t assume responsibility for a ton of the failure points that managed services do. And that’s true even at modest scale (e.g. upgrades are just harder when you’re running your own DB) and in cost conscious environments (sure, the Elasticache bill is steep, but the salary and velocity cost of fixing memory-leak-caused kube memcached crashes is steeper).

portly · 2026-06-16T05:05:33 1781586333

Also cluster migrations are required pretty often in my company. Having state on a cluster means migrating that as well, which is a complex and time consuming operation. Having your state in S3 or external database makes migrations a breeze.

zbentley · 2026-06-16T00:49:08 1781570948

Is it analogous to portainer with a git-pull-compose-apply loop?

zbentley · 2026-06-07T18:56:41 1780858601

Kind of. Those exist, but because Linux’s formal ABI is syscalls and not libraries that combine them in known-safe ways, the clone speedups that make fork faster are a confusing and fragile API for low-level programmers to use.

That, and even those clone-without-pagetable-copy improvements leave a lot of slowness on the table. Being able to skip even disable-able functionality intended for fork would simplify code. Also, for programs that launch the same subprocess many times, a better API might allow caching away some of the pre-entrypoint initialization of exec.
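For illustration, here is a minimal sketch of the "library that combines the syscalls in a known-safe way" idea, using Python's `os.posix_spawn`. The helper name `spawn_and_wait` is made up for this example, and whether the underlying libc actually takes a clone/vfork-style fast path depends on the platform; that part is an assumption, not something the code checks.

```python
import os
import sys


def spawn_and_wait(argv):
    """Launch argv[0] with posix_spawn and return the child's exit code.

    posix_spawn is a single library call that starts a new program without
    the caller hand-assembling a fork()/clone() + exec() sequence.
    argv[0] must be a full path here, because posix_spawn does no PATH lookup.
    """
    pid = os.posix_spawn(argv[0], argv, dict(os.environ))
    _, status = os.waitpid(pid, 0)
    # Requires Python 3.9+; converts the raw wait status into an exit code.
    return os.waitstatus_to_exitcode(status)


if __name__ == "__main__":
    code = spawn_and_wait([sys.executable, "-c", "raise SystemExit(3)"])
    print("child exited with", code)
```

This only covers the "safer wrapper" half of the point; nothing here caches the pre-entrypoint initialization of exec, which as far as I know has no standard API.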

zbentley · 2026-06-05T21:29:44 1780694984

> I will acknowledge industrialization improved people's access to wealth and materialism.

And reduced illness, increased education, increased access to better nutrition, increased lifespan, increased able lifespan (knees/back/teeth don’t give out as early), and lots more.

Like, even if I grant that this replaced human connection (and I’m not sure that’s true, nor am I sure if it is meaningfully true—access to water replaces thirst, too), some very substantial benefits were acquired in return.

nashashmi · 2026-06-06T01:46:37 1780710397

The theory that improved health and safety and lifespan will shrink the urge to procreate is so far-fetched I find it hard to imagine. The longer you live, the more likely you are to seek connection. It would be easier to imagine that long lifespan and better health make people less attached to their spouse.

zbentley · 2026-06-06T02:27:08 1780712828

> improved health and safety and lifespan will shrink the urge to procreate

Not what I said at all. Note the “even if I grant … (which I don’t…)”.

nashashmi · 2026-06-07T03:48:05 1780804085

Oh, I misunderstood. You were only talking about the greater positives of industrialization, to counter ideas against it. That is true without debate. Material wealth going up, plus improved health conditions, are all positives. Material wealth replacing human relationships is not.

zbentley · 2026-06-07T14:29:24 1780842564

Thanks for clarifying and amending.

toasty228 · 2026-06-06T07:21:45 1780730505

We went too far though, the main causes of death in the west are now all due to overconsumption, more than 50% of westerners are overweight, etc.

zbentley · 2026-06-05T21:21:54 1780694514

> there's a reason authoritarians across the world are banning abortion and targeting birth control

I don’t think that’s because of birth rate decline. While authoritarians give lip service to that occasionally, it’s never their primary cited reason (which is usually some combo of religion, purported return to “traditional” prosperity via reduced promiscuity, aggression against feminist political opponents, etc). Also, most authoritarians aren’t that long-term in their goals.

zbentley · 2026-06-05T21:11:36 1780693896

Eh, that argument works on any claim and is nonfalsifiable-ish, so I think it can be ignored.

People buying more chocolate ice cream than vanilla? Could be changing preferences or Hersheys marketing, or it could be undetected brain worms. People voting for one political party over others? Could be that party is campaigning/governing in a more popular way, could be brain worms.

If there’s evidence of contaminants or whatever influencing behavior strongly enough to change large scale demographic trends, then present it. Otherwise, your best chance at good data is to take people at their word when they say why they do things.

trumpdong · 2026-06-05T23:09:08 1780700948

We know some of the pharmaceutical residues in our sewage turn frogs gay (that really happened, that wasn't AJ making something up). We know pharmaceuticals can greatly affect people's sex drive, general mood, and other psychological factors. It's definitely not a stretch to guess we might be doing it to ourselves.

zbentley · 2026-06-05T15:42:09 1780674129

A few nice things about doing this in no particular order:

Embedding would make local dev/CI integration testing convenient.

Embedding replicated Redis with each application instance would give you HA benefits without adding infra-management complexity.

Embedded Redis (even via local RPC) is still going to be faster than a lot of languages’ or frameworks’ built-in data structures. Large array operations in, say, Python are gonna be slower than RPCing to Redis (assuming that the data structures are built gradually and not built all at once); to beat Redis you’d have to use numpy or something, which is definitely preferable, but is extra work if your app already uses Redis for other things.

Just like choosing SQLite over e.g. LMDB or RocksDB, embedded Redis would be a nice future proofing option for small apps during the prototype phase; less would have to be changed to move Redis out of the app than if a different cache or persistence service were chosen.

zbentley · 2026-06-03T17:53:44 1780509224

Not the case; good abstractions are valuable, but the performance differences between runtimes are very real.

Take the example of some simple HTTP<->blob store service that gets slammed with millions of requests when someone using the API does a backfill via some framework on their end that aggressively scales request volume up and out.

Something like, say, async Python/starlette with a coroutine per request is gonna perform slightly worse than Erlang, which in turn is gonna perform much worse than Go.

You're right that those differences are sometimes marginal when the latency of whatever IO the backend's doing dominates the equation. However, in my experience huge volume surges show issues with the runtime (the thing managing/launching multiplexed request handler routines) or the ecosystem (the backend IO libraries' ability to work with the runtime's IO multiplexing and make things like request coalescing easy or automatic) more often than you'd think.

It really takes surprisingly little volume to cripple a return-hello-world Phoenix app that indirects the "hello world" behind way too much middleware and message passing; it takes even less to kick over, say, a Gunicorn instance returning "hello world" at the bottom of the Django middleware stack. Golang with Gin, on the other hand, is surprisingly hard to cripple in the same way. And I say that as someone who likes Elixir and Python a lot more than I like Go!

pdimitar · 2026-06-03T18:07:09 1780510029

Thank you. As a guy who made a career out of Elixir (and begins to regret it recently but oh well) I agree that Elixir's throughput is not amazing. However, it can get very far and we should always optimize for the most common usages.

I've personally rewritten one hobby and one professional project from Elixir to Golang and loved the result; as you said, it's extremely difficult to bring a Golang service to its knees.

One clarification: a Phoenix server behind Caddy/nginx fares better btw. But, details. Your point stands.

I have yet to see a Rust web/API service I wrote _ever_ buckle under pressure and just crash. When one went down, it was either an application bug (like the famous Cloudflare `.unwrap()` error from the last weeks/months) or the Linux OOM killer. Literally never crashed. But I did witness it brutally murder a MySQL cluster because it couldn't serve it fast enough. That was both fun and terrifying to watch on the dashboards.

zbentley · 2026-06-03T18:17:36 1780510656

> I did witness it brutally murder a MySQL cluster because it couldn't serve it fast enough. That was both fun and terrifying to watch on the dashboards.

Haha yep. In my experience, everyone running CGI/process-per-request application servers is bullish on switching to a concurrent or cooperative runtime...until they realize they just removed the primary ratelimiter on downstream DB/service accesses.

The converse war stories are also amusing: people rewrite their whole app in a concurrent/asynchronous framework and nothing changes, because the DB driver is still farming out all queries to a tiny fixed-size threadpool of connections that was the bottleneck all along.
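A small sketch of the first failure mode, under the assumption that the downstream DB is simulated with a sleep: once every request is a cheap coroutine, nothing limits how many queries are in flight unless you add a limit yourself. `Stats`, `fake_query`, `handle_request`, and `run_burst` are hypothetical names for this example.

```python
import asyncio


class Stats:
    """Tracks how many simulated queries are in flight, and the peak."""

    def __init__(self):
        self.in_flight = 0
        self.peak = 0


async def fake_query(stats, delay=0.01):
    """Stand-in for a DB query; records concurrency while it 'runs'."""
    stats.in_flight += 1
    stats.peak = max(stats.peak, stats.in_flight)
    await asyncio.sleep(delay)
    stats.in_flight -= 1


async def handle_request(stats, limiter):
    """One request handler; optionally gated by a shared semaphore."""
    if limiter is None:
        await fake_query(stats)
    else:
        async with limiter:
            await fake_query(stats)


async def run_burst(n_requests, max_db_concurrency=None):
    """Run n_requests handlers at once and return peak DB concurrency."""
    stats = Stats()
    limiter = (
        asyncio.Semaphore(max_db_concurrency) if max_db_concurrency else None
    )
    await asyncio.gather(
        *(handle_request(stats, limiter) for _ in range(n_requests))
    )
    return stats.peak


if __name__ == "__main__":
    print("peak without limit:", asyncio.run(run_burst(50)))
    print("peak with limit 5: ", asyncio.run(run_burst(50, 5)))
```

The semaphore plays the role the process-per-request model used to play implicitly. Set it too low and you get the converse story: an async app whose throughput is still capped by a tiny fixed-size pool.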

pdimitar · 2026-06-03T18:20:55 1780510855

Oh yeah, definitely. If your DB server (or any storage backend) cannot have like 200+ connections alive at all times then it's absolutely pointless rewriting your app in Elixir or Golang. You'll just serve DB timeouts in your responses.

midnight_eclair · 2026-06-03T18:53:17 1780512797

> You're right that those differences are sometimes marginal when the latency of whatever IO the backend's doing dominates the equation. However, in my experience huge volume surges show issues with the runtime (the thing managing/launching multiplexed request handler routines) or the ecosystem (the backend IO libraries' ability to work with the runtime's IO multiplexing and make things like request coalescing easy or automatic) more often than you'd think.

fair enough, although at this point we start talking about LB in front of the thing, consumption mechanics, autoscaling signals

i will still maintain that my simple advice for a dev worrying about scale is that they should focus their efforts on ensuring downstream IO doesn't get overwhelmed (db read replicas, caching, etc) before optimizing runtime performance or autoscaling out unnecessarily.

zbentley · 2026-06-03T19:25:48 1780514748

> focus their efforts on ensuring downstream IO doesn't get overwhelmed (db read replicas, caching, etc) before optimizing runtime performance or autoscaling out unnecessarily.

All good advice, but the choice of runtime can affect the point at which autoscaling and load balancing even need to enter the conversation at all. Optimizing, say, a mostly in-memory cache service and writing it in Golang may yield results like "we can run a single instance of this and serve three orders of magnitude of business growth; slap it behind a DNSRR or a k8s NodePort for update/replacement/fast failover if it crashes, no complex load balancer needed", where writing the same thing in, say, PHP might require discussing orchestration/load balancing/memory/worker process recycling/autoscaling early on in the service's lifetime. Being able to skip those conversations (entirely or for a long time) is a very significant business benefit.