So do people containerize databases in production these days? I thought a couple...

imcritic · 2025-08-18T01:05:29 1755479129

Depends on the scale. Something small is fine to keep in containers. If you want to push performance to the limit, you will definitely run your DBMS outside a container.

fillest · 2025-08-19T10:16:23 1755598583

Services should be decoupled from OS distro dependencies as much as possible, otherwise you will be bitten at an unexpected moment (e.g. when upgrading your distro packages) by a problem like this one: https://wiki.postgresql.org/wiki/Locale_data_changes

This can be solved by building statically (or using something like Nix) or by at least using containers.

lukaslalinsky · 2025-08-19T12:41:17 1755607277

I do, but I take a very cautious approach. I run a custom image with PostgreSQL and Patroni on Kubernetes, no operator, and each replica has its own StatefulSet tied to a specific node. There is very little automation, but it's still better than running PostgreSQL outside of Kubernetes. I get the benefit of simplified monitoring, log handling, and request routing, while still having very static resource assignments.
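
As a rough illustration of the layout described above (one single-replica StatefulSet per database replica, pinned to a node, with fixed resources), here is a minimal Python sketch that builds such a manifest as a plain dict. The function name, image, node name, and sizes are all hypothetical and not the commenter's actual setup; storage (`volumeClaimTemplates`) and the Patroni configuration are left out.

```python
import json


def pinned_statefulset(name: str, node: str, image: str, cpu: str, memory: str) -> dict:
    """Build a one-replica StatefulSet manifest pinned to a single node.

    Hypothetical helper for illustration only; a real setup would also
    need persistent volumes, probes, and Patroni settings.
    """
    labels = {"app": name}
    resources = {"cpu": cpu, "memory": memory}
    return {
        "apiVersion": "apps/v1",
        "kind": "StatefulSet",
        "metadata": {"name": name},
        "spec": {
            "serviceName": name,
            "replicas": 1,  # one StatefulSet per database replica
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    # Assumption: pin the pod using the standard hostname label.
                    "nodeSelector": {"kubernetes.io/hostname": node},
                    "containers": [
                        {
                            "name": "postgres",
                            "image": image,
                            # requests == limits keeps the assignment static
                            "resources": {
                                "requests": dict(resources),
                                "limits": dict(resources),
                            },
                        }
                    ],
                },
            },
        },
    }


if __name__ == "__main__":
    manifest = pinned_statefulset(
        "pg-0", "node-a", "example/postgres-patroni:latest", "2", "8Gi"
    )
    print(json.dumps(manifest, indent=2))
```

Setting requests equal to limits is one common way to get static resource assignments, and pinning through the `kubernetes.io/hostname` label is only one guess at what "tied to a specific node" means here.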

tekno45 · 2025-08-18T04:12:02 1755490322

To host it in an orchestrator, your cluster has to be more available than your DB.

You want three nines of availability for your DBs, maybe more.

Then you need four nines for your cluster/orchestrator.

If your team can build that cluster, then it makes more sense to put everything under one roof than to develop a whole new infrastructure with the same level of reliability or more.
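
To make the arithmetic behind this claim concrete, here is a rough sketch. It assumes a strict serial dependency (the database is only up while the cluster is up) and independent failures, which is the comment's premise rather than an established fact. The function names are made up for illustration.

```python
import math

MINUTES_PER_YEAR = 365 * 24 * 60


def availability_from_nines(nines: int) -> float:
    """Convert a count of nines to an availability fraction (3 -> 0.999)."""
    return 1.0 - 10.0 ** (-nines)


def downtime_minutes_per_year(availability: float) -> float:
    """Expected minutes of downtime per year at a given availability."""
    return (1.0 - availability) * MINUTES_PER_YEAR


def serial_availability(*parts: float) -> float:
    """Availability of a chain where every part must be up (independent failures)."""
    return math.prod(parts)


db = availability_from_nines(3)       # three nines for the database
cluster = availability_from_nines(4)  # four nines for the cluster
combined = serial_availability(db, cluster)

print(f"db alone:        {downtime_minutes_per_year(db):.1f} min/year")
print(f"cluster alone:   {downtime_minutes_per_year(cluster):.1f} min/year")
print(f"db on cluster:   {downtime_minutes_per_year(combined):.1f} min/year")
```

Under this serial model the combined figure can never be better than the weakest part, which is why the comment asks for more nines in the cluster than in the database.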

GauntletWizard · 2025-08-18T04:44:45 1755492285

This is a persistent myth that is just flat out wrong. Your k8s cluster orchestrator does not need to be online very often at all. The kube proxies will gladly continue proxying traffic based on the last state they knew about. Your containers will continue to run. Hiccups, or outright outages, in the kube API server do not cause downtime, unless you are using certain terrible, awful, no good, very bad proxies within the cluster (istio, linkerd).

tekno45 · 2025-08-18T05:28:25 1755494905

Your CONTROL PLANE doesn't immediately cause outages if it goes down.

But if your workloads stop and can't be started on the same node you've got a degradation if not an outage.

lukaslalinsky · 2025-08-19T13:16:05 1755609365

What alternatives do you have? No matter which system you are using, database failovers will require external coordination. We are talking about PostgreSQL, so that normally means something like Patroni with an external service (unless you mean something manual). I find it easier to manage just one such service, Kubernetes, and use it both for running the database process and for coordinating failovers via Patroni.

GauntletWizard · 2025-08-18T16:20:31 1755534031

Yes, but that's workloads || operator, not workloads && operator - you don't need four nines for your control plane just to keep your workloads alive. Your control plane can be significantly less reliable than your workloads, and the workloads will keep serving fine.

In real practice, it's so cheap to keep your operator running redundantly that it's probably going to have more nines than your workloads, but it doesn't need to.

tekno45 · 2025-08-18T16:48:17 1755535697

You're assuming a static cluster.

In my world scaling is required. Meaning new nodes and new pods. Meaning you need a control plane.

Even in development, no control plane means no updates.

In production, no scaling means I'm going to have a user-facing issue at the next traffic spike.

GauntletWizard · 2025-08-18T17:04:05 1755536645

I am 100% certain I live more in that world than you; you can check my resume if you want to get into a dick waving contest.

What I'm saying is that the two probabilities are independent, possibly correlated, but not dependent. You need some number of nines in your control plane for scaling operations. You need some number of nines in your control plane for updates. These are very few, and they don't overly affect the serving plane, so long as the serving plane is itself resilient to the errors that happen even when a control plane is running, like sudden node failure.

Proper modeling of these failure conditions is not as simple as multiplying probabilities. The chance of failures in your serving path goes up as the time between periods of control plane readiness goes up. You calculate (really, only ever guesstimate, but you can get some good information for those guesses) the probability of a failure in the serving plane (incl. increases in traffic to the point of overload) before the control plane has had a chance to take actions again, and you worry about MTTF and MTBR of the control plane more than the "reliability". You can have a control plane with 75% or less "uptime" by action failure rate that still takes actions on a regular cadence, and never notice.

You can build reliable infrastructure out of unreliable components. The control plane itself is an unreliable component, and you can serve traffic at massive scale with control planes faulty or down completely, without affecting serving traffic. You don't need more nines in your control plane than in your serving cluster. That is the only point I am addressing/contesting. You can have many, many fewer and still be doing right fine.
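
One way to put rough numbers on the modeling described above: treat the serving-plane events that need a control plane action (node failure, overload) as arriving at a constant average rate, and ask how likely it is that at least one lands inside a control plane gap. This is a back-of-the-envelope sketch assuming Poisson arrivals; the function name and the example rate are hypothetical.

```python
import math


def p_event_during_gap(events_per_hour: float, gap_hours: float) -> float:
    """Probability that at least one event needing a control plane action
    occurs while the control plane is unavailable.

    Assumes events arrive as a Poisson process with a constant rate.
    """
    return 1.0 - math.exp(-events_per_hour * gap_hours)


# Hypothetical rate: one such event every 30 days on average.
rate = 1 / (30 * 24)

for gap in (0.1, 1, 6, 24):
    p = p_event_during_gap(rate, gap)
    print(f"control plane gap {gap:>4} h -> P(unhandled event) = {p:.4%}")
```

The point of the sketch is that the risk depends on how long each gap lasts relative to how often the serving plane needs help, not on the control plane's overall uptime percentage.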

qskousen · 2025-08-18T00:19:05 1755476345

I would also like to know this. A couple of days ago I was told by someone with a decade of K8s experience that databases should be outside the cluster.

5Qn8mNbc2FNCiVV · 2025-08-19T02:06:40 1755569200

Well, CloudNativePG exists and it works really, really well. Once you can afford to have someone manage your databases separately from your applications, you can think about putting them outside the cluster, but I'd wager that by then you've got enough experience running your DB with an operator that you can keep running it in the cluster.

Atotalnoob · 2025-08-18T02:37:31 1755484651

Generally, yes.

Unless you have a dedicated team to do the stuff for you.

Crunchy Data is a good starting point.