Hacker News

> K8s is just too expensive, complicated, and time consuming to survive. As soon as Systemd can network as a cluster, the writing's on the wall. The only reason it hasn't happened yet is the Systemd developers are basically desktop-oriented Linux fogies rather than Cloud-focused modern devs.

So "all" we need to do is rewrite systemd as a distributed system, and make it "cloud-focused" and "modern"?

No thanks. Right tool for the right job.



It's basically a distributed system already! All you need for feature parity with K8s is a parent node sending child nodes commands via its existing API. Everything else is already there.


Yes... kind of. A distributed system all on one machine skips many of the hard parts. How does systemd handle a network partition? How does systemd mount storage or direct network connections to the right nodes? How do you migrate physical hosts without downtime?


> How does systemd handle a network partition?

The same way K8s does... it doesn't. K8s management nodes just stop scheduling anything until the worker nodes are back. When they're back the job controller just does exactly what it was doing before, monitoring the jobs and correcting any problems.

> How does systemd mount storage

Systemd has mount units (systemd.mount) that manage filesystem mounts.

> or direct network connections to the right nodes

The same way K8s does... use some other load balancer. Most people use cloud load balancers pointed at an Ingress (Nginx & IPTables) or a NodePort. The latter would simply be an Nginx service (Ingress) listening on port 80/443 and load-balancing to every node the service runs on; you can maintain that list by having each service push a dynamic-config update to Nginx when it starts.
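A sketch of that dynamic-registration idea (names are hypothetical; a real setup would push these updates into Nginx's upstream config rather than an in-process list):

```python
from itertools import cycle

# Hypothetical registry: each service instance registers itself on startup.
# In the scheme described above, register() would instead push a
# dynamic-config update to Nginx's upstream list.
backends = []

def register(addr):
    backends.append(addr)

def balancer():
    # Round-robin over whatever is currently registered
    return cycle(backends)

register("10.0.0.1:8080")
register("10.0.0.2:8080")
rr = balancer()
```

Deregistration on shutdown (or on failed health checks) works the same way in reverse, which is all the "service discovery" this scheme needs.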

> How do you migrate physical hosts without downtime?

The same way K8s does... stop the services on the node, remove the node, add a new node, start the services. The load balancer sends connections to the other nodes until services are responding on the new node.


> The same way K8s does... it doesn't. K8s management nodes just stop scheduling anything until the worker nodes are back

This is not entirely accurate. K8s makes it possible to structure your cluster in such a way that it can tolerate certain kinds of network partitions. In particular, as long as:

* there is a majority of etcd nodes that can talk to each other

* there is at least one instance of each of the other important daemons (e.g. the scheduler) that can talk to the etcd quorum

then the control plane can keep running. So the cluster administrator can control the level of fault tolerance by deciding how many instances of those services to run, and where. For instance, if you put three etcd members and two schedulers in different racks, then the cluster can continue scheduling new pods even if an entire rack goes down.
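As a back-of-the-envelope check on that layout, the quorum condition is just a strict-majority count (a sketch of the Raft rule etcd follows, not etcd's actual API):

```python
def has_quorum(total_members, reachable):
    # etcd (via Raft) can only make progress while a strict majority
    # of its members can talk to each other.
    return reachable > total_members // 2

# 3 etcd members across 3 racks: losing one rack leaves 2 of 3 reachable,
# so the control plane keeps running; losing two racks stalls it.
```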

If you assign the responsibility for your cluster to a single "parent" node, you're inherently introducing a point of failure at that node. To avoid that point of failure, you have to offload the state to a replicated data store -- which is exactly what K8s does, and which leads to many of the other design decisions that people call "complicated".



So, like systemd socket activation (which is already there, too)?


More like the job controller: https://github.com/kubernetes/kubernetes/blob/master/pkg/con...

It's a loop that schedules jobs. When you create a job it starts it, when a job crashes it schedules a replacement, and when you delete a job it stops it. Systemd already does this for a single node's service units, so all that's lacking is doing it across nodes.
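The shape of that loop can be sketched in a few lines (illustrative only, not the real controller's code):

```python
def reconcile(desired, observed):
    """One pass of a level-triggered control loop: diff the desired set
    of jobs against what is actually running and emit corrective actions.
    (Sorted iteration just makes the output deterministic.)"""
    actions = []
    for name in sorted(desired):
        if name not in observed:
            actions.append(("start", name))  # missing or crashed: start it
    for name in sorted(observed):
        if name not in desired:
            actions.append(("stop", name))   # no longer wanted: stop it
    return actions
```

Run that on a timer or on watch events and you have the core of both systemd's restart logic and the K8s controller pattern; the distributed part is only about where `observed` comes from.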


Oh, and what if that parent node dies? Now all the children are orphaned, with no leader to tell them what to do. Kind of breaks down.

The problem is distributed state and consensus. It's not as simple as "just have a leader send nodes commands". There are well-known algorithms and systems for dealing with distributed consensus. It might annoy you to find out Kubernetes uses one of them (etcd)! Use Kubernetes!


You don't need it. A single synchronous redundant storage mechanism (e.g. SQL database, S3, SAN/NFS) works fine. But you could use etcd (or Consul) if you wanted. (You also didn't mention what happens when your etcd gets corrupted or loses consensus or its TLS certs expire... :)

K8s is a very "opinionated" system. People think this is the only architecture that works, but of course it isn't true.


You do need distributed consensus--your 'single synchronous redundant storage mechanism' cannot work without it. You're basically just offloading the distributed system state to this magic system you're inventing. That's fine, and that's exactly what k8s does with etcd. If you want to boil the ocean and reinvent k8s because you don't like its yaml or whatever, that's cool and I hope your employer supports you in those endeavors.


This is a very low-effort and dismissive comment, but also incredibly wrong. You do not need consensus to coordinate a distributed system. In a master/slave architecture a single node maintains consistency, or multiple nodes can coordinate through a state store with exclusive locking. Almost all distributed systems for the past... oh, 60 years, have not required consensus for consistency. Only modern ones do, now that Paxos and friends have made distributed consensus practical.

Then again, 'consistency' is a vague and ill-defined term in distributed computing theory and maybe you were only talking about a consistent distributed decentralized network with consensus.


How does your state store stay consistent, and scale to a large load of readers and writers?

Yes, in the trivial case of a single-node state store there is no problem: you scale it by chucking more CPU and resources at it until it can't go any faster.

But in the real world that will only take you so far. How does your state store scale out to multiple machines? How does it handle potentially thousands or in extreme cases hundreds of thousands of writers a second? How do you keep multiple nodes in the state store consistent?

There's your distributed consensus problem.
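Once the state store is itself replicated, those questions have a classic formulation: read and write quorums must overlap, or a reader can miss the latest write. A sketch of the standard R + W > N condition (not any particular store's API):

```python
def overlapping_quorums(n, w, r):
    # n replicas, writes acknowledged by w of them, reads consulting r.
    # Any read quorum must intersect any write quorum for strong
    # consistency -- otherwise a reader can land on stale replicas.
    return r + w > n
```

With 3 replicas, writing to 2 and reading from 2 is safe; writing to 1 and reading from 1 is not, and picking those numbers per-request is exactly the consensus-flavored problem being pointed at.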



