As I said, it's not only kubectl that has the problem. None of the services implemented by Kubernetes are HA: kubelet, proxy, and the scheduler. For a robust deployment you need these replicated.

Using DNS might work for the K8s services, but at least in version 1.2, SkyDNS was an add-on to Kubernetes. This should really be part of the deployed K8s services. Hopefully newer versions have fixed that; I didn't check.

Preferably, the base K8s services would implement HA natively. Deploying a separate load balancer is just a workaround for the problem.

FYI, Google's Borg internal services implement HA natively. It seems to me the Kubernetes team just wanted to build something quick and never got around to doing the right thing. But I think it's about time they did.




I think this was true around kubernetes 1.2, but is no longer the case. etcd is natively HA. kube-apiserver is effectively stateless by virtue of storing state in etcd, so you can run multiple copies for HA. kube-scheduler & kube-controller-manager have control loops that assume they are the sole controller, so they use leader election backed by etcd: for HA you run multiple copies and they fail over automatically. kubelet & kube-proxy run per-node, so the required HA behaviour is simply that they connect to a different apiserver in the event of failure (via load-balancer or DNS, as you prefer).
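To make the leader-election idea concrete, here is a minimal sketch, assuming a lease with a TTL kept in a consistent store. `LeaseStore`, `try_acquire` and `leader` are hypothetical names and the store is just an in-memory object standing in for etcd; this is not the actual Kubernetes or etcd API, only the general pattern.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Lease:
    holder: str        # identity of the replica that currently leads
    expires_at: float  # time after which the lease may be taken over


class LeaseStore:
    """Hypothetical in-memory stand-in for a consistent store such as etcd."""

    def __init__(self) -> None:
        self._lease: Optional[Lease] = None

    def try_acquire(self, candidate: str, now: float, ttl: float) -> bool:
        """Take or renew the lease; return True if `candidate` is now leader."""
        lease = self._lease
        free = lease is None or lease.expires_at <= now
        if free or lease.holder == candidate:
            # Either nobody holds a live lease, or the holder is renewing it.
            self._lease = Lease(candidate, now + ttl)
            return True
        # Someone else holds a live lease: stay on standby.
        return False

    def leader(self, now: float) -> Optional[str]:
        """Return the current leader, or None if the lease has lapsed."""
        lease = self._lease
        if lease is not None and lease.expires_at > now:
            return lease.holder
        return None


store = LeaseStore()
store.try_acquire("scheduler-a", now=0.0, ttl=10.0)   # a becomes leader
store.try_acquire("scheduler-b", now=5.0, ttl=10.0)   # b stays on standby
store.try_acquire("scheduler-b", now=11.0, ttl=10.0)  # a stopped renewing, b takes over
```

Each replica calls `try_acquire` in a loop and only runs its control loop while it holds the lease. A real store would have to make the check-and-write atomic (a compare-and-swap), which this single-process sketch takes for granted.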

kube-dns is an application on k8s, so it uses scale-out and k8s services for HA, like applications do. And I agree that it is important; I don't know of any installations that don't include it.

I think the right things have been built. We do need to do a better job documenting this though!


Great, thanks for the update! I'll update my deployment towards the end of spring, hopefully that's not going to be too painful.


etcd itself cannot be scaled horizontally because of its architecture. Its leader-based model limits how many nodes a cluster can usefully have: past a certain size, the leader becomes overloaded.
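A rough way to see the trade-off, assuming the usual majority-quorum model that Raft-style systems like etcd use: adding members raises the number of failures the cluster can survive, but it also raises the number of followers the single leader must replicate every write to. The function names below are made up for illustration.

```python
def quorum(members: int) -> int:
    """Smallest majority of a cluster with `members` voting nodes."""
    return members // 2 + 1


def fault_tolerance(members: int) -> int:
    """How many members can fail while a majority remains."""
    return members - quorum(members)


def leader_fanout(members: int) -> int:
    """Followers the leader must send each write to."""
    return members - 1


for n in (1, 3, 5, 7, 9):
    print(n, quorum(n), fault_tolerance(n), leader_fanout(n))
```

Going from 3 to 5 members buys one extra tolerated failure at the cost of two more followers per write. Even-sized clusters add replication work without adding tolerance (4 members still tolerate only 1 failure).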


I think federation allows you to scale horizontally beyond the limits of a single etcd cluster. OTOH, the fact that zk/etcd/consul are all leader-based is probably the reason flynn "simply" uses postgres.



