Bare Metal K8s Clustering at Scale (medium.com/cfatechblog)
88 points by tdurden on July 3, 2018 | 67 comments


To everyone asking "why would you ever do this", ask them, or look for videos of the talk/etc., but don't assume the decision was poorly thought through. That seems prejudicial at best (and I say that as someone that regularly pushes back on web software folks I work with trying to build huge-company-scale orchestration layers when they don't need them).

I saw these guys talk at QCon. It was a fascinating talk, and an excellent example of SRE adaptability and nonstandard innovation given unusual constraints.

Not speaking for them, just from my memories of the talk and the following Q/A, but their reasons for this stack were primarily:

- They couldn't run in the cloud because connectivity to their sites is often terrible.

- They mostly ran IoT stuff from the k8s clusters--automated kitchen equipment like fryers and fridges, order tracking/status screens, building control systems, and metrics aggregation so they can see how businesses are doing.

- Because of the bad connectivity, a "fetch"/"push" model was needed: the k8s clusters at the edge fetch deployments and push logging/administration/business data back up to the cloud.

- They explicitly did not process payments.

- k8s was used primarily for ease of deployment and providing a base layer of clustered reliability for pretty simple services. Since the boxes in the cluster were running in often-unventilated racks/closets full of junk in random restaurants, having that base layer was very important to them. Other solutions were evaluated and they chose k8s after consideration.

- Unlike typical IoT/automation setups here, they wanted to be able to experiment, monitor, and deploy software without the traditional industrial control practice of "take shit down, flash your controller (call a tech if you don't understand that), spin it up, and if it breaks you're down until we ship a new control unit or you manually fail over to a backup".

- However, they didn't want to fall into the IoT over-the-air update security pitfalls (it would really suck if someone hacked your fridge's temperature control system and gave a week's worth of customers salmonella). As a result they spent a ton of time making very good (and simultaneously very simple) deployment/update authorization and tracking tools. They chose the "pull" model and keying/security layers explicitly to avoid having to think about tons of open remote-access vectors and/or site hijacking. (A rough sketch of what a pull-style check can look like follows this list.)

- The k8s tooling (and some of their own) allowed easy, remote rollbacks to "default/clean state" in case something went wrong, which was critical given that downtime might compromise a restaurant and having a "reset button" automated in was important for ease-of-use by nontechnical, overworked site managers.

- The clustering allowed individual nodes to fail (which they will, because unreliable environments), and people to manually yank ones with confidence.

- While, as some commenters pointed out, the leader (re)election system chosen might be unacceptably slow/randomized for, say, a cloud database, it is perfectly sufficient for failing over a control system in a restaurant. A few seconds of delay on an order tracking screen, or a system reboot/state-loss of in-flight orders, is vastly preferable to some split-brain situation making the restaurant accidentally cook 1.25x the correct number of sandwiches for hours, to go to waste.
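To make the "pull" idea concrete, here is roughly what such an update check can look like. This is purely my own illustration under assumed names and a shared-key scheme, not their actual tooling: the edge cluster polls for a manifest, verifies it against a key provisioned at install time, and only then hands it to the deployment machinery, so no inbound remote access is ever needed.

    # My own illustration, not their tooling; the URL, key, and apply() hook
    # are assumptions. Poll for a manifest, verify it, then deploy it.
    import hashlib
    import hmac
    import time
    import urllib.request

    MANIFEST_URL = "https://updates.example.com/site-1234/manifest.json"  # hypothetical
    SITE_KEY = b"per-site-secret-provisioned-at-install"                  # hypothetical

    def fetch_verified():
        with urllib.request.urlopen(MANIFEST_URL, timeout=10) as resp:
            body = resp.read()
            sig = resp.headers.get("X-Signature", "")
        expected = hmac.new(SITE_KEY, body, hashlib.sha256).hexdigest()
        if not hmac.compare_digest(sig, expected):
            raise ValueError("manifest signature mismatch; refusing to deploy")
        return body

    def poll_forever(apply, interval=300):
        while True:
            try:
                apply(fetch_verified())   # apply() would hand the manifest to k8s
            except Exception:
                pass                      # connectivity is flaky; just retry later
            time.sleep(interval)

In practice you would want asymmetric signatures rather than a shared key, but the flow is the same: nothing on the outside can push to the site, the site only pulls what it can verify.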

It's important to understand their use case: they needed to basically ship something with the reliability equivalent of a Comcast modem (totally nontechnical users unboxed it, plugged it in, turned it on, and their restaurant worked) to extremely poorly-provisioned spaces (not server rooms) in very unreliable network environments. For them, k8s is an (important) implementation detail. It lets them get close to the substrate-level reliability of a much more expensive industrial control system in their sites (with clustering/reset/making sure everything is containerized and therefore less likely to totally break a host), while also letting them deploy/iterate/manage/experiment with much more confidence and flexibility than such systems provide.

I think this is a great story of using new tools for a novel (or at least unusual) purpose, and getting big benefits from it.

Brian, Caleb: great talk, great writeup. Sorry HN is . . . being HN. Keep at it.

Edit: QCon talk summary is here: https://www.infoq.com/news/2017/07/iot-edge-compute-chick-fi.... If you have any employees/friends that went, they should have access to the video. It may be made public at some point, too.


> Brian, Caleb: great talk, great writeup. Sorry HN is . . . being HN. Keep at it.

I don't think HN was super vicious. They presented an out-of-the-box solution to a problem but they didn't define the problem fully. Based on what we saw, their solution seemed way overkill.

Glad to hear that there was a solid reason behind it, not just hype and recruiting buzz.


Hey! Thanks for the comment here. As I said elsewhere in this thread I really do look forward to understanding the “why” for this choice.


Caleb here - you nailed it. I really don't have anything to add!


Wish the article had more context for readers.

These days it feels like everybody needs to throw Kubernetes at everything, introducing complexity for the sake of being cool.

I guess those of us who like to run non-distributed software for small-scale applications are the new grumpy greybeards....


Maybe it's because I'm also in retail IT and have seen similar models, but I get it. At the risk of oversimplifying, they've got two problems - code written in an office somewhere has to run identically in 2000 geographically separated places, and it needs to do so with some level of protection against failure. All other K8s gloss aside, it's a good way of making sure that there are multiple instances of a Docker container running. Assuming they set the cluster up correctly, they can suffer a hardware failure without downtime, and assuming they're using those little NUCs in the photo, they can do so for under $1500. Sure, a single piece of server-grade hardware would probably do just fine and have similar protection from failure, but this is essentially that on commodity hardware.


Our cost is actually something like $900 for the whole cluster :) . Consumer grade clustering FTW!


That's awesome. Nice article, by the way. How much of the restaurant runs on these clusters, if you can share? (Rough percentages are fine.) I doubt there's a whole POS server on there, but maybe?


We're still in the process of rolling things out, although I'm not sure I can give a percentage... I think it's safe to say that in the next 6-12 months most (if not all) of them should have an active cluster.

And no, there is not a full POS server on them at this time... it will take us a few years to decompose that monolith, but this is a natural place to put it as we do so.


> for small scale applications

I am a grumpy grey beard no doubt but I still maintain: most websites do not need more than a single server -- certainly not more than a single database server. And, for most, a few-hundred-dollar dedicated server is plenty. Apply YAGNI until blue in the face.


I'm constantly surprised at how much deployments cost. AWS makes it easy to spin up servers, so people do, and they have tens of machines sitting idle when one would do. It seems that people saw containers and thought "finally, each instance can easily have its own machine", whereas I've always thought of containers as tools for putting many things on the same machine.

I run a Dokku instance on a Hetzner server and it's been fantastic, I host 10-20 of my projects with thousands of daily users there without it even breaking a sweat. My only regret is that I should have used a single database instead of one DB per project, but I was lazy.


Sorry old timer, servers are cattle now. Actually more like Shmoos: so abundant as to be almost free. And if a server is 90% reliable, each additional server adds an additional 9 of reliability. So it always makes sense to plan for more than one, or even two, servers.
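To spell out the arithmetic (assuming independent failures, which flaky restaurant closets may not actually give you):

    # Availability of n replicas that are each 90% reliable, assuming
    # independent failures: the service is down only when all of them are.
    for n in range(1, 4):
        print(n, "replica(s):", 1 - 0.1 ** n)   # ~0.9, ~0.99, ~0.999 -- one extra "nine" each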


A five minute outage of the point-of-sale system during the lunch rush can easily cost even the smallest of restaurants several hundred dollars.

True, most websites do not have this problem, because most websites do not drive revenue like that. There are plenty of use cases where you need five nines, but only within limited not-24/7 time windows.


Two servers... still doesn't need Kubernetes.


It's actually 3 nodes, with plans to expand out to more based on workload.


In my particular environment, we have over 15k JVMs running across 2k hosts just for our US applications. K8s absolutely makes sense.

But for non-distributed software with only several clients, the traditional model is still fine. E.g. we still run gitlab as a pet to serve our cattle infrastructure.


Do you have scripts that could rebuild your pet Gitlab server? Beyond the storage requirements and stateful components, getting Gitlab to bootstrap from a base image should be pretty trivial, right?


I was curious about what problem was being solved too. I can't imagine what a chicken restaurant needs with a distributed k8s cluster.


I'm wondering the same thing, and it looks like a lot of other people are too—from the bottom of the article:

> Edit 7/2/18 — since writing this, many readers have asked “why not just use the cloud? Why computing at the Edge?”. We have realized that we did not provide much context about why we’re doing what we’re doing, so we will follow up with a post about that soon.


They somewhat cleared it up in a comment. Low latency was one reason.


(I'm re-pasting this intro to a few posters)

Hey! I'm Caleb, the SRE that helped build this solution...

Sorry about the lack of context in the article, it was intended for a specific audience (QCon) where we gave a lot more context to the problem at hand.

What we were trying to solve for was: 1) low latency, 2) high availability, 3) container-based, zero-downtime deployments, and 4) continued operations even in an internet-down event.

If you're running applications in a few locations, in small scale environments, our approach would be way overkill for that problem set.


This type of deployment is a perfect fit for NixOS. Immutable deployments with zero configuration drift, easy rollback and options to both push and pull the system configuration updates. It's also easy to customize the system to the hardware unlike CoreOS or Rancher while providing pre-built binaries of all the dependencies.

Setting up a single-node kubernetes is basically adding one line to the system config:

    services.kubernetes.roles = ["master" "node"];


The interesting bit they leave out is why they are running k8s at the edge in restaurants.

The only reason I can think of is that they get to push point-of-sale software out from some central system by using k8s. I can't think of a worse use/abuse of k8s than as a software update system, if that's what they are doing.

The other reason is that they distributed their compute and restaurants pay the power bill, but that sounds just as silly.

Curious to know why you would use k8s at the edge.


They made a comment about having some kind of IoT infrastructure in each restaurant.

It absolutely smells of over-engineering, though. There are a lot easier ways of pushing software out than maintaining k8s locally; and they're almost certainly going to need to build a system which manages and monitors all these clusters...


We deal with a challenge of frequent network outages or latency issues... keep in mind that we have locations out in the middle of nowhere with no QoS. There are a variety of loads at the restaurant that require low latency and high up time. On site K8s clusters were a natural fit for that solution.

Trust me, it would have been way easier to just hook this stuff up to the cloud :-P . I still dream that we will be able to some day.


Plenty of large chains have had IoT like systems since the 90s or earlier. Everything in the kitchen runs on presets.


I saw them talk at QCon and asked them that question. They are not running PoS through this system. They are instead using it to manage all of the devices in their restaurants (from kitchen hardware like automated fridges/fryers/IoT stuff to order tracking, building temperature control, and metrics tracking).


Just looked up the terms Chick-fil-A and QCon and found the talk you are referring to. For those who are curious why a restaurant chain would engineer solutions such as this, perhaps this will be interesting.

https://www.infoq.com/news/2017/07/iot-edge-compute-chick-fi...


Another idea: Maybe they want the restaurant software to continue running locally if it is offline, but they want to sync data while online.

This would make a lot of sense with something like CockroachDB - if the restaurant was offline, their local data would be preserved. But, as soon as it goes back online, then corporate would have access to all of the data.


Why not have restaurants run web apps on commodity hardware like iPads? But I guess then you are dependent on the internet being up.


I'm looking forward to the article detailing why they decided to do this.


I'm pretty sure this is related to being able to continue running the applications even if the venue loses Internet access. You can't stop processing orders if your Internet access has a hiccup.


This caught my eye: Home made leader election protocol that relies on UDP.


This one was kind of troubling, if you ask me:

    >If the leader ever dies, a new leader will be elected
    >through a simple protocol that uses random sleeps and
    >leader declarations.
Why not have each node self-generate a UUID and engage in some gossip process that ends with the cluster becoming aware that some node's corresponding UUID is uniquely significant, therefore recognizing that node as a leader?

I have some really bad memories of "random sleeps" at scale.
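Roughly the shape I have in mind - just a sketch assuming a shared UDP broadcast domain and a made-up port, not anyone's production code: every node announces its ID, and after a listening window the lowest live ID wins, no random sleeps required.

    import socket
    import time
    import uuid

    PORT = 9999                      # assumed cluster-local broadcast port
    MY_ID = uuid.uuid4().hex

    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
    sock.bind(("", PORT))
    sock.settimeout(0.5)

    def elect(window=5.0):
        """Listen for `window` seconds, then treat the lowest ID seen as leader."""
        seen = {MY_ID}
        deadline = time.time() + window
        while time.time() < deadline:
            sock.sendto(MY_ID.encode(), ("255.255.255.255", PORT))
            try:
                data, _ = sock.recvfrom(64)
                seen.add(data.decode())
            except socket.timeout:
                pass
        leader = min(seen)           # deterministic pick, no random sleeps
        return leader == MY_ID, leader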


Well, from the sounds of it they're not running at scale (always 3 nodes), and it sounds similar in principle to the way HSRP/VRRP works, which is a well-defined and understood protocol for doing leader election on a local network.

I suppose the question might be why not use VRRP itself, but if this works for them and has conflict resolution I don't think it's all that troubling.


Caleb with CFA here - random sleeps aren't exactly the most elegant solution... this was our MVP, and we'll definitely be improving on it.


Does anyone have a solution/tool to easily run Kubernetes on a single bare-metal server? Kubernetes or any other Docker container "orchestration" tool. I tried to Google it (certainly with the wrong keywords) and found either quite complex processes, or maintained tools that are mainly for AWS/GCP.


Kubeadm works just fine with a single node configuration, you just need to untaint the master so it can run non-control plane components.

https://www.mirantis.com/blog/how-install-kubernetes-kubeadm... has a decent step-by-step. It's mostly just the standard install but they have details on the untaint bit too.


The best place to start is here (which unfortunately won't show up in any relevant Google search) : https://kubernetes.io/docs/setup/independent/create-cluster-...


If you want just one server, minikube is probably the way to go, even if it does run in a VM. Otherwise, use https://kubernetes.io/docs/setup/independent/create-cluster-... to set up a cluster (one node or otherwise).

I used flannel as pod networking, as it's really simple. If you want to run app pods on your master node, remember to untaint it. ingress-nginx is probably your best bet as an ingress controller, especially because of the amount of support given it by the k8s Slack.

It is a non-zero amount of work. If it sounds like too much work for something you're throwing together, it probably is. It is generally unnecessary.


Not sure if you would classify it as a complex process, but I have a series of posts on my blog (link in my profile) that takes you from a bare-metal box to a k8s cluster running on Xen. A lot of it is explaining how things work, so it is a bit verbose, but it hopefully should not be too hard to follow. Given that it walks you through setting up Xen, then CoreOS, then etcd, and finally Kubernetes, it does take several posts...


There is `./hack/local-cluster-up.sh` if you download the source code of Kubernetes, which will give you a local bare-metal cluster. The one caveat is that when you shut down the cluster, all files will be deleted.

There is also `./cluster/get-kube-local.sh`, which is supposed to give you a working local cluster. But it appears to be broken right now. Might be worth opening a GH issue for that.


kubeadm. Pretty much any other recommendation (such as minikube or kubespray) is just using kubeadm under the hood. They still exist because they add something that kubeadm is missing. Like cluster support, specific cloud support, or VM/dev. But you don't need any of those, so kubeadm is what you want.


Minikube works great to get your feet wet, but it's not suitable for anything except a playground.


Minikube is a VM, so not the bare metal solution requested.


minikube start --vm-driver=none doesn't use a VM AFAIK


This is great. Does anyone have guides on how to do the cluster creation bootstrap on public clouds where you don't get a known DNS name ahead of time and master nodes may come and go? I.e., I want to bake an AMI and create an ASG so that we can turn it on and it will self-cluster, create certs, etc., and can add and remove nodes at the whim of the ASG.


We haven't open sourced how we do this yet... we have an MVP way of doing it by using Ansible to provision the NUCs, and nmap (please don't laugh!) so that they can find each other on a specific virtual network at the restaurants.
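To give a rough idea of the discovery step, here is a much-simplified stand-in (not our actual tooling; the subnet and port below are placeholders): it just sweeps the restaurant's virtual network for boxes answering on a known port.

    import socket

    SUBNET = "10.1.1."          # placeholder for the cluster's virtual network
    PROBE_PORT = 6443           # placeholder, e.g. the Kubernetes API port

    def find_peers(timeout=0.2):
        """Return the addresses on the /24 that accept a TCP connection."""
        peers = []
        for host in range(1, 255):
            addr = (SUBNET + str(host), PROBE_PORT)
            with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
                s.settimeout(timeout)
                if s.connect_ex(addr) == 0:   # 0 means the port is open
                    peers.append(addr[0])
        return peers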

We're replacing a lot of these solutions with "better ways" over the next weeks and months, but I'd be happy to share how we went about it. You can contact me on LinkedIn: https://www.linkedin.com/in/calebrhurd/

The biggest key was that we use RKE for the clustering/certs on bare metal. That's definitely our secret sauce (pun intended).


This is the first I've heard of RKE as a K8s installer. I always just thought it was another name for Rancher 2.0. Would love to see a good comparison of Kubeadm vs RKE. This article briefly mentions kubeadm and that they didn't choose it.


I went into this thinking they were using old AMD K8s clustered into a budget supercomputer.


I don't understand at all why you would do this for a restaurant.

What challenge is this addressing, what problem does this solve? Is there a problem to solve here?

I do assume there's a good reason for this, but as presented it seems like a very stupid waste of money.


Caleb here (SRE at Chick-Fil-A)... we actually just did it because we thought it would look cool :-P .

Hah... no, but seriously... we wrote this article for QCon attendees, and gave a lot more context during our talk at that conference. We didn't realize it was going to be on here, otherwise we would have explained the "why" and not just dived in.

What we were trying to solve for was: 1) low latency, 2) high availability, 3) container-based, zero-downtime deployments, and 4) continued operations even in an internet-down event.

Also, as an interesting side note, the equivalent hardware has about a 6 month ROI if we put the entire load on AWS... granted it would be more efficient, so that's not an entirely fair comparison, but the hardware is unbelievably inexpensive from a cost perspective.


It sounds like a huge cost saving to me. Being able to install a few dumb machines in the restaurant and then have remote installation and management of applications running on them would be great. I imagine that kubernetes would be more reliable than PXE booting images across the internet (as that often requires physically rebooting machines, which requires involvement of the restaurant staff, is error-prone, etc), not to mention that building bootable images with your software on them is not a very modern practice.

Bear in mind that in terms of cost, this is competing with a person driving to each restaurant and fiddling around with computers for an hour, which is a very expensive process.


> not to mention that building bootable images with your software on is not a very modern practice.

1. Why not?

2. Who cares if it's not modern if it does the job?

And they wouldn't even need to make a special app, they could just make it a webapp ergo make a 1-time image with a browser...


> 1. Why not?

It's becoming more common to distribute applications with orchestration software like Kubernetes. The technology around PXE booting is quite old, and mired in enterprise cruft.

> 2. Who cares if it's not modern if it does the job?

Developers love new tech, especially if they can get a Medium post out of it. That's not a good reason on its own, of course, but if this is the tech that more developers are familiar with, that is a good reason.

I personally wouldn't want to learn how to boot 6000 remote machines off built disk images over the internet, I'd rather use the skills I already have around Ansible or learn Kubernetes.

> And they wouldn't even need to make a special app, they could just make it a webapp ergo make a 1-time image with a browser...

I've never been to a Chick-fil-a, but if the setups are anything like my local McDonalds, that's a complex 5 screen setup showing a fluid mix of static images, videos, animations, and applications, not to mention that other stores have different setups/layouts/display types/etc - I don't think you'd be able to _reliably_ do this in a browser. My guess is that it's a multi-screen aware wrapper around video components and web views. That will need re-deploying regularly I would imagine. And that's not to mention the kitchen ordering system, the self-service machines, the tills, etc.

On-site machines totally make sense, smart applications deployed locally, frequently, make sense.


Why do you assume that it's a waste of money? I guess it is a given that they operate some computer infrastructure in every venue, so you would have the same capex without Kubernetes and you would also need some kind of management, so you would also spend money on operations. So maybe it's not such a bad idea to use an off-the-shelf container management software to rollout and operate their containerized applications.


They could just do the same thing with a standard client-server model. Just deploy a bare bones OS with a browser on it.

They'd need connectivity to the server, true, but doesn't Kubernetes also need connectivity to the cluster manager?


The big issue here is that a client-server model has an inherent trust in the network to be 100% there when you need it. Cable modems or DSL (probably) fail intermittently at a much higher daily rate than a fleet of 2,000 sets of redundant computers.

Consider - how would you handle order taking if the network dropped?

A buddy did IT for a theater for a while - they had a similar problem where they’d lose access to their payment processor regularly. No one ever noticed since the system queued ops locally until the network came back up.
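The pattern is simple enough to sketch (hypothetical code, not their actual system): everything lands in a local queue first, and a background loop drains it whenever the upstream service is reachable again.

    import queue
    import time

    pending = queue.Queue()

    def submit(op):
        pending.put(op)              # always succeeds locally, even offline

    def drain(send, retry_delay=5.0):
        """send(op) should raise OSError on network failure; the op is retried."""
        while True:
            op = pending.get()
            try:
                send(op)
            except OSError:
                pending.put(op)      # still offline; put it back and back off
                time.sleep(retry_delay)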


But, how does bringing K8s into the mix solve the unreliable network problem?

If anything it brings in more components/complexity/headache...

If you could run your pods offline, you could run your software offline. If you could trust your Kube repository, you could trust the repository for your app.

What does K8s bring to the mix that is actually useful in solving the problem, beyond the superficial ones?


I’m sure it doesn’t. I look forward to reading the “why we did it” article as well!

I have enough problems getting kubernetes to run on a single node with hyperkube that I’m not entirely sure that I want to deal with it in the future.


almost all retail credit card transactions go over the internet and commerce seems to continue fine.

but if you want to be safe, have a cellular backup network.

seems way simpler than pushing out a k8s infrastructure to every store.


Payment processing doesn't go through the k8s infrastructure, they said. And lots of it still uses telephone/dedicated network lines already, especially in places like malls (where lots of restaurants probably are).


Aha! But we're only considering payment processing. What about the actual operations of the store? I'm guessing the fry machine is controlled by the same infrastructure.

And surveillance cameras, and smart locks on any safes, etc.


As a note, Chick-fil-A is notoriously anti-LGBTQ.

https://thinkprogress.org/chick-fil-a-still-anti-gay-970f079...


Without taking a political stance here, how is this relevant to this submission?


I assume to counter their appeal to work there at the end of the article.


Christians taking Christianity seriously in 2018? This is just unacceptable, they must be stopped.

/s



