Ok, so having worked on distributed consensus a bunch, here are a couple of thoughts in no particular order:
* In the real world, servers misbehave, like, a lot, so you have to deal with that fact. All assumptions about network robustness in particular will be proven wrong on a long enough timeline.
* Leader election in a world without a robust network is an exercise in managing acceptable failure tolerances. Every application has a notion of acceptable failure rates or acceptable downtime. Discovering that is a non-trivial effort.
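To make "acceptable downtime" concrete, here's a back-of-the-envelope sketch in Go; the 99.9% target is just an example I picked, not a number from the thread:

```go
// Rough sketch: turning an availability target into a downtime budget.
package main

import "fmt"

func main() {
	const target = 0.999 // example target: "three nines"
	const minutesPerMonth = 30 * 24 * 60

	// Everything outside the target is your budget for failures,
	// maintenance, and botched leader elections combined.
	budget := (1 - target) * minutesPerMonth
	fmt.Printf("allowed downtime: about %.0f minutes per month\n", budget) // ~43 minutes
}
```

Deciding whether 43 minutes a month is fine, catastrophic, or wildly over-engineered is exactly the non-trivial part.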
Jeff Dean and others at Google famously came to the conclusion that it was ok for some parts of the Gmail service to be down for limited periods of time. Accepting that all self-healing/robust systems will eventually degrade and have to be restored is the first step in building something manageable. The AXD301 is the most robust system ever built by humans to my knowledge (I think it did 20 years of uptime in production). Most other systems will fail long before that. Managing systems as they fail is an art, particularly as all systems operate in a degraded state.
In short, in a lab environment networks function really well. In the real world, it's a jungle. Plan accordingly.
To add to that: "uptime" is a sort of misleading statistic. A single server can have an "uptime" of 20 years sitting in the closet of an office in a corporate park somewhere in suburban Maryland. To the point that nobody can find the server, but they can ssh to it, and yup... it's still up.
On the other hand, the more components (servers, disks, switches, etc.) you have, the higher the probability of failure. More traffic, more data, more changes, more... anything tends to lead to more failure. So, paradoxically, small systems can have better "uptime" than large, robust, well-designed ones.
In my experience managing large distributed, decentralized systems, the buggier the hardware, the larger the cascades of software failures and the harder they were to fix. Industrial-grade hardware that nobody monkeyed with (or that was replaced as soon as any failure was detected) led to the most stable systems. Reliable, dedicated private network paths led to more stability. At a certain point we didn't really need leader election, because a new leader was almost never needed, because things hardly ever failed.
If you just take one device, it can either be completely functional and chug along for 20 years, or fail completely, which happens rarely but is painful. In either case, your software can ignore the failure state; there's nothing it can do in the case of a failure anyway.
If you take 1000 such devices, then the vast majority will be fine, but a small number of them will fail. No matter how you fix them and replace them, you will keep seeing some degraded nodes, or temporarily disconnected nodes because the network has glitches, too. So your software has to always be ready to face a failure of a remote node, and always be ready to gracefully handle it. This completely changes the way you think.
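To make that mindset concrete, here's a minimal sketch in Go (the node URL and retry policy are made up for illustration): every remote call gets a timeout, and the caller decides up front what "degraded" means instead of hanging forever.

```go
// Sketch: treat every remote call as something that can fail or hang.
package main

import (
	"context"
	"fmt"
	"net/http"
	"time"
)

// fetchFromNode bounds the call with a timeout so a dead or partitioned
// node can't stall the caller indefinitely.
func fetchFromNode(ctx context.Context, url string) error {
	ctx, cancel := context.WithTimeout(ctx, 500*time.Millisecond)
	defer cancel()

	req, err := http.NewRequestWithContext(ctx, http.MethodGet, url, nil)
	if err != nil {
		return err
	}
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return err // down, partitioned, or just slow: they all look the same from here
	}
	defer resp.Body.Close()
	return nil
}

func main() {
	// Try a node a few times, then degrade gracefully instead of blocking.
	var err error
	for attempt := 0; attempt < 3; attempt++ {
		if err = fetchFromNode(context.Background(), "http://node-7.internal/health"); err == nil {
			fmt.Println("node is healthy")
			return
		}
		time.Sleep(time.Duration(attempt+1) * 200 * time.Millisecond) // crude backoff
	}
	fmt.Println("node unreachable, serving from local state:", err)
}
```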
Good point. Consensus at high data volumes is not achievable. We are building a cloud-native log analytics solution at logiq.ai, and our initial design had RAFT at the core. Though it is relatively easy to implement, once we started hitting high data rates (50+ Gb per hour) the RAFT APIs started to slow down the entire system. We removed it and made each pod stateless, which allowed us to scale horizontally: now we can go from 1 Gb per hour to hundreds with a single `kubectl scale sts` command. Eventual consistency is the way forward for applications that generate high volumes of data.
Of course, some applications do require consensus; for those, RAFT is a good starting point.
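For the curious, a minimal sketch of what that starting point can look like in Go with the hashicorp/raft library; the node ID, bind address, and file paths are made up, and the FSM is a no-op stub rather than a real state machine:

```go
// Sketch: bootstrapping a single-node Raft instance with hashicorp/raft.
package main

import (
	"io"
	"os"
	"time"

	"github.com/hashicorp/raft"
	raftboltdb "github.com/hashicorp/raft-boltdb"
)

// noopFSM satisfies raft.FSM but does nothing; a real FSM would apply
// each committed log entry to application state.
type noopFSM struct{}

func (noopFSM) Apply(*raft.Log) interface{}         { return nil }
func (noopFSM) Snapshot() (raft.FSMSnapshot, error) { return nil, nil }
func (noopFSM) Restore(io.ReadCloser) error         { return nil }

func main() {
	cfg := raft.DefaultConfig()
	cfg.LocalID = raft.ServerID("node-1") // hypothetical node name

	// Durable log and stable store backed by BoltDB, snapshots on disk.
	store, err := raftboltdb.NewBoltStore("raft.db")
	if err != nil {
		panic(err)
	}
	snaps, err := raft.NewFileSnapshotStore("snapshots", 2, os.Stderr)
	if err != nil {
		panic(err)
	}
	trans, err := raft.NewTCPTransport("127.0.0.1:7000", nil, 3, 10*time.Second, os.Stderr)
	if err != nil {
		panic(err)
	}

	r, err := raft.NewRaft(cfg, noopFSM{}, store, store, snaps, trans)
	if err != nil {
		panic(err)
	}

	// Bootstrap a one-node cluster; more voters would be added later with AddVoter.
	f := r.BootstrapCluster(raft.Configuration{
		Servers: []raft.Server{{ID: cfg.LocalID, Address: trans.LocalAddr()}},
	})
	if err := f.Error(); err != nil {
		panic(err)
	}
}
```

A real service would then submit commands through `Apply` and join the other nodes, but even this skeleton shows how much machinery (stores, snapshots, transport) comes along for the ride.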
I don't agree with this. Eventual consistency was heavily hyped a while ago, but these days I don't think there's widespread agreement on this issue. The complexity of getting an eventually consistent system working correctly is often much higher than in a system with a leader.
Most large real-time systems (or near enough) must be eventually consistent due to speed of light delays between all nodes of the system.
Multiplayer video games have big problems with the speed of light. One user could be 100ms away from the server and another user 100ms away in the other direction, so anything one user does, the other will see 200ms later. If the game weren't eventually consistent, it would effectively have to be turn-based: a 200ms round trip allows only about 5 confirmed actions per second, while most real-time games run at 60-100 ticks per second.
A bank account is also eventually consistent: transactions take several days to clear, so the exact balance is unknown at any given moment. This is why a bank shows an "available" balance, which is the most pessimistic estimate of the account's balance.
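A toy sketch of that pessimistic view in Go (the function name and cent amounts are invented for illustration): pending debits are subtracted as if they will all post, while pending credits are ignored until they actually clear.

```go
// Sketch: "available balance" as the most pessimistic estimate.
package main

import "fmt"

// availableBalance starts from the cleared balance, assumes every pending
// debit will post, and does not count pending credits until they clear.
func availableBalance(cleared int64, pending []int64) int64 {
	avail := cleared
	for _, amount := range pending { // amounts in cents; debits are negative
		if amount < 0 {
			avail += amount
		}
	}
	return avail
}

func main() {
	pending := []int64{-2500, +10000} // a pending card charge and an uncleared deposit
	fmt.Println(availableBalance(50000, pending)) // prints 47500
}
```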
Like with all things, designing a system as eventually consistent or as leader-based really depends on the application, the team that is going to build it, and the resources available to support it.