Ok, so having worked on distributed consensus a bunch, here are a couple of thoughts in no particular order:
* In the real world, servers misbehave, like, a lot, so you have to deal with that fact. All assumptions about network robustness in particular will be proven wrong on a long enough timeline.
* Leader election in a world without a robust network is an exercise in managing acceptable failure tolerances. Every application has a notion of acceptable failure rates or acceptable downtime. Discovering that is a non-trivial effort.
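To make "acceptable downtime" concrete, here's a back-of-the-envelope sketch in Go; the 99.9% target is just an example I picked, not a number from the thread:

```go
// Rough sketch: turning an availability target into a downtime budget.
package main

import "fmt"

func main() {
	const target = 0.999 // example target: "three nines"
	const minutesPerMonth = 30 * 24 * 60

	// Everything outside the target is your budget for failures,
	// maintenance, and botched leader elections combined.
	budget := (1 - target) * minutesPerMonth
	fmt.Printf("allowed downtime: about %.0f minutes per month\n", budget) // ~43 minutes
}
```

Deciding whether 43 minutes a month is fine, catastrophic, or wildly over-engineered is exactly the non-trivial part.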
Jeff Dean and others at Google famously came to the conclusion that it was ok for some parts of the Gmail service to be down for limited periods of time. Accepting that all self-healing/robust systems will eventually degrade and have to be restored is the first step in building something manageable. The AXD301 is the most robust system ever built by humans to my knowledge (I think it did 20 years of uptime in production). Most other systems will fail long before that. Managing systems as they fail is an art, particularly as all systems operate in a degraded state.
In short, in a lab environment networks function really well. In the real world, it's a jungle. Plan accordingly.
To add to that: "uptime" is a sort of misleading statistic. A single server can have an "uptime" of 20 years sitting in the closet of an office in a corporate park somewhere in suburban Maryland. To the point that nobody can find the server, but they can ssh to it, and yup... it's still up.
On the other hand, the more components (servers, disks, switches, etc.) you have, the higher the probability of failure. More traffic, more data, more changes, more... anything tends to lead to more failure. So, paradoxically, small systems can have better "uptime" than large, robust, well-designed ones.
In my experience managing large distributed, decentralized systems, the buggier the hardware, the larger the cascades of software failures and the harder they were to fix. Industrial-grade hardware that nobody monkeyed with (or that was replaced as soon as any failure was detected) led to the most stable systems. Reliable, dedicated private network paths led to more stability. At a certain point we didn't really need leader election, because a new leader was almost never needed, because things hardly ever failed.
If you just take one device, it can either be completely functional and chug along for 20 years, or fail completely, which happens rarely but is painful. In either case, your software can ignore the failure state; there's nothing it can do in the case of a failure anyway.
If you take 1000 such devices, then the vast majority will be fine, but a small number of them will fail. No matter how you fix them and replace them, you will keep seeing some degraded nodes, or temporarily disconnected nodes because the network has glitches, too. So your software has to always be ready to face a failure of a remote node, and always be ready to gracefully handle it. This completely changes the way you think.
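To make that mindset concrete, here's a minimal sketch in Go (the node URL and retry policy are made up for illustration): every remote call gets a timeout, and the caller decides up front what "degraded" means instead of hanging forever.

```go
// Sketch: treat every remote call as something that can fail or hang.
package main

import (
	"context"
	"fmt"
	"net/http"
	"time"
)

// fetchFromNode bounds the call with a timeout so a dead or partitioned
// node can't stall the caller indefinitely.
func fetchFromNode(ctx context.Context, url string) error {
	ctx, cancel := context.WithTimeout(ctx, 500*time.Millisecond)
	defer cancel()

	req, err := http.NewRequestWithContext(ctx, http.MethodGet, url, nil)
	if err != nil {
		return err
	}
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return err // down, partitioned, or just slow: they all look the same from here
	}
	defer resp.Body.Close()
	return nil
}

func main() {
	// Try a node a few times, then degrade gracefully instead of blocking.
	var err error
	for attempt := 0; attempt < 3; attempt++ {
		if err = fetchFromNode(context.Background(), "http://node-7.internal/health"); err == nil {
			fmt.Println("node is healthy")
			return
		}
		time.Sleep(time.Duration(attempt+1) * 200 * time.Millisecond) // crude backoff
	}
	fmt.Println("node unreachable, serving from local state:", err)
}
```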
Good point. Consensus at high data volumes is not achievable. We are building a cloud-native log analytics solution at logiq.ai, and our initial design had RAFT at the core. Though it is relatively easy to implement, once we started hitting high data rates (50+ Gb per hour) the RAFT APIs started to slow down the entire system. We removed it and made each pod stateless, which allowed us to scale horizontally: now we can go from 1 Gb per hour to hundreds with a single `kubectl scale sts` command. Eventual consistency is the way forward for applications that generate high volumes of data.
Of course, some applications do require consensus; for those, RAFT is a good starting point.
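For the curious, a minimal sketch of what that starting point can look like in Go with the hashicorp/raft library; the node ID, bind address, and file paths are made up, and the FSM is a no-op stub rather than a real state machine:

```go
// Sketch: bootstrapping a single-node Raft instance with hashicorp/raft.
package main

import (
	"io"
	"os"
	"time"

	"github.com/hashicorp/raft"
	raftboltdb "github.com/hashicorp/raft-boltdb"
)

// noopFSM satisfies raft.FSM but does nothing; a real FSM would apply
// each committed log entry to application state.
type noopFSM struct{}

func (noopFSM) Apply(*raft.Log) interface{}         { return nil }
func (noopFSM) Snapshot() (raft.FSMSnapshot, error) { return nil, nil }
func (noopFSM) Restore(io.ReadCloser) error         { return nil }

func main() {
	cfg := raft.DefaultConfig()
	cfg.LocalID = raft.ServerID("node-1") // hypothetical node name

	// Durable log and stable store backed by BoltDB, snapshots on disk.
	store, err := raftboltdb.NewBoltStore("raft.db")
	if err != nil {
		panic(err)
	}
	snaps, err := raft.NewFileSnapshotStore("snapshots", 2, os.Stderr)
	if err != nil {
		panic(err)
	}
	trans, err := raft.NewTCPTransport("127.0.0.1:7000", nil, 3, 10*time.Second, os.Stderr)
	if err != nil {
		panic(err)
	}

	r, err := raft.NewRaft(cfg, noopFSM{}, store, store, snaps, trans)
	if err != nil {
		panic(err)
	}

	// Bootstrap a one-node cluster; more voters would be added later with AddVoter.
	f := r.BootstrapCluster(raft.Configuration{
		Servers: []raft.Server{{ID: cfg.LocalID, Address: trans.LocalAddr()}},
	})
	if err := f.Error(); err != nil {
		panic(err)
	}
}
```

A real service would then submit commands through `Apply` and join the other nodes, but even this skeleton shows how much machinery (stores, snapshots, transport) comes along for the ride.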
I don't agree with this. Eventual consistency was heavily hyped a while ago, but these days I don't think there's widespread agreement on this issue. The complexity of getting an eventually consistent system working correctly is often much higher than in a system with a leader.
Most large real-time systems (or near enough) must be eventually consistent due to speed of light delays between all nodes of the system.
Multiplayer video games have big problems with the speed of light. One user could be 100ms away from the server and another user 100ms away in the other direction, so anything one user does, the other will see 200ms later. If the game weren't eventually consistent, it would effectively have to be turn-based: a 200ms round trip allows only about 5 confirmed actions per second, while most real-time games run at 60-100 ticks per second.
A bank account is also eventually consistent: transactions take several days to clear, so the exact balance is unknown at any given moment. This is why a bank shows an "available" balance, which is the most pessimistic estimate of the account's balance.
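A toy sketch of that pessimistic view in Go (the function name and cent amounts are invented for illustration): pending debits are subtracted as if they will all post, while pending credits are ignored until they actually clear.

```go
// Sketch: "available balance" as the most pessimistic estimate.
package main

import "fmt"

// availableBalance starts from the cleared balance, assumes every pending
// debit will post, and does not count pending credits until they clear.
func availableBalance(cleared int64, pending []int64) int64 {
	avail := cleared
	for _, amount := range pending { // amounts in cents; debits are negative
		if amount < 0 {
			avail += amount
		}
	}
	return avail
}

func main() {
	pending := []int64{-2500, +10000} // a pending card charge and an uncleared deposit
	fmt.Println(availableBalance(50000, pending)) // prints 47500
}
```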
Like with all things, designing a system as eventually consistent or as leader-based really depends on the application, the team that is going to build it, and the resources available to support it.