Interesting article, here are some notes from my reading of it.
First, in the "The same application with an Eventually Consistent store" section, you don't actually define the semantics of the system, so there is no way to say how it relates to CAP. Eventually consistent stores, like the original Dynamo, are designed to be AP systems, which is why they have things like read-repair. As long as you don't have k failures (i.e., all replicas are down or partitioned), you can always read from a replica, but you may not get consistent data. You can argue here that k-safety is not the same as the A in CAP, but that isn't the approach generally taken (and, if true, it means there's no such thing as an A system, of course, since you can always postulate that ALL networking in the world has simultaneously been cut).
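To make that concrete, here's a toy sketch of the read path I have in mind (not the article's system; the Replica/ECStore names and the last-writer-wins versioning are invented for illustration): as long as one replica is reachable you get an answer, and read-repair drags stragglers forward, but the answer can be stale.

    # Toy sketch, not the article's system: an eventually consistent store
    # with three replicas, last-writer-wins versions, and read-repair.
    # All names here (Replica, ECStore) are made up for illustration.
    import time

    class Replica:
        def __init__(self):
            self.value, self.version, self.up = None, 0.0, True

    class ECStore:
        def __init__(self, n=3):
            self.replicas = [Replica() for _ in range(n)]

        def write(self, value):
            # Writes land on whichever replicas are reachable; the rest stay stale.
            version = time.time()
            for r in self.replicas:
                if r.up:
                    r.value, r.version = value, version

        def read(self):
            # As long as one replica is reachable we answer (the "A" part),
            # but the answer is only as fresh as the replicas we can see.
            live = [r for r in self.replicas if r.up]
            if not live:
                raise RuntimeError("all replicas down/partitioned: the k-failure case")
            newest = max(live, key=lambda r: r.version)
            for r in live:  # read-repair: drag live stragglers forward
                if r.version < newest.version:
                    r.value, r.version = newest.value, newest.version
            return newest.value

    store = ECStore()
    store.write("v1")                   # all three replicas hold v1
    store.replicas[0].up = False
    store.write("v2")                   # replica 0 misses this write
    store.replicas[0].up = True
    store.replicas[1].up = store.replicas[2].up = False
    print(store.read())                 # "v1": a response, but not the latest one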
Second, your argument about two-phase commit conflates two different arguments. First, the standard 2PC algorithm does not allow for heuristic commits. That algorithm is a CP algorithm (it is, in fact, referred to as the "unavailable protocol" due to its inability to make progress under failures). If you add in heuristic commits, then it becomes a disaster! I can imagine some situations where such a protocol would be useful, but not if you care at all about your data consistency. In that case, why use 2PC at all? Your conclusion is correct in that case: the resulting algorithm is not C or A.
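To illustrate the two behaviours I'm separating, here is a rough sketch of the protocol shape (not any real transaction manager; the class and method names are invented): a prepared participant in plain 2PC blocks until it hears the coordinator's decision, whereas a heuristic decision lets it answer immediately at the price of possibly contradicting that decision.

    # Rough sketch of the two behaviours, not any real transaction manager.
    # Class and method names are invented for illustration.

    class Participant:
        def __init__(self, name):
            self.name, self.state = name, "init"

        def prepare(self):
            self.state = "prepared"   # voted yes; now bound by the coordinator
            return True

        def on_decision(self, decision):
            if decision is None:
                # Plain 2PC: a prepared participant may neither commit nor abort
                # on its own; it blocks until the coordinator is heard from again.
                # No progress (not A), but never inconsistent: the CP behaviour.
                return "blocked"
            self.state = decision
            return self.state

        def heuristic_commit(self):
            # Heuristic variant: decide locally while prepared. If the coordinator
            # actually decided "abort", the participants now disagree.
            self.state = "heuristically committed"
            return self.state

    db1, db2 = Participant("db1"), Participant("db2")
    assert db1.prepare() and db2.prepare()
    decision = "abort"                      # reaches db1 but never reaches db2

    print(db1.on_decision(decision))        # "abort"
    print(db2.on_decision(None))            # "blocked"  <- plain 2PC, CP
    print(db2.heuristic_commit())           # diverges from db1: neither C nor A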
Third, your confusion about CA is that you are trying to apply it to systems, like datacenters full of commodity servers and switches, that don't really fit it. Imagine, instead, a system of two computers connected by multiple redundant NICs and cables, located next to each other and communicating by sending messages. It isn't hard to believe that you could build a system like that in which you could add algorithms that were CA: in other words, if someone cuts the cables, then all bets are off, but as long as those cables remain uncut, the algorithm guarantees consistency and availability. A better example is cache coherency protocols in modern processors, which are incredibly complex distributed algorithms working on CPUs that communicate over tightly integrated channels. Cache coherency protocols need to be CA, of course! If you somehow managed to sever the communication links without destroying the motherboard, you could break the algorithm's assumptions, but that wouldn't make it any less of a CA algorithm.
EDIT: Just to be completely clear: the CAP theorem only gives you a very small amount of information about a distributed system. As the author notes multiple times, just because a system does not meet the "A" in CAP does not mean it isn't available. It could be 99.99999999% available and still not meet the A in CAP. The same is true of "C". That's what makes CAP less useful for actually designing distributed systems: many of the choices you have to make come down to what you do when the system is not available, or not consistent.
> Second [...] In that case, why use 2PC at all?
Heuristic decisions have been in the XA standard since its first version (1991). YMMV, but they are very often used (to say the least) in 2PC production systems. See for example how Mark Little describes them: http://planet.jboss.org/post/2pc_or_3pc. They're not really presented as an optional thing.
It shows that traditional databases are not that 'CP at all costs' when it comes to managing failure.
> Your conclusion is correct [...] the resulting algorithm is not C or A
Yeah... I see 2PC as not partition-tolerant, since a partition breaks ACID atomicity. Once partition intolerance is accepted, CA fits well: 2PC is consistent & available until there is a partition. Saying '2PC is not consistent and not available but is partition tolerant' is not technically false, but it's a much less accurate description.
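A compressed sketch of the 'until there is a partition' reading (again just an illustration with made-up names): while the links hold, the protocol both answers and keeps the two branches in the same state; cut a link between the phases and the all-or-nothing guarantee is exactly what is left hanging.

    # Compressed sketch (made-up names): 2PC behaves as C and A while the
    # links hold; a partition between the phases leaves a branch in doubt.

    def second_phase(nodes, reachable):
        # Every node already voted yes in phase one; the commit decision now
        # only reaches the nodes we can still talk to.
        return {n: "committed" if reachable(n) else "in doubt (prepared)"
                for n in nodes}

    print(second_phase(["db1", "db2"], lambda n: True))
    # {'db1': 'committed', 'db2': 'committed'}  -- atomic, and it made progress

    print(second_phase(["db1", "db2"], lambda n: n != "db2"))
    # {'db1': 'committed', 'db2': 'in doubt (prepared)'}
    # db2 must now block (not A) or decide heuristically (risking not C).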
> Third [...] datacenters full of commodity servers [...]
> redundant NICs and cables, located next to each other
> [...] It isn't hard to believe that you could build a system, like that,
> in which you could add algorithms that were CA
I just totally agree with you here. CAP as a categorization tool is used for all types of distributed systems, but there is a huge difference between an application built on commodity hardware running in a multi-dc config and a two-node system running on redundant hardware in a single rack.
Historically, 2PC has been used for very specific applications: a few nodes, a limited number of operations between the nodes, expensive redundant hardware all over the place, and limited scaling-out needs (if any). Not your typical big data system.
Apologies, as usual with the CAP theorem, I made an error here. The Gilbert/Lynch paper specifically says that availability only counts for servers that actually receive requests, so if all the network links in the world are cut, you will still be available by this definition. So you can ignore the last two sentences of my first paragraph. That paragraph should say:
First, in the "The same application with an Eventually Consistent store" section, you don't actually define the semantics of the system, so there is no way to say how it relates to CAP. Eventually consistent stores, like the original Dynamo, are designed to be AP systems, which is why they have things like read-repair. If all your nodes have failed, or all network links have been cut, you are available because there are no requests to which to respond. Otherwise, you can always get a response, and are therefore available, but that response may not be the "latest" response, so it may not be consistent (see Herlihy on Linearizability for the full definition here).