So if the snapshot violation happens with Multi-AZ instances, could it happen with a single-region, multiple-read-replica setup as well? And is it just more easily observable in Multi-AZ setups because the lag is higher?
A synchronous replica via WAL shipping is a well-worn feature of Postgres. I’d expect RDS to be using that feature behind the scenes and would be extremely surprised if it had consistency bugs.
A “semi synchronous” configuration with two replicas, as AWS calls it, is to my knowledge not available in base Postgres. AWS must be using some bespoke replication strategy, which would have different bugs than synchronous replication and is less battle-tested.
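For what it's worth, the closest thing in stock Postgres is the quorum form of synchronous_standby_names (the ANY n syntax, available since v10), where a commit waits for any one of several standbys to flush its WAL. Whether RDS builds on that or on something entirely bespoke is unknown; a minimal sketch, with made-up standby names:

    -- On the primary: commits wait until ANY 1 of the 2 named standbys
    -- has flushed the commit's WAL record (names are placeholders).
    ALTER SYSTEM SET synchronous_standby_names = 'ANY 1 (replica_a, replica_b)';
    ALTER SYSTEM SET synchronous_commit = 'on';   -- wait for remote flush, not remote apply
    SELECT pg_reload_conf();

    -- Each standby identifies itself through application_name in its
    -- primary_conninfo, e.g. application_name=replica_a.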
But as nobody except AWS knows the implementation details of RDS, this is all idle speculation that doesn’t mean much.
I don't think it's possible with ANY setup. All you get is that some replicas are more outdated than others. But they won't return two conflicting states, where ReplicaA says tx1 wrote (but not tx2) while ReplicaB says tx2 wrote (but not tx1), which is what Long Fork and Parallel Snapshot are about.
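Spelled out, the conflicting states you're describing would look something like this (the kv table, keys and values are hypothetical):

    -- Two independent transactions on the primary, touching disjoint rows;
    -- assume key 'a' and key 'b' both start at val = 0.
    BEGIN;  UPDATE kv SET val = 1 WHERE key = 'a';  COMMIT;   -- tx1
    BEGIN;  UPDATE kv SET val = 1 WHERE key = 'b';  COMMIT;   -- tx2

    -- The anomalous outcome:
    --   Replica A:  SELECT key, val FROM kv;  --> a = 1, b = 0   (tx1 visible, tx2 not)
    --   Replica B:  SELECT key, val FROM kv;  --> a = 0, b = 1   (tx2 visible, tx1 not)
    -- No single order of tx1 and tx2 explains both snapshots; ordinary replication
    -- lag would only ever make one replica a consistent prefix of the other.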
So Amazon Multi-cluster seems to replicate changes out of order?
Kinda. I think it's "just" PostgreSQL behaviour that's to blame here: On replicas, transaction commit visibility order is determined by the order of WAL records; on the primary it's based on when the backend that wrote the transaction notices that its transaction is sufficiently persisted.
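A rough way to see the distinction, reusing the hypothetical kv table from above (the functions are standard Postgres admin functions; the comparison is only an illustration):

    -- On a replica: visibility follows WAL replay order, so the replay LSN
    -- pins down exactly which commits a query can see. Two replicas that
    -- report the same replay LSN should therefore show the same rows.
    SELECT pg_last_wal_replay_lsn();   -- how far this replica has replayed
    SELECT key, val FROM kv;

    -- On the primary: a commit becomes visible once its backend has confirmed
    -- the WAL is sufficiently persisted, which need not match the order of the
    -- commit records in the WAL, hence the mismatch described above.
    SELECT pg_current_wal_lsn();       -- current WAL write position
    SELECT key, val FROM kv;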