Unmentioned: there are serious security issues with cloning the memory of code that wasn't designed for it.
For example, an SSL library might have pre-calculated the random nonce for the next incoming SSL connection.
If you clone the VM containing a process using that library, both child VMs will now use the same nonce. Some crypto is 100% broken open if a nonce is reused.
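To make the nonce-reuse failure concrete, here is a minimal sketch. It uses a toy stream cipher built from SHA-256 in counter mode, which is an assumption chosen for illustration and not what any real SSL library does. The function names and messages are hypothetical. The point is only that two ciphertexts produced under the same key and nonce XOR together to the XOR of the plaintexts.

```python
import hashlib
import os


def keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Toy keystream: SHA-256(key || nonce || counter), illustration only."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:length]


def xor(a: bytes, b: bytes) -> bytes:
    """XOR two byte strings, truncating to the shorter one."""
    return bytes(x ^ y for x, y in zip(a, b))


def encrypt(key: bytes, nonce: bytes, plaintext: bytes) -> bytes:
    """Stream-cipher style encryption: plaintext XOR keystream."""
    return xor(plaintext, keystream(key, nonce, len(plaintext)))


key = os.urandom(32)
nonce = os.urandom(12)  # imagine this was pre-calculated before the clone

# Each clone encrypts a different message with the SAME key and nonce.
p1 = b"message sent by clone A"
p2 = b"message sent by clone B"
c1 = encrypt(key, nonce, p1)
c2 = encrypt(key, nonce, p2)

# The keystream cancels out: an eavesdropper learns p1 XOR p2 without the key.
leak = xor(c1, c2)
assert leak == xor(p1, p2)

# Anyone who knows (or guesses) one plaintext recovers the other.
recovered = xor(leak, p1)
assert recovered == p2
```

With distinct nonces the two keystreams differ and the XOR of the ciphertexts reveals nothing useful, which is why the "used at most once" rule matters.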
It's important to refresh entropy immediately after clone. Still, there can be code that didn't assume it could be cloned (even though there's always been `fork`, of course). Because of this, we don't live clone across workspaces for unlisted/private sandboxes and limit the use case to dev envs where no secrets are stored.
VMware even has a vSphere Fault Tolerance product that creates a "live shadow instance" of a VM that mirrors the primary virtual machine (with up to 4 vCPUs). So you can do a quick failover in case of an "immediate planned" failover case, but apparently even when the primary DB goes down. I guess this might work when some external system (like a storage array) goes down in the primary, you can just switch to the other VM (with latest memory/CPU state) and replay that I/O there and keep going... But if there's a hard crash of the primary, if it actually does work, then they must be doing lots of reasoning about internal state change ordering & external device side-effect (somewhat like Antithesis, but for a different purpose). Back in the day, they supported only uniprocessor VMs (with something called vLockstep) and later up to 4 vCPUs with something called Fast Checkpointing.
I've always wanted to test this out for fun, but by now 15 years have gone by and I've never gotten to it...
Live migration had some very cool demos. They would run an intensive workload such as a game, cause a crash, and the VM would resume with zero buffering.
Isn't this still a concern even if you're not pre-calculating way ahead of time? If you generate it when needed, it could still catch you at the wrong time (e.g. right before encryption, but right after nonce generation)
Unless your encryption and transport protocols are 100% stateless, only 1 connection will actually be able to form, even if you duplicate the machine during connection creation.
The problem with pre-computing a bunch and keeping them in memory is brand new connections made post cloning would use the same list of nonces.
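One way to picture a mitigation for the pre-computed list is a pool that throws away its cached nonces whenever it notices it has been cloned. This is a sketch under assumptions: `NoncePool` and `read_generation` are hypothetical names, and `read_generation` stands in for whatever clone-detection signal the platform offers (the code here just simulates it with a dict). It also assumes the randomness source itself has been reseeded after the clone.

```python
import os
from collections import deque
from typing import Callable, Deque


class NoncePool:
    """Pre-computes nonces but discards them if the generation ID changes."""

    def __init__(self, read_generation: Callable[[], int],
                 size: int = 4, nonce_len: int = 12) -> None:
        self._read_generation = read_generation  # hypothetical clone signal
        self._size = size
        self._nonce_len = nonce_len
        self._generation = read_generation()
        self._pool: Deque[bytes] = deque()
        self._refill()

    def _refill(self) -> None:
        # Top the pool back up with fresh random nonces.
        while len(self._pool) < self._size:
            self._pool.append(os.urandom(self._nonce_len))

    def next_nonce(self) -> bytes:
        current = self._read_generation()
        if current != self._generation:
            # We were cloned: every cached nonce may exist in a sibling VM.
            self._pool.clear()
            self._generation = current
        self._refill()
        return self._pool.popleft()


# Simulated generation counter; a real signal would come from the hypervisor.
generation = {"id": 1}
pool = NoncePool(lambda: generation["id"])

stale = list(pool._pool)   # nonces cached before the "clone"
generation["id"] = 2       # simulate the clone being signalled
fresh = pool.next_nonce()
assert fresh not in stale  # the stale nonces were discarded, not handed out
```

This only narrows the window. A clone taken between `next_nonce()` returning and the nonce actually being used would still duplicate it, which is the timing concern raised above.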
Sounds like it would simply be inappropriate to clone & use a VM that's assuming its data is unique. This would also be true of other conditions, e.g. if you needed to spoof a MAC or IPv6 address & picked one randomly.
The problem is modern software is so fiendishly complicated there almost certainly is stuff like that in the code. The question is where, and does it matter?
I don't really follow, what's the issue with that? The two nodes will encrypt using the same key, so they can snoop at each other's traffic that they send out? Doesn't sound that big of a deal per se.
A nonce is not a key, it's a piece of random data that is meant to be used at most once.
If an attacker sees valid nonces on a VM, and knows of another VM sharing the same nonces, then your crypto on *both* VMs becomes vulnerable to replay attacks.