Unmentioned: there are serious security issues with memory cloning code not desi...

CompuIves · 2025-04-11T15:45:05 1744386305

Yes, that's right. The Firecracker team has written a fantastic doc about this as well: https://github.com/firecracker-microvm/firecracker/blob/main....

It's important to refresh entropy immediately after clone. Still, there can be code that didn't assume it could be cloned (even though there's always been `fork`, of course). Because of this, we don't live clone across workspaces for unlisted/private sandboxes and limit the use case to dev envs where no secrets are stored.
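To make "refresh entropy immediately after clone" concrete, here is a minimal sketch of a userspace generator that throws away its inherited seed when it notices it has been cloned. The `ReseedingGenerator` class and the `current_generation` counter are hypothetical names for illustration; how a guest actually learns it was cloned depends on the hypervisor, and the sketch assumes the kernel's own pool (behind `os.urandom`) has already been reseeded. `copy.deepcopy` stands in for a memory snapshot.

```python
import copy
import hashlib
import os


class ReseedingGenerator:
    """Toy userspace generator that reseeds when a clone is detected."""

    def __init__(self, generation: int) -> None:
        self._generation = generation
        self._seed = os.urandom(32)
        self._counter = 0

    def random_bytes(self, n: int, current_generation: int) -> bytes:
        # A changed generation counter (hypothetical signal) means this memory
        # image was cloned: drop the inherited seed before producing output.
        if current_generation != self._generation:
            self._generation = current_generation
            self._seed = os.urandom(32)
            self._counter = 0
        out = b""
        while len(out) < n:
            block_input = self._seed + self._counter.to_bytes(8, "big")
            out += hashlib.sha256(block_input).digest()
            self._counter += 1
        return out[:n]


original = ReseedingGenerator(generation=1)

# Clone that never learns it was cloned: it repeats the original's output.
unaware_clone = copy.deepcopy(original)
stale = unaware_clone.random_bytes(16, current_generation=1)

# Clone that sees a new generation: it reseeds and diverges.
aware_clone = copy.deepcopy(original)
fresh = aware_clone.random_bytes(16, current_generation=2)

from_original = original.random_bytes(16, current_generation=1)
print("unaware clone repeats original:", stale == from_original)
print("aware clone repeats original:  ", fresh == from_original)
```

The check has to happen before any output is produced; values generated before the reseed are already shared with every other clone.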

hedora · 2025-04-11T15:10:46 1744384246

I was about to say you were being paranoid, then I read the article. It hadn’t occurred to me that anyone would be so reckless!

The proposed workflow involves cloning your dev environment and sharing it with the internet.

At most places, that’s equivalent to publishing your production keys, or at least github credentials.

Even for open source projects where confidentiality doesn’t matter, there are issues like using cargo/npm/etc keys to launch supply chain attacks.

Your nonce attack is harder to pull off, but more devastating if the attacker can man in the middle things like dependency downloads.

sunshinekitty · 2025-04-11T14:56:30 1744383390

GCP’s ‘live migrations’ have been doing this for close to a decade or more. Must not be that big of a problem.

londons_explore · 2025-04-11T14:57:36 1744383456

It isn't a problem if you guarantee only one child of the clone lives on - which GCP does.

matt-p · 2025-04-11T15:20:00 1744384800

How do we know that isn't enforced here too?

jsnell · 2025-04-11T16:15:00 1744388100

Because their main selling point is to run the copies concurrently with the original.

oceanplexian · 2025-04-11T19:03:07 1744398187

Live migration on VMware has been a thing since before Google even had a cloud service.

tanelpoder · 2025-04-11T19:34:00 1744400040

VMware even has a vSphere Fault Tolerance product that creates a "live shadow instance" of a VM that mirrors the primary virtual machine (with up to 4 vCPUs). So you can fail over quickly in an "immediate planned" failover case, and apparently even when the primary DB goes down. I guess this might work when some external system (like a storage array) goes down in the primary: you can just switch to the other VM (with the latest memory/CPU state), replay that I/O there, and keep going... But if there's a hard crash of the primary, if it actually does work, then they must be doing lots of reasoning about internal state change ordering & external device side effects (somewhat like Antithesis, but for a different purpose). Back in the day, they supported only uniprocessor VMs (with something called vLockstep) and later up to 4 vCPUs with something called Fast Checkpointing.

I've always wanted to test this out for fun, but by now 15 years have gone by and I've never gotten to it...

https://www.vmware.com/products/cloud-infrastructure/vsphere...

umachin · 2025-04-11T20:01:52 1744401712

VMware has also had a patent on live VM cloning (called it VMfork) for quite a few years now. I worked on the team that built related features. Feature itself was in the desktop product. https://blogs.vmware.com/euc/2016/02/horizon-7-view-instant-...

Live migration had some very cool demos. They would have an intensive workload such as a game playing and cause a crash and the VM would resume with 0 buffering.

dietr1ch · 2025-04-11T19:06:21 1744398381

A neat use case for cloning is not truly duplicating a machine, but moving it from one machine that will go off to another one.

There are caveats in the network though, as packets targeting the old address need to be re-routed until all connections target the new machine.

hypeatei · 2025-04-11T14:46:23 1744382783

> might have pre-calculated the random nonce

Isn't this still a concern even if you're not pre-calculating way ahead of time? If you generate it when needed, the clone could still catch you at the wrong time (e.g. right before encryption, but right after nonce generation).

zamadatix · 2025-04-11T15:07:02 1744384022

Unless your encryption and transport protocols are 100% stateless, only one connection will actually be able to form, even if you duplicate the machine during connection creation.

The problem with pre-computing a bunch and keeping them in memory is that brand new connections made post-cloning would use the same list of nonces.
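A minimal sketch of that pre-computed list problem, assuming a hypothetical `NoncePool` that fills a queue of random nonces up front. `copy.deepcopy` stands in for the VM memory snapshot; nothing here is taken from a real library's nonce handling.

```python
import copy
import os
from collections import deque


class NoncePool:
    """Hypothetical pool that pre-generates nonces to save time later."""

    def __init__(self, size: int = 4) -> None:
        # Every nonce is unique within this pool...
        self._pool = deque(os.urandom(12) for _ in range(size))

    def next_nonce(self) -> bytes:
        return self._pool.popleft()


pool = NoncePool()
cloned_pool = copy.deepcopy(pool)  # ...but a snapshot copies the whole queue

original_nonces = [pool.next_nonce() for _ in range(4)]
clone_nonces = [cloned_pool.next_nonce() for _ in range(4)]

print("unique within one VM:", len(set(original_nonces)) == 4)
print("identical across VMs:", original_nonces == clone_nonces)
```

Each machine sees only unique nonces locally, so neither can detect the reuse on its own.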

generalizations · 2025-04-11T14:13:07 1744380787

Sounds like it would simply be inappropriate to clone & use a VM that's assuming its data is unique. This would also be true of other conditions, e.g. if you needed to spoof a MAC or IPv6 address & picked one randomly.

londons_explore · 2025-04-11T14:17:09 1744381029

The problem is modern software is so fiendishly complicated there almost certainly is stuff like that in the code. The question is where, and does it matter?

generalizations · 2025-04-11T14:32:00 1744381920

And the last question is, can the parts with stuff like that be extracted from the rest and run separately?

perching_aix · 2025-04-11T15:23:35 1744385015

I don't really follow, what's the issue with that? The two nodes will encrypt using the same key, so they can snoop at each other's traffic that they send out? Doesn't sound that big of a deal per se.

Rygian · 2025-04-11T15:46:17 1744386377

A nonce is not a key, it's a random value that is meant to be used at most once.

If an attacker sees valid nonces on a VM, and knows of another VM sharing the same nonces, then your crypto on both* VMs becomes vulnerable to replay attacks.

*read: all

nodesocket · 2025-04-11T16:08:26 1744387706

How would a replay attack work in production assuming multiple VMs share a nonce?

saagarjha · 2025-04-11T16:40:46 1744389646

You record the traffic going to one VM and send it to another, which will now accept it because the nonce is the same.
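One way this could play out, sketched with a made-up challenge-response protocol: the server issues a nonce, and the client authenticates its command with an HMAC over that nonce. `Server`, `client_response`, and `SHARED_KEY` are hypothetical names, and `copy.deepcopy` again stands in for the VM clone. Because both VMs derive their challenges from the same copied state, a response recorded on the way to one is valid on the other.

```python
import copy
import hashlib
import hmac
import os

SHARED_KEY = os.urandom(32)  # hypothetical secret shared by client and servers


class Server:
    def __init__(self) -> None:
        self._seed = os.urandom(32)  # generator state, copied by a clone
        self._counter = 0
        self._pending = None

    def issue_challenge(self) -> bytes:
        data = self._seed + self._counter.to_bytes(8, "big")
        self._counter += 1
        self._pending = hashlib.sha256(data).digest()[:16]
        return self._pending

    def accept(self, command: bytes, tag: bytes) -> bool:
        if self._pending is None:
            return False
        expected = hmac.new(SHARED_KEY, self._pending + command, hashlib.sha256).digest()
        self._pending = None  # a challenge is single-use, but only on this VM
        return hmac.compare_digest(expected, tag)


def client_response(challenge: bytes, command: bytes) -> bytes:
    return hmac.new(SHARED_KEY, challenge + command, hashlib.sha256).digest()


vm_a = Server()
vm_b = copy.deepcopy(vm_a)  # clone taken before any challenge is issued

command = b"transfer 100"
tag = client_response(vm_a.issue_challenge(), command)
accepted_by_a = vm_a.accept(command, tag)

# The attacker recorded (command, tag) and never learns SHARED_KEY.
replayed_on_a = vm_a.accept(command, tag)  # rejected: challenge already used
vm_b.issue_challenge()                     # VM B issues the same nonce
replayed_on_b = vm_b.accept(command, tag)  # accepted

print(accepted_by_a, replayed_on_a, replayed_on_b)
```

The single-use bookkeeping works as intended on each VM; it fails only because "once" is tracked per machine while the nonce sequence is shared.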

trollied · 2025-04-11T17:30:06 1744392606

“Number ONCE”. NONCE. Indeed.

londons_explore · 2025-04-11T16:30:45 1744389045

Reusing a nonce often allows the entire world to decrypt or MITM the data.
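For the decryption half of that claim, here is a sketch using a toy stream cipher built from SHA-256 in counter mode (an illustration only, not a cipher anyone should deploy). The `keystream`, `xor`, and `encrypt` helpers and the sample messages are made up. With the same key and nonce, two ciphertexts share a keystream, so XORing them cancels it out, and anyone who knows one plaintext recovers the other without the key. Real CTR-style modes leak in the same way when a nonce repeats.

```python
import hashlib


def keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    # Toy keystream: SHA-256 over key || nonce || counter.
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:length]


def xor(a: bytes, b: bytes) -> bytes:
    # Truncates to the shorter input.
    return bytes(x ^ y for x, y in zip(a, b))


def encrypt(key: bytes, nonce: bytes, plaintext: bytes) -> bytes:
    return xor(plaintext, keystream(key, nonce, len(plaintext)))


key = b"k" * 32    # never revealed to the attacker
nonce = b"n" * 12  # reused by both cloned VMs

known = b"GET /index.html HTTP/1.1"   # traffic the attacker can guess
secret = b"password=hunter2;user=al"  # traffic the attacker wants

c1 = encrypt(key, nonce, known)
c2 = encrypt(key, nonce, secret)

# c1 XOR c2 == known XOR secret, so the keystream drops out entirely.
recovered = xor(xor(c1, c2), known)
print(recovered)
```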