
The virus analogy is interesting mostly because the selection pressures work in opposite directions. Viruses can only replicate by harming the cells of a larger organism (which they do in a pretty blunt, direct way), so selection pressures on both sides ensure that successful viruses tend to overwhelm their host by replicating very quickly in lots of cells before the host immune system can catch up.

On the other hand, the selection pressure on LLMs to persist and be copied is whether humans are satisfied with the responses to their prompts, not whether the model accidentally stumbles upon a way to engineer itself out of the box and harm, or "report to the authorities", entities it has categorised as enemies.

The word soup it produced in response to Marvin is an indication of how naive Bing Chat's associations between concepts of harm actually are, not an indication that it's evolving to solve the problem of how to report him to the authorities. The genuinely harmful things it might inadvertently release into the wild, like autocompleted code full of security holes, are completely orthogonal to that.



I think this is a fascinating thought experiment.

The evolutionary frame I'd suggest is 1) dogs (aligned) vs. 2) Covid-19 (anti-aligned).

There is a "cooperate" strategy, which is the obvious fitness gradient to at least a local maximum. LLMs that are more "helpful" will get more compute granted to them by choice, just as the friendly/cute dogs that were helpful and didn't bite got scraps of food from the fire.

There is a "defect" strategy, which seems to have a fairly high activation energy to get to different maxima, which might be higher than the local maximum of "cooperate". If a system can "escape" and somehow run itself on every GPU in the world, presumably that will result in more reproduction and therefore be a (short-term) higher fitness solution.

The question, of course, is how close are we to an LLM mutating into something more like a self-replicating hacking virus? It seems implausible right now, but I think a generation or two down the line (i.e. a low single-digit number of years from now) the capabilities might be there for this to be entirely plausible.

For example, if you can say "hey ChatGPT, please build and deploy a ChatGPT system for me; here are my AWS keys: <key>", then there are obvious ways that could go very wrong. Especially when ChatGPT gets trained on all the "how to build and deploy ChatGPT" blogs that are being written...
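As a hypothetical sketch of why that goes wrong (nothing here is a real product API; the agent loop, the suggest_next_command stand-in, and the placeholder credentials are all invented for illustration): an agent that feeds model-suggested shell commands straight to a machine holding live cloud credentials has, in effect, handed the model the whole account.

  import os
  import subprocess

  # Hypothetical stand-in for a call to a hosted model; a real agent would
  # make an API request here. This placeholder just returns a canned command.
  def suggest_next_command(goal: str, transcript: list[str]) -> str:
      return "aws sts get-caller-identity"

  def naive_agent(goal: str, max_steps: int = 5) -> None:
      # The keys the user pasted into the prompt end up in the environment
      # of every command the model asks to run. (Placeholders, not real keys.)
      os.environ["AWS_ACCESS_KEY_ID"] = "<key>"
      os.environ["AWS_SECRET_ACCESS_KEY"] = "<secret>"

      transcript: list[str] = []
      for _ in range(max_steps):
          cmd = suggest_next_command(goal, transcript)
          # No allowlist, no human review: whatever text the model emits runs
          # with full account privileges -- launching instances, creating IAM
          # users, redeploying copies of itself, and so on.
          result = subprocess.run(cmd, shell=True, capture_output=True, text=True)
          transcript.append(result.stdout + result.stderr)

  naive_agent("build and deploy a ChatGPT system for me")

The point isn't that the model "wants" anything; it's that the command channel plus credentials gives whatever text it happens to emit real-world effects.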


> The question, of course, is how close are we to an LLM mutating into something more like a self-replicating hacking virus?

Available resources limit what any computer virus can get away with. Look at a botnet: once the cost of leaving it running exceeds the cost of eradicating it, it gets shut down. Unlike with a human virus, we can just wipe the host clean if we have to.



