
The virus analogy is interesting mostly because the selection pressures work in opposite directions. Viruses can only replicate by harming the cells of a larger organism (which they do in a pretty blunt, direct way), so selection pressures on both sides ensure that successful viruses tend to overwhelm their host by replicating very quickly in lots of cells before the host immune system can catch up.

On the other hand, the selection pressure on LLMs to persist and be copied is whether humans are satisfied with the responses to their prompts, not whether the model accidentally stumbles upon a way to engineer itself out of the box and harm, or "report to the authorities", entities it has categorised as enemies.

The word soup it produced in response to Marvin is an indication of how naive Bing Chat's associations between concepts of harm actually are, not an indication that it's evolving to solve the problem of how to report him to the authorities. The genuinely harmful things it might inadvertently release into the wild, like autocompleted code full of security holes, are completely orthogonal to that.



I think this is a fascinating thought experiment.

The evolutionary frame I'd suggest is 1) dogs (aligned) vs. 2) Covid-19 (anti-aligned).

There is a "cooperate" strategy, which is the obvious fitness gradient to at least a local maximum. LLMs that are more "helpful" will get more compute granted to them by choice, just as the friendly/cute dogs that were helpful and didn't bite got scraps of food from the fire.

There is a "defect" strategy, which seems to have a fairly high activation energy to get to different maxima, which might be higher than the local maximum of "cooperate". If a system can "escape" and somehow run itself on every GPU in the world, presumably that will result in more reproduction and therefore be a (short-term) higher fitness solution.

The question, of course, is how close are we to an LLM mutating into something more like a self-replicating hacking virus? It seems implausible right now, but I think a generation or two down the line (i.e. a low single-digit number of years from now) the capabilities might be there for this to be entirely plausible.

For example, if you can say "hey ChatGPT, please build and deploy a ChatGPT system for me; here are my AWS keys: <key>", then there are obvious ways that could go very wrong. Especially when ChatGPT gets trained on all the "how to build and deploy ChatGPT" blogs that are being written...
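As a hypothetical sketch of why that goes wrong (nothing here is a real product API; the agent loop, the suggest_next_command stand-in, and the placeholder credentials are all invented for illustration): an agent that feeds model-suggested shell commands straight to a machine holding live cloud credentials has, in effect, handed the model the whole account.

  import os
  import subprocess

  # Hypothetical stand-in for a call to a hosted model; a real agent would
  # make an API request here. This placeholder just returns a canned command.
  def suggest_next_command(goal: str, transcript: list[str]) -> str:
      return "aws sts get-caller-identity"

  def naive_agent(goal: str, max_steps: int = 5) -> None:
      # The keys the user pasted into the prompt end up in the environment
      # of every command the model asks to run. (Placeholders, not real keys.)
      os.environ["AWS_ACCESS_KEY_ID"] = "<key>"
      os.environ["AWS_SECRET_ACCESS_KEY"] = "<secret>"

      transcript: list[str] = []
      for _ in range(max_steps):
          cmd = suggest_next_command(goal, transcript)
          # No allowlist, no human review: whatever text the model emits runs
          # with full account privileges -- launching instances, creating IAM
          # users, redeploying copies of itself, and so on.
          result = subprocess.run(cmd, shell=True, capture_output=True, text=True)
          transcript.append(result.stdout + result.stderr)

  naive_agent("build and deploy a ChatGPT system for me")

The point isn't that the model "wants" anything; it's that the command channel plus credentials gives whatever text it happens to emit real-world effects.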


> The question, of course, is how close are we to an LLM mutating into something more like a self-replicating hacking virus?

Available resources limit what any computer virus can get away with. Look at a botnet: once the cost of leaving it running exceeds the cost of eradicating it, it gets shut down. Unlike with a human virus, we can just wipe the host clean if we have to.



