I'm less worried about backdoors accidentally appearing in LLM output and more worried about backdoors being placed into LLM output by 3 letter agencies. Maybe not today, but certainly in a few years' time.
Wouldn't it be easier (since they probably have very skilled programmers working for them) and way, way more effective to just set up a team and create a quality open source project with one or two extremely stealthy backdoors?
Or just pay or threaten a struggling company or dev to insert them?
How would you secretly hide something like that in FOSS? And why would that be easier? It seems to me that it's easier to inject into an existing company than to do all the work yourself. This is what they do with most things, as I understand it.
Yes, but that was a memory leak, giving access to unauthorized random memory. That is not an intentionally created exploit / backdoor which gives the owner easy access to the victim's system.
That seems pretty risky and easy to catch. The point of these LLMs is to produce code, we know they aren’t very reliable about it, so you have to check the code. So, it is more likely to get inspected than a random GitHub project, right?
It also seems dangerous in the sense that… if there's a type of prompt that is likely to create infected code, our intelligence agencies would, I guess, want it to hit our adversaries selectively. But that gives our adversaries more rolls of the dice to detect it. So it actively creates a situation where our adversaries are more likely to have knowledge of the vulnerabilities.
I agree that it's more likely to be inspected, but I think the vast majority of developers (including me, though I don't use LLMs for development) aren't inspecting the code rigorously enough to catch non-obvious bugs; see for example the "Underhanded C contest" [0].
As you've pointed out, this vector would give them near surgical precision and insight into their target's code & systems, rather than casting a wide net with a vulnerable library on Github. They could use a model trained on "underhanded" code or even selectively overwrite parts of the responses with hand-crafted vulnerabilities while only targeting select organizations.
It makes me wonder what the business model of OpenAI and their peers is going to be over the long term. I can't imagine large corporations using "LLM as a service" indefinitely with the risk of IP theft and "bug injection".
The government doesn't need to produce viruses anymore. They have escrow services and remote access to radios, processors, and firmware chips. All that technology is leased to private investigators, who are private entities, and then they go after people using the tools. It allows distance between the government and the spying, plus lower salaries and infrastructure costs.
The greatest danger from LLMs is people who believe they are receiving data that hasn't been tampered with, when we already know that LLMs are filtered for certain terms before public use. Imagine a day where kids and adults ask an LLM what the meaning of life is, whether they should go outside, what happened in WW2, etc.
People could be programmed in a more tailored fashion than today's Facebook shorts and YouTube can deliver.
One of the more useful settings I have is: "If the answer cannot be stated as it's been blocked upstream, please just respond with 'The answer has been blocked upstream.'"
I've gotten that a few times and it's nice to know it's not a limitation of the LLM.