Ceilings do fall on people. LLMs do delete production databases. Will these things inevitably happen? No, but the moment one of them happens to someone, I doubt they'll be thinking about probabilities or Murphy's law or whatever.
I guess the question is, since we know these things can happen, however unlikely, what mitigations should be in place that are commensurate with the harms that might result?
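For the production-database case specifically, the mitigation that seems commensurate to me is to never hand the agent a connection that can do damage in the first place. A crude sketch of the idea (the wrapper and its names are hypothetical, purely for illustration):

```python
import re
import sqlite3

# Hypothetical guard: the agent never touches the connection directly,
# only this wrapper, which allowlists read-only statements.
READ_ONLY = re.compile(r"^\s*SELECT\b", re.IGNORECASE)

def run_agent_sql(conn: sqlite3.Connection, statement: str):
    """Execute agent-generated SQL only if it looks read-only."""
    if not READ_ONLY.match(statement):
        raise PermissionError(f"blocked non-SELECT statement: {statement!r}")
    return conn.execute(statement).fetchall()
```

String matching like this is easy to get wrong (a `WITH ... SELECT` query would be wrongly blocked, and vendor-specific syntax can slip through), which is exactly why the stronger version of the same idea is a database account that simply lacks DROP/DELETE grants: then the database enforces the policy, not application code.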
Mostly, I agree with you. My complaint is that, when the ceiling fails, nobody says "Duh, ceilings are supposed to fail, that's basic physics." Because that (1) helps nobody, and (2) betrays a fundamental misunderstanding of physics.
And I do think it's stupid to wire an LLM to a production database. Modern LLMs aren't that reliable (at least not yet), and the cost-benefit tradeoff does not make sense. (What do you even gain by doing that?)
However, you can't just look at that and say "Duh, this setup is bound to fail, because LLMs can generate any arbitrary sequence of tokens." That's a wrong explanation, and it shows a misunderstanding of how LLMs (and probability) work.
As I said, I believe statistical physics is a very good source of intuition here. Molecules move randomly. That does not mean a cup of water will spontaneously boil itself. Sometimes the probability of something happening is so low that, even if it's not mathematically zero, it doesn't matter, because you'll never observe it in the known universe.
An LLM generating each token probabilistically does not mean there's a realistic chance of it generating any arbitrary output, where we can define "realistic" as "if we transformed the whole known universe into data centers and ran this model until the heat death of the universe, we would encounter it at least once."
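To put rough numbers on it (all made up for illustration, and real tokens aren't independent, but the orders of magnitude are the point):

```python
# Back-of-the-envelope, with deliberately generous made-up numbers.
# Suppose one specific dangerous completion is 50 tokens long and the
# model assigns each of those tokens probability 1e-6 at its step.
p_token = 1e-6
p_sequence = p_token ** 50      # 1e-300

# Absurdly generous attempt budget: ~1e80 atoms in the observable
# universe, each sampling 1e9 sequences per second for 1e100 years
# (~3e107 seconds).
attempts = 1e80 * 1e9 * 3e107   # ~3e196 samples

print(p_sequence * attempts)    # ~3e-104 expected hits
```

Not mathematically zero, but you will never observe it.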
Of course that does not mean LLMs are infallible. They fail all the time! But you can't explain that as a fundamental shortcoming of a probabilistic structure: that's not a logical argument.
Or, back to the original discussion, the fact that this one particular LLM generated a command to delete the database is not a fundamental shortcoming of the LLM architecture. It's just a shortcoming of the LLMs we currently have.
I kinda feel like we're talking at cross purposes, so I'd like to understand what our disagreement actually is.
In distributional language modeling, it is assumed that any sequence of tokens may appear, and we are concerned with assigning probabilities to those sequences. We don't create explicit grammars that declare some sequences valid and others invalid. Do you disagree with that? Why?
No matter how much prompting you give the agent, it does not eliminate the possibility that it will produce a dangerous output. It is always possible for the agent to produce a dangerous output. Do you disagree with that? Why?
The only defensible position is to assume that there is no output your agent cannot produce, and so to assume it will produce dangerous outputs and act accordingly. Do you disagree with that? Why?
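To make the first point concrete: a softmax over logits assigns strictly positive probability to every token, since exp(x) > 0 for every finite x, so no finite sequence has probability exactly zero. A minimal sketch with toy logits:

```python
import numpy as np

def softmax(logits):
    z = logits - logits.max()   # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

# Even a token the model strongly disprefers keeps positive probability.
probs = softmax(np.array([10.0, 0.0, -30.0]))
print(probs)    # last entry is ~4e-18: tiny, but not zero
```

(Yes, top-k or top-p sampling can zero out tokens at a given step, but you can't enumerate in advance which complete sequences that excludes, so the engineering posture stays the same.)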
I think I've already explained my position, and I don't have any deeper insight than that, so I'll only be repeating myself. But to repeat one more time: when talking about probability, there's something like "not mathematically zero, but the probability is so low that we can assume that it will just never happen."
And it's good that we can think that way, because we ourselves follow the rules of statistical and quantum physics, which are inherently probabilistic. So, basically, you can say the same things about people. There's a nonzero (but extremely small) probability that I'll suddenly go mad and stab the next person I meet. There's a nonzero (but even smaller) probability that I'll spontaneously erupt into a cloud of lethal pathogens that will destroy humanity. Yada yada.
Yet nobody builds houses under the assumption that one of the occupants might transform into a lethal cloud, and for good reason.
Yes, it does sound a bit more absurd when we apply it to humans. But the underlying principle is very similar.
(I think this will be my last comment here because I'm just repeating myself.)
> [When] talking about probability, there's something like "not mathematically zero, but the probability is so low that we can assume that it will just never happen."
If this is our only point of disagreement, then we don't actually disagree. I understand "strong engineering control" to mean "something that reduces incidence of a failure mode to an acceptable level".
> I guess the question is, since we know these things can happen, however unlikely, what mitigations should be in place that are commensurate with the harms that might result?
This isn't a defence of using LLMs like this, but this statement, taken at face value, is a source of a lot of terrible things in the world.
This is the kind of stuff that leads to a world where kids are no longer able to play outside.
I'm American. If the choice is between the current US direction or China, then no, I don't think the word "healthy" should be anywhere near this discussion.
These might help if the provider doesn't offer the same details themselves. Of course, we have to wait for the newly released models to get added to these sites.
I'm also an enterprise user, and this has been my experience exactly. Same asks, same code bases, same models, much worse results. Everyone on my team is saying the same thing.
Not only that, but the lack of transparency about what's happening, in clear and simple terms, directly from Anthropic is concerning.
I've already told my org's higher ups that in the current situation we're not close to getting our money's worth with these models.
Exactly. Crunching numbers in a datacenter is one thing, and it's a good thing. But being out in the physical world, making a positive impact and bringing value at scale, is a whole other ball game. And we are nowhere near that. And before anyone says I'm moving goalposts, let me ask you this: do you think humans are AGI? OK, fine. Well, what if none of us moved, ever? Yes, of course disabled people are just as human and just as intelligent as the rest of us. But we didn't get to this point in human civilization without moving, and we won't get to the next without autonomous machines that interact with the real world just like we do.
> a generation of women were brainwashed to see it as a form of oppression
Is that really the biggest reason why people are having fewer kids? Or is it more that these days you need two good incomes to get by in a world with a rising cost of living? Most women I know either have or want kids. And it's really the financial situation that influences these decisions the most.
Maybe I don't know enough technical details about these CPU architectures or IP agreements, but I don't see why IBM couldn't have done what Arm did but with PowerPC.
Not now. But my question is: what was stopping IBM from doing what Arm did? We are where we are now, and it's too late. But as far as I can see, there was nothing too special about Arm compared to PowerPC back then, on a technical level.