I wonder if one could extract a "surprisedness" value out of the AI, basically "the extent to which my current input is not modeled successfully by my internal models". Giving the model a metaphorical "WTF, human, come look at this" button might be pretty powerful for those walking cardboard boxes and trees, on top of the cases where the model knows something is wrong. Or it might false positive all the darned time. Hard to tell without trying.
English breaks down here, but the model probably does "know" something more like "if the tree is here in this frame, then in the next frame it will be there, give or take some waving in the wind". It doesn't know that "trees don't walk", just as it doesn't know that "trees don't levitate", "trees don't spontaneously turn into clowns", or an effectively infinite number of other things that trees don't do. What it possibly can do is realize that in frame 1 there was a tree, and then in frame 2 there was something the model didn't predict as a high-probability output of the next frame.
It isn't about knowing that trees don't walk, but about knowing that trees do behave in certain ways and noticing that it is "surprised" when they fail to behave in the predicted ways, where "surprise" is something like "this is a very low probability output of my model of the next frame". It isn't necessary to enumerate all the ways the next frame was low probability; it is enough to observe that it was, logically, not high probability.
In a lot of cases this isn't necessarily that useful, but in a security context having a human take a look at a "very low probability series of video frames" will, if nothing else, teach the developers a lot about the real capability of the model. If it spits out a lot of false positives, that is itself very informative about what the model is "really" doing.
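Concretely, the plumbing doesn't have to be fancy. Here's a minimal sketch in Python of what I mean, with a placeholder predictor standing in for whatever video model is actually running; none of the names here come from any real product:

    import numpy as np

    def predict_next_frame(prev_frame: np.ndarray) -> np.ndarray:
        # Placeholder predictor: assume the scene is mostly static,
        # so the best guess for the next frame is the previous one.
        return prev_frame

    def surprise_score(predicted: np.ndarray, observed: np.ndarray) -> float:
        # Mean squared prediction error as a crude stand-in for
        # "this was a very low probability output of my model of the next frame".
        return float(np.mean((predicted.astype(float) - observed.astype(float)) ** 2))

    def flag_surprising_frames(frames, threshold: float):
        # Return (frame_index, score) for every frame the model failed to predict
        # well, i.e. the metaphorical "WTF, human, come look at this" events.
        flagged = []
        for i in range(1, len(frames)):
            score = surprise_score(predict_next_frame(frames[i - 1]), frames[i])
            if score > threshold:
                flagged.append((i, score))
        return flagged

The interesting question is entirely in where `threshold` comes from, which is where the false positive problem lives.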
The parent comment spelt this out: the training data likely included only a few instances of walking trees (depending on how much material from the Lord of the Rings movies was used).
There is no "knowing" in LLMs, and it doesn’t matter for the proposed solution either. Detecting a pattern that is unusual by the certainty of having seen something previously does not require understanding of the pattern itself, if the only required action is reporting the event.
In simple terms: The AI doesn’t need to say, "something unusual is happening because I saw walking trees and trees usually cannot walk", but merely "something unusual is happening because what I saw was unusual, care to take a look?"
The challenge with these systems is that everything is unusual unless trained otherwise, so the false positive rate is exceptionally high. As a result, the systems get tuned to ignore most untrained/unusual things.
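Which is really a calibration problem. A rough sketch of how that tuning usually looks (hypothetical names, continuing the Python sketch above): collect surprise scores on known-benign footage and set the alert threshold at a percentile of those scores.

    import numpy as np

    def calibrate_threshold(benign_scores, target_false_positive_rate: float = 0.001) -> float:
        # Pick the threshold so that only the top 0.1% (by default) most surprising
        # frames of known-benign footage would have triggered an alert.
        return float(np.percentile(benign_scores, 100 * (1 - target_false_positive_rate)))

Lower the target rate and the walking trees slip through along with everything else that got tuned out; raise it and the reviewers drown in alerts about cardboard boxes and headlights. That trade-off is the whole game.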
I bet they’d have similar luck if they dressed up as bears. Or anything else non-human, like a triangle.