The point here is that they are strapping a supposedly non-agentic LLM into a new test rig and observing agentic behaviors.
It’s very obviously not claiming that this is impressive from a gaming SOTA perspective. What’s surprising is just that ChatGPT can do this sort of thing at all.