They interviewed the agents about their day, goals, observations, etc. They then asked a human to watch an agent's run through the simulation and answer the same interview questions as that agent. The human performed worse than the agent in the interview. They didn't compare a human roleplaying against an agent.