And using humans as 'the benchmark' is risky in itself, as it can leave us with blind spots on AI behavior. For example, we might find that humans aren't as general as we expected, or we could run into the "we made the Terminator and it's exterminating mankind, but it's not AGI because it doesn't have feelings" problem.