I'm surprised they wasted the time and effort testing this instead of just deducing the outcome. Most human jobs that we think we can solve with AI actually require AGI, and there's no way around that.
You kinda need different perspectives and interactions to help build something.
E.g., the DARPA engineers thought they had their problem space solved, but then some Marines did some unexpected stuff. They didn't expect the unexpected; now they can tune their expectations.