> Multi agent collaboration is quite likely the future
AutoGen from Microsoft was an early attempt at this, and it was fun to play with, but it came too early (the models themselves kinda crapped out after a few turns of conversation). This would work much better today, given how long agents can now stay on track.
There was also a finding earlier this year, I believe from the SWE-bench guys (or HF?), where they got better scores by alternating between GPT-5 and Sonnet 4 after each call during an execution flow. The alternating setup scored higher than either model on its own. Found that interesting at the time.
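For anyone curious what "alternating after each call" looks like mechanically, here's a rough sketch of the idea, not the setup from that finding. The model functions are hypothetical stubs standing in for real GPT-5 / Sonnet 4 API wrappers, and the "DONE" stop signal is made up for illustration. The point is just that the agent loop keeps one shared history and hands each step to the next model in round-robin order:

```python
from itertools import cycle
from typing import Callable, List

# A "model" here is any callable that takes the conversation so far and returns the next message.
# In a real setup these would wrap two different providers' APIs (hypothetical in this sketch).
Model = Callable[[List[str]], str]


def run_alternating(models: List[Model], task: str, max_steps: int) -> List[str]:
    """Run an agent loop where each step is served by the next model in round-robin order."""
    history = [task]          # shared transcript that every model sees
    picker = cycle(models)    # yields model_a, model_b, model_a, ... forever
    for _ in range(max_steps):
        model = next(picker)  # switch models after every single call
        reply = model(history)
        history.append(reply)
        if reply == "DONE":   # hypothetical stop signal, made up for this sketch
            break
    return history


# Stub models so the sketch runs without any API keys.
def model_a(history: List[str]) -> str:
    return f"A:{len(history)}"


def model_b(history: List[str]) -> str:
    return f"B:{len(history)}"


if __name__ == "__main__":
    print(run_alternating([model_a, model_b], "fix the failing test", max_steps=4))
```

Running it prints `['fix the failing test', 'A:1', 'B:2', 'A:3', 'B:4']`, i.e. each model picks up exactly where the other left off. My guess is the benefit comes from the second model catching mistakes the first one would otherwise keep compounding, but that's speculation.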