Next step might be two or more GPT instances having an internal dialogue, trying to reach a consensus before giving their answer. That would also give us an opportunity to peek inside the inner workings.
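A rough sketch of what such a consensus loop could look like, in Python. Everything here is hypothetical: the `Agent` interface, the `debate` function and the two toy agents are stand-ins invented for illustration, and "consensus" is simplified to "all agents return the same string in one round". A real version would wrap actual model calls and need a softer notion of agreement.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical agent interface: gets the question plus the transcript so far
# and returns its current answer. A real agent would wrap a model call.
Agent = Callable[[str, List[str]], str]


def debate(
    question: str, agents: List[Agent], max_rounds: int = 5
) -> Tuple[Optional[str], List[str]]:
    """Run rounds until every agent gives the same answer, or give up."""
    transcript: List[str] = []
    for round_no in range(1, max_rounds + 1):
        # All agents in a round see the same transcript (previous rounds only).
        answers = [agent(question, transcript) for agent in agents]
        for i, ans in enumerate(answers):
            transcript.append(f"round {round_no}, agent {i}: {ans}")
        if len(set(answers)) == 1:
            return answers[0], transcript
    return None, transcript  # no consensus within max_rounds


# Toy stand-ins for model instances (assumptions, not real models).
def stubborn(question: str, transcript: List[str]) -> str:
    return "42"


def persuadable(question: str, transcript: List[str]) -> str:
    # Starts with its own guess, then adopts "42" once it has seen it.
    seen_42 = any(line.endswith(": 42") for line in transcript)
    return "42" if seen_42 else "41"


answer, transcript = debate("What is 6 * 7?", [stubborn, persuadable])
print(answer)
print("\n".join(transcript))
```

The returned transcript is the "peek inside" part: it records every intermediate answer, not just the one that finally gets out.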
Or a layered system. The surface instance takes the question and produces an internal "thought" out of it, itself a statement or question, and submits it to the layer underneath. This second instance works out a response (a text extension) to the statement or query from the surface and submits it back up, and the surface layer generates the final answer by answering or extending the sub-layer's response.
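The two-layer flow could be sketched like this, again in Python. The `Model` type, the prompt wording and the `echo_model` stub are all assumptions made up for the example; the stub just tags its input so you can see which layer touched the text at each step.

```python
from typing import Callable, Dict

# Hypothetical model interface: prompt in, generated text out.
Model = Callable[[str], str]


def layered_answer(question: str, surface: Model, sub_layer: Model) -> Dict[str, str]:
    """Surface -> sub-layer -> surface, keeping every intermediate text."""
    # 1. Surface turns the question into an internal "thought".
    thought = surface(f"Restate as an internal thought: {question}")
    # 2. Sub-layer responds to (or extends) that thought.
    sub_response = sub_layer(thought)
    # 3. Surface produces the final answer from the sub-layer's response.
    final = surface(f"Answer using this internal response: {sub_response}")
    return {"thought": thought, "sub_response": sub_response, "final": final}


def echo_model(tag: str) -> Model:
    """Toy stand-in for a model instance: prefixes the prompt with a tag."""
    return lambda prompt: f"[{tag}] {prompt}"


trace = layered_answer("Why is the sky blue?", echo_model("surface"), echo_model("sub"))
for step, text in trace.items():
    print(f"{step}: {text}")
```

Going deeper would just mean the sub-layer itself calls another layer before replying.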
This can be extended down or wide, with several instances representing different thinking modalities and the final answer being generated from all of them. Basically reproducing a neural net architecture, but at a higher level of abstraction.
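For the "wide" variant, a minimal sketch under the same caveats: the modality names, the lambdas and the `combiner` are invented placeholders, and how the outputs should really be merged is an open design question. Here they are simply joined into one text and handed to a final instance.

```python
from typing import Callable, Dict

# Hypothetical model interface: prompt in, generated text out.
Model = Callable[[str], str]


def wide_answer(question: str, modalities: Dict[str, Model], combiner: Model) -> str:
    """Ask every modality the same question, then let one instance merge them."""
    views = {name: model(question) for name, model in modalities.items()}
    merged = "\n".join(f"{name}: {text}" for name, text in views.items())
    return combiner(merged)


# Toy stand-ins for instances with different "thinking modalities".
modalities: Dict[str, Model] = {
    "analytic": lambda q: f"break down '{q}'",
    "skeptic": lambda q: f"doubt '{q}'",
}
# Toy combiner: only reports how many views it received.
combiner: Model = lambda merged: f"combined {len(merged.splitlines())} views"

result = wide_answer("Is P equal to NP?", modalities, combiner)
print(result)
```

Each modality plays the role of a unit in a layer and the combiner the role of the next layer, except the "activations" passed along are whole texts.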