Pro is approximately in the middle between GPT-3.5 and GPT-4 on four measures (MMLU, BIG-Bench-Hard, Natural2Code, DROP); it is closer to 3.5 on two (MATH, HellaSwag) and closer to 4 on the remaining two (GSM8K, HumanEval). Two one way, two the other way, and four in the middle.
So it's a split almost right down the middle, if anything closer to 4, at least if you assume the benchmarks to be of equal significance.
> at least if you assume the benchmarks to be of equal significance.
That is an excellent point. Pro's performance will definitely depend on the use case, given how much it varies between 3.5 and 4 across benchmarks. It will be interesting to see user reviews on different tasks. But the two-quarter lead time for Ultra means it may as well not have been announced. A lot can happen in 3 to 6 months.