>It's private for outsiders, but it was developed in "collaboration" with OAI, and GPT was tested on it in the past, so they have it in logs somewhere.
They probably have logs of the questions, but that's not enough. Frontier Math isn't something that can be fully solved without gathering top experts in multiple disciplines. Even Tao says he only knows who to ask for the most difficult set.
Basically, what you're suggesting, at least with this benchmark in particular, is far more difficult than you're implying.
>If you think this entire conversation is pointless, then why do you continue?
There's no point arguing about how efficient the models are being (the original point) if you won't even accept the results of the benchmarks. Why am I continuing? For now, it's only polite to clarify.
> Frontier Math isn't something that can be fully solved without gathering top experts
Tao's quote above referred to the hardest 20% of problems. They have 3 levels of difficulty, and presumably the first level is much easier. Also, as I mentioned, OAI collaborated on creating the benchmark, so they could have access to all the solutions too.
> There's no point arguing
Lol, let me ask again: why are you arguing then? Yes, I have strong, reasonable (imo) doubts that those results are valid.
The lowest set is easier but still incredibly difficult. Top experts are no longer required, sure, but that's it. You'll still need the best of the best undergrads, at the very least, to solve it.
>Also, as I mentioned, OAI collaborated on creating the benchmark, so they could have access to all the solutions too.
OpenAI didn't have any hand in providing the problems. Why you assume they have the solutions, I have no idea.
>Lol, let me ask again: why are you arguing then? Yes, I have strong, reasonable (imo) doubts that those results are valid.
Are you just being obtuse or what? I stopped arguing with you a couple of responses ago. You have doubts? Good for you. They don't make much sense, but hey, good for you.
If you don't want to take the benchmarks at face value, then good for you, but this entire conversation is pointless.