This model does have a niche use case: since it's so large, it has a lot more knowledge and hallucinates much less. For example, as a test question I asked it to list the best restaurants in my small town, and all of them existed. None of the other LLMs get this right.
I tried the same thing with companies in my industry ("list active companies in the field of X") and it came back with a few that have been shuttered for years, in one case for nearly two decades.
I'm really not seeing better performance than with o3-mini.
If anything, the new results ("list active companies in the field of X") are actually worse than what I'd get with o3-mini, because the 4.5 response is basically the post-SEO Google first page (it appears to default to mentioning the companies that rank most highly on Google), whereas the o3 response was more insightful and better reasoned.
I also have access to o3-mini-high and o1-pro.
I don't get it. For general purposes and for writing, 4.5 is no better than o3-mini. It may even be worse.
I'd go so far as to say that DeepSeek is actually better than 4.5 for most general-purpose use cases.
I seriously don't understand what they're trying to achieve with this release.