
I've been using 4.5 for the better part of the day.

I also have access to o3-mini-high and o1-pro.

I don't get it. For general purposes and for writing, 4.5 is no better than o3-mini. It may even be worse.

I'd go so far as to say that Deepseek is actually better than 4.5 for most general purpose use cases.

I seriously don't understand what they're trying to achieve with this release.



This model does have a niche use case: since it's so large, it has a lot more knowledge and hallucinates much less. For example, as a test question I asked it to list the best restaurants in my small town, and all of them actually existed. None of the other LLMs get this right.


I tried the same thing with companies in my industry ("list active companies in the field of X") and it came back with a few that have been shuttered for years, in one case for nearly two decades.

I'm really not seeing better performance than with o3-mini.

If anything, the new results ("list active companies in the field of X") are actually worse than what I'd get with o3-mini, because the 4.5 response is basically the post-SEO Google first page (it appears to default to the companies that rank most highly on Google), whereas the o3-mini response was more insightful and better reasoned.


That's also a use case where the consensus among those in the know is that you shouldn't be relying on the model's size (i.e., its baked-in knowledge) in the first place.

You know what gets the list of restaurants in my hometown right? Llama 3.2 1B Q4 running on my desktop with web search enabled.
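
For what it's worth, here's a minimal sketch of that kind of setup. It assumes (not from the thread) that the model is served locally through Ollama as "llama3.2:1b", and web_search() is a hypothetical placeholder for whichever search API you actually wire in:

    # Sketch: ground a small local model with web search results instead of
    # relying on whatever it memorized during training.
    # Assumptions: Ollama is running locally with the "llama3.2:1b" model
    # pulled; web_search() is a hypothetical stub for your search provider.
    import ollama


    def web_search(query: str, max_results: int = 5) -> list[str]:
        """Hypothetical helper: return text snippets from your search API."""
        raise NotImplementedError("plug in your search provider here")


    def answer_with_search(question: str) -> str:
        # Fetch fresh snippets and make the model answer only from them.
        snippets = "\n".join(web_search(question))
        prompt = (
            "Answer the question using only the search results below.\n\n"
            f"Search results:\n{snippets}\n\nQuestion: {question}"
        )
        response = ollama.chat(
            model="llama3.2:1b",
            messages=[{"role": "user", "content": prompt}],
        )
        return response["message"]["content"]


    if __name__ == "__main__":
        print(answer_with_search("What are the best restaurants in <your town>?"))

Even a 1B-parameter model handles this fine once the facts come from retrieval rather than its own weights.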



