
As a benchmark, why do you find the 'opinion' of an LLM useful? The question is completely subjective. Edit: Genuinely asking. I'm assuming there's a reason this is an important measure.


Not OP, but likely because that was the only metric/benchmark/whatever you want to call it that OpenAI showcased in the stream and on the blog to highlight the improvement between 4o and 4.5. To say that this is not really a good metric for comparison would be an understatement, not least because prompting can have a massive impact on the result.



