
This announcement makes me wonder if we are approaching a plateau in these systems. They are essentially claiming close to parity with gpt-4, not a spectacular new breakthrough. If I had something significantly better in the works, I'd either release it or hold my fire until it was ready. I wouldn't let openai drive my decision making, which is what this looks like from my perspective. Their top line claim is they are 5% better than gpt-4 on an arbitrary benchmark in a rapidly evolving field? I'm not blown away personally.



I don’t think we can declare a plateau just based on this. Actually, given that we have nothing but benchmarks and cherry picked examples, I would not be so quick to believe GPT-4V has been bested. PALM-2 was generally useless and plagued by hallucinations in my experience with Bard. It’ll be several months till Gemini Pro is even available. We also don’t know basic facts like the number of parameters or training set size.

I think the real story is that Google is badly lagging their competitors in this space and keeps issuing press releases claiming they are pulling ahead. In reality they are getting very little traction vs. OpenAI.

I’ll be very interested to see how LLMs continue to evolve over the next year. I suspect we are close to a model that will outperform 80% of human experts across 80% of cognitive tasks.


> It’ll be several months till Gemini Pro is even available.

Pro is available now - Ultra will take a few months to arrive.


How could you possibly believe this when the improvement curve has been flattening? The biggest jump was GPT-2 to GPT-3, and everything after that has been steady but marginal improvement. What you're suggesting is like people in the 60s seeing us land on the moon and then thinking Star Trek warp drive must be 5 years away. Although people back in the day did think we'd all be driving flying cars by now. I guess people just have fantastical ideas of tech.


It is hard to quantify, but subjectively (and certainly in terms of public perception), each GPT release has been a massive leap over the previous model. Maybe GPT-2 to GPT-3 was the largest, but I'm not sure how you can judge that a field is stagnating based on one improvement in a series of revolutionary improvements being slightly more significant than the others. I think most would agree the jump from GPT-3 to GPT-4 was not marginal, and I think I'll be borne out when the jump from GPT-4 to GPT-5 isn't either. There may be a wall, but I don't see a good argument that we've hit it yet. If GPT-5 releases and is only marginally better, that will be evidence in that direction, but I'm pretty confident that won't happen.

Your analogy is odd because you're just describing what the situation would look like if you turned out to be right. Judging from the recent rate of improvement, I'd say we're more at the first-flight-test stage. Yes, of course the jump from a vehicle that can't fly to one that can is in some sense a 'bigger leap' than others in the development cycle, but we still eventually got to the moon.


I hope you're right, because it would be far more entertaining. More realistically, if you look at people's past predictions of the future, well... you already know. AI researchers in the 60s also thought AGI was just around the corner, especially once computers started playing chess and other games. Maybe we're no better than them at predicting things, but every generation thinks they're right.


Don't look at the absolute numbers; instead, think in terms of relative improvement.

DocVQA is a benchmark with a very strong SOTA. GPT-4 achieves 88.4, Gemini 90.9. That's only a 2.5-point increase, but a ~22% error reduction, which is massive for real-life use cases where the error tolerance is lower.
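A rough sketch of that arithmetic, using only the two scores quoted above:

    # Back-of-the-envelope check of the ~22% figure, using the scores
    # quoted above (GPT-4 = 88.4, Gemini = 90.9 on DocVQA).
    gpt4_score = 88.4
    gemini_score = 90.9

    gpt4_error = 100 - gpt4_score      # 11.6 points of error
    gemini_error = 100 - gemini_score  # 9.1 points of error

    relative_reduction = (gpt4_error - gemini_error) / gpt4_error
    print(f"Relative error reduction: {relative_reduction:.1%}")  # ~21.6%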


This, plus some benchmarks are shitty, so a rational model should be allowed to not answer them and instead ask clarifying questions.


Yes, a lot of those have pretty egregious annotation mistakes. Once you get into the high percentages, it's often worth going through your dataset with your model's predictions and comparing. Obviously you can't do that on academic benchmarks (though some papers still do).
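For what it's worth, a hypothetical sketch of that pass (the file names and record format here are assumptions, not anything from the benchmark itself):

    # Hypothetical sketch: surface label/prediction disagreements so a human
    # can check whether the "error" is actually an annotation mistake.
    # benchmark_labels.json / model_predictions.json are assumed formats.
    import json

    with open("benchmark_labels.json") as f:    # [{"id": ..., "answer": ...}, ...]
        labels = {ex["id"]: ex["answer"] for ex in json.load(f)}

    with open("model_predictions.json") as f:   # [{"id": ..., "prediction": ...}, ...]
        preds = {ex["id"]: ex["prediction"] for ex in json.load(f)}

    for ex_id, answer in labels.items():
        pred = preds.get(ex_id)
        if pred is not None and pred.strip().lower() != answer.strip().lower():
            print(f"{ex_id}: label={answer!r} vs prediction={pred!r}")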


In my opinion the field is not advancing that rapidly. The major breakthroughs, where something was really much better than everything before, were the following:

GPT-2 February 2019

GPT-3 June 2020

GPT-3.5 December 2022

GPT-4 March 2023

Note that GPT-3 to GPT-4 took almost 3 years!


That seems like a pretty remarkable pace of innovation, no?


Yeah, but some people seem to expect that GPT-5 should follow soon after GPT-4 because GPT-4 followed soon after ChatGPT.


GPT-4 was done training 8 months before release, so about 2 years.


Interesting, but hard to conclude just from one datapoint. An alternate interpretation is that, given how far Bard lagged behind GPT until this moment, it's a stunning advancement.


I expect a plateau in depth before breadth.

Breadth for example means better multi-modality and real-world actions/control. These are capabilities that we haven't scratched the surface of.

But improving depth of current capabilities (like writing or coding) is harder if you're already 90% of the way to human-level competence and all of your training data is generated by human output. This isn't like chess or go where you can generate unlimited training data and guarantee superhuman performance with enough compute. There are more fixed limitations determined by data when it comes to domains where it's challenging to create quality synthetic data.


It's a PR move. Probably Sundar needs to meet some objective by end of year.


> Their top line claim is they are 5% better than gpt-4 on an arbitrary benchmark in a rapidly evolving field?

Their top line claim is multimodality.


The plateau is largely in hardware; the next generation of accelerators with more memory will enable larger models, and so on.



