Has Llama-3 just killed proprietary AI models? (kadoa.com)
103 points by hubraumhugo on April 21, 2024 | hide | past | favorite | 70 comments


> Meta released Llama-3 only three days ago, and it already feels like the inflection point when open source models finally closed the gap with proprietary models. The benchmarks show that Llama-3 70B matches GPT-4 and Claude Opus in most tasks, and the even more powerful Llama-3 400B+ model is still training.

I'm all for open models, but where do the benchmarks show that?

The official page https://llama.meta.com/llama3/ does not show any comparisons with GPT-4 or Claude Opus

Looking at https://arena.lmsys.org/, Llama-3-70b-Instruct is ranked #5 while current GPT-4 models and Claude Opus are still tied at #1. Meanwhile, Llama-3-8b-Instruct is ranked #14

Would love to be corrected, but either way an article should include sources for these types of claims.


Some Llama3 400B numbers for the April 15th checkpoint are listed at https://ai.meta.com/blog/meta-llama-3/

Direct image link: https://scontent-atl3-2.xx.fbcdn.net/v/t39.2365-6/439015366_...


It's above Opus in second position in the "English" only category. It probably suffers in the overall score due to poor multilingual ability (afaik 95% of its training data was English only).

Though the usual caveat about small sample sizes applies; as of now the CI is fairly wide. It's also not at the level of those two in the "Code" category; I hope Meta will give the CodeLlama variant an update again.


How significant is the difference in Elo scores for the end user? Should I use Claude Opus if I already have GPT4, despite its similar score?

Additionally, how much better is Claude for coding?
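For a rough sense of scale: under the standard Elo model, a rating gap of D points maps to an expected win rate of 1/(1 + 10^(-D/400)), so the small gaps between the top arena models translate to near-coin-flip preferences. A quick sketch (the ratings here are made up for illustration, not actual arena numbers):

```python
def elo_win_prob(rating_a: float, rating_b: float) -> float:
    """Expected probability that model A is preferred over model B
    in a head-to-head vote, under the standard Elo model."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))

# A 20-point gap is barely distinguishable from a coin flip;
# a 100-point gap means a clear but far from unanimous preference.
print(round(elo_win_prob(1260, 1240), 3))  # 0.529
print(round(elo_win_prob(1300, 1200), 3))  # 0.64
```

So for an end user, a few Elo points between GPT-4 and Opus says very little; trying both on your own tasks is more informative than the leaderboard gap.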


I use Poe.com, which gives me access to every model available for $20. When a new model comes out it is quickly added to the available list. So, I've been doing some comparison against Claude & GPT4. I still think GPT4 is better at following instructions and giving me the result I'm after.


Came here to say this. Thought I missed something!

400B does look set to meet GPT-4, which will be exciting, but it's not finished yet.


OpenAI is also a moving target and Llama3 is probably due to get a few months of parity before GPT5 wallops it.

To me, it looks like OpenAI is ahead of the competition by 12-18 months, which isn't nothing, but the competition is certainly nipping at their heels.


At the risk of this comment aging poorly, I am not convinced there is a magic GPT-5 around the corner, waiting to wallop anybody. I have a suspicion that we’re into diminishing returns with model scaling. All of the recent flagship releases (Gemini Ultra, Claude Opus, GPT-4 updates) have only advanced the frontier a little. Could OpenAI train something another 1-2 orders of magnitude bigger? Perhaps, but then it wouldn’t be economical to deploy it.


Sora is a thing. It literally walloped everything else in the field, to the point that the second-closest video generation model looked like a broken, nonfunctional mess compared to it.

GPT5 might not be the thing we are all assuming it would be. It might not just be a chatbot but an actual AI that can "see" and "talk". By that point, I don't care if performance-wise it isn't a big improvement over GPT4 (though it should be, as we know these models improve significantly when trained with another modality of data, e.g. text+image was much better than text only).


Not ahead of Anthropic. Anthropic is trading blows with the best GPT-4 models these days.

For the open weight/source community I'd put it at 3-6 months behind and catching up. Llama 3 405B might catch the community up unless OpenAI has another LLM up their sleeve that makes a significant jump forward, but we haven't seen evidence of that yet.


Anthropic's Claude 3 came out 12 months after GPT4.


GPT-4 has been updated multiple times since then. Anthropic has been trading blows with those updates to GPT-4.


I thought OpenAI said they weren't training GPT-5 and were doing more, smaller multimodal models


I wonder if that was a case of being literal as in, technically speaking gpt-5 isn’t being _trained_, while omitting that they are working on collecting and preparing the data.


They have recently confirmed that they were working on GPT5, without confirming or denying that they were training it.

I guess it isn't unthinkable that GPT5 is more than a year out given the gap between GPT3 and GPT4 was 3 years. At the same time, the rumourmongers at businessinsider said their sources are expecting a Summer release, which seems plausible. Nobody really knows when it will release besides Sam Altman though.


> Nobody really knows when it will release besides Sam Altman though.

The day after some model truly beats GPT-4, plus or minus 1-2 weeks, is my guess.


what they say and what they do are different things


Yeah I came to post the same thing. It's nothing like gpt4, just great for the size.


If they did or didn't is up for debate, but I'm pretty sure scorched earth was their goal from the start (specifically targeting the giants, Google and OpenAI). They don't want to be #3 in this race.


Every big new tech category in recent years has quickly matured into a duopoly of a closed, proprietary, premium option and a mass market, public, "open" one. OpenAI, Anthropic, Google, Amazon, Microsoft and a ton of other players are battling for the #1 option, but Meta has cleverly theorized that #2 is going to be easier and a better fit for them.


I think it's not just big categories of software, although I'm sure there are niches with no open-source option.


There was a clear leader 18 months ago. Then there was a lot of catchup by everyone.

It’s no longer “hard” to build a product like ChatGPT + GPT 3.5, and that includes creating the model. There are a few trade secrets, but beyond that it seems like not much is protecting an ~8 month moat.


It's the smartest possible play.

The leader always wants to be a monopoly. The distant runners up can catch up the most ground by playing nice, being open source, and working with other companies to erode the market share that the leader wanted to lock away.

I'm so glad to see this happening. I'm terrified of a single company winning all of AI. It's starting to look like this won't happen and that OpenAI is simply stretched too thin.

If this pattern holds, OpenAI may wind up as a footnote. Their inability to open up and work with others means that competitors and would-be collaborators will choose the open alternatives.

Meta can win that mindshare and be the friendly facilitator and rails that an entire ecosystem of business is built upon. OpenAI will never be that. They're not "open" enough.


Open models weren’t Meta’s original plan. The LLaMA 1 model was only available to Meta-approved researchers until someone leaked the model weights on 4chan in 2023. Meta issued DMCA takedown requests to HuggingFace and GitHub.


I was hoping for more detail in the post, but it really just seems like a quick and easy way to attract eyeballs. Never heard of the platform, and I can't help but think that the logo is a rip of the old Okta brand.


Perhaps for now, but I wouldn't count on there always being a Meta spending $$$ to train enormous models and then giving them away for free. What's the long-term game plan for open-source models when the corporate charity inevitably dries up?


If your product is an AI model (OpenAI, Anthropic, etc) you can't give it away for free.

If your product is a social graph w/ ads (Meta), you can.

It's hardly corporate charity:

* Meta releasing these models creates an improvement and tuning ecosystem around it, giving them access to tons of free developer time.

* It's also a strong recruiting tool, for engineers and researchers frustrated by, e.g., Google and OpenAI becoming increasingly closed. They know they can publish at Meta.

* The cost is insignificant. Meta had 30B in revenue just in Q2 2023.


It's great PR. I hear people refer to Mark Zuckerberg as a "real engineer" and other such platitudes and it's doing wonders to reduce the stigma around Meta amongst the tech savvy minority.

Despite Meta being very similar to Google in terms of incentives, and Google nowadays being decidedly uncool. Doesn't hurt that Mark shipped something worth a damn whereas Google has been floundering for ages.


With credit to Google, they were basically neck-and-neck with Facebook AI Research in a lot of ways. Both were publishing influential text models (BERT vs fastText), both were maintaining SOTA deep learning frameworks (TensorFlow vs PyTorch), and both were investing heavily into researching the field further. I'd even argue that Google was the largest contributor to making open-source AI more like Linux and less like a shitty proprietary product.

There's a whole history of recent machine-learning development where both Google and Facebook have worked together and against each other to push things forward. I think it's entirely mistaken to characterize Google as the understudy when in many ways it's the other way around.


I mean I have been watching what Google has been doing in AI with wonder for ages and do agree that they seem to be rather underrated merely because they've recently been caught on the back foot.

At the end of the day though I use Llama and I use GPT4 and I don't use Bard. Google has an amazing legacy around AI but it really hasn't been performing in the last couple of years. I can imagine they'll have a comeback, but one does wonder if Google has lost their mojo.


How is it that Zuckerberg is suddenly an "engineer"? He's a CEO, but just because you run a business, does not mean you actually do any of the underlying technical work. Are they blind to the emperor's clothing?


He's obviously not closing Jira tickets at Meta, but that doesn't mean he's not an engineer. As an example of the positive impact that the Llama releases have had, this post from him has been doing the rounds lately in response to criticism like yours:

https://www.facebook.com/notes/775294156352065/


Commoditize your complement and all that.


Building open models is a very strong approach to cornering the market on top tier AI researchers. And as other commenters have mentioned, the raw models are not the product - the vast majority of the value is in how they are integrated into useful products.


Exactly, a free standalone AI is an unsustainable product.

You can't offer it for free, unless you make money in another way.


The corporate charity will not dry up. AI makes it easier to generate content, and Meta's in the business of facilitating the sharing of that content. Content is surface area for ads. AI will also make the virtual realities of the "metaverse", as defined by Mark, easier to reify. It's also a giant marketing and recruiting strategy.


AI does make it easier to generate content, but the type of content it lends itself towards the most is spam. Whether a Facebook where the majority of the content is neural net slurry is something that people will want to engage with once they realise what's going on is an open question I think.

Anecdotally it seems like older demographics are the prime target of the current wave of AI engagement farming on Facebook, because they just don't understand that this technology exists now and assume that all of the "photos" they're seeing are real.


> What’s the long-term game plan for open-source models when the corporate charity inevitably dries up?

Once open models reach and stay at near parity for a while, it’ll make sense for commercial downstream users to support open source community efforts rather than building their own, same as has happened in many other categories of key infrastructure software.


Unless Meta’s bet is that, going forward, models themselves won’t be the competitive differentiator, and that it will be about integration. They can give away Llama 3, 4, 5 for free, because no one else can put them in WhatsApp or whatever.

idk


I guess many here can think up a lot of 5D chess business strategies. For me it's just Zuck trying to reach greatness; his name will probably pop up in history when people think about who brought LLMs/AI/AGI(?) to the masses.


Maybe they are commoditizing their complement here.


Maybe, but at some point vague strategic moves like that will need to actually justify themselves at the earnings call. Throwing endless millions of dollars down the drain to do what, depreciate the value of another company's product, one that doesn't even directly compete with Meta's main breadwinners? What do they actually get out of this?

I suppose if there's any consolation for the open-source AI community, it's that Meta has demonstrated a willingness to burn a lot of money for unclear benefit: they're still single-handedly keeping the VR industry on life support at the expense of about $4 billion per quarter. A decade on from acquiring Oculus, they're no closer to making it profitable; if Meta's AI efforts get the same treatment, then the free models will probably keep coming for a while.


Zuckerberg's reasoning on AI/ML seemed very justifiable to me in the Dwarkesh podcast

IIRC,

- Having a better model is a competitive advantage in fighting spam

- Better models enable Facebook itself to understand their code vulnerabilities, improve employee productivity, etc.

- Being at the frontier of open source keeps them at an advantage in terms of updates from the community


Also Meta is in the business of sharing content. Instagram, Facebook, Threads, and WhatsApp are all about sharing content.

They’re highly incentivized to expand the options for generating content. OpenAI and Anthropic have to make money from the generation, which raises the barrier to creation. Meta can say fuck it, let everyone create, and we'll sell ads against what they create.


They spent much more on the metaverse.


8k context is tiny compared to what’s out there. They promise much larger contexts, but until then Llama-3 can’t even reliably summarize every web page out there.


There are techniques to extend the context window via fine tuning.

The authors claim this method was used to extend Llama 2 to 128k: https://github.com/jquesnelle/yarn
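For intuition, the simplest of these tricks (which methods like YaRN refine) is position interpolation: rescale token positions so a longer sequence fits inside the positional range the model saw during training, then fine-tune briefly. A toy sketch of the idea, not the actual YaRN implementation:

```python
def interpolated_positions(seq_len: int, trained_len: int) -> list[float]:
    """Linearly squeeze positions 0..seq_len-1 into the trained range,
    so RoPE never sees a position beyond what it was trained on."""
    scale = max(1.0, seq_len / trained_len)
    return [p / scale for p in range(seq_len)]

# An 8k-trained model addressing a 32k sequence: every position is
# divided by 4, landing back inside the familiar 0..8191 window.
positions = interpolated_positions(32768, 8192)
assert max(positions) < 8192
```

YaRN improves on plain interpolation by treating different RoPE frequency bands differently, which is why it reportedly needs far less fine-tuning to work well.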


It hasn't. My guess is that 90% of the folks using proprietary models through a chat interface have no need or desire to run their own models locally, nor do they have the hardware. For those of us who do wish to run our own models and have the hardware, I would reckon maybe 20% of us using proprietary models will drop them. The latest models released are very good, but I'm not convinced they are GPT/Claude quality yet.


After Apple’s blocking of ad tracking, Meta was worth $500B. After they announced they are all in on AI and want to win there too, they are now worth $1.2T.

So even if they are losing a bit on infra and training investment, Meta is relevant again. They are cool again.

Meta made a huge comeback.


Llama-3 is semi-proprietary though. It's definitely not open source!!

Has anything changed in the last 9 months: https://news.ycombinator.com/item?id=36815255 ? Is there better access to anything more than weights? Can we now train new models using llama? Are we no longer as restricted in use?

Haven't seen any comments questioning the premise here. It seems pretty constrained what we can do and how built-atoppable Llama is. I like the idea of it as a safeguard against control by some giant, but I'm not sure if it's a big enough grant of rights to be something we can build atop.


Llama 3 is just as restricted as Llama 2. The licenses and acceptable use policies are almost identical; the only real change consists of added attribution requirements. Products that use Llama 3 have to "prominently display 'Built with Meta Llama 3' on a related website, user interface, blogpost, about page, or product documentation", and models based on Llama 3 have to "include 'Llama 3' at the beginning of any such AI model name."


Commoditization of the AGI models is inevitable. OpenAI is building the next generation of compound AI systems — an LLM OS, if you will. It does more than just predict the next token. It figures out the right function and service to call when needed. That’s where the arms race has moved to. Meta isn’t going to build a free LLM OS. The comparison is apples to oranges.


GPT4 was released a year ago, and trained two years ago. I wouldn’t sleep on OpenAI, especially given how confident Sam Altman sounded on the Lex Fridman podcast.

The real race is to AGI anyways, as whoever gets there first will immediately capture 100% of the market.


> I wouldn’t sleep on OpenAI, especially given how confident Sam Altman sounded on the Lex Fridman podcast.

How confident Sam Altman sounds doesn’t figure much into my assessment of reality, other than the reality of Sam Altman’s promotional skills.

> The real race is to AGI anyways, as whoever gets there first will immediately capture 100% of the market.

AGI has no actual objective definition, and nothing supports this beyond naked conjecture and quasi-religious dogma.


I'm just reluctant to jump to conclusions until I see OpenAI's next hand.

True on AGI, but I guess I meant a “self-improving” AI? Whether that exists or is a pipe dream remains to be seen, but it seems like the goal of most of these companies.


You would be wrong to accept this hype anyways.

Llama 3 is not beating GPT4 in benchmarks I've seen, and it's not beating it on LLM Arena. That's all that really matters. It needs to beat GPT4 in the benchmarks and leader boards, or it's a nothingburger, as far as OpenAI's dominance goes.


This article will age poorly when GPT-5 is released. /s There will always be a market for whichever proprietary model is best, though it’s unclear whether that will be OpenAI’s.


GPT-4 and Opus are better for complex/precision tasks, and Haiku is cheaper for everything else.

Good 7B/8B models are still really useful but let’s not be hyperbolic.


Wouldn't say killed, but for certain use cases it has definitely provided a strong alternative.

The 8B model seems particularly good at summarization tasks.


The brilliance of Meta's strategy here is: if they offer a (F?)OSS model at near-parity with the leaders, they commoditize the product of their would-be competitors. Meta doesn't have to make money on API calls, but they could face an existential risk if someone else built the everyday AI companion of the future (e.g. users on the ChatGPT UI, Microsoft's much-advertised "copilot for everyday").

So -- a defensive play with some positive externalities (e.g. developer ecosystem mindshare + roadmap control, ability to use within their own products at cost, without giving up margin to suppliers).


It can also power features in Instagram, etc.


As long as businesses have proprietary processes, there will be proprietary models. An open-source AI cannot know everything.


I tested Llama-3-70b-8192 on Groq against ChatGPT 4, and while Groq ran it super fast, it hallucinated one answer, and didn’t get the logic correct on another question.

So, ChatGPT 4 is still more reliable for my use case. But if I were to want an LLM to process data, summarize, and so forth, Llama-3 on Groq is very fast.
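Comparisons like this are easy to script, since Groq serves Llama-3 behind an OpenAI-compatible chat completions API. A minimal sketch; the endpoint URL and model name reflect Groq's docs at the time and should be verified before use:

```python
import json
import urllib.request

# Assumed from Groq's documentation; confirm before depending on it.
GROQ_URL = "https://api.groq.com/openai/v1/chat/completions"

def build_request(prompt: str, api_key: str,
                  model: str = "llama3-70b-8192") -> urllib.request.Request:
    """Build an OpenAI-style chat completion request for Groq."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0,  # keep outputs stable for side-by-side comparison
    }
    return urllib.request.Request(
        GROQ_URL,
        data=json.dumps(payload).encode(),
        headers={"Authorization": f"Bearer {api_key}",
                 "Content-Type": "application/json"},
    )

req = build_request("Do you know anything about Intel Hala Point?",
                    api_key="YOUR_KEY")
# resp = json.load(urllib.request.urlopen(req))  # uncomment with a real key
```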

Questions:

Do you know anything about Intel Hala Point?

Groq: bullshit, but admitted it when I called it out. ChatGPT: did a Bing search (it knew what it didn’t know).

Question 2a (separate chat): If you’re in Canada, what’s the best way to use a TFSA?

2b: Okay, if your portfolio has some tech stocks, some cash cows, and some government bonds, which should be allocated to the TFSA?

The reason I chose Question 2 is that most banks are happy to recommend bad products if it benefits them. Llama-3’s answer reflects the bank bullshit. ChatGPT 4 gives the advice your trustworthy and financially savvy friend would give you.

Follow-on questions for Llama-3:

2c: You have it backwards.

2d: Why did you get it backwards? Were you influenced by the glut of “advice” proffered by banks?


So they are doing the "bait and switch" like Google did for market share and in the end Facebook controls the "best" model? This sounds horrible.


«Any headline that ends in a question mark can be answered by the word no.»

— Betteridge's law of headlines (https://w.wiki/3b$V)


Betteridge's corollary:

As an online discussion about a headline that ends in a question mark grows longer, the probability of a citation of Betteridge's law approaches 1.


In the same domain as Godwin's law

"As an online discussion grows longer, the probability of a comparison involving Nazis or Hitler approaches 1."


No


Something something law of headlines... No. As much as I'd like for this to be the case, it's not.

I think proprietary models will gradually become less popular, though, due to their lack of consistency and control.


Not yet, but I imagine at some point it will. Open source, much like the oft maligned Thanos, is inevitable.



