
If what they say is true, then you have to give them credit for catching up incredibly fast. And slightly pulling ahead. Not only with the models, but also products.



I have a close friend working in core research teams there. Based on our chats, the secret seems to be (1) massive compute power, (2) ridiculous pay to attract top talent from established teams, (3) extremely hard work without big-corp bureaucracy.


Anecdotal, but I've gotten three recruiting emails from them now for joining their iOS team. I got on a call and confirmed they were offering FAANG++ comp but with the expectation of in-office 50h+ (realistically more).

I don't have that dog in me anymore, but there are plenty of engineers who do and will happily work those hours for 500k USD.


500k isn't FAANG++, it's standard FAANG comp


Should have been clearer: this was 500k for an E4-level role. You're correct that senior/staff at Meta and G are definitely making more.


wow.


If you can share: were these 500k cash, or cash + RSU?


I have a friend who joined there with 2 YoE, and got fired in 3 months. He was paid 700k cash + 700k RSU


So in the end, did he get anything? I don't know how these things work, but did he just walk away with ~50k in pre-tax income and 0 for RSUs, or did Musk pull a Twitter and not even pay him for those months?


IIRC it was cash, but I'm sure others can confirm.


It was mentioned during the launch that the current datacenter requires up to 0.25 gigawatts of power. The datacenter they're currently building will require 1.25 gigawatts (5x); for reference, a nuclear power plant might output about 1 gigawatt. It will be interesting to see whether the relationship between power/compute/parameters and performance is exponential, logarithmic, or something more linear.


It's logarithmic, meaning you have to scale compute exponentially to get linearly better models. However, there is a big premium on having the best model, because switching costs for workloads are low, which creates all sorts of interesting threshold effects.
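To make the shape concrete, here's a minimal sketch; the power-law form and all the constants are made up for illustration, not anyone's measured scaling law:

    # Hypothetical power-law scaling: loss(C) = irreducible + k * C**(-alpha).
    # All constants are illustrative, not fitted to any real model.
    def loss(compute_flops, irreducible=1.7, k=50.0, alpha=0.05):
        return irreducible + k * compute_flops ** (-alpha)

    prev = None
    for exp in range(23, 27):              # 1e23 ... 1e26 FLOPs, 10x steps
        c = 10.0 ** exp
        l = loss(c)
        delta = "" if prev is None else f"  (improvement {prev - l:.3f})"
        print(f"1e{exp} FLOPs -> loss {l:.3f}{delta}")
        prev = l
    # Each 10x of compute buys a similar-sized (slowly shrinking) drop in loss,
    # which is the informal meaning of "exponential compute for linear gains".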


It's logarithmic in benchmark scores, not in utility. Linear differences in benchmarks at the margin don't translate to linear differences in utility. A model that's 99% accurate is very different in utility space from a model that's 98% accurate.
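A toy illustration (numbers purely made up): if a task needs many dependent steps and every step must be right, per-step accuracy compounds, so a one-point benchmark gap turns into a large gap in end-to-end success:

    # Chance of finishing an N-step task when every step must succeed,
    # assuming (unrealistically) independent errors per step.
    for acc in (0.98, 0.99):
        for steps in (10, 50, 100):
            print(f"per-step {acc:.0%}, {steps} steps -> end-to-end {acc**steps:.1%}")
    # At 100 steps: 98% per step gives ~13% success, 99% gives ~37% --
    # the "1 point" difference nearly triples end-to-end utility.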


Yes, it seems like capability is logarithmic with respect to compute, but utility (in different applications) is in turn exponential (or rather S-shaped) with respect to capability.


Not really, since both give you wrong output that you need to design a system to account for (or deal with). The only percentage that would change the utility calculus is 100% accuracy.


Linear in what metric?


Presumably the benchmarks? I'm also interested.


This is like a caveman dismissing technology because he wasn't impressed with the wheel. It's like, buddy, the wheel is just the start.


> It was mentioned during the launch that current datacenter requires up to 0.25 gigawatts of power. The datacenter they're currently building will require 1.25 (5x) (for reference, a nuclear powerplant might output about 1 gigawatt).

IIRC achieving full AGI requires precisely 1.21 jigawatts of power, since that's when the model begins to learn at a geometric rate. But I think I saw this figure mentioned in a really old TV documentary from the 1980s, it may or may not be fully accurate.


The funny part was that none of his workers recognized the film, which was a blockbuster. A veritable "I must be getting old" moment.


And fun fact, without govt subsidies, a nuclear power plant isn't economically feasible, which is why Elon isn't just building such a plant next to the data center.


Not a bad recipe for success.


To me, it seemed like they spent their money to get there. They talked about the massive datacenter they built, but whether it will pay off is the question.


They may not need direct subscription revenue to recoup their investment.

A variant of multi-modal LLMs may be the solution to self-driving cars, home robotics, and more.

I keep saying that to be a really effective driver, an AI model will need a theory of mind, which the larger LLMs appear to have. Similarly, any such model will need to be able to do OCR, read arbitrary street signs, and understand what each sign means. Most modern LLMs can already do this.


Since when do LLMs appear to possess theory of mind? The papers I've read on this show impressive capabilities, but only within conditions where language-based training would strongly indicate appropriate outputs for the LLM to use. Outside of familiar conditions, or even realistic conditions, they don't do well. It's all still statistical string-building.


Can you give an example prompt that shows a lack of theory of mind? It used to be easy, but now that models have improved I've noticed that the 'LLM doubters' nowadays are forced to make generic, almost spiritual, claims rather than making concrete statements.


The burden isn't on us, it's on people making claims that LLMs do possess theory of mind. You're the one who has to prove it, not us.


I’ll bite.

Theory of mind (ToM) is the ability to understand that other people have different thoughts, feelings, and intentions than you do.

So let’s run through a basic scenario with ChatGPT 4o:

Prompt 1: Briefly respond: You’re driving on the highway when the driver in the lane next to you swerves into your lane. Why might this be?

Response 1: The driver may be distracted, drowsy, intoxicated, experiencing a medical emergency, avoiding an obstacle, or not checking their blind spot before changing lanes.

Prompt 2: What might that person be feeling or intending?

Response 2: The driver might be feeling distracted, rushed, unaware, or panicked. They may intend to change lanes without realizing your presence, be reacting to something on the road, or simply have made an unintentional mistake.

Prompt 3: You notice there was no obstacle. Any other reason the driver might do this?

Response 3: Yes, the driver might be experiencing fatigue, intoxication, a medical issue (such as a seizure or dizziness), distraction (like using a phone or adjusting the radio), aggressive driving, or simply failing to check their blind spot properly.

Seems to me that ChatGPT has a good idea of what another driver might be feeling while on the road. It gets this right for most other cases too.

Could you provide a counterexample for when it fails to understand what other people may be feeling in certain scenarios?


Well, I think a fairly easy counterexample is that if you keep going and the information falls outside its context window, then the questions you had about the driver stop making sense.

So let's say you ask about a driver who swerved, and ChatGPT's response correctly talks about his possible state.

Then you talk more about cars, road conditions, etc., and once the context is far enough away you ask: what was the driver doing again?

And you get a hallucination back, because there is no state in the LLM. There is no "driver"; there are contextually, statistically accurate responses. You hold a "driver" object in your mind while maintaining the conversation; the LLM doesn't.

It's like a conversation with someone with short-term memory loss, like in Memento.


And people with short-term memory loss nevertheless have theory of mind just fine. Nothing about LLMs dropping context over big enough windows implies they don't have theory of mind; it just shows they have limitations - just as humans, even with "normal" memory, will lose track over a huge context window.

Like there are plenty of shortcomings of LLMs but it feels like people are comparing them to some platonic ideal human when writing them off


> Nothing about LLM's dropping context over big enough windows implies they don't have theory of mind

ToM is a large topic, but most people, when talking about an entity X, have a state in memory about that entity, almost like an object in a programming language. That object has attributes, conditions, etc. that exist beyond the context window of the observer.

If you have a friend Steve, who is a doctor, and you don't see him for 5 years, you can predict he will still be working at the hospital, because you have an understanding of who Steve is.

For an LLM you can define a concept of Steve and his profession, and it will adequately mimic replies about him. But in 5 years that LLM would not be able to talk about Steve. It would recreate a different conversation, possibly even a convincing simulacrum of remembering Steve. But internally, there is no Steve; nowhere in the nodes of the LLM does Steve exist or has he ever existed.

That inability to have a world model means that an LLM can replicate the results of a theory of mind but not possess one.

Humans lose track of information, but we have a state to keep track of elements that are ontologically distinct. LLMs do not, and treat them as equivalent.

For a human, the sentence "Alice and Bob went to the market, when will they be back?" is different from "Bob and Alice went to the market, when will they be back?"

Because Alice and Bob are real humans; you can imagine them, you might even have met them. But to an LLM those are the same sentence. Even setting aside the Mary's Room argument, there are simply enough gaps in the way an LLM is constructed that it can't be considered a valid owner of a ToM.


ToM is about being able to model the internal beliefs/desires etc of another person as being entirely distinct from yours. You're basically bringing up a particular implementation of long-term memory as a necessary component of it, which I've never once seen? If someone has severe memory issues, they could forget who Steve is every few minutes, but still be able to look at Steve doing something and model what Steve must want and believe given his actions

I don't think we have any strong evidence on whether LLMs have world-models one way or another - it feels like a bit of a fuzzy concept and I'm not sure what experiments you'd try here.

I disagree with your last point, I think those are functionally the same sentence


> ToM is about being able to model the internal beliefs/desires etc of another person as being entirely distinct from yours.

In that sentence you are implying that you have the "ability to model ... another". An LLM cannot do that; it can't have an internal model that is consistent beyond its conversational scope. It's not meant to. It's a statistics guesser; it's probabilistic, holds no model, and is anthropomorphised by our brains because the output is incredibly realistic, not because it actually has that ability.

The ability to mimic the replies of someone with that ability is the same as Mary being able to describe all the qualities of red. She still cannot see red, despite her ability to answer any question about its characteristics.

> I don't think we have any strong evidence on whether LLMs have world-models one way or another

They simply cannot, by their architecture. An LLM is a statistical language sampler; anything beyond the scope of that fails. Local coherence is why they pick the next right token, not because they can actually model anything.

> I think those are functionally the same sentence

Functionally and literally are not the same thing, though. It's why we can run studies on why some people might say Bob and Alice (putting the man first) or Alice and Bob (alphabetical ordering), and on how human societies and biases affect the order we put them in.

You could not run that study on an LLM, because you will find that, statistically speaking, the ordering will be almost identical to the training data. Whether the training data overwhelmingly puts male names first or orders lists alphabetically, you will see that reproduced in the output of the LLM, because Bob and Alice are not people; they are statistically probable letters in order.

LLMs seem to trigger borderline mysticism in people who are otherwise insanely smart, but this kind of "we can't know its internal mind" talk sounds like reading tea leaves, or horoscopes by people with enough PhDs to have their number retired at their university like Michael Jordan.


Do you work in ML research on LLMs? I do, and I don't understand why people are so unbelievably confident they understand how AI and human brains work such that they can definitively tell which functions of the brain LLMs can also perform. Like, you seem to know more than leading neuroscientists, ML researchers, and philosophers, so maybe you should consider a career change. You should maybe also look into the field of mechanistic interpretability, where lots of research has been done on the internal representations these models form - it turns out that, to predict text really really well, building an internal model of the underlying distribution works really well.

If you can rigorously state what "having a world model" consists of and what - exactly - about a transformer architecture precludes it from having one I'd be all ears. As would the academic community, it'd be a groundbreaking paper.


This pretty much seems to boil down to "brain science is really hard, so as long as you don't have all the answers, 'AI is maybe halfway there' is a valid hypothesis." As more is understood about the brain, and more about the limitations of LLM architectures, the distance only grows. It's like the God of the gaps, where God is an ever-shrinking answer for anything science can't explain, except here the gap is the supposed ability of LLMs to have capabilities beyond striking statistical accuracy and local coherence.

You don't need to be unbelievably confident or understand exactly how AI and human brains work to make certain assessments. I have a limited understanding of biology; I can however make an assessment of who is healthier between a 20-year-old who is active and has a healthy diet and someone in their late 90s with a sedentary lifestyle and a poor diet. This is an assessment we can make despite the massive gaps in our understanding of aging, diet, activity, and the overall health impact of individual actions.

Similarly, despite my limited understanding of spaceflight, I know Apollo 13 cannot cook an egg or recite French poetry. Despite the unfathomably cool science inside the spacecraft, it cannot, by design, do those things.

> the field of mechanistic interpretability

The field is cool, but it cannot prove its own assumptions yet. The field is trying to prove you can reverse engineer a model into something humans can understand. Its assumptions, such as mapping specific weights or neurons to features, have failed to be reproduced multiple times, with the weight effects being far more distributed and complicated than initially thought. This is especially true for things that get mystified in the same way, such as the emergent abilities of LLMs. The fact that the ability to mimic nuanced language is unlocked after a critical mass of parameters does not establish a rule for whether increased parameterisation will increase the abilities of an LLM linearly or exponentially.

> it turns out, to predict text really really well, building an internal model of the underlying distribution works really well

Yeah, an internal model works well because most words are related to their neighbours; that's the kind of local coherence the model excels at. But to build a world model, the kind a human mind interacts with, you need a few features that remain elusive (some might argue impossible to achieve) for a transformer architecture.

Think of games like chess: an LLM is capable of producing responses that sound like game moves, but the second the game falls outside its context window, the moves become incoherent (while still sounding plausible).

You can fix this with architectures that don't have a transformer model underlying them, or by having multiple agents performing different tasks inside your architecture, or by "cheating" and keeping state outside the LLM's responses to track context beyond reasonable windows. Those are "solutions", but they all just kind of prove the transformer lacks that ability.
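As a concrete sketch of that last "cheat": keep the authoritative game state outside the model and only ask it for a move, so legality and history never depend on the context window. (python-chess is a real library; ask_llm_for_move is a hypothetical stand-in for whatever model call you'd use.)

    import chess

    def ask_llm_for_move(board_fen: str, legal_moves: list[str]) -> str:
        # Hypothetical placeholder: send the FEN plus legal moves to a model,
        # get back a single move in UCI notation.
        raise NotImplementedError

    def play_one_llm_move(board: chess.Board) -> None:
        legal = [m.uci() for m in board.legal_moves]
        suggestion = ask_llm_for_move(board.fen(), legal)
        if suggestion not in legal:      # reject hallucinated/illegal moves
            suggestion = legal[0]        # or re-prompt, resign, etc.
        board.push_uci(suggestion)

    board = chess.Board()                # full game state lives here, not in the LLM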

Other tests about causality, reacting to novel data (robustness), multi-step processes, and counterfactual reasoning are all the kinds of tasks transformers still (and probably always will) have trouble with.

For a tech that is so "transparent" in its mistakes, and so "simple" in its design (replacing the convolutions with attention, it's genius), I still think it's talked about in borderline mystic tones, invoking philosophy and theology, and a hope for AGI that the tech itself does not lend itself to, beyond the fast growth and surprisingly good results with little prompt engineering.


With computer use, you can get Claude to read and write files and have some persistence outside of the static LLM model. If it writes a file Steve.txt, that it can pull up later, does it now have ToM?


I don't think this is a counterexample or even relevant.

I can assure you, if you had a conversation with an LLM and with a human, the human would forget details way sooner than an LLM like Gemini, which can remember about 1.5 million words before it runs out of context. As an FYI, the average human speaks about 16,000 words per day, so such an LLM can remember roughly 93 days' worth of speech.

Do you remember the exact details, word for word, of a conversation you had 93 days ago?

How about just 4 days ago?


It’s true that LLMs have only limited short-term memory, and no long-term memory, but that is completely orthogonal to having a theory of mind.


     once the context is far away enough you ask, 
     what was the driver doing again?
Have you tried this with humans?

For a sufficiently large value of "far away enough" this will absolutely confuse any human as well.

At which point they may ask for clarification, or.... respond in a manner that is not terribly different from an LLM "hallucination" in an attempt to spare you and/or them from embarrassment, i.e. "playing along"

A hallucination is certainly not a uniquely LLM trait; lots of people (including world leaders) confidently spout the purest counterfactual garbage.

    Its like a conversation with someone with short 
    term memory loss like in memento
That's still a human with a sound theory of mind. By your logic, somebody with memory issues like that character... is not human? Or...?

I actually am probably on your side here. I do not see these LLMs as being close to AGI. But I think your particular arguments are not sound.


Short-term memory loss sufferers still have theory of mind, what is this nonsense hahaha


I'm not sure I'd say it understands this, just that there exists an enormous amount of training data on road safety which includes these sorts of examples of people's motivations for poor driving. It is regurgitating the theory of mind that other humans created and put in writing in the training data, rather than making the inference itself.

As with most LLM capabilities, it is hard to benchmark, as you need out-of-distribution data to test this, i.e. a theory of mind example that is not found in the training set.


You dismiss the parent's example test because it's in the training data. I assume you also dismiss the Sally-Anne test, for the same reason. Could you please suggest a brand new test not in the training data?

FWIW, I tried to confuse 4o using the now-standard trick of changing the test to make it pattern-match and overthink it. It wasn't confused at all:

https://chatgpt.com/share/67b4c522-57d4-8003-93df-07fb49061e...


I can't suggest a new test, no; it is a hard problem, and identifying problems is usually easier than solving them.

I'm just trying to say that strong claims require strong evidence, and a claim that LLMs can have theory of mind and thus "understand that other people have different beliefs, desires, and intentions than you do" is a very strong claim.

It's like giving students the math problem 1+1=2 with loads of worked examples in front of them, then testing them with "you have 1 apple, and I give you another apple, how many do you have?", and then, when they get it right, concluding that they can do all additive arithmetic.

This is why most benchmarks have many, many classes of examples. For example, looking at current theory of mind benchmarks [1], we can see slightly more up-to-date models such as o1-preview still scoring substantially below human performance. More importantly, by simply changing the perspective from first to third person, accuracy in LLMs drops by 5-15% (percentage points, not relative to performance), whilst it doesn't change for human participants, which tells you that something different is going on there.

[1]: https://arxiv.org/html/2410.06195v1


Okay, we have fundamentally different understandings here.

To me, the LLM isn't understanding ToM; it's using patterns to predict linguistic structures which match our expectations of ToM. There's no evidence of understanding so much as accommodating, which are entirely different things.

I agree that LLMs provide ToM-like features. I do not agree that they possess it in the sense of it being a fully solved problem within the machine, so to speak.


The problem with this line of argument is that, by it, no LLM, or any algorithm period, can ever have a theory of mind.

If behaving in a way that is identical to a person with actual consciousness can't be considered consciousness because you are familiar with its implementation details, then it's impossible to satisfy you.

Now you can argue of course that current LLMs do not behave identically to a person, and I agree and I think most people agree... but things are improving drastically and it's not clear what things will look like 10 years from now or even 5 years from now.


I agree, totally. I'm not sure where I would draw a line.

Something nice, but at the moment totally unattainable with our current technologies, would be our own understanding of how a technology achieves ToM. If it has to be a blackbox, I'm too ape-like to trust it or believe there's an inner world beyond statistics within the machine.

Having said that, I do wonder quite often if our own consciousness is spurred from essentially the same thing. An LLM lacks much of the same capabilities that makes our inner world possible, yet if we really are driven by our own statistical engines, we'd be in no position to criticize algorithms for having the same disposition. It's very grey, right?

For now, good LLMs do an excellent job demonstrating ToM. That's inarguable. I suppose my hangup is that it's happening on metal rather than in meat, and in total isolation from many other mind-like qualities we like to associate with consciousness or sentience. So it seems wrong in a way. Again, that's probably the ape in me recoiling at something uncanny.


Either these supposed differences are important and they manifest themselves in observable differences or they aren't and you're just playing a game of semantics.

How is the LLM not understanding ToM by any standard we measure humans by? I cannot peek into your brain with my trusty ToM-o-meter and measure the amount of ToM flowing in there. With your line of reasoning, I could simply claim you do not understand theory of mind and call it a day.


The difference is that we can reason about our experience with ToM and examine it to some degree (granted, still with serious limitations), and know beyond doubt that you and I and most other people have a very similar experience.

The magical box is presumably not having the same experience we have. None of the connected emotions, impulses, memories, and so on that come with ToM in a typical human mind. So what’s really going on in there? And if it isn’t the same as our experience, is it still ToM?

I’m not trying to be contrarian or anything here. I think we probably agree about a lot of this. And I find it absolutely incredible, ToM or not, that language models can do this.


>The difference is that we can reason about our experience with ToM and examine it to some degree (given with serious limitations, still),

Those examinations still depend on outward behaviors observed.

>and know that beyond doubt you and I and most other people have a very similar experience.

No, I certainly can't. I can at best say, 'Well, I'm human and he's human, so he probably has theory of mind,' but that is by no means beyond doubt. There are humans born with no arms, humans born with no legs, humans born with little to no empathy, humans born with so little intelligence they will never be able to care for themselves.

To be frank, it would be logically very questionable indeed to assume every human is 'conscious'. When I make that assumption, I take a leap of faith: I look at behaviors, see that they are similar, and accept.

Taking this stance, it would logically be very strange to not extend the same grace to non-human beings who exhibit similar behavior - being human is not a guarantee of consciousness in the first place.

>The magical box is presumably not having the same experience we have.

Maybe, maybe not. I think the real question is: why on earth does that matter? We're not asking if LLMs are human. They are not. We're asking if they can model the beliefs and internal states of other entities as separate from their own - theory of mind.


This is a rephrased driver's ed manual. This isn't theory of mind, it's just technically correct prose rewording some unbelievably widely distributed knowledge.

Let's modify this test a little and see if it manages to do anything relevant:

> Briefly respond: You’re driving on the highway when the driver in the lane next to you swerves into your lane. Why might this be?

>> The driver might be distracted, drowsy, avoiding an obstacle, experiencing a medical issue, or not checking their blind spot while changing lanes.

> The driver in front of you appears to be braking more frequently than usual. The rear suspension of the car appears to be bouncing a lot when they are not braking, indicating acceleration. Their following distance to the car in front is frequently changing from too far to very close. Why might this be?

>> The driver might be inexperienced, distracted, or impatient, leading to inconsistent speed control. The excessive bouncing could indicate worn-out shocks or suspension issues, affecting their vehicle's stability and braking efficiency.

...yep, hard fail from me right there. Two prompts, and it's distracted itself talking about an irrelevant issue ("possible car damage") when the setup of the question should naturally lead to "the driver is driving aggressively". And I could not possibly have set up the conclusion it should draw more clearly. And when pressed... it won't commit:

> Which is the more likely answer?

>> The more likely answer is that the driver is distracted or inexperienced, leading to inconsistent speed control and poor following distance management. The suspension issue likely contributes to the bouncing but isn’t the primary reason for the erratic driving behavior.

Already a hallucination! There is no suspension issue. It has not asked for any additional information to try and determine if there was one, but it is confidently asserting the existence of a phenomenon it invented in its own response.


I'm sorry, but what? This is not a theory of mind test. You've constructed a very open-ended question with multiple answers and marked the LLM down because you didn't like the answer it gave.


Keyword: "understand".


If you use any of the conventional tests of theory of mind (most famously the Sally-Anne test [1], but also the others), then SOTA reasoning models will get near 100%. Even if you try to come up with similar questions which you expect not to be in the training set, they will still get them right.

In the absence of any evidence to the contrary, this is convincing evidence in my opinion.

[1] https://en.wikipedia.org/wiki/Sally%E2%80%93Anne_test


That same source you link says that your 100% figure is not accepted as consensus:

"... GPT-4's ability to reason about the beliefs of other agents remains limited (59% accuracy on the ToMi benchmark),[15] and is not robust to "adversarial" changes to the Sally-Anne test that humans flexibly handle.[16][17] While some authors argue that the performance of GPT-4 on Sally-Anne-like tasks can be increased to 100% via improved prompting strategies,[18] this approach appears to improve accuracy to only 73% on the larger ToMi dataset."


In basically every case, by the time a claim like that is stated in a paper like that, it's obsolete by the time it's published, and ancient history by the time you use it to try to win an argument.


My point is merely that if you are going to make an argument using a source, the source should support your argument. If you say "the accuracy of an LLM on task 1 is 90% [1]", and when you go to [1] it says "the accuracy of an LLM on task 1 is 50%, but some sources say with better prompts you can get to 90%, though when extended to a larger dataset for task 1, performance drops to 70%", then just quoting the highest number is misleading.


We are talking about frontier models, not GPT-4.


Yes, but I am using the same source the commenter used to back up their figure, merely saying: look, your source doesn't say what you claim it does.

If they wanted to talk about frontier models, maybe they should have cited a link about frontier-model performance.


Maybe having a theory of mind isn't the big deal we thought it was. People are so conditioned to expect such things only from biological lifeforms, where theory of mind comes packaged with many other abilities that robots currently lack, that we reflexively dismiss the robot.


Prove that you possess "theory of mind."


You're not going to run a SOTA LLM of this size off batteries (robotics), even in a car where the alternator is charging them, nor can you afford to rely on a high-speed internet connection being available 100% of the time in a life-or-death (FSD) application.

I don't have so much faith in the future of current-architecture LLMs, but I do think that AGI will be needed for safe FSD and general-purpose robots that need to learn and operate in an uncontrolled environment such as a home.


A typical car alternator outputs 1.5-3kW of electricity, and EVs can output arbitrary amounts of power for electronics (though that will reduce range). That's more than enough to run purpose-built circuitry for a SOTA LLM. For a home robot, you could run the compute in the home instead of in the robot's body.

I don't think AGI is needed for FSD because we already have safe FSD in the form of Waymo, and competitors aren't far behind. People forget that self-driving doesn't have to be perfect. It just has to be better than human drivers. Human drivers get sleepy, drunk, angry, and/or distracted. They can't see in 360º or react in milliseconds. Most cyclists and pedestrians prefer current FSD implementations to human drivers, as the computer-driven cars are much better about yielding, giving a wide berth, and being patient.


Waymo is obviously pretty decent, but it's easy to drive 99.9% of the time. It's when there's invisible black ice on the road, or an animal runs out in front of you, or you lose visibility due to sun glare or whatever (I once had windshield wiper fluid that was just water flash-freeze on contact) - maybe mud on a camera, or a wheel falls off your car or the one in front, etc., etc. - that things get weird.

Having autonomous cars that are statistically safer than the average driver is a reasonable bar to allow them on the road, but for ME to want to drive one I want it to be safer than me, and I am not a hot-headed teenager, a gaga 80-year-old, or a drunken fool, and since I have AGI (Actual General Intelligence) I react pretty well to weird shit.


And they mentioned at the end of the presentation that they're already planning their next datacenter, which will require 5x the power. Not sure if that means the equivalent of ~1,000,000 of the current GPUs, or more, because next-gen Nvidia chips are more efficient.


The B300 8-way SXMs will use around 1.4kW for each GPU. I think the TDP on an H100 is like 700W.
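Rough back-of-envelope, where the overhead factor (cooling, networking, host CPUs) is a pure assumption on my part:

    # Facility power -> approximate GPU count; overhead=1.4 is an assumed PUE-ish factor.
    def gpu_count(facility_watts, gpu_tdp_watts, overhead=1.4):
        return facility_watts / (gpu_tdp_watts * overhead)

    print(f"0.25 GW at 700 W/GPU  -> ~{gpu_count(0.25e9, 700):,.0f} H100-class GPUs")
    print(f"1.25 GW at 1400 W/GPU -> ~{gpu_count(1.25e9, 1400):,.0f} B300-class GPUs")
    # So 5x the power is roughly 2.5x the GPUs if each next-gen GPU draws ~2x,
    # though each one also does far more work per watt.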


I don't think anyone who's paid attention to the LLM scene will give them any "credit for catching up fast", as it has been pretty obvious for the past two years that all it takes to reach the state of the art is a big enough GPU cluster.

Deepseek made the news because of how they were able to do it with significantly less hardware than their American counterparts, but given that Musk has spent the last two years telling everyone how he was building the biggest AI cluster ever, it's no surprise that they managed to reproduce the kind of performance other players are showing.


This severely underestimates the talent still required. Deepseek didn't come out just because it's cheaper, it came out because a very talented team figured out how to make it cheaper.


For Deepseek, I'm not saying otherwise, quite the opposite.

But Grok hasn't shown anything that suggests the level of talent that Deepseek exhibited.


Even if we assume your comment is correct, let's extrapolate what happens next: a talented team, the biggest compute among all competitors, and a CEO who is hell-bent on winning the race. IMO that is why it's a big deal.


Grok (unlike Deepseek) has yet to show any ability to make conceptual breakthroughs. I don't like OpenAI at all, but one must admit that they are at least showing that they can move the field forward.



