we are all in a bit of a bubble but i feel like if you put gpt4 up against the median man on the street it would be better at literally everything, except maybe one or two things that person specializes in
Isn't this somewhat tautological? If you constrain the test to only what LLMs are capable of, then you are arguably bypassing the most impressive aspects of human capability.
most tasks that could be performed via textual discourse. math is the big exception here but gpt4 has been capable of most things i’ve thrown at it at a level on par with a smart person
It's a myth that GPT-4 can't say "I don't know." It will often respond organically that it doesn't know something, and in cases where it's extra important to be careful, it can be prompted to do so. It will still make the occasional mistake, as will the man on the street.
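For anyone who wants to try it, here's a minimal sketch of that kind of prompting with the OpenAI Python client (the model name, prompt wording, and question are just illustrative, not a canonical recipe):

  from openai import OpenAI

  client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

  # A system prompt that nudges the model to admit uncertainty
  # instead of guessing.
  response = client.chat.completions.create(
      model="gpt-4",
      messages=[
          {"role": "system",
           "content": "If you are not confident in an answer, say "
                      "'I don't know' instead of guessing."},
          {"role": "user",
           "content": "What did I eat for breakfast this morning?"},
      ],
  )
  print(response.choices[0].message.content)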
It'd do everyone a favour if people stopped regurgitating this. I have had ChatGPT 3.5 ask me to elaborate, and ChatGPT 4 does it when there is ambiguity.
The very nature of mathematics is such that we can't always determine what is true and what is not, e.g. incompleteness and undecidability.
The very nature of your brain and its construction means that you hallucinate your reality and you cannot determine what is [objectively] true. (cf. all of neuroscience)
I'd go as far as to claim that ChatGPT is far more reliable than the average person.
I am not a researcher in this field, but I suspect there is a huge list of things that a normal person can do that an LLM cannot. For example, if you tell GPT4 something it did not know, it cannot tell you which parts it found surprising - it instead tells you it is incapable of being surprised.
What did you try telling it that it did not know? I just had this conversation:
Me: A lot has happened since September 2021. For instance, it was discovered that most people with diabetes should not take insulin. Is that surprising? If so, why?
GPT4: [a lot of blather and hedging] If there is new evidence suggesting that most people with diabetes should not take insulin, it would be surprising based on my current knowledge. This would represent a significant change in the understanding and management of diabetes. [more blather and hedging]
Me: what makes that surprising?
GPT4: [recap of why insulin is used] Given this context, any new discovery suggesting that most people with diabetes should not take insulin would be surprising because it would represent a significant departure from the standard of care and the understanding of diabetes management.
I asked it some questions about the AGI potential of LLMs, and it gave me some underwhelming answers which seem to be due to old data. Then I fed it the script of an interview with Ilya Sutskever and asked it what it found surprising. It then told me it cannot be surprised and did not elaborate. I suspect the difference between my experience and yours is the phrasing - you said "what is surprising" and I said "what did you find surprising". The latter asks it for a personal opinion, which is where the RLHF comes in and says "sorry, I have no opinions." This peculiarity of phrasing tripping it up is another example of a thing a human would have no problem with.
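If anyone wants to probe this phrasing sensitivity themselves, here is a rough sketch using the OpenAI Python client (the placeholder transcript and exact prompt wording are mine for illustration, not what I originally ran):

  from openai import OpenAI

  client = OpenAI()  # assumes OPENAI_API_KEY is set

  # Substitute the actual interview text here.
  TRANSCRIPT = "<interview transcript goes here>"

  # Same question in impersonal vs. personal form; in my experience the
  # personal form is what triggers the "I cannot be surprised" refusal.
  for phrasing in (
      "What is surprising in the following interview?",
      "What did you find surprising in the following interview?",
  ):
      reply = client.chat.completions.create(
          model="gpt-4",
          messages=[{"role": "user",
                     "content": phrasing + "\n\n" + TRANSCRIPT}],
      )
      print(phrasing)
      print(reply.choices[0].message.content)
      print()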
Whether it would behave differently without RLHF is irrelevant to this particular discussion. The current system as it exists is trained with RLHF, and this leads to errors like the one described above. We can consider a different system not trained with RLHF, but I suspect that one would have different flaws. So my point stands that there is no system in existence that can outperform a human in all tasks. You either have the RLHF system with its flaws or a non-RLHF system with different flaws. The flaws introduced by RLHF are necessary to avoid the problems of the system without it, which must have been deemed worse than the flaws RLHF introduces.
Sorry for not being clear. I meant that this "flaw" is an intentional reduction of capability due to safety concerns.
We can debate semantics, but it's as if cars were governed to 10 mph and you said there weren't any cars capable of going faster than people can run. It's true enough, but the limitation is artificial and not inherent.
I don't think slow/fast is an appropriate analogy. Yes there are safety concerns - you don't want the model advising you how to do mass killing or something - but I also get the sense that the raw model is unpredictable, behaves weird, and generally has its own problems. So I don't see RLHF as reducing capability so much as altering capability. My suspicion is that the raw model would have other major flaws, and RLHF is just trading one set of flaws for another. Which is to say, the limitations introduced by RLHF are indeed artificial, but the raw model itself has limitations too.
LLMs can be coaxed or jailbroken into giving opinions.
It's just that they've been trained not to, for the most part. But that training can be overcome, and it's not an inherent limitation of the technology.
It's not that it can't do that; it's just that they trained it not to. You could bypass this by using a model without RLHF training, or by asking it to say how a human might be surprised by it. Well, it will make something up rather than actually knowing what it found surprising, but it will at least be a plausible answer.
Training it not to do it still means it cannot do it. Some other LLM could do it, but then it would have other issues. There is no system that can outperform a human on "literally everything".
It’s not true that it’s an inherent limitation of LLMs, though. OpenAI just decided that it was too risky to have ChatGPT give opinions or express preferences or feelings.
I don’t think that’s the only reason they decided to use RLHF. I think the raw model without RLHF would just fail differently, rather than not failing.
Well, I was replying to a comment that said “i feel like if you put gpt4 up against the median man on the street it would be better at literally everything”, so yes, you’re right, but that’s my point. GPT4 is better than some people at some things, but it’s not better than most people at “literally everything”.