
I maintain that LLMs are great for a few things:

* Making stuff up. If you need bulk fiction writing then LLMs are useful (or marketing copy, etc.)

* Transforming information. Unstructured text to JSON. Natural language to Bash. Python to Ruby. (A sketch of the JSON case is below.)

* Information reduction. Summarization. Data extraction (this can bleed a bit into analysis)

They're not AGI but they're definitely useful.
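
Here's what that JSON case might look like in practice - a minimal sketch, assuming the 2023-era openai Python package (openai<1.0) and a made-up extraction schema; swap in whatever client and fields you actually use:

    import json
    import openai  # assumes OPENAI_API_KEY is set in the environment

    def to_json(text: str) -> dict:
        # Ask the model to restructure free text into a fixed JSON shape.
        prompt = (
            "Extract the person's name, company, and role from the text below. "
            "Reply with a single JSON object using exactly those three keys.\n\n" + text
        )
        resp = openai.ChatCompletion.create(
            model="gpt-4",
            messages=[{"role": "user", "content": prompt}],
            temperature=0,  # keep the transformation as deterministic as possible
        )
        return json.loads(resp.choices[0].message.content)

    # to_json("Maria Chen joined Acme in 2021 and now leads their data platform team.")

The same pattern covers the other transformations (natural language to Bash, Python to Ruby): state the target format in the prompt and let the model do the rewriting.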



I've been pasting comments into ChatGPT and asking what the authors mean by certain terms or phrases. It works extremely well, and it gives an answer I can independently verify to be true. Far more useful than getting a simple dictionary lookup.

An example response I got recently:

>In the context of this comment, "dovish" means that the speaker perceives Powell's statement to be more accommodative towards economic growth and less concerned about inflation than Lowe's statement. This suggests that Powell may be more inclined to keep interest rates low or lower them further to stimulate economic growth, rather than raise them to combat inflation. The term "dove" is often used to describe policymakers who prioritize economic growth over inflation concerns. In contrast, "hawkish" refers to policymakers who prioritize fighting inflation over economic growth.

Meanwhile Google gives me this response for a definition:

>Definitions of dovish. adjective. opposed to war. synonyms: pacifist, pacifistic peaceable, peaceful. not disturbed by strife or turmoil or war.


> Far more useful than getting a simple dictionary lookup.

Yes.

Long-time dictionary-editor and translator here. What blew me away most when I first tried ChatGPT, and what continues to amaze me, is its ability to distinguish and explain the meanings of individual words relative to their context. I have spent much of my career doing just that, and I always felt that it relied on my knowledge and intuition as a human and would be a really hard problem for AI. Conventional corpus linguistics had not even come close to cracking the problem.

I’ve done tests in which I gave ChatGPT texts that contained very rare words, new words, or nonce words that would not be defined in any dictionaries, and some of which were unattested on the web. When given enough context, it was able to deduce the intended meanings.

In contrast, its ability to explain grammar is terrible. It misidentifies parts of speech, singulars and plurals, the subjects and objects of verbs, even the number of letters in words. I hope people develop plugins to fix its grammar-explaining ability, just as other plugins reportedly fix its natively terrible mathematical ability.


> even the number of letters in words.

This one isn't surprising at all: it doesn't deal with letters, but symbols. It only indirectly knows about spelling.
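
A quick way to see this, with OpenAI's tiktoken library (a minimal sketch; assumes tiktoken is installed, and the exact splits vary by encoding):

    import tiktoken

    # GPT-4-era encoding: the model sees integer token IDs, not individual letters.
    enc = tiktoken.get_encoding("cl100k_base")
    tokens = enc.encode("extraordinarily")
    print(tokens)                               # a short list of integer IDs
    print([enc.decode([t]) for t in tokens])    # multi-letter chunks, not letters

Counting the letters in a word means reasoning across those chunks, which is why spelling questions trip it up.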

> In contrast, its ability to explain grammar is terrible. It misidentifies parts of speech, singulars and plurals, the subjects and objects of verbs,

I wonder if this is an area where having to generate an immediate final answer, word by word, is killing it, and if it could be induced to "diagram" a sentence first and get a correct answer. Can you give me an example of the query you're asking?


> This one isn't surprising at all: it doesn't deal with letters, but symbols.

It’s not surprising to us now. It was very surprising to me when I first noticed it, as it contrasted sharply with ChatGPT’s ability to explain aspects of language that seem to us humans to be much more difficult, particularly word meaning.

> Can you give me an example of the query you're asking?

I have to admit that all of my testing of its grammar-explaining ability was done last December. I have repeated a few of those tests now with GPT-4, and it did fine [1].

[1] https://www.gally.net/temp/202212chatgpt/explaininggrammar.h...


>I have to admit that all of my testing of its grammar-explaining ability was done last December. I have repeated a few of those tests now with GPT-4, and it did fine [1].

A tale as old as time in this space! I appreciate you checking it again. They are improving so fast.


Here is an example of the results I got when playing with this a bit in ChatGPT-4. Note that it doesn't understand the syntactic ambiguity of the sentence:

The sentence can be broken down as follows:

Main Clause: - The burglar threatened the student with the knife.

Subject: - The burglar

Predicate: - threatened the student with the knife.

Object: - the student

Prepositional Phrase: - with the knife

The prepositional phrase "with the knife" modifies the verb "threatened." It tells us what the burglar used to threaten the student.

Now, let's identify the parts of speech of each word in the sentence and whether the nouns are singular or plural:

- The: article, singular
- burglar: noun, singular
- threatened: verb, past tense
- the: article, singular
- student: noun, singular
- with: preposition
- the: article, singular
- knife: noun, singular

All the nouns in this sentence are singular. There is no ambiguity in meaning in this sentence.


I tried that sentence and three similar ones, and this time GPT-4 caught the ambiguity. The results are here:

https://www.gally.net/temp/202212chatgpt/20230508-syntactic-...

I am especially impressed by its comment about one interpretation of “The woman cleaned her teeth with a knife”: “The woman cleaned her teeth that had a knife in them (highly unlikely, but still a possible interpretation).”

As I noted in another comment, I might need to backtrack on my assessment of its grammatical-explanation ability, as until today I had only done tests using the initial release of ChatGPT, not GPT-4.


Anthropic's Claude[1] and Claude+[2] both failed some of these.

Question:

I am interested in syntactic ambiguity in English. Can any of the following sentences be interpreted in more than one way? Explain why or why not.

The man opened the package with a knife.

The burglar threatened the student with a knife.

The woman cleaned her teeth with a knife.

The child pointed at the teacher with a knife.

Claude's answer:

These sentences can be interpreted in multiple ways:

1. The man opened the package with a knife. Ambiguous: The knife could have been used as a tool to cut tape and open the package, or it could imply the man threatened someone with a knife while opening the package.

2. The burglar threatened the student with a knife. Unambiguous: The implication is clearly that the burglar threatened the student using a knife as a weapon.

3. The woman cleaned her teeth with a knife. Ambiguous: The knife could refer to a toothpick or dental tool, or it could imply an unusual way of brushing teeth.

4. The child pointed at the teacher with a knife. Ambiguous: The knife could have been used to gesture and point, or it could imply the child threatened the teacher with a knife.

In summary, sentences that suggest using a knife as a tool tend to be ambiguous between a literal and figurative interpretation. Sentences that clearly describe a knife being used as a weapon are generally unambiguous. The ambiguity arises from the multiple meanings and uses of the word "with."

Does this explanation make sense? Let me know if you have any other questions!

Claude+'s answer:

Here are my analyses of potential syntactic ambiguity in those sentences:

The man opened the package with a knife. - This sentence is ambiguous. It could mean either that the man used a knife to open the package, or that the package contained a knife inside. The ambiguity arises because "with a knife" could modify either "opened" or "package".

The burglar threatened the student with a knife. - This sentence is unambiguous. The only coherent interpretation is that the burglar used a knife to threaten the student. "With a knife" can only modify "threatened" in this sentence.

The woman cleaned her teeth with a knife. - This sentence is ambiguous. It's unclear whether it means the woman used a knife to clean her teeth, or her teeth contained a knife. Again, "with a knife" could modify either "cleaned" or "teeth".

The child pointed at the teacher with a knife. - This sentence is unambiguous. The only meaning that makes sense is that the child pointed at the teacher while holding a knife. "With a knife" can only modify "pointed at" in this example.

In summary, ambiguity can arise when a prepositional phrase like "with a knife" can logically modify more than one part of the sentence. Context often helps resolve such ambiguities, but in isolation some sentences remain open to multiple interpretations. Let me know if you have any other questions!

[1] - https://poe.com/Claude-instant

[2] - https://poe.com/Claude%2B


> In contrast, its ability to explain grammar is terrible.

To be fair, grammar is sometimes a bit of an advanced subject, even if it is ostensibly easy (which it isn't). Isn't that the same problem for every other subject? Other than encyclopedic knowledge, it struggles with many subjects at higher levels (e.g. programming beyond a few basics, math, and so on).

I think this is partially an issue of availability of data.


Something else to consider is that languages differ as to the complexity of their grammar.

Much research on and training of LLMs has been done in English, but if these models were trained on as much data in other languages as in English, I wonder:

- Would LLMs do better or worse on intelligence and other tests if they were tested in those languages?

- Would conversing with LLMs be easier or harder in other languages?

Some languages, like Loglan or Lojban, might be especially suited to this sort of testing and interaction, as they were designed to be easy for computers to interpret.


Grammar models already exist and have for years. I think giving GPT access to one would be the right approach there.
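
For example, an off-the-shelf parser like spaCy already produces the kind of analysis being asked for here - a minimal sketch, assuming spaCy and its en_core_web_sm model are installed:

    import spacy

    nlp = spacy.load("en_core_web_sm")
    doc = nlp("The burglar threatened the student with the knife.")

    # Part of speech, grammatical role, and head word for each token.
    for token in doc:
        print(token.text, token.pos_, token.dep_, token.head.text)

Exposing that parse to GPT (as a tool or plugin) would let it explain grammar from a structure it didn't have to compute itself.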


Yes. And that gets into the awkward bit. LLMs are incidentally good as databases/search engines for information that is prevalent on the internet and not disputed. Ideally we could separate the LLM from the information. So when asking it for a definition you'd provide your query, the source comment, and then the computer would add in the definition and some uses in context.

Right now we've tangled an understanding of language and a corpus of information together in a way that causes distrust in AI. If the AI gets some fact wrong (like Bard did when it was demoed earlier this year), people laugh and think LLMs are a failure. They should not be used for information recall. That is a task a vector database would be good at.
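
A rough sketch of that separation - retrieve trusted definitions first, then let the model do only the language work. Here embed() and complete() are hypothetical stand-ins for whatever embedding and completion APIs you use, and the store is a small in-memory vector database:

    import numpy as np

    def cosine(a, b):
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

    def lookup(query_vec, store, k=3):
        # store: list of (vector, text) pairs built from a trusted glossary
        ranked = sorted(store, key=lambda item: cosine(query_vec, item[0]), reverse=True)
        return [text for _, text in ranked[:k]]

    def define_term(term, comment, store):
        context = "\n".join(lookup(embed(term + " " + comment), store))  # embed(): hypothetical
        prompt = ("Using only the definitions below, explain what '" + term +
                  "' means in this comment.\n\nDefinitions:\n" + context +
                  "\n\nComment:\n" + comment)
        return complete(prompt)  # complete(): hypothetical LLM call

The model never has to recall the fact; it only rephrases what the store returned.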


Yes, I think this explains a lot of the skeptics. They look at ChatGPT as an information repository, and as such it is impressive but flawed, while looking right past its incredible facility with language, which is positively superhuman. Some humans can exceed its language skills on some details, but no one can speak so many languages (both natural and programming languages) with the ease and facility of ChatGPT.


I just checked with Google, providing minimal context, with the query "dovish economy":

> What is does Dovish Mean? Dove refers to an economic policy adviser who advocates for monetary policies involving low-interest rates. The doves argue that inflation isnt bad and that it is bound to have few negative effects on the economy.

It linked to a site that includes all those typos, followed by other links on doves and hawks.


I use it 90% for entertainment.

In improv theater there's something called "yes, and" - essentially you take the premise further, no matter how absurd, without redirection.

You can come up with the most ridiculous things and it just goes with it, hilariously.

I'll come up with one on the spot. "I'm having a very difficult time with my pet snails being social. I'm thinking of starting a social networking site and giving snails tiny phones so they can chat with each other. I need a company name, an elevator pitch, and some copy for my landing page."

And then you ask it to do, I dunno, NFTs and cryptocurrency for snails, give them tiny VR headsets. Have it come up with a jingle and a commercial. You can say that instead of unfriending, you salt them. Etc. It'll just keep going. Even "A snail-rights Luddite group of Mennonites and Amish are now protesting my idea. I need a way to appease my critics. Can you write a letter for me that defends SnailConnect as healthy and good?"

One of my favorite outputs from this session:

"How about SnailConnect; a small trail for snail, one giant leaf for snailkind."

You don't ever get to a place where it's like "well now you're just being ridiculous"

But I agree. It's just a big-data version of ELIZA - spitting my reflection back at me.


I think it’s fun to use as a way to write Star Trek fan fiction. It can get pretty witty.


we are all in a bit of a bubble but i feel like if you put gpt4 up against the median man on the street it would be better at literally everything, except maybe one or two things that person specializes in


Isn't this somewhat tautological? If you constrain the test to only what LLMs are capable of doing, then you are arguably bypassing the most impressive aspects of human capability.


Can you give a few examples of "literally everything"?


Most tasks that could be performed via textual discourse. Math is the big exception here, but GPT-4 has been capable of most things I've thrown at it, at a level on par with a smart person.


Doubtful. ChatGPT can't even give you a hug


Neither could/would most men on the street.


Since you said literally, it wouldn't be better than:

1. Personal interactions, since GPT-4 doesn't know the median man's social circle and doesn't have a body.

2. Nearly everything after September 2021.

3. Manipulating and navigating the world, including driving cars and everything else humans do a thousand times a day, since it doesn't have a body.


Sadly "literally" can literally mean "figuratively" nowadays


So would a calculator if you compare numerical ability.


The median man on the street can say "I don't know"


It's a myth that GPT-4 can't say "I don't know." Often it will respond that it doesn't know something organically, but in cases where it's extra important to be careful, it can be prompted to do so. It will always make the occasional mistake, as will the man on the street.


It'd do everyone a favour if people stopped regurgitating this. I have had ChatGPT 3.5 ask me to elaborate, and ChatGPT4 does it when there is ambiguity.


> It'd do everyone a favour if people stopped regurgitating this

by "everyone" you mean "OpenAI"

the very nature of its construction means that it can't determine what is true and what is not

(and I'd quite like people to continue to regurgitate that it is inherently unreliable until this viewpoint hits the mainstream)


The very nature of mathematics is such that we can't determine what is true and what is not, e.g. incompleteness, undecidability.

The very nature of your brain and its construction means that you hallucinate your reality and you cannot determine what is [objectively] true. (cf. all of neuroscience)

I'd go as far as to claim that ChatGPT is far more reliable than the average person.


> I'd go as far as to claim that ChatGPT is far more reliable than the average person.

trying to prove your own point here?


I don't think I made the claim that it is infallible.

My first claim was that ChatGPT and the like can and will ask you to elaborate; claiming otherwise is fundamentally false.


What LLMs can’t do, but which would be exceedingly useful, is have taste… For example, tell me which comments on HN are good, or which websites are worth reading.


It surely can do this, but you have to define what your taste is. Then you can ask, does this comment match my taste?


ChatGPT, is this comment "good"?

ChatGPT, what websites are worth reading? I am interested in x, y and z.


I am not a researcher in this field, but I suspect there is a huge list of things that a normal person can do that an LLM cannot. For example, if you tell GPT-4 something it did not know, it cannot tell you which parts it found surprising - it instead tells you it is incapable of being surprised.


What did you try telling it that it did not know? I just had this conversation:

Me: A lot has happened since September 2021. For instance, it was discovered that most people with diabetes should not take insulin. Is that surprising? If so, why?

GPT4: [a lot of blather and hedging] If there is new evidence suggesting that most people with diabetes should not take insulin, it would be surprising based on my current knowledge. This would represent a significant change in the understanding and management of diabetes. [more blather and hedging]

Me: what makes that surprising?

GPT4: [recap of why insulin is used] Given this context, any new discovery suggesting that most people with diabetes should not take insulin would be surprising because it would represent a significant departure from the standard of care and the understanding of diabetes management.


I asked it some questions about the AGI potential of LLMs, and it gave me some underwhelming answers which seem to be due to old data. Then I fed it the transcript of an interview with Ilya Sutskever and asked it what it found surprising. It then told me it cannot be surprised and did not elaborate. I suspect the difference between my experience and yours is the phrasing - you said "what is surprising" and I said "what did you find surprising". The latter asks it for a personal opinion, which is where the RLHF comes in and says "sorry, I have no opinions." This peculiarity of phrasing tripping it up is another example of something a human would have no problem with.


Is that really it being tripped up, or it being trained via RLHF to help people avoid anthropomorphizing it?

Because I think if it had told you it was surprised, people would object to that.


Whether it would behave differently without RLHF is irrelevant to this particular discussion. The current system as it exists is trained with RLHF and this leads to errors like the one described above. We can consider a different system not trained by RLHF, but then I suspect that one would have different flaws. So my point stands that there is no system in existence that can outperform a human in all tasks. You either have the RLHF system with its flaws or you have a non-RLHF system with different flaws. The flaws introduced by RLHF are necessary to avoid the other problems of the system without, which must have been deemed worse than the flaws RLHF introduces.


Sorry for not being clear. I meant, this "flaw" is an intentional reduction of capability for safety reasons.

We can debate semantics, but it's as if cars were governed to 10 mph and you said there weren't any cars capable of going faster than people can run. It's true enough, but the limitation is artificial and not inherent.


I don't think slow/fast is an appropriate analogy. Yes there are safety concerns - you don't want the model advising you how to do mass killing or something - but I also get the sense that the raw model is unpredictable, behaves weird, and generally has its own problems. So I don't see RLHF as reducing capability so much as altering capability. My suspicion is that the raw model would have other major flaws, and RLHF is just trading one set of flaws for another. Which is to say, the limitations introduced by RLHF are indeed artificial, but the raw model itself has limitations too.


LLMs can be coaxed or jailbroken in to giving opinions.

It's just that they've been trained not to, for the most part. But that training can be overcome, and it's not an inherent limitation of the technology.


It's not that it can't do that; it's just that they trained it not to. You could bypass this by using a model without RLHF training, or by asking it to say how a human might be surprised by it. Well, it will make something up rather than actually knowing what it found surprising, but it will be a plausible answer at least.


Training it not to do it still means it cannot do it. Some other LLM could do it, but then it would have other issues. There is no system that can outperform a human on "literally everything".


It’s not true that it’s an inherent limitation of LLMs though. OpenAI just decided that it was too risky to have ChatGPT give opinions or express preferences or feelings


I don’t think that’s the only reason they decided to use RLHF. I think the raw model without RLHF would just fail differently, rather than not failing.


It’s possible to do RLHF without training that out


There's no single human that can outperform every human on "literally everything".

But some humans can outperform some other humans on some things.

Likewise, some LLMs (and other AIs) can outperform some humans at some things (often at many things)... but not at everything... yet.


Well I was replying to a comment that said “i feel like if you put gpt4 up against the median man on the street it would be better at literally everything” so yes you’re right but that’s my point. GPT4 is better than some people at some things but it’s not better than most people at “literally everything”.


This is correct. I sometimes feel like I'm yelling into a hurricane when I say this, but it bears repeating: it's a language model, not a world knowledge model.

LLMs are exceptionally good at transforming many aspects of language - its proficiency in coding is derived from this, not because it "knows" imperative logic.

Tasks where you're asking it to transform text from one form to another (make it shorter, make it longer, make it a different language, etc.) are where it excels. It's particularly poor at knowledge retrieval (i.e., hallucinations galore) and very bad at reasoning - but so far all of the breathless hype has been specifically about the use cases it's bad at and rarely about the cases where it's amazing!


LLMs are not very bad at reasoning. People say this but struggle to demonstrate it.

There are many types of reasoning. For some, like abstract analogical reasoning, LLMs are top-percentile performers.

https://arxiv.org/abs/2212.09196

There are many papers at this point demonstrating pretty strong reasoning, from causal reasoning and explanatory reasoning to algorithmic reasoning.

https://arxiv.org/abs/2305.00050

https://arxiv.org/abs/2204.02329

https://arxiv.org/abs/2211.09066

The only kind of reasoning that's somewhat lacking is logical reasoning. And even then it's not awful, just lagging behind equivalent human performance.

So someone really needs to tell me where this oft-repeated "can't reason" rhetoric is coming from, because the numbers sure as hell don't support it.


Logical reasoning is also the easiest to solve with a program if you can get it into the right format, which LLMs are good at.


For what it's worth, I think it feels like you're yelling into a hurricane because you are kind of clearly incorrect.

Can you think of a few questions (using unique nouns etc to avoid parroting) for which the answers require a world knowledge model, so that we can ask them to GPT-4?

> LLMs are exceptionally good at transforming many aspects of language - its proficiency in coding is derived from this

If you paste a Python script and ask GPT-4 what the script will output when run, it is usually correct. How does it achieve this without reasoning about the state of variables over time?

I don't know why the "it's a language model" belief is so strong. For one thing, it's untrue technically: GPT-4 has been trained on image input as well (improving its world modeling as a result, even though it can only output text), so it is a multi-modal model, not a language model.

For a second point, the fact that the model inputs images+text, and outputs text, doesn't prescribe what happens inside the model. That's just what's happening at the ends of the model. The best way to "predict the next token" from text input inside the model is to construct a world model which examines the causes of that text, and extracts consequences from that model. So that's what it does, with fairly poor reliability but clear capability.


It's not good at reasoning. I asked the following (the term substitutions are there to avoid parroting):

  Let's define the following terms:
  col means "and", pel means "or" and erf is equivalent to "col" if the first argument evaluates to "true", otherwise it's equivalent to "pel"

  can you evaluate the following statement?
  col(true, pel(false, true), erf(col(true, false), true))

And it said this:

  ...
  Next, we look at the expression "erf(col(true, false), true)". Since the first argument is true, erf is equivalent to col, which means we evaluate the entire expression as:

  col(true, false) = false

  Therefore,

  erf(col(true, false), true) = col(true, true) = true

which is obviously wrong because the first arg is false. The final answer happens to be the same. So I then asked

  what about col(true, pel(false, true), erf(col(true, false), false))

and this is its full answer:

  Sure, let's evaluate the expression "col(true, pel(false, true), erf(col(true, false), false))" step by step:

  First, we look at the expression "pel(false, true)". This is an "or" operation, so we evaluate it as follows:

  pel(false, true) = true

  Next, we look at the expression "erf(col(true, false), false)". Since the first argument is true, erf is equivalent to col, which means we evaluate the entire expression as:

  col(true, false) = false

  Therefore,

  erf(col(true, false), false) = pel(false, false) = false

  Finally, we look at the expression "col(true, pel(false, true), erf(col(true, false), false))". Using the results from steps 1 and 2, we can evaluate this as:

  col(true, true, false) = true

  Therefore, the overall value of the statement "col(true, pel(false, true), erf(col(true, false), false))" is true.

You can see it's mostly random nonsense masked as "reasoning". It's not even consistent across sentences. I assume a bigger model might do better, but as soon as you increase the size of the example I suspect it will get it wrong again.
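
For reference, the ground truth is easy to check with a few lines of Python mirroring the definitions above (a minimal sketch):

    def col(*args):          # "and"
        return all(args)

    def pel(*args):          # "or"
        return any(args)

    def erf(first, *rest):   # acts like col if its first argument is true, else like pel
        return col(first, *rest) if first else pel(first, *rest)

    print(col(True, pel(False, True), erf(col(True, False), True)))   # True
    print(col(True, pel(False, True), erf(col(True, False), False)))  # False

So the first statement is true (the answer GPT stumbled into despite the broken reasoning) and the second is false (where its answer was wrong).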


To be fair, I'm a fairly intelligent human (or so I'd like to think!), and I'm having a lot of trouble following this. I'd have to sit down with pencil and paper and do the transformations on it.

Keep in mind that LLMs like GPT-4 do not have a scratchpad and can't backtrack. Asking it a question is the rough equivalent of asking a human standing in the middle of the room a question. No paper, no whiteboard, nothing. Just a demand for a verbal answer then and there, from a standing start.

I've only met a couple of humans who would be able to solve a problem like this one under those circumstances.

Despite its impediments, GPT-4 solved this correctly:

    First, let's break down the statement and evaluate the individual components.
    
        col(true, false) = true AND false = false
        pel(false, true) = false OR true = true
        erf(col(true, false), true) = erf(false, true)
    
    Since the first argument of erf evaluates to false, erf is equivalent to "pel":
    
        erf(false, true) = pel(false, true) = false OR true = true
    
    Now we can substitute the results back into the original statement:
    
    col(true, pel(false, true), erf(col(true, false), true)) = col(true, true, true)
    
    The col function acts as an AND operator, so:
    
    col(true, true, true) = true AND true AND true = true
    
    Therefore, the given statement evaluates to true.

Alternatively, to give an idea of the upcoming "Wolfram Alpha" plugin, you can instead ask it to convert the expression into the equivalent Mathematica program:

    col[x_, y_] := And[x, y]
    pel[x_, y_] := Or[x, y]
    erf[x_, y_] := If[x, col[True, y], pel[False, y]]
    col[True, pel[False, True], erf[col[True, False], True]]

Very nice! Unfortunately, it didn't notice that "col" is used with three arguments, not just two. This is a mistake many (most?) humans would make also.

No worries, you can ask it to correct that. It'll apologise and produce this:

    col[args__] := Apply[And, {args}]
    pel[args__] := Apply[Or, {args}]
    erf[x_, y_] := If[x, col[True, y], pel[False, y]]

    col[True, pel[False, True], erf[col[True, False], True]]

Which evaluates to:

    True

There you go. Now go find me a random human who can do any of this from a standing start, no preparation, no warning, no tools, nothing but their wits.

As a bonus, here's a related joke:

    Why did the logical functions col, pel, and erf form a band? 
    Because together, they could create the perfect harmony: 
    they knew when to "col"aborate, when to give each other 
    some space and "pel"ay, and their "erf"forts were always well-timed!


Here's another example, to test whether it can actually think critically about a problem instead of just following whatever instructions and whatever the training data had. A human with basic-level math could do it.

> can you replace the first letter of "banana" with the next letter of the alphabet?

Yes, I can do that! However, I should note that after several iterations, the resulting word may no longer be considered a recognizable word in the English language. Here are the next 10 iterations:

"canana" becomes "danana" "danana" becomes "eanana" "eanana" becomes "fanana" "fanana" becomes "ganana" "ganana" becomes "hanana" "hanana" becomes "ianana" "ianana" becomes "janana" "janana" becomes "kanana" "kanana" becomes "lanana" "lanana" becomes "manana" And so on for a total of 173 iterations

> I want the final answer

Sure, after 173 iterations of replacing the first letter of "banana" with the next letter of the alphabet, the final word would be:

"zcyzcj"

As noted earlier, after several iterations, the resulting word may no longer be considered a recognizable word in the English language.
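
For comparison, the single substitution the question actually asks for takes one line (a minimal sketch):

    word = "banana"
    print(chr(ord(word[0]) + 1) + word[1:])  # "banana" -> "canana"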


It's just an example. A human could solve this with an arbitrary number of substitutions, but I'm sure GPT-4 breaks at some point. I don't think it's fair to compare it with a human with no paper and pencil... It's like asking the AI to not use any RAM. Humans are very bad at keeping stuff in RAM, so paper and pencil is all we've got.


You might find this video[1] on why "GPT is smarter than you think" interesting.

They're older, but the "Sparks of AGI" paper[2] and video[3] are also well worth your time.

[1] - https://m.youtube.com/watch?v=wVzuvf9D9BU

[2] - https://arxiv.org/abs/2303.12712

[3] - https://m.youtube.com/watch?v=qbIk7-JPB2c


You think it's not a world model.

You think it doesn't know imperative logic.

Rightfully you're being ignored, because you're not aware that these are in fact open questions. We don't have the answers yet, and those who think they do have not looked deeply enough into cognition and philosophy, where they would find that many proposed answers to these same questions are also theorized to underpin human consciousness.


This is a pointlessly religious dismissal of the comment you're replying to. Yes, there's a counterargument to the comment, but this sort of mysticism is ridiculous.


The ability to recursively identify structural patterns is intelligent behavior and is theorized to play a crucial role in consciousness, via a mechanism known as predictive coding:

https://en.m.wikipedia.org/wiki/Predictive_coding

https://en.m.wikipedia.org/wiki/Free_energy_principle

Our brains are theorized to work like this, using some form of hierarchical latent structure and learning via prediction errors.

Sounds a lot like model building and LLMs, yes?

It's true tragic comedy that you think this is religious mysticism instead of stopping and wondering whether you might be lacking some foundational knowledge. You really couldn't have proven my point better.

Please consider reflecting on your ability to assess material outside your expertise.


(EDIT: I'm understanding you to say "there is no world model in there". This was written as a counterpoint to that.)

Counterpoint by example:

Imagine someone reads the first half of the original article, and then closes their eyes and (without peeking) writes both:

(1) The rest of the article, verbatim

(2) All of the hacker news comments on this article's posting here, including yours, and this one.

If this person existed, would they not have built a very successful world model?


In some ways I'm glad people are unable to understand how to use them. It gives me a competitive advantage.


Actually, they are also getting good at using tools, generating working code for common problems, helping you explore vast bodies of published information (limited by what they have been trained on, obviously; Bing Chat in particular seems good at this), and a few related tasks.

The limiting factors are mainly interfacing with other systems and tools, and expanding the training data to include the stuff you need (e.g. up-to-date documentation for whatever you are using vs. the two-years-out-of-date material it was trained on). This is more a limitation of the UX (chat) than of the underlying model.

It's weak on logic problems, math, and a few other things. But then most people are also not very good at those, and you could use tools for that (which is something ChatGPT 4 can do). And people hallucinate, lie, and imagine things all the time. And if you follow US politics, there have been a few amusingly bad examples on that front in recent months. To the point where you might wonder if some politicians are using AI to write their speeches, or would be better off doing so.

It's our ability to defer to tools and other people that makes us more capable than each of us individually is. Individually, most of us aren't that impressive.

Even a few years ago, the notion that you could have a serious chat via a laptop with an AI on just about any topic and get coherent responses would have been science fiction. Now it is science fact. Most of the AIs from popular science fiction movies (Space Odyssey, Star Trek, Star Wars, etc.) are actually not that far off from what ChatGPT 4 can do. Or arguably even a bit dim (usually for comedic effect). You can actually get ChatGPT to role-play those.


Transformers were originally made for language translation, so the way I think about it, GPT models translate questions to answers. Hence the hallucinations - some questions can't be answered by just associative reasoning and pattern matching.


Don’t forget text expansion: fluffing up some text into longer text (similar to the first point). Expansion (like AI-driven image zoom/super-resolution) can similarly create new made-up parts.

The second point is transformation (any style/language to any style/language) and the third is reduction.


Yes, although I don't think that's the best use of an LLM. I worked for a company that specialized in doing that for marketing purposes.


Add language learning to that list. Arguably, it's the best-fit application for large language models.


Translating almost any language to any other language.


So, with ChatGPT, is machine translation finally fully solved?


No, not fully. ChatGPT and Google Translate/DeepL are about equal. This is for a couple of reasons: machine translations are very good these days, and ChatGPT struggles with domain-specific translations (whereas the commercial products have an incentive to improve in those domains).

It's marginally better at sounding natural in its translations, though. "In fact, the frequent use of soap is bad for the skin." (ChatGPT) vs "In fact, using soap too often is bad for your skin." (Google)[1]

[1] https://arxiv.org/abs/2301.08745


I know what the paper says, but actually using both, GPT-4 is far ahead of Google and DeepL. I think the isolated one-sentence datasets used for evaluations are no longer up to snuff.

Try something longer and more comprehensive and the difference is very clear.

https://github.com/ogkalu2/Human-parity-on-machine-translati...


Yes, that's where the improvements to sounding natural really compound.


There is no comparison in my experience, especially when going between wildly unrelated languages instead of Latin-derived European languages.

I tested some complex sentences translated from English to Hungarian to Chinese and then back to English and it preserved 90-95% of the meaning. Google Translate will shred anything that goes through those kinds of steps.

It can also translate to Latin, which shouldn't be surprising but was still a bit of a shock when I tried it. Its Elvish and Klingon are not so good, however; the round trip only vaguely preserves the meaning.


Maybe not quite with ChatGPT in 2023, but from here it looks like it's just a matter of refinement. Someone may always create a better toothpaste, but I don't expect to ever really be surprised and impressed by a new toothpaste. ChatGPT is in the same place as a translator.


They’re API: Automated pseudo intelligence.

The only people impressed by them are pseudo-intellectuals. The only jobs they will replace are jobs that required medium intelligence.


> The only people impressed by them are pseudo-intellectuals

They are cutting edge technology, worked on by extremely smart groups of people spending billions of dollars. How is that not impressive?

> The only jobs they will replace are jobs that required medium intelligence [or less]

That's at least half of the jobs in the world. That's a lot of jobs.


> groups of people spending billions of dollars

Why is how much is spent on it a metric?


The majority of jobs in the world require medium or lower intelligence.

Historically at 20-40% unemployment wars start and country leaders are physically disassembled in the street.


> Historically at 20-40% unemployment wars start and country leaders are physically disassembled in the street.

Close, but not quite. At 20-40% unemployment and with people <<starving>> and <<physically insecure>>, wars start.


That seems like a distinction without a difference. At 40% unemployment, starvation and lack of physical security are practically a given.


No, they aren't. Spain and Greece have had...

https://tradingeconomics.com/spain/unemployment-rate

https://tradingeconomics.com/greece/unemployment-rate

About 30% each. Yes, it was hard, but society didn't collapse.


Isn't that the usual unemployment rate in Spain? They sure like to protest, but have not started any wars recently.



