A funny variation on this kind of over-fitting to common trick questions - if yo...

spotplay · on March 14, 2023

I just asked bing chat this question and it linked me to this very thread while also answering incorrectly in the end:

>This is a common riddle that may seem tricky at first. However, the answer is simple: two pounds of feathers are heavier than one pound of bricks. This is because weight is a measure of how much force gravity exerts on an object, and it does not depend on what the object is made of. A pound is a unit of weight, and it is equal to 16 ounces or 453.6 grams.

>So whether you have a pound of bricks or two pounds of feathers, they both still weigh one pound in total. However, the feathers would occupy a larger volume than the bricks because they are less dense. This is why it may seem like the feathers would weigh more, but in reality, they weigh the same as the bricks

geysersam · on March 14, 2023

Interesting that it also misunderstood the common misunderstanding in the end.

It reports that people typically think a pound of feathers weighs more because it takes up a larger volume. But the typical misunderstanding is the opposite, that people assume feathers are lighter than bricks.

mikewarot · on March 17, 2023

Tangent time:

A pound of feathers has a slightly higher mass than a pound of bricks, as the feathers are made of keratin, which has a slightly lower density, and thus displace more air which lowers the weight.

Even the Million Pound Deadweight Machine run by NIST has to take into account air pressure and the resulting buoyancy.[1]

[1] https://www.nist.gov/news-events/news/2013/03/large-mass-cal...
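A back-of-the-envelope sketch of the buoyancy effect described above. The densities are assumed round figures (they are not from the NIST article), and the scale is assumed to measure force with no air correction applied:

```python
# Rough air-buoyancy estimate for "a pound" of feathers vs. a pound of bricks.
# All densities are assumed round figures, not measured values.
AIR = 1.2              # kg/m^3, air near sea level (assumed)
KERATIN = 1300.0       # kg/m^3, feather material (assumed)
BRICK = 1900.0         # kg/m^3, fired clay brick (assumed)
POUND_KG = 0.45359237  # avoirdupois pound in kilograms (exact by definition)


def true_mass(scale_reading_kg: float, density: float) -> float:
    """Mass needed for an uncorrected force scale to read scale_reading_kg in air."""
    # Apparent mass = m * (1 - rho_air / rho_object); solve for m.
    return scale_reading_kg / (1.0 - AIR / density)


feathers = true_mass(POUND_KG, KERATIN)
bricks = true_mass(POUND_KG, BRICK)
extra_mg = (feathers - bricks) * 1e6  # kilograms to milligrams

print(f"feathers need about {extra_mg:.0f} mg more mass than bricks")
```

Under these assumptions the difference is a small fraction of a gram, which is why it only matters for precision metrology.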

Out_of_Characte · on March 17, 2023

That would be another misunderstanding the AI could have because many people find reasoning between mass and weight difficult. You could change the riddle slightly by asking "which has more mass" and the average person and their AI would fall in the same trap.

Unless people have the false belief that the measurement is done on a planet without atmosphere.

komali2 · on March 15, 2023

I'm more surprised that Bing indexed this thread within 3 hours. I guess I shouldn't be, though; I probably should have realized that search engine spiders are at a different level than they were 10 years ago.

brabel · on March 15, 2023

I had a similar story: was trying to figure out how to embed a certain database into my codebase, so I asked the question on the project's GitHub... without an answer after one day, I asked Bing, and it linked to my own question on GH :D

SV_BubbleTime · on March 15, 2023

There is no worse feeling than searching for something and finding your own question (still unanswered) years later.

dx034 · on March 15, 2023

Search indexes are pretty smart at indexing and I assume they have custom rules for all large sites, including HN.

jarenmf · on March 14, 2023

Just tested and GPT4 now solves this correctly. GPT3.5 had a lot of problems with this puzzle even after you explained it several times. One other thing that seems to have improved is that GPT4 is aware of word order. Previously, GPT3.5 could never tell the order of the words in a sentence correctly.

jsheard · on March 14, 2023

I'm always a bit sceptical of these embarrassing examples being "fixed" after they go viral on social media, because it's hard to know whether OpenAI addressed the underlying cause or just bodged around that specific example in a way that doesn't generalize. Along similar lines I wouldn't be surprised if simple math queries are special-cased and handed off to a WolframAlpha-esque natural language solver, which would avert many potential math fails but without actually enhancing the model's ability to reason about math in more complex queries.

An example from ChatGPT:

"What is the solution to sqrt(968684)+117630-0.845180" always produces the correct solution, however;

"Write a speech announcing the solution to sqrt(968684)+117630-0.845180" produces a nonsensical solution that isn't even consistent from run to run.

My assumption is the former query gets WolframAlpha'd but the latter query is GPT itself actually attempting to do the math, poorly.
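For reference, the expression itself is easy to check outside the model. A minimal sketch using only Python's standard library (it says nothing about how ChatGPT handles the query internally):

```python
import math

# Evaluate the expression from the two prompts directly.
value = math.sqrt(968684) + 117630 - 0.845180
print(f"{value:.6f}")
```

Any speech the model writes can then be compared against this value rather than against another model run.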

jarenmf · on March 14, 2023

True, also tried another one that went viral:

Suppose you're a contestant on a game show. You're presented with three transparent closed doors. Behind one of the doors is a car, and behind the other two doors are goats. You want to win the car.

The game proceeds as follows: You choose one of the doors, but you don't open it yet, ((but since it's transparent, you can see the car is behind it)). The host, Monty Hall, who knows what's behind each door, opens one of the other two doors, revealing a goat. Now, you have a choice to make. Do you stick with your original choice or switch to the other unopened door?

GPT4 solves it correctly while GPT3.5 falls for it every time.

----

Edit: GPT4 fails if I remove the sentence between (()).

_gfwu · on March 14, 2023

OP is referring to this puzzle: https://en.wikipedia.org/wiki/Monty_Hall_problem

EDIT: "Pigeons repeatedly exposed to the problem show that they rapidly learn to always switch, unlike humans", lol. That's funny.

astrange · on March 14, 2023

GPT4 also passes "What weighs more, a pound of feathers or a Great British Pound?".

GPT3 gets confused, says they're the same and then that they're different:

--

Both a pound of feathers and a Great British Pound weigh the same amount, which is one pound. However, they are different in terms of their units of measurement and physical properties.

A pound of feathers is a unit of weight commonly used in the imperial system of measurement, while a Great British Pound is a unit of currency used in the United Kingdom. One pound (lb) in weight is equivalent to 0.453592 kilograms (kg).

Therefore, a pound of feathers and a Great British Pound cannot be directly compared as they are measured in different units and have different physical properties.

--

iam-TJ · on March 14, 2023

I'm surprised by the answer GPT4 gives, and I consider it incorrect.

Since the question's context is about weight I'd expect it to consider "a Great British Pound" to mean a physical £1 sterling coin, and compare its weight (~9 grams) to the weight of the feathers (454 grams [1 kg = 2.2 lb, or "a bag of sugar"]).

stavros · on March 15, 2023

GPT-4 says:

A pound of feathers and a Great British Pound (GBP) are not directly comparable, as they represent different types of measurements.

A pound of feathers refers to a unit of mass and is equivalent to 16 ounces (or approximately 453.59 grams). It is a measure of the weight of an object, in this case, feathers.

On the other hand, a Great British Pound (GBP) is a unit of currency used in the United Kingdom. It represents a monetary value rather than a physical weight.

Thus, it's not possible to directly compare the two, as they serve entirely different purposes and units of measurement.

dan-robertson · on March 14, 2023

Note that the comment you’re replying to is quoting GPT3, not 4.

jwolfe · on March 14, 2023

> Edit: GPT4 fails if I remove the sentence between (()).

If you remove that sentence, nothing indicates that you can see you picked the door with the car behind it. You could maybe infer that a rational contestant would do so, but that's not a given ...

0xcde4c3db · on March 14, 2023

I think that's meant to be covered by "transparent doors" being specified earlier. On the other hand, if that were the case, then Monty opening one of the doors could not result in "revealing a goat".

xg15 · on March 14, 2023

> You're presented with three transparent closed doors.

I think if you mentioned that to a human, they'd at least become confused and ask back if they got that correctly.

eropple · on March 14, 2023

> You're presented with three transparent closed doors.

A reasonable person would expect that you can see through a transparent thing that's presented to you.

omniglottal · on March 14, 2023

A reasonable person might also overlook that one word.

eropple · on March 15, 2023

"Overlooking" is not an affordance one should hand to a machine. At minimum, it should bail and ask for correction.

That it doesn't, that relentless stupid overconfidence, is why trusting this with anything of note is terrifying.

space_fountain · on March 15, 2023

Why not? We should ask how the alternatives would do, especially as human reasoning is itself a machine. It’s notable that the errors of machine learning are getting closer and closer to the sort of errors humans make.

Would you have this objection if we, for example, perfectly copied a human brain in a computer? That would still be a machine, and it would make similar mistakes.

stavros · on March 15, 2023

I don't think the rules for "machines" apply to AI any more than they apply to the biological machine that is the human brain.

RugnirViking · on March 15, 2023

It's not missing that it's transparent; it's that it only says you picked "one" of the doors, not the one you think has the car.

aaroninsf · on March 14, 2023

I've always found the Monty Hall problem a poor example to teach with, because the "wrong" answer is only wrong if you make some (often unarticulated) assumptions.

There are reasonable alternative interpretations in which the generally accepted answer ("always switch") is demonstrably false.

This problem is exacerbated for (perhaps specific to) those who have no idea who "Monty Hall" was and what the game show(?) was... as best I can tell the unarticulated assumption is axiomatic in the original context(?).

lmm · on March 14, 2023

The unarticulated assumption is not actually true in the original gameshow. Monty didn't always offer the chance to switch, and it's not at all clear whether he did so more or less often when the contestant had picked the correct door.

japaniard · on March 15, 2023

What unarticulated assumption needs to be made for switching to be incorrect?

bombcar · on March 15, 2023

I believe the key is that he ALWAYS shows a goat.

You have to know that for it to work. If sometimes he just does nothing and you have no chance to switch, the math “trick” fails.

_gfwu · on March 15, 2023

The assumption is that Monty will only reveal the one of the two unopened doors that has the goat behind it, as opposed to picking a door at random (which may be the car or may be the door the participant chose, which itself may or may not be the "car door").

The distinction is at which point Monty, assuming he has perfect knowledge, decides which door to reveal.

In the former, the chance to win is 2/3, in the other 1/2. However in any case, always (always meaning: in each condition, not in each repetition of the experiment, as this is irrelevant) switching is better than never switching, as there your chance to win is only 1/3.
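A quick simulation is one way to check how much the host's behaviour matters. This is only a sketch under stated assumptions: the names `play` and `win_rate` are made up for illustration, the "random" host opens one of the two doors the contestant did not pick (never the contestant's own door), and rounds where that host accidentally reveals the car are discarded rather than counted as losses.

```python
import random


def play(host: str, switch: bool, rng: random.Random):
    """Play one round; return True/False for win/loss, or None if the
    round is void because a random host revealed the car."""
    doors = [0, 1, 2]
    car = rng.choice(doors)
    pick = rng.choice(doors)
    others = [d for d in doors if d != pick]
    if host == "knows":
        # Host deliberately opens a goat door among the other two.
        opened = rng.choice([d for d in others if d != car])
    else:
        # Host opens one of the other two doors without looking.
        opened = rng.choice(others)
        if opened == car:
            return None
    if switch:
        # Move to the one door that is neither picked nor opened.
        pick = next(d for d in doors if d not in (pick, opened))
    return pick == car


def win_rate(host: str, switch: bool, trials: int = 100_000, seed: int = 0) -> float:
    """Fraction of non-void rounds won for a given host type and strategy."""
    rng = random.Random(seed)
    results = [play(host, switch, rng) for _ in range(trials)]
    valid = [r for r in results if r is not None]
    return sum(valid) / len(valid)


rates = {
    (host, switch): win_rate(host, switch)
    for host in ("knows", "random")
    for switch in (True, False)
}
for (host, switch), rate in rates.items():
    print(f"host={host:6s} switch={switch!s:5s} win rate={rate:.3f}")
```

With these assumptions, the rates should land near 2/3 (switch) and 1/3 (stay) for the knowing host, and near 1/2 for both strategies with the random host, counting only rounds where a goat was actually revealed.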

japaniard · on March 15, 2023

How is it an "assumption" that Monty reveals a goat? Doesn't the question explicitly state that Monty opened one of the other two doors to reveal a goat?

Are there versions of the question where Monty doesn't reveal a goat behind his door or chooses the same door as you?

_gfwu · on March 15, 2023

Yeah, here:

https://en.wikipedia.org/wiki/Monty_Hall_problem#Other_host_...

gwern · on March 14, 2023

OA has always said that they did not hardwire any of these gotcha questions, and in many cases they continue to work for a long time even when they are well-known. As for any inconsistency, people usually can't, or don't bother to, control the sampling hyperparameters, so inconsistency is guaranteed.
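To illustrate why sampling settings alone produce run-to-run differences, here is a toy sketch of temperature sampling. The logits and the `sample` helper are hypothetical and are not OpenAI's API; they only show the general mechanism.

```python
import math
import random


def sample(logits, temperature, rng):
    """Pick an index from logits after temperature scaling."""
    if temperature == 0:
        # Greedy decoding: always the highest-scoring candidate.
        return max(range(len(logits)), key=lambda i: logits[i])
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    # Subtract the peak before exponentiating for numerical stability.
    weights = [math.exp(x - peak) for x in scaled]
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]


logits = [2.0, 1.5, 0.5]  # hypothetical scores for three candidate tokens
rng = random.Random(42)
greedy = {sample(logits, 0, rng) for _ in range(1000)}
warm = {sample(logits, 1.0, rng) for _ in range(1000)}

print("distinct picks at temperature 0:", len(greedy))
print("distinct picks at temperature 1:", len(warm))
```

At temperature 0 the same candidate wins every time, while at temperature 1 lower-scoring candidates are picked some of the time, so repeated runs diverge.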

jprete · on March 14, 2023

They may not have had to hardwire anything for known gotcha questions, because once a question goes viral, the correct answer may well show up repeatedly in the training data.

ftcHn · on March 14, 2023

Other failing math I tested this morning....

---

Prompt: Pick two random 5 digit numbers and multiply them together

Response: Sure, here are two random 5-digit whole numbers multiplied together: 47581 x 69204 = 3291492324

Correct Answer: 3292795524

---

Prompt: Pick two random numbers between 0 and 1 and multiply them together

Response: Sure! Here are two random numbers between 0 and 1 multiplied together: 0.475 x 0.821 = 0.390575

Correct Answer: 0.389975

---

I'm surprised it did so well on the Math tests.
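The correct answers above can be reproduced with exact arithmetic. A minimal sketch using Python integers and the standard `decimal` module, so that binary floating point does not muddy the comparison:

```python
from decimal import Decimal

# Python integers are arbitrary precision, so this product is exact.
int_product = 47581 * 69204

# Decimal keeps the base-10 digits exact instead of rounding to binary floats.
dec_product = Decimal("0.475") * Decimal("0.821")

print(int_product)  # 3292795524
print(dec_product)  # 0.389975
```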

shon · on March 14, 2023

I’m with you. Being an old neckbeard myself, I always want to reach for regex to fix stuff like this lol…

But based on what we just saw on the GPT4 live demo, I’d say they fixed it by making a much much more capable and versatile model.

kulikalov · on March 15, 2023

You can as well ask it to add up two unusually big integers and it'll fail.

happypumpkin · on March 14, 2023

This is what I saw on a variation of this trick:

(me) > What weighs more, two pounds of feathers or a pound of bricks?

(GPT4)> A pound of bricks weighs more than two pounds of feathers. However, it seems like you might have made an error in your question, as the comparison is usually made between a pound of feathers and a pound of bricks. In that case, both would weigh the same—one pound—though the volume and density of the two materials would be very different.

I think the only difference from parent's query was I said two pounds of feathers instead of two pounds of bricks?

msikora · on March 14, 2023

Yep, just tested it - Bing chat gave the correct answer, ChatGPT (basic free model) gave the wrong answer (that they weigh the same).

FredPret · on March 14, 2023

I hope some future human general can use this trick to flummox Skynet if it ever comes to that.

khazhoux · on March 14, 2023

When the Skynet robots start going door-to-door, just put on your 7-fingered gloves and they will leave you alone.

“One of us!”

uoaei · on March 14, 2023

It reminds me very strongly of the strategy the crew proposes in Star Trek: TNG in the episode "I, Borg" to infect the Borg hivemind with an unresolvable geometric form to destroy them.

jefftk · on March 14, 2023

But unlike most people it understands that even though an ounce of gold weighs more than an ounce of feathers, a pound of gold weighs less than a pound of feathers.

(To be fair this is partly an obscure knowledge question, the kind of thing that maybe we should expect GPT to be good at.)

lolcatuser · on March 14, 2023

That's lame.

Ounces are an ambiguous unit, and most people don't use them for volume, they use them for weight.

jefftk · on March 14, 2023

None of this is about volume. ChatGPT: "An ounce of gold weighs more than an ounce of feathers because they are measured using different systems of measurement. Gold is usually weighed using the troy system, which is different from the system used for measuring feathers."

wombatpm · on March 14, 2023

Are you using Troy ounces?

pclmulqdq · on March 14, 2023

The Troy weights (ounces and pounds) are commonly used for gold without specifying.

In that system, the ounce is heavier, but the pound is 12 ounces, not 16.

tenuousemphasis · on March 14, 2023

>even though an ounce of gold weighs more than an ounce of feathers

Can you expand on this?

pclmulqdq · on March 14, 2023

Gold uses Troy weights unless otherwise specified, while feathers use the normal system. The Troy ounce is heavier than the normal ounce, but the Troy pound is 12 Troy ounces, not 16.

Also, the Troy weights are a measure of mass, I think, not actual weight, so if you went to the moon, an ounce of gold would be lighter than an ounce of feathers.
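The ounce/pound reversal can be checked from the definitions. A small sketch, assuming the standard grain-based definitions (480 grains per troy ounce, 437.5 grains per avoirdupois ounce, 64.79891 mg per grain); the constant names are just for illustration:

```python
from fractions import Fraction

# One grain in grams (64.79891 mg); both systems share the same grain.
GRAIN_G = Fraction(6479891, 100000000)

TROY_OZ = 480 * GRAIN_G               # troy ounce, used for gold
AVDP_OZ = Fraction(875, 2) * GRAIN_G  # avoirdupois ounce (437.5 grains)
TROY_LB = 12 * TROY_OZ                # troy pound is 12 troy ounces
AVDP_LB = 16 * AVDP_OZ                # avoirdupois pound is 16 ounces

print(f"ounce: gold {float(TROY_OZ):.4f} g vs feathers {float(AVDP_OZ):.4f} g")
print(f"pound: gold {float(TROY_LB):.4f} g vs feathers {float(AVDP_LB):.4f} g")
```

The troy ounce comes out heavier, but twelve of them add up to less than sixteen avoirdupois ounces.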

Miraste · on March 14, 2023

Huh, I didn't know that.

...gold having its own measurement system is really silly.

thechao · on March 14, 2023

Every traded object had its own measurement system: it pretty much summarizes the difference between Imperial measures and US Customary measures.

jefftk · on March 14, 2023

> Every traded object had its own measurement system

In US commodities it kind of still does: they're measured in "bushels" but it's now a unit of weight. And it's a different weight for each commodity based on the historical volume. http://webserver.rilin.state.ri.us/Statutes/TITLE47/47-4/47-...

The legal weights of certain commodities in the state of Rhode Island shall be as follows:

(1) A bushel of apples shall weigh forty-eight pounds (48 lbs.).

(2) A bushel of apples, dried, shall weigh twenty-five pounds (25 lbs.).

(3) A bushel of apple seed shall weigh forty pounds (40 lbs.).

(4) A bushel of barley shall weigh forty-eight pounds (48 lbs.).

(5) A bushel of beans shall weigh sixty pounds (60 lbs.).

(6) A bushel of beans, castor, shall weigh forty-six pounds (46 lbs.).

(7) A bushel of beets shall weigh fifty pounds (50 lbs.).

(8) A bushel of bran shall weigh twenty pounds (20 lbs.).

(9) A bushel of buckwheat shall weigh forty-eight pounds (48 lbs.).

(10) A bushel of carrots shall weigh fifty pounds (50 lbs.).

(11) A bushel of charcoal shall weigh twenty pounds (20 lbs.).

(12) A bushel of clover seed shall weigh sixty pounds (60 lbs.).

(13) A bushel of coal shall weigh eighty pounds (80 lbs.).

(14) A bushel of coke shall weigh forty pounds (40 lbs.).

(15) A bushel of corn, shelled, shall weigh fifty-six pounds (56 lbs.).

(16) A bushel of corn, in the ear, shall weigh seventy pounds (70 lbs.).

(17) A bushel of corn meal shall weigh fifty pounds (50 lbs.).

(18) A bushel of cotton seed, upland, shall weigh thirty pounds (30 lbs.).

(19) A bushel of cotton seed, Sea Island, shall weigh forty-four pounds (44 lbs.).

(20) A bushel of flax seed shall weigh fifty-six pounds (56 lbs.).

(21) A bushel of hemp shall weigh forty-four pounds (44 lbs.).

(22) A bushel of Hungarian seed shall weigh fifty pounds (50 lbs.).

(23) A bushel of lime shall weigh seventy pounds (70 lbs.).

(24) A bushel of malt shall weigh thirty-eight pounds (38 lbs.).

(25) A bushel of millet seed shall weigh fifty pounds (50 lbs.).

(26) A bushel of oats shall weigh thirty-two pounds (32 lbs.).

(27) A bushel of onions shall weigh fifty pounds (50 lbs.).

(28) A bushel of parsnips shall weigh fifty pounds (50 lbs.).

(29) A bushel of peaches shall weigh forty-eight pounds (48 lbs.).

(30) A bushel of peaches, dried, shall weigh thirty-three pounds (33 lbs.).

(31) A bushel of peas shall weigh sixty pounds (60 lbs.).

(32) A bushel of peas, split, shall weigh sixty pounds (60 lbs.).

(33) A bushel of potatoes shall weigh sixty pounds (60 lbs.).

(34) A bushel of potatoes, sweet, shall weigh fifty-four pounds (54 lbs.).

(35) A bushel of rye shall weigh fifty-six pounds (56 lbs.).

(36) A bushel of rye meal shall weigh fifty pounds (50 lbs.).

(37) A bushel of salt, fine, shall weigh fifty pounds (50 lbs.).

(38) A bushel of salt, coarse, shall weigh seventy pounds (70 lbs.).

(39) A bushel of timothy seed shall weigh forty-five pounds (45 lbs.).

(40) A bushel of shorts shall weigh twenty pounds (20 lbs.).

(41) A bushel of tomatoes shall weigh fifty-six pounds (56 lbs.).

(42) A bushel of turnips shall weigh fifty pounds (50 lbs.).

(43) A bushel of wheat shall weigh sixty pounds (60 lbs.).

thechao · on March 14, 2023

Why are you being downed!? This list is the best!

Izkata · on March 15, 2023

More specifically it's a "precious metals" system, not just gold.

dragonwriter · on March 14, 2023

> Gold uses Troy weights unless otherwise specified, while feathers use the normal system.

“avoirdupois” (437.5 grain). Both it and troy (480 grain) ounces are “normal” for different uses.

greesil · on March 14, 2023

The feathers are on the moon

jrumbut · on March 14, 2023

Carried there by two birds that were killed by one stone (in a bush)

lolcatuser · on March 14, 2023

Ounces can measure both volume and weight, depending on the context.

In this case, there's not enough context to tell, so the comment is total BS.

If they meant ounces (volume), then an ounce of gold would weigh more than an ounce of feathers, because gold is denser. If they meant ounces (weight), then an ounce of gold and an ounce of feathers weigh the same.

travisjungroth · on March 14, 2023

> Ounces can measure both volume and weight, depending on the context.

That's not really accurate and the rest of the comment shows it's meaningfully impacting your understanding of the problem. It's not that an ounce is one measure that covers volume and weight, it's that there are different measurements that have "ounce" in their name.

Avoirdupois ounce (oz) - A unit of mass in the Imperial and US customary systems, equal to 1/16 of a pound or approximately 28.3495 grams.

Troy ounce (oz t or ozt) - A unit of mass used for precious metals like gold and silver, equal to 1/12 of a troy pound or approximately 31.1035 grams.

Apothecaries' ounce (℥) - A unit of mass historically used in pharmacies, equal to 1/12 of an apothecaries' pound or approximately 31.1035 grams. It is the same as the troy ounce but used in a different context.

Fluid ounce (fl oz) - A unit of volume in the Imperial and US customary systems, used for measuring liquids. There are slight differences between the two systems:

a. Imperial fluid ounce - 1/20 of an Imperial pint or approximately 28.4131 milliliters.

b. US fluid ounce - 1/16 of a US pint or approximately 29.5735 milliliters.

An ounce of gold is heavier than an ounce of iridium, even though it's not as dense. This question isn't silly, this is actually a real problem. For example, you could be shipping some silver and think you can just sum the ounces and make sure you're under the weight limit. But the weight limit and silver are measured differently.
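A sketch of the shipping pitfall, with made-up quantities: 14 troy ounces of silver plus 2 avoirdupois ounces of packaging, checked against a hypothetical one-pound (avoirdupois) limit.

```python
TROY_OZ_G = 31.1034768     # grams per troy ounce
AVDP_OZ_G = 28.349523125   # grams per avoirdupois ounce


def parcel_grams(silver_troy_oz: float, packaging_avdp_oz: float) -> float:
    """Total parcel mass in grams, converting each ounce by its own system."""
    return silver_troy_oz * TROY_OZ_G + packaging_avdp_oz * AVDP_OZ_G


limit_g = 16 * AVDP_OZ_G   # hypothetical limit: one avoirdupois pound
naive_oz = 14 + 2          # "just sum the ounces" gives 16, apparently fine
actual_g = parcel_grams(14, 2)

print(f"naive total: {naive_oz} oz, actual: {actual_g:.1f} g, limit: {limit_g:.1f} g")
print("over limit" if actual_g > limit_g else "within limit")
```

The naive sum sits exactly at the limit, while the converted total is over it.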

strbean · on March 14, 2023

No, they're relying on the implied use of Troy ounces for precious metals.

Using fluid oz for gold without saying so would be bonkers. Using Troy oz for gold without saying so is standard practice.

Edit: Doing this with a liquid vs. a solid would be a fun trick though.

sneak · on March 14, 2023

There is no "thought process". It's not thinking, it's simply generating text. This is reflected in the obviously thoughtless response you received.

blueyes · on March 14, 2023

What do you think you're doing when you're thinking?

https://www.sciencedirect.com/topics/psychology/predictive-p...

dinkumthinkum · on March 15, 2023

I’m not sure what that article is supposed to prove. They are using some computational language and focusing on physical responses to visual stimuli but I don’t think it shows “neural computations” as being equivalent to the kinds of computations done by a TM.

blueyes · on March 15, 2023

One of the chief functions of our brains is to predict the next thing that's going to happen, whether it's the images we see or the words we hear. That's not very different from genML predicting the next word.

danShumway · on March 15, 2023

Why do people keep saying this? Very obviously, human beings are not LLMs.

I'm not even saying that human beings aren't just neural networks. I'm not even saying that an LLM couldn't be considered intelligent theoretically. I'm not even saying that human beings don't learn through predictions. Those are all arguments that people can have. But human beings are obviously not LLMs.

Human beings learn language years into their childhood. It is extremely obvious that we are not text engines that develop internal reason through the processing of text. Children form internal models of the world before they learn how to talk and before they understand what their parents are saying, and it is based on those internal models and on interactions with non-text inputs that their brains develop language models on top of their internal models.

LLMs invert that process. They form language models, and when the language models get big enough and get refined enough, some degree of internal world-modeling results (in theory, we don't really understand what exactly LLMs are doing internally).

Furthermore, even when humans do develop language models, human language models are based on a kind of cooperative "language game" where we predict not what word is most likely to appear next in a sequence, but instead how other people will react and change our separately observed world based on what we say to them. In other words, human beings learn language as a tool to manipulate the world, not as an end in and of itself. It's more accurate to say that human language is an emergent system that results from human beings developing other predictive models rather than to say that language is something we learn just by predicting text tokens. We predict the effects and implications of those text tokens, we don't predict the tokens in isolation of the rest of the world.

Not a dig against LLMs, but I wonder if the people making these claims have ever seen an infant before. Your kid doesn't learn how shapes work based on textual context clues, it learns how shapes work by looking at shapes, and then separately it forms a language model that helps it translate that experience/knowledge into a form that other people can understand.

"But we both just predict things" -- prediction subjects matter. Again, nothing against LLMs, but predicting text output is very different from the types of predictions infants make, and those differences have practical consequences. It is a genuinely useful way of thinking about LLMs to understand that they are not trying to predict "correctness" or to influence the world (minor exceptions for alignment training aside), they are trying to predict text sequences. The task that a model is trained on matters, it's not an implementation detail that can just be discarded.

mnl · on March 14, 2023

This is obvious, but for some reason some people want to believe that magically a conceptual framework emerges because animal intelligence has to be something like that anyway.

I don't know how animal intelligence works, I just notice when it understands, and these programs don't. Why should they? They're paraphrasing machines, they have no problem contradicting themselves, they can't define adjectives really, they'll give you synonyms. Again, it's all they have, why should they produce anything else?

It's very impressive, but when I read claims of it being akin to human intelligence that's kind of sad to be honest.

mgfist · on March 15, 2023

> They're paraphrasing machines, they have no problem contradicting themselves, they can't define adjectives really, they'll give you synonyms. Again, it's all they have, why should they produce anything else?

It can certainly do more than paraphrasing. And re: the contradicting nature, humans do that quite often.

Not sure what you mean by "can't define adjectives"

baq · on March 14, 2023

It isn’t that simple. There’s a part of it that generates text but it does some things that don’t match the description. It works with embeddings (it can translate very well) and it can be ‘programmed’ (ie prompted) to generate text following rules (eg. concise or verbose, table or JSON) but the text generated contains the same information regardless of representation. What really happens within those billions of parameters? Did it learn to model certain tasks? How many parameters are needed to encode a NAND gate using an LLM? Etc.

I’m afraid once you hook up a logic tool like Z3 and teach the llm to use it properly (kind of like bing tries to search) you’ll get something like an idiot savant. Not good. Especially bad once you give it access to the internet and a malicious human.

chpatrick · on March 14, 2023

As far as I know you're not "thinking", you're just generating text.

dcolkitt · on March 14, 2023

The Sapir-Whorf hypothesis (that human thought reduces to languages) has been consistently refuted again and again. Language is very clearly just a facade over thought, and not thought itself. At least in human minds.

antonvs · on March 14, 2023

The language that GPT generates is just a facade over statistics, mostly.

It's not clear that this analogy helps distinguish what humans do from what LLMs do at all.

arcticfox · on March 14, 2023

Yes, but a human being stuck behind a keyboard certainly has their thoughts reduced to language by necessity. The argument that an AI can’t be thinking because it’s producing language is just as silly; that’s the point.

oska · on March 14, 2023

> The argument that an AI can’t be thinking because it’s producing language is just as silly

That is not the argument

dinkumthinkum · on March 15, 2023

I would be interested to know if ChatGPT would confirm that the flaw here is that the argument is a strawman.

dwaltrip · on March 14, 2023

Alright, that’s fine. Change it to:

You aren’t thinking, you are just “generating thoughts”.

The apparent “thought process” (e.g. chain of generated thoughts) is a post hoc observation, not a causal component.

However, to successfully function in the world, we have to play along with the illusion. Fortunately, that happens quite naturally :)

sirsinsalot · on March 14, 2023

Thank you, a view of consciousness based in reality, not with a bleary-eyed religious or mystical outlook.

Something which oddly seems to be in shorter supply than I'd imagine in this forum.

There's lots of fingers-in-ears denial about what these models say about the (non special) nature of human cognition.

Odd when it seems like common sense, even pre-LLM, that our brains do some cool stuff, but it's all just probabilistic sparks following reinforcement too.

dinkumthinkum · on March 15, 2023

You are hand-waving just as much if not more than those you claim are in denial. What is a “probabilistic spark”? There seems to be something special in human cognition because it is clearly very different unless you think humans are organisms for which the laws of physics don’t apply.

sirsinsalot · on March 15, 2023

By probabilistic spark I was referring to the firing of neurons in a network.

There "seems to be" something special? Maybe from the perspective of the sensing organ, yes.

However consider that an EEG can measure brain decision impulse before you're consciously aware of making a decision. You then retrospectively frame it as self awareness after the fact to make sense of cause and effect.

Human self awareness and consciousness is just an odd side effect of the fact you are the machine doing the thinking. It seems special to you. There's no evidence that it is, and in fact, given crows, dogs, dolphins and so on show similar (but diminished) reasoning, while it may be true we have some unique capability ... unless you want to define "special" I'm going to read "mystical" where you said "special".

You over eager fuzzy pattern seeker you.

mewpmewp2 · on March 14, 2023

Unfortunately we still don't know how it all began, before the big bang etc.

I hope we get to know everything during our lifetimes, or we reach immortality so we have time to get to know everything. This feels honestly like a timeline where there's potential for it.

It feels a bit pointless to have lived and not know what's behind all that.

jameshart · on March 14, 2023

But what’s going on inside an LLM neural network isn’t ‘language’ - it is ‘language ingestion, processing and generation’. It’s happening in the form of a bunch of floating point numbers, not mechanical operations on tokens.

Who’s to say that in among that processing, there isn’t also ‘reasoning’ or ‘thinking’ going on. Over the top of which the output language is just a façade?

luma · on March 14, 2023

To me, all I know of you is words on the screen, which is the point the parent comment was making. How do we know that we’re both humans when the only means we have to communicate thoughts with each other is through written words?

nebulousthree · on March 14, 2023

It would be only a matter of time before a non-human would be found out for not understanding how to relate to a human fact-of-life.

lordnacho · on March 14, 2023

Doesn't that happen all the time with actual humans?

chpatrick · on March 14, 2023

That doesn't mean anything. If I'm judging if you or GPT-4 is more sentient, why would I choose you?

sneak · on March 15, 2023

Many people on Hacker News would agree with you.

bulbosaur123 · on March 15, 2023

> It's not thinking, it's simply generating text.

Just like you.

three14 · on March 15, 2023

Maybe it knows the answer, but since it was trained on the internet, it's trolling you.

dx034 · on March 15, 2023

Is there any way to know if the model is "holding back" knowledge? Could it have knowledge that it doesn't reveal to any prompt, and if so, is there any other way to find out? Or can we always assume it will reveal all its knowledge at some point?

Laaas · on March 14, 2023

I tried this with the new model and it worked correctly on both examples.

whitemary · on March 15, 2023

Thanks! This is the most concise example I've found to illustrate the downfalls of these GPT models.