jll29's comments | Hacker News

AI professor here. I know this page is a joke, but in the interest of accuracy, a terminological comment: we don't call it a "hallucination" when a model complies with what a prompt asked for and produces a prediction exactly as requested.

Rather, "hallucinations" are spurious replacements of factual knowledge with fictional material, caused by the use of a statistical process (the pseudorandom number generator used with the "temperature" parameter of neural transformers): token prediction without meaning representation.

[typo fixed]


(I should have thought of this yesterday but have just replaced 'hallucinates' with 'imagines' in the title...though one could object to that too...)

I agree with your first paragraph, but not your second. Models can still hallucinate when temperature is set to zero (aka when we always choose the highest probability token from the model's output token distribution).

In my mind, hallucination is when some aspect of the model's response should be consistent with reality but is not, and the reality-inconsistent information is not directly attributable or deducible from (mis)information in the pre-training corpus.

While hallucination can be triggered by setting the temperature high, it can also be the result of many possible deficiencies in model pre- and post-training that leave the model outputting bad token probability distributions.
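
To make the "temperature zero" point concrete, here is a toy sketch in plain sh + awk, with made-up logits not taken from any real model: four candidate-token scores, softmax'd at a few temperatures, plus the greedy argmax. Lowering the temperature only concentrates the probability mass on the highest logit; it removes the sampling randomness, not the possibility that the highest logit is factually wrong.

  # Toy numbers only: four made-up logits for four candidate tokens.
  logits="1.2 3.7 0.5 3.6"

  # Softmax at several temperatures: as T shrinks, the probability mass
  # collapses onto the single highest logit.
  for T in 1.0 0.1 0.01; do
    echo "$logits" | awk -v T="$T" '{
      max = $1; for (i = 2; i <= NF; i++) if ($i > max) max = $i
      sum = 0
      for (i = 1; i <= NF; i++) { p[i] = exp(($i - max) / T); sum += p[i] }
      printf "T=%s:", T
      for (i = 1; i <= NF; i++) printf " %.3f", p[i] / sum
      printf "\n"
    }'
  done

  # Greedy ("temperature 0") decoding is just the argmax: deterministic,
  # but only as good as the logits themselves.
  echo "$logits" | awk '{
    best = 1
    for (i = 2; i <= NF; i++) if ($i > $best) best = i
    printf "argmax picks token %d (logit %s)\n", best, $best
  }'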


I've never heard the caveat that it can't be attributable to misinformation in the pre-training corpus. For frontier models, we don't even have access to the enormous training corpus, so we would have no way of verifying whether or not it is regurgitating some misinformation that it had seen there or whether it is inventing something out of whole cloth.

> I've never heard the caveat that it can't be attributable to misinformation in the pre-training corpus.

If the LLM is accurately reflecting the training corpus, it wouldn’t be considered a hallucination. The LLM is operating as designed.

Matters of access to the training corpus are a separate issue.


I believe it was a Super Bowl ad for Gemini last year that had a "hallucination" in the ad itself. One of the screenshots of Gemini being used showed this "hallucination", which made the rounds in the news as expected.

I want to say it was some fact about cheese or something that was in fact wrong. However, you could also see the source Gemini cited in the ad, and when you went to that source, it was some local farm's 1998-style HTML homepage, and on that page they had the incorrect factoid about the cheese.


> If the LLM is accurately reflecting the training corpus, it wouldn’t be considered a hallucination. The LLM is operating as designed.

That would mean that there is never any hallucination.

The point of the original comment was distinguishing between fact and fiction, which an LLM just cannot do. (It's an unsolved problem among humans, which spills into the training data.)


> That would mean that there is never any hallucination.

No it wouldn’t. If, due to pseudorandom statistical processes, the LLM produces an output that does not match the training data or claims things that are not in the training data, then that’s a hallucination. If it accurately represents the training data or context content, it’s not a hallucination.

Similarly, if you request that an LLM tells you something false and the information it provided is false, that’s not a hallucination.

> The point of the original comment was distinguishing between fact and fiction,

In the context of LLMs, fact means something represented in the training set. Not factual in an absolute, philosophical sense.

If you put a lot of categorically false information into the training corpus and train an LLM on it, those pieces of information are “factual” in the context of the LLM output.

The key part of the parent comment:

> caused by the use of a statistical process (the pseudorandom number generator


OK if everyone else agrees with your semantics then I agree

Not that the internet contained any misinformation or FUD when the training data was collected.

Also, statements made with certainty about fictitious "honey pot prompts" are a problem; plausibly extrapolating from the data should be governed more by internal confidence. Luckily there are benchmarks for that now, I believe.


The LLM is always operating as designed. All LLM outputs are "hallucinations".

The LLM is always operating as designed, but humans call its outputs "hallucinations" when they don't align with factual reality, regardless of the reason why that happens and whether it should be considered a bug or a feature. (I don't like the term much, by the way, but at this point it's a de facto standard).

That's because of rounding errors

I agree; it's not just the multinomial sampling that causes hallucinations. If that were the case, setting temp to 0 and just taking the argmax over the logits would "solve" hallucinations. While round-off error causes some stochasticity, it's unlikely to be the primary cause; rather, it's the lossy compression across the layers that causes it.

First compression: you create embeddings that need to differentiate N tokens, and the JL lemma gives a dimension bound that modern architectures are well above. At face value, the embeddings could encode the tokens and discriminate them deterministically. But words aren't monolithic; they mean many things and get contextualized by other words. So despite being above the JL bound, the model is still forced into a lossy compression.

Next compression: each transformer layer blows the input up to K/V/Q, then compresses it back to the inter-layer dimension.

Finally, there is the output layer, which at temperature 0 is deterministic, but it is heavily path-dependent on how you got to that token. The space of possible paths is combinatorial, so any non-deterministic behavior elsewhere, including things like round-off, will inflate the likelihood of non-deterministic output. Heck, most models are quantized down to 4 or even 2 bits these days, which is wild!
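
On the quantization aside: real low-bit schemes are cleverer than this (per-group scales, non-uniform grids), but a toy uniform 4-bit quantizer over arbitrary example values, nothing taken from a real checkpoint, already shows how coarse the grid is and how much round-off each weight picks up.

  # Arbitrary example "weights" in [-1, 1]; snap each to the nearest of the
  # 16 levels a uniform 4-bit quantizer provides, then print the error.
  echo "0.8314 -0.2719 0.0456 -0.9127" | awk '{
    levels = 16
    step = 2.0 / (levels - 1)
    for (i = 1; i <= NF; i++) {
      idx = int(($i + 1) / step + 0.5)   # nearest level index, 0..15
      q = idx * step - 1                 # de-quantized value
      printf "w=%+.4f  q=%+.4f  err=%+.4f\n", $i, q, q - $i
    }
  }'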


> In my mind, hallucination is when some aspect of the model's response should be consistent with reality

By "reality", do you mean the training corpus? Because otherwise, this seems like a strange standard. Models don't have access to "reality".


> Models don't have access to "reality"

This is an explanation of why models "hallucinate", not a criticism of the provided definition of hallucination.


That's a poor definition, then. It claims that a model is "hallucinating" when its output doesn't match a reference point that it can't possibly have accurate information about. How is that a "hallucination" in any meaningful sense?

"Hallucination" has always seemed like a misnomer to me anyway considering LLMs don't know anything. They just impressively get things right enough to be useful assuming you audit the output.

If anything, I think all of their output should be called a hallucination.


We don't know if anything knows anything because we don't know what knowing is.

On the other hand, once you're operating under the model of not knowing if anything knows anything, there's really no point in posting about it here, is there?

This is just something that sounds profound but really isn’t.

Knowing is actually the easiest part to define and explain. Intelligence / understanding is much more difficult to define.


I took a semester long 500 level class back in college on the theory of knowledge. It is not easy to define - the entire branch of epistemology in philosophy deals with that question.

... To that end, I'd love to be able to revisit my classes from back then (computer science, philosophy (two classes from a double major), and a smattering of linguistics) with the world state of today's technologies.


Others have suggested "bullshit". A bullshitter does not care (and may not know) whether what they say is truth or fiction. A bullshitter's goal is just to be listened to and seem convincing.

The awareness of the bullshitter is used to differentiate between 'hard' and 'soft' bullshit. https://eprints.gla.ac.uk/327588/1/327588.pdf

> "Hallucination" has always seemed like a misnomer to me anyway considering LLMs don't know anything. They just impressively get things right enough to be useful assuming you audit the output.

If you pick up a dictionary and review the definition of "hallucination", you'll see something along the lines of "something that you see, hear, feel or smell that does not exist".

https://dictionary.cambridge.org/dictionary/english/hallucin...

Your own personal definition arguably reinforces the very definition of hallucination. Models don't get things right. Why? Because their output contrasts with the content covered by their corpus, outputting things that don't exist or weren't referred to in it and that outright contradict factual content.

> If anything, I think all of their output should be called a hallucination.

No. Only the ones that contrast with reality, namely factual information.

Hence the term hallucination.


Want to second this. Asking the model to create a work of fiction and it complying isn't a pathology. Mozart wasn't "hallucinating" when he created "The Marriage of Figaro".

But many artists were hallucinating when they envisioned some of their pieces. Who's to say Mozart wasn't on a trip when he created The Marriage of Figaro?

That would have to be a very very long hallucination because it’s a huge opera that took a long time to write.


We don't know Mozart's state of mind when he composed.

He didn't hallucinate the Marriage of Figaro but he may well have been hallucinating.


Terminology-wise, does this read like a better title instead?:

Show HN: Gemini Pro 3 generates the HN front page 10 years from now


> Terminology-wise, does this read like a better title instead?:

"Generates" does not convey any info about the nature of the process used to create the output. In this context, "extrapolates", "predicts", or "explores" sounds more suitable.

But nitpicking over these words is pointless and amounts to going off on a tangent. The use of the term "hallucination" refers to the specific mechanism used to generate this type of output, just like prompting a model to transcode a document and thus generating an output that doesn't match any established format.


I'd vote for imagines.

Wouldn't "confabulate"/"confabulation" be a better term to substitute for "hallucinating"?

The OP clearly didn't mean "hallucination" as a bug or error in the AI, in the way you're suggesting. Words can have many different meanings!

You can easily say, Johnny had some wild hallucinations about a future where Elon Musk ruled the world. It just means it was some wild speculative thinking. I read this title in that sense of the word.

Not everything has to be nit-picked or overanalysed. This is an amusing article with an amusing title.


Exactly! This is the precise reason I didn't click through at first: from the title, I thought a page must have been somehow output/hallucinated by error. But luckily I then saw the number of votes, revised my choice, and saw a great page.

I'm partial, though: loving Haskell myself (as a monad_lover), I'm happy it wasn't forgotten either :)


To me, “imagine” would have been a more fitting term here.

(“Generate”, while correct, sounds too technical, and “confabulate” reads a bit obscure.)


"imagine" gives too much credence of humanity to this action which will continue the cognitive mistake we make of anthropomorphzing llms

In French we call that kind of practice "affabulations". Maybe fraud, deception, or deceit are the closest matching translations in this context.

That is what LLMs are molded to do, of course. But there is also the insistence by informed people on unceasingly using fallacious vocabulary. Sure, a bit of analogy can be didactic, but the current trend is rather to leverage every occasion to spread the impression that LLMs work with processes similar to human thought.

A good analogy also communicates the fact that it is a mere analogy. So stretching the metaphor is only going to accumulate more delusion than comprehension.


wouldn't the right term be 'confabulation'?

No, still too negatively connoted. "Writes", "predicts", or "caricatures" is closer.

Have we abandoned the term "generate" already?

Forgot about it, my human mind has its limits. I don't know about a "we" though. I'm not representative of anyone.

Latin: Extraclaudiposteriorifabricátio

Pronunciation: ex-tra-clau-dee-pos-TE-ri-o-ri-fa-bri-KA-tee-o

Meaning: "The act of fabricating something by pulling it from one’s posterior."

  extra- = out of
  claudi- (from claudere, close/shut) repurposed for “the closed place”
  posterior- = the backside
  fabricatio = fabrication, invention
German: Poausdenkungsherausziehungsmachwerk

Pronunciation: POH-ows-den-kungs-heh-RAUS-tsee-oongs-MAHKH-verk

Meaning: "A contrived creation pulled out of the butt by thinking it up."

  Po = butt
  Ausdenkungs- = thinking-up
  Herausziehung = pulling-out
  Machwerk = contrived creation
Klingon: puchvo’vangDI’moHchu’ghach

Pronunciation: POOKH-vo vang-DEE-moakh-CHU-ghakh (roll the gh, hit the q hard, and use that throat ch like clearing your bat’leth sinuses)

Meaning: "The perfected act of boldly claiming something pulled out from the butt."

  puch = toilet (a real Klingon word)
  -vo’ = from
  vang = act, behave, assert (real root)
  -DI’ = when (adds timing spice)
  -moH = cause/make
  -chu’ = perfectly / clearly / expertly
  -ghach = turns a verb phrase into a noun (canonical nominalizer)

Interesting.

There are AI professors out there already!


This thread is dripping with nostalgia, in a good way.

I wonder how things are going to be in 25 or 50 years, and what today's kids will look back on with the same kind of devotion and nostalgia.

A lot of things are intangible/immaterial now (for non-geeks/non-hoarders, their inbox, online playlists, and photos will likely be gone; they won't have any paper letters or plastic-framed holiday slide photographs or anything like that).


"I miss having to actually click on a website to get there. These neural implants don't have the feel of old-time web buttons..."

mouse => mice

corpus => corpora

thesaurus => thesauri

Emacs => Emacsen

Unix => Unices


Agree 100%. Don't train a 100-billion parameter neural transformer if you can solve it with a bash one-liner.

Use your human intelligence and save our planet!


[Admittedly off-topic, but relevant to the readers of the post]

Since this post is about financial investments, equity trading, and M&A activity, I wonder: is anyone shorting AI stocks to profit from the current bubble? I would be interested in people's stance (yes/no/why), and which instruments they use (if yes).


Being short anything AI right now seems like shotgun tasting unless you really want to give Citadel and Jane Street money, since the options premiums are so high. But I have been trying to get a bit less exposed to tech over the last few months and have just been buying other ETFs that are less exposed to tech.

I’m curious to know which you believe is overvalued. Is it the model creators, who are essentially producing a commoditized product that will never generate the current inflated values? Or is it the individuals who build useful tools that are only incrementally better than previous solutions? Or is it the companies that are attempting to develop groundbreaking products, but they are perpetually six months to two years away from delivery?

Kudos to all involved in freeing up Kindles around the world.

I agree - Ada is very similar to Pascal, and much faster to pick up than, say, C++.

The DOD could easily have organized Ada hackathons with a lot of prize money to "make Ada cool", if they had chosen to, in order to get the language into the limelight. They could also have funded the development of a free, open source toolchain.

Ada would never have been cool.

Ironically, I remember one of the complaints was that it took a long time for the compilers to stabilize. They were such complex beasts with a small user base: you had smallish companies trying to develop a tremendously complex compiler for a small crowd of government contractors, a perfect recipe for expensive software.

I think maybe they were just a little ahead of their time on getting a good open source compiler. The Rust project shows that it is possible now, but back in the 80s and 90s with only the very early forms of the Internet I don't think the world was ready.


Out of curiosity:

1: If you had to guess, how high is the level of complexity of rustc?

2: How do you think gccrs will fare?

3: Do you like or dislike the Rust specification that originated from Ferrocene?

4: Is it important for a systems language to have more than one full compiler for it?


Given how much memory and CPU time is burned compiling Rust projects I'm guessing it is pretty complex.

I'm not deep enough into the Rust ecosystem to have solid opinions on the rest of that, but I know from the specification alone that it has a lot of work to do every time you execute rustc. I would hope that the strict implementation would reduce the number of edge cases the compiler has to deal with, but the sheer volume of the specification works against efforts to simplify.


> They could also have funded developing a free, open source toolchain.

If the actual purpose of the Ada mandate was cartel-making for companies selling Ada products, that would have been counter-productive to their goals.

Not that compiler vendors making money is a bad thing, compiler development needs to be funded somehow. Funding for language development is also a topic. There was a presentation by the maker of Elm about how programming language development is funded [0].

[0]: https://youtube.com/watch?v=XZ3w_jec1v8


Is the GNAT compiler not sufficiently free and open source? It does not fulfill the comment's call for a "toolchain", however.

Edit: Thanks for that video. It is an interesting synthesis and great context.


GNAT exists because DoD funded a free, open source toolchain.

A great reason to try and support small distros is that older computers can still be used as long as they work.

There are also some charities that ship old PCs to Africa and install a small Linux distro on them, e.g.:

https://www.computers4charity.org/computers-for-africa

https://worldcomputerexchange.org/


Excellent point.

When I lived in London I helped clients donate a lot of kit to ComputerAid International:

https://www.computeraid.org/

And what's now Computers4Charity:

https://www.computers4charity.org/


Neat.

Perhaps it would be even nicer if the "advent" theme were more prominently present, e.g. by using the Bible as the target data file.

Here are three example tasks from me:

(1) Write an sh script (using only POSIX standard commands) to create a Keywords in Context (KWIC) concordance of the New Testament.

(2) Write a bash script that uses grep with regular expressions to extract all literal quotes of what Jesus said in the New Testament. [Incidentally, doing this task manually marked the beginnings of philology, and later automating it marked the beginning of what came to be called literary and linguistic computing, corpus linguistics, computational linguistics, and digital humanities.]

(3) How many times is Jesus mentioned by each of the four accounts of his life (Matthew, Mark, Luke, and John)?

(You may begin by extracting the New Testament from the end of the Bible with a grep command.)

Dataset: https://openbible.com/textfiles/kjv.txt
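
For the curious, a minimal sketch for task (3), assuming the common one-verse-per-line layout in which each line starts with the book name (e.g. "Matthew 1:1 ..."); adjust the patterns to the file's actual layout, and note that grep -o is a widespread extension rather than strict POSIX:

  # Count occurrences of "Jesus" in each of the four gospels.
  for book in Matthew Mark Luke John; do
    n=$(grep "^$book " kjv.txt | grep -o "Jesus" | wc -l)
    echo "$book: $n"
  done

Counting verses that mention him, rather than total occurrences, would be grep -c "Jesus" on the same per-book stream.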


Respectfully, many would find that off-putting. "Advent of X" in tech is entirely decoupled from religion. Keeping it neutral seems to me the "nicest" approach. That said, something like what you described might be a cool exercise for your bible study group. Finally, I appreciated your "incidentally" aside about the origins of philology.

I'll preface this by saying I was never especially bothered or moved by the non-religious use of "Advent of X", for better or for worse. In fact, my remark is inspired only by your comment, concerning its consistency and who has the stronger case.

Specifically, while it is true that certain kinds of words can become decoupled from their original meanings (which is generally normal), in this case the usage is not so decoupled, especially given that it occurs during the religious season of Advent and with an intentional allusion to it. (Otherwise, what is "Advent of X" without its religious origin, given that it takes place at the exact same time of year?)

You can make a much stronger argument that the non-religious usage is a kind of cultural appropriation. That would make your concern entirely backwards. Your wish is to keep it "neutral" to please those who don't practice Advent, while you show a simultaneous lack of concern for the tradition it appropriates from. This involves a tacit claim of possessing the authority to do so as well, but given the source, if anything, the authority belongs not to the appropriators but to the Church.

One wonders how a "Ramadan of Code" or "Teshuvah of Shell" would be received.

"Neutrality" is, of course, a bunk concept, and the idea that we ought to be guided by what is "nice" rather than what is "good" is a grave misunderstanding of how decisions ought to be made.


Thanks for the response. I don't intend to get into a protracted debate here, but would like to point out that the winter solstice holiday we call "Christmas" was itself appropriated from various traditions, esp. "Saturnalia" as practiced by the Romans. In modern times, countless non-religious homes in the US feature "advent calendars", with tiny treats hidden behind numbered doors. They -- like santa, elves, gift-giving, and "christmas" trees -- have nothing to do with Christian orthodoxy.

By all means, you should do what you think is "good". That's what I strive to do. My comment about "nice" was literally quoting you, so in trying to take me to task for that, as with your broader point, you've missed the mark. I don't think your hypocrisy is intentional, but I do feel good about pointing it out.

Have a nice day and holiday season! :)

