If I write a book that contains Einstein's theory of relativity by virtue of me copying it, did I create the theory? Did my copying of it indicate anything about my understanding of it? Would you be justified in thinking the next book I write would have anything of original value?

I think what he is trying to say is that LLMs' current architecture seems to mainly work by understanding patterns in the existing body of knowledge. In some sense, finding patterns could be considered creative and could entail reasoning, and that might be the degree to which LLMs could be said to be capable of reasoning or creativity.

But it is clear humans are capable of creativity and reasoning that are not reducible to mere pattern matching, and this is the sense of reasoning that LLMs are not currently capable of.



> If I write a book that contains Einstein's theory of relativity by virtue of me copying it, did I create the theory? Did my copying of it indicate anything about my understanding of it? Would you be justified in thinking the next book I write would have anything of original value?

No, but you described a `cp` command, not an LLM.

"Creativity" in the sense of coming up with something new is trivial to implement in computers, and has long been solved. Take some pattern - of words, of data, of thought. Perturb it randomly. Done. That's creativity.

The part that makes "creativity" in the sense we normally understand it hard, isn't the search for new ideas - it's evaluation of those ideas. For an idea to be considered creative, it has to match a very complex... wait for it... pattern.
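
A toy sketch of that generate-and-test framing, in Python (all names invented for illustration, nothing to do with any real model):

    import random

    # Generation is the cheap part: randomly perturb an existing pattern.
    # The hard part is the evaluator, standing in for the complex pattern
    # "what humans would call creative".

    VOCAB = ["moon", "river", "glass", "engine", "sleep", "salt"]

    def perturb(phrase):
        # trivially "new": swap one word at random
        words = phrase.split()
        words[random.randrange(len(words))] = random.choice(VOCAB)
        return " ".join(words)

    def plausible(phrase, known):
        # crude evaluator: close enough to something known to be framed,
        # yet not identical to anything we already have
        overlaps = any(set(phrase.split()) & set(k.split()) for k in known)
        return overlaps and phrase not in known

    known = ["the moon over the river", "salt in the glass"]
    candidates = (perturb(random.choice(known)) for _ in range(1000))
    print([c for c in candidates if plausible(c, known)][:5])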

That pattern - what we call "creative" - has no strict definition. The idea has to be close enough to something we know, so we can frame it, yet different enough from it as to not be obvious, but still not too different, so we can still comprehend it. It has to make sense in relevant context - e.g. a creative mathematical proof has to still be correct (or a creative approach to proving a theorem has to plausibly look like it could possibly work); creative writing still has to be readable, etc.

The core of creativity is this unspecified pattern that things we consider "creative" match. And it so happens that things matching this pattern also match the pattern "what makes sense for a human to read" in situations where a creative solution is called for. And the latter pattern - "the response has to be sensible to a human" - is exactly what the LLM goal function is.

Thus it follows that real creativity is part of what LLMs are being optimized for :).
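
For concreteness, the "goal function" here is ordinary next-token prediction. A minimal sketch with toy numbers (no real model involved):

    import numpy as np

    # Cross-entropy on next-token prediction: loss is low exactly when the
    # model assigns high probability to the tokens a human actually wrote.
    def cross_entropy(probs, targets):
        return -np.mean(np.log(probs[np.arange(len(targets)), targets]))

    probs = np.array([[0.7, 0.2, 0.1],    # model's next-token distributions
                      [0.1, 0.8, 0.1]])
    targets = np.array([0, 1])            # what the human text actually said
    print(cross_entropy(probs, targets))  # ~0.29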


> For an idea to be considered creative, it has to match a very complex... wait for it... pattern.

If we could predefine what counts as creativity as some specific pattern, then I'm not sure that would be what I would call creative, and it certainly wouldn't be an all-inclusive definition of creativity. Nor is creativity merely creating something new by randomly perturbing data, as you suggested above.

While LLMs might be capable of some forms of creativity, depending on how you define it, I think it remains to be seen how LLMs' current architecture could, on its own, accomplish the kinds of creativity implicit in scientific progress in the Kuhnian sense of a paradigm shift, or in what some describe as a leap of artistic inspiration. Both examples highlight the degree to which creativity can be progress in an objective sense while also being something not entirely foreshadowed by its precursors or by patterns in existing data.

I think there are many senses in which LLMs are not demonstrating creativity in a way that humans can. I'm not sure how an LLM itself could create something new and valuable if it requires predefining an existing pattern, which seems to presuppose that we already have the creation, in a sense.


My take on Kuhn's paradigm shift is that it's still incremental progress, but the shift happens at a meta level. I.e., for the scientific example, you need some accumulated amount of observations and hypotheses before the paradigm shift can happen, and while the science "before" and "after" may look hugely different, the insight causing the shift is still incremental. In the periods before paradigm shifts, the science didn't stand still, waiting for a lone genius to make a big conceptual leap that randomly happened to hit paydirt -- if we could do such probability-defying miracles, the Ancient Greeks would have figured out special relativity. No, the science just kept accumulating observations and insights, narrowing down the search space until someone (usually several someones around the world, at the same time) was in the right place and context to see the next step and take it.

This kind of follows from the fact that, even if the paradigm-shifting insight were caused by some miracle feat of a unique superhuman genius, it still wouldn't shift anything until everyone else in the field was able to verify that the genius was right - that they found the right answer, as opposed to a billion different possible wrong answers. To do that, the entire field had to have accumulated enough empirical evidence and theoretical understanding to already be within one or two "regular smart scholar" leaps from that insight.

With art, I have less experience, but my gut instinct tells me that even there, "artistic inspiration" can't be too big a leap from what came before, as otherwise other people would not recognize or appreciate it. Also, unlike science, the definition of "art" is self-referential: art is what people recognize as art.

Still, I think you make a good point here, and you've convinced me that the potential for creativity of LLMs, in their current architecture, is limited and below that of humans. You said:

> While LLMs might be capable of some forms of creativity depending on how you define it, I think it remains to be seen how LLMs' current architecture could on its own accomplish the kinds of creativity implicit in scientific progress in the Kuhnian sense of a paradigm shift or in what some describe as a leap of artistic inspiration.

I think the limit stems strictly from LLMs being trained off-line. I believe LLMs could go as far as making the paradigm-shifting "Kuhnian leap", but they wouldn't be able to increment on it further. Compared to humans, LLMs are all "system 1" and almost no "system 2" - they rely on "intuition"[0], which heavily biases them towards things they've learned before. In the wake of a paradigm shift, a human can make themselves gradually unlearn their own intuitions. LLMs can't, without being retrained. Because of that, the forms of creativity that involve making a paradigm-shifting leap and then making a few steps forward from it are not within reach of any current model.

--

[0] - LLMs basically output things that seem most likely given what came before; I think this is the same phenomenon as when humans think and say what "feels like best" in context. However, we can pause and override this; LLMs can't, because they're just run in a forward pass - they neither have an internal loop, nor are they trained for the ability to control an external one.
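
To illustrate what I mean by "just a forward pass" in [0], here's a toy sampling loop (a hypothetical stand-in, not any real LLM's API): the loop that strings tokens together lives outside the model, and the weights never change while it runs.

    import numpy as np

    rng = np.random.default_rng(0)
    VOCAB_SIZE = 50
    W = rng.normal(size=(VOCAB_SIZE, VOCAB_SIZE))  # frozen "weights"

    def forward(context):
        # one pass: score every next token given the last one -- no pausing,
        # no overriding, just "what feels most likely" given what came before
        logits = W[context[-1]]
        probs = np.exp(logits - logits.max())
        return probs / probs.sum()

    tokens = [0]
    for _ in range(20):  # the loop is external to the model
        tokens.append(int(rng.choice(VOCAB_SIZE, p=forward(tokens))))
    print(tokens)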


>Creativity" in the sense of coming up with something new is trivial to implement in computers, and has long been solved. Take some pattern - of words, of data, of thought. Perturb it randomly. Done. That's creativity.

Formal proof systems are nowhere near complete, and for patterns we don't have a strong enough formal system to fully represent the problem space.

Take the P=NP problem: it likely can be solved formally, in a way a machine could carry out, but what is the "pattern" we are traversing here? There is definitely a deeper superstructure behind these problems, but we can only glean the tips of it, and I don't think LLMs with statistical techniques can glean much further either. Natural language is not sufficient.


Whatever the underlying "real" pattern is doesn't really matter. We don't need to represent it. People learn to understand it implicitly, without ever seeing a formal definition spelled out - and they learn it well enough that if you take M works to classify as "creative" or "not", then pick N people at random and ask each of them to classify each of the works, you can expect a high degree of agreement.
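
To make that M-works/N-raters test concrete, a rough sketch of measuring the agreement on binary "creative or not" labels (made-up labels, purely illustrative):

    import itertools

    ratings = [  # rows: N raters; columns: M works; 1 = "creative"
        [1, 0, 1, 1, 0],
        [1, 0, 1, 0, 0],
        [1, 0, 1, 1, 0],
    ]

    def pairwise_agreement(rows):
        pairs = list(itertools.combinations(rows, 2))
        same = sum(a == b for r1, r2 in pairs for a, b in zip(r1, r2))
        return same / (len(pairs) * len(rows[0]))

    print(pairwise_agreement(ratings))  # near 1.0 = raters share the pattern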

LLMs aren't learning what "creativity" is from first principles. They're learning it indirectly, by being trained to reply like a person would - literally, in the fully general meaning of that phrase. The better they get at that in general, the better they get at the (strict) subtask of "judging whether a work is creative the same way a human would" - and also at "producing creative output like a human would".

Will that be enough to fully nail down what creativity is, formally? Maybe, maybe not. On the one hand, LLMs don't "know" any more than we do, because whatever pattern they learn is as implicit in their weights as it is in ours. On the other hand, we can observe the models as they learn and infer, poke at their weights, and do all kinds of other things that we can't do to ourselves, in order to find and understand how the "deeper superstructure behind these problems" gets translated into abstract structures within the model. This stands a chance of teaching us a lot about both "these problems" and ourselves.

EDIT:

One could say there's no a priori reason why those ML models should have any structural similarity to how human brains work. But I'd say there is a reason - we're training them on inputs highly correlated with our own thoughts, and continuously optimizing them not just to mimic people, but to be bug for bug compatible with them. In the limit, the result of this pressure has to be equivalent to our own minds, even if not structurally equivalent. Of course the open question is, how far can we continue this process :).


As far as I can tell, I think you are interchanging the ability to recognize creativity with the ability to be creative. Humans seem to have the ability to make creative works or ideas that are not entirely derivative of a given data set and do not fit the criteria of some pre-existing pattern.

That is why I mentioned Kuhn and paradigm shifts. The architecture of LLMs does not seem capable of making lateral moves or sublations that are, by definition, not derivative of or reducible to their prior circumstances, yet humans do this, even though exactly how we do so is pretty mysterious and wrapped up in the difficulties of understanding consciousness.

To claim LLMs can or will equal human creativity seems to imply that we can clearly define not only what creativity is, but also what consciousness is, and how to make a machine that can somehow do both. Humans are creative prima facie, but to think we can also make a computer do the same thing probably means you have an inadequate definition of creativity.


I wrote a long response wrt. Kuhn under your earlier comment, but to summarize it here: I believe LLMs can make lateral moves, but they will find it hard to increment on them. That is, they can make a paradigm-shifting creative leap itself, but they can't then unlearn the old paradigm on the spot - their fixed training is an attractor that'll keep pulling them back.

As for:

> As far as I can tell, I think you are interchanging the ability to recognize creativity with the ability to be creative.

I kind of am, because I believe that the two are intertwined. I.e. "creativity" isn't merely an ability to make large conceptual leaps, or "lateral moves" - it's the ability to make a subset of those moves that will be recognized by others as creative, as opposed to recognized as wrong, or recognized as insane, or recognized as incomprehensible.

This might apply more to art than science, since the former is a moving target - art is ultimately about matching the subjective perceptions of people, whereas science is about matching objective reality. A "too creative" leap in science can still be recognized as "creative" later if it's actually correct. With art, whether "too creative" will eventually be accepted or forever considered absurd is unpredictable. Which is to say, maybe we should not treat these two types of "creativity" as the same thing in the first place.


> Take some pattern - of words, of data, of thought. Perturb it randomly. Done. That's creativity.

This seems a myopic view of creativity. I think leaving out the pursuit of the implications of that perturbation is leaving out the majority of creativity. A random number generator is not creative without some way to explore the impact of the random number. This is something that LLM inference models just don't do. Feeding previous output into the context of a next "reasoning" step still depends on a static model at the core.
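
A toy sketch of that point (an invented stand-in, not any real API): even when output is fed back in as context, the function at the core never changes between steps.

    # f stands in for one forward pass of a trained model; its "weights"
    # are fixed before the loop starts and never updated inside it.
    def f(context, weights):
        return sum(weights.get(tok, 0) for tok in context) % 7

    weights = {"a": 2, "b": 3, "c": 5}    # frozen at training time
    context = ["a"]
    for step in range(5):
        out = f(context, weights)         # same weights every iteration
        context.append("abc"[out % 3])    # output fed back into the context
    print(context)                        # the loop is external; f is static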


> If I write a book that contains Einstein's theory of relativity by virtue of me copying it, did I create the theory? Did my copying of it indicate anything about my understanding of it? Would you be justified in thinking the next book I write would have anything of original value?

If you, after copying the book, could dynamically answer questions about the theory, its implications, and variations of problems or theoretical challenges in ways that reflect mainstream knowledge, I think that absolutely would indicate understanding of it. I think you are basically making Searle's Chinese Room argument.

> But it is clear humans are capable of creativity and reasoning that are not reducible to mere pattern matching, and this is the sense of reasoning that LLMs are not currently capable of.

Why is that clear? I think the reasoning for that would be tying it to some notion of "the human experience", which I don't think is a necessary condition for intelligence. I think nothing about finding patterns is "mere" insofar as it relates to a demonstration of intelligence.


> But it is clear humans are capable of ...

It's not, though. Nobody really knows what most of the words in that sentence mean in a technical or algorithmic sense, and hence you can't really say whether LLMs do or don't possess these skills.


> nobody really knows what most of the words in that sentence mean in a technical or algorithmic sense

And nobody really knows what consciousness is, but we all experience it in a distinct, internal way that lets us navigate the world and express ourselves to others. Yet some comments here seem to dismiss this elephant of sensation in the room by pretending it's no different than some cut and dried computational system that's programmed to answer certain things in certain ways, and thus "is probably no different from a person trained to speak". We are obviously, evidentially, more than that.


> by pretending it's no different than some cut and dried computational system

This is not really what is going on. What is going on is a mix-up in interpreting the meaning of words, because the meaning of words does not carry over between subject matters unless we arrive at a scientific definition that takes precedence, and we have not (yet).

When approaching the word consciousness from a spiritual POV, it is clear that LLMs may not possess it. When approaching consciousness from a technical point of view, it is clear that LLMs may possess it in the future. This is because the spiritual POV is anthropologically reductive (consciousness is human), and the technical POV is technically reductive (consciousness is when we can't tell it apart).

Neither statement helps us clarify the opposing positions, because neither definition is falsifiable, and so neither is scientific.


I disagree with that characterization. I don't experience consciousness as an "internal way that lets us navigate the world and express ourselves to others". To me it is a purely perceptual experience, as I concluded after much introspection. Sure, it feeds back into one's behavior, mostly because we prefer certain experiences over others, but I can't identify anything in my inner experience that is qualitatively different in nature from a pure mechanism. I do agree that LLMs severely lack awareness (not just self-awareness) and thus also consciousness. But that's not about being a "mere" computational system.


Words are not reducible to technical statements or algorithms. But, even if they were, then by your suggestion there's not much point in talking about anything at all.


They absolutely are in the context of a technical, scientific or mathematical subject.

In the subject of LLMs, for example, everyone knows what a "token" or "context" means, even if those words might mean different things in a different subject. Yet nobody knows what "consciousness" means in almost any context, so it is impossible to make falsifiable statements about consciousness and LLMs.

Making falsifiable statements is the only way to have an argument; otherwise it's just feelings and hunches with window dressing.


> LLMs' current architecture seems to mainly work by understanding patterns in the existing body of knowledge ...

> But it is clear humans are capable of creativity and reasoning that are not reducible to mere pattern matching, and this is the sense of reasoning that LLMs are not currently capable of

This is not clear at all. As it seems to me, it's impossible to imagine or think of things that are not in some way tied to something you've already come to sense or know. If you think I am wrong, I implore you to provide a counterexample. I can only imagine something utterly unintelligible, and making it intelligible would require "pattern matching" (i.e., tying) it to something that is already intelligible. How else do we come to understand a newly found dead or unknown language, or teach our children? What human thought operates completely outside existing knowledge, if not given empirically?


Why can't creativity be taking a bunch of works, finding a pattern, then randomly perturbing a data point or concept to see if new patterns emerge?

Then cross-referencing that new random point or idea to see if it remains internally consistent with the known true patterns in your dataset.

Isn't this how humans often create new ideas?
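
A rough sketch of exactly that loop (toy numbers, an invented "pattern"): perturb a known point, then keep it only if it stays consistent with a pattern that held in the original data.

    import random

    random.seed(0)
    data = [(1, 2), (2, 4), (3, 6)]  # known points; the "true pattern": y = 2x

    def consistent(point, tolerance=0.5):
        x, y = point
        return abs(y - 2 * x) <= tolerance  # cross-reference the known pattern

    def propose():
        x, y = random.choice(data)  # perturb an existing data point
        return (x + random.uniform(-1, 1), y + random.uniform(-2, 2))

    ideas = [p for p in (propose() for _ in range(100)) if consistent(p)]
    print(len(ideas), ideas[:3])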



