I appreciate how Schmidhuber puts the focus on the non-US-centric history of Deep Learning and modern AI.
He sometimes overdoes this, even when something is not related to Deep Learning and modern AI, and he pushes very narrow, one-sided views without much room for nuance.
For example: "Turing used his (quite inefficient) model only to rephrase the results of Gödel and Church on the limits of computability."
He is trying to say that Turing did nothing new, that his work is a mere extension of Gödel's (!), and that he merely accounted for practical computational reality. That is not true.
He goes further by saying, "Later, however, the simplicity of these machines made them a convenient tool for theoretical studies of complexity." As if Turing's and Church's work was only good for that. I disagree. I believe that Turing's work is more seminal and paved the way for modern general-purpose computers.
Another bit here says "The formal models of Gödel (1931-34), Church (1935), Turing (1936), and Post (1936) were theoretical pen & paper constructs that cannot directly serve as a foundation for practical computers."
It puts the works of Turing, Post, Church, and Gödel all together, as if they were the same. I see the logic, since they are all impractical, but they are impractical to different degrees. Putting them all under one bracket like that makes no sense.
I respect Schmidhuber a lot, but some of his behaviour is borderline childish, and this is most pronounced in his claims about AI history. You should be skeptical about what he says (and about what anyone says).
Despite all this, I did not know many things written in this article. I learned a lot and had fun reading it.
I first read about Zuse in Isaacson's The Innovators. I did not know that he used a high-level language for a chess program. Fascinating. He is very underrated.
This is classic Schmidhubris, a mixture of truth and exaggeration.
Turing's paper is a landmark, and it even includes an equivalence proof of the power of the lambda calculus and his universal machine. In doing so, he came up with a fixed-point combinator, an applicative-order combinator now called the Turing fixed-point combinator [1]. Especially appropriate on a website run by Y Combinator.
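For readers who have not seen it, here is a minimal sketch (my own illustration, not taken from the paper or the comment above) of Turing's fixed-point combinator written with Python lambdas; the inner delay is only there because Python evaluates arguments eagerly:

    # Turing's Theta = A A, where A = \x.\y. y (x x y), transcribed as Python lambdas.
    # The recursive call x(x)(y) is wrapped in a lambda (eta-expanded) purely because
    # Python is eager; in the pure untyped lambda calculus no such delay is needed.
    A = lambda x: lambda y: y(lambda *args: x(x)(y)(*args))
    theta = A(A)  # applicative-order variant of the Turing fixed-point combinator

    # Usage: factorial as the fixed point of a non-recursive functional.
    fact = theta(lambda self: lambda n: 1 if n == 0 else n * self(n - 1))
    print(fact(5))  # 120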
Turing's paper could run as a TV ad: "But wait, there's more..." It contains many applications of topology (he argues why the alphabet should be finite, rather than simply stipulating that it is finite, essentially by using the Bolzano-Weierstrass theorem), combinatorics, logic, and functional programming, at a time when FP was hardly a decade old and largely unknown outside Princeton, and all from an undergraduate course project report!
It's an honor to receive a response from you. I agree on all points, especially the lack of combinators for strongly typed systems. (As I understand it, it is impossible to assign types to such terms.) My main point was that disrespect for Turing's paper is largely unjustified.
I see neither hubris nor exaggeration. Before Turing, it was Church who proved the equivalence of the power of his lambda calculus and Gödel's universal model.
Agreed, the quote lumping the models of computation together is very misleading. It is immediately obvious how to mechanize and automate Turing's machines, even with very old technology. That was the key difference. Then proving them equivalent to the mu-recursive functions and the lambda calculus showed that these models of computation actually deserved to be called such. It proved that they could be implemented on a machine, with no human ingenuity or creativity required to carry out the steps, which is the heart of what it means to be computable.
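To make the "immediately obvious how to mechanize" point concrete, here is a toy sketch of my own (not from the thread; the flipper machine below is a made-up example): a Turing machine is nothing more than a transition table plus a tape, and each step is a constant-cost table lookup.

    # Minimal Turing machine simulator: the whole "machine" is a finite transition table,
    # and each step is a simple table lookup, read, write, and head move.
    def run_turing_machine(transitions, tape, state="start", blank="_", max_steps=1000):
        """transitions maps (state, symbol) -> (new_state, write_symbol, move in {-1, 0, +1})."""
        cells = dict(enumerate(tape))
        head = 0
        for _ in range(max_steps):
            if state == "halt":
                break
            symbol = cells.get(head, blank)
            state, cells[head], move = transitions[(state, symbol)]
            head += move
        return "".join(cells[i] for i in sorted(cells))

    # Hypothetical example machine: flip every bit of a binary string, halt on the blank.
    flipper = {
        ("start", "0"): ("start", "1", +1),
        ("start", "1"): ("start", "0", +1),
        ("start", "_"): ("halt", "_", 0),
    }
    print(run_turing_machine(flipper, "10110"))  # -> "01001_" (trailing blank included)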
Neither did Gödel's automatic theorem prover require "human ingenuity or creativity ... to carry out the steps" in order to enumerate all the theorems from the axioms, which are themselves enumerable. Gödel really did describe "the heart of what it means to be computable."
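As a small illustration of that point (my own toy example, not anything Gödel wrote): given an enumerable set of axioms and mechanical inference rules, listing the theorems is a plain breadth-first search, with no ingenuity required.

    # Toy "proof system": axioms plus purely mechanical rewriting rules (MIU-style).
    from collections import deque

    AXIOMS = ["MI"]
    RULES = [
        lambda s: [s + "U"] if s.endswith("I") else [],      # xI -> xIU
        lambda s: [s + s[1:]] if s.startswith("M") else [],  # Mx -> Mxx
    ]

    def enumerate_theorems(limit=10):
        """Return the first `limit` theorems in breadth-first order; every theorem shows up eventually."""
        seen, queue, out = set(AXIOMS), deque(AXIOMS), []
        while queue and len(out) < limit:
            t = queue.popleft()
            out.append(t)
            for rule in RULES:
                for derived in rule(t):
                    if derived not in seen:
                        seen.add(derived)
                        queue.append(derived)
        return out

    print(enumerate_theorems())  # ['MI', 'MIU', 'MII', ...]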
The problem is that if you don't like Schmidhuber's interpretation of the history of logic, computer science and AI, and you want to propose a better interpretation, you need to know that history at least as well as he does.
And "know that history as well as he does" approach is not suited for knowledge fields. The notion that you need to know as much as a creator to critic their works is childish and immature.
I might not know as much as him about the field, but I am sufficiently well versed about the topics discussed in this article to point out his biases and fallacies.
I am a fan of people like LeCun and Schmidhuber, but I don't worship them.
I have read many articles by JS and listened to many of his talks, and the kind of bias present in this article is nothing new.
>> The notion that you need to know as much as a creator to critique their works is childish and immature.
Schmidhuber is not a "creator" in the sense that Peckinpah or Coppola are "creators", and his works are not works of art that can be experienced and criticised by anyone based on aesthetics alone (even though aesthetics can also take a lot of honing).
In any case, my comment was that, to propose a better interpretation of the work described by Schmidhuber (not "to point out their biases and fallacies", as you say), you need to know that work at least as well as he does. Otherwise, you find yourself criticising the work of someone who knows more than you do.
I don't know about you, but I've done that in the past and it made me look and feel like a fool. You may think you are "sufficiently well versed about the topics discussed in this article", but if you so dismiss the need for general knowledge beyond the contents of one article, then there may easily be any amount of knowledge that you're missing, and it is precisely that missing knowledge that makes you think you know all you need, simply because you don't know what you don't know.
Remember what Socrates, the wisest of men, said: "I know one thing, that I know nothing, Jon Snow".
I may have garbled that a bit there, but you get the picture. Arrogance cannot replace knowledge. Never assume that you know more than you need to know to prove wrong an expert in a field of knowledge as deep and with as long a history as AI, just because you have an internet connection.
Finally, nobody "worships" LeCun or Schmidhuber. What people admire is their knowledge and the hard work they put into acquiring that knowledge. And the reason they admire it is that anyone who has ever tried to get to grips with a scientific subject understands how much hard work it takes to acquire knowledge.
You write that he is "trying to say that Turing did nothing new, his work is a mere extension of Godel's (!), and he only accounted for practical computational reality. That is not true." However, Schmidhuber's text on Gödel/Church/Turing/Post/Zuse is much more nuanced than that:
"In 1935, Alonzo Church derived a corollary / extension of Gödel's result by showing that Hilbert & Ackermann's famous Entscheidungsproblem (decision problem) does not have a general solution.[CHU] To do this, he used his alternative universal coding language called Untyped Lambda Calculus, which forms the basis of the highly influential programming language LISP.
In 1936, Alan Turing introduced yet another universal model which has become perhaps the most well-known of them all (at least in computer science): the Turing Machine.[TUR] He rederived the above-mentioned result.[T20](Sec. IV) Of course, he cited both Gödel and Church in his 1936 paper[TUR] (whose corrections appeared in 1937). In the same year of 1936, Emil Post published yet another independent universal model of computing,[POS] also citing Gödel and Church. Today we know many such models. Nevertheless, according to Wang,[WA74-96] it was Turing's work (1936) that convinced Gödel of the universality of both his own approach (1931-34) and Church's (1935).
What exactly did Post[POS] and Turing[TUR] do in 1936 that hadn't been done earlier by Gödel[GOD][GOD34] (1931-34) and Church[CHU] (1935)? There is a seemingly minor difference whose significance emerged only later. Many of Gödel's instruction sequences were series of multiplications of number-coded storage contents by integers. Gödel did not care that the computational complexity of such multiplications tends to increase with storage size. Similarly, Church also ignored the spatio-temporal complexity of the basic instructions in his algorithms. Turing and Post, however, adopted a traditional, reductionist, minimalist, binary view of computing—just like Konrad Zuse (1936).[ZU36] Their machine models permitted only very simple elementary instructions with constant complexity, like the early binary machine model of Leibniz (1679).[L79][LA14][HO66] They did not exploit this back then—for example, in 1936, Turing used his (quite inefficient) model only to rephrase the results of Gödel and Church on the limits of computability. Later, however, the simplicity of these machines made them a convenient tool for theoretical studies of complexity. (I also happily used and generalized them for the case of never-ending computations.[ALL2])"
You also write that Turing's work "paved the way for modern general computers." According to the text, however, this practical part really started with Konrad Zuse's patent application of 1936:
"The formal models of Gödel (1931-34), Church (1935), Turing (1936), and Post (1936) were theoretical pen & paper constructs that cannot directly serve as a foundation for practical computers. Remarkably, Konrad Zuse's patent application[ZU36-38][Z36][RO98] for the first practical general-purpose program-controlled computer also dates back to 1936. It describes general digital circuits (and predates Claude Shannon's 1937 thesis on digital circuit design[SHA37]). Then, in 1941, Zuse completed Z3, the world's first practical, working, programmable computer (based on the 1936 application). Ignoring the inevitable storage limitations of any physical computer, the physical hardware of Z3 was indeed universal in the "modern" sense of Gödel, Church, Turing, and Post—simple arithmetic tricks can compensate for Z3's lack of an explicit conditional jump instruction.[RO98]"
So you don't agree with the text, but can you point out a concrete error instead of referring to "Many of the child comments to my parent comment address that"? I could not really find anything convincing in those child comments.
I agree with much of what he says, I just pointed out things I don't agree with.
And what's up with the wall of quotes? I read the article in full.
On top of what I already said, I add that crediting Lambda Calculus as only the basis of a high-level programming language does not do it justice.
He gives Turing and Post credit in one paragraph and takes it away in another, as if he cannot decide.
More importantly, Turing's approach is completely reduced to adopting "a traditional, reductionist, minimalist, binary view of computing—just like Konrad Zuse". That does not come close to giving him the credit he deserves. Many of the child comments to my parent comment address that.
I could not really find anything convincing in those child comments. What exactly is the important contribution of Turing that is not mentioned in the text? Furthermore, the text does not credit "Lambda Calculus as only the basis of a high-level programming language." It says that in 1935, Alonzo Church used it to derive "a corollary / extension of Gödel's result by showing that Hilbert & Ackermann's famous Entscheidungsproblem (decision problem) does not have a general solution." That was the important contribution of Church before Turing proved the same thing in a different way.
Both the discovery of the Lambda Calculus and that of the Turing Machine as models of computation are quite important and immediately useful, I'd say. Though very formal, with just a little bit of syntactic sugar and eye-squinting they make for useful programming constructs, and there are more or less direct lines from them to early as well as modern programming languages.
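For example (a sketch of my own, assuming nothing beyond the untyped lambda calculus itself), Church-encoded booleans and numerals already behave like ordinary programming constructs once written as Python lambdas:

    # Church booleans: a boolean is a function that selects one of two arguments.
    TRUE  = lambda a: lambda b: a
    FALSE = lambda a: lambda b: b
    IF    = lambda c: lambda t: lambda e: c(t)(e)

    # Church numerals: the numeral n applies a function n times.
    ZERO = lambda f: lambda x: x
    SUCC = lambda n: lambda f: lambda x: f(n(f)(x))
    ADD  = lambda m: lambda n: lambda f: lambda x: m(f)(n(f)(x))

    to_int = lambda n: n(lambda k: k + 1)(0)  # decode a Church numeral for inspection

    TWO, THREE = SUCC(SUCC(ZERO)), SUCC(SUCC(SUCC(ZERO)))
    print(to_int(ADD(TWO)(THREE)))                 # 5
    print(IF(TRUE)("then-branch")("else-branch"))  # then-branch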
That is only true for nondeterministic systems, which all computers are, since there is physical nondeterminism. However, it can (mostly) be ignored in any practical application of these theories, because the chances of you ending up in a nondeterministic computation at all are minute, never mind an unbounded one. And we are talking about practical applications here.
Many-core and networked computer systems fight indeterminacy. My networking algorithms would be a lot simpler if I could guarantee that everything operated in lockstep; the fact that I have to discard lockstep to get performance is because lockstep is a high-cost abstraction over a high-entropy (so, basically non-deterministic) underlying reality, not because indeterminacy is somehow inherently better.
For networking, yes. For multicore, it's merely a concession to the fact that instructions on my architecture are variable-length, and there's a transparent cache mechanism (requiring knowledge of memory access patterns, which requires knowing the result of the computation ahead of time).
Not necessarily? If you have a real-time OS and you write your program well, you can synchronise timings and have cores send messages to each other without queues. It's hard to write fast code that does that, but in narrow circumstances, it's possible (I'm thinking embedded applications) – and when it is possible, it's faster than the equivalent algorithm with indeterminacy.
Indeterminacy slows things down. It's a concession.
And natural determinacy, if you can get it, speeds up processing more than indeterminacy. The useful question is “what's the lowest-level model I can usefully use?”, not “can I do it with indeterminacy?”.
Do you think the above statement is wrong? If so, why?
I've tried really hard to understand which part of that paper is an example of a computation that a Turing machine couldn't implement, and I have been completely unable to do so. I will say that the writing in the paper is in serious need of some basic copy editing - it is full of typos, repeated phrases and entire paragraphs (especially in the abstract).
The repeated paragraphs are in the abstract on the site, not in the actual pdf. Here is a sample:
> “Monster” is a term introduced in [Lakatos 1976] for a mathematical construct that introduces inconsistencies and paradoxes. Euclid, Richard Dedekind, Gottlob Frege, Bertrand Russell, Kurt Gödel, Ludwig Wittgenstein, Alonzo Church, Alan Turing, and Stanisław Jaśkowski all had issues with mathematical monsters as discussed in this article. Monsters can lurk long undiscovered. For example, that “theorems are provably computational enumerable” [Euclid approximately 300 BC] is a monster was only discovered after millennia when [Church 1934] used it to identify fundamental inconsistency in the foundations of mathematics that is resolved in this article.
> Euclid, Richard Dedekind, Gottlob Frege, Bertrand Russell, Kurt Gödel, Ludwig Wittgenstein, Alonzo Church, Alan Turing, and Stanisław Jaśkowski all had issues with mathematical monsters as discussed in this article. This article explains how the theories Actors and Ordinals recraft classical foundations without limiting mathematical power.
> Euclid, Richard Dedekind, Gottlob Frege, Bertrand Russell, Kurt Gödel, Ludwig Wittgenstein, Alonzo Church, Alan Turing, and Stanisław Jaśkowski all had issues with mathematical monsters as discussed in this article. Computer Science brings new concerns and considerations to foundations beyond earlier work requiring new machinery. This article explains how the theories Actors and Ordinals recraft classical foundations without limiting mathematical power.
> “Monster” is a term introduced in [Lakatos 1976] for a mathematical construct that introduces inconsistencies and paradoxes. Since the very beginning, monsters have been endemic in foundations. They can lurk long undiscovered. For example, that "theorems are provably computational enumerable" [Euclid approximately 300 BC] is a monster was only discovered after millennia when [Church 1934] used it to identify fundamental
For typos in the actual PDF, here are a few:
- teams of Wrens (female Naval Officers) operated large-scale simulations in [sic!] that discovered ways to defeat U-boat attacks that were crippling Britain.
- The reason that in practice that [sic!] an Actor can be hundreds of times faster is that in order to carry out a concurrent computation, the parallel [...]
- close request when received, send [...] [not a typo, just strange phrasing?]
- if it is not recorded as that the engine is [...]
- Actor event induction (cf. [Turing 1949]) can used to prove [...]
- Suppose to obtain a contradiction that there is [...]
De-emphasizing Turing as an instance of focusing on non-US-centric history? Turing was not American! Furthermore, Church was American, yet that quote aggrandizes Church while diminishing Turing. I don't think that quote is an example of what you claim.
I'm puzzled as to why the author seems to think that the incompleteness theorem says anything important about AI (rather than just pointing out that self-contradictions can be a potentially irritating problem in logical systems, something that has been known since antiquity).