A chatbot which refers to its guidelines in every chat is annoying to use, so it's just a nice usability feature to ask the bot to suppress that tendency.
The article sort of danced around what I think is the most natural sense in which List is a "recipe": it's the bounded-nondeterminism monad (a `List<T>` is a nondeterministic result; one could implement `List<T> -> T` by selecting an answer uniformly at random from the finite multiset).
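A minimal sketch of that `List<T> -> T` collapse, with a TypeScript array standing in for `List<T>` (the helper name is mine, not from any library):

```typescript
// A list models a nondeterministic result: each element is one possible answer.
// "Running" the nondeterminism collapses it to a single answer, here by picking
// uniformly at random. (Hypothetical helper, purely illustrative.)
function runNondet<T>(possibilities: T[]): T {
  if (possibilities.length === 0) {
    throw new Error("no possible results");
  }
  const i = Math.floor(Math.random() * possibilities.length);
  return possibilities[i];
}

const roll: number[] = [1, 2, 3, 4, 5, 6]; // a nondeterministic die roll
console.log(runNondet(roll));              // one concrete outcome
```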
Seriously, I've read things about lists and nondeterminism a few times in this thread, and I can't help but wonder if "you guys" (functional programming nerds, maybe?) use the word "nondeterministic" differently from the rest of the world?
If not, I'd love a good explanation of what makes lists non-deterministic, why we would want that, and why they seem to be perfectly deterministic in imperative programming languages.
It is a particular sense of "nondeterminism", but it's not specific to functional programming; I think it's the usual one in theoretical CS as a whole. It's the same sense in which "nondeterminism" is used in P vs NP, for example.
Think of a computation as a process of changing state. At a given point in time, the computer is in a certain state, the current state. The computation can be described in terms of a function that acts on the current state.
In a deterministic computation, the function takes in the current state, and produces a single state as output which will be the state the computer enters on the next tick.
In a non-deterministic computation, the function takes in the current state and produces a set of states as output. These states are all the possible states the computer might enter on the next tick. We don't know (or just don't care) which one of these states it will enter.
You can model a non-deterministic computation as a deterministic one, by using a list `currentStates` to store the set of all possible current states of the computation. At each "tick", you do `currentStates = flatMap(nextStates, currentStates)` to "progress" the computation. In the end `currentStates` will be the set of all possible end states (and you could do some further processing to choose a specific end state, e.g. at random, if you wish, but you could also just work with the set of end states as a whole).
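Here's a minimal sketch of that simulation, with a TypeScript array for the state set and a made-up step function standing in for `nextStates`:

```typescript
// A nondeterministic step: from one state, several successor states are possible.
type Step<S> = (state: S) => S[];

// Simulate the nondeterministic computation deterministically by tracking the
// list of every state it could possibly be in after each tick.
function run<S>(step: Step<S>, initial: S, ticks: number): S[] {
  let currentStates: S[] = [initial];
  for (let t = 0; t < ticks; t++) {
    currentStates = currentStates.flatMap(step);
  }
  return currentStates; // all possible end states
}

// Example: on each tick a counter either stays put or increments.
const maybeIncrement: Step<number> = (n) => [n, n + 1];
console.log(run(maybeIncrement, 0, 3)); // [0, 1, 1, 2, 1, 2, 2, 3]
```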
It's in this sense that "a list is a non-deterministic result", although this is really just one thing a list can represent; a list is a generic data structure which can represent all sorts of things, one of which is a non-deterministic result.
* A deterministic .NET runtime (https://github.com/Smaug123/WoofWare.PawPrint); been steaming towards `Console.WriteLine("Hello, world!")` for months, but good lord is that method complicated
* My F# source generators (https://github.com/Smaug123/WoofWare.Myriad) contain, among other things, a rather janky Swagger 2.0 REST client generator, but I'm currently writing a fully-compliant OpenAPI 3.0 version; it takes a .json file containing the spec, and outputs an `IMyApiClient` (or whatever) with one method per endpoint.
* Next-gen F# source generator framework (https://github.com/Smaug123/WoofWare.Whippet) is currently on the back burner; Myriad has more warts than I would like, and I think it's possible to write something much more powerful.
Once you've compiled it for one platform, you've re-bootstrapped it, at which point you can use the real compiler to cross-compile for another platform.
Ideally, you want more than one bootstrapped platform. Platforms eventually die, and you don't want to rely on an emulator for bootstrapping.
At some point x86_64 will fade away, and there's a good chance Rust will still be around. I know this will probably take a long time, but it's better and easier to do it now than later.
Depends what you're trying to identify. If you're trying to identify "machine-generated", yes. If you're trying to identify "spam", probably not? Spam posted sarcastically is no more something I'd want in my comments section than spam posted by a bot.
> You can moderate comments on your own blog however you'd like.
Thank you so much for that permission.
(This is an example of sarcasm: a form of criticism that expresses disagreement by saying the exact opposite of what is actually meant. Currently it could serve as a test for human text, because the LLM slop that I read is typically asskissingly obsequious, whereas humans are often not that friendly to each other.)
All it says is "Something went wrong. Try reloading." — no indication that being logged in to an account would help (…and I don't feel like creating an account just to check…)
From some Googling and use of Claude (and from summaries of the suggestively titled "Impossible Languages" by Moro linked from https://en.wikipedia.org/wiki/Universal_grammar ), it looks like he's referring to languages which violate the laws which constrain the languages humans are innately capable of learning. But it's very unclear why "machine M is capable of learning more complex languages than humans" implies anything about the linguistic competence or the intelligence of machine M.
In this article he is very focused on science and works hard to delineate science (research? deriving new facts?) from engineering (clearly product-oriented). In his opinion ChatGPT falls on the engineering side of this line: it's a product of engineering, and OpenAI is concentrating on marketing. For sure there was much science involved, but the thing we have access to is a product.
IMHO Chomsky is asking: while ChatGPT is a fascinating product, what is it teaching us about language? How is it advancing our knowledge of language? I think Chomsky is saying "not much."
Someone else mentioned embeddings and the relationship between words that they reveal. Indeed, this could be a worthy area of further research. You'd think it would be a real boon when comparing languages. Unfortunately the interviewer didn't ask Chomsky about this.
For other commenters: as I understand it, Chomsky's talking about well-defined grammars, languages, and production systems. Think Hofstadter's Gödel, Escher, Bach. Not a "folk" understanding of language.
I have no understanding or intuition, not even a fingernail grasp, of how an LLM generates what look like "sentences", as though they were created with a generative grammar.
Is anyone comparing and contrasting these two different techniques? Being a noob, I wouldn't even know where to start looking.
I've gleaned that someone(s) are using LLM/GPT to emit abstract syntax trees (vs a mere stream of tokens), to serve as input for formal grammars (eg programming source code). That sounds awesome. And something I might some day sorta understand.
I've also gleaned that, given sufficient computing power, training data for future LLMs will have tokenized words (vs just character sequences). Which would bring the two strategies closer...? I have no idea.
(Am noob, so forgive my poor use of terminology. And poor understanding of the tech, too.)
I don't really understand your question, but if a deep neural network predicts the weather, we don't have any problem accepting that the deep neural network is not an explanatory model of the weather (the weather is not a neural net). The same is true of predicting language tokens.
Apologies, I don't know enough to articulate my question, which is probably nonsensical anyway.
LLMs (like GPT) and grammars (like Backus–Naur Form) are two different kinds of generative (production) systems, right?
You've been (heroically) explaining Chomsky's criticism of LLMs to other noobs: grammars (theoretically) explain how humans do language, which is very different from how ChatGPT (a stochastic parrot) does language. Right?
Since GPT mimics human language so convincingly, I've been wondering if there's any overlap of these two generative systems.
Especially once the (tokenized) training data for GPTs is word based instead of just snippets of characters.
Because I notice grammars everywhere and GPT is still magic to me. Maybe I'd benefit if I could understand GPTs in terms of grammars.
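For what it's worth, here's the kind of thing I mean by a production system: a made-up BNF-style grammar plus a generator that expands it by rewrite rules rather than by predicting tokens (all symbols and rules are invented just for illustration).

```typescript
// A toy BNF-style grammar: a map from nonterminals to alternative productions.
// Grammar, symbols, and words are invented purely for illustration.
const grammar: Record<string, string[][]> = {
  S:  [["NP", "VP"]],
  NP: [["the", "N"]],
  VP: [["V", "NP"]],
  N:  [["cat"], ["dog"]],
  V:  [["sees"], ["chases"]],
};

// Generate a sentence by recursively rewriting nonterminals using the rules,
// choosing one production at random. This is generation by explicit rules,
// not by predicting the statistically likely next token.
function generate(symbol: string): string[] {
  const productions = grammar[symbol];
  if (!productions) return [symbol]; // terminal symbol: emit it as-is
  const choice = productions[Math.floor(Math.random() * productions.length)];
  return choice.flatMap(generate);
}

console.log(generate("S").join(" ")); // e.g. "the dog sees the cat"
```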
> Since GPT mimics human language so convincingly, I've been wondering if there's any overlap of these two generative systems.
It's not really relevant whether there is overlap; I'm sure you can list a bunch of ways they are similar. What's important is (1) whether they are different in fundamental ways, and (2) whether LLMs explain anything about the human language faculty.
For (1), the most important difference is that human languages appear to have certain constraints (roughly, that language has parse-tree/hierarchical structure), and (from the experiments of Moro) humans seem unable to learn arguably simpler structures that are not hierarchical. LLMs, on the other hand, can be trained on those simpler structures. That shows the acquisition process is not the same, which is not surprising, since neural networks work on arbitrary statistical data and don't have strong inductive biases.
For (2), even if it turned out that LLMs couldn't learn the same languages, that wouldn't explain anything. For example, you could hard-code the training to fail if it detects an "impossible language"; then what? You've managed to create an accurate predictor, but you don't have any understanding of how or why it works. This is easier to see with non-cognitive systems like the weather or gravity: if you create a deep neural network that accurately predicts gravity, that is not the same as coming up with the general theory of relativity (which could in fact be a worse predictor, for example at quantum scales). Everyone argues the ridiculous point that since LLMs are good predictors, gaining understanding of the human language faculty is useless, a stance that wouldn't be accepted in the study of gravity or in any other field.
> is not an explanatory model of the weather (the weather is not a neural net)
I don't follow. Aren't those entirely separate things? The most accurate models of anything necessarily account for the underlying mechanisms. Perhaps I don't understand what you mean by "explanatory"?
Specifically in the case of deep neural networks, we would generally suppose that it had learned to model the underlying reality. In effect it is learning the rules of a sufficiently accurate simulation.
> The most accurate models of anything necessarily account for the underlying mechanisms
But they don't necessarily convey understanding to humans. Prediction is not explanation.
There is a difference between Einstein's general theory of relativity and a deep neural network that predicts gravity. The latter is virtually useless for understanding gravity (even if it makes better predictions).
> Specifically in the case of deep neural networks, we would generally suppose that it had learned to model the underlying reality. In effect it is learning the rules of a sufficiently accurate simulation.
No, they just fit surface statistics, not underlying reality. Many physics phenomena were predicted using theories before they were observed, they would not be in the training data even though they were part of the underlying reality.
> No, they just fit surface statistics, not underlying reality.
I would dispute this claim. I would argue that as models become more accurate they necessarily more closely resemble the underlying phenomena which they seek to model. In other words, I would claim that as a model more closely matches those "surface statistics" it necessarily more closely resembles the underlying mechanisms that gave rise to them. I will admit that's just my intuition though - I don't have any means of rigorously proving such a claim.
I have yet to see an example where a more accurate model was conceptually simpler than the simplest known model at some lower level of accuracy. From an information theoretic angle I think it's similar to compression (something that ML also happens to be almost unbelievably good at). Related to this, I've seen it argued somewhere (I don't immediately recall where though) that learning (in both the ML and human sense) amounts to constructing a world model via compression and that rings true to me.
> Many physics phenomena were predicted using theories before they were observed
Sure, but what leads to those theories? They are invariably the result of attempting to more accurately model the things which we can observe. During the process of refining our existing models we predict new things that we've never seen and those predictions are then used to test the validity of the newly proposed models.
This is getting away from the original point which is that deep neural networks are, by default, not explanatory in the way Einstein's theory of relativity is.
But even so,
> In other words, I would claim that as a model more closely matches those "surface statistics" it necessarily more closely resembles the underlying mechanisms that gave rise to them.
I don't know what it means, for example, for a deep neural network to "more resemble" the underlying process of the weather. It's also obviously false in general: if you have a mechanical clock and a quartz-crystal analog clock, you are not going to be able to derive the internal workings of either, or distinguish between them, from the hand positions. The same is true for two different pseudo-random number generator circuits that produce the same output.
> I have yet to see an example where a more accurate model was conceptually simpler than the simplest known model at some lower level of accuracy.
I don't understand what you mean. Simple models often yield a high level of understanding without being better predictors. For example an idealized ball rolling down a plane, Galileo's mass/gravity thought experiment, Kepler etc. Many of these models ignore less important details to focus on the fundamental ones.
> From an information theoretic angle I think it's similar to compression (something that ML also happens to be almost unbelievably good at). Related to this, I've seen it argued somewhere (I don't immediately recall where though) that learning (in both the ML and human sense) amounts to constructing a world model via compression and that rings true to me.
In practice you get nowhere trying to recreate the internals of a cryptographic pseudo-random number generator from the output it produces (maybe in theory you could do it with infinite data and no bounds on computational complexity or something) even though the generator itself could be highly compressed.
> Sure, but what leads to those theories? They are invariably the result of attempting to more accurately model the things which we can observe.
Yes, but if the model does not lead to understanding, you cannot come up with the new ideas.
Admittedly my original question (how "not explanatory" leads to "is not a") begins to look like a nit now that I understand the point you were trying to make (or at least I think I do). Nonetheless the discussion seems interesting.
That said, I'm inclined to object to this "explanatory" characteristic you're putting forward. We as humans certainly put a lot of work into optimizing the formulation of our models with the express goal of easing human understanding but I'm not sure that's anything more than an artifact of the system that produces them. At the end of the day they are tools for accomplishing some purpose.
Perhaps the idea you are attempting to express is analogous to concepts such as principal component analysis as applied to the representation of the final model?
> If you have a mechanical clock and quartz-crystal analog clock you are not going to be able to derive the internal workings of either or distinguish between them from the hand positions.
Arguably modern physics analogously does exactly that, although the amount of resources required to do so is astronomical.
Anyhow my claim was not about the ability or lack thereof to derive information from the outputs of a system. It was that as you demand increased accuracy from a model of the hand positions (your example) you will be necessarily forced to model the internal workings of the original physical system to increasingly higher fidelity. I claim that there is no way around this - that fundamentally your only option for increasing the accuracy of the output of a model is for it to more closely resemble the inner workings of the thing being modeled. Taken to the (notably impossible) extreme this might take the form of a quantum mechanics based simulation of the entire system.
Extrapolating this to the weather, I'm claiming that any reasonably accurate ML model will necessarily encompass some sort of underlying truth about the physical system that it is modeling and that as it becomes more accurate it will encode more such truth. Notably, I make no claim about the ability of an unaided human to interpret such truths from a binary blob of weights.
> I don't understand what you mean. Simple models often yield a high level of understanding without being better predictors.
I said nothing about the efficiency of educating humans (i.e. information gathering by, or transfer between, agents), but rather about model accuracy versus model complexity. I am claiming that more accurate models will invariably be more complex, and that this complexity will invariably encode more information about the original system being modeled. I have yet to encounter a counterexample.
> [CSPRNG recreation]
It is by design impossible to "model" the output of such a function in a bitwise accurate manner without reproducing the internals with perfect fidelity. In the event that someone figures out how to model the output in an imprecise manner without access to the key that would generally be construed as the algorithm having been broken. In other words that example aligns perfectly with my point in the sense that it cannot be approximated to any degree better than random chance with a "simpler" (ie less computationally complex than the original) mechanism. It takes the continuum of accuracy that I was originally describing and replaces it with a step function.
> Yes but if the model does not lead to understanding you cannot come up with the new ideas.
I suppose human understanding is a prerequisite to new human constructed models but my (counter-)point remains. Physics theories are "nothing more" than humans fitting "surface statistics" to increasing degrees of accuracy. I think this is a fairly fundamental truth with regards to the philosophy of science.
You're free to choose not to memorise things, but please don't be an arsehole about people who do want to do so for whatever reason. Having said that, you seem to misunderstand the point of spaced repetition, which is that you don't memorise the same thing over and over again; instead, you memorise it enough times to learn it, and not many more.
Reality check incoming: yes, you are an asshole. Your word choice matters, and you picked a large number of insulting terms.
"not ... smart", "[only use is to] win stupid games.", "[you will spend] your life doing jobs you hate for money".
My dude, I'm learning Japanese vocabulary, which has helped me to start reading light novels for fun. By your argument: I'm not smart, wasting my time, and hate my life, or soon will?
Thing is, I would fully agree with you that memorizing != comprehension, but that doesn't mean that memorizing is without its use. Do yourself a favor and learn not to be so rude. It is literally the first item in the site guidelines:
Triggered? Nah, I was merely answering your question. It's clear at this point that any more time I spend here would only be feeding a troll, so rather than address any of that nonsense I think I'll do something more productive instead. Maybe I'll do some flashcards.
Austral is a really cool experiment and I love how much effort was put into the spec which you've linked to. It explains the need for capabilities and linear types, and how they interact, really well.