
> I think you might be missing the point. It's not that an utterance/proposition has "only one meaning", but that it is meant to represent only one concept (the speaker's) at the time it's spoken.

While this is closer to a coherent point, I'm not sure it's particularly relevant to the idea of "natural language understanding". We consider humans to understand natural language, yet humans interpreting other humans' words are constantly confounded by various forms of ambiguity and polysemy. The idea that you don't have NLU until you can perfectly ascertain the exact, singular meaning of an utterance in the mind of the speaker implies that no human has achieved NLU either.

> Also, I thought that the argument about the impossibility of distinguishing between equally probable meanings (in terms of frequency in a corpus) was a good one.

It's certainly a good puzzle to pose for model building, and it implies that statistical models will need to be a bit more nuanced than simple word-frequency correlations. But it certainly isn't impossible to assign latent attributes to words and to study the statistical properties of the relations between those attributes, which is exactly what would be necessary to solve their puzzle. In fact, modern word vectorization techniques (i.e. language models that use LSTMs to predict a missing word in a sequence from its adjacent context) essentially do this.
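To make "latent attributes" concrete, here's a toy sketch of the old LSA-style idea: factor a word/context co-occurrence matrix and treat the resulting dimensions as latent attributes. The words and counts below are invented purely for illustration, not taken from any real corpus:

    import numpy as np

    # Toy co-occurrence counts: rows are target words, columns are context words.
    words = ["king", "queen", "man", "woman"]
    contexts = ["crown", "throne", "he", "she"]
    counts = np.array([
        [8., 9., 7., 1.],   # king
        [7., 8., 1., 9.],   # queen
        [1., 0., 9., 1.],   # man
        [0., 1., 1., 9.],   # woman
    ])

    # A truncated SVD turns each word into a short vector of latent
    # attributes (roughly "royalty-ness", "gender", ...).
    U, S, _ = np.linalg.svd(counts, full_matrices=False)
    vectors = U[:, :2] * S[:2]

    def cosine(a, b):
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

    for w, v in zip(words, vectors):
        print(w, np.round(v, 2))
    print("king/queen:", round(cosine(vectors[0], vectors[1]), 2))
    print("king/man:  ", round(cosine(vectors[0], vectors[2]), 2))

Modern embeddings are learned rather than factored directly from counts, but the "latent attribute" picture is the same.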




1. Well said, but I would press it further and say that there is at least a pathway for us (human beings) to determine what the actual meaning of the utterance is: by considering the context and the speaker, by asking for clarification, and so forth. The problem for the probabilistic/empirical language model is that it can't ever recognize "one right answer", even in theory. That's a problem, because there really is one discoverable meaning to every intentional utterance, even if it's not always successfully understood in practice.

2. How would you go about assigning latent attributes to words?


1. Yes, that's true. However, I'm not sure I clearly see the distinction you're trying to draw here. You could build a neural network that asks clarifying questions too. And you can certainly define an ML model that recognizes one right answer: just pick the answer that has the highest probability in your softmax output.
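To be concrete about what I mean by "recognizes one right answer" in the ML sense (the logits here are made up, just to illustrate):

    import numpy as np

    # Raw model scores (logits) over a discrete set of candidate answers.
    logits = np.array([1.2, 0.3, 3.1, -0.5])
    probs = np.exp(logits) / np.exp(logits).sum()   # softmax

    # "One right answer" is just the argmax of the distribution.
    best = int(np.argmax(probs))
    print(best, round(float(probs[best]), 3))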

2. Well, there are a lot of ways to do this in principle, but word vectors in their original form are themselves latent attributes. All modern NLP is based on this concept already. Let's look at the 'suitcase' example in the article:

> The trophy did not fit in the suitcase because it was too
> 1a. small
> 1b. big

In NLP this problem is generally called "coreference resolution". That is, resolving which prior object a given pronoun is referring to. The ambiguity of this problem is indeed very subtle, and probably quite hard for an ML algorithm to learn. You can look at a parse example here:

https://huggingface.co/coref/

This is a near-state-of-the-art coreference resolution model, and it does indeed fail to resolve the pronoun properly here. You can see, though, that the probabilities do shift towards the right answer, which is (very weakly) suggestive that it might be beginning to learn it.
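If you want to poke at this locally, you can get essentially the same behaviour with spaCy plus the neuralcoref extension (which, as far as I know, is what backs that demo). A rough sketch, assuming both are installed:

    import spacy
    import neuralcoref

    nlp = spacy.load("en_core_web_sm")
    neuralcoref.add_to_pipe(nlp)

    doc = nlp("The trophy did not fit in the suitcase because it was too big.")
    print(doc._.coref_clusters)   # which earlier mention "it" got attached to
    print(doc._.coref_resolved)   # the sentence with the pronoun substituted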

So, trying to unpack what's going on here linguistically, the essence of the problem is that the relation between "it" and "trophy" vs "suitcase" is contextually contingent. The necessary context is that trophies go inside of suitcases, and not vice versa. You then need to understand how the concept of insertion relates to sizes. Putting thing A inside of thing B requires that thing B be larger than thing A. This is certainly a subtle, context-rich problem.

In order to try to solve this problem, a model would need to have a more structural understanding of language. You indeed cannot learn this problem based on simple word frequency counting, for exactly the reasons they state. However, even though the frequencies of big/small may be equal, there are more nuanced conditional probabilities that are not. Conditional on the latent concept of insertion, the pattern relating object size to coreference should be very statistically apparent.
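This is also easy to test empirically with a masked language model, which gives you exactly those conditional probabilities: the score of each candidate final word given the whole rest of the sentence. A minimal sketch using the transformers fill-mask pipeline (assuming torch is installed and a transformers version recent enough to support the targets argument):

    from transformers import pipeline

    fill = pipeline("fill-mask", model="bert-base-uncased")

    sentence = ("The trophy did not fit in the suitcase "
                "because it was too [MASK].")

    # Conditional probability of each candidate, given all the other words.
    for result in fill(sentence, targets=["big", "small"]):
        print(result["token_str"], round(result["score"], 4))

Whether the scores actually favor "big" here depends on the model, but the point is that the probability is conditioned on the whole context, not on raw word frequency.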


1. I'm thinking of it in terms of two different kinds of knowledge that we're going after when we try to (A) understand a proposition vs (B) make a probabilistic guess at the right answer from among a discrete range of possibilities. However our understanding of language works, it's not the same as flagging a maximized probability. If someone says "the bank is on Fifth Street next to the fire hydrant", you immediately know all kinds of things when you take this in - that you're getting directions, that you will soon be looking for a fire hydrant, the address of the bank, that the speaker knows the area pretty well... this big "bundle of truth" emerges from the proposition immediately when we take it in. I just don't see it as analogous to flagging the highest softmax probability score, which at best provides a "most likely" translation into SQL code or a vector, or something like that.

2. If we're introducing concepts like insertion into the model, aren't we then returning to a logical (as opposed to a probabilistic/statistical) language model? Or said differently, aren't we merely introducing our own understanding of language into the model before it's even trained?


> 1. I'm thinking of it in terms of two different kinds of knowledge that we're going after when we try to (A) understand a proposition vs (B) make a probabilistic guess at the right answer from among a discrete range of possibilities. However our understanding of language works, it's not the same as flagging a maximized probability. If someone says "the bank is on Fifth Street next to the fire hydrant", you immediately know all kinds of things when you take this in - that you're getting directions, that you will soon be looking for a fire hydrant, the address of the bank, that the speaker knows the area pretty well... this big "bundle of truth" emerges from the proposition immediately when we take it in. I just don't see it as analogous to flagging the highest softmax probability score, which at best provides a "most likely" translation into SQL code or a vector, or something like that.

I certainly agree that it doesn't feel analogous, but I'm not sure it's as clearly distinct as it intuitively seems. For most utterances, you and I can certainly imagine several plausible alternative interpretations. We are just very good at discerning the right one quickly and accurately, which means we experience that determination as certainty rather than probability. But that doesn't necessarily mean we perfectly collapse the utterance to a single meaning right away. In fact, for many sentences the meaning can change completely when a word is added to or removed from the end, e.g.:

https://en.wikipedia.org/wiki/Garden-path_sentence

These are obviously not typical of all language, but I don't think it's unreasonable to say that all language behaves this way to some degree; these sentences just make it unusually apparent to us.
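You can actually watch a language model doing this incremental revision. A small sketch with GPT-2 (again assuming torch and transformers; the prefixes and candidate continuations are just illustrative choices):

    import torch
    from transformers import GPT2LMHeadModel, GPT2Tokenizer

    tok = GPT2Tokenizer.from_pretrained("gpt2")
    model = GPT2LMHeadModel.from_pretrained("gpt2")
    model.eval()

    def next_word_probs(prefix, candidates):
        ids = tok.encode(prefix, return_tensors="pt")
        with torch.no_grad():
            logits = model(ids).logits[0, -1]
        probs = torch.softmax(logits, dim=-1)
        return {w: float(probs[tok.encode(" " + w)[0]]) for w in candidates}

    # "The old man the boat": the preferred reading of "man" flips from
    # noun to verb only after more of the sentence arrives.
    print(next_word_probs("The old man", ["was", "the"]))
    print(next_word_probs("The old man the", ["boat", "house"]))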

> 2. If we're introducing concepts like insertion into the model, aren't we then returning to a logical (as opposed to a probabilistic/statistical) language model? Or said differently, aren't we merely introducing our own understanding of language into the model before it's even trained?

Sort of. This gets into some tricky territory about what concepts are, and what language actually is. But I think if you play around with word vectors a bit, you'll get a sense of what I mean here. Here's a good article summarizing some of their properties:

https://kawine.github.io/blog/nlp/2019/06/21/word-analogies....

You can see that, in some sense, conceptual essence is being captured statistically here. You can represent concepts as embeddings in a statistical space, where the angles within that space capture the relationships between them. It gets kind of difficult to reason about exactly what this "is", but such is the difficulty of using language to analyze and dissect language.
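A minimal sketch of that, assuming gensim and its downloadable GloVe vectors:

    import gensim.downloader as api

    vectors = api.load("glove-wiki-gigaword-100")   # pretrained word embeddings

    # Cosine similarity is literally the angle between two concept vectors.
    print(vectors.similarity("trophy", "suitcase"))
    print(vectors.similarity("trophy", "banana"))

    # The classic analogy: king - man + woman lands near queen.
    print(vectors.most_similar(positive=["king", "woman"], negative=["man"], topn=3))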

There can sometimes be a bit of a homunculus fallacy underlying the way people reason about language, concepts, and thinking, and I think the original article here is guilty of it. That is, people implicitly assert the presence of some immaterial essence to human reasoning or cognition, but don't really justify or define it.

Like, what is the concept of 'insertion', really? Intuitively it feels like it has some essence that exists outside of its relations to other concepts. But the only thing we can directly talk about is its relationship to other concepts. In that sense, any given concept is defined by its relations to every other concept. Reifying this abstract notion in the language of mathematics, we can describe it as angles in a Hilbert space (though it's important to note that this is not an assertion of essence, merely a descriptive and useful formalism).

When we do things like that, it tends to feel like these statistical/mathematical descriptions are not "really" capturing meaning. I don't want to positively assert that they necessarily are, but I do want to challenge the idea that they obviously aren't. I think the evidence for that claim is actually pretty weak when you really analyze it.

I found this article to be very helpful in thinking about a lot of these issues:

https://www.lesswrong.com/posts/9iA87EfNKnREgdTJN/conceptual...


1. I'd agree it's not "right away", so I think my use of "immediately" was too strong. But it sounds like we both agree that the experience of being in the process of understanding a proposition is different from the "experience" (if you can call it that) of flagging the highest probability.

2. I also agree with you that the perception of relations of things is primary in the way we know things, and it is pretty amazing how word vectors seem to capture some of these relationships in an analogous way (king - man + woman = queen). But I also think we see the properties of things before we see their relations to each other - otherwise, how could you know there is a relationship, not knowing that there are at least two things or properties to relate? But I'm going too far afield now!



