I agree re: Chinchilla’s main point; what I was trying to say is that it’s not clear that simply adding “more compute”, which over the past two years has mostly meant increasing model size, will keep scaling the way that was stated.
It’s still unclear whether we even have enough training tokens to adequately train some of these models (GPT-4’s data mix is unknown, so I’m thinking of PaLM here). Galactica gets around that by repeating high-quality tokens; Anthropic’s work, conversely, shows that repeated tokens can significantly degrade performance.
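As a rough back-of-the-envelope sketch of the token question (assuming the Chinchilla heuristic of roughly 20 training tokens per parameter and PaLM’s published figures of 540B parameters trained on ~780B tokens; the numbers are illustrative, not a claim about any particular lab’s setup):

    # Back-of-the-envelope check of the "enough tokens?" question,
    # using the Chinchilla rule of thumb of ~20 tokens per parameter.
    TOKENS_PER_PARAM = 20            # Chinchilla compute-optimal heuristic

    palm_params = 540e9              # PaLM: 540B parameters
    palm_tokens_used = 780e9         # PaLM: ~780B training tokens

    tokens_needed = palm_params * TOKENS_PER_PARAM
    print(f"Chinchilla-optimal tokens for PaLM: {tokens_needed / 1e12:.1f}T")
    print(f"Tokens PaLM actually trained on:    {palm_tokens_used / 1e12:.2f}T")
    print(f"Shortfall factor:                   {tokens_needed / palm_tokens_used:.0f}x")

Under that heuristic PaLM would want roughly an order of magnitude more tokens than it actually saw, which is the gap Galactica papers over by repeating high-quality data.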
In the near future I expect higher yield from, and more focus on, refining datasets and training objectives (e.g. the GNN work by Leskovec and Liang) rather than just throwing more compute at Common Crawl.
OpenAI claims significantly reduced hallucination, yet by their own metrics GPT-4 still scores at most ~80% on their factual-accuracy evals and around 60% on TruthfulQA, so hallucination remains far too frequent for the output to be relied on, despite the presumably large effort put into RLHF and the incorporation of ChatGPT data.
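To make that concrete with a hedged sketch (treating the ~80% figure as if it were a per-claim accuracy, which is a deliberate simplification of how those evals are actually scored):

    # Rough illustration (not OpenAI's actual metric): if each factual
    # claim in a response is independently correct with probability 0.8,
    # how often is a multi-claim answer entirely correct?
    per_claim_accuracy = 0.8

    for n_claims in (1, 3, 5, 10):
        p_all_correct = per_claim_accuracy ** n_claims
        print(f"{n_claims:2d} claims -> {p_all_correct:.0%} chance of a fully correct answer")

The independence assumption is crude, but it shows how quickly “mostly accurate” degrades once an answer contains more than a couple of checkable claims.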
The space of potentially inaccurate statements seems far too large for RLHF to be a good solution.
> Aren't we all just symbol manipulation machines? Isn't truth just created by assigning symbols? Is there more value in experiencing an apple falling on your head than reading on an apple falling onto Newton's head?
Yeah, there actually is: having experienced gravity grounds Newton’s law in reality and provides a sanity check. What’s the evidence that we are purely symbol-manipulation machines?
In reality, human decision-making is very “multimodal” once you move away from the low-hanging fruit.
> If we assume that GPT-4 is maybe..
I assume you’re offering a simplified explanation of how LLMs work, but even continuing the hypothetical, I don’t really follow the logic behind that math. If LLMs truly encode knowledge and contain a “model” of reality, the whole point is that one would be reasoning as it plays chess, not just regurgitating.
It “plays” chess (except that it makes illegal moves and plays in very predictable ways, as discussed in more depth in the other HN chess post) in a way that suggests it’s still doing “fill in the blank” rather than genuinely understanding or modelling the game, which is the claim being made of LLMs.