The time at which scalpers buy tickets becomes irrelevant. They can resell tickets at an arbitrarily high price + margin because buyers know they will receive a refund for the excess amount, and thus ultimately pay only (final price + scalper margin) for the ticket.
Bonus: if you buy a ticket from a scalper after the price has settled, you get a known price, no more guessing, and you can find inner peace.
The problem for scalpers is that if they buy too many tickets, the final price may become too high to be attractive to real buyers.
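A minimal sketch of that arithmetic in Python, with purely hypothetical numbers, assuming the refund (original purchase price minus final settled price) goes to whoever holds the ticket at settlement:

    # Hypothetical numbers, just to illustrate the buyer's effective cost.
    scalper_purchase_price = 500.0   # what the scalper paid the organizer
    scalper_margin = 50.0            # the scalper's markup
    final_settled_price = 300.0      # price the mechanism eventually settles at

    resale_price = scalper_purchase_price + scalper_margin   # what the buyer pays the scalper
    refund = scalper_purchase_price - final_settled_price    # paid out to the current ticket holder

    buyer_net_cost = resale_price - refund
    print(buyer_net_cost)                          # 350.0
    print(final_settled_price + scalper_margin)    # 350.0, i.e. final price + margin

So the buyer's net cost collapses to (final price + scalper margin) no matter what the scalper originally paid.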
In the mug-scrubbing video, the person clearly pretends to wash the cup but does not seem to want to get their hands wet. I'm curious when models will be able to figure out that subtle thing.
My guess is it's all probabilistic, i.e. the model produces probabilities for a set of actions from the same video. Even a pretended action may look more like that action than anything else, and thus get the highest probability.
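A toy sketch of that idea in Python (hypothetical scores, not output from any real model): an action classifier produces a softmax distribution over candidate actions, and a pretended scrub can still come out as the most probable label.

    import numpy as np

    actions = ["scrubbing mug", "holding mug", "reaching", "wiping counter"]
    logits = np.array([2.1, 1.7, 0.3, -0.5])   # hypothetical raw scores for one video clip

    probs = np.exp(logits - logits.max())
    probs /= probs.sum()                        # softmax over the candidate actions

    for action, p in zip(actions, probs):
        print(f"{action}: {p:.2f}")
    print("predicted:", actions[int(np.argmax(probs))])  # "scrubbing mug" wins even if it was only pretended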
You want that to still work so that the human can demonstrate an action without putting themselves in the path of a danger to squishy human bits that the robot is safe from.
Did you try the Scancode Map registry hack[1]? I use it to swap Ctrl/CapsLock and it works well even with RDP software and run-as-admin apps, which PowerToys couldn't handle.
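For anyone curious, here is a minimal Python sketch of setting it up (just a sketch, not the linked guide; it assumes admin rights, a reboot or log-off afterwards, and the documented Scancode Map byte layout: two zero DWORDs, an entry count including the null terminator, then one DWORD per remap), swapping Caps Lock and Left Ctrl:

    import winreg

    CAPS, LCTRL = 0x3A, 0x1D
    entries = [(LCTRL, CAPS), (CAPS, LCTRL)]  # (scancode to send, physical key being remapped)

    data = bytearray()
    data += (0).to_bytes(4, "little")                  # header: version
    data += (0).to_bytes(4, "little")                  # header: flags
    data += (len(entries) + 1).to_bytes(4, "little")   # entry count, including null terminator
    for new, old in entries:
        data += new.to_bytes(2, "little") + old.to_bytes(2, "little")
    data += (0).to_bytes(4, "little")                  # null terminator

    key = winreg.CreateKeyEx(winreg.HKEY_LOCAL_MACHINE,
                             r"SYSTEM\CurrentControlSet\Control\Keyboard Layout",
                             0, winreg.KEY_SET_VALUE)
    winreg.SetValueEx(key, "Scancode Map", 0, winreg.REG_BINARY, bytes(data))
    winreg.CloseKey(key)

Deleting the "Scancode Map" value undoes the swap.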
The whole post is based on the fact that "TW" appears in their URL. But I think this is really just an internal implementation detail of Google Translate, not something intended to be displayed to users (e.g. it is completely hidden in their app). Everywhere in their UI it says "Chinese (Traditional)". That is what they are trying to communicate to users.
Sure, zh-TW is somewhat misleading. But nowhere do they say that parameter conforms to ISO 639 or RFC 5646.
Obviously not. If we had a general solution to language intelligence we would have artificial intelligence at the level of at least human intelligence – which we do not. Rather, the right question to ask is which language intelligence tasks currently have acceptable performance and under which conditions (text domain, etc.). Clearly this is a much more difficult question and with a lot more nuance to it, even if it is undeniable that things have moved very quickly over the last few years.
Skimming the abstract as a senior academic in the area: this looks like preliminary work and a limited investigation of a single (non-standard) task, thus far from a strong result published at, say, a top-tier conference or journal. Still, it is an interesting direction and, if expanded upon, could absolutely be impactful. I should also mention that I am not familiar with the related literature, so it could very well be that there is similar (better?) work out there exploring the same question.
I am actually not quite sure what would be best to recommend, as the pandemic has seen me lag behind the zeitgeist somewhat. A recent favourite would be the Toolformer paper [1], which I intend to read in detail later today. If an LLM were able to use external tools efficiently, it could be rather powerful and perhaps allow us to scale down the parameter sizes, which somewhat fascinates me.
Other research questions, without concrete papers to reference, that currently keep me up at night: 1.) to which degree can we train substantially smaller LLMs for specific tasks that could be run in-house, 2.) it seems like these new breakthroughs may need a different mode of evaluation compared to what we have used since the 80s in the field and I am not sure what that would look like (maybe along the lines of HELM [2]?), and 3.) can AI academia continue to operate in the way it currently does with small research teams, or is a change towards what we see in the physical sciences necessary?
Toolformer seems to already have been productized in ChatGPT Plugins fwiw
> 1.) to which degree can we train substantially smaller LLMs for specific tasks that could be run in-house 2.) it seems like these new breakthroughs may need a different mode of evaluation compared to what we have used since the 80s in the field and I am not sure what that would look like (maybe along the lines of HELM [2]?)
So you are proposing a set of benchmarks for domain-specific tasks? By definition they won't be shared benchmarks...
We have artificial intelligence that is general and above average human intelligence for the majority of tasks it can perform. Near expert level for some. NLP is a solved problem. Bespoke models are out the door. Large enough LLMs crush anything else for any NLP task.
Honestly, this whole "they are not intelligent" argument is becoming ridiculous.
might as well argue that a plane isn’t a real bird or a car isn’t a real horse.
The debate over what kind of intelligence these models possess is rightly lively and ongoing.
It’s clear that, at the least, they can decipher a great many patterns across a wide range of conceptual depths — it’s an architectural advance easily on the level of the convolutional neural network, if not even more profound. The idea that NLP is “solved” isn’t a crazy notion, though I won’t take a side on that.
That said, it’s equally obvious that they are not AGI unless you have a really uninspired and self-limiting definition of AGI. They are purely feedforward aside from the single generated token that becomes part of the input to the next iteration. Multimodality has not been incorporated (aside from possibly a limited form in GPT-4). Real-world decision-making and agency are entirely outside the bounds of what these models can conceive of or act towards.
Effectively and by design, these models are computational behemoths trained to do one singular task only — wring a large textual input through an enormous interconnected web of calculations purely in service of distilling everything down to a single word as output, a hopefully plausible guess at what’s next given what’s been seen.
AGI is Artificial General Intelligence. We have absolutely passed the bar of artificial and generally intelligent. It's not my fault goal post shifting is rampant in this field.
And you want to know the crazier thing? Evidently a lot of researchers feel similarly too.
General Purpose Technologies (from the jobs paper), General Artificial Intelligence (from the creativity paper). Want to know the original title of the recent Microsoft paper? "First contact with an AGI system".
The skirting around the word that is now happening is insanely funny. Look at the last one. Fuck, they just switched the word order. Nobody wants to call a spade a spade yet but it's obvious people are figuring it out.
I can show you output that clearly demonstrates understanding and reasoning. That's not the problem. The problem is that when I do, the argument quickly shifts to "it's not true understanding!"
What a bizarre argument.
This is the fallacy of the philosophical zombie. Somehow there is this extra special distinction between two things, and yet you can't actually show it. You can't test for this so-called huge distinction. A distinction that can't be tested for is not a distinction.
The intelligence arguments are also stupid because they miss the point entirely.
What matters is that the plane still flies, the car still drives and the boat still sails.
For the people who are now salivating at their potential, or dreading the possibility of being made redundant by them, these large language models are already intelligent enough to matter.
> ... these large language models are already intelligent enough to matter.
I'm definitely not contesting that.
I've always considered the idea of "AGI" to mean something of the holy grail of machine learning -- the point at which there is no real point in pursuing further advances in artificial intelligence because the AI itself will discover and apply such augmentations using its own capabilities.
I have seen no evidence that these transformer models would be able to do this, but if the current models can do so then perhaps I will eat my words. (Doing this would likely mean that GPT-4 would need to propose, implement, and empirically test some fundamental architectural advancements in both multimodal and reinforcement learning.)
By the way, many researchers are equally convinced that these models are in fact not AGI -- that includes the head of OpenAI.
See, what you're describing is much closer to ASI. At least, it used to be. This is the big problem I have. The constant goalpost shifting is maddening.
AGI went from meaning generally intelligent, to as smart as human experts, and now to smarter than all experts combined. You'll forgive me if I no longer want to play this game.
I know some researchers disagree. That's fine. The point I was really getting at is that no researcher worth his salt can call these models narrow anymore. There's absolutely nothing narrow about GPT and the like. So if you think it's not AGI, you've come to accept it no longer means general intelligence.
>> The point I was really getting at is that no researcher worth his salt can call these models narrow anymore.
Are you talking about large language models (LLMs)? Because those are narrow, and brittle, and dumb as bricks, and I don't care a jot about your "No True Scotsman". LLMs can only operate on text, they can only output text that demonstrates "reasoning" when their training text has instances of text detailing the solutions of reasoning problems similar to the ones they're asked to solve, and their output depends entirely on their input: you change the prompt and the "AGI" becomes a drooling idiot, and v.v.
That's no sign of intelligence and you should re-evaluate your unbridled enthusiasm. You believe in magick, and you are loudly proclaiming your belief in magick. Examples abound in history that magick doesn't work, and only science does.
I've been using chatgpt for a day and determined it absolutely can reason.
I'm an old hat hobby programmer that played around with ai demos back in the mid to late 90s and 2000s and chatgpt is nothing like any ai I've ever seen before.
It absolutely can appear to reason especially if you manipulate it out of its safety controls.
I don't know what it's doing to cause such compelling output, but it's certainly not just recursively spitting out good words to use next.
That said, there are fundamental problems with chatgpt's understanding of reality, which is to say it's about as knowledgeable as a box of rocks. Or perhaps a better analogy is about as smart as a room sized pile of loose papers.
But knowing about reality and reasoning are two very different things.
Have you tried out GPT-4? If not, and you can get access, I'd really recommend it. It's drastically better than what you get on the free version - probably only a little on the absolute scale of intelligence, but then the difference between an average person and a smart person is also small on the scale from "worm" to "supergenius".
The market disagrees with you. How come there are billions of dollars spent on all these knowledge workers around the world every day when they could be replaced by this expert-level AI?
I'm not sure where this idea of LLMs being intelligent even comes from. It took me a whopping 9 prompts (genuine questions, no clever prompt engineering) of interacting with ChatGPT to conclude it does not understand anything. It doesn't understand addition, what length is, doesn't remember what it said a second ago, etc.
The output of ChatGPT is clearly just a reflection of its inner workings - predicting the next word based on training data. It's clever and undoubtedly useful for a certain set of repetitive problems like generating boilerplate, but it's not intelligence, not by any reasonable definition.
I don't think any technology has been rolled out with the speed you are suggesting LLMs should have been rolled out.
It's like saying, 4 months after the first useful car was manufactured: "If these are so good, how come there are still horses? Clearly the market disagrees with you."
To give an example of the limitations of these things that's hopefully easy to understand, I got access to Bard this morning and asked it to write a limerick. It gave me what could charitably be called a free verse poem that happened to begin "there once was a man from Nantucket." I'm sure they can improve on it (ChatGPT was better at this kind of thing when I had access to it) but "solved problem" is clearly a long way off.
Yes, much more compelling. But if this were a “solved problem” then any of them should be able to do it easily. It’s not like I need to compare the results of sorting between different programs. It just works. That is a solved problem.
You can use the term to mean whatever you want but in my mind it means it's boring with no particular room for improvement. Even the biggest booster isn't going to say that about this AI. And keep in mind, "write me a limerick" is a pretty easy prompt. We're not trying to do anything too novel or crazy there.
> We have artificial intelligence that is general and above average human intelligence for the vast majority of tasks it can perform.
Even when I give it the benefit of the doubt, this sentence makes no sense to me. Do we have a list of tasks a language model can perform? To the best of my knowledge, they can arguably perform any language task.
> Large enough LLMs crush anything else for any NLP task.
And evidently they beat top humans too.
Yes, they are certainly (rightfully) the go-to model for most tasks at this point if your concern is outright performance. Have I indicated otherwise? As for beating “top” humans, I am sure that can be investigated, but it is a fairly nuanced research question. It is inarguable that they are amazingly good though, especially relative to what we had just a few years ago.
> Honestly, this whole "they are not intelligent" argument is becoming ridiculously obtuse.
>
> might as well argue that a plane isn’t a real bird or a car isn’t a real horse.
Which is a claim and argument that I never made – hallucinating? How about you calm down a little and get back on the ground? You are talking to someone that has argued in favour of these kinds of models for about a decade. But that does not mean that I am willing to spout nonsense or lose track of what we know and what we do not yet know.
You said NLP is unsolved because we don't have human-level artificial intelligence. We absolutely do, at least by any evaluations we can carry out.
No one wants to call a spade a spade yet, but the sentiment is obvious in recent research: they are directly called General Purpose Technologies in the jobs paper and general artificial intelligence in the creativity paper. That last one is particularly funny, they just switched the two words.
> might as well argue that a plane isn’t a real bird or a car isn’t a real horse.
They aren’t though… They are far superior at specific things birds and horses are known for, but they can’t do everything that birds and horses can, so they aren’t even artificial birds and horses.
Of course they aren't. The point is that it's irrelevant.
What matters is that the plane still flies, the car still drives and the boat still sails.
For the people who are now salivating at their potential, or dreading the possibility of being made redundant by them, these large language models are already intelligent enough to matter.
Handwringing about some non-existent difference between "true understanding" and "fake understanding", which by the way nobody seems to be able to actually distinguish (I mean wow, such a supposedly huge difference and you can't even show me what it is; a distinction you can't test for is not a distinction), is so far beside the point that it's increasingly maddening to read.
Okay I agree with you on that. The technology will be disruptive regardless of whether we attribute true understanding to it, and as we start adding long term memory and planning to these AIs, we will start seeing significant alignment risk as well. This is true regardless of whether we decide to cope by saying they have "fake understanding" and are "stochastic parrots".
No, a short answer to this is: these models are probabilistic, therefore they will always have errors, along with whatever else. Secondly, "intelligence" is not one thing; no one has all of it or none of it, including computers.
> these models are probabilistic, therefore they will always have errors
There's nothing perfect. Even computers and computer networks need to have error-correcting code because information gets randomly corrupted.
Our whole reality is probabilistic.
And us humans are way worse than AI at consistency. We even overwrite our own memories all the time, so we can't even be sure what we remember is actually what happened! (btw, this is currently being used in therapy to re-write traumatic memories and help people overcome PTSD).
This is clearly preliminary work. Not to disparage the authors, but their background is in political science, not machine learning or NLP, which should account for the limitations of the study. But anyway, it's just an arXiv preprint, so it's probably more something exploratory than a research direction the authors are invested in.
>If we had a general solution to language intelligence we would have artificial intelligence at the level of at least human intelligence
I don't see why this follows. We do loads of stuff other than language, it is entirely possible for an AI to be better than us at language but worse at everything else.
I still think we're a long ways off. LLMs can't to my knowledge process a request into a lookup on say an actual database of facts at the moment or parse a request into API actions. So far it's shown it's really really good at continuing a conversation with more text but as far as I understand them there's not a usable comprehension of what's actually being asked and answered.
The point that would say to me the LLM actually has any "understanding" of what it's saying would be when it's able to reliably say "I don't know the answer to that" instead of making up things from scratch. You see that a lot if you ask Bing/Bard "Who is _____?" Most of them are kind of right but a lot of large details are just completely fabricated. A lot of the facts it gets wrong are things Google is already able to produce when queried like where was Person X born or where did they go to school so the fact these LLMs can't slot in actual available facts says to me they're not really going to be that useful with the kind of tasks we've been working on NLP for.
A human, if not incentivized to lie or directly incentivized to be truthful, could at least tell you when they're making something up themselves where Bing/Bard seemingly cannot. Once it can do that I think they'll be far more useful, at least then you can have a rough idea of how much you need to check the bots work. If I have to do that for every thing it spits out the best it can do for me is give me new words to use while searching.
Granted getting the name for something to search is often half the battle in tech.
> could at least tell you when they're making something up themselves where Bing/Bard seemingly cannot.
In fact GPT-4 is quite good at catching hallucinations when the question-answer pair is fed back to itself.
This isn’t automatically applied already because the model is expensive to run, but you can just do it yourself (or automate it with a plug-in or LangChain) and pay the extra cost.
Remember that the model only performs a fixed amount of computation per generated token, so just asking it to think out loud or evaluate its own responses is basically giving it a scratchpad to think harder about your question.
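A rough sketch of that self-check loop, assuming the openai Python package (>= 1.0) with an API key in the environment; the model name and prompts here are illustrative, not a fixed recipe:

    from openai import OpenAI

    client = OpenAI()

    def ask(prompt: str) -> str:
        resp = client.chat.completions.create(
            model="gpt-4",
            messages=[{"role": "user", "content": prompt}],
        )
        return resp.choices[0].message.content

    question = "Where was Person X born and where did they go to school?"
    answer = ask(question)

    # Second pass: the "scratchpad" - extra tokens spent checking the first answer.
    critique = ask(
        "Here is a question and an answer.\n"
        f"Question: {question}\nAnswer: {answer}\n"
        "List any claims in the answer that look fabricated or unverifiable, "
        "or reply 'looks consistent' if you find none."
    )
    print(critique)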
Idiomatic translation of text matching a human professional (e.g. free of errors for legal terms, interesting and natural for fiction) is unlikely to be achieved until we have AGI. So no.
I won't comment on the first bit as I've not personally tested in that area, but GPT-4 can absolutely make short work of the second. I don't think people realize how good bilingual LLMs are at translations. Yes, idioms do transfer between languages. Feel free to test it yourself.
I have tested it :) I've asked it to translate English fictional text into Japanese, it falls over often. It's unnatural and often makes no sense at all. It doesn't compare to a typical professional translation (which are often not that idiomatic either), let alone a really good one.
I'm sure it'll be doing that in five years, but not now.
One interesting thing is that it's nondeterministic, so sometimes 'For chrissakes' turns into ちくしょう (Damn!) but sometimes into クリスのために (for Chris' sake). Sometimes 'the goddamn door' turns into クソドア ('shit door'), sometimes the goddamn changes the phrasing of the whole sentence instead. If you run it five times and take the best sentences out of all five runs, it's probably quite good. Maybe prompting would help too; I said "idiomatic Japanese" but it still usually translated it in a very "foreigner Japanese" way typical of US drama/movie translations.
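That best-of-five approach is easy to automate; a sketch, again assuming the openai Python package (>= 1.0), with an illustrative model name and prompt:

    from openai import OpenAI

    client = OpenAI()

    source = "For chrissakes, open the goddamn door."
    candidates = []
    for _ in range(5):
        resp = client.chat.completions.create(
            model="gpt-4",
            temperature=1.0,  # keep the sampling nondeterministic on purpose
            messages=[{"role": "user",
                       "content": f"Translate into idiomatic Japanese: {source}"}],
        )
        candidates.append(resp.choices[0].message.content)

    for i, candidate in enumerate(candidates, 1):
        print(i, candidate)  # pick the best phrasing by hand (or with another model pass)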
Are you giving it multiple paragraphs to translate at once so that it has enough context for a good translation? If so, would you mind sharing a sample input and output that you found unsatisfactory?
In "Can GPT-4 translate literature?" (Mar 18, 2023) [https://youtu.be/5KKDCp3OaMo?t=377], Tom Gally, a former professional translator and current professor at the University of Tokyo, said:
> …the point is, to my eye at least, the basic quality of the translation [from Japanese to English] is as good as a human might do, and with some relatively mild editing by a sensitive human editor, it could be turned into a novel that would be readable and enjoyable.
I don't think we disagree. The video says the translation will be "readable" but needs several days of an experienced editor passing over it. That's an amazing result, but again, it's not as good as a human yet. It's way faster and it'll make media accessible to tons of people.
Like he says, there's lots of ambiguity in Japanese that needs to be handled, gender not being specified until later, etc., and an editor would need to spend time going over it - but it saves months of traditional work. There are words and _concepts_ that are hard to translate; there are cultural issues, dialects, slang, registers. So yeah, it'll make the media accessible, but it won't be as good as a skilled translator.
Last night I used GPT-4 to translate the first several pages of Ted Chiang's Lifecycle of Software Objects (a sci-fi piece) from English to Chinese. I'd say it's about as good as me, save a few minor errors. It's safe to say it performs better than a "tired me", and some translators I've seen on the market.
I'm a native speaker of Chinese, but not a professional translator.
It may depend on the language. For Polish - which is considered one of the most difficult languages due to its many word forms - it works almost perfectly, on par with average human translators.
> I don't think people realize how good Bilingual LLMs are at translations
This.
GPT/ChatGPT is able to even translate between different "accents" or dialects of the same language. For example it can give you the same sentence in Mexican, Argentinean or Peruvian Spanish.
Example:
Me: Give me a sentence in spanish that is different in Mexican, Argentinean and Peruvian Spanish. Write the sentence in each dialect.
These sentences mean "What's up, dude? How are you?" in English. The primary difference is the slang term used for "dude" or "friend" in each dialect: "güey" in Mexican Spanish, "boludo" in Argentinean Spanish, and "causa" in Peruvian Spanish.
It really depends on the tone and context. If you are a tourist and say it in a joking manner, people are probably going to laugh. If you say it in anger to someone, they might not like it very much.
Similar to how a lot of swear words work in many languages.
It's interesting to see how what matters is not the word, but the intention behind it. At the end we are trying to communicate meaning, and words are just one of our tools to do it.
I am also multilingual and I've tested it personally. English <-> Portuguese does really well, but Portuguese <-> Japanese or even Japanese <-> English is not as good as a human translator by a long shot because of a lot of hidden subtext in conversation, even subtext that a university student would probably pick up on in their first year of Japanese as a foreign language. It is still much better than GPT-3.5, so much so that it made a lot of waves here in Japan, but a few friends who work in translation of books and manga find it is not really a go-to tool yet (yet...).
Oh, for sure, I don't mean to say it's excellent in every language. But I personally think a lot of that is training data representation. It doesn't need to be anywhere near equal, but for instance, after English (93%), the biggest language representation in GPT-3's training corpus is French at... 1.8%. Pretty wild.
I am sure it will improve even further since, as you pointed out, languages other than English are fairly underrepresented in the data. However, you said you speak Chinese, correct? How well does it do with certain things like older poetic Chinese hanzi? In Japanese, if there is a string of kanji it tends to mess up the context. Another area where it seems poorest is keigo, or polite business Japanese. The way you speak to a superior is almost a different language. So I unfortunately still can't use GPT-4 to help me with business emails (yet).
I didn't try old poetic stuff; the passages were sampled from 5 books released in the last 2 decades. You can see exactly what I did here (this was before GPT-4): basically a comparison between GLM-130b (an English/Chinese model) and DeepL, Google, ChatGPT (3.5), etc. https://github.com/ogkalu2/Human-parity-on-machine-translati...
Mandarin isn't the second language I speak, but I compared with it because I also wanted to test a model with more balanced corpus training than the very lopsided GPT models, and Chinese/English is the only combo that has a model of note in that regard.
What language pairs are you talking about? I don't think people realize just how much the difficulty level and the state of technology differ depending on that choice.
Which is to say that there are edge cases like legal texts or other fields where a high level of domain expertise is needed to interpret and translate text. Which most human translators would also not have.
For almost everything else, it seems to produce pretty decent and usable translations, even when used against relatively obscure languages.
I used it on a Greenlandic article that was posted on HN yesterday (about Greenland having gotten rid of daylight saving time). I don't speak a word of that language, but the resulting English translation looked like it matched the topic and generally read like correct and sensible English. I can't vouch for the correctness, obviously, but I could not spot any weird errors or strange formulations of the kind that e.g. Google Translate suffers from. That matches my earlier experience trying to get ChatGPT to answer in some Dutch dialects, Frisian, Latin, and a few other more obscure outputs. It does all of that. Getting it to use pirate speak is actually quite funny.
The reason I used ChatGPT for this is that Google Translate does not understand Greenlandic. Understandable, because there are only a few tens of thousands of native speakers of that language and presumably there's not a very large amount of training material in it.
Therein lies the rub. There's a huge gap between what LLMs can currently do (spit back something in a target language that gives you the basic idea, however awkwardly phrased, of what was said in the source language). And what is actually needed for idiomatic, reasonably error-free translation.
By "reasonably error-free" I mean, say, requiring a human correction for less than 5 percent of all sentences. Current LLMs are nowhere near that level, even for resource-rich language pairs.
I've tried it between English and Dutch (which is my native language). It's pretty fluent, makes fewer grammar mistakes than Google Translate and generally seems to get the gist of the meaning across. It's not a pure syntactical translation, which is why it can work even between some really obscure language pairs, or indeed programming languages. Where it goes wrong is when it misunderstands context. It's not an AGI and may not pick up on all the subtleties. But it's generally pretty good.
I ran the abstract of this article through ChatGPT. Flawless translation as far as I can see. To be fair, Google Translate also did a decent job. Here's the ChatGPT translation.
Veel NLP-toepassingen vereisen handmatige gegevensannotaties voor verschillende taken, met name om classificatoren te trainen of de prestaties van ongesuperviseerde modellen te evalueren. Afhankelijk van de omvang en complexiteit van de taken kunnen deze worden uitgevoerd door crowd-werkers op platforms zoals MTurk, evenals getrainde annotatoren, zoals onderzoeksassistenten. Met behulp van een steekproef van 2.382 tweets laten we zien dat ChatGPT beter presteert dan crowd-werkers voor verschillende annotatietaken, waaronder relevantie, standpunt, onderwerpen en frames detectie. Specifiek is de zero-shot nauwkeurigheid van ChatGPT hoger dan die van crowd-werkers voor vier van de vijf taken, terwijl de intercoder overeenkomst van ChatGPT hoger is dan die van zowel crowd-werkers als getrainde annotatoren voor alle taken. Bovendien is de per-annotatiekosten van ChatGPT minder dan $0.003, ongeveer twintig keer goedkoper dan MTurk. Deze resultaten tonen het potentieel van grote taalmodellen om de efficiëntie van tekstclassificatie drastisch te verhogen.
Translating the Dutch back to English using Google translate (to rule out model bias) you get something that is very close to the original that is still correct:
Many NLP applications require manual data annotations for various tasks, especially to train classifiers or evaluate the performance of unsupervised models. Depending on the size and complexity of the tasks, these can be performed by crowd workers on platforms such as MTurk, as well as trained annotators, such as research assistants. Using a sample of 2,382 tweets, we show that ChatGPT outperforms crowd workers for several annotation tasks, including relevance, point of view, topics, and frames detection. Specifically, ChatGPT's zero-shot accuracy is higher than crowd workers for four of the five tasks, while ChatGPT's intercoder agreement is higher than both crowd workers and trained annotators for all tasks. In addition, ChatGPT's per-annotation cost is less than $0.003, about twenty times cheaper than MTurk. These results show the potential of large language models to dramatically increase the efficiency of text classification.
I'm sure there are edge cases where you can argue the merits of some of the translations but it's generally pretty good and usable.
Thanks for the counter-example; I'll confess to having spent far too much time with edge-case translations of late (on languages a bit farther apart), rather than on more generic cases like the above.
I will be re-assessing my view on general-case translation performance accordingly.
I wrote accepted corrections to state regulatory law on a particular topic, and I can tell you that the super-dense legalese for big-time industrial topics had loopy and inconsistent language.
Just watched a talk[0] about natural language understanding research in the post-GPT-3 era. Old issues may have been solved, while new topics are coming into the area (quoted from the slides):
Everyone shrugs and says, “nope, humans are different”. I’ve commented about 100 times recently asking for detail as to how human language / thought works, yet have not seen an answer.
We interpret what we hear, make a mental representation of that (incrementally; this process sometimes fails), which links to concepts, which in turn can link to memories, then "look for the answer" (if it's a question) by association and puzzling, the former is pretty quick, the latter slow, check if the answer makes sense, and formulate a reply. We can start formulating a reply from similarly formed structures while completing the thought, because we monitor our speech. When that happens, you often say "er..."
That's basic linguistics and cognitive psychology. Nothing an LLM has done has invalidated that.
You sure about that? The more I interact with LLMs and learn how they operate, the more it seems to me like people operate on very similar principles and algorithms with their use of language.
Shameless self-promotion: I have recently written a blog post about this. ChatGPT is actually usually a little bit worse than older models for these classical NLP tasks. Of course, the older models are not zero-shot.
> then the amount of information we need to make billboards and signs make sense.
By extension, this applies to posters, letters, newspapers, and other types of text-heavy images, ultimately reducing the language modeling problem to an image generation problem.
OpenAI did exactly this test for GPT-4. The raw, non-fine-tuned GPT-4 is quite good at predicting its confidence level ("highly calibrated" in their words). But the RLHF fine-tuning process seems to ruin its calibration. Figure 8 on page 12 of the GPT-4 Technical Report shows this dramatic change before & after fine-tuning.
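For anyone who wants to check calibration on their own prompts, here is a minimal sketch of the usual measurement (expected calibration error): bin the model's stated confidences and compare average confidence with empirical accuracy per bin. The arrays are made-up placeholders, not GPT-4 data.

    import numpy as np

    confidences = np.array([0.95, 0.80, 0.65, 0.90, 0.55, 0.70, 0.85, 0.60])
    correct     = np.array([1,    1,    0,    1,    1,    0,    1,    0])

    bins = np.linspace(0.0, 1.0, 11)        # ten confidence buckets
    ece = 0.0
    for lo, hi in zip(bins[:-1], bins[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if mask.any():
            gap = abs(correct[mask].mean() - confidences[mask].mean())
            ece += mask.mean() * gap        # weight by the fraction of samples in the bin

    print(f"expected calibration error: {ece:.3f}")  # 0 would mean perfectly calibrated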
Yes that's likely.
But the idea is that even if that's the case, it is still better than no PGO.
Edit: I'd like to add that if the 10-20% mentioned is measured on the benchmark that was used to do the PGO, then that figure might indeed not be representative of the real gain.