The time at which scalpers buy tickets becomes irrelevant. They can resell tickets at an arbitrarily high price + margin because buyers know they will receive a refund for the excess amount, and thus ultimately pay only (final price + scalper margin) for the ticket.
Bonus: if you buy a ticket from a scalper after the price has settled, you get a known price, no more guessing, and you can find inner peace.
The problem for scalpers is that if they buy too many tickets, the final price may become too high to be attractive to real buyers.
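A minimal sketch of that arithmetic in Python, with purely hypothetical numbers, assuming the refund (original purchase price minus final settled price) goes to whoever holds the ticket at settlement:

    # Hypothetical numbers, just to illustrate the buyer's effective cost.
    scalper_purchase_price = 500.0   # what the scalper paid the organizer
    scalper_margin = 50.0            # the scalper's markup
    final_settled_price = 300.0      # price the mechanism eventually settles at

    resale_price = scalper_purchase_price + scalper_margin   # what the buyer pays the scalper
    refund = scalper_purchase_price - final_settled_price    # paid out to the current ticket holder

    buyer_net_cost = resale_price - refund
    print(buyer_net_cost)                          # 350.0
    print(final_settled_price + scalper_margin)    # 350.0, i.e. final price + margin

So the buyer's net cost collapses to (final price + scalper margin) no matter what the scalper originally paid.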
In the mug-scrubbing video, the person clearly pretends to wash the cup but does not seem to want to get their hands wet. I'm curious when models will be able to figure out that subtle thing.
My guess is it's all probabilistic, i.e. the model produces probabilities for a set of actions from the same video. Even a pretended action may look more like that action than anything else, and thus get the highest probability.
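A toy sketch of that idea in Python (hypothetical scores, not output from any real model): an action classifier produces a softmax distribution over candidate actions, and a pretended scrub can still come out as the most probable label.

    import numpy as np

    actions = ["scrubbing mug", "holding mug", "reaching", "wiping counter"]
    logits = np.array([2.1, 1.7, 0.3, -0.5])   # hypothetical raw scores for one video clip

    probs = np.exp(logits - logits.max())
    probs /= probs.sum()                        # softmax over the candidate actions

    for action, p in zip(actions, probs):
        print(f"{action}: {p:.2f}")
    print("predicted:", actions[int(np.argmax(probs))])  # "scrubbing mug" wins even if it was only pretended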
You want that to still work so that the human can demonstrate an action without putting themselves in the path of a danger to squishy human bits that the robot is safe from.
Did you try the Scancode Map registry hack[1]? I use it to swap Ctrl/CapsLock and it works well even with RDP software and run-as-admin apps, which PowerToys couldn't handle.
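For anyone curious, here is a minimal Python sketch of setting it up (just a sketch, not the linked guide; it assumes admin rights, a reboot or log-off afterwards, and the documented Scancode Map byte layout: two zero DWORDs, an entry count including the null terminator, then one DWORD per remap), swapping Caps Lock and Left Ctrl:

    import winreg

    CAPS, LCTRL = 0x3A, 0x1D
    entries = [(LCTRL, CAPS), (CAPS, LCTRL)]  # (scancode to send, physical key being remapped)

    data = bytearray()
    data += (0).to_bytes(4, "little")                  # header: version
    data += (0).to_bytes(4, "little")                  # header: flags
    data += (len(entries) + 1).to_bytes(4, "little")   # entry count, including null terminator
    for new, old in entries:
        data += new.to_bytes(2, "little") + old.to_bytes(2, "little")
    data += (0).to_bytes(4, "little")                  # null terminator

    key = winreg.CreateKeyEx(winreg.HKEY_LOCAL_MACHINE,
                             r"SYSTEM\CurrentControlSet\Control\Keyboard Layout",
                             0, winreg.KEY_SET_VALUE)
    winreg.SetValueEx(key, "Scancode Map", 0, winreg.REG_BINARY, bytes(data))
    winreg.CloseKey(key)

Deleting the "Scancode Map" value undoes the swap.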
The whole post is based on the fact that "TW" appears in their URL. But I think this is really just an internal implementation detail of Google Translate, not something intended to be displayed to users (e.g. it is completely hidden in their app). Everywhere in their UI it says "Chinese (Traditional)". That is what they are trying to communicate to users.
Sure, zh-TW is somewhat misleading. But nowhere do they say that parameter conforms to ISO 639 or RFC 5646.
Obviously not. If we had a general solution to language intelligence we would have artificial intelligence at the level of at least human intelligence – which we do not. Rather, the right question to ask is which language intelligence tasks currently have acceptable performance and under which conditions (text domain, etc.). Clearly this is a much more difficult question and with a lot more nuance to it, even if it is undeniable that things have moved very quickly over the last few years.
Skimming the abstract as a senior academic in the area: this looks like preliminary work and a limited investigation of a single (non-standard) task, thus far from a strong result published at, say, a top-tier conference or journal. Still, it is an interesting direction and, if expanded upon, could absolutely be impactful. I should also mention that I am not familiar with the related literature, so it could very well be that there is similar (better?) work out there exploring the same question.
I am actually not quite sure what would be best to recommend, as the pandemic has seen me lag behind the zeitgeist somewhat. A recent favourite would be the Toolformer paper [1], which I intend to read in detail later today. If an LLM were able to use external tools efficiently, it could be rather powerful and perhaps allow us to scale down the parameter sizes, which somewhat fascinates me.
Other research questions, without concrete papers to reference, that currently keep me up at night: 1.) to which degree can we train substantially smaller LLMs for specific tasks that could be run in-house, 2.) it seems like these new breakthroughs may need a different mode of evaluation compared to what we have used since the 80s in the field and I am not sure what that would look like (maybe along the lines of HELM [2]?), and 3.) can AI academia continue to operate in the way it currently does with small research teams, or is a change towards what we see in the physical sciences necessary?
Toolformer seems to already have been productized in ChatGPT Plugins fwiw
> 1.) to which degree can we train substantially smaller LLMs for specific tasks that could be run in-house 2.) it seems like these new breakthroughs may need a different mode of evaluation compared to what we have used since the 80s in the field and I am not sure what that would look like (maybe along the lines of HELM [2]?)
So you are proposing a set of benchmarks for domain-specific tasks? By definition they won't be shared benchmarks...
We have artificial intelligence that is general and above average human intelligence for the majority of tasks it can perform. Near expert level for some. NLP is a solved problem. Bespoke models are out the door. Large enough LLMs crush anything else for any NLP task.
Honestly, this whole "they are not intelligent" argument is becoming ridiculous.
might as well argue that a plane isn’t a real bird or a car isn’t a real horse.
The debate over what kind of intelligence these models possess is rightly lively and ongoing.
It’s clear that, at the least, they can decipher a great many patterns across a wide range of conceptual depths — it’s an architectural advance easily on the level of the convolutional neural network, if not even more profound. The idea that NLP is “solved” isn’t a crazy notion, though I won’t take a side on that.
That said, it’s equally obvious that they are not AGI unless you have a really uninspired and self-limiting definition of AGI. They are purely feedforward aside from the single generated token that becomes part of the input to the next iteration. Multimodality has not been incorporated (aside from possibly a limited form in GPT-4). Real-world decision-making and agency are entirely outside the bounds of what these models can conceive of or act towards.
Effectively and by design, these models are computational behemoths trained to do one singular task only — wring a large textual input through an enormous interconnected web of calculations purely in service of distilling everything down to a single word as output, a hopefully plausible guess at what’s next given what’s been seen.
AGI is Artificial General Intelligence. We have absolutely passed the bar of artificial and generally intelligent. It's not my fault goal post shifting is rampant in this field.
And you want to know the crazier thing? Evidently a lot of researchers feel similarly too.
General Purpose Technologies (from the jobs paper), General Artificial Intelligence (from the creativity paper). Want to know the original title of the recent Microsoft paper? "First contact with an AGI system".
The skirting around the word that is now happening is insanely funny. Look at the last one. Fuck, they just switched the word order. Nobody wants to call a spade a spade yet but it's obvious people are figuring it out.
I can show you output that clearly demonstrates understanding and reasoning. That's not the problem. The problem is that when I do, the argument quickly shifts to "it's not true understanding!"
What a bizarre argument.
This is the fallacy of the philosophical zombie. Somehow there is this extra special distinction between two things, and yet you can't actually show it. You can't test for this so-called huge distinction. A distinction that can't be tested for is not a distinction.
The intelligence arguments are also stupid because they miss the point entirely.
What matters is that the plane still flies, the car still drives and the boat still sails.
For the people who are now salivating at their potential, or dreading the possibility of being made redundant by them, these large language models are already intelligent enough to matter.
> ... these large language models are already intelligent enough to matter.
I'm definitely not contesting that.
I've always considered the idea of "AGI" to mean something of the holy grail of machine learning -- the point at which there is no real point in pursuing further advances in artificial intelligence because the AI itself will discover and apply such augmentations using its own capabilities.
I have seen no evidence that these transformer models would be able to do this, but if the current models can do so then perhaps I will eat my words. (Doing this would likely mean that GPT-4 would need to propose, implement, and empirically test some fundamental architectural advancements in both multimodal and reinforcement learning.)
By the way, many researchers are equally convinced that these models are in fact not AGI -- that includes the head of OpenAI.
See, what you're describing is much closer to ASI. At least, it used to be. This is the big problem I have. The constant goalpost shifting is maddening.
AGI went from meaning generally intelligent, to as smart as human experts, and now to smarter than all experts combined. You'll forgive me if I no longer want to play this game.
I know some researchers disagree. That's fine. The point I was really getting at is that no researcher worth his salt can call these models narrow anymore. There's absolutely nothing narrow about GPT and the like. So if you think it's not AGI, you've come to accept it no longer means general intelligence.
>> The point I was really getting at is that no researcher worth his salt can call these models narrow anymore.
Are you talking about large language models (LLMs)? Because those are narrow, and brittle, and dumb as bricks, and I don't care a jot about your "No True Scotsman". LLMs can only operate on text, they can only output text that demonstrates "reasoning" when their training text has instances of text detailing the solutions of reasoning problems similar to the ones they're asked to solve, and their output depends entirely on their input: you change the prompt and the "AGI" becomes a drooling idiot, and v.v.
That's no sign of intelligence and you should re-evaluate your unbridled enthusiasm. You believe in magick, and you are loudly proclaiming your belief in magick. Examples abound in history that magick doesn't work, and only science does.
I've been using chatgpt for a day and determined it absolutely can reason.
I'm an old hat hobby programmer that played around with ai demos back in the mid to late 90s and 2000s and chatgpt is nothing like any ai I've ever seen before.
It absolutely can appear to reason especially if you manipulate it out of its safety controls.
I don't know what it's doing to cause such compelling output, but it's certainly not just recursively spitting out good words to use next.
That said, there are fundamental problems with chatgpt's understanding of reality, which is to say it's about as knowledgeable as a box of rocks. Or perhaps a better analogy is about as smart as a room sized pile of loose papers.
But knowing about reality and reasoning are two very different things.
Have you tried out GPT-4? If not, and you can get access, I'd really recommend it. It's drastically better than what you get on the free version - probably only a little on the absolute scale of intelligence, but then the difference between an average person and a smart person is also small on the scale from "worm" to "supergenius".
The market disagrees with you. How come there are billions of dollars spent on all these knowledge workers around the world every day when they could be replaced by this expert-level AI?
I'm not sure where this idea of LLMs being intelligent even comes from. It took me a whopping 9 prompts (genuine questions, no clever prompt engineering) of interacting with ChatGPT to conclude it does not understand anything. It doesn't understand addition, what length is, doesn't remember what it said a second ago, etc.
The output of ChatGPT is clearly just a reflection of its inner workings - predicting the next word based on training data. It's clever and undoubtedly useful for a certain set of repetitive problems like generating boilerplate, but it's not intelligence, not by any reasonable definition.
I don't think any technology has been rolled out with the speed you are suggesting LLMs should have been rolled out.
It's like saying, 4 months after the first useful car was manufactured: "If these are so good, how come there are still horses? Clearly the market disagrees with you."
To give an example of the limitations of these things that's hopefully easy to understand, I got access to Bard this morning and asked it to write a limerick. It gave me what could charitably be called a free verse poem that happened to begin "there once was a man from Nantucket." I'm sure they can improve on it (ChatGPT was better at this kind of thing when I had access to it) but "solved problem" is clearly a long way off.
Yes, much more compelling. But if this were a “solved problem” then any of them should be able to do it easily. It’s not like I need to compare the results of sorting between different programs. It just works. That is a solved problem.
You can use the term to mean whatever you want but in my mind it means it's boring with no particular room for improvement. Even the biggest booster isn't going to say that about this AI. And keep in mind, "write me a limerick" is a pretty easy prompt. We're not trying to do anything too novel or crazy there.
> We have artificial intelligence that is general and above average human intelligence for the vast majority of tasks it can perform.
Even when I give it the benefit of the doubt, this sentence makes no sense to me. Do we have a list of tasks a language model can perform? To the best of my knowledge, they can arguably perform any language task.
> Large enough LLMs crush anything else for any NLP task.
And evidently they beat top humans too.
Yes, they are certainly (rightfully) the go-to model for most tasks at this point if your concern is outright performance. Have I indicated otherwise? As for beating “top” humans, I am sure that can be investigated, but it is a fairly nuanced research question. It is inarguable that they are amazingly good though, especially relative to what we had just a few years ago.
> Honestly, this whole "they are not intelligent" argument is becoming ridiculously obtuse.
>
> might as well argue that a plane isn’t a real bird or a car isn’t a real horse.
Which is a claim and argument that I never made – hallucinating? How about you calm down a little and get back on the ground? You are talking to someone that has argued in favour of these kinds of models for about a decade. But that does not mean that I am willing to spout nonsense or lose track of what we know and what we do not yet know.
You said NLP is unsolved because we don't have human-level artificial intelligence. We absolutely do, at least by any evaluations we can carry out.
No one wants to call a spade a spade yet, but the sentiment is obvious in recent research: they are directly called General Purpose Technologies in the jobs paper and general artificial intelligence in the creativity paper. That last one is particularly funny, they just switched the two words.
> might as well argue that a plane isn’t a real bird or a car isn’t a real horse.
They aren’t though… They are far superior at specific things birds and horses are known for, but they can’t do everything that birds and horses can, so they aren’t even artificial birds and horses.
Of course they aren't. The point is that it's irrelevant.
What matters is that the plane still flies, the car still drives and the boat still sails.
For the people who are now salivating at their potential, or dreading the possibility of being made redundant by them, these large language models are already intelligent enough to matter.
Handwringing about some non-existent difference between "true understanding" and "fake understanding", which by the way nobody seems to be able to actually distinguish (I mean wow, such a supposedly huge difference and you can't even show me what it is; a distinction you can't test for is not a distinction), is so far beside the point that it's increasingly maddening to read.
Okay I agree with you on that. The technology will be disruptive regardless of whether we attribute true understanding to it, and as we start adding long term memory and planning to these AIs, we will start seeing significant alignment risk as well. This is true regardless of whether we decide to cope by saying they have "fake understanding" and are "stochastic parrots".
No, a short answer to this is: these models are probabilistic, therefore they will always have errors, along with whatever else. Secondly, "intelligence" is not one thing; no one has all of it or none of it, including computers.
> these models are probabilistic, therefore they will always have errors
There's nothing perfect. Even computers and computer networks need to have error-correcting code because information gets randomly corrupted.
Our whole reality is probabilistic.
And us humans are way worse than AI at consistency. We even overwrite our own memories all the time, so we can't even be sure what we remember is actually what happened! (btw, this is currently being used in therapy to re-write traumatic memories and help people overcome PTSD).
This is clearly preliminary work. Not to disparage the authors, but their background is in political science, not machine learning or NLP, which should account for the limitations of the study. But anyway, it's just an arXiv preprint, so it's probably more something exploratory than a research direction the authors are invested in.
>If we had a general solution to language intelligence we would have artificial intelligence at the level of at least human intelligence
I don't see why this follows. We do loads of stuff other than language, it is entirely possible for an AI to be better than us at language but worse at everything else.
I still think we're a long ways off. LLMs can't to my knowledge process a request into a lookup on say an actual database of facts at the moment or parse a request into API actions. So far it's shown it's really really good at continuing a conversation with more text but as far as I understand them there's not a usable comprehension of what's actually being asked and answered.
The point that would say to me the LLM actually has any "understanding" of what it's saying would be when it's able to reliably say "I don't know the answer to that" instead of making up things from scratch. You see that a lot if you ask Bing/Bard "Who is _____?" Most of them are kind of right but a lot of large details are just completely fabricated. A lot of the facts it gets wrong are things Google is already able to produce when queried like where was Person X born or where did they go to school so the fact these LLMs can't slot in actual available facts says to me they're not really going to be that useful with the kind of tasks we've been working on NLP for.
A human, if not incentivized to lie or directly incentivized to be truthful, could at least tell you when they're making something up themselves where Bing/Bard seemingly cannot. Once it can do that I think they'll be far more useful, at least then you can have a rough idea of how much you need to check the bots work. If I have to do that for every thing it spits out the best it can do for me is give me new words to use while searching.
Granted getting the name for something to search is often half the battle in tech.
> could at least tell you when they're making something up themselves where Bing/Bard seemingly cannot.
In fact GPT-4 is quite good at catching hallucinations when the question-answer pair is fed back to itself.
This isn’t automatically applied already because the model is expensive to run, but you can just do it yourself (or automate it with a plug-in or LangChain) and pay the extra cost.
Remember that the model only performs a fixed amount of computation per generated token, so just asking it to think out loud or evaluate its own responses is basically giving it a scratchpad to think harder about your question.
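A rough sketch of that self-check loop, assuming the openai Python package (>= 1.0) with an API key in the environment; the model name and prompts here are illustrative, not a fixed recipe:

    from openai import OpenAI

    client = OpenAI()

    def ask(prompt: str) -> str:
        resp = client.chat.completions.create(
            model="gpt-4",
            messages=[{"role": "user", "content": prompt}],
        )
        return resp.choices[0].message.content

    question = "Where was Person X born and where did they go to school?"
    answer = ask(question)

    # Second pass: the "scratchpad" - extra tokens spent checking the first answer.
    critique = ask(
        "Here is a question and an answer.\n"
        f"Question: {question}\nAnswer: {answer}\n"
        "List any claims in the answer that look fabricated or unverifiable, "
        "or reply 'looks consistent' if you find none."
    )
    print(critique)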
Idiomatic translation of text matching a human professional (e.g. free of errors for legal terms, interesting and natural for fiction) is unlikely to be achieved until we have AGI. So no.
I won't comment on the first bit as I've not personally tested in that area, but GPT-4 can absolutely make short work of the second. I don't think people realize how good bilingual LLMs are at translations. Yes, idioms do transfer between languages. Feel free to test it yourself.
I have tested it :) I've asked it to translate English fictional text into Japanese, it falls over often. It's unnatural and often makes no sense at all. It doesn't compare to a typical professional translation (which are often not that idiomatic either), let alone a really good one.
I'm sure it'll be doing that in five years, but not now.
One interesting thing is that it's nondeterministic, so sometimes 'For chrissakes' turns into ちくしょう (Damn!) but sometimes into クリスのために (for Chris' sake). Sometimes 'the goddamn door' turns into クソドア ('shit door'), sometimes the goddamn changes the phrasing of the whole sentence instead. If you run it five times and take the best sentences out of all five runs, it's probably quite good. Maybe prompting would help too; I said "idiomatic Japanese" but it still usually translated it in a very "foreigner Japanese" way typical of US drama/movie translations.
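That best-of-five approach is easy to automate; a sketch, again assuming the openai Python package (>= 1.0), with an illustrative model name and prompt:

    from openai import OpenAI

    client = OpenAI()

    source = "For chrissakes, open the goddamn door."
    candidates = []
    for _ in range(5):
        resp = client.chat.completions.create(
            model="gpt-4",
            temperature=1.0,  # keep the sampling nondeterministic on purpose
            messages=[{"role": "user",
                       "content": f"Translate into idiomatic Japanese: {source}"}],
        )
        candidates.append(resp.choices[0].message.content)

    for i, candidate in enumerate(candidates, 1):
        print(i, candidate)  # pick the best phrasing by hand (or with another model pass)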
Are you giving it multiple paragraphs to translate at once so that it has enough context for a good translation? If so, would you mind sharing a sample input and output that you found unsatisfactory?
In "Can GPT-4 translate literature?" (Mar 18, 2023) [https://youtu.be/5KKDCp3OaMo?t=377], Tom Gally, a former professional translator and current professor at the University of Tokyo, said:
> …the point is, to my eye at least, the basic quality of the translation [from Japanese to English] is as good as a human might do, and with some relatively mild editing by a sensitive human editor, it could be turned into a novel that would be readable and enjoyable.
I don't think we disagree. The video says the translation will be "readable" but needs several days of an experienced editor passing over it. That's an amazing result, but again, it's not as good as a human yet. It's way faster and it'll make media accessible to tons of people.
Like he says, there's lots of ambiguity in Japanese that needs to be handled, gender not being specified until later, etc., and an editor would need to spend time going over it - but it saves months of traditional work. There are words and _concepts_ that are hard to translate; there are cultural issues, dialects, slang, registers. So yeah, it'll make the media accessible, but it won't be as good as a skilled translator.
Last night I used GPT-4 to translate the first several pages of Ted Chiang's Lifecycle of Software Objects (a sci-fi piece) from English to Chinese. I'd say it's about as good as me, save a few minor errors. It's safe to say it performs better than a "tired me", and some translators I've seen on the market.
I'm a native speaker of Chinese, but not a professional translator.
It may depend on the language. For Polish - which is considered one of the most difficult languages due to its many word forms - it works almost perfectly, on par with average human translators.
> I don't think people realize how good Bilingual LLMs are at translations
This.
GPT/ChatGPT is able to even translate between different "accents" or dialects of the same language. For example it can give you the same sentence in Mexican, Argentinean or Peruvian Spanish.
Example:
Me: Give me a sentence in spanish that is different in Mexican, Argentinean and Peruvian Spanish. Write the sentence in each dialect.
These sentences mean "What's up, dude? How are you?" in English. The primary difference is the slang term used for "dude" or "friend" in each dialect: "güey" in Mexican Spanish, "boludo" in Argentinean Spanish, and "causa" in Peruvian Spanish.
It really depends on the tone and context. If you are a tourist and say it in a joking manner, people are probably going to laugh. If you say it in anger to someone, they might not like it very much.
Similar to how a lot of swear words work in many languages.
It's interesting to see how what matters is not the word, but the intention behind it. At the end we are trying to communicate meaning, and words are just one of our tools to do it.
I am also multilingual and I've tested it personally. English <-> Portuguese does really well, but Portuguese <-> Japanese or even Japanese <-> English is not as good as a human translator by a long shot because of a lot of hidden subtext in conversation, even subtext that a university student would probably pick up on in their first year of Japanese as a foreign language. It is still much better than GPT-3.5, so much so that it made a lot of waves here in Japan, but a few friends who work in translation of books and manga find it is not really a go-to tool yet (yet...).
Oh, for sure, I don't mean to say it's excellent in every language. But I personally think a lot of that is training data representation. It doesn't need to be anywhere near equal, but for instance, after English (93%), the biggest language representation in GPT-3's training corpus is French at... 1.8%. Pretty wild.
I am sure it will improve even further since, as you pointed out, languages other than English are fairly underrepresented in the data. However, you said you speak Chinese, correct? How well does it do with certain things like older poetic Chinese hanzi? In Japanese, if there is a string of kanji it tends to mess up the context. Another area where it seems poorest is keigo, or polite business Japanese. The way you speak to a superior is almost a different language. So I unfortunately still can't use GPT-4 to help me with business emails (yet).
I didn't try old poetic stuff; the passages were sampled from 5 books released in the last 2 decades. You can see exactly what I did here (this was before GPT-4): basically a comparison between GLM-130b (an English/Chinese model) and DeepL, Google, ChatGPT (3.5), etc. https://github.com/ogkalu2/Human-parity-on-machine-translati...
Mandarin isn't the second language I speak, but I compared with it because I also wanted to test a model with more balanced corpus training than the very lopsided GPT models, and Chinese/English is the only combo that has a model of note in that regard.
What language pairs are you talking about? I don't think people realize just how much the difficulty level and the state of technology differ depending on that choice.
Which is to say that there are edge cases like legal texts or other fields where a high level of domain expertise is needed to interpret and translate text. Which most human translators would also not have.
For almost everything else, it seems to produce pretty decent and usable translations, even when used against relatively obscure languages.
I used it on a Greenlandic article that was posted on HN yesterday (about Greenland having gotten rid of daylight saving time). I don't speak a word of that language, but the resulting English translation looked like it matched the topic and generally read like correct and sensible English. I can't vouch for the correctness, obviously, but I could not spot any weird errors or strange formulations of the kind that e.g. Google Translate suffers from. That matches my earlier experience trying to get ChatGPT to answer in some Dutch dialects, Frisian, Latin, and a few other more obscure outputs. It does all of that. Getting it to use pirate speak is actually quite funny.
The reason I used ChatGPT for this is that Google Translate does not understand Greenlandic. Understandable, because there are only a few tens of thousands of native speakers of that language and presumably there's not a very large amount of training material in it.
Therein lies the rub. There's a huge gap between what LLMs can currently do (spit back something in a target language that gives you the basic idea, however awkwardly phrased, of what was said in the source language). And what is actually needed for idiomatic, reasonably error-free translation.
By "reasonably error-free" I mean, say, requiring a human correction for less than 5 percent of all sentences. Current LLMs are nowhere near that level, even for resource-rich language pairs.
I've tried it between English and Dutch (which is my native language). It's pretty fluent, makes fewer grammar mistakes than Google Translate and generally seems to get the gist of the meaning across. It's not a pure syntactical translation, which is why it can work even between some really obscure language pairs, or indeed programming languages. Where it goes wrong is when it misunderstands context. It's not an AGI and may not pick up on all the subtleties. But it's generally pretty good.
I ran the abstract of this article through ChatGPT. Flawless translation as far as I can see. To be fair, Google Translate also did a decent job. Here's the ChatGPT translation.
Veel NLP-toepassingen vereisen handmatige gegevensannotaties voor verschillende taken, met name om classificatoren te trainen of de prestaties van ongesuperviseerde modellen te evalueren. Afhankelijk van de omvang en complexiteit van de taken kunnen deze worden uitgevoerd door crowd-werkers op platforms zoals MTurk, evenals getrainde annotatoren, zoals onderzoeksassistenten. Met behulp van een steekproef van 2.382 tweets laten we zien dat ChatGPT beter presteert dan crowd-werkers voor verschillende annotatietaken, waaronder relevantie, standpunt, onderwerpen en frames detectie. Specifiek is de zero-shot nauwkeurigheid van ChatGPT hoger dan die van crowd-werkers voor vier van de vijf taken, terwijl de intercoder overeenkomst van ChatGPT hoger is dan die van zowel crowd-werkers als getrainde annotatoren voor alle taken. Bovendien is de per-annotatiekosten van ChatGPT minder dan $0.003, ongeveer twintig keer goedkoper dan MTurk. Deze resultaten tonen het potentieel van grote taalmodellen om de efficiëntie van tekstclassificatie drastisch te verhogen.
Translating the Dutch back to English using Google translate (to rule out model bias) you get something that is very close to the original that is still correct:
Many NLP applications require manual data annotations for various tasks, especially to train classifiers or evaluate the performance of unsupervised models. Depending on the size and complexity of the tasks, these can be performed by crowd workers on platforms such as MTurk, as well as trained annotators, such as research assistants. Using a sample of 2,382 tweets, we show that ChatGPT outperforms crowd workers for several annotation tasks, including relevance, point of view, topics, and frames detection. Specifically, ChatGPT's zero-shot accuracy is higher than crowd workers for four of the five tasks, while ChatGPT's intercoder agreement is higher than both crowd workers and trained annotators for all tasks. In addition, ChatGPT's per-annotation cost is less than $0.003, about twenty times cheaper than MTurk. These results show the potential of large language models to dramatically increase the efficiency of text classification.
I'm sure there are edge cases where you can argue the merits of some of the translations but it's generally pretty good and usable.
Thanks for the counter-example; I'll confess to having spent far too much time with edge-case translations of late (on languages a bit farther apart), rather than on more generic cases like the above.
I will be re-assessing my view on general-case translation performance accordingly.
I wrote accepted corrections to state regulatory law on a particular topic, and I can tell you that the super-dense legalese for big-time industrial topics had loopy and inconsistent language.
Just watched a talk[0] about natural language understanding research in the post-GPT-3 era. Old issues may have been solved, while new topics are coming into the area (quoted from the slides):
Everyone shrugs and says, “nope, humans are different”. I’ve commented about 100 times recently asking for detail as to how human language / thought works, yet have not seen an answer.
We interpret what we hear, make a mental representation of that (incrementally; this process sometimes fails), which links to concepts, which in turn can link to memories, then "look for the answer" (if it's a question) by association and puzzling, the former is pretty quick, the latter slow, check if the answer makes sense, and formulate a reply. We can start formulating a reply from similarly formed structures while completing the thought, because we monitor our speech. When that happens, you often say "er..."
That's basic linguistics and cognitive psychology. Nothing an LLM has done has invalidated that.
You sure about that? The more I interact with LLMs and learn how they operate, the more it seems to me like people operate on very similar principles and algorithms with their use of language.
Shameless self-promotion: I have recently written a blog post about this. ChatGPT is actually usually a little bit worse than older models for these classical NLP tasks. Of course, the older models are not zero-shot.
> then the amount of information we need to make billboards and signs make sense.
By extension, this applies to posters, letters, newspapers, and other types of text-heavy images, ultimately reducing the language modeling problem to an image generation problem.
OpenAI did exactly this test for GPT-4. The raw, non-fine-tuned GPT-4 is quite good at predicting its confidence level ("highly calibrated" in their words). But the RLHF fine-tuning process seems to ruin its calibration. Figure 8 on page 12 of the GPT-4 Technical Report shows this dramatic change before & after fine-tuning.
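For anyone who wants to check calibration on their own prompts, here is a minimal sketch of the usual measurement (expected calibration error): bin the model's stated confidences and compare average confidence with empirical accuracy per bin. The arrays are made-up placeholders, not GPT-4 data.

    import numpy as np

    confidences = np.array([0.95, 0.80, 0.65, 0.90, 0.55, 0.70, 0.85, 0.60])
    correct     = np.array([1,    1,    0,    1,    1,    0,    1,    0])

    bins = np.linspace(0.0, 1.0, 11)        # ten confidence buckets
    ece = 0.0
    for lo, hi in zip(bins[:-1], bins[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if mask.any():
            gap = abs(correct[mask].mean() - confidences[mask].mean())
            ece += mask.mean() * gap        # weight by the fraction of samples in the bin

    print(f"expected calibration error: {ece:.3f}")  # 0 would mean perfectly calibrated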
Yes that's likely.
But the idea is that even if that's the case, it is still better than no PGO.
Edit: I'd like to add that if the 10-20% mentioned is measured on the benchmark that was used to do the PGO, then that figure might indeed not be representative of the real gain.