
Which is to say that there are edge cases, like legal texts or other fields where a high level of domain expertise is needed to interpret and translate the text. Expertise that most human translators would also not have.

For almost everything else, it seems to produce pretty decent and usable translations, even when used against relatively obscure languages.

I used it on a Greenlandic article that was posted on HN yesterday (about Greenland having gotten rid of daylight saving time). I don't speak a word of that language, but the resulting English translation looked like it matched the topic and generally read like correct and sensible English. I can't vouch for the correctness, obviously, but I could not spot any of the weird errors or strange formulations that e.g. Google Translate suffers from. That matches my earlier experience trying to get ChatGPT to answer in some Dutch dialects, Frisian, Latin, and a few other more obscure outputs. It handles all of that. Getting it to use pirate speak is actually quite funny.

The reason I used ChatGPT for this is that Google Translate does not understand Greenlandic. Understandable, because there are only a few tens of thousands of native speakers of that language, and presumably there is not a very large amount of training material in it.



I can't vouch for the correctness obviously.

Therein lies the rub. There's a huge gap between what LLMs can currently do (spit back something in a target language that gives you the basic idea, however awkwardly phrased, of what was said in the source language) and what is actually needed for idiomatic, reasonably error-free translation.

By "reasonably error-free" I mean, say, requiring a human correction for less than 5 percent of all sentences. Current LLMs are nowhere near that level, even for resource-rich language pairs.
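To make that threshold concrete, here's a minimal sketch of the check it implies (the sentence counts below are invented for illustration, not measurements of any model):

```python
# Hypothetical quality bar: a translation counts as "reasonably error-free"
# if a human editor had to correct fewer than 5% of its sentences.

def correction_rate(total_sentences, corrected_sentences):
    """Fraction of sentences that needed a human fix."""
    return corrected_sentences / total_sentences

def reasonably_error_free(total_sentences, corrected_sentences, threshold=0.05):
    """True if the human-correction rate is under the threshold."""
    return correction_rate(total_sentences, corrected_sentences) < threshold

# Invented example numbers: 18 fixes out of 200 sentences is a 9% rate,
# which fails the bar; 9 fixes out of 200 is 4.5%, which passes.
print(reasonably_error_free(200, 18))  # False
print(reasonably_error_free(200, 9))   # True
```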


I've tried it between English and Dutch (my native language). It's pretty fluent, makes fewer grammar mistakes than Google Translate, and generally gets the gist of the meaning across. It's not a purely syntactical translation, which is why it can work even between some really obscure language pairs, or indeed programming languages. Where it goes wrong is when it misunderstands context: it's not an AGI and may not pick up on all the subtleties. But it's generally pretty good.

I ran the abstract of this article through ChatGPT. Flawless translation as far as I can see. To be fair, Google Translate also did a decent job. Here's the ChatGPT translation:

Veel NLP-toepassingen vereisen handmatige gegevensannotaties voor verschillende taken, met name om classificatoren te trainen of de prestaties van ongesuperviseerde modellen te evalueren. Afhankelijk van de omvang en complexiteit van de taken kunnen deze worden uitgevoerd door crowd-werkers op platforms zoals MTurk, evenals getrainde annotatoren, zoals onderzoeksassistenten. Met behulp van een steekproef van 2.382 tweets laten we zien dat ChatGPT beter presteert dan crowd-werkers voor verschillende annotatietaken, waaronder relevantie, standpunt, onderwerpen en frames detectie. Specifiek is de zero-shot nauwkeurigheid van ChatGPT hoger dan die van crowd-werkers voor vier van de vijf taken, terwijl de intercoder overeenkomst van ChatGPT hoger is dan die van zowel crowd-werkers als getrainde annotatoren voor alle taken. Bovendien is de per-annotatiekosten van ChatGPT minder dan $0.003, ongeveer twintig keer goedkoper dan MTurk. Deze resultaten tonen het potentieel van grote taalmodellen om de efficiëntie van tekstclassificatie drastisch te verhogen.

Translating the Dutch back to English using Google Translate (to rule out model bias), you get something very close to the original that is still correct:

Many NLP applications require manual data annotations for various tasks, especially to train classifiers or evaluate the performance of unsupervised models. Depending on the size and complexity of the tasks, these can be performed by crowd workers on platforms such as MTurk, as well as trained annotators, such as research assistants. Using a sample of 2,382 tweets, we show that ChatGPT outperforms crowd workers for several annotation tasks, including relevance, point of view, topics, and frames detection. Specifically, ChatGPT's zero-shot accuracy is higher than crowd workers for four of the five tasks, while ChatGPT's intercoder agreement is higher than both crowd workers and trained annotators for all tasks. In addition, ChatGPT's per-annotation cost is less than $0.003, about twenty times cheaper than MTurk. These results show the potential of large language models to dramatically increase the efficiency of text classification.
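One crude way to quantify "very close to the original" in a round trip like this is word overlap between the source text and the back-translation. A toy sketch (this is a rough proxy, not a proper MT metric like BLEU; the short strings stand in for the full abstracts):

```python
# Round-trip check: compare bag-of-words overlap between the original
# English text and the text that came back after EN -> NL -> EN.
import re

def tokens(text):
    """Lowercase word set, ignoring punctuation."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))

def overlap(original, round_tripped):
    """Jaccard similarity of the two word sets, from 0.0 to 1.0."""
    a, b = tokens(original), tokens(round_tripped)
    return len(a & b) / len(a | b) if a | b else 1.0

# Toy stand-ins for the two abstracts above; a high score suggests the
# round trip preserved most of the wording.
src = "ChatGPT outperforms crowd workers for several annotation tasks"
back = "ChatGPT outperforms crowd workers for many annotation tasks"
print(round(overlap(src, back), 2))
```

A score near 1.0 only says the wording survived; it says nothing about whether subtle meaning shifts crept in, which is exactly the part that still needs a human check.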

I'm sure there are edge cases where you can argue the merits of some of the translations, but it's generally pretty good and usable.


Thanks for the counter-example; I'll confess to having spent far too much time with edge-case translations of late (on languages a bit farther apart), rather than on more generic cases like the above.

I will be re-assessing my view on general-case translation performance accordingly.



