Hacker News

The black swan event is the unexpected emergent capabilities of LLMs, which can pass professional examinations and exhibit many attributes of general intelligence, combined with their broad availability to the market and massive pressure on big industry players to adapt and innovate or die.

If things keep going this way then pretty soon the black swans are going to outnumber the white ones.



Single stories of LLM success are just as problematic as single stories of LLM failures. As always. The fact is that we have a dangerous tool at hand that absolutely REQUIRES skepticism if you are trying to get *facts* out of it.

I agree that LLMs are extremely likely to impact many areas of work, particularly bullshit work. But as it stands you absolutely cannot use them as fact machines; the results can be catastrophic.

What it does well, among other things:

- Scaffolding text, breaking writer's block, etc.

- Composing basic texts from minimal input, for example for bullshit tasks -> I generated an internal "vision statement" during a Miro workshop for my team by inputting a bunch of bullet points gathered from the team members' brainstorming. It produced a concise, fluid text that everybody liked. It's now the vision statement.

- Pointing you in good directions, giving you ideas

What it does NOT do well, among other things:

- Providing factual responses. All responses MUST be scrutinized because they are likely to contain false information. This is very dangerous for society ("Can I take this medicine with this other medicine?")

- Composing creative texts that are coherent and novel. ChatGPT texts can be quite fun, but they rarely make sense beyond a very superficial reading and convey no deeper message.

However, ChatGPT-like tools are often used with a lot of naivety and blind acceptance, rather than as tools to aid your work.


For the most part I agree, but I will qualify that the caveats you listed apply to the out-of-the-box version of ChatGPT. I expect these limitations will be overcome by using it as a programming substrate and connecting it to other models and APIs.

I am impressed by what we’ve seen from ChatGPT so far, but am especially excited to see what industry does with LLMs as a new type of building block.
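To make the "building block" idea concrete, here's a minimal sketch of wrapping a model call with retrieval and verification stages. Every function name here (retrieve_context, call_llm, verify) is a hypothetical stub, not any real API; the point is only the shape of the pipeline.

```python
# Hypothetical sketch: an LLM as one component in a pipeline,
# sandwiched between a retrieval step (trusted sources in, grounded
# context out) and a verification step that can refuse to answer.
# All functions are stand-in stubs.

def retrieve_context(question: str) -> str:
    """Stub: look up trusted sources (e.g. a drug-interaction database)."""
    return "example context fetched from a vetted source"

def call_llm(prompt: str) -> str:
    """Stub: replace with an actual model call."""
    return "Draft answer grounded in the provided context."

def verify(draft: str, context: str) -> bool:
    """Stub: cross-check the draft against the retrieved context."""
    return bool(draft) and bool(context)

def answer(question: str) -> str:
    context = retrieve_context(question)
    draft = call_llm(f"Context:\n{context}\n\nQuestion: {question}")
    if not verify(draft, context):
        return "No verified answer available; consult a professional."
    return draft

print(answer("Can I take this medicine with this other medicine?"))
```

The scrutiny the parent comment asks for doesn't disappear in this design; it just moves from the human into the verify stage, which can still fail.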


Fully agreed! I suspect there will always be a subtle risk of catastrophic failure, though, something observed in a lot of AI systems. The scrutiny filter may need to fire less often, but it will be no less needed to catch the 1-in-10,000 bad response.

If 100,000 people ask critical questions, then 10 people might run into potentially catastrophic consequences. ChatGPT is a powerful tool and will only become more so, but by the nature of the system it will probably never be perfectly reliable.
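The arithmetic above checks out; a quick sketch using the hypothetical figures from the comment (a 1-in-10,000 bad-response rate across 100,000 askers):

```python
# Back-of-the-envelope check of the failure-rate arithmetic above.
failure_rate = 1 / 10_000   # hypothetical 1-in-10,000 bad responses
askers = 100_000            # hypothetical number of critical questions

# Expected number of people who receive a bad answer.
expected_bad = failure_rate * askers  # 10.0

# Probability that at least one asker receives a bad answer.
prob_at_least_one = 1 - (1 - failure_rate) ** askers

print(expected_bad, round(prob_at_least_one, 5))
```

At this scale a bad answer is not a tail event at all: it is a near-certainty for someone, which is the comment's point about rare failures still mattering.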

I am excited for the generative AI future and whatever the hell is still coming. Only those who adapt will survive.


I don’t put much stock in the claims about GPT-4 “passing” professional exams. Many copies of previously administered exams are available, and the exams are formulaic in their construction (to make them stable, predictable targets).

A compressed copy of the internet brute-forcing its way through an exam (which it may even have digested already) cannot really be interpreted as performing well on that exam. It’s a meaningless measure because the tests were not designed with this use in mind.




