Hacker News

This was excellent. Here's the notebook that accompanies the video: https://github.com/fastai/lm-hackers/blob/main/lm-hackers.ip...

I thought the selection of projects was great: some OpenAI API hacking, including a Code Interpreter imitation built with OpenAI functions, then local LLM execution with Hugging Face models, and then a fine-tuning example to build a text-to-SQL model, somehow crammed into just 10 minutes at the end!



I'm glad you like it Simon! And thank you for all the marvellous learning material you've provided. :D


Hi Jeremy, is there a chance you’ll be tackling agents and agent systems soon too? Best


How often do you hear that you look like Rick Moranis? :D

Thank you for your content!


Thanks for the inspiring video! May I ask some questions and play devil's advocate a bit?

I hope you don't mind if I start with constructive criticism: in the first half of the video you talk about the capabilities of LLM-based _applications_ (what ChatGPT, the OpenAI API, or Bard can or can't do), not about the capabilities of the large language models themselves. I would have loved to see meaningful comparisons with open-source models (e.g. Llama 2) much earlier, not just in the last third.

In the part "What GPT-4 can do?" (around 17:20 in the video) you show that it answers the questions from the study correctly (the study claimed GPT-4 can't). Are you sure ChatGPT does this with the model itself, rather than by clever tricks?

I mean, if I were OpenAI and found this critical study about GPT-4, I would of course assign a small team to fix it. But I guess the fix would not be retraining / fine-tuning the model; it might just be adding a wrapper that identifies logical / puzzle questions and applies a guided prompt system to get correct results. In other words: "hardcoding" the solution (path) for several types of questions in an application layer (not the model itself).

Of course it's difficult to test this theory, because one would need to invent new kinds of logical puzzles to test GPT-4, and I assume OpenAI has the resources to create hardcoded solutions for all the existing types.

Abstracting the question a bit: should we explain GPT-4's (and similar systems') success (or failure) _only with the LLM_, or should we also consider a huge expert system around it? ( https://en.wikipedia.org/wiki/Expert_system ) Should we care? (I personally don't; I think it's the utility that matters.) How much future "quality improvement" will come from LLM training vs. better expert systems around the LLMs?
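To make the theory above concrete, here's a minimal sketch of what such an application-layer wrapper could look like. Everything in it (the keyword list, the guided prefix, the function name) is invented for illustration; it says nothing about how OpenAI actually does it.

```python
import re

# Hypothetical application-layer wrapper: detect puzzle-like prompts and
# prepend a guided prompt before the model ever sees them.
PUZZLE_HINTS = re.compile(
    r"\b(river|crossing|wolf|goat|cabbage|liar|truth-teller)\b", re.IGNORECASE
)

GUIDED_PREFIX = (
    "Think step by step. Restate the constraints of the puzzle exactly as "
    "given, noting any differences from the classic version, then solve it.\n\n"
)

def wrap_prompt(user_prompt: str) -> str:
    """Return the prompt the model would actually receive."""
    if PUZZLE_HINTS.search(user_prompt):
        return GUIDED_PREFIX + user_prompt
    return user_prompt

# Puzzle-like input gets the guided prefix; everything else passes through.
print(wrap_prompt("A wolf, a goat and a cabbage must cross a river..."))
print(wrap_prompt("What is the capital of France?"))
```

If something like this existed, it would indeed be invisible from the outside, which is exactly why the theory is hard to test.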

----

I like the part where ChatGPT repeats its error on the modified wolf, goat, and cabbage problem (around 29:30 in the video) and you say "once GPT-4 starts to be wrong, it tends to be more and more wrong". I guess the reason is partly that the "usual popular solution" has high probability, and partly that the previous question-answer pairs are fed into the current generation, so it reinforces the wrong solution. Your clever prompt fixes this, which validates the idea that "prompt engineering" is a relevant skill.
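The feedback effect I mean can be shown with the chat-style message format (the structure follows the OpenAI chat convention; the strings are placeholders and no API call is made):

```python
# Each turn's (possibly wrong) answer is appended to the message list and
# sent back with the next request, so an early mistake stays in context
# and conditions every later answer.
messages = [
    {"role": "user", "content": "Solve this modified wolf/goat/cabbage puzzle: ..."}
]

wrong_answer = "Take the goat across first, ..."  # model's incorrect first reply
messages.append({"role": "assistant", "content": wrong_answer})

# The follow-up is sent together with the entire history:
messages.append({"role": "user", "content": "Are you sure? Please check your answer."})

history_text = " ".join(m["content"] for m in messages)
assert wrong_answer in history_text  # the mistake is still in context
```

A clever prompt works precisely because it changes what sits in that context before the model commits to the popular-but-wrong solution.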

Can we assume that LLMs will be able to "work" alone in the future, or will they always need guidance from humans to reach human-level reasoning?

----

The coding example shows that while the model generates the code, the runtime/testing is handled by a wrapper / agent-like workflow. The OCR and charting examples are also likely prebuilt workflows (also in Bard). But it's still great that it works.
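That division of labour can be sketched in a few lines: the model only produces code as text, and a wrapper executes it and collects the result. A stub stands in for the LLM here so the flow is runnable; the function names are invented for illustration.

```python
def fake_model(prompt: str) -> str:
    # Stands in for an LLM call; an LLM would return Python source as text.
    return "result = sum(range(1, 11))"

def run_generated_code(prompt: str):
    """The agent-like wrapper: ask the model for code, then execute it."""
    code = fake_model(prompt)
    namespace = {}
    exec(code, namespace)  # the wrapper, not the model, runs the code
    return namespace.get("result")

print(run_generated_code("Compute the sum of 1..10"))  # 55
```

A real system would add sandboxing and feed errors back to the model for another attempt, but the core loop is this: generate, execute, return.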

----

In the part "What can you do on your own computer?" (around 53:30) you say "you're gonna need to use a GPU", and the ending also implies you must use a GPU. I get OK results and 4+ tokens/s on my cheap laptop with llama.cpp. It's not great (around half the speed of a Mac), but it's on a machine that costs far less than half of a Mac. Some comments on HN mentioned 12-15 tokens/s with 2 x 3090. For the same price one could buy many GPU-less machines whose combined speed might be greater. I'm not saying it's practical, but it's good to know that GPU-less solutions are not orders of magnitude worse, especially not in price/performance.
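A quick back-of-the-envelope illustrates the price/performance point. The tokens/s figures are the ones above; the hardware prices are rough assumptions of mine, not quotes:

```python
def tokens_per_second_per_dollar(tps: float, price_usd: float) -> float:
    """Throughput normalized by hardware cost."""
    return tps / price_usd

# Assumed prices: ~$500 for a cheap laptop, ~$4000 for a 2 x 3090 rig.
cheap_laptop = tokens_per_second_per_dollar(4.0, 500.0)
dual_3090 = tokens_per_second_per_dollar(13.5, 4000.0)

print(f"laptop: {cheap_laptop:.4f} tok/s per $")
print(f"2x3090: {dual_3090:.4f} tok/s per $")
```

Under these assumptions the laptop comes out ahead per dollar, even though it is far slower in absolute terms, which is all I'm claiming.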

Thanks again and I hope you make more videos like this!



