> It's unclear if narrow AI is as powerful as multimodal models with tools, as of yet. Is an LLM which has access to narrow AI "tools" strictly more powerful, capable of running experiments or improving itself? See: AutoGPT, Langchain, et al.

It's probably a spectrum in reality, but I'm quite certain that general LLMs (even when given access to tools) are still considered narrow AI. I can see how that feels pedantic at this point, and I can think of counterexamples myself that strain that point of view.

> It's the general instruction & tool tuned LLMs which are currently changing our expectations of what these models can do.

This seems opinionated as well. Instruction tuning is very cool from a UX perspective, but the success of un/self-supervised deep learning is what changed expectations about these models. The ability of deep learning to generalize, interpolate between data points, and even accurately predict compositions of data points it never saw mixed together (e.g. avocado armchair) is absolutely doing the bulk of the work here. That RLHF and tools/plugins even _work_ is because the base model is so robust.

> Is there any evidence for a "topic specific" LLM being useful?

That's a great question. In general, self-supervised learning works best when your dataset covers a very broad distribution (and you have enough data for the model to learn that underlying distribution). So the bottleneck for "topic specific" LLMs is data. When your humongous web scrape actually captures more of that topic's data than a dedicated dataset would (although it's challenging to filter it out), then yeah, it makes more sense to train the general model and just use it or finetune it for your downstream task.

Model distillation is relevant here, though. If you need a small model that runs on a phone, it might be prudent to treat your general model as a teacher for a much smaller student model. Much of that is still active research.
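
To make the teacher/student idea concrete, here's a rough sketch of the classic soft-target distillation loss: soften both models' logits with a temperature, then penalize the KL divergence from the teacher's distribution to the student's. This is only an illustration in plain Python. The function names, the toy logits, and the temperature value are my own assumptions, and a real setup would typically mix this with an ordinary task loss and run it inside a training loop.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on temperature-softened distributions.

    Multiplying by temperature**2 is the usual convention for keeping
    gradient magnitudes comparable across temperatures. Assumes logits
    are moderate enough that no softened probability underflows to 0.
    """
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = sum(
        pt * (math.log(pt) - math.log(ps))
        for pt, ps in zip(p_teacher, p_student)
        if pt > 0.0
    )
    return kl * temperature ** 2


# Hypothetical logits over a 3-token vocabulary.
teacher = [4.0, 1.0, 0.5]
close_student = [3.5, 1.2, 0.4]
far_student = [0.5, 1.0, 4.0]

loss_close = distillation_loss(close_student, teacher)
loss_far = distillation_loss(far_student, teacher)
```

The student that roughly agrees with the teacher gets a small loss and the one that disagrees gets a larger one, which is the signal the small model would be trained on.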