So just because all the parts are collaborating to get the desired outcome, and specific aspects of that outcome cannot be attributed to specific parts of the LLM, you think we don’t understand LLMs? With mixture-of-experts systems we’re introducing dedicated subsystems into the LLM, each responsible for a specific aspect of it, so we’re partially moving in that direction.
But overall, in my opinion, if devs are able to rebuild it from scratch to a predefined outcome, and even know how to change the system to improve certain aspects of it, then we do understand how it works.
Being able to make it and knowing how it works are just not the same thing. I can make bread of consistent quality, and know how to improve certain aspects of it. To know how it works, I’d need to get a doctorate in biochemistry and then still have a fairly patchy understanding. I know plenty of people who can drive a car very successfully but would never claim they know how it works.
But with LLMs is there really more to understand? They’re just large functions that take numerical input and transform it into numerical output based on trained weights. There is nothing behind the scenes doing things we don’t understand. The magic is in the weights, and we know how to create these based on training data.
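To make the "just a large function of trained weights" framing concrete, here is a toy sketch of the idea: a deterministic numeric function from token ids to next-token probabilities, parameterized entirely by weight matrices. Everything here is illustrative (random weights, mean pooling standing in for attention), not a real transformer:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "LLM": a pure function of its learned weights.
VOCAB, DIM = 50, 8
W_embed = rng.normal(size=(VOCAB, DIM))   # token embeddings (would be learned)
W_out = rng.normal(size=(DIM, VOCAB))     # output projection (would be learned)

def next_token_logits(token_ids):
    """Map a sequence of token ids to one score per vocabulary entry."""
    h = W_embed[token_ids].mean(axis=0)   # crude pooling in place of attention
    return h @ W_out                      # logits over the vocabulary

logits = next_token_logits([3, 14, 7])
probs = np.exp(logits - logits.max())
probs /= probs.sum()
print(probs.argmax())  # index of the "predicted" next token
```

The point of the sketch: the mechanics (matrix multiplies, softmax) are fully understood; the open question is what the trained values of `W_embed`/`W_out`-style weights collectively compute.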
Regarding the car: if you know how to build a car, you understand how a car works. A driver is more like someone using an LLM, not a developer able to create one.
> But with LLMs is there really more to understand?
Yes! Loads! (: I want to be able to make statements like "this model will never tell the user to kill themselves" and be confident in them, but I can't do that today, and we don't know how. Note that we do know how to prove similar statements about regular software.
> With mixture-of-experts systems we’re introducing dedicated subsystems into the LLM, each responsible for a specific aspect of it
Common misconception: MoEs do have different "experts", but the model *learns* when to send input to which expert, and it does not cleanly send coding tasks to a coding expert, physics tasks to a physics expert, etc. The routing is quite messy, and not nearly as interpretable as we'd want it to be.
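A minimal sketch of what that routing actually looks like (top-1 gating; all names, shapes, and weights here are made up for illustration). The router is just another learned matrix, so which expert fires for a token depends on opaque hidden features, not on any human-legible task label:

```python
import numpy as np

rng = np.random.default_rng(1)

DIM, N_EXPERTS = 8, 4
W_router = rng.normal(size=(DIM, N_EXPERTS))            # learned gating weights
experts = [rng.normal(size=(DIM, DIM)) for _ in range(N_EXPERTS)]

def moe_forward(x):
    """Send each token vector to its top-1 expert by gating score."""
    scores = x @ W_router                   # (tokens, experts) gating scores
    choice = scores.argmax(axis=-1)         # winning expert per token
    out = np.stack([x[i] @ experts[c] for i, c in enumerate(choice)])
    return out, choice

tokens = rng.normal(size=(5, DIM))          # 5 token vectors entering the layer
out, choice = moe_forward(tokens)
print(choice)  # expert id per token; nothing marks one as the "coding expert"
```

Nothing in the training objective forces `choice` to line up with topics a human would name, which is why inspecting expert assignments rarely yields a clean "physics expert".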