I'm the opposite - I look at how long that prompt is and I'm amazed that the LLM 'understands' it and that it works so well at modifying its behaviour.
I'm the same. Having a slew of expert-tuned models or submodels or whatever the right term is for each kind of problem seems like the "cheating" way (but also the way I would have expected this kind of thing to work, since you can use the right tool for the job, so to speak). And then the overall utility of the system is how well it detects and dispatches to the right submodels and synthesises the reply.
Having one massive model that you tell what you want with a whole handbook up front actually feels more impressive. Though I suppose it's essentially doing the submodels thing implicitly internally.