
I find that maintaining and developing code is not an ideal use case for LLMs, and that it distracts from much more interesting ones.

Any LLM application that relies more or less on a single well-engineered prompt to get things done is entry level and not all that impressive in the big picture: 99% of the heavy lifting is in the foundation model and its next-token prediction. Many code assistants are built this way out of necessity, since they have to support anybody's code. You can't rely on clever prompt-chaining patterns to build optimizations into something like Claude Code, because everyone takes a different approach to their codebase and has wildly differing expectations for how things should go down. Because the range of expectations is so vast, there is a lot of room for disappointment.

The most interesting LLM applications integrate the model directly into the product experience and rely on deep domain expertise to build sophisticated chains of prompts, tool calls, and nested conversations. In these applications, the user's experience and outcomes are mostly predetermined, with the grey areas deliberately left for the LLM to handle. You can measure things and actually act on them: what was the probability of calling one tool over another in a specific context of use? Placing those prompts and statistics alongside domain requirements lets you see where behavior diverges and actually make a difference.
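
To make that concrete, here is a minimal sketch in Python of the kind of measurement I mean: log which tool the model picked in each context of use, then compare the observed distribution against a domain-derived threshold. All the names here (ToolCallStats, refund_request, lookup_order, escalate_to_human, the 20% ceiling) are hypothetical, not from any particular product:

    # Hypothetical sketch: track per-context tool-call distributions
    # so they can be checked against domain requirements.
    from collections import Counter, defaultdict

    class ToolCallStats:
        """Records which tool the LLM selected in each context of use."""

        def __init__(self):
            # context of use -> Counter of tool names
            self.counts = defaultdict(Counter)

        def record(self, context, tool_name):
            self.counts[context][tool_name] += 1

        def distribution(self, context):
            """Observed probability of each tool being called in this context."""
            total = sum(self.counts[context].values())
            return {tool: n / total for tool, n in self.counts[context].items()}

    # Usage: record each tool call the model makes, then audit.
    stats = ToolCallStats()
    stats.record("refund_request", "lookup_order")
    stats.record("refund_request", "lookup_order")
    stats.record("refund_request", "escalate_to_human")

    observed = stats.distribution("refund_request")
    # Hypothetical domain requirement: escalation should stay under 20%.
    if observed.get("escalate_to_human", 0.0) > 0.20:
        print("escalation rate above domain threshold:", observed)

The point is that because the contexts and tools are fixed by the product, these statistics are meaningful and comparable over time, which a general-purpose code assistant supporting arbitrary codebases can't give you.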


