You can have the model generate JSON or another machine-readable format and feed it into an API, letting the LLM directly operate whatever software or hardware you want (see the sketch below). You can't remove next-token prediction without fundamentally changing the architecture and losing all of its benefits (unless you invent the next big thing, of course). Each generated token has to be fed back into the model as input before the next one can be predicted. Perhaps if you could simplify your API down to a single float value you could get away with a single step, but I doubt that would work as well. Progress will continue to be made through further training and fine-tuning until another significant discovery is made.
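Here's a minimal sketch of the JSON-to-API idea. The `complete()` helper is a hypothetical stand-in for whatever LLM client you actually use (it returns a canned response here so the snippet runs), and `set_lamp()` is a made-up example of the software or hardware call being driven:

```python
import json

def complete(prompt: str) -> str:
    # Stand-in for a real LLM call; returns a canned response for illustration.
    return '{"device": "lamp", "on": true, "brightness": 50}'

def set_lamp(on: bool, brightness: int) -> None:
    # Stand-in for the actual hardware/software being operated.
    print(f"lamp on={on} brightness={brightness}")

prompt = (
    "Turn the desk lamp on at half brightness. Respond with JSON only, "
    'matching {"device": str, "on": bool, "brightness": int 0-100}.'
)

raw = complete(prompt)
try:
    cmd = json.loads(raw)
except json.JSONDecodeError:
    raise ValueError(f"model did not return valid JSON: {raw!r}")

# Validate before acting -- never trust generated output blindly.
if cmd.get("device") == "lamp" and 0 <= cmd.get("brightness", -1) <= 100:
    set_lamp(bool(cmd["on"]), int(cmd["brightness"]))
```

And here's what "each token has to go back in" looks like in code. This is the generic autoregressive loop, assuming only that `model(tokens)` returns a list of logits over the vocabulary (greedy decoding for simplicity):

```python
def generate(model, tokens: list[int], max_new_tokens: int) -> list[int]:
    for _ in range(max_new_tokens):
        logits = model(tokens)  # full forward pass over the context so far
        next_token = max(range(len(logits)), key=logits.__getitem__)  # greedy argmax
        tokens.append(next_token)  # fed back in before the next prediction
    return tokens
```

The loop is the point: you can't get token N+1 without appending token N to the input first, which is why generation is inherently sequential.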