I've actually written my own homebrew framework like this, which is a.) cli-coder agnostic and b.) leans heavily on git worktrees [0].
The secret weapon of this approach is asking for 2-4 solutions to your prompt, run in parallel. This helps avoid the most time-consuming aspect of ai-coding: reviewing a large commit, only to find that the approach the AI took is hopeless or requires major revision.
By generating multiple solutions, you avoid investing fully in the first one; instead you can use clever ways to select from the 2-4 candidate solutions and usually just apply a small tweak at the end (see the sketch below). Anyone else doing something like this?
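Roughly, the fan-out loop looks like this (a hand-rolled sketch, not agro's actual interface; `your-cli-coder` is a stand-in for aider / claude-code / gemini-cli):

```python
import subprocess

# Sketch: fan one prompt out to N throwaway worktrees so each
# candidate solution lands on its own branch.
BRANCHES = ["sol-1", "sol-2", "sol-3"]   # 2-4 candidates

for name in BRANCHES:
    # isolated branch + working directory per candidate
    subprocess.run(["git", "worktree", "add", "-b", name, f"../wt-{name}"],
                   check=True)
    # launch the agent non-interactively inside that worktree
    # (command and flag are illustrative placeholders)
    subprocess.Popen(["your-cli-coder", "--prompt-file", "task.md"],
                     cwd=f"../wt-{name}")
```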
There is a related idea called "alloying," where the 2-4 candidate solutions are pursued in parallel with different models, yielding better results than any single model. Very interesting ideas.
I've been doing something similar: aider+gpt-5, claude-code+sonnet, gemini-cli+2.5-pro. I want to try coder-cli next.
A main problem with this approach is summarizing the different candidates before drilling down into reviewing the best one.
Looking at a `git diff --stat` across all the model outputs can give you a good measure of whether there was an existing common pattern for your requested implementation. If only one of the models adds code to a module that the others do not, it's usually a good jumping-off point for exploring the differing assumptions each of the agents built towards.
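Something like this is enough to scan the candidates (branch names are placeholders matching the sketch above):

```python
import subprocess

# Compare `git diff --stat` summaries across candidate branches to
# spot outliers before reading any diff in full.
for branch in ["sol-1", "sol-2", "sol-3"]:
    stat = subprocess.run(
        ["git", "diff", "--stat", f"main...{branch}"],
        capture_output=True, text=True, check=True,
    ).stdout
    print(f"=== {branch} ===\n{stat}")
    # a module touched by only one candidate usually marks a
    # differing assumption worth investigating first
```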
This reminds me of an approach in MCMC where you run multiple chains at different temperatures and then share the results between them (replica exchange MCMC sampling), the goal being not to get stuck in one “solution”.
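For anyone unfamiliar, a toy sketch of replica exchange (the bimodal target and temperature ladder are invented for illustration):

```python
import math
import random

# Toy replica exchange (parallel tempering) on a bimodal 1-D target:
# hot chains roam widely, and occasional state swaps let the cold
# chain escape local modes, loosely analogous to cross-pollinating
# several candidate solutions instead of committing to the first one.

def log_target(x):
    # two well-separated Gaussian bumps
    return math.log(math.exp(-(x - 3) ** 2) + math.exp(-(x + 3) ** 2))

temps = [1.0, 2.0, 4.0, 8.0]                     # one chain per temperature
states = [random.uniform(-5, 5) for _ in temps]

for _ in range(10_000):
    # Metropolis update within each chain (tempered acceptance)
    for i, t in enumerate(temps):
        prop = states[i] + random.gauss(0, 1)
        log_a = (log_target(prop) - log_target(states[i])) / t
        if random.random() < math.exp(min(0.0, log_a)):
            states[i] = prop
    # propose swapping states between a random adjacent pair
    i = random.randrange(len(temps) - 1)
    log_a = (1 / temps[i] - 1 / temps[i + 1]) * (
        log_target(states[i + 1]) - log_target(states[i]))
    if random.random() < math.exp(min(0.0, log_a)):
        states[i], states[i + 1] = states[i + 1], states[i]

print("cold chain ended near:", round(states[0], 2))
```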
My favorite pattern I've found is to write encode implementations manually; the AI is then pretty easily able to follow that logic and translate it into a decode function.
Watch how the "Cumulative encoding" row grows each iteration (that's where the BTC address will be encoded) and then look at the other rows for how the algorithm arrives at that.
In particular, under v0.1.0 see the `decode-branch.md` prompt and its associated generated diff, which implements memoization for backtracking while performing decoding.
It's a tight PR that fits the existing codebase and works well; you just need a motivating example you can reproduce, which helps you quickly determine whether the proposed solution is working. I usually generate 2-3 solutions initially and then filter them quickly with a test case. And as you can see from the prompt, it's far from well formatted or comprehensive, just a "slap dash" listing of potentially relevant information, similar to what would be discussed at an informal whiteboard session.
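The memoization-for-backtracking pattern itself is generic. A self-contained toy (the names and the digit encoding here are mine, not from that diff):

```python
from functools import lru_cache

# Memoized backtracking decode: split an ambiguous digit stream into
# valid "ranks" 1..26. lru_cache solves each position once, so
# dead-end branches aren't re-explored on backtracking.

def decode(stream: str) -> list[int] | None:
    @lru_cache(maxsize=None)
    def go(pos: int) -> tuple[int, ...] | None:
        if pos == len(stream):
            return ()                      # consumed everything: success
        for width in (1, 2):               # backtrack over chunk sizes
            chunk = stream[pos:pos + width]
            if len(chunk) == width and chunk[0] != "0" and int(chunk) <= 26:
                rest = go(pos + width)     # memoized subproblem
                if rest is not None:
                    return (int(chunk),) + rest
        return None                        # dead end (cached)

    out = go(0)
    return list(out) if out is not None else None

print(decode("1226"))  # [1, 2, 2, 6]  (first valid parse found)
```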
I'm actually working on something similar to this, where you can encode information into the outputs of LLMs via steganography: https://github.com/sutt/innocuous
Since I'm really only looking to sample the top ~10 tokens, and I mostly test on CPU-based inference of 8B models, there probably isn't much worry about getting a different ordering of the top tokens across hardware implementations. But I'm still going to take a look at it eventually, and build in guard conditions against any choice that would be changed by an epsilon of precision loss.
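One hedged sketch of what such a guard could look like (the tolerance and function names are invented, not from innocuous):

```python
# Hypothetical guard: only treat a top-k token choice as stable if its
# logit is separated from its neighbors by more than an epsilon, so a
# tiny hardware-dependent precision difference can't reorder the list.
EPS = 1e-3  # invented tolerance

def stable_topk(logits: list[float], k: int = 10) -> list[int] | None:
    ranked = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)
    top = ranked[: k + 1]
    # any adjacent pair closer than EPS could swap order across backends
    for a, b in zip(top, top[1:]):
        if logits[a] - logits[b] < EPS:
            return None   # ambiguous ordering: skip this step / fall back
    return ranked[:k]

print(stable_topk([3.2, 1.0, 3.1995, 0.5], k=2))  # None: top-2 gap too small
```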
I think the "magic" is that we've found a common toolset of methods - embeddings and layers of neural networks - that reveals useful patterns and relationships across a vast array of corpora, both unstructured analog-sensor data (pictures, video, point clouds) and symbolic data (text, music), and that we can combine these across modalities, as CLIP does.
It turns out we didn't need a specialist technique for each domain: there was a reliable way to architect a model that could learn on its own, and we could already use the datasets we had; they didn't need to be generated through surveys or experiments. This would have seemed like magic to an AI researcher working in the 1990s.
"Unstructured data learners and generators" is probably the most salient distinction for how current system compare to previous "AI systems" examples (NLP, if-statements) that OP mentioned.
At a meta-level, I wonder if there's an under-discussed advantage in poaching ambitious talent out of an established incumbent to work on a new product line in a new organization, in this case Apple Silicon disrupting Intel/AMD. We've also seen SpaceX do this to NASA/Boeing, and OpenAI do it to Google's ML departments.
It seems like large, unchallenged organizations like Intel (or NASA, or Google) collect all the top talent out of school. But changing budgets, shifting business objectives, and frozen product strategies make it difficult for emerging talent to really work on next-generation technology (those projects have already been assigned to mid-career people who "paid their dues").
Then someone like Apple with its M-series chips, or SpaceX with the Falcon 9, comes along and poaches the people most likely to work "hardcore" (not optimizing for work/life balance), while also giving the new product a high degree of risk tolerance and autonomy. Within a few years, the smaller upstart organization has opened up an un-closeable performance gap with the behemoth incumbent.
Has anyone written about this pattern (beyond Innovator's Dilemma)? Does anyone have other good examples of this?
I'm not sure it really takes that kind of breakthrough approach. Apple chips are more energy efficient, but x86 can be much faster on CPU or GPU tasks, and it's much more versatile. A main "bug and feature" here is that the PC industry relies on common-denominator standards and components, whereas Apple has gone vertical with very limited expansion options. This matters particularly for memory speed, where the standards are developed and factories upgraded over years at huge cost.
I gather it's very difficult and expensive to make a board that supports more channels of RAM, so that seems worth targeting at the platform level. Eight-channel RAM using commodity DIMMs would transform PCs for many tasks; for now, though, gamers are the main market force, and they don't really care about memory speed.
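Some napkin math shows why channel count is the lever here (assuming DDR5-5600 and Apple's published peak figures; real sustained bandwidth is lower):

```python
# Rough theoretical peak bandwidth: channels * transfer rate (MT/s) * 8 bytes.
def peak_gbps(channels: int, mts: int = 5600) -> float:
    return channels * mts * 8 / 1000  # GB/s

print(peak_gbps(2))   # ~89.6 GB/s  : typical dual-channel desktop
print(peak_gbps(8))   # ~358.4 GB/s : the eight-channel board proposed above
# For comparison, Apple quotes 400 GB/s (M2 Max) and 800 GB/s (M2 Ultra).
```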
[0]: https://github.com/sutt/agro