Yes, team size really changes the dynamics. We ran into similar issues when reviews started taking over entire days. I’m building and testing a quieter review approach that only goes deeper when asked, and I’m trying to understand where that tradeoff works best. Would you like to try it out? Your feedback would help me improve it further.
That makes sense. A lot of these tools seem to become useful once the extra noise is turned down. I’m actually building and testing a quieter review approach myself and trying to understand what really helps in day-to-day use. Would you like to give it a try?
I’ve run into the same wall. The issue isn’t that AI “can’t code well”; it’s that it’s bad at large, aesthetic refactors unless you constrain the problem very tightly.
A few things that have helped me get closer to that 15–20 minute review window:
1. Stop asking it to translate a whole route
AI struggles when it has to infer intent, architecture, and taste at once. Instead, I break things down explicitly:
- one prompt for semantic HTML structure only
- one prompt for component boundaries
- one prompt for state and data flow
I often tell it “do not worry about styling or completeness, just structure”.
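To make that first pass concrete, this is roughly what I mean by “semantic HTML structure only”. The route and names here are invented for illustration; the point is that there is no styling, no state, and no data wiring yet:

```svelte
<!-- First pass, structure only: semantic landmarks for a hypothetical order-history route. -->
<!-- No styling, no data wiring; component boundaries and state come in later prompts. -->
<main>
  <header>
    <h1>Order history</h1>
  </header>

  <section aria-label="Filters">
    <!-- filter controls in a later pass -->
  </section>

  <section aria-label="Orders">
    <ul>
      <li>
        <article><!-- one order per item; likely extracted into its own component later --></article>
      </li>
    </ul>
  </section>
</main>
```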
2. Give it examples of what “idiomatic” means to you
Before asking it to generate anything, I paste a small, finished Svelte component from my codebase and say “match this level of abstraction and composition”. Without this, it will default to generic Svelte patterns.
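The anchor doesn’t need to be fancy. Something like this hypothetical OrderCard (names and props invented) is enough to communicate prop shape, composition via slots, and how small I want components to be:

```svelte
<!-- Hypothetical OrderCard.svelte: the kind of small, finished component I'd paste as a style anchor. -->
<script lang="ts">
  export let order: { id: string; placedAt: string; total: number };
</script>

<article>
  <h2>Order {order.id}</h2>
  <p><time datetime={order.placedAt}>{new Date(order.placedAt).toLocaleDateString()}</time></p>
  <p>Total: {order.total.toFixed(2)}</p>
  <slot name="actions" />
</article>
```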
3. Ask for constraints, not solutions
Prompts like:
“Do not use boolean flags for view switching. Use composable components.”
work better than:
“Refactor this to be idiomatic Svelte.”
The more you forbid, the better the output.
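As a sketch of that specific constraint in practice (Svelte 4 style, component names invented): view switching with one explicit view value and a component map instead of a pile of booleans.

```svelte
<!-- Instead of juggling showList / showDetails / showEditor flags that can drift out of sync. -->
<script lang="ts">
  import OrderList from './OrderList.svelte';
  import OrderDetails from './OrderDetails.svelte';
  import OrderEditor from './OrderEditor.svelte';

  // One explicit view value; the markup stays declarative.
  type View = 'list' | 'details' | 'editor';
  let view: View = 'list';

  const views = { list: OrderList, details: OrderDetails, editor: OrderEditor };
</script>

<nav>
  <button on:click={() => (view = 'list')}>Orders</button>
  <button on:click={() => (view = 'details')}>Details</button>
</nav>

<svelte:component this={views[view]} />
```

The child views stay dumb; whatever triggers navigation just sets `view`.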
4. Use AI as a first-pass reviewer, not an author
I’ve had more success writing the initial version myself, then asking AI to:
- point out anti-patterns
- suggest component splits
- identify places where logic can move to the server
This tends to align better with how humans actually refactor.
5. Accept that taste is still human
Things like semantic HTML, component boundaries, and long-term maintainability are closer to “design” than “coding”. Current models can assist, but they won’t replace your judgment yet.
One small thing that’s helped me personally is using quieter, review-focused AI tools instead of chat-style copilots. I’ve been building and using a GitHub app called MergeMonkey that gives short, structured PR feedback and only goes deeper when you ask. It’s been useful as a second set of eyes once the code exists, rather than trying to generate everything from scratch.
I’ve been working on a small GitHub app called MergeMonkey.
It’s an AI PR reviewer, but intentionally quiet. It runs on new commits, gives a short structured summary (and diagrams for complex changes), and only goes deeper when explicitly triggered via PR comments.
The main idea is to avoid noisy reviews and black-box behavior. You bring your own model via OpenRouter, so you can choose the trade-offs yourself.
It’s still an MVP and I’m mostly looking for feedback from people who review PRs regularly.