Gave Claude Code, Gemini CLI, and Codex CLI identical instructions: analyze 13 years of writing across three blogs (two of them in my regional, non-English language) and create a style guide.
Observations:
1. Model-task matching matters. Codex's default code-specialized model struggled with writing analysis. Switching to GPT-5 improved output quality 4x.
2. Autonomy settings affect completion. Gemini with limited autonomy produced incomplete work—it kept pausing for approvals mid-task.
3. All three claimed "done." Output varied from 198 to 2,555 lines. Never trust completion claims without verification.
4. Deep reading beat clever shortcuts. Codex took an API-first approach (RSS, JSON endpoints). Valid methodology, but missed nuances that Claude caught by reading posts directly.
Claude won at 9.5/10, but the more interesting finding was how much configuration affected the other two agents' scores.
Full analysis with methodology in the post linked.
Totally agreed. I tried agents for a lot of things (I started creating a team of agents: architect, frontend coder, backend coder, and QA). I spent around 50 USD on a failed project; the context got contaminated and the project eventually had to be rewritten.
Then I moved some parts into rules and some into slash commands, and got much better results.
Subagents are like freelance contractors (I know, I have been one very recently): good when they need little handoff (real-time handoff isn't possible), little oversight, and when their results are treated as advice, not action. They don't know what you are doing, and they don't care what you do with the information they produce. They just do the work for you while you do something else, or you wait for them to produce independent results. They come and go with little knowledge of existing functionality, but they are good on their own.
Here are three agents I still keep, and one I am working on.
1: Scaffolding: I create (and sometimes destroy) a lot of new projects, so I use a scaffolding agent when I am trying something new. It starts with a fresh one-line instruction saying what to scaffold (e.g. a new Docker container with Hono and a Postgres connection, a new Cloudflare Worker that connects to R2, D1, and AI Gateway, or an AWS serverless API Gateway with SQS that does this, that, and the other) and where to deploy it. At the end it sets up the project structure, creates a GitHub repo, and commits it for me. I take it forward from there.
2: Triage: When I face an issue that is not obvious from reading the code alone, I give the agent the location and some logs, and it uses whatever is available (including DB data) to make a best guess at why the issue happens. I have often found these agents work best when they are not biased by recent work.
3: Pre-Release QA: This agent tests the entire system (essentially running the whole integration and end-to-end test suite) to make sure the product doesn't break anything existing. I am now adding a capability to let it see the original business requirements and check whether the code satisfies them. I want this agent to be my advisor, helping me decide whether something goes to the release pipeline or not.
4: Web search (experimental): Sometimes a search is too costly in tokens for the main context, and we only need the end result, not the queries it ran or the ten pages it found along the way...
I am working on a PromptLibrary (https://promptlib.prashamhtrivedi.in/) to organise my prompts and make them accessible from multiple clients (including chatbots via Chrome extensions, CLIs, and IDE extensions, among others).
I wanted a library to store my own prompts once and retrieve them in multiple places (i.e. try something in Claude Desktop, and once I iron out the edges, load it in Roo Code or Claude Code and use it), to give a prompt some variables and create infinite versions of the same prompt by providing values, or to keep versions of each prompt.
Currently I have the landing page; soon (within 10 days at most) I will make it live for everyone to use.
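The variable idea is essentially template substitution. Here is a minimal sketch using only Python's stdlib `string.Template`; the prompt text and variable names are hypothetical examples, not PromptLibrary's actual API:

```python
from string import Template

# One stored prompt with $variable placeholders; many concrete versions.
prompt = Template(
    "You are a $role. Review the following $language code and "
    "point out up to $max_issues issues:\n\n$code"
)

# Each set of values yields a new version of the same prompt.
filled = prompt.safe_substitute(
    role="senior reviewer",
    language="Python",
    max_issues=3,
    code="print('hello')",
)
print(filled)
```

`safe_substitute` leaves any missing placeholders intact instead of raising, which is handy when a client fills variables in stages.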
I've worked on SourceSailor, a CLI tool that tries to tackle this exact problem, though I should note it's still in early stages. While it can't yet fully map complex codebases like the Linux kernel (that's a significant challenge), it does provide some useful capabilities for understanding smaller to medium-sized codebases.
SourceSailor generates a structural understanding of your codebase and creates reports about dependencies and project architecture. It leverages LLMs (OpenAI, Anthropic, or Gemini) for analysis and allows you to ignore files you don't want to analyze (following how .gitignore is used and parsed) to focus on relevant parts of the codebase.
However, I should be clear about its limitations:
- It's not yet as interactive as Cursor or Aider, and I am not planning to make it so.
- Large codebases (like Linux) would be challenging due to the token limits of current LLMs. Gemini's long context may help, but we all know about its privacy-policy shenanigans.
- The analysis is high-level rather than detailed implementation specifics. It helps you understand the codebase and tries to explain the interesting parts, but YMMV...
If you're specifically looking to understand massive codebases like Linux, SourceSailor isn't straightforward yet, and you will need workarounds. But if you're working with small to medium projects and need help understanding their structure and dependencies, it might be worth trying.
The project is open source if you want to check it out or contribute: https://github.com/PrashamTrivedi/SourceSailor-CLI
Tried Cursor twice, went back to VSCode, as recently as last month.
I had built my workflow around VSCode: I use the `code` CLI, devcontainers, and many plugins. The first time I tried Cursor, it didn't have devcontainer support (IIRC that was last year). This time the devcontainer support is there, but the CLI support is nowhere to be found. And I am sticking with VSCode because I have quality-of-life customizations in and around it. Right now I am experimenting with a programmable workspace (e.g. adding and removing directories at the whim of a command).
I have some shortcuts and some scripts that call `code` directly with arguments, all in WSL. I will go back to trying Cursor once I can use `cursor` the way I use `code`. Till then, for AI-assisted coding, I am happy with Aider...
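As a sketch of what I mean by a programmable workspace: the VS Code CLI has an `--add` flag that folds folders into the last active window, so a small script can build and run that invocation. The directory paths below are placeholders:

```python
def code_add_command(dirs: list[str]) -> list[str]:
    """Build the VS Code CLI call that adds folders to the
    last active window: `code --add <dir> ...`"""
    return ["code", "--add", *dirs]

cmd = code_add_command(["~/projects/api", "~/projects/shared-libs"])
print(" ".join(cmd))
# To actually run it (requires `code` on PATH):
#   subprocess.run(cmd, check=True)
```

Removing a directory has no symmetric flag, so in practice that side is done by rewriting a `.code-workspace` file instead.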
I've built SourceSailor [1], an open-source Node-based CLI [2] that leverages LLMs (OpenAI, Anthropic, and Google Gemini) to analyze and document codebases. Key features:
- Quick project structure and dependency analysis
- Smart file exclusion (respects .gitignore)
- Tailored analysis based on user expertise
- README generation from codebase insights
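The .gitignore-respecting exclusion can be approximated with stdlib `fnmatch`. This is a sketch of the idea only, not SourceSailor's actual implementation, and it ignores gitignore subtleties like negation (`!pattern`) and anchored paths:

```python
from fnmatch import fnmatch

def load_ignore_patterns(gitignore_text: str) -> list[str]:
    """Parse .gitignore-style lines, skipping comments and blanks."""
    patterns = []
    for line in gitignore_text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            patterns.append(line.rstrip("/"))
    return patterns

def is_ignored(path: str, patterns: list[str]) -> bool:
    """True if the whole path or any path segment matches a pattern."""
    parts = path.split("/")
    return any(
        fnmatch(path, pat) or any(fnmatch(part, pat) for part in parts)
        for pat in patterns
    )

patterns = load_ignore_patterns("# deps\nnode_modules/\n*.log\n")
print(is_ignored("node_modules/lodash/index.js", patterns))  # True
print(is_ignored("src/index.ts", patterns))  # False
```

Matching each path segment (not just the full path) is what makes a bare `node_modules/` entry exclude everything beneath it.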
Recent updates include enhanced CLI aesthetics and multi-LLM-provider support. Docs and additional tests are in progress.
Curious to hear your thoughts or feature suggestions! (And if you visit, I'd love to know if you came from this HN thread.)
I've spent time on both sides of the API divide – consuming as a mobile dev and producing as a backend dev. This experience crystallized a key principle: share specifications, not execution details. I've written about how this plays out in the real world, especially when juggling tools like Swagger and Postman.
The post covers:
- The inherent tension between Swagger's spec-focused approach and Postman's execution flexibility
- Real-world consequences of over-sharing execution details in API documentation
- How our team adapted our API development process to better adhere to this principle
- Thoughts on what an ideal API documentation tool might look like, combining the strengths of both approaches
I'm curious to hear from the HN community: How do you balance sharing API specs without getting bogged down in execution details? Have you found effective ways to use Swagger and Postman (or other tools) together?
I am not a Python developer, and I don't intend my career to go there in the near future. But I asked ChatGPT what's wrong with this code, and (not sure whether it's my custom instructions or not) it always starts by assuming the imports are the issue.
Once I told it the imports were not the issue, it correctly pointed out and explained the problematic code to me...
I wish they could ask the LLM a follow-up question and have the issue pointed out...
An ex-Android developer here, who spent 7 years in Android, moved to the cloud 5 years ago, and doesn't miss anything about Android.
The innovative period of Android development, where you could do anything, is gone forever because of greedy corporations who can get away with any abuse of the system. And many of those "new" APIs are not being implemented, either because the business doesn't encourage them or because teams aren't aware of the features. Since the feature launched, the apps I have developed or seen developed that implement shortcuts number exactly two. In the same period I have seen 12-14 apps developed and launched with more than 7 screens and 8 defined workflows; they have yet to ship a widget or a shortcut, and they spent 2-3 production versions (with heavy marketing) without dark themes. The story is more or less the same with every user-facing feature. This makes more than half of these apps just websites written in Kotlin instead of a JS framework. At least websites don't have to fight with the Play Store and its restrictive policies.
Plus, Google's confusing era isn't helping anyone. I haven't seen excitement around any product or ecosystem launched since 2018 except Flutter and Firebase. Assistant, Instant Apps: there are many "launches" that were almost dead on arrival. For a long time Google has been like a startup that throws things at the wall to see what sticks and winds down the rest, except that these wind-downs affect hundreds of developers and thousands of users. That is affecting Android too.
"I haven't seen excitement around any product and ecosystem launched since 2018 except flutter and Firebase."
Hard disagree, lol. Android Jetpack was launched in 2018. It was basically the second generation of Android development: a proper "tech stack" instead of everyone cobbling together third-party libraries. Architectures switched over to the one true MVVM, instead of being a mess of MVC, MVP, etc.
Jetpack Compose is pretty amazing; it cut our LOC by about half because we don't need adapters for every list. Kotlin coroutines & Flow are nice: you get reactive programming with no more complex nested if/else conditions. We're not using Fragments or Activities much now either; navigation is via Compose, meaning DI is less necessary. So builds can be much faster too with a 2023 stack, and you can get UI changes rendered in the emulator immediately without even needing to rebuild.
- Perplexity AI. I tried to work with Bing, but it isn't a tool to rely on: it outright refuses to answer a question, or it removes the answer midway. I discovered Perplexity AI, purchased Pro (ChatGPT Pro doesn't accept any of my credit cards here), and it's now also part of my workflow.