
It’s not just about saving money or making fewer mistakes; it’s also about iteration speed. I can’t believe this process is remotely comparable to aider.

In aider everything is loaded in memory: I can add and drop files in the terminal, discuss in the terminal, switch models, and run terminal commands by prefixing them with !. Every change is a commit.
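
A typical session looks something like this (commands from memory, so treat it as a sketch; check /help in aider for the current set):

  /add src/app.py     # load a file into the chat context
  /drop src/app.py    # drop it again
  /model gpt-4o       # switch the active model mid-session
  !pytest -x          # run a shell command without leaving the chat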

The full codebase is more expensive and slower than the relevant files. I understand it when you don’t worry about the cost, but at any reasonable size, pasting the full codebase can’t really be a thing.


I’m on my 5th project with this workflow, and they are of different types too:

- an embedded project for esp32 (100k tokens)

- visual inertial odometry algorithm (200k+ tokens)

- a web app (60k tokens)

- the tool itself mentioned above (~30k tokens)

It has worked well enough for me. Other methods have not.


Use a tool like repomix (npm), which has extensions in some editors (at least VSCode), that can quickly bundle source files into a machine-readable format.
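
For example, from the repo root (from memory; see the repomix README for current flags):

  npx repomix    # bundles the repo's source into one file (repomix-output.xml by default, IIRC) to paste into a model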


I have read 30 MCP articles now and I still don’t understand why we don’t just use an API?


MCP allows LLM clients you don’t control—like Claude, ChatGPT, Cursor, or VSCode—to interact with your API. Without it, you’d need to build your own custom client using the LLM API, which is far more expensive than just using existing clients like ChatGPT or Claude with a $20 subscription and teaching them how to use your tools.

I built an MCP server that connects to my FM hardware synthesizer via USB and handles sound design for me: https://github.com/zerubeus/elektron-mcp.
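
To give a sense of how small the server side is, here is roughly what a minimal tool server looks like with the official MCP Python SDK (a sketch from memory; the server name and tool are made up for illustration):

  # pip install mcp
  from mcp.server.fastmcp import FastMCP

  mcp = FastMCP("synth-tools")  # hypothetical server name shown to clients

  @mcp.tool()
  def set_patch(name: str) -> str:
      """Load a named patch on the synth (stub for illustration)."""
      return f"loaded {name}"

  if __name__ == "__main__":
      mcp.run()  # speaks MCP over stdio; the client launches and talks to it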


But couldn't you just tell the LLM client your API key and the url of the API documentation? Then it could interact with the API itself, no?


Not all clients support that—currently it’s limited to ChatGPT custom GPT actions. It’s not a standard. Fortunately, Anthropic, Google, and OpenAI have agreed to adopt MCP as a shared protocol to enable models to use tools. This protocol mainly exists to simplify things for those building LLM-powered clients like Claude, ChatGPT, Cursor, etc. If you want an LLM (through API calls) to interact with your API, you can’t just hand it an API key and expect it to work—you need to build an agent for that.


In some sense that is actually what MCP is: a way to document APIs and describe how to call them, along with some standardized tooling to expose that documentation and make the calls. MCP hit a sweet spot of just enough abstraction to wrap APIs without complicating things. Of course, since they didn't add a bunch of extra stuff ... that leads to users being able to footgun themselves, per the article.


You could do that. But then you need to explain to the LLM how to do the work every time you want to use that tool.

And you also run into the risk that the LLM will randomly fail to use the tool "correctly" every time you want to invoke it. (Either because you forgot to add some information or because the API is a bit non-standard.)

All of this extra explaining and duplication is also going to waste tokens in the context and cost you extra money and time since you need to start over every time.

MCP just wraps all of this into a bundle to make it more efficient for the LLM to use. (It also makes it easier to share these tools with other people.)

Or if you prefer: consider that the first time you use a new API, you can give these instructions to the LLM and have it use your API. Then you tell it "make me an MCP implementation of this" and you can reuse it easily in the future.
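
That "MCP implementation of this" usually ends up as a thin wrapper over the same HTTP calls, something like (sketch only; the endpoint and tool are hypothetical):

  import requests
  from mcp.server.fastmcp import FastMCP  # official MCP Python SDK

  mcp = FastMCP("weather")

  @mcp.tool()
  def get_weather(city: str) -> str:
      """Fetch current weather from a hypothetical REST API."""
      r = requests.get("https://api.example.com/weather", params={"city": city})
      r.raise_for_status()
      return r.text  # the tool result goes straight back into the model's context

  if __name__ == "__main__":
      mcp.run()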


> You could do that. But then you need to explain to the LLM how to do the work every time you want to use that tool

This reeks of a fundamental misunderstanding of computers and LLMs. We have a way to get a description of APIs over HTTP; it's called an OpenAPI spec. Just like how MCP retrieves its tool specs over MCP.

Why would an LLM not be able to download an OpenAPI spec + key and put it into the context, like MCP does with its custom schema?
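
Mechanically nothing stops it; a sketch (the URL is hypothetical, and you still have to wire up the request/response loop yourself):

  import requests

  spec = requests.get("https://api.example.com/openapi.json").json()

  system_prompt = (
      "You can call this API; here is its OpenAPI spec:\n"
      f"{spec}\n"
      "Reply with the HTTP request you want me to make."
  )
  # Then loop: send the prompt, execute whatever request the model asks for,
  # and feed the response back. That loop is exactly what MCP standardizes.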


> Why would an LLM not be able to download an OpenAPI spec + key and put it into the context, like MCP does with its custom schema?

NIH syndrome, probably.


Yes, but then you have to add that yourself to every prompt. It would be nice to tell your LLM provider just once "here is a tool you can use", along with a description of the API documentation, so that you could use it in a bunch of different chats without having to remind it every single time. That way, when you want to use the tool you can just ask for it without having to provide that detail again and again.

Also, it would be kind of cool if you could tell a desktop LLM client how to connect to a program running on your machine. It's a similar kind of thing to want to do, but you have to do a different kind of process exec depending on which OS you are running. But maybe you just want it to ultimately run a Python script or something like that.

MCP addresses those two problems.
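
For the first problem, e.g., Claude Desktop makes it a one-time entry in its config file (shape from memory; the name and path are placeholders):

  {
    "mcpServers": {
      "my-tool": {
        "command": "python",
        "args": ["/path/to/server.py"]
      }
    }
  }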


Yes but it's not as revolutionary as MCP. You don't get it...


elektron user here. wow thank you :)


ChatGPT still doesn't support MCP. It has really fallen behind Google and Anthropic in recent months in most categories. Gemini Pro blows o1-pro away.


> why we don't just use an API

Did you mean to write "an HTTP API"?

I asked myself this question before playing with it a bit. And now that I have a slightly better understanding, I think the main reason is that it was created as a way to give your LLM access to your local resources (files, env vars, network access...). So it was designed to be something you run locally that the LLM has access to.

But there is nothing preventing you from making an HTTP call from an MCP server. In fact, we already have some proxy servers for this exact use case [0][1].

[0] - https://github.com/sparfenyuk/mcp-proxy

[1] - https://github.com/adamwattis/mcp-proxy-server


I'm not sure I get it either. I get the idea of a standard API to connect one or more external resource providers to an LLM (each exposing tools + state). Then I need one single standard client-side connector to allow the LLM to talk to those external resources: basically something to take care of the network calls or other forms of I/O in my local (LLM-side) environment. Is that it?


Sounds mostly correct. The standard LLM tool call 'shape' matches the MCP tool call 'shape' very closely. It's really just a simple standard to support connecting a tool to an agent (and by extension an LLM).

There are other aspects, like Resources, Prompts, Roots, and Sampling. These are all relevant to that LLM<->Agent<->Tools/Data integration.

As with all things AI right now, this is a solution to a current problem in a fast moving problem space.


I have an API, but I built an MCP around my API that makes it easier for something like Claude to use — normally something that's quite tough to do (giving special tools to Claude).


Because you mainly need a bridge between the function-calling schema you expose to the AI model and the actual tools you want it to leverage. The model needs a gateway, as an API can't be used directly.

MCP's core power is the TOOLS, and tools need to translate to function calls; that's mainly what MCP does under the hood. Your tool can be an API, but you need that translation layer (function call ==> tool), and MCP sits in the middle.

https://platform.openai.com/docs/guides/function-calling
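
For reference, the function-calling side looks roughly like this with the OpenAI Python SDK (a sketch; the tool is a placeholder):

  from openai import OpenAI

  client = OpenAI()
  tools = [{
      "type": "function",
      "function": {
          "name": "get_weather",
          "description": "Get the current weather for a city",
          "parameters": {
              "type": "object",
              "properties": {"city": {"type": "string"}},
              "required": ["city"],
          },
      },
  }]
  resp = client.chat.completions.create(
      model="gpt-4o",
      messages=[{"role": "user", "content": "Weather in Paris?"}],
      tools=tools,
  )
  # The model answers with a tool call; MCP's job is routing that call
  # to a real tool and returning the result.
  print(resp.choices[0].message.tool_calls)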


I played around with MCP this weekend and I agree. I just want to get a user's X and then send X to my endpoint so I can do something with it. I don't need any higher-level abstraction than that.


If you are a tool provider, you need a standard protocol for the AI agent frontends to be able to connect to your tool.


I think the commenter is asking "why can't that standard protocol be http and open api?"


MCP is a meta-API, and it basically is that, but with the further qualification that the endpoints themselves and how they work are part of the spec, so LLMs can work with them better.


I think it's fine if you only need a standalone API or know exactly which APIs to call. But when users ask questions or you're unsure which APIs to use, MCP can solve this issue—and it can process requests based on your previous messages.


It is an API. You can implement all of this from scratch with the raw requests library in Python if you want. It's the idea of a standard around information interchange, specifically geared toward agentic experiences like Claude Code (and previously tools like aider, which are much worse): it's like a FastAPI-style web framework for building stuff that helps LLMs, VLMs, and model-wrapped software speak over the network.

Basically Rails-for-Skynet.

I'm building this: https://github.com/arthurcolle/fortitude
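
To the "raw requests" point: under the hood MCP is JSON-RPC 2.0, so a bare-bones client is something like this (sketch only; a real client does an initialize handshake first, and the URL is hypothetical):

  import requests

  URL = "http://localhost:8000/mcp"  # hypothetical MCP server over HTTP

  def rpc(method, params=None, id=1):
      msg = {"jsonrpc": "2.0", "id": id, "method": method, "params": params or {}}
      return requests.post(URL, json=msg).json()

  print(rpc("tools/list"))  # ask the server which tools it exposes
  print(rpc("tools/call", {"name": "echo", "arguments": {"text": "hi"}}, id=2))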


Right. And if you use OpenAPI the agent can get the api spec context it needs from /openapi.json.


I feel like this is being pushed to get more of the system controlled on the provider's side. In a few years, Anthropic, Google, etc. might start turning off the API, similar to how Google made it very difficult to use IMAP/SMTP with Gmail.


It is an API. It's an API standardization for LLMs to interact with outside tools.


I just want to mention something in a chat in 5 seconds instead of preparing the input data, sending it to the API, parsing the output for the answer, and then doing it all again for every subsequent message.


Here's the kicker. It is an API.



Personal anecdote: getting an Airthings that reminds me to open the window (along with a Pushover API setup) has probably been my biggest productivity improvement in 10 years.
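
For anyone curious, the glue is tiny; something like this (the Airthings read is stubbed out, and the Pushover tokens are placeholders):

  import requests

  def read_airthings_co2() -> int:
      # Placeholder: poll your Airthings device/API here.
      return 1200

  if read_airthings_co2() > 1000:  # a common "open a window" threshold
      requests.post("https://api.pushover.net/1/messages.json", data={
          "token": "APP_TOKEN",  # Pushover application token (placeholder)
          "user": "USER_KEY",    # Pushover user key (placeholder)
          "message": "CO2 is high - open a window",
      })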


It's kinda worrying that you need an app/appliance to remember to open the window. A rule of thumb is to have the windows open at least 10 minutes a day, every day.


Sometimes I’m just too deep in the zone while working and I forget, which gives me 10 more minutes in the zone, but less focused work overall through the day.


You realize a lot of us live in places where that would be very expensive? Not to mention that the HVAC plants are usually not sized with that in mind.


Why is opening the window expensive? Climate control?


Yep, it's cold outside. So there's a CO2, energy, comfort (choose two) tradeoff.


Did we run out of textual tasks that are easy for humans but hard for AI, or why are the examples all graphics?


You can easily convert these tasks to token strings. The reason why ARC does not use language as part of its format is that it seeks to minimize the amount of prior knowledge needed to approach the tasks, so as to focus on fluid intelligence as opposed to acquired knowledge.

All ARC tasks are built entirely on top of "Core Knowledge" priors, the kind of elementary knowledge that a small child has already mastered and that is possessed universally by all humans.
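
For example, an ARC grid is just small integers, so turning a task into tokens is trivial (the encoding below is illustrative, not an official one):

  # An ARC grid is a 2D array of colors encoded as ints 0-9.
  grid = [
      [0, 1, 0],
      [1, 1, 1],
      [0, 1, 0],
  ]

  # One possible flat text encoding, rows separated by "/":
  tokens = "/".join(" ".join(str(c) for c in row) for row in grid)
  print(tokens)  # "0 1 0/1 1 1/0 1 0"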


Can you explain to me? Would the token strings be as easy to solve for humans as well?

Or let me ask differently. Can we still design text questions that are easy for humans and tough for AI?


ARC tasks are language-independent


To those that have tested it and liked it: I feel very confident with Sonnet 3.7 right now; if I could wish for something, it would be for it to be faster. Most of the problems I’m facing are execution problems; I just want AI to do them faster than me coding everything on my own.

To me it seems like o1-pro would be better used as a switch-in tool or to double-check your codebase than as a constant coding assistant (even at a lower price), as I assume I would need to get a tremendous amount of work done, including domain knowledge, to make up for the estimated 10x speed advantage of Sonnet?


o1-pro can be very useful but it's ridiculously slow. If you find yourself wishing Sonnet 3.7 was faster, you really won't like o1-pro.

I pay for it and will probably keep doing so, but I find that I use it only as a last resort.


I have to agree this is a bit too simple to be anything of substance. That is not what agentic really means. This is basically plugging ChatGPT into Zapier.

When you work with agentic LLMs you should worry about prompt chaining, parallel execution, decision points, loops, and more of these complex decisions.

People who didn’t know what was in the first article shouldn’t use Pocketflow; they should go with n8n or even Zapier.


I do agree with what you said except the first sentence. The design of the graph is super important. Pocketflow is for those with a technical background.


ugh I like it but I had to close the tab right away


Yes, the confidence tbh is getting a bit out of hand. I see the same thing when coding on our SaaS: once the problems get bigger, I find myself more often than not going back to coding the old way rather than "fixing the AI’s code", because the issues are often too much.

I think communicating certainty better could help, especially when they talk about docs or 3rd-party packages etc. Regularly even Sonnet 3.7 just invents stuff...


Thanks hn community for being so smart


I can't tell if that's sincere or one of the sickest burns on this site.


I remember when I was in school, our religion teacher told us that 200 years ago there were still people who knew almost everything that was known to humanity back then, and that this wasn’t possible anymore.

However, now I feel like it’s possible again.

https://www.phind.com/search/cm73yk4wc0002336u6brx4jp1

It helps greatly to learn new topics.


> It helps greatly to learn new topics.

I guess when you want to learn 33% of something, but be told you learned all of it.

The link used to say there are 2 quarks, though it is now simply a display of poor security management.


When I click on that link it gives me this meme image for some reason as the first image in the response. What did you see when you made the query?

https://www.citizen.org/wp-content/uploads/these-new-ai-frie...


There seems to be some mixup with cached images.

https://i.imgur.com/Bw9DlxW.png


I think those people 200 years ago mostly succeeded by having a very narrow definition of what was considered 'known' to 'humanity'.


For anyone arriving: Phind has an issue where anyone can edit those links, and the edits persist.

I'll tag whoever was claiming to be from Phind up above.

I verified this by copying the link in the OP to my computer and opening it there; it had my edits, not the 1=2 edits, and assuredly not what jwpapi wanted to link.


While partly debated, the first part sounds like a prime candidate for the Dunning-Kruger effect in (historic) action.


Having studied the history of science for basically my entire adult life, I'm pretty sure you'd have to go back many thousands of years for this to be true.


Such confidence ratings over unavailable/lost/missing data seem like a luxury from where I'm from (science-wise). Does historical analysis come with heuristic privileges over other fields in that regard?


There was certainly a time when discrete knowledge didn't collectively aggregate over generations.

But even in prehistoric times there were things like boat making, midwifery, metallurgy, sea navigating, animal husbandry, farming and seed cultivation, tool and weapon making, navigational shortcuts, the wide variety of spoken languages, medicinal knowledge, knowledge of fresh water sources and hunting techniques ...

Squaring the number to 40,000 years isn't far back enough. 200,000 years ago, for instance, people wove grass beddings out of insect-repellent plants, wore clothes, had shoes ... Seafaring goes back at least 130,000 years.

Specialization and collaboration are a core part of being human.

