> when dealing with a lot of unknowns it's better to allow divergence and exploration
I completely agree, though I'm personally sitting out all of these protocols/frameworks/libraries. In 6 months time half of them will have been abandoned, and the other half will have morphed into something very different and incompatible.
For the time being, I just build things from scratch, which–as others have noted¹–is actually not that difficult, gives you an understanding of what goes on under the hood, and doesn't tie you to someone else's innovation pace (whether it's higher or lower).
¹ https://fly.io/blog/everyone-write-an-agent/
I recently heard that when automobiles were new, the USA quickly ended up with around 80 competing car manufacturers. Within a couple of decades, the market figured out what customers actually wanted and which styles and features mattered, and the ecosystem consolidated down to 5 brands.
The same happened with GPUs in the 90s. When Jensen Huang founded Nvidia, there were 70 other companies selling graphics cards that you could put in a PCI slot. Now there are 2.
Reading the "Why another CRDT / OT library?" section, I like that you seem to have taken a "Pareto approach": going for a simpler solution, even if it's not theoretically perfect. In the past few months I've been building a local-first app, and I've been a bit overwhelmed by the complexity of CRDTs.
The goal I have with my app is to allow syncing between devices via Dropbox / Google Drive / iCloud or any other file-syncing service that the user is already using. I don't want to host a sync server for my users, and I don't want my users to need to self-host either.
Do you think it would be possible to use Dropbox as the sync "transport" for DocNode documents? I'm thinking: since a server is needed, one device could be designated as the server, and the others as clients. (Assuming a trusted environment with no rogue clients.)
1. Do you care about resolving concurrent conflicts? That is, if two users modify the same document simultaneously (or while one is offline), is it acceptable if one of their changes is lost? If that’s not a problem, then using Dropbox or something similar is totally fine, and you don't need a CRDT or OT like DocNode at all. Technologies like Dropbox aren't designed for conflict resolution and can't be integrated with CRDT or OT protocols.
2. If you do want to resolve conflicts, you have two options.
(a) Use a CRDT, which doesn’t require a server. One downside is that clients must be connected simultaneously to synchronize changes. Personally, I don’t think most people want software that behaves like that, and that’s one of the reasons I didn’t focus on building a CRDT. If you’re going to end up needing a server anyway, what’s the point?
(b) Use a server, either hosted by you or by your users. The good news is that it’s extremely simple. With DocNode Sync, you can literally set it up with one line of code on the server.
That downside doesn't apply if you're using a CRDT with "a server as an always-present client". But in that scenario, DocNode will be more efficient.
I think I do want to resolve conflicts. My use case is a personal database, which simplifies things a bit: sync happens between the devices of a single person, so concurrent offline changes are unlikely to occur.
What I have in mind is a setup like the one from this experiment: https://tonsky.me/blog/crdt-filesync/. I don't know if it's at all possible in my use case, though, or–in case it is possible–whether it ends up being practical. As you said, the resulting user experience might be so strange that it's not something users want.
Anyway, thanks again for the info and good luck with DocNode. :)
A "bring your own cloud" provider could be used for DocNode Sync. For example, Dropbox.
Dropbox doesn't have the ability to resolve conflicts, but it can be used to deterministically store the order of operations. Then, clients would reconcile and combine the operations into a single state file.
Authentication might be a bit tricky, since permissions would have to reside elsewhere. But I think it's doable.
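As a rough illustration of how that could work (a sketch only: the file layout and all names here are my assumptions, not DocNode's actual API), each client appends operations as uniquely-named files to the synced folder, and every client rebuilds state by applying the files in lexicographic order:

```typescript
import { promises as fs } from "fs";
import path from "path";

// Hypothetical operation shape; a real protocol would define this.
interface Operation {
  counter: number; // per-client logical counter
  clientId: string; // unique per device
  payload: unknown;
}

const OPS_DIR = "/Users/me/Dropbox/myapp/ops"; // folder synced by Dropbox

// Each client appends operations as uniquely-named files. Dropbox never has
// to merge anything: two clients writing concurrently just produce two files.
async function appendOperation(op: Operation): Promise<void> {
  // Zero-padded counter + clientId makes every file name globally unique and
  // gives a deterministic lexicographic order that all clients agree on.
  const name = `${String(op.counter).padStart(10, "0")}-${op.clientId}.json`;
  await fs.writeFile(path.join(OPS_DIR, name), JSON.stringify(op));
}

// To reconcile, a client reads all operation files, sorts them by name, and
// applies them in that order. Same order everywhere => same resulting state.
async function rebuildState<S>(
  initial: S,
  apply: (state: S, op: Operation) => S
): Promise<S> {
  const files = (await fs.readdir(OPS_DIR)).sort();
  let state = initial;
  for (const file of files) {
    const raw = await fs.readFile(path.join(OPS_DIR, file), "utf8");
    state = apply(state, JSON.parse(raw) as Operation);
  }
  return state;
}
```

The key property is that files are only ever added, never edited, so the file-syncing service never has to resolve a conflict itself; deciding what two concurrent operations mean is left entirely to the apply function.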
I see MCP as fundamentally limited: even if we had an LLM that knew how to use it perfectly, at the end of the day MCP workflows are integrations between many different APIs that were not designed to be composed and to work together.
What if $TOOL_X needs $DATA to be called, but $TOOL_Y only returns $DATA_SUBSET? What happens when $TOOL_Z fails mid-workflow, after $TOOL_W has already executed?
> What if $TOOL_X needs $DATA to be called, but $TOOL_Y only returns $DATA_SUBSET? What happens when $TOOL_Z fails mid-workflow, after $TOOL_W has already executed?
Aren't these situations that current models are quite good at handling?
No, but mainly because passing actual data through the context is janky and expensive. So it's only useful as a one-off at most.
This whole idea of doing unsupervised and unstructured work with unstructured data at scale with some sort of army of agents or something sounds ridiculous to me anyway. No amount of MCP or prompting or whatever is going to solve it.
Like, if interesting problems are on the boundary of the obvious and the chaotic, this is just some janky thing that's way too far into the chaotic regime. You won't get anywhere unless you magically solve the value function problem here.
If $TOOL_X needs $DATA and that data is not available in the context, nor from other tools, then the LLM will determine that it cannot invoke $TOOL_X. It won't try.
About the $TOOL_Z and $TOOL_W scenario: it sounds like you're asking about the concept of a distributed unit of work, which is not considered by MCP.
> If $TOOL_X needs $DATA and that data is not available in the context, nor from other tools, then the LLM will determine that it cannot invoke $TOOL_X. It won't try.
I didn't explain myself very well, sorry. What I had in mind is: MCP is about putting together workflows using tools from different, independent sources. But since the various tools are not designed to be composed, scenarios occur in which in theory you could string together $TOOL_Y and $TOOL_X, but $TOOL_Y only exposes $DATA_SUBSET (because it doesn't know about $TOOL_X), while $TOOL_X needs $DATA. So the capability would be there if only the tools were designed to be composed.
Of course, that's also the very strength of MCP: it allows you to compose independent tools that were not designed to be composed. So it's a powerful approach, but inherently limited.
> About the $TOOL_Z and $TOOL_W scenario: it sounds like you're asking about the concept of a distributed unit of work, which is not considered by MCP.
Yes, distributed transactions / sagas / etc., which are basically impossible to do with "random" APIs not designed for them.
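To make that concrete, here's a minimal saga sketch (hypothetical types, in TypeScript): the pattern only works because every step ships with a compensating action, which is exactly what arbitrary, independently-designed tools don't give you.

```typescript
// A saga is a sequence of steps where each step knows how to undo itself.
interface SagaStep<Ctx> {
  run: (ctx: Ctx) => Promise<void>;
  // The compensating action. An API not designed for sagas simply doesn't
  // expose this, and the LLM can't conjure it out of thin air.
  compensate: (ctx: Ctx) => Promise<void>;
}

async function runSaga<Ctx>(steps: SagaStep<Ctx>[], ctx: Ctx): Promise<void> {
  const completed: SagaStep<Ctx>[] = [];
  for (const step of steps) {
    try {
      await step.run(ctx);
      completed.push(step);
    } catch (err) {
      // $TOOL_Z failed after $TOOL_W already executed: undo in reverse order.
      for (const done of completed.reverse()) {
        await done.compensate(ctx);
      }
      throw err;
    }
  }
}
```

With "random" MCP tools, the compensate half of each step usually doesn't exist, so a mid-workflow failure just leaves the side effects of the earlier tools in place.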
I've done something similar in a React project I'm working on to avoid dealing with the insanity that is react-router.
Call me naïve, but routing in a single-page application is just not that hard of a problem. At its core, it's about having a piece of state¹ (your active route) which determines which part of the app you want to render–something you can do with a switch statement². On top of that, you want to synchronize that state with the page URL³.
Doing it yourself requires more boilerplate code, no question about it. But it's not that much code tbh (and not very complex either), and you get back control over that important piece of state, which otherwise remains opaque and difficult to work with–i.e., its "shape" is pre-determined by the routing library you use. For example, react-router doesn't support parallel routes.
I also basically re-invent a tiny 50-line router for every web project. I hardly ever go beyond "Map URL -> Page" and "Map URL + Parameters -> Page with properties", and when I do, knowing 100% how it works helps a lot.
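For what it's worth, roughly this, as a minimal React/TypeScript sketch (all route names and page components are hypothetical): parse the URL into route state, render with a switch, and keep the two in sync via pushState/popstate.

```tsx
import { useEffect, useState } from "react";

// Hypothetical route state; its "shape" is entirely under your control.
type Route =
  | { kind: "home" }
  | { kind: "post"; id: string }
  | { kind: "not-found" };

// Map URL -> Page: parse the pathname into route state.
function parseRoute(pathname: string): Route {
  if (pathname === "/") return { kind: "home" };
  const post = pathname.match(/^\/posts\/([^/]+)$/);
  if (post) return { kind: "post", id: post[1] };
  return { kind: "not-found" };
}

// ...and back: serialize route state into a URL.
function routeToPath(route: Route): string {
  switch (route.kind) {
    case "home": return "/";
    case "post": return `/posts/${route.id}`;
    case "not-found": return "/404";
  }
}

// Hypothetical pages, just to keep the sketch self-contained.
const PostPage = ({ id }: { id: string }) => <p>Post {id}</p>;
const NotFoundPage = () => <p>Not found</p>;

export function App() {
  const [route, setRoute] = useState<Route>(() => parseRoute(location.pathname));

  // Stay in sync with the browser's back/forward buttons.
  useEffect(() => {
    const onPopState = () => setRoute(parseRoute(location.pathname));
    window.addEventListener("popstate", onPopState);
    return () => window.removeEventListener("popstate", onPopState);
  }, []);

  const navigate = (next: Route) => {
    history.pushState(null, "", routeToPath(next));
    setRoute(next);
  };

  // The switch statement: route state determines what renders.
  switch (route.kind) {
    case "home":
      return <button onClick={() => navigate({ kind: "post", id: "42" })}>Open post 42</button>;
    case "post":
      return <PostPage id={route.id} />;
    case "not-found":
      return <NotFoundPage />;
  }
}
```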
I also agree it isn't a hard problem, but personally I'd say you've got the flow the wrong way around. You don't want to "synchronize state to the page URL" but rather treat the page URL as something you create state from, so it works both when you navigate there by clicking anchor tags and when the user manually enters the URL, and it gets a bit easier to manage.
Basically, URL is the top-level state, and from there you derive what page, then render that page, rather than the other way around.
Yeah, implementing it with data flowing one-way only from the URL to the state is cleaner.
Conceptually, however, I prefer to think of my state as being at the center of things. I mean, that's where I define (via types) what the state is. The URL is just one serialization of that state that is convenient to use in a web browser (making it work with links, back/forward buttons, etc.). Maybe in another environment another serialization would be needed. Or maybe no serialization would be needed at all (making it a memory router).
If you are just working on a small website and you don't care about SEO or supporting more than, I dunno, a thousand users, then sure. But if you ever expect to do more than that, I might call you naive, yeah. One big thing: what if you want to support SSR, which I think is a pretty basic requirement these days? I'd submit that supporting SSR will take you a bit more than 50 lines.
That's why I emphasized _routing in a single page application_. If one needs SEO, a client-rendered single page application is the wrong choice, regardless of the router.
> One big thing: what if you want to support SSR, which I think is a pretty basic requirement these days?
I agree it's a basic requirement for a certain class of apps and websites, but there are tons of apps for which SSR is not relevant and even detrimental (in the sense that it adds complexity that is not offset by the benefits it brings).
Pros:
- It exposes the "database metaphor": your data is organized in collections of documents, each collection having a well-defined schema.
- It's all local in an app (no server component to self-host).
- It has an AI assistant on top that you can use to explore / create / update.
- It allows you to create small personal apps (e.g., a custom dashboard).
- It allows you to sync data from external sources (Strava, Google Calendar, Google Contacts).
Cons:
- The database metaphor is quite "technical". A "normal" user is not comfortable with the idea of creating their own collections, defining a schema, etc. In fact, right now I only have developers and techies as a target audience.
- It's not optimized for any one use case. So, for example, as a note-keeping app, Notion is obviously much better.
- It's still in early stages (I'm working on it alone), so:
  - There's no mobile app yet.
  - It doesn't yet support syncing between devices.
  - There are just 3 connectors to sync from external sources.
I agree when talking about Facebook and other ad-tech companies. But "my data" is also my runs that I track with Garmin, my notes in Notion, my meals on MyFitnessPal, my events on Google Calendar...
I want to have _that_ data locally. And why are all these companies making it so incredibly difficult for me to get it? It's MY data after all!
But the default approach of every app is:
> We'll manage your data for you, in our cloud! Ah, btw, you'll only have access to it when online, and only for as long as you pay the subscription fee. Also, if we go out of business, sorry, it's gone.
> <fineprint> You _can_ request a copy of it (damn GDPR), but only once every 30 days, it'll take us 48 hours to prepare it, and we'll send (some of) it to you as a badly-formatted CSV carefully crafted to make it as useless as possible. </fineprint>
> > We'll manage your data for you, in our cloud! Ah, btw, you'll only have access to it when online, and only for as long as you pay the subscription fee. Also, if we go out of business, sorry, it's gone.
Gone for you, that is. Advertisers and "partners" can still have it. Storage is cheap.
> Imagine a world where your data isn’t trapped in distant data centers. Instead, it’s close to home—in a secure data wallet or pod, under your control. Now imagine pairing that with a loyal personal AI assistant, a private, local tool that lives with you, learns from you (with your permission), and acts on your behalf. Your AI. Not theirs.
This is almost exactly what I say on the landing page¹ of the product I'm building (an open-source personal database, with an AI assistant on top).
I want to believe this can be a reality, and I'm trying to make it become one, but there are two significant challenges:
1. AI = cloud. Taking my app as an example, it'll be at least 2 years before consumer hardware can run the smallest model that performs somewhat decently (gpt-oss-20b). And of course, in 2 years that model will be beyond obsolete. Would a regular user pay the price of a subpar experience in order to get data ownership and privacy? It's a very hard sell.
2. Apps/services are very jealous of their users' data. As a user, I have to jump through incredible hoops just to get a point-in-time copy of my data, if I can get it at all. There is no incentive for apps to let users own their data. On the contrary, it's better if they don't, so they remain locked into the app. Also, regular Joe and Jane users aren't really asking for access to their data, because there's no benefit in it for them either.
That is, I think, the key to overcoming challenge #2: giving regular Joes and Janes an immediate and obvious benefit. If they see that only by owning their data can they do $INCREDIBLY_VALUABLE_THING, they will start demanding access to it from companies, or they will jump through the hoops to get it. (That's the way I'm going about it. I'm nowhere near the end goal, of course, but I see promising results.²)
I have no idea how to overcome challenge #1 yet, mainly because currently there aren't really any big downsides to using cloud models. Or, at least, we haven't seen them yet. Maybe if OpenAI starts injecting ads into GPT-8 responses, people will reconsider and switch to a "stupider" but local, ad-free model.
(Sorry for the shameless self-promotion.) I'm building an app _conceptually similar_, but with an AI on top, so you get a chat/assistant with your personal context. https://github.com/superegodev/superego (Warning: still in alpha.)
Really nice. Thanks for self promo! Will definitely keep an eye on your project.
What is the ideal final state you want to achieve?
Do you agree that data capture is the main issue here?
My latest experiments:
2 days ago I started capturing screenshots of my Mac every 5 seconds, to later use them to approximately tell me what I'm doing. There are a lot of issues with this approach.
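(For reference, the capture loop itself can be tiny. A minimal sketch of what such a loop might look like in Node/TypeScript, assuming macOS's built-in screencapture CLI; the output directory is made up:)

```typescript
import { execFile } from "child_process";
import { mkdirSync } from "fs";
import path from "path";

const OUT_DIR = "/tmp/screen-log"; // hypothetical output directory
mkdirSync(OUT_DIR, { recursive: true });

// Capture one screenshot with macOS's built-in `screencapture` CLI.
// -x suppresses the shutter sound.
function capture(): void {
  const file = path.join(OUT_DIR, `${Date.now()}.png`);
  execFile("screencapture", ["-x", file], (err) => {
    if (err) console.error("capture failed:", err);
  });
}

setInterval(capture, 5_000); // every 5 seconds
```

The hard part isn't capturing, of course; it's turning thousands of screenshots into an accurate account of what you were doing.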
Yesterday I set up ActivityWatch. It captures a lot of stuff out of the box. TBD whether it can capture a YouTube video playing in the background, in addition to the active tab.
The main value I want to extract is being able to see what the plan was vs. what actually happened. And if I can make what actually happens closer to the plan, that's a win.
But capturing what you're thinking about, what you're working on, etc. is quite challenging, and a lot of it happens offline, on the phone, on another computer, in messengers, in email, etc.
My quick take: companies have been closing off their data for decades, and now it will come back to bite them. AI needs as full a context as possible across devices, services, and programs to be as powerful as it can be, and the current architecture works against that. Maybe one day AI will be powerful enough to ETL all this data for itself, but for now it's painful to try to build something like this.
Hey! Your project looks very similar, from a conceptual point of view, to what I'm doing with https://github.com/superegodev/superego. Would you like to have a chat about it? Email in my profile, if you're interested.
An open-source, local database which collects all your personal data, hooks it to an LLM (BYO), and gives you an assistant that can answer any question about your life.
It also allows you to vibe-code (or just code) small apps on top of your data (e.g., your custom dashboard for your expenses).