Syncing files between browser and disk using Yjs and the File System Access API

superlopuh · on May 9, 2022

I think CRDTs are super cool, and likely the future of offline-first experiences. My main issue with Taskpaper documents in iCloud Drive has been the awful conflict resolution experience, and I would love an industry standard way of doing the merges automatically, in a syntax-aware way.

Funny you should post this article now, as I'm just getting ready to start on this work myself. I would love recommendations on how to do this automatic conflict resolution on semi-structured text files, if anyone has some!

mfester · on May 9, 2022

Yes, this is an interesting topic. In Motif, we are using MDX for the page content, which has an associated AST and a nice set of tools (Unified.js [1]) to manipulate it. We plan to use this to track semantic changes in the content, and act in an appropriate way. For instance, if the same block of JS code is changed, instead of merging, we can prompt the user with a diff and allow them to edit the final version manually (effectively transitioning from a synchronous to an asynchronous workflow). In simpler scenarios, such as text markup, we can use heuristics like the ones presented in Peritext [2].

[1] https://unifiedjs.com/ [2] https://www.inkandswitch.com/peritext/

superlopuh · on May 9, 2022

Definitely seems like an asynchronous workflow, kind of like with git, is the way. I wonder whether something like this for markdown could be doable, with a GUI being able to render not just the markdown in either of the conflicting versions, but with an editing suggestion, like the ones in Google Docs. Not sure whether this would benefit from an extension of the MDX syntax, or could be handled by the runtime diff of the ASTs.

lagrange77 · on May 10, 2022

Hey, Motif looks really interesting, but I'm a bit confused.

Could you please explain how Motif touches the concepts of 'SSG', 'CMS', 'IDE'?

mfester · on May 10, 2022

Sure! Motif is an MDX editor running in the browser, with a full-fledged JS build system inside (based on esbuild). It allows you to publish your content instantly (in fact, as a Next.js app, deployed on Vercel, benefitting from things like ISR to make your pages fast, SEO-ready, etc.). For instance, the entire Motif website is built on Motif, including the blog, as well as our docs: https://motif.land/docs.

lagrange77 · on May 11, 2022

Cool, thanks!

toomim · on May 10, 2022

> would love an industry standard way of doing the merges automatically

We've been working on a standard for merging conflicts over in the https://braid.org group! This will let different tools (like filesystems and web apps) interoperate with p2p synchronized state.

We've made great progress already: JosephG's Diamond-Types algorithm can synchronize perfectly with Yjs, Automerge, and Sync9, just by swapping out a single function in the source code. Standardized sync is possible!

Come join us if you're interested! We have open video meetings every two weeks.

jitl · on May 9, 2022

I like the approach of using differential synchronization at the edges of a CRDT system. It seems like this approach can work well for plaintext data, or data easily reduced to a fixed set of plaintext fields.

For more advanced formats - like a tree structure document - I’m intimidated by the problem of computing a good semantic diff from the plaintext format, and how to apply that semantic diff to the CRDT format. Adding an `id` to every node in the tree is helpful for this purpose - but that makes it harder to write such documents by hand in plaintext.

Have you encountered any troubles with weird diffs that put documents into invalid states? For example, plaintext updates that somehow merge in a way that breaks the MDX or JS syntax? Do you have or foresee having a “resolve merge conflicts” flow?

mfester · on May 9, 2022

Indeed, there's always the risk of breaking a document, since we are dealing with raw edits to a Markdown/MDX/code file. This is very well described by the Ink & Switch team working on Peritext (https://www.inkandswitch.com/peritext), when it comes to WYSIWYG editor operations (bold, links, etc.). This is a more general issue of making semantically wise decisions in your merge strategy, and it's exacerbated in real-time collaborative settings (we sort of mitigate that in an asynchronous, Git-like workflow by reviewing commits / ensuring they compile before they are merged). We haven't dealt with this yet, but this will be really interesting, especially when it comes to merging code blocks.

robmccoll · on May 9, 2022

This is a fantastic direction. It's interesting to me that you would rather revision control be implemented outside of this replication topology in something like git and that the interface between tools is strictly files rather than the CRDT events themselves. Integrating other applications in the CRDT event stream and maintaining the stream as the revision history would seem more efficient, less error-prone, and more open. Is it an eventual goal to expose something like this?

mfester · on May 9, 2022

The starting point for this work is really: how can we take what exists (text files) and make it work in a CRDT setting, without tainting the data in any way? For instance, if you decide to host your data on Git, we want the repo to look exactly the same as if it were a regular code project. We are not looking to add another standard on top of existing ones. For instance, we are not aiming to enable "CRDT version history" in the file system. This could of course constitute future work (something similar to a .git folder, since CRDTs can be stored efficiently). But our main goal here is to show that browser apps have a way to free their data from the tool, something Excalidraw started doing [1], and which was a source of inspiration to us.

[1] https://blog.excalidraw.com/browser-fs-access/

ankrgyl · on May 9, 2022

This is mind blowing stuff. I was skeptical that you could use the browser effectively as an intermediary to the file system, and that CRDTs would work reasonably well on files, but they seem to have overcome both obstacles. I’m curious, how do you envision this approach working with version control systems? What would it mean to “explore” a branch of a git repository, for example? Would that overwrite the global version of the file system?

mfester · on May 9, 2022

We're in an exploratory phase on this. We have been experimenting with reading the .git folder to determine the current branch, and store this info alongside the CRDT in the client. Still early to say whether this is a fruitful approach or not.
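As a rough illustration of the "read the .git folder to determine the current branch" idea, here is a minimal sketch that parses the text of `.git/HEAD`. It assumes the file contents have already been read into a string (for example through a file handle); the `HeadInfo` type and `parseGitHead` name are hypothetical and not taken from Motif's code.

```typescript
// Hypothetical result type for inspecting the text of .git/HEAD.
type HeadInfo =
  | { kind: "branch"; name: string }
  | { kind: "detached"; commit: string }
  | { kind: "unknown" };

// On a checked-out branch, HEAD holds a symbolic ref such as
// "ref: refs/heads/main".
const BRANCH_PREFIX = "ref: refs/heads/";

function parseGitHead(headText: string): HeadInfo {
  const line = headText.trim();
  if (line.slice(0, BRANCH_PREFIX.length) === BRANCH_PREFIX) {
    return { kind: "branch", name: line.slice(BRANCH_PREFIX.length) };
  }
  // A detached HEAD stores a raw commit hash
  // (40 hex characters for SHA-1, 64 for SHA-256 repositories).
  if (/^[0-9a-f]{40}$|^[0-9a-f]{64}$/.test(line)) {
    return { kind: "detached", commit: line };
  }
  return { kind: "unknown" };
}
```

The branch name returned here could then be stored next to the CRDT state on the client, so that switching branches on disk is not mistaken for an ordinary edit.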

gklitt · on May 9, 2022

I'm excited about this work. Traditionally it's been hard to bridge local files and cloud documents, meaning that cloud data tends to stay siloed in specific editing UIs. This seems like the right first step towards letting traditional file-based editors work with realtime-collaborative cloud data.

Ultimately, in the long term, I think the filesystem probably provides the wrong abstraction for this use case though. The API we really need is "make these changes" (with changes represented thoughtfully in a mergeable way), not "here's the new final state." For now, diffing filesystem states is a reasonable workaround.

mfester · on May 9, 2022

Yes, the goal was to make pragmatic choices in order to make the data freely available, here and now, to other apps (most of which operate on plain text files), despite running in the browser. I do have hope that the POSIX file system still has its place in a "change-aware" setup, for instance by adding a folder, similar to .git, alongside the "final state materialization". Do you see a reason for this not to work?

I'm really thinking in practical terms, i.e. of how we can make this happen incrementally, without forcing a new standard that everyone needs to adhere to in order for it to work.

XavierPladevall · on May 9, 2022

This looks pretty interesting! Really cool to see progress in this space!

Out of curiosity how do you deal with moving files in the file system?

slaymaker1907 · on May 9, 2022

Not the poster, but I wrote a Tiddlywiki plugin that uses these APIs (https://slaymaker1907.github.io/tiddlywiki/plugin-library.ht...) and I detect multiple write scenarios by keeping track of the file hash (this obviously only works for files which can be read into memory). Assuming you have a folder and not just a single file handle, you could scan the folder periodically and compare the known hashes to the ones on disk to handle moving files. Things would get much more complicated if you want to support both moving and modifying files and you'd need a system like Git for detecting renames.
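A minimal sketch of that periodic folder scan, assuming you already have two path-to-hash tables (one remembered from the last scan, one freshly computed from disk). The `HashIndex`, `Move`, and `detectMoves` names are hypothetical. It only pairs files whose content is unchanged, so it does not cover the "moved and modified" case mentioned above.

```typescript
// Hypothetical shape: file path -> content hash (e.g. a hex digest).
type HashIndex = Record<string, string>;

interface Move {
  from: string;
  to: string;
}

function hasPath(index: HashIndex, path: string): boolean {
  return Object.prototype.hasOwnProperty.call(index, path);
}

// Pair each path that disappeared with a newly appeared path that has the
// same hash. Files modified in place keep their path and are ignored here.
function detectMoves(known: HashIndex, onDisk: HashIndex): Move[] {
  const missing = Object.keys(known).filter((p) => !hasPath(onDisk, p)).sort();
  const added = Object.keys(onDisk).filter((p) => !hasPath(known, p)).sort();
  const moves: Move[] = [];
  for (const from of missing) {
    // Find the first unclaimed new path with identical content.
    let match = -1;
    for (let i = 0; i < added.length; i++) {
      if (onDisk[added[i]] === known[from]) {
        match = i;
        break;
      }
    }
    if (match >= 0) {
      moves.push({ from, to: added[match] });
      added.splice(match, 1); // each new path can be claimed only once
    }
  }
  return moves;
}
```

When several files share the same content, the pairing is arbitrary (here, alphabetical), which is one reason a Git-like rename heuristic becomes necessary for anything more ambitious.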

karencarits · on May 9, 2022

Very nice, thank you for making this! It is so satisfying to see new technologies (e.g. the filesystem API) actually solve old problems (e.g. Tiddlywiki saving).

mfester · on May 9, 2022

That's still a shortcoming in the current setup. The API does not yet allow us to "track" a file as it is moved around on disk. So the best thing we can do is upload it from its new destination, keeping the original version where it is (we want to avoid guesswork and ensure no data is ever lost). We can likely optimize this. Would love to see the API evolve so that we can keep persistent file handles even after moving and renaming (including parent folders).

thedg · on May 9, 2022

+1 this looks so cool

oulipo · on May 9, 2022

Great write-up! I was wondering, do you run into any issues with corrupted formatting, like what is described in Peritext?

https://www.inkandswitch.com/peritext/

mfester · on May 9, 2022

No, we currently don't run into the issues that Peritext is addressing, simply because we are dealing with plain text. These issues will come up when we start working on WYSIWYG editing, but if we keep plain text as the underlying data format, it should not affect the setup described here.

alexarena · on May 10, 2022

I've been using this for a bit and it's incredibly cool to write plain markdown files in VS Code and see them reflected immediately online. Well done + exciting step forward in Bring Your Own Client.

mfester · on May 10, 2022

Awesome! Happy to hear this :)

eternityforest · on May 10, 2022

File system access is going to make some amazing things possible. With that plus service workers, the web is finally catching up to native.

I'm glad to see that other browsers (except FF) have it.

lharries · on May 9, 2022

The demo is incredible and really cool to see you supporting the open source contributors behind the libraries you are using. Is it possible to handle going offline?

mfester · on May 9, 2022

Yes that's one of the goals, and CRDTs are pretty much built for this. Also, browsers are starting to support Progressive Web Apps (PWAs), which enables the websites/apps themselves to be opened while offline.

stedman · on May 10, 2022

> hybrid browser-cloud-file data architectures

Epic.

Thanks for open sourcing this!

mfester · on May 10, 2022

With pleasure. Feedback welcome! :)

gailees · on May 9, 2022

Wow. How long did it take yall to pull this off?

mfester · on May 9, 2022

Once we figured out the "CRDT trick", i.e. simulating file changes as CRDT update operations, it was surprisingly quick to implement. If you look at our repo [1], you will see that the code is fairly straightforward and succinct. But this is very much thanks to the excellent work of Kevin Jahns on Yjs [2], which has made it a breeze to work with CRDTs in an efficient way!
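To make the "file change as CRDT update operations" idea concrete, here is a minimal sketch, not the code from the repo. It assumes you hold the last-synced text and the new text read from disk, and it swaps in a very simple technique (trimming the common prefix and suffix) in place of a real diff algorithm. `TextEdit`, `diffToEdit`, and `applyEdit` are hypothetical names.

```typescript
// Hypothetical description of one contiguous change to a text.
interface TextEdit {
  index: number;       // position where the change starts
  deleteCount: number; // number of characters removed at that position
  insert: string;      // text inserted at that position
}

// Derive a single delete+insert edit by trimming the common prefix and
// suffix of the two versions. Returns null when nothing changed.
function diffToEdit(oldText: string, newText: string): TextEdit | null {
  if (oldText === newText) return null;
  const maxPrefix = Math.min(oldText.length, newText.length);
  let prefix = 0;
  while (prefix < maxPrefix && oldText.charCodeAt(prefix) === newText.charCodeAt(prefix)) {
    prefix++;
  }
  // The suffix must not overlap the prefix in either string.
  const maxSuffix = maxPrefix - prefix;
  let suffix = 0;
  while (
    suffix < maxSuffix &&
    oldText.charCodeAt(oldText.length - 1 - suffix) ===
      newText.charCodeAt(newText.length - 1 - suffix)
  ) {
    suffix++;
  }
  return {
    index: prefix,
    deleteCount: oldText.length - prefix - suffix,
    insert: newText.slice(prefix, newText.length - suffix),
  };
}

// Plain-string stand-in for applying the edit to a shared text type.
function applyEdit(text: string, edit: TextEdit): string {
  return text.slice(0, edit.index) + edit.insert + text.slice(edit.index + edit.deleteCount);
}
```

With Yjs, such an edit could be applied to a `Y.Text` inside `doc.transact(...)` by calling `ytext.delete(edit.index, edit.deleteCount)` followed by `ytext.insert(edit.index, edit.insert)`. A single prefix/suffix edit is coarse: two separate changes in one file collapse into one large replacement, and positions are counted in UTF-16 code units, so a finer-grained diff would merge better in practice.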

[1] https://github.com/motifland/yfs [2] https://yjs.dev/

alexmasmej · on May 10, 2022

Congrats! Love it

mfester · on May 11, 2022

Thank you!

tangjeff0 · on May 9, 2022

Amazing work!

mfester · on May 9, 2022

Thanks!

timconnors · on May 10, 2022

super slick write up. nice job!

mfester · on May 10, 2022

Thank you!

sigmonsays · on May 9, 2022

no firefox love?

mfester · on May 9, 2022

We wish! Currently, the API is supported on Chrome 86+, Edge 86+ and Opera 72+: https://developer.mozilla.org/en-US/docs/Web/API/File_System.... It's all quite new, and we hope other browsers will follow.
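For apps that want to degrade gracefully on browsers without the API, a feature check is usually enough. A minimal sketch, assuming the presence of the `showDirectoryPicker` function on the global object is a good enough signal; `supportsFileSystemAccess` is a hypothetical helper name, and it takes the object to inspect as a parameter so it can be exercised outside a browser.

```typescript
// Returns true when the given global-like object exposes the directory
// picker entry point of the File System Access API.
function supportsFileSystemAccess(scope: object): boolean {
  const candidate = (scope as Record<string, unknown>)["showDirectoryPicker"];
  return typeof candidate === "function";
}

// In a browser this would be called as supportsFileSystemAccess(window),
// falling back to a plain upload/download flow when it returns false.
```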

iggldiggl · on May 11, 2022

Mozilla doesn't even want to give add-ons some sort of file system access API (which means that since the switch to webextensions, mass downloaders/download managers have become rather gimped and cannot download files to outside of the system download folder without prompting for each individual download, can't intelligently handle conflicts between new downloads and already existing files [1], etc. etc.), so I wouldn't hold my breath for them allowing such an API for websites in general…

[1] With the Webextensions API, you have to decide on an action before starting the download, and the only choices are "overwrite", "rename the new download" or "prompt the user". There's no "skip the download" option for example, and Firefox doesn't even support the "prompt the user" option, either.

aaaaaaaaata · on May 9, 2022

Love from Firefox/Mozilla for PWA-related APIs is what's behind schedule.

This is a controversial and debated topic.