I’m building a local-first web app, and SQLite works well for my case since a single project can be contained in one database file, just like users are used to with existing desktop applications.
What I’d really like is an easy way to sync the SQLite database state to a cloud service. Most existing options expect you to query against a remotely hosted database and charge per read/write.
Since the database will have around 100,000 rows and you're typically working with all the data at once, streaming parts of it doesn’t make sense for my use case.
The closest I’ve found is Turso, which has offline writes in private beta, and SQLite Cloud, which lists local-first and offline sync as "coming soon."
The simplest approach might be letting users push to S3 storage with versioning. Ideally, it would also support point-in-time restores, tracking incremental updates alongside full snapshots.
Even better, I’d manage minimal server-side infrastructure and just pull the SQLite database from a service that handles syncing and management.
SQLite has a session extension that can record changes made to a local database into a changeset, and you can replay those changes on another SQLite instance. Note that it replays what the changes were, not the queries that produced them. When applying changes you provide a conflict handler. (You can also invert changesets, which makes for a handy undo/redo feature.)
You can save conflicts to another changeset. There is also a rebaser to help deal with multi-way syncing.
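To make that concrete, here is a minimal sketch of the session API in C. It assumes SQLite was built with SQLITE_ENABLE_SESSION and SQLITE_ENABLE_PREUPDATE_HOOK; the "notes" table and file names are made up for illustration, and error checking is omitted.

    /*
     * Minimal sketch of the session extension: record changes made on one
     * database and replay them on another. Assumes SQLITE_ENABLE_SESSION
     * and SQLITE_ENABLE_PREUPDATE_HOOK; the "notes" table and file names
     * are made up, and error checking is omitted.
     */
    #include <sqlite3.h>

    /* Conflict handler: called for each change that cannot apply cleanly. */
    static int on_conflict(void *ctx, int reason, sqlite3_changeset_iter *it) {
        (void)ctx; (void)it;
        /* Take the incoming row on a data conflict; keep the local row
           (omit the change) for every other kind of conflict. */
        return reason == SQLITE_CHANGESET_DATA ? SQLITE_CHANGESET_REPLACE
                                               : SQLITE_CHANGESET_OMIT;
    }

    int main(void) {
        sqlite3 *local, *remote;
        sqlite3_session *session;
        void *changeset;
        int nbytes;

        sqlite3_open("local.db", &local);
        sqlite3_open("remote.db", &remote);

        /* Start recording changes to every table in the "main" database. */
        sqlite3session_create(local, "main", &session);
        sqlite3session_attach(session, NULL);          /* NULL = all tables */

        /* ... the application does its ordinary writes ... */
        sqlite3_exec(local, "UPDATE notes SET body='hi' WHERE id=1", 0, 0, 0);

        /* Serialize what changed: row images, not the SQL that caused them. */
        sqlite3session_changeset(session, &nbytes, &changeset);

        /* Replay the recorded changes on the other database. */
        sqlite3changeset_apply(remote, nbytes, changeset,
                               NULL /* table filter */, on_conflict, NULL);

        sqlite3_free(changeset);
        sqlite3session_delete(session);
        sqlite3_close(local);
        sqlite3_close(remote);
        return 0;
    }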
There's also a CRDT version of this, which allows two databases to be synced to each other in real time (i.e., updates to one will eventually make it to the other, and both databases will eventually contain the same data).
It's https://vlcn.io/docs/cr-sqlite/intro, and I find it amazing that this is doable in SQLite. It is perfect for small-scale collaboration imho, but it also works to sync between a local client and a remote server (for a single-db-per-user scenario).
Interesting link; it'd be great if their solution meets expectations.
Right now, the proof-of-concept they've provided seems simplistic. Their focus seems to have shifted from cr-sqlite to "Zero" instead. I'm guessing it has something to do with CRDTs being quite app-specific and hard to generalize.
I would want to see this library used in production before hyping it.
In a sense it is quite specific. In a different sense, this is as generic a CRDT as you can get - it's a CRDT on table(s). There's no merging of rows iirc (unless you write a custom merge, which is supported but probably needs some tweaking and could lead to poor results?).
Maybe I am misunderstanding which part you want in the cloud, but that sounds like litestream. It lets you transparently back up a live SQLite database to a remote destination.
I depend on litestream for production backups and as the months wear on without any releases I am getting more nervous. To be clear, I don’t feel entitled to anything with an open source project like this, but bug reports and fixes seem to be accumulating. I have flirted with the idea of building from main.
I’ve also flirted with the idea of forking litestream and stripping it down dramatically. The reason why is that I don’t like the idea of the production server being in charge of rotation and deletion. It seems like the thing getting backed up shouldn’t have the privilege of deleting backups in case it gets compromised. I might even go so far as to propose that the “even liter stream” process merely writes to a different local volume and then some other process does the uploading but I haven’t gotten beyond the daydream stage.
> It seems like the thing getting backed up shouldn’t have the privilege of deleting backups in case it gets compromised.
(agreed)
> For backups, I added a nightly cron job which exports my SQLite db to a write-only S3 bucket.
Why not just do this with an s3 sync instead? You can safely back up SQLite databases while they're being written to, so there's no need to export (dump) them; just copy the files themselves.
This might mean that your entire backup/restore strategy is just to copy some files. If so, that's ideal.
(Of course, s3 sync does require reading as well as writing, so perhaps just run your cron job more often so it fits within your RPO.)
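If you want a guaranteed-consistent copy to hand to s3 sync, SQLite's online backup API is one way to take it from a live database. A rough sketch in C; the paths are placeholders and error handling is omitted:

    /*
     * Sketch: take a consistent copy of a live SQLite database with the
     * online backup API, which is safe even while the source is being
     * written to. Paths are placeholders and error handling is omitted.
     */
    #include <sqlite3.h>

    int snapshot(const char *src_path, const char *dst_path) {
        sqlite3 *src, *dst;
        sqlite3_open(src_path, &src);
        sqlite3_open(dst_path, &dst);

        sqlite3_backup *b = sqlite3_backup_init(dst, "main", src, "main");
        if (b) {
            sqlite3_backup_step(b, -1);    /* -1 = copy all pages in one pass */
            sqlite3_backup_finish(b);
        }
        int rc = sqlite3_errcode(dst);     /* SQLITE_OK on success */

        sqlite3_close(src);
        sqlite3_close(dst);
        return rc;
    }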
I think the implication isn't that there are bugs they are immediately concerned about, but that other issues not being addressed might mean that, should they run into a bug that does cause problems, there may not be a timely solution, if any.
Offline-first databases are a hard problem because there isn't just one copy of the database on the user's side, there are N copies - every browser tab or device on which the user can open the local database and make an edit. It's basically an AP multi-master database (= the same row can be edited at different nodes at the same time), and you likely cannot achieve good results without a database that natively supports multi-master operations.
That’s not necessarily true; if you use the Origin Private File System along with a Web Worker that acts as a local database server and works off a single SQLite database, you at least have a single DB file per device. From there on, your problem becomes state reconciliation on the server, which CRDTs should help solve.
Not an easy problem for sure, but the web platform is surprisingly capable these days.
CRDTs are so-so and likely cause issues with maintaining relational DBs' transactional consistency. There's a reason none of the NewSQL databases (to my knowledge) are multi-leader.
I too think that CRDT databases are probably something you should explore. You generally have the whole database locally, and changes get synced pretty easily (but you have to live within the rules of your CRDT).
I have long since lost touch with the state of it, but at the time the syncing to their server was fast and worked with a long list of environments/languages.
The one thing I will caution: their model was that you almost had to have a database per customer. You could have a second one that contained common information, but they had no concept of only syncing part of a database based on some logic. So many customer implementations had the clients syncing multiple databases, and then a back-end client that would aggregate the data from all of those databases into one for backend processes. Extra complexity that I always thought was a real killer.
Isn't the simplest way to "sync" to just replace the remote database file with the local database file? One of the nice things about each database being encapsulated as a single file.
You could do a checkpoint first though, I believe? And if the database is only being updated on your local client, I don't think WAL mode would have much benefit since it's probably not getting many concurrent writes.
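For illustration, a checkpoint before the copy could be as simple as this sketch, which assumes WAL mode and that no other connection is writing at that moment:

    /*
     * Sketch: fold the WAL back into the main database file so that copying
     * the single .db file captures everything. Assumes WAL mode and that no
     * other connection is writing at the time.
     */
    #include <sqlite3.h>

    void checkpoint_before_upload(sqlite3 *db) {
        /* TRUNCATE checkpoints every frame and resets the WAL to zero bytes. */
        sqlite3_exec(db, "PRAGMA wal_checkpoint(TRUNCATE);", 0, 0, 0);
        /* ... now copy or upload the .db file ... */
    }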
> What I’d really like is an easy way to sync the SQLite database state to a cloud service.
Don't do this, but an application I used to work on (I was replacing it) copied the SQLite file to a folder and then used rsync to sync it with a backup node. Apparently it worked and was good enough for that use case (an inefficient PHP backend application with at most a dozen concurrent users).
100,000 rows is only a few megabytes at most, right? Should be fine.
What's wrong with that? Of course it will work fine; SQLite, with or without WAL, has a ton of protections against corruption from writes-in-progress, which is what makes hot backups work.
How about: Have 1 + N separate SQLite database-files.
Each user would have their own database-file which contains only information about that user. Then 1 shared database-file which contains info needed for all users.
Users would update their own data, which is a small database file which can be easily uploaded. They would not need to update the shared data.
Not knowing your app I don't know what the shared data would contain, presumably something. Perhaps the shared data-file would be updated on the server based on what individual user-data the users upload.
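To make the 1 + N idea concrete, a client could open its per-user file and ATTACH the shared one, so queries can join across both while only the small per-user file ever needs to be uploaded. A sketch; all file and table names here are invented:

    /*
     * Sketch: open the per-user database and ATTACH the shared one, so
     * queries can join across both while only the small per-user file ever
     * needs to be uploaded. All file and table names are invented.
     */
    #include <sqlite3.h>

    int open_project(sqlite3 **out) {
        sqlite3 *db;
        if (sqlite3_open("user-42.db", &db) != SQLITE_OK) return 1;

        /* Shared, read-mostly data lives in its own file. */
        sqlite3_exec(db, "ATTACH DATABASE 'shared.db' AS shared;", 0, 0, 0);

        /* Cross-file joins work as if it were a single database. */
        sqlite3_exec(db,
            "SELECT t.title, c.name "
            "FROM tasks t JOIN shared.categories c ON c.id = t.category_id;",
            0, 0, 0);

        *out = db;
        return 0;
    }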
> expect users to connect to the service using multiple devices (clients).
But probably using only one device at a time by a single user?
My thought here, and it is just a thought, is that instead of trying to provide a GENERAL solution for all kinds of data-update patterns, it is often possible to think in terms of what your current application specifically needs. It is easier to come up with such a per-app solution with SQLite because SQLite is so "lite".
I can't speak for the "general solution" except to say that many times you don't need an all-encompassing general solution, just a solution for your current app.
> But probably using only one device at a time by a single user?
It depends on your expectations of concurrent use. Computer + tablet + phone means many users may use different devices within seconds of each other. If you want to support offline-first usage, concurrent updates from different clients for the same user becomes more likely.
A simple, manual backup would be fine I think. You can just put an "upload" or "backup to cloud" button that lets the user push a full, timestamped version to S3.
Synchronization may introduce a lot more problems, especially when you want to automatically sync the database to some other place. You will need to deal with sync errors, inconsistency, version conflicts, rollbacks...
If your users can accept that, a simple full-version backup is the best solution.
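Producing that timestamped snapshot from inside the app is straightforward; for example, VACUUM INTO (SQLite 3.27+) writes a compact, self-contained copy that the upload button can then push as-is. A sketch; the file name format is arbitrary:

    /*
     * Sketch: write a compact, self-contained, timestamped snapshot of the
     * open database, which the "backup to cloud" button can then upload
     * as-is. Requires SQLite 3.27+; the file name format is arbitrary.
     */
    #include <sqlite3.h>
    #include <stdio.h>
    #include <time.h>

    int export_snapshot(sqlite3 *db) {
        char name[64], sql[128];
        time_t now = time(NULL);
        strftime(name, sizeof name, "backup-%Y%m%d-%H%M%S.db", localtime(&now));
        snprintf(sql, sizeof sql, "VACUUM INTO '%s';", name);
        return sqlite3_exec(db, sql, 0, 0, 0);    /* SQLITE_OK on success */
    }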
Dolt would do that for you. It has push/pull semantics like git. As a bonus you can use its version control features to implement sophisticated undo/redo features.
I've wanted to use SQLite a few times for the simplicity. I always end up using Postgres though because I don't understand how multiple services / replicas can make use of it. If another piece of infrastructure is needed to support it (even nfs), that seemingly counters any simplicity gains.