
Yup, they win. My biggest SQLite database is 1.7TB with, as of just now, 2,314,851,188 records (all JSON documents with a few keyword indexes via json_extract).

Works like a charm, as in: the web app consuming the API linked to it returns paginated results for any relevant search term within a second or so, for a handful of concurrent users.
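
For anyone curious what that kind of setup looks like, here's a minimal sketch using Python's sqlite3; the table, column, and key names are made up for illustration, not the actual schema:

    import sqlite3

    conn = sqlite3.connect("docs.db")
    conn.executescript("""
        CREATE TABLE IF NOT EXISTS docs (
            id   INTEGER PRIMARY KEY,
            body TEXT NOT NULL  -- one JSON document per row
        );
        -- 'keyword indexes via json_extract': an expression index on a JSON field
        CREATE INDEX IF NOT EXISTS docs_keyword_idx
            ON docs (json_extract(body, '$.keyword'));
    """)

    # Paginated lookup that hits the expression index instead of scanning billions of rows
    page = conn.execute(
        "SELECT id, body FROM docs"
        " WHERE json_extract(body, '$.keyword') = ?"
        " ORDER BY id LIMIT 50 OFFSET ?",
        ("sqlite", 0),
    ).fetchall()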




I think FS-level compression would be a perfect match. Has anyone tried it successfully on large SQLite DBs? (I tried with btrfs, but it failed to compress the database, and I never got to the bottom of why.)


I did a small benchmark of VFS-based and column-level compression a while ago: https://logdy.dev/blog/post/part-3-log-file-compression-with... https://logdy.dev/blog/post/part-4-log-file-compression-with... It all depends on the use case and read/write patterns. IMO, if well designed, it could add real value.


> I think FS-level compression would be a perfect match. Has anyone tried it successfully on large SQLite DBs?

I've had decent success with `sqlite-zstd`[0], which does row-level compression, but only on small (~10GB) databases. No reason it couldn't work for bigger DBs, though.

[0] https://github.com/phiresky/sqlite-zstd
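
If it helps anyone, wiring it up from Python looks roughly like the sketch below. The extension filename, the table/column names, and the exact zstd_enable_transparent config keys are from my reading of the sqlite-zstd README, so treat them as assumptions and check the repo:

    import sqlite3

    conn = sqlite3.connect("events.db")
    conn.enable_load_extension(True)          # needs a Python/SQLite build that allows extension loading
    conn.load_extension("./libsqlite_zstd")   # path to the compiled sqlite-zstd library (assumed)
    conn.enable_load_extension(False)

    # Per the sqlite-zstd README (from memory): mark a column for transparent
    # row-level zstd compression; the extension then serves the original table
    # name through a view over the compressed data.
    conn.execute("""SELECT zstd_enable_transparent(
        '{"table": "events", "column": "body", "compression_level": 19, "dict_chooser": "''a''"}')""")

    # Compress existing rows incrementally so writers aren't blocked for long.
    conn.execute("SELECT zstd_incremental_maintenance(NULL, 1)")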


> My biggest SQLite database is 1.7TB with

What do you run this on? Just some AWS instance with a huge disk attached?


I can see that you're a user of AWS. Check some prices on dedicated servers one day. They're an order of magnitude cheaper than similar AWS instances, and more powerful because all compute and storage resources are local and unshared.

They do have a higher price floor, though. There are no $5/month dedicated servers anywhere - the cheapest is more like $40. There are $5/month virtual servers outside of AWS which are cheaper and more powerful than $5/month AWS instances.


A Windows Server VM on a self-hosted Hyper-V box, which has a whole bunch of 8TB NVMe drives; this VM has a 4TB virtual volume on one of those (plus a much smaller OS volume on another).


How do you back up a file like that?


Using the SQLite backup API, which pretty much corresponds to the .backup CLI command. It doesn't block any reads or writes, so the performance impact is minimal, even if you do it directly to slow-ish storage.
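
If anyone wants to script it, Python's sqlite3 exposes the same online backup API; filenames here are just placeholders:

    import sqlite3

    src = sqlite3.connect("big.db")
    dst = sqlite3.connect("big-backup.db")

    def progress(status, remaining, total):
        # Called after every batch of pages; handy for logging long-running backups.
        print(f"{total - remaining} of {total} pages copied")

    # Copies the database in batches of `pages`; between batches the lock on the
    # source is released, so other readers and writers keep going.
    src.backup(dst, pages=4096, progress=progress)

    dst.close()
    src.close()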


> It doesn't block any reads or writes.

That's neat! I bet it keeps growing a WAL file while the backup is ongoing, right?


Hard to imagine doing it any other way, which is probably fine up until you hit some larger file sizes.


That copies the entire file each time (not just deltas).

You may find sqlite3_rsync better.


I use ZFS snapshots; they work as diffs, so they're very cheap to store, create, and replicate.


sqlite3_rsync is a new tool created by the SQLite team. It might be useful.


Ctrl-c, Ctrl-v


That's not great advice for a large database (and I wouldn't recommend it for small databases either). That incidentally works when the db is small enough that the copy is nearly atomic, but with a big copy, you can end up with a corrupt database. SQLite is designed such that a crashed system at any time does not corrupt a database. It's not designed such that a database can be copied linearly from beginning to end while it's being written to without corruption. Simply copying the database is only good enough if you can ensure that there are no write transactions opened during the entire copy.

A reliable backup will need to use the .backup command, the .dump command, the backup API[0], or filesystem or volume snapshots to get an atomic snapshot.

[0]: https://www.sqlite.org/backup.html
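
For completeness, the .dump route can also be scripted; Python's iterdump() emits the same kind of SQL text dump (filenames are placeholders):

    import sqlite3

    conn = sqlite3.connect("app.db")

    # Equivalent of the CLI's .dump: stream the schema and data out as SQL
    # statements that can recreate the database from scratch.
    with open("app-dump.sql", "w", encoding="utf-8") as f:
        for stmt in conn.iterdump():
            f.write(stmt + "\n")

    conn.close()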


You should use mongodb. It’s web scale



This hits so much harder 15 years on.

Additionally, Xtranormal missed out on the generative video curve.


Someone might take that as advice.


MongoDB earns $1.7b a year in revenue.

A whole lot of people have already taken that advice.


Captive customers, my friend.



