I was going to say I expect much of the gain to come from "pragma synchronous=normal", which tells your MBP M1 not to bother committing the data to SSD robustly before telling the application that the insert is committed. From the SQLite docs:
> WAL mode is always consistent with synchronous=NORMAL, but WAL mode does lose durability. A transaction committed in WAL mode with synchronous=NORMAL might roll back following a power loss or system crash.
But then I realised that "pragma fullfsync=off" is the default; fullfsync is a macOS-specific pragma.
So, on macOS only, WAL with synchronous=FULL (the default for that pragma) has the same durability issue as synchronous=NORMAL: committed transactions might be rolled back following power loss or system crash, albeit with different probabilities.
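For concreteness, here's a minimal sketch of setting those pragmas through the C API (the database name "bench.db" is hypothetical, and error handling is mostly elided):

```c
#include <stdio.h>
#include <sqlite3.h>

int main(void) {
    sqlite3 *db;
    if (sqlite3_open("bench.db", &db) != SQLITE_OK) {
        fprintf(stderr, "open failed\n");
        return 1;
    }

    /* WAL + relaxed syncing: commits return sooner, but a transaction
       reported as committed may roll back after power loss. */
    sqlite3_exec(db, "PRAGMA journal_mode=WAL;",   NULL, NULL, NULL);
    sqlite3_exec(db, "PRAGMA synchronous=NORMAL;", NULL, NULL, NULL);
    /* macOS-specific: fullfsync=on would issue F_FULLFSYNC for real
       durability; it defaults to off. */
    sqlite3_exec(db, "PRAGMA fullfsync=off;",      NULL, NULL, NULL);

    sqlite3_close(db);
    return 0;
}
```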
If you really want to maximize SQLite inserts, prepare a statement with sqlite3_prepare_v3, then in your inner loop call sqlite3_bind_* + sqlite3_step. 100k inserts/sec is easily achievable. Unfortunately this interface is not exposed in most language bindings.
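A hedged sketch of that inner loop, assuming an already-open db handle and a throwaway table t(x INTEGER) (both hypothetical; error handling elided):

```c
#include <sqlite3.h>

/* Insert n rows by compiling the statement once and reusing it.
   Wrapping the loop in one transaction avoids a commit per row. */
static void bulk_insert(sqlite3 *db, int n) {
    sqlite3_stmt *stmt;
    sqlite3_prepare_v3(db, "INSERT INTO t(x) VALUES (?)", -1,
                       SQLITE_PREPARE_PERSISTENT, &stmt, NULL);
    sqlite3_exec(db, "BEGIN", NULL, NULL, NULL);
    for (int i = 0; i < n; i++) {
        sqlite3_bind_int(stmt, 1, i);
        sqlite3_step(stmt);   /* returns SQLITE_DONE for an INSERT */
        sqlite3_reset(stmt);  /* ready the compiled statement for reuse */
    }
    sqlite3_exec(db, "COMMIT", NULL, NULL, NULL);
    sqlite3_finalize(stmt);
}
```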
A fair number of ORMs will do this for you fwiw, but then you'll spend more CPU in the ORM stack so it is a bit of a tradeoff for micro-benchmarks like this.
If you think about this in latency terms, you are able to insert a row and be done with the entire ceremony in about 12 microseconds. This is serialized throughput too.
I think it is unlikely you would get this kind of performance with a hosted solution across the network. You could cheat with batching & caching, but for a cold, one-record non-query, nothing comes close to the latency of SQLite on a local NVMe device.
I tried the same test on a Hetzner VPS (4 vCPU AMD EPYC, 8 GB RAM, 160 GB NVMe SSD), and the result was about 20K inserts per second. It's not 80K, but still exciting; I would have been happy even with 10K. I'm sure there's plenty of room for optimization, but I'm planning to use SQLite3 with Python in a very simple way.
If you (more commonly) run the database on a different host, you’re looking at more like 20x-50x the latency, and that’s assuming you’re in the same region.
To even get close to the SQLite level of throughput you’d have to stagger/pipeline your requests over a number of concurrent connections (proportional to the earlier x).
You’ll eventually succeed at doing just that, with multiple machines. Congratulations, but now you have to also consider the N+1 problem, even for small Ns. In SQLite, you can afford much more back-and-forth with simple queries.
* He’s comparing an in-memory SQLite DB against a non-temp table in Postgres. Of course the RTT is lower when you don’t have to interact with the filesystem or flush to disk in the critical path.
* He’s connecting to Postgres over a loopback interface, instead of using a UNIX socket. Of course the RTT is higher when requests and responses needlessly traverse the TCP/IP stack.
> He’s comparing an in-memory SQLite DB against a non-temp table in Postgres.
That’d surprise me; where does he say that? (Even if he doesn’t, it’s important to iron out, so thanks for being critical!) The ~18us sounds reasonable from my own experiments, and that’s with flushing to file (using WAL mode).
EDIT: I think I ran my benchmarks with synchronous=normal. Not sure what he did.
> He’s connecting to Postgres over a loopback interface, instead of using a UNIX socket.
I agree unix sockets would be better for same host setup, but still, the more representative number for “typical use” would be networked, no?
Also, for fairness: SQLite setups come with downtime; that’s hard to avoid. Worth mentioning.
> That’d surprise me; where does he say that? (Even if he doesn’t, it’s important to iron out, so thanks for being critical!) The ~18us sounds reasonable from my own experiments, and that’s with flushing to file (using WAL mode).
I was looking at the benchmark results at 13:27, but now I see there's another set of benchmark results shown at 29:13. He doesn't link to a repo for that second one, but in-memory vs persistent doesn't matter as much when measuring reads if the results are cached in memory.
> I agree unix sockets would be better for same host setup, but still, the more representative number for "typical use" would be networked, no?
If we're going by what's "representative", then a benchmark isn't useful, because Postgres and SQLite have dissimilar use cases. If you want to compare their performance, it only makes sense to eliminate confounding variables, of which the TCP/IP stack is significant.
> He doesn't link to a repo for that second one, but in-memory vs persistent doesn't matter as much when measuring reads if the results are cached in memory.
No doubt that we deserve reproducible benchmarks and this could use more details. As for caching that’d be the same for Postgres in theory.
> If we're going by what's "representative", then a benchmark isn't useful, because Postgres and SQLite have dissimilar use cases.
I agree and disagree! We already know Postgres is performant enough for a majority of server-side use cases. Ben is trying to show what you can expect from SQLite if you use it for those cases, which is novel to most people. SQLite doesn’t have the same amount of features, and partly because of that drawback, the extremely low latency (in a typical setup) is a redeeming factor. I think it’s not misleading to say SQLite has significantly lower app->db _latency_ than networked databases. A Unix socket is a fairer microbenchmark, yeah, but at least to me that’s not as useful, because a same-host Postgres setup isn’t the typical one.
The relative performance difference of no-op queries (RTT) is unlikely to have a high impact on application performance since they're so quick in absolute terms.
> Many programs use fopen(), fread(), and fwrite() to create and manage files of data in home-grown formats. SQLite works particularly well as a replacement for these ad hoc data files. Contrary to intuition, SQLite can be faster than the filesystem for reading and writing content to disk.
For a project a couple of years ago, I used Go, and what I did was very simple: I took advantage of its channel mechanism and created a slice to buffer (memoize, more or less) 100K generated rows, which I then inserted inside a BEGIN TRANSACTION / END; in other words, every 100K rows I would insert them in one go and repeat.
If my memory serves me well, in approximately 6.8 seconds (give or take), I could insert 10M rows, and bear in mind I didn't use WAL at all.
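That commit-every-100K pattern, sketched in C for consistency with the snippet above rather than Go (hypothetical table t; error handling elided):

```c
#include <sqlite3.h>

enum { BATCH = 100000 };

/* Insert `total` rows, committing once per BATCH rows rather than
   once per row; the same batching idea, minus the Go channels. */
static void insert_batched(sqlite3 *db, long total) {
    sqlite3_stmt *stmt;
    sqlite3_prepare_v2(db, "INSERT INTO t(x) VALUES (?)", -1, &stmt, NULL);
    sqlite3_exec(db, "BEGIN", NULL, NULL, NULL);
    for (long i = 0; i < total; i++) {
        sqlite3_bind_int64(stmt, 1, i);
        sqlite3_step(stmt);
        sqlite3_reset(stmt);
        if ((i + 1) % BATCH == 0) {  /* flush a full batch */
            sqlite3_exec(db, "COMMIT", NULL, NULL, NULL);
            sqlite3_exec(db, "BEGIN", NULL, NULL, NULL);
        }
    }
    sqlite3_exec(db, "COMMIT", NULL, NULL, NULL);  /* final partial batch */
    sqlite3_finalize(stmt);
}
```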
Were those individual inserts or as a transaction? The latter should significantly bump those numbers up. Also consider doing a prepared statement while you're at it.
Which, for this use case - bulk generating a read-only data file from a source dataset in a batch job - seems like a pretty good performance tradeoff, no? In the case of a failure of some kind - no big deal, just restart the process anew.
This is one of those use cases where SQLite isn’t replacing <database> - it’s replacing fopen.
Exactly, this use case is write-once, then read-only after that. Random updates to the sqlite file after that would need more “normal” settings, but they also wouldn’t need as much throughput.
I hate the way so much of programming is just fiddling knobs on some old systems. It always gets to a point where I would prefer to rewrite the whole thing myself with only the functionality I need and to truly understand what is going on under the hood. I just wish these projects were built in a modular way so you can re-assemble parts of them back together easily.
1. Ugh this thing is too integrated, we should break it up into modular parts so we can re-assemble them using only the parts we need. (Examples: most OS kernels, sqlite apparently, big excel spreadsheets, web frameworks like Rails)
2. Ugh this thing is too modular, you need a bazillion parts just to get a useful system. (Examples: analytics platforms consisting of 20+ unix tools strapped together in a pipeline, most microservices architectures, everything in the node ecosystem)
SQLite is a really remarkable piece of software -- decades of peak human capital have been invested in it. Why wouldn't you want to make the most of that to achieve your objectives?
> decades of peak human capital have been invested in it
Measuring software in human capital isn't a good idea. Software becomes crusty very quickly. Once you get a huge user base, re-architecting becomes almost impossible without a rewrite.
A lot of velocity can come from completely freeing yourself from an existing solution and its customers in the short term.
A product written by some notorious rock stars at Google, and with a reputation for corrupting data...
[1] - "...LevelDB has history of database corruption bugs.[15][16][17][18][19][20] A study from 2014 has found that, on older (non-checksummed) file systems, the database could become corrupted after a crash or power failure.[21]..."
You seem to believe a description like that is a recommendation, but it might be the opposite. Any corporation is a place where the Not-Invented-Here virus spreads rapidly, because a) numerous problems and edge cases develop at large scale; b) there is manpower to charge at them; c) there are incentives to do just that, and to appear definitely hard-working to management. That aura even spreads externally, in particular to this conversation we're having. If something is made by a hip company, it's a gods' gift to mortals; if it's from an uncool place, it's definitely rusty tech running on punch cards. (I am sure there exists at least one example in which the same person or team did both.)
Of course, they must have had valid reasons to use LevelDB for some components in Chrome. It seems that Minecraft-like world generating games are also a good use case for it. What I'm saying is that Google doing something does not automatically mean you need to do the same at all.
It's worth pointing out that once you reach Google size and domination you can pretty much do whatever you want*. Things that if done by "ordinary" companies would destroy them or wipe out their customers/users overnight.
if i was an sqlite user, i really wouldn't care at all about google switching from sqlite to that thing in chrome.
From the readme:
> LevelDB is a fast key-value storage library written at Google that provides an ordered mapping from string keys to string values.
it's a completely different thing from sqlite. sqlite is a relational database (which can also do key-value storage, I guess).
my educated guess is that google started with sqlite, pushed it to the peak of its usefulness *within chrome*, and only then switched to developing its own custom solution.
if anything, this is a testament to sqlite and how far you can go with it before needing a custom solution.
That's nice and all, but I'm a Chromium user who wants to manually audit and edit my \User Data\Default\Local Storage. That wasn't a problem in the sqlite days; tons of tools were available. The switch to leveldb not only made that difficult (one paid GUI tool in existence, afaik), but also introduced a change in localStorage behavior: Chrome localStorage used to return entries sorted by key, and this is no longer the case.
I believe Chrome has deprecated LevelDB at this point. It turns out that LevelDB doesn't work well when you need many different databases in a single application. So all new Chrome storage is SQLite. At least until they find something better...
A key-value store can underpin a SQL layer. Look at CockroachDB's usage of RocksDB (which is a descendant of LevelDB). Seems like a fair comparison to compare BTrees and LSM approaches. The latter work better because they align more closely with how solid state storage works. BTrees require a fair bit of pointer chasing for reads and are expensive to modify.
I definitely feel you. What's interesting is that it seems like a lot of the big popular tools everyone uses all started out this way. Someone thought "man, all the existing solutions suck and I don't understand them - let's make a new one that does only what I need." Then it turns out, many other people also needed that. Then people who need slightly different things come in and knobs are added.
FWIW, the whole reason for SQLite is the observation that getting storage correct was extremely hard. Even with a filesystem and knowing to do fsync+rename, you can have data loss in surprising scenarios. The defaults for SQLite are tuned for durability and even still perform reasonably well. A lot of the tuning is about turning off that durability default.
These knobs, by the way, are part and parcel of generic storage systems, including filesystems to a degree, and definitely all databases I've ever seen (because different hardware will have different optimal settings).
> I hate the way so much of programming is just fiddling knobs on some old systems.
actually, i believe it's quite the contrary: most of what we need nowadays has already been invented and developed.
you can be incredibly successful as an engineer at many companies just by knowing what knobs are there and how to turn them properly.
and i think this is relevant because most people end up reinventing a square wheel that only does 10% of what the most used, widely spread, already-open-source wheel does.
To be honest, the author is literally fiddling knobs to enable existing features described in documentation and mentioned everywhere on the net. All the “performance tuning” here is done by SQLite developers.
SQLite is so modular that someone could write a replacement for the filesystem layer that ended up sending requests across HTTP to query a database on another server [1]. Without touching any other layer of the code. How much more modular do you want a database to be, without making other compromises?
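The hook for that is the VFS layer: you register an sqlite3_vfs implementation and then select it by name at open time. A hypothetical sketch, assuming a VFS named "httpvfs" was registered earlier via sqlite3_vfs_register:

```c
#include <sqlite3.h>

/* Hypothetical: open a database through a custom VFS named "httpvfs",
   previously registered with sqlite3_vfs_register(). Everything above
   the file layer (SQL, btree, pager) is untouched. */
static int open_over_http(const char *path, sqlite3 **db) {
    return sqlite3_open_v2(path, db, SQLITE_OPEN_READONLY, "httpvfs");
}
```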
WAL mode has an issue where, depending on the write pattern, the WAL file can grow without bound, slowing performance considerably. I think this usually happens when you have so many writes locking the table that SQLite never gets a chance to run wal_autocheckpoint.
I believe that WAL2 fixes this:
> Wal2 mode does not have this problem. In wal2 mode, wal files do not grow indefinitely even if the checkpointer never has a chance to finish uninterrupted.
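Until wal2 lands for you, one workaround on plain WAL (a sketch, assuming you can find a quiet moment between writes) is to force a blocking checkpoint that also truncates the WAL file:

```c
#include <sqlite3.h>

/* Checkpoint the entire WAL and truncate it to zero bytes. This waits
   on readers/writers, so schedule it during a lull in traffic. */
static int truncate_wal(sqlite3 *db) {
    int frames_in_wal = 0, frames_checkpointed = 0;
    return sqlite3_wal_checkpoint_v2(db, "main",
                                     SQLITE_CHECKPOINT_TRUNCATE,
                                     &frames_in_wal,
                                     &frames_checkpointed);
}
```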
D’oh, I kind of speculated above that this pattern might be possible to apply to WAL, without having read far enough down to see it was implemented in WAL2. Though I mentioned RCU, the hot/cold partitions were something I was also thinking of.
I wonder if this could be further extended to better support concurrent writes. Depending on the implementation, with wal2 readers may be reading from both hot and cold files without blocking. So this may potentially be extendable to read from two hot files, or two hot files and two cold files.
It might be helpful to link the docs for some of these configs in the section where you mention them. Some thoughts:
Based on https://www.sqlite.org/wal.html it seems the WAL index is mmapped as a workaround to some edge cases not relevant to many application developers. They say it shouldn’t matter, but with the larger page sizes you’re using, using the original implementation approach they describe (volatile shared memory) actually might improve performance slightly - do you know if your WAL index ever exceeds 32KiB? Not sure as to the difficulty of resurrecting that old functionality though.
Also, this case seems like something that could happen from time to time depending on what you’re doing. Did you encounter it? Case:
> When the last connection to a particular database is closing, that connection will acquire an exclusive lock for a short time while it cleans up the WAL and shared-memory files. If a second database connection tries to open and query the database while the first connection is still in the middle of its cleanup process, the second connection might get an SQLITE_BUSY error.
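(If that does come up, one hedged mitigation is a busy timeout, so the second connection retries briefly instead of failing outright:)

```c
#include <sqlite3.h>

/* Open a connection that retries for up to 2 seconds on SQLITE_BUSY
   (e.g. while another connection cleans up the WAL/shm files on close)
   instead of failing immediately. */
static int open_with_busy_timeout(const char *path, sqlite3 **db) {
    int rc = sqlite3_open(path, db);
    if (rc == SQLITE_OK)
        rc = sqlite3_busy_timeout(*db, 2000 /* ms */);
    return rc;
}
```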
Both the WAL docs and the article mention blocking checkpointing/a need for reader gaps to ensure the WAL flushes, or a possibility that WAL files will grow indefinitely. I had some speculation that this was an implementation limitation, and it turns out another comment mentions WAL2 may relax this requirement by using two WAL files split between “hot” and “cold”. Curious how the performance might compare with this: https://sqlite.org/cgi/src/doc/wal2/doc/wal2.md
I am experimenting with SQLite, trying to insert 1B rows in under a minute. The current best is 100M rows in 23s. I cut many corners to get that performance, but the tweaks might suit your workload.
I’m thinking about doing this as well. Sometimes SQLite is a better tool for the job. For example, there’s a lot of software that uses a database as their file format. SQLite is ideal for this.
To answer the person’s question, there are easy ways and hard ways, with all the trade-offs you would expect. The easy way is to use an ODBC connector. That makes it easier to change the DB engine, but it’s going to hurt performance. Chances are, though, that the performance will still be good enough for applications like database-as-file-format.
The fact that he doesn't understand why his wal file is growing without bounds should be a warning to take his suggestions with a grain of salt. This is yet another lazy benchmark with lazy suggestions by someone who is uselessly retreading the path laid down in the SQLite documentation.
It's _easier_ to write IO with mmap, but hand-optimized file IO can be even faster. And DBMSes have historically cared a lot about optimizing file access.
So I'd expect SQLite to be faster without mmap, as I expect its developers to nurture the file access path instead of relying on OS-provided mmap.
PS: For example, say your code has a byte array memory-mapped from a file. If the code needs to do a random read from it, it has no way of knowing whether that read will require waiting for a page to be read from disk or whether it's already cached. Hand-optimized file IO has the option of doing some other work instead of waiting for the disk (or during, or before).
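Worth noting that SQLite leaves this as a knob: memory-mapped I/O is typically off unless you raise mmap_size, so you can benchmark both paths yourself. A minimal sketch (hypothetical bench.db):

```c
#include <sqlite3.h>

int main(void) {
    sqlite3 *db;
    if (sqlite3_open("bench.db", &db) != SQLITE_OK) return 1;

    /* Opt in to memory-mapped I/O for up to 256 MiB of the file;
       setting it back to 0 returns to plain read()/write() access. */
    sqlite3_exec(db, "PRAGMA mmap_size=268435456;", NULL, NULL, NULL);

    /* ... run the same workload with and without mmap to compare ... */

    sqlite3_close(db);
    return 0;
}
```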