I was going to say I expect much of the gain to come from "pragma synchronous=normal", which tells your MBP M1 not to bother committing the data to SSD robustly before telling the application that the insert is committed. From the SQLite docs:
> WAL mode is always consistent with synchronous=NORMAL, but WAL mode does lose durability. A transaction committed in WAL mode with synchronous=NORMAL might roll back following a power loss or system crash.
But then I realised that "pragma fullfsync=off" is the default; fullfsync is a macOS-specific pragma.
So, on macOS only, WAL with synchronous=FULL (the default for that pragma) has the same durability issue as synchronous=NORMAL: committed transactions might be rolled back following power loss or system crash, albeit with different probabilities.
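For concreteness, here's a minimal sketch of setting those pragmas through the C API (the database name "bench.db" is hypothetical, and error handling is mostly elided):

```c
#include <stdio.h>
#include <sqlite3.h>

int main(void) {
    sqlite3 *db;
    if (sqlite3_open("bench.db", &db) != SQLITE_OK) {
        fprintf(stderr, "open failed\n");
        return 1;
    }

    /* WAL + relaxed syncing: commits return sooner, but a transaction
       reported as committed may roll back after power loss. */
    sqlite3_exec(db, "PRAGMA journal_mode=WAL;",   NULL, NULL, NULL);
    sqlite3_exec(db, "PRAGMA synchronous=NORMAL;", NULL, NULL, NULL);
    /* macOS-specific: fullfsync=on would issue F_FULLFSYNC for real
       durability; it defaults to off. */
    sqlite3_exec(db, "PRAGMA fullfsync=off;",      NULL, NULL, NULL);

    sqlite3_close(db);
    return 0;
}
```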
If you really want to maximize SQLite inserts, prepare a statement with sqlite3_prepare_v3, then in your inner loop call sqlite3_bind_* + sqlite3_step. 100k inserts/sec is easily achievable. Unfortunately this interface is not exposed in most language bindings.
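A hedged sketch of that inner loop, assuming an already-open db handle and a throwaway table t(x INTEGER) (both hypothetical; error handling elided):

```c
#include <sqlite3.h>

/* Insert n rows by compiling the statement once and reusing it.
   Wrapping the loop in one transaction avoids a commit per row. */
static void bulk_insert(sqlite3 *db, int n) {
    sqlite3_stmt *stmt;
    sqlite3_prepare_v3(db, "INSERT INTO t(x) VALUES (?)", -1,
                       SQLITE_PREPARE_PERSISTENT, &stmt, NULL);
    sqlite3_exec(db, "BEGIN", NULL, NULL, NULL);
    for (int i = 0; i < n; i++) {
        sqlite3_bind_int(stmt, 1, i);
        sqlite3_step(stmt);   /* returns SQLITE_DONE for an INSERT */
        sqlite3_reset(stmt);  /* ready the compiled statement for reuse */
    }
    sqlite3_exec(db, "COMMIT", NULL, NULL, NULL);
    sqlite3_finalize(stmt);
}
```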
A fair number of ORMs will do this for you fwiw, but then you'll spend more CPU in the ORM stack so it is a bit of a tradeoff for micro-benchmarks like this.
If you think about this in latency terms, you are able to insert a row and be done with the entire ceremony in about 12 microseconds. This is serialized throughput too.
I think it is unlikely you would get this kind of performance with a hosted solution across the network. You could cheat with batching & caching, but for a cold, one-record non-query, nothing comes close to the latency of SQLite on a local NVMe device.
I tried the same test on a Hetzner VPS (4 vCPU AMD EPYC, 8 GB RAM, 160 GB NVMe SSD), and the result was about 20K inserts per second. It's not 80K, but still exciting; I would have been happy even with 10K. I'm sure there's plenty of room for optimization, but I'm planning to use SQLite3 with Python in a very simple way.
If you (more commonly) run the database on a different host, you’re looking at more like 20x-50x the latency, and that’s assuming you’re in the same region.
To even get close to the SQLite level of throughput you’d have to stagger/pipeline your requests over a number of concurrent connections (proportional to the earlier x).
You’ll eventually succeed at doing just that, with multiple machines. Congratulations, but now you have to also consider the N+1 problem, even for small Ns. In SQLite, you can afford much more back-and-forth with simple queries.
* He’s comparing an in-memory SQLite DB against a non-temp table in Postgres. Of course the RTT is lower when you don’t have to interact with the filesystem or flush to disk in the critical path.
* He’s connecting to Postgres over a loopback interface, instead of using a UNIX socket. Of course the RTT is higher when requests and responses needlessly traverse the TCP/IP stack.
> He’s comparing an in-memory SQLite DB against a non-temp table in Postgres.
That’d surprise me; where does he say that? (Even if he doesn’t, it’s important to iron out, so thanks for being critical!) The ~18us sounds reasonable from my own experiments, and that’s with flushing to file (using WAL mode).
EDIT: I think I ran my benchmarks with synchronous=normal. Not sure what he did.
> He’s connecting to Postgres over a loopback interface, instead of using a UNIX socket.
I agree unix sockets would be better for same host setup, but still, the more representative number for “typical use” would be networked, no?
Also, for fairness: SQLite setups come with downtime; that’s hard to avoid. Worth mentioning.
> That’d surprise me; where does he say that? (Even if he doesn’t, it’s important to iron out, so thanks for being critical!) The ~18us sounds reasonable from my own experiments, and that’s with flushing to file (using WAL mode).
I was looking at the benchmark results at 13:27, but now I see there's another set of benchmark results shown at 29:13. He doesn't link to a repo for that second one, but in-memory vs persistent doesn't matter as much when measuring reads if the results are cached in memory.
> I agree unix sockets would be better for same host setup, but still, the more representative number for "typical use" would be networked, no?
If we're going by what's "representative", then a benchmark isn't useful, because Postgres and SQLite have dissimilar use cases. If you want to compare their performance, it only makes sense to eliminate confounding variables, of which the TCP/IP stack is significant.
> He doesn't link to a repo for that second one, but in-memory vs persistent doesn't matter as much when measuring reads if the results are cached in memory.
No doubt that we deserve reproducible benchmarks and this could use more details. As for caching that’d be the same for Postgres in theory.
> If we're going by what's "representative", then a benchmark isn't useful, because Postgres and SQLite have dissimilar use cases.
I agree and disagree! We already know Postgres is performant enough for a majority of server-side use cases. Ben is trying to show what you can expect from SQLite if you use it for those cases, which is novel to most people. SQLite doesn’t have the same amount of features, and partly because of that drawback, the extremely low latency (in a typical setup) is a redeeming factor. I think it’s not misleading to say SQLite has significantly lower app->db _latency_ than networked databases. A Unix socket is a fairer microbenchmark, yeah, but at least to me that’s not as useful, because a same-host Postgres setup isn’t the typical one.
The relative performance difference of no-op queries (RTT) is unlikely to have a high impact on application performance since they're so quick in absolute terms.
> Many programs use fopen(), fread(), and fwrite() to create and manage files of data in home-grown formats. SQLite works particularly well as a replacement for these ad hoc data files. Contrary to intuition, SQLite can be faster than the filesystem for reading and writing content to disk.
For a project a couple of years ago, I used Go, and what I did was very simple: I took advantage of its channel mechanism and created a slice to buffer (memoize, more or less) 100K generated rows, which I then inserted inside a BEGIN TRANSACTION / END; in other words, every 100K rows I would insert them in one go and repeat.
If my memory serves me well, in approximately 6.8 seconds (give or take), I could insert 10M rows, and bear in mind I didn't use WAL at all.
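That commit-every-100K pattern, sketched in C for consistency with the snippet above rather than Go (hypothetical table t; error handling elided):

```c
#include <sqlite3.h>

enum { BATCH = 100000 };

/* Insert `total` rows, committing once per BATCH rows rather than
   once per row; the same batching idea, minus the Go channels. */
static void insert_batched(sqlite3 *db, long total) {
    sqlite3_stmt *stmt;
    sqlite3_prepare_v2(db, "INSERT INTO t(x) VALUES (?)", -1, &stmt, NULL);
    sqlite3_exec(db, "BEGIN", NULL, NULL, NULL);
    for (long i = 0; i < total; i++) {
        sqlite3_bind_int64(stmt, 1, i);
        sqlite3_step(stmt);
        sqlite3_reset(stmt);
        if ((i + 1) % BATCH == 0) {  /* flush a full batch */
            sqlite3_exec(db, "COMMIT", NULL, NULL, NULL);
            sqlite3_exec(db, "BEGIN", NULL, NULL, NULL);
        }
    }
    sqlite3_exec(db, "COMMIT", NULL, NULL, NULL);  /* final partial batch */
    sqlite3_finalize(stmt);
}
```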
Were those individual inserts or as a transaction? The latter should significantly bump those numbers up. Also consider doing a prepared statement while you're at it.
Which, for this use case - bulk generating a read-only data file from a source dataset in a batch job - seems like a pretty good performance tradeoff, no? In the case of a failure of some kind - no big deal, just restart the process anew.
This is one of those use cases where SQLite isn’t replacing <database> - it’s replacing fopen.
Exactly, this use case is write-once, then read-only after that. Random updates to the sqlite file after that would need more “normal” settings, but they also wouldn’t need as much throughput.
I hate the way so much of programming is just fiddling knobs on some old systems. It always gets to a point where I would prefer to rewrite the whole thing myself with only the functionality I need and to truly understand what is going on under the hood. I just wish these projects were built in a modular way so you can re-assemble parts of them back together easily.
1. Ugh this thing is too integrated, we should break it up into modular parts so we can re-assemble them using only the parts we need. (Examples: most OS kernels, sqlite apparently, big excel spreadsheets, web frameworks like Rails)
2. Ugh this thing is too modular, you need a bazillion parts just to get a useful system. (Examples: analytics platforms consisting of 20+ unix tools strapped together in a pipeline, most microservices architectures, everything in the node ecosystem)
SQLite is a really remarkable piece of software -- decades of peak human capital have been invested in it. Why wouldn't you want to make the most of that to achieve your objectives?
> decades of peak human capital have been invested in it
Measuring software in human capital isn't a good idea. Software becomes crusty very quickly. Once you get a huge user base, re-architecting becomes almost impossible without a rewrite.
A lot of velocity can come from completely freeing yourself from an existing solution and its customers in the short term.
A product written by some notorious rock stars at Google, and with a reputation for corrupting data...
[1] - "...LevelDB has history of database corruption bugs.[15][16][17][18][19][20] A study from 2014 has found that, on older (non-checksummed) file systems, the database could become corrupted after a crash or power failure.[21]..."
You seem to believe a description like that is a recommendation, but it might be the opposite. Any corporation is a place where the Not-Invented-Here virus spreads rapidly, because a) numerous problems and edge cases develop at large scale; b) there is manpower to charge at them; c) there are incentives to do just that, and to appear definitely hard-working to management. That aura even spreads externally, in particular to this conversation we're having. If something is made by a hip company, it's a gods' gift to mortals; if it's from an uncool place, it's definitely rusty tech running on punch cards. (I am sure there exists at least one example in which the same person or team did both.)
Of course, they must have had valid reasons to use LevelDB for some components in Chrome. It seems that Minecraft-like world generating games are also a good use case for it. What I'm saying is that Google doing something does not automatically mean you need to do the same at all.
It's worth pointing out that once you reach Google size and domination you can pretty much do whatever you want*. Things that if done by "ordinary" companies would destroy them or wipe out their customers/users overnight.
if i was an sqlite user, i really wouldn't care at all about google switching from sqlite to that thing in chrome.
From the readme:
> LevelDB is a fast key-value storage library written at Google that provides an ordered mapping from string keys to string values.
it's a completely different thing from sqlite. sqlite is a relational database (which can also do key-value storage, I guess).
my educated guess is that google started with sqlite, pushed it to the peak of its usefulness *within chrome*, and only then switched to developing its own custom solution.
if anything, this is a testament to sqlite and how far you can go with it before needing a custom solution.
That's nice and all, but I'm a Chromium user who wants to manually audit and edit my \User Data\Default\Local Storage. That wasn't a problem in the sqlite days; tons of tools were available. The switch to leveldb not only made that difficult (one paid GUI tool in existence, afaik), but also introduced a change in localStorage behavior: Chrome localStorage used to return entries sorted by key, and this is no longer the case.
I believe Chrome has deprecated LevelDB at this point. It turns out that LevelDB doesn't work well when you need many different databases in a single application. So all new Chrome storage is SQLite. At least until they find something better...
A key-value store can underpin a SQL layer. Look at CockroachDB's usage of RocksDB (which is a descendant of LevelDB). Seems like a fair comparison to compare BTrees and LSM approaches. The latter work better because they align more closely with how solid state storage works. BTrees require a fair bit of pointer chasing for reads and are expensive to modify.
I definitely feel you. What's interesting is that it seems like a lot of the big popular tools everyone uses all started out this way. Someone thought "man, all the existing solutions suck and I don't understand them - let's make a new one that does only what I need." Then it turns out, many other people also needed that. Then people who need slightly different things come in and knobs are added.
FWIW, the whole reason for SQLite is the observation that getting storage correct was extremely hard. Even with a filesystem and knowing to do fsync+rename, you can have data loss in surprising scenarios. The defaults for SQLite are tuned for durability and even still perform reasonably well. A lot of the tuning is about turning off that durability default.
These knobs, by the way, are part and parcel of generic storage systems, including filesystems to a degree, and definitely all databases I've ever seen (because different hardware will have different optimal settings).
> I hate the way so much of programming is just fiddling knobs on some old systems.
actually, i believe it's quite the contrary: most of what we need nowadays has already been invented and developed.
you can be incredibly successful as an engineer at many companies just by knowing what knobs are there and how to turn them properly.
and i think this is relevant because most people end up reinventing a square wheel that only does 10% of what the most used, widely spread, already-open-source wheel does.
To be honest, the author is literally fiddling knobs to enable existing features described in documentation and mentioned everywhere on the net. All the “performance tuning” here is done by SQLite developers.
SQLite is so modular that someone could write a replacement for the filesystem layer that ended up sending requests across HTTP to query a database on another server [1]. Without touching any other layer of the code. How much more modular do you want a database to be, without making other compromises?
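The hook for that is the VFS layer: you register an sqlite3_vfs implementation and then select it by name at open time. A hypothetical sketch, assuming a VFS named "httpvfs" was registered earlier via sqlite3_vfs_register:

```c
#include <sqlite3.h>

/* Hypothetical: open a database through a custom VFS named "httpvfs",
   previously registered with sqlite3_vfs_register(). Everything above
   the file layer (SQL, btree, pager) is untouched. */
static int open_over_http(const char *path, sqlite3 **db) {
    return sqlite3_open_v2(path, db, SQLITE_OPEN_READONLY, "httpvfs");
}
```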
WAL mode has an issue where, depending on the write pattern, the WAL file can grow without bound, slowing performance considerably. I think this usually happens when you have so many writes locking the table that SQLite never gets a chance to run wal_autocheckpoint.
I believe that WAL2 fixes this:
> Wal2 mode does not have this problem. In wal2 mode, wal files do not grow indefinitely even if the checkpointer never has a chance to finish uninterrupted.
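Until wal2 lands for you, one workaround on plain WAL (a sketch, assuming you can find a quiet moment between writes) is to force a blocking checkpoint that also truncates the WAL file:

```c
#include <sqlite3.h>

/* Checkpoint the entire WAL and truncate it to zero bytes. This waits
   on readers/writers, so schedule it during a lull in traffic. */
static int truncate_wal(sqlite3 *db) {
    int frames_in_wal = 0, frames_checkpointed = 0;
    return sqlite3_wal_checkpoint_v2(db, "main",
                                     SQLITE_CHECKPOINT_TRUNCATE,
                                     &frames_in_wal,
                                     &frames_checkpointed);
}
```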
D’oh, I kind of speculated above that this pattern might be possible to apply to WAL, without having read far enough down to see it was implemented in WAL2. Though I mentioned RCU, the hot/cold partitions were something I was also thinking of.
I wonder if this could be further extended to better support concurrent writes. Depending on the implementation, with wal2 readers may be reading from both hot and cold files without blocking. So this may potentially be extendable to read from two hot files, or two hot files and two cold files.
It might be helpful to link the docs for some of these configs in the section where you mention them. Some thoughts:
Based on https://www.sqlite.org/wal.html it seems the WAL index is mmapped as a workaround to some edge cases not relevant to many application developers. They say it shouldn’t matter, but with the larger page sizes you’re using, using the original implementation approach they describe (volatile shared memory) actually might improve performance slightly - do you know if your WAL index ever exceeds 32KiB? Not sure as to the difficulty of resurrecting that old functionality though.
Also, this case seems like something that could happen from time to time depending on what you’re doing. Did you encounter it? Case:
> When the last connection to a particular database is closing, that connection will acquire an exclusive lock for a short time while it cleans up the WAL and shared-memory files. If a second database connection tries to open and query the database while the first connection is still in the middle of its cleanup process, the second connection might get an SQLITE_BUSY error.
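(If that does come up, one hedged mitigation is a busy timeout, so the second connection retries briefly instead of failing outright:)

```c
#include <sqlite3.h>

/* Open a connection that retries for up to 2 seconds on SQLITE_BUSY
   (e.g. while another connection cleans up the WAL/shm files on close)
   instead of failing immediately. */
static int open_with_busy_timeout(const char *path, sqlite3 **db) {
    int rc = sqlite3_open(path, db);
    if (rc == SQLITE_OK)
        rc = sqlite3_busy_timeout(*db, 2000 /* ms */);
    return rc;
}
```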
Both the WAL docs and the article mention blocking checkpointing/a need for reader gaps to ensure the WAL flushes, or a possibility that WAL files will grow indefinitely. I had some speculation that this was an implementation limitation, and it turns out another comment mentions WAL2 may relax this requirement by using two WAL files split between “hot” and “cold”. Curious how the performance might compare with this: https://sqlite.org/cgi/src/doc/wal2/doc/wal2.md
I am experimenting with SQLite, trying to insert 1B rows in under a minute. The current best is 100M rows in 23s. I cut many corners to get that performance, but the tweaks might suit your workload.
I’m thinking about doing this as well. Sometimes SQLite is a better tool for the job. For example, there’s a lot of software that uses a database as their file format. SQLite is ideal for this.
To answer the person’s question, there are easy ways and hard ways, with all the trade-offs you would expect. The easy way is to use an ODBC connector. That makes it easier to change the DB engine, but it’s going to hurt performance. Chances are, though, that the performance will still be good enough for applications like database-as-file-format.
The fact that he doesn't understand why his wal file is growing without bounds should be a warning to take his suggestions with a grain of salt. This is yet another lazy benchmark with lazy suggestions by someone who is uselessly retreading the path laid down in the SQLite documentation.
It's _easier_ to write IO with mmap, but hand-optimized file IO can be even faster. And DBMSes have historically cared a lot about optimizing file access.
So I'd expect SQLite to be faster without mmap, as I expect its developers to nurture the file access path instead of relying on OS-provided mmap.
PS: For example, say your code has a byte array memory-mapped from a file. If the code needs to do a random read from it, it has no way of knowing whether that read will require waiting for a page to be read from disk or whether it's already cached. Hand-optimized file IO has the option of doing some other work instead of waiting for the disk (or during, or before).
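Worth noting that SQLite leaves this as a knob: memory-mapped I/O is typically off unless you raise mmap_size, so you can benchmark both paths yourself. A minimal sketch (hypothetical bench.db):

```c
#include <sqlite3.h>

int main(void) {
    sqlite3 *db;
    if (sqlite3_open("bench.db", &db) != SQLITE_OK) return 1;

    /* Opt in to memory-mapped I/O for up to 256 MiB of the file;
       setting it back to 0 returns to plain read()/write() access. */
    sqlite3_exec(db, "PRAGMA mmap_size=268435456;", NULL, NULL, NULL);

    /* ... run the same workload with and without mmap to compare ... */

    sqlite3_close(db);
    return 0;
}
```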