The API for virtual tables is tricky, because you never consume the API in Rust yourself: SQLite does. However, as it is presented, I think the interface is unsafe.
In the sqlite-xsv VTab::open impl for XsvTable [0], you pass a &'vtab XsvTable to XsvCursor. But that means that if open is called again, to create a second cursor, a `&mut XsvTable` and a `&XsvTable` reference exist at the same time, which is undefined behavior in Rust. Note that sqlite-xsv doesn't actually keep the `&XsvTable`, so it's fine in practice, but it serves as an example of where a safety problem could surface.
The same problem also applies to VTabWriteable::update. The fix is to make both of these methods take an immutable reference and force implementors to use interior mutability. Note this isn't hypothetical, it's actually unsafe, and it would surface if you had a unit test that implemented a writable virtual table backed by a Rust data structure and attempted to iterate and update at the same time.
I have an as-yet-unpublished code base that experiments with this. It's not published yet because it doesn't cover everything I want it to, and I'm trying to avoid publishing something whose API might have to break. My goals are explicitly safety first, performance second. https://github.com/CGamesPlay/sqlite3_ext
Please consider a GIS/spatial extension. SpatiaLite is pretty bad; for example, the KNN feature is deprecated and its replacement has not yet made it into a release.
Where can I get a good understanding of the limitations of SQLite for production use cases?
I’ve used it frequently for prototyping, but the dialect is different enough from Postgres that I usually switch over pretty early. But for production, people seem to be seriously advancing it as a solution for applications, which doesn’t fit with the “When To Use” guide on the website. https://www.sqlite.org/whentouse.html
The idea that SQLite shouldn't be used for web applications is about a decade out-of-date at this point.
But... actual guidance as to how to use it there is still pretty thin on the ground!
Short version: use WAL mode (which is not the default). Only send writes from a single process (maybe via a queue). Run your own load tests before you go live. Don't use it if you're going to want to horizontally scale to handle more than 1,000 requests/second or so (though vertically scaling will probably work really well).
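As a sketch of that checklist in code - assuming Python's stdlib sqlite3, with the path, table name, and timeout value purely illustrative - the connection setup might look like:

```python
import os
import sqlite3
import tempfile

def open_db(path: str) -> sqlite3.Connection:
    # isolation_level=None -> autocommit; start transactions explicitly.
    conn = sqlite3.connect(path, isolation_level=None)
    # WAL lets readers proceed alongside the single writer; the journal
    # mode persists in the database file once it has been set.
    mode = conn.execute("PRAGMA journal_mode=WAL").fetchone()[0]
    assert mode == "wal"
    # Wait up to 2s for a competing write lock instead of erroring at once.
    conn.execute("PRAGMA busy_timeout=2000")
    return conn

db_path = os.path.join(tempfile.mkdtemp(), "app.db")
conn = open_db(db_path)
conn.execute("CREATE TABLE IF NOT EXISTS hits (path TEXT, ts REAL)")
```

WAL requires an on-disk database (an in-memory one reports journal mode "memory"), which is why the sketch uses a temp file.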
I'd love to see more useful written material about this. I hope to provide more myself at some point.
Depending on your use case, using multiple writer processes can be fine, e.g. multiple writer processes storing periodic metrics (once per minute) into a single SQLite database. The occasional "database is locked" error (it does happen a few times a day) is handled by simply returning and trying again at the next interval. That's fine for this particular use case - the metrics for the last interval are just delayed a bit.
One thing that confused me initially with sqlite - if you want it to wait for write locks to be free instead of erroring on write contention, you have to start write transactions with BEGIN IMMEDIATE. If you instead start with a read transaction that gets upgraded to a write partway through, SQLite can't really wait for the lock, because you can't keep a consistent snapshot of an old version of the DB while also writing on top of the new one.
Yes, we use BEGIN IMMEDIATE with a 2 second busy timeout. That has a worst-case wait time of 2 seconds, but the 95th percentile wait time is around 100 to 1000 microseconds and the 99th percentile is around 1 to 10 milliseconds. For a high-throughput and/or low-latency service, a worst-case wait time of 2 seconds might be unacceptable, but it's fine in our case.
> The idea that SQLite shouldn't be used for web applications is about a decade out-of-date at this point
I think it's still very relevant. I mean, you have service XYZ replicated two times behind a load balancer - how do you use SQLite out of the box? You can't; you have to use one of those services that proxies/queues queries, so why even try to use SQLite?
The fact that it's just a file on disk makes it a bad solution for a lot of simple use cases.
If you're worried about high availability should your instance fail (which is something that's worth worrying about, though many smaller web apps will likely work perfectly fine on a single instance with good scheduled backups) you should absolutely investigate https://litestream.io - it's the missing piece of the SQLite-for-web-apps story.
You could partition your data into two parts, and direct your users to whichever service serves their partition of the data. Every database turns out to be just a set of files on disk, but to use SQLite you have to think differently than with most databases, due to the extreme locality of the data.
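That routing can be as simple as hashing a stable user ID to pick a partition; a hypothetical sketch (shard count, paths, and schema are all made up):

```python
import hashlib
import os
import sqlite3
import tempfile

SHARDS = 2  # two partitions, one per service instance
base_dir = tempfile.mkdtemp()

def shard_path(user_id: str) -> str:
    # A stable hash means a given user always lands on the same partition.
    n = int(hashlib.sha256(user_id.encode()).hexdigest(), 16) % SHARDS
    return os.path.join(base_dir, f"shard-{n}.db")

def connect_for(user_id: str) -> sqlite3.Connection:
    conn = sqlite3.connect(shard_path(user_id))
    conn.execute("CREATE TABLE IF NOT EXISTS events (user_id TEXT, payload TEXT)")
    return conn

conn = connect_for("alice")
conn.execute("INSERT INTO events VALUES (?, ?)", ("alice", "login"))
conn.commit()
```

The load balancer (or the services themselves) would apply the same hash to route each request to the instance owning that shard file.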
Has anyone tried https://github.com/rqlite/rqlite?
How well can it handle horizontal scaling? Even with the added distributed features, it is not as good as other RDBMSs (Postgres or MySQL).
I worked on a set of extensions for mattn/go-sqlite3, written in Go, inspired by, but a little further along (in functionality and test coverage) than, SQLean. Useful if you want to see examples of this in Go, or if you're using Go rather than Rust to embed SQLite.
Just in time! I've been planning a SQLite extension written in Rust for Marmot (https://github.com/maxpert/marmot) for evaluation purposes. I didn't want to touch C/C++ because I feel more comfortable with Rust tooling. Thank you for doing this!
I'm quietly confident that you could compile sqlite with a rust extension into a single .wasm file.
The first thing I'd try is to use crate-type = ["staticlib"] to compile the sqlite extension to a static .a file (cargo build --target wasm32-unknown-unknown), and compile sqlite to a staticlib separately then link them together into a .wasm file and run wasm-opt & friends on that.
There might be some configuration needed to let the extension know that the sqlite functions it's calling will be linked in later, but that should all be possible. Maybe with some whack-a-mole of wading through compiler versions and build flags and whatnot.
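For reference, the crate side of that first step might look like the following Cargo.toml fragment (a sketch; everything beyond the crate-type setting mentioned above is an assumption):

```toml
[lib]
# Emit a static .a archive instead of a cdylib, so it can be linked
# together with a separately compiled sqlite3 object file.
crate-type = ["staticlib"]

# Then: cargo build --release --target wasm32-unknown-unknown
```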
The workflow is basically prototype your virtual tables in python and then write some optimized code in a lower level language later. Just an idea for those interested.
I've never written an extension for Postgres, but for the ones I've installed, it's amounted to copying some files and running "CREATE EXTENSION". This feels pretty trivial to me already.
Direct GitHub link: https://github.com/asg017/sqlite-loadable-rs
A soon-to-be-released sqlite-loadable extension that's the fastest CSV parser for SQLite, and rivals DuckDB's parser on non-analytical queries: https://github.com/asg017/sqlite-xsv
A regular expression extension in Rust, the fastest SQLite regex implementation: https://github.com/asg017/sqlite-regex
And in the near future, expect more extensions for Postgres, Parquet, XML, S3, image manipulation, and more!