Hacker News
sqlite-utils - CLI & Python utility functions for manipulating SQLite databases (datasette.io)
134 points by punnerud on Feb 23, 2022 | 16 comments


I've been developing this set of tools for a few years now - here's a series of blog posts about the feature highlights: https://simonwillison.net/series/sqlite-utils-features/

I'm particularly proud of the `sqlite-utils memory` command, which lets you run joins across data from CSV and JSON files as a shell one-liner: https://simonwillison.net/2021/Jun/19/sqlite-utils-memory/


Tangential thing I learned from your blog post:

> The max(t1.date) and group by t1.state is a useful SQLite trick[0]: if you perform a group by and then ask for the max() of a value, the other columns returned from that table will be the columns for the row that contains that maximum value.

I had no idea that "bare columns" were an actual advertised feature of SQLite. I'd assumed the bare column results were a happy accident (a consequence of SQLite's easy-going conventions), something that happens to work (i.e. not throw an error) but should otherwise be avoided, especially since it can't be depended on when migrating to stricter engines like PostgreSQL or BigQuery.

I think I'll continue to avoid it, but it's good to know that the seemingly correct-by-coincidence results are actually by intentional design.

[0] https://www.sqlite.org/draft/lang_select.html#bareagg
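
To see the documented behavior in action, here's a stdlib-only sketch (the table name and values are invented for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE readings (state TEXT, date TEXT, value INTEGER);
INSERT INTO readings VALUES
  ('CA', '2022-01-01', 10),
  ('CA', '2022-02-01', 20),
  ('NY', '2022-01-15', 30);
""")

# The bare column "value" comes from the row that holds max(date) within
# each group - documented SQLite behavior, not an accident.
rows = conn.execute(
    "SELECT state, max(date), value FROM readings GROUP BY state ORDER BY state"
).fetchall()
print(rows)  # [('CA', '2022-02-01', 20), ('NY', '2022-01-15', 30)]
```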


Yeah, I remember noticing that too. I use it, but I'll have to do some rewriting if I move my current SQLite stuff to another DB. I'm not even sure what the best way to do it is.
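
One portable rewrite is to join back against the grouped maximums instead of relying on bare columns - a sketch on invented data that should behave the same on SQLite and PostgreSQL:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE readings (state TEXT, date TEXT, value INTEGER);
INSERT INTO readings VALUES
  ('CA', '2022-01-01', 10),
  ('CA', '2022-02-01', 20),
  ('NY', '2022-01-15', 30);
""")

# Portable equivalent of the bare-column trick: a self-join against the
# per-group maximum dates, using only standard SQL.
sql = """
SELECT r.state, r.date, r.value
FROM readings r
JOIN (SELECT state, max(date) AS max_date
      FROM readings GROUP BY state) m
  ON r.state = m.state AND r.date = m.max_date
ORDER BY r.state
"""
rows = conn.execute(sql).fetchall()
print(rows)  # [('CA', '2022-02-01', 20), ('NY', '2022-01-15', 30)]
```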


Would this be possible using SQLite:

- I point it to a bunch of JSON files on disk. They have similar schema but not exact (slight variations)

- Have SQLite expose them as one virtual table (I would rather not literally import them - think PostgreSQL's JSON FDW)

- Then index certain fields so that looking up records by those fields is very fast (faster than having to "FTS" through all the files)

- Allow fuzzy text search on certain fields (say a field was company name and other fields were city, street and human names)

All the while (best case) not actually having to import the files into the DB (it's ok if the indices need to be rebuilt every time)


There's a SQLite virtual table mechanism for reading from CSV files on disk, but I haven't seen an equivalent for JSON. In any case, provided you have the disk space, importing JSON is fast, and the single resulting .db file is easy to live with in my opinion.

Unless you're talking about 10+ GB of JSON, I'd recommend importing them and seeing how far you get.

Once the JSON has been loaded into tables the other things you want to do should all be very feasible.

- Fuzzy text search can be done using SQLite trigram indexes https://www.sqlite.org/fts5.html#the_experimental_trigram_to...

- I'd split JSON columns that you want to index out into indexed regular columns - there are a bunch of tricks in sqlite-utils for doing that, see https://simonwillison.net/2021/Aug/6/sqlite-utils-convert/
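
As an illustration of the splitting/indexing step, SQLite can also index a json_extract() expression directly, so lookups by a JSON field avoid a full scan. A stdlib-only sketch with made-up records (assumes the JSON1 functions, which are built into most modern SQLite builds):

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE docs (body TEXT)")  # one raw JSON document per row

# Hypothetical records standing in for the JSON files on disk.
records = [
    {"company": "Acme Corp", "city": "Springfield"},
    {"company": "Globex", "city": "Shelbyville"},
]
conn.executemany(
    "INSERT INTO docs (body) VALUES (?)",
    [(json.dumps(r),) for r in records],
)

# An index on a json_extract() expression makes lookups by that field fast
# without copying the data into a separate column.
conn.execute(
    "CREATE INDEX idx_company ON docs(json_extract(body, '$.company'))"
)

row = conn.execute(
    "SELECT json_extract(body, '$.city') FROM docs "
    "WHERE json_extract(body, '$.company') = ?",
    ("Globex",),
).fetchone()
print(row)  # ('Shelbyville',)
```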


Do you think unqlite would be a better fit?

Have you looked at unqlite?


Not really. I mean, yes, with unlimited effort you can, but you'd have to write your own virtual table implementation - there is no existing JSON virtual table module. You'd have to do the indexing yourself in your virtual table implementation; it's not possible to add an SQLite index on a virtual table. Even if you imported the JSON data into SQLite tables you still couldn't index it directly, because SQLite doesn't support GIN indexes. You would have to extract the data from the JSON into separate columns and then index those columns. Now you've got an index, but you still can't do a fuzzy text search on it. You'd have to use FTS5, or create yet more columns with something like a Soundex of the original value. I don't think SQLite is a good fit here. PostgreSQL is the clear winner.
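
For reference, the FTS5 route mentioned above looks roughly like this - a sketch with invented company names, which assumes SQLite 3.34+ with FTS5 and its trigram tokenizer compiled in:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The trigram tokenizer needs SQLite 3.34+ with FTS5 available.
conn.execute(
    "CREATE VIRTUAL TABLE companies USING fts5(name, tokenize='trigram')"
)
conn.executemany(
    "INSERT INTO companies (name) VALUES (?)",
    [("Acme Corporation",), ("Globex Industries",), ("Initech",)],
)

# Trigram tokenization allows substring matching on any fragment of
# three or more characters - a rough form of fuzzy search.
rows = conn.execute(
    "SELECT name FROM companies WHERE name MATCH ?", ("orpo",)
).fetchall()
print(rows)  # [('Acme Corporation',)]
```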



Yes! I tried this exact use case using sqlite-utils as a Python library.

I basically made a Python script that opens each of the JSON files and inserts it into an in-memory SQLite DB using sqlite-utils insert.

Then you have a regular SQLite db (in memory) that you can work with!


Kudos and thank you!


Thanks for this. Being able to run SQL directly on CSV spreadsheets or JSON blobs is wonderful! Like ultra-easy ETL instead of fiddling with code to import data for further use.


SQLite itself can deal directly with CSV.


Thanks for this. I had never seen the Python library before, and I will probably use it now. I really liked being able to attach multiple databases and the SQL query format for supporting that.


I've always wondered: how is this different from packages like Dataset (https://dataset.readthedocs.io/en/latest/), or from the database tooling in IDEs like JetBrains' DataGrip?

I've read your site, but could you offer a brief summary?


It's in the same category as Dataset (confusing, since that name is similar to my other package https://datasette.io/ which does something very different!)

I've not worked with JetBrains DataGrip.


Awesome set of tools!



