Hacker News
sqlite-utils - CLI & Python utility functions for manipulating SQLite databases (datasette.io)
134 points by punnerud on Feb 23, 2022 | 16 comments


I've been developing this set of tools for a few years now - here's a series of blog posts about the feature highlights: https://simonwillison.net/series/sqlite-utils-features/

I'm particularly proud of the `sqlite-utils memory` command, which lets you run joins across data from CSV and JSON files as a shell one-liner: https://simonwillison.net/2021/Jun/19/sqlite-utils-memory/


Tangential thing I learned from your blog post:

> The max(t1.date) and group by t1.state is a useful SQLite trick[0]: if you perform a group by and then ask for the max() of a value, the other columns returned from that table will be the columns for the row that contains that maximum value.

I had no idea that "bare columns" were an actual advertised feature of SQLite. I'd assumed the bare column results were a happy accident (a consequence of SQLite's easy-going conventions), something that happens to work (i.e. not throw an error) but should otherwise be avoided, especially since it can't be depended on when migrating to stricter engines like PostgreSQL or BigQuery.

I think I'll continue to avoid it, but it's good to know that the seemingly correct-by-coincidence results are actually by intentional design.

[0] https://www.sqlite.org/draft/lang_select.html#bareagg
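
To see the documented behavior in action, here's a stdlib-only sketch (the table name and values are invented for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE readings (state TEXT, date TEXT, value INTEGER);
INSERT INTO readings VALUES
  ('CA', '2022-01-01', 10),
  ('CA', '2022-02-01', 20),
  ('NY', '2022-01-15', 30);
""")

# The bare column "value" comes from the row that holds max(date) within
# each group - documented SQLite behavior, not an accident.
rows = conn.execute(
    "SELECT state, max(date), value FROM readings GROUP BY state ORDER BY state"
).fetchall()
print(rows)  # [('CA', '2022-02-01', 20), ('NY', '2022-01-15', 30)]
```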


Yeah, I remember noticing that too. I use it, but I'll have to do some rewriting if I move my current SQLite stuff to another DB. I'm not even sure what the best way to do it is.
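
One portable rewrite is to join back against the grouped maximums instead of relying on bare columns - a sketch on invented data that should behave the same on SQLite and PostgreSQL:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE readings (state TEXT, date TEXT, value INTEGER);
INSERT INTO readings VALUES
  ('CA', '2022-01-01', 10),
  ('CA', '2022-02-01', 20),
  ('NY', '2022-01-15', 30);
""")

# Portable equivalent of the bare-column trick: a self-join against the
# per-group maximum dates, using only standard SQL.
sql = """
SELECT r.state, r.date, r.value
FROM readings r
JOIN (SELECT state, max(date) AS max_date
      FROM readings GROUP BY state) m
  ON r.state = m.state AND r.date = m.max_date
ORDER BY r.state
"""
rows = conn.execute(sql).fetchall()
print(rows)  # [('CA', '2022-02-01', 20), ('NY', '2022-01-15', 30)]
```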


Would this be possible using SQLite:

- I point it to a bunch of JSON files on disk. They have similar schema but not exact (slight variations)

- Have SQLite expose them as one virtual table (I would rather not literally import them - think PostgreSQL's JSON FDW)

- Then index certain fields so that looking up records by those fields is very fast (faster than having to "FTS" through all the files)

- Allow fuzzy text search on certain fields (say a field was company name and other fields were city, street and human names)

All the while (best case) not actually having to import the files into the DB (it's ok if the indices need to be rebuilt every time)


There's a SQLite virtual table mechanism for reading from CSV files on disk, but I haven't seen an equivalent for JSON. In any case, provided you have the disk space, importing JSON is fast, and the single resulting .db file is easy to live with in my opinion.

Unless you're talking about 10+ GB of JSON, I'd recommend importing them and seeing how far you get.

Once the JSON has been loaded into tables the other things you want to do should all be very feasible.

- Fuzzy text search can be done using SQLite trigram indexes https://www.sqlite.org/fts5.html#the_experimental_trigram_to...

- I'd split JSON columns that you want to index out into indexed regular columns - there are a bunch of tricks in sqlite-utils for doing that, see https://simonwillison.net/2021/Aug/6/sqlite-utils-convert/
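
As an illustration of the splitting/indexing step, SQLite can also index a json_extract() expression directly, so lookups by a JSON field avoid a full scan. A stdlib-only sketch with made-up records (assumes the JSON1 functions, which are built into most modern SQLite builds):

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE docs (body TEXT)")  # one raw JSON document per row

# Hypothetical records standing in for the JSON files on disk.
records = [
    {"company": "Acme Corp", "city": "Springfield"},
    {"company": "Globex", "city": "Shelbyville"},
]
conn.executemany(
    "INSERT INTO docs (body) VALUES (?)",
    [(json.dumps(r),) for r in records],
)

# An index on a json_extract() expression makes lookups by that field fast
# without copying the data into a separate column.
conn.execute(
    "CREATE INDEX idx_company ON docs(json_extract(body, '$.company'))"
)

row = conn.execute(
    "SELECT json_extract(body, '$.city') FROM docs "
    "WHERE json_extract(body, '$.company') = ?",
    ("Globex",),
).fetchone()
print(row)  # ('Shelbyville',)
```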


Do you think unqlite would be a better fit?

Have you looked at unqlite?


Not really. I mean, yes, with unlimited effort you can, but you'd have to write your own virtual table implementation - there is no existing JSON virtual table module. You'd have to do the indexing yourself in your virtual table implementation; it's not possible to add an SQLite index on a virtual table. Even if you imported the JSON data into SQLite tables you still couldn't index it directly, because SQLite doesn't support GIN indexes. You would have to extract the data from the JSON into separate columns and then index those columns. Now you've got an index, but you still can't do a fuzzy text search on it. You'd have to use FTS5, or create yet more columns with something like a Soundex of the original value. I don't think SQLite is a good fit here. PostgreSQL is the clear winner.
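
For reference, the FTS5 route mentioned above looks roughly like this - a sketch with invented company names, which assumes SQLite 3.34+ with FTS5 and its trigram tokenizer compiled in:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The trigram tokenizer needs SQLite 3.34+ with FTS5 available.
conn.execute(
    "CREATE VIRTUAL TABLE companies USING fts5(name, tokenize='trigram')"
)
conn.executemany(
    "INSERT INTO companies (name) VALUES (?)",
    [("Acme Corporation",), ("Globex Industries",), ("Initech",)],
)

# Trigram tokenization allows substring matching on any fragment of
# three or more characters - a rough form of fuzzy search.
rows = conn.execute(
    "SELECT name FROM companies WHERE name MATCH ?", ("orpo",)
).fetchall()
print(rows)  # [('Acme Corporation',)]
```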



Yes! I tried this exact use case using sqlite-utils as a Python library.

I basically made a Python script that opens each of the JSON files and inserts it into an in-memory SQLite DB using sqlite-utils insert.

Then you have a regular SQLite db (in memory) that you can work with!


Kudos and thank you!


Thanks for this. Being able to run SQL directly on CSV spreadsheets or JSON blobs is wonderful! Like ultra-easy ETL instead of fiddling with code to import data for further use.


SQLite itself can deal directly with CSV.


Thanks for this. I had never seen the Python library before, and I will probably use it now. I really liked being able to attach multiple databases and the SQL query format for supporting that.


I've always wondered: how is this different from packages like Dataset (https://dataset.readthedocs.io/en/latest/), or from the database tooling in IDEs like JetBrains' DataGrip?

I've read your site, but could you offer a brief summary?


It's in the same category as Dataset (confusing, since that name is similar to my other package https://datasette.io/ which does something very different!)

I've not worked with JetBrains DataGrip.


Awesome set of tools!



