> The max(t1.date) and group by t1.state is a useful SQLite trick[0]: if you perform a group by and then ask for the max() of a value, the other columns returned from that table will be the columns for the row that contains that maximum value.
I had no idea that "bare columns" were an actual advertised feature of SQLite. I just assumed the bare-column results were a happy accident (a consequence of SQLite's easy-going conventions) that happens to work (i.e. not throw an error) but should otherwise be avoided, especially since it can't be depended on when migrating to stricter databases like PostgreSQL or BigQuery.
I think I'll continue to avoid it, but it's good to know that the seemingly correct-by-coincidence results are actually by intentional design.
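For anyone who hasn't seen it in action, here's a minimal demo of the bare-column behavior using Python's built-in `sqlite3` module (the table and data are invented for illustration):

```python
import sqlite3

# SQLite's "bare columns in an aggregate query" behavior: when a query uses
# max() (or min()) together with GROUP BY, any other (bare) columns in the
# SELECT come from the row that held the maximum (guaranteed since 3.7.11).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (state TEXT, date TEXT, value INTEGER);
    INSERT INTO t1 VALUES
        ('CA', '2021-01-01', 10),
        ('CA', '2021-06-01', 20),
        ('NY', '2021-03-01', 30);
""")
rows = conn.execute("""
    SELECT state, max(date), value FROM t1 GROUP BY state ORDER BY state
""").fetchall()
# 'value' is a bare column: for each state it comes from the row with max(date)
print(rows)  # [('CA', '2021-06-01', 20), ('NY', '2021-03-01', 30)]
```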
Yeah, I remember noticing that too. I use it, but I'll have to do some rewriting if I move my current SQLite stuff to another database. I'm not even sure what the best way to do that is.
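One portable rewrite is a window function: rank the rows per group and keep the top one. This sketch (invented table; requires SQLite 3.25+, and the same SQL works in PostgreSQL and BigQuery) gives the same newest-row-per-group result without relying on bare columns:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (state TEXT, date TEXT, value INTEGER);
    INSERT INTO t1 VALUES
        ('CA', '2021-01-01', 10),
        ('CA', '2021-06-01', 20),
        ('NY', '2021-03-01', 30);
""")
# Rank rows within each state by date, newest first, then keep rank 1.
rows = conn.execute("""
    SELECT state, date, value FROM (
        SELECT state, date, value,
               ROW_NUMBER() OVER (PARTITION BY state ORDER BY date DESC) AS rn
        FROM t1
    ) WHERE rn = 1 ORDER BY state
""").fetchall()
print(rows)  # [('CA', '2021-06-01', 20), ('NY', '2021-03-01', 30)]
```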
There's a SQLite virtual table mechanism for reading from CSV files on disk, but I haven't seen an equivalent for JSON. In any case, provided you have the disk space, importing the JSON is fast, and the single resulting .db file is easy to tolerate in my opinion.
Unless you're talking about 10+GBs of JSON I'd recommend importing them and seeing how far you get.
Once the JSON has been loaded into tables the other things you want to do should all be very feasible.
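Roughly, the import step can be done with just the standard library; this is an invented, minimal sketch (tools like sqlite-utils automate it, including schema inference and nested data):

```python
import json
import sqlite3

# Invented sample data: a JSON array of flat objects.
records = json.loads('[{"id": 1, "name": "alpha"}, {"id": 2, "name": "beta"}]')

conn = sqlite3.connect(":memory:")
# Derive column names from the first record (assumes uniform, flat objects).
cols = list(records[0].keys())
conn.execute(f"CREATE TABLE items ({', '.join(cols)})")
conn.executemany(
    f"INSERT INTO items VALUES ({', '.join('?' for _ in cols)})",
    [tuple(r[c] for c in cols) for r in records],
)
print(conn.execute("SELECT * FROM items").fetchall())  # [(1, 'alpha'), (2, 'beta')]
```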
Not really. I mean, yes, with unlimited effort you can, but you'd have to write your own virtual table implementation: there is no existing JSON virtual table module. You'd also have to do the indexing yourself inside that implementation, because it's not possible to add a SQLite index to a virtual table.

Even if you imported the JSON data into regular SQLite tables you still couldn't do it, because SQLite doesn't support GIN indexes. You would have to extract the data from the JSON into separate columns and then index those columns. Now you've got an index, but you still can't do a fuzzy text search on it. You'd have to use FTS5, or create yet more columns with something like a Soundex of the original value.

I don't think SQLite is a good fit here. PostgreSQL is the clear winner.
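To make the extract-then-index part concrete, here's a sketch with an invented schema. It assumes a SQLite build that includes the JSON1 functions, generated columns (3.31+), and FTS5, which the Python builds I've used all do:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Pull a JSON field out into a real (generated) column so it can be indexed
    CREATE TABLE docs (
        body TEXT,
        name TEXT GENERATED ALWAYS AS (json_extract(body, '$.name'))
    );
    CREATE INDEX docs_name ON docs(name);
    INSERT INTO docs (body) VALUES ('{"name": "Alice"}'), ('{"name": "Bob"}');

    -- FTS5 handles the full-text search side
    CREATE VIRTUAL TABLE docs_fts USING fts5(name);
    INSERT INTO docs_fts SELECT name FROM docs;
""")
# Exact lookup uses the index; MATCH uses FTS5 (case-insensitive by default)
print(conn.execute("SELECT name FROM docs WHERE name = 'Alice'").fetchall())
print(conn.execute("SELECT name FROM docs_fts WHERE docs_fts MATCH 'bob'").fetchall())
```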
Thanks for this. Being able to run SQL directly on CSV spreadsheets or JSON blobs is wonderful! Like ultra-easy ETL instead of fiddling with code to import data for further use.
Thanks for this. I had never seen the Python library before, and I will probably use it now. I really liked being able to attach multiple databases and the SQL query format for supporting that.
I've always wondered: how is this different from packages like Dataset (https://dataset.readthedocs.io/en/latest/) or IDEs that offer similar features, like JetBrains' DataGrip?
I've read your site, but could you offer a brief summary?
It's in the same category as Dataset (confusing, since that name is similar to my other package https://datasette.io/ which does something very different!).
I'm particularly proud of the `sqlite-utils memory` command, which lets you run joins across data from CSV and JSON files as a shell one-liner: https://simonwillison.net/2021/Jun/19/sqlite-utils-memory/
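The rough idea behind that command, sketched here with just the standard library (the real tool handles schema inference, file formats, and much more; the data below is invented): load the CSV and JSON into an in-memory SQLite database, then join across them with SQL.

```python
import csv
import io
import json
import sqlite3

# Stand-ins for a CSV file and a JSON file on disk.
csv_data = io.StringIO("id,name\n1,alpha\n2,beta\n")
json_data = '[{"id": 1, "score": 95}, {"id": 2, "score": 87}]'

conn = sqlite3.connect(":memory:")

# Load the CSV into one table...
conn.execute("CREATE TABLE people (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO people VALUES (?, ?)",
                 [(int(r["id"]), r["name"]) for r in csv.DictReader(csv_data)])

# ...and the JSON into another.
conn.execute("CREATE TABLE scores (id INTEGER, score INTEGER)")
conn.executemany("INSERT INTO scores VALUES (?, ?)",
                 [(r["id"], r["score"]) for r in json.loads(json_data)])

# Now a plain SQL join works across both sources.
rows = conn.execute("""
    SELECT people.name, scores.score
    FROM people JOIN scores ON people.id = scores.id
    ORDER BY people.id
""").fetchall()
print(rows)  # [('alpha', 95), ('beta', 87)]
```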