You can also create arbitrary indices on derived functions of your JSONB data, which I think is something that a lot of people don't realize. Postgres is a really, really good NoSQL database.
Sure. Suppose that we have a trivial key-value table mapping integer keys to arbitrary jsonb values:
example=> CREATE TABLE tab(k int PRIMARY KEY, data jsonb NOT NULL);
CREATE TABLE
We can fill this with heterogeneous values:
example=> INSERT INTO tab(k, data) SELECT i, format('{"mod":%s, "v%s":true}', i % 1000, i)::jsonb FROM generate_series(1,10000) q(i);
INSERT 0 10000
example=> INSERT INTO tab(k, data) SELECT i, '{"different":"abc"}'::jsonb FROM generate_series(10001,20000) q(i);
INSERT 0 10000
Now, keys in the range 1–10000 correspond to values with a JSON key "mod". We can create an index on that property of the JSON object:
example=> CREATE INDEX idx ON tab((data->'mod'));
CREATE INDEX
And we can check that the query is indexed, and only ever reads 10 rows:
example=> EXPLAIN ANALYZE SELECT k, data FROM tab WHERE data->'mod' = '7';
QUERY PLAN
---------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on tab (cost=5.06..157.71 rows=100 width=40) (actual time=0.035..0.052 rows=10 loops=1)
Recheck Cond: ((data -> 'mod'::text) = '7'::jsonb)
Heap Blocks: exact=10
-> Bitmap Index Scan on idx (cost=0.00..5.04 rows=100 width=0) (actual time=0.026..0.027 rows=10 loops=1)
Index Cond: ((data -> 'mod'::text) = '7'::jsonb)
Planning Time: 0.086 ms
Execution Time: 0.078 ms
If we did not have an index, the query would be slower:
example=> DROP INDEX idx;
DROP INDEX
example=> EXPLAIN ANALYZE SELECT k, data FROM tab WHERE data->'mod' = '7';
QUERY PLAN
---------------------------------------------------------------------------------------------------
Seq Scan on tab (cost=0.00..467.00 rows=100 width=34) (actual time=0.019..9.968 rows=10 loops=1)
Filter: ((data -> 'mod'::text) = '7'::jsonb)
Rows Removed by Filter: 19990
Planning Time: 0.157 ms
Execution Time: 9.989 ms
Hence, "arbitrary indices on derived functions of your JSONB data". So the query is fast, and there's no problem with the JSON shapes of `data` being different for different rows.