You can also create arbitrary indices on derived functions of your JSONB data, w...

cooperaustinj · on Oct 11, 2023

Can you expand on this? Documentation or an example so I can learn?

tczMUFlmoNk · on Oct 11, 2023

Sure. Suppose that we have a trivial key-value table mapping integer keys to arbitrary jsonb values:

    example=> CREATE TABLE tab(k int PRIMARY KEY, data jsonb NOT NULL);
    CREATE TABLE

We can fill this with heterogeneous values:

    example=> INSERT INTO tab(k, data) SELECT i, format('{"mod":%s, "v%s":true}', i % 1000, i)::jsonb FROM generate_series(1,10000) q(i);
    INSERT 0 10000
    example=> INSERT INTO tab(k, data) SELECT i, '{"different":"abc"}'::jsonb FROM generate_series(10001,20000) q(i);
    INSERT 0 10000

Now, keys in the range 1–10000 correspond to values with a JSON key "mod". We can create an index on that property of the JSON object:

    example=> CREATE INDEX idx ON tab((data->'mod'));
    CREATE INDEX

Then, we can query over it:

    example=> SELECT k, data FROM tab WHERE data->'mod' = '7';
      k   |           data            
    ------+---------------------------
        7 | {"v7": true, "mod": 7}
     1007 | {"mod": 7, "v1007": true}
     2007 | {"mod": 7, "v2007": true}
     3007 | {"mod": 7, "v3007": true}
     4007 | {"mod": 7, "v4007": true}
     5007 | {"mod": 7, "v5007": true}
     6007 | {"mod": 7, "v6007": true}
     7007 | {"mod": 7, "v7007": true}
     8007 | {"mod": 7, "v8007": true}
     9007 | {"mod": 7, "v9007": true}
    (10 rows)

And we can check that the query is indexed, and only ever reads 10 rows:

    example=> EXPLAIN ANALYZE SELECT k, data FROM tab WHERE data->'mod' = '7';
                                                      QUERY PLAN                                                   
    ---------------------------------------------------------------------------------------------------------------
     Bitmap Heap Scan on tab  (cost=5.06..157.71 rows=100 width=40) (actual time=0.035..0.052 rows=10 loops=1)
       Recheck Cond: ((data -> 'mod'::text) = '7'::jsonb)
       Heap Blocks: exact=10
       ->  Bitmap Index Scan on idx  (cost=0.00..5.04 rows=100 width=0) (actual time=0.026..0.027 rows=10 loops=1)
             Index Cond: ((data -> 'mod'::text) = '7'::jsonb)
     Planning Time: 0.086 ms
     Execution Time: 0.078 ms

If we did not have an index, the query would be slower:

    example=> DROP INDEX idx;
    DROP INDEX
    example=> EXPLAIN ANALYZE SELECT k, data FROM tab WHERE data->'mod' = '7';
                                                QUERY PLAN                                             
    ---------------------------------------------------------------------------------------------------
     Seq Scan on tab  (cost=0.00..467.00 rows=100 width=34) (actual time=0.019..9.968 rows=10 loops=1)
       Filter: ((data -> 'mod'::text) = '7'::jsonb)
       Rows Removed by Filter: 19990
     Planning Time: 0.157 ms
     Execution Time: 9.989 ms

Hence, "arbitrary indices on derived functions of your JSONB data". So the query is fast, and there's no problem with the JSON shapes of `data` being different for different rows.

See docs for expression indices: https://www.postgresql.org/docs/16/indexes-expressional.html