I use jsonb heavily. While it is amazing, I definitely wouldn’t rely on it as a general purpose replacement for NoSQL/schemaless data storage.
An example of an issue I am dealing with currently: while you can create a gin index to speed up containment queries, Postgres doesn’t keep any statistics about jsonb columns. This means the query planner will sometimes do stupid things, like using the index even for very non-selective overlap conditions, which is a lot slower than just doing a sequential scan.
Less of an issue for me but worth considering: the size of the gin index in my use case seems to be about 5x bigger than the size of the unindexed data. I was surprised by the size increase. I only use the containment operator so I could make a smaller/faster index using the jsonb_path_ops operator class. This is on my todo list :)
Like all non-btree indexes in Postgres, the index is unordered. That means sorting by values in the jsonb column will always be slow. This doesn’t matter for selective queries, but exacerbates my already slow non-selective queries that return large result sets.
That said, if your queries are selective, jsonb + gin indexes are surprisingly performant (in the 0.5-10ms range for small result sets). My use case is a mix of structured relational data with jsonb for user-defined values (which of course they want to use for querying/sorting and I was dumb enough to say “sure, why not?”)
In terms of the magnitude of data, there’s roughly 10 million rows. Each team using this service has the query scoped to about 500k-1 million records, and then additional filters (on the jsonb column) will scope that down to anywhere between 60k-0 results.
Thanks for the detailed response. I'm curious, is the NoSQL store a canonical store of its own data? If not, how do you replicate from postgres to the secondary store?
I ask because where I work we sync postgres to a secondary store for search, but the way it's done in a piecemeal, application-specific way gives me the heebie jeebies. It almost certainly will result in that secondary store drifting. Unfortunately we can't use something like zombodb [1] as we're on amazon RDS. It seems like you know your stuff, and seeing non-deterministic consistency irritates the heck out of me!
Thank you for raising this question. Well, for my things I use MongoDB because of very convenient integration with programming languages (Go, C++ via Mongo C++) and zero hassle with schema
But I'll be happy to replace it by something else, my load is extremely small and only single requirement is to have DB as network daemon, not as embedded storage as it will be used by 2 applications (main daemon an API).
If you have ever used Mongo Atlas, you would understand why people use mongo. The administration ease that the platform provides is unmatched by any other DBaaS I am aware of.
Surprisingly, I feel Mongo has been kind of surging back into developers' minds. I thought Mongo was utterly dead, and I also don't know why would one use Mongo instead of JSONB. The last I heard (3-4 years ago?), there were some fundamental problems with Mongo.
That said, I've incidentally heard a lot about Mongo in the last half a year. Might be my bubble. Might be MongoDB actually maturing and getting really good. I hope it's the latter.