I adore tagging systems and have worked on them in several different applications and implementations, but there are always pitfalls and trade-offs, and it’s possible to bury yourself.
Nowadays I nearly always store the assigned tags as an integer array column in Postgres, then use the intarray extension to handle the arbitrary boolean expression searches like “((1|2)&(3)&(!5))”. I still have a tags table that stores all the metadata / hierarchy / rules, but for performance I don’t use a join table. This has solved most of my problems. Supertags just expand to OR statements when I generate the expression. Performance has been excellent even with large tables thanks to pg indexing.
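A minimal sketch of what this looks like with intarray (table and column names are hypothetical; the `@@` operator and `query_int` type come from the extension itself):

```sql
CREATE EXTENSION IF NOT EXISTS intarray;

-- Tag IDs live directly on the row as an int[] column; no join table.
CREATE TABLE items (
    id   bigserial PRIMARY KEY,
    name text NOT NULL,
    tags int[] NOT NULL DEFAULT '{}'
);

-- "(1 or 2) and 3 and not 5", evaluated against each row's tags array:
SELECT id, name
FROM items
WHERE tags @@ '(1|2) & 3 & !5'::query_int;
```

The expression string is exactly the kind of thing you can generate from a user-facing tag search, expanding supertags into `(child1|child2|...)` groups before casting to `query_int`.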
Yes, it’s explained in the intarray doc here. GiST is the one I use, but as it states GIN should be faster on reads. I haven’t really thought about that in many years, I should run some perf tests.
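For reference, the intarray extension ships both operator classes, so the choice is a one-line difference (assuming the hypothetical `items` table above):

```sql
-- GiST: smaller index, cheaper updates; gist__intbig_ops is the
-- lossy variant intended for larger arrays.
CREATE INDEX items_tags_gist ON items USING gist (tags gist__int_ops);

-- GIN: typically faster reads, more expensive writes.
CREATE INDEX items_tags_gin ON items USING gin (tags gin__int_ops);
```

Both support the `@@` query_int searches, so a perf test is mostly a matter of building each and comparing.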
Read perf would be good in isolation, sure, but what about lock contention on those rows from the frequent writes to the tags column? "Taggings" usually have a much higher rate of change than the objects they tag. If you're not at least keeping a separate has-one thing_taggings table (thing bigserial, tags intarray) apart from your things table, I could see this degrading performance for any query that wants to touch the table.
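The split being suggested might look like this (names hypothetical); the point is that hot tag writes rewrite only the narrow taggings row, not the wide object row:

```sql
CREATE TABLE things (
    id   bigserial PRIMARY KEY,
    name text NOT NULL
    -- ...other, rarely-changing columns...
);

-- One-to-one side table holding only the volatile tag data.
CREATE TABLE thing_taggings (
    thing bigint PRIMARY KEY REFERENCES things (id) ON DELETE CASCADE,
    tags  int[] NOT NULL DEFAULT '{}'
);
```

Queries that filter on tags then join `thing_taggings` back to `things`, while queries that never touch tags avoid the churned rows entirely.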
Can you search by tag expressions like “((1|2)&(!3))” with your method? I’m not understanding it completely. I often use json_data but I didn’t consider tagging methods because of the overhead it has.
The tradeoff here is that you lose the foreign key constraint, correct? So if you delete a tag, there is no way for the database to automatically remove all references to it. Or is there some way to do this now?
> So if you delete a tag, there is no way for the database to automatically remove all references to it.
I'm not sure about implementation/support in Postgres specifically, but in the general case of a column of tag bitfields, the database could easily maintain a global popcount (i.e., "number of rows with this tag") and a soft-delete flag for each tag, and clear any soft-deleted tags on access (possibly only on write access). When a soft-deleted tag's popcount reaches zero, it counts as garbage collected and its ID can be reused for a new tag.
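A sketch of that bookkeeping in plain SQL (all names hypothetical; this assumes the intarray `-` operator for stripping elements from an int[]):

```sql
CREATE TABLE tags (
    id       serial  PRIMARY KEY,
    name     text    NOT NULL,
    popcount bigint  NOT NULL DEFAULT 0,      -- rows currently carrying this tag
    deleted  boolean NOT NULL DEFAULT false   -- soft-delete flag
);

-- On write access to a row, lazily strip any soft-deleted tags;
-- ARRAY(...) yields an empty array when nothing is soft-deleted.
UPDATE items
SET tags = tags - ARRAY(SELECT id FROM tags WHERE deleted)
WHERE id = 42;

-- A soft-deleted tag whose popcount has drained to zero is garbage
-- and its ID can be handed out again.
SELECT id FROM tags WHERE deleted AND popcount = 0;
```

Maintaining `popcount` itself would need triggers or application-side counting; this only shows the shape of the scheme.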
But then again, the need to actually _delete_ a tag is rare. What you really want is for searches on that tag to just return nothing, and that's more of an application-level responsibility. Your #absolutely_vile_tag_that_would_get_you_in_jail just enters a blacklist, and you're done.
Yes, but that’s easily handled with a trigger. My first implementation actually had a regular join table of items_tags which used a trigger to update the items.tags intarray. Wasn’t super performant but let us use our existing templates for 1-many to implement the UI.
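A sketch of that trigger setup (table and column names hypothetical): a conventional `items_tags` join table remains the source of truth, and an AFTER trigger rebuilds the denormalized `items.tags` array on every change.

```sql
CREATE OR REPLACE FUNCTION sync_item_tags() RETURNS trigger AS $$
BEGIN
    -- Rebuild the affected item's int[] from the join table.
    UPDATE items
    SET tags = ARRAY(SELECT tag_id
                     FROM items_tags
                     WHERE item_id = COALESCE(NEW.item_id, OLD.item_id)
                     ORDER BY tag_id)
    WHERE id = COALESCE(NEW.item_id, OLD.item_id);
    RETURN NULL;  -- result ignored for AFTER row triggers
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER items_tags_sync
AFTER INSERT OR UPDATE OR DELETE ON items_tags
FOR EACH ROW EXECUTE FUNCTION sync_item_tags();
```

Rebuilding the whole array per change is simple but not the fastest option, which matches the "wasn't super performant" caveat above; the upside is that ordinary 1-many CRUD against `items_tags` keeps the search column correct automatically, and foreign keys (with ON DELETE CASCADE) on the join table restore the referential integrity the bare int[] column loses.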
Nowadays you can just use a tagging component with integrated search for the UI.