How Postgres Triggers Can Simplify Your Back End Development (themythicalengineer.com)
93 points by sks147 on April 23, 2023 | 110 comments


My only hesitation with methods like this is that it ends up splitting the business rules into two places, where one is sort of obscured.

It's obvious to look at `add_new_payment` for the code that runs when adding a new payment, but then the code isn't there, so you have to know/ask or search in either migrations, a fresh structure dump or poke at the actual db (!).

I think they're great for other, well, effects when needed. PostgreSQL is a real powerhouse.


Yeah I was expecting the post author to discuss the trade-off being made here because it’s really important to do so. The biggest complaint I have with these pithy articles is that they try to sell you on a particular trade-off without explaining what the deal is. It makes me think the author:

1. Just discovered this. 2. Implemented a bunch of them. 3. Hasn’t been maintaining this solution for more than a couple of months.

This is a trade-off that only manifests itself after maybe a couple years when you’ve built a system and you have many hands maintaining it. Either there’s some performance problem that makes it worth it or you’ve just obfuscated half your code for no reason. In the latter case those many hands are going to wreak havoc on your system before you figure out how to make it maintainable.


> The biggest complaint I have with these pithy articles is that they try to sell you on a particular trade-off without explaining what the deal is.

This is basically clickbait from the wider world affecting software development, even though it might not look like it. Boiled down to the essentials, we are telling each other the software version of "Here's How to Lose 10 Pounds in Time for Summer" way more often than "How to Balance Diet and Exercise to Remain Healthy Over a Decade".

But it's up to those of us who like more discussion to reverse these trends. In that vein -- do you have good sources that typically talk about tradeoffs with technology rather than promoting a specific one? Kleppmann's book on data-intensive applications is one I have found in the past.


It’s unfortunately hard for me to point to a source because everything I know I absorbed from more experienced engineers on hardware projects.


I'm not sure if the post was updated after your comment, but from the article:

"Triggers should be used with caution since they can obscure critical logic and create an illusion of automatic processes. While this can be advantageous in certain scenarios, it can also pose a challenge in terms of debugging, testing, and monitoring since they are not readily visible to developers."


DB upgrades are a pain. Triggers must be deployed with “UPDATE” instructions, it’s not as easy as keeping code in git.


Triggers are DDL, unless you've implemented them in the app layer.


I think this problem can be robustly solved if you have the right mechanisms in place:

1. Migrations. Your schema needs to live in version control, and changes to your schema must be applied by an automated system. Django migrations are the gold standard here in my opinion, but you can stitch together a custom system if you need to, one that tracks which migrations have been run already and provides a mechanism to apply new ones.

2. Automated tests. Your triggers MUST be covered by automated tests - call them unit tests or integration tests depending on your preferences, but they need to be exercised. Don't fall into the trap of mocking your database in your tests!

With these two measures in place I'm 100% comfortable using triggers that split my business logic between my application code and my database code.
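
For illustration, the tracking piece of a hand-rolled system can be as small as one bookkeeping table that your deploy script consults before applying each SQL file (a sketch; the file and table names are made up):

    -- Bookkeeping table the deploy script checks before applying a migration file
    CREATE TABLE IF NOT EXISTS schema_migrations (
        filename   text PRIMARY KEY,
        applied_at timestamptz NOT NULL DEFAULT now()
    );

    -- The deploy script wraps each pending file in a transaction and records it:
    BEGIN;
    -- \i 0007_add_payment_trigger.sql  (contents of the migration file run here)
    INSERT INTO schema_migrations (filename) VALUES ('0007_add_payment_trigger.sql');
    COMMIT;

Triggers then live in those versioned SQL files like any other DDL, so "where did this come from" is answered by git blame.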


I don't think migrations (at least as done by Django et al.) solve it - you want a declarative source of truth for what the schema looks like today, not a chain of changes that only tell you that after computing the combined effect.

Even if they just created a generated file of the final schema, that sat in version control and errored the makemigrations check (just like a missing migration) if it was out of sync, that would be a significant improvement IMO. But I think the Django maintainers at least would say they want the ORM DSL to be that. (But it's way too incomplete, you'd be limited to a tiny subset of postgres, and even then you have to be on board with that being a reasonable description of your schema, not wanting the actual db schema anywhere.)


If you really, really need to be able to see a SQL schema representing the current state, a cheap trick is to run an automation on every deploy that snapshots the schema and writes it to a GitHub repository.

I do a version of that for my own (Django-powered) blog here: https://github.com/simonw/simonwillisonblog-backup/blob/main...


This comes up a lot, and I'm always surprised that more people don't know about Flyway and "repeatable migrations": https://flywaydb.org/documentation/tutorials/repeatable

> Repeatable migrations are very useful for managing database objects whose definition can then simply be maintained in a single file in version control. Instead of being run just once, they are (re-)applied every time their checksum changes.


One of my pet projects addresses these points you raise about Django + migrations, have a look:

https://pypi.org/project/DBSamizdat

In a nutshell it allows you to keep DB functions, triggers and views around as Python classes, so you can version them together with the rest of your application code. The DB state gets updated for you (in the right dependency order) whenever you change them.

It can also run without Django.


This is exactly what I would use dbt for


Had a thought recently. What's stopping someone from separating table migrations from others like functions/triggers?

The standard tables can use the standard process while triggers, etc are run declaratively. When you change something, a tool could simply tear everything down and rebuild it. No need to worry about data since those are handled separately.

Would performance be a concern? Or is there something I'm missing?


> you want a declarative source of truth for what the schema looks like today, not a chain of changes that only tell you that after computing the combined effect

Applying a series of migrations to get a final db schema is not much different than a version control system like git.


Ok, so how do I checkout a version and view the file/schema tree?

If you like, you can view my comment above as saying 'if they included that tooling, not just the similar tree of changes stored, it would be better'.

How to get from one state to the next is interesting to the computer, not to me (not after I've initially written it/done it in prod anyway). I'm interested in the result: where we stand after all of the migrations (currently on disk, having checked out whatever).


You checkout that version, then run the migrations from scratch against a fresh database.

In Django that's "./manage.py migrate".


And then inspect the db, exactly. In other words you can't, there is no declarative 'current state' checked in or provided by the migration tooling.


If you are going to use some kind of meta-schema tool to manage migrations, you should store your declarative schema in whatever syntax this tool understands and allow it to generate the schema-mutating migration commands as runtime artifacts (more like a build output than a versioned source).

If not using such a tool, you might adopt a little meta-schema logic in your own DDL. With PostgreSQL, you can try writing idempotent and self-migrating schemas. Using syntax variants like CREATE OR REPLACE for functions and views, DROP ... IF EXISTS, CREATE ... IF NOT EXISTS, and ALTER ... ADD IF NOT EXISTS allows some of your DDL to simply work on an empty DB or a prior version. Wrapping all this within a pl/pgsql block allows conditional statements to run DDL variants. Conditionals are also useful if you need to include some custom logic for data migration alongside your schema migration, i.e. create a new structure, copy+transform data, drop old structure.
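
A minimal sketch of that style, with invented table and column names:

    -- Idempotent DDL: safe to run against an empty database or a prior version
    CREATE TABLE IF NOT EXISTS wallets (
        id      bigint PRIMARY KEY,
        balance numeric NOT NULL DEFAULT 0
    );

    ALTER TABLE wallets ADD COLUMN IF NOT EXISTS currency text NOT NULL DEFAULT 'USD';

    CREATE OR REPLACE VIEW wallet_totals AS
        SELECT currency, sum(balance) AS total FROM wallets GROUP BY currency;

    -- Conditional data migration alongside the schema migration
    DO $$
    BEGIN
        IF EXISTS (SELECT 1 FROM information_schema.columns
                   WHERE table_name = 'wallets' AND column_name = 'old_balance') THEN
            UPDATE wallets SET balance = old_balance WHERE balance = 0;
            ALTER TABLE wallets DROP COLUMN old_balance;
        END IF;
    END;
    $$;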

For smaller DBs, you may be able to afford some brute-force techniques to simplify the DDL. Things like DROP IF EXISTS for multiple earlier versions of views, indexes, or constraints to clear the slate and then recreating them with the latest definitions. This may add IO costs to execute it, but makes the DDL easier to read as the same DDL statements are used for the clean slate and the migration. Similarly, a little pl/pgsql logic could loop over tables and apply triggers, policies, etc. that you want to use systematically in a project.

If possible, you can also prune your code so that any specific version in your source control only has logic to handle a few cases, i.e. clean slate builds and a migration of N-1 or N-2 to latest. This minimizes the amount of branched logic you might have to understand when maintaining the DDL. The approach here depends a lot on whether you are versioning a product where you control the deployment lifecycle or versioning a product release which may encounter many different prior versions "in the wild".

In any case, you have a more complicated test requirement if you want to validate that your system works from multiple starting points. I.e. clean-slate builds as well as migrations from specific earlier versions. I think this is true whether you are using some higher level migration management tooling or just rolling your own conditional and idempotent DDL.


I don't understand. What are you looking for here?

If you want a plain text SQL file to look at you can have that with a bit of extra automation - for example, every time you tag a release you could have a script that runs "./manage.py migrate" against that checkout and then dumps the schema out to a file somewhere (an asset attached to the release on GitHub as one example).


That's pretty much what I was describing in my initial reply to your top-level comment that would be an improvement, just not built in.

Ideally though I'd like a tool with first class declarative schema as the source of truth - migrations that Django can correctly autogenerate don't need to concern me at all; if I need to vary them or resolve some ambiguity in a possible path (renamed a column vs. dropped and added one, for a simple example) then there can be some way of specifying that migration. Essentially as Django targets its models as the intended result of migrations (and complains of a missing migration if they don't get there, etc.) I'd prefer instead to target SQL schema. Still check ORM consistency - but against the schema, not the migrations of it.

Starting from scratch it seems obvious that you'd have the schema, and the mapping into python models. Django does away with the former by inference, but in so doing limits its scope to the subset of possibly desired SQL that it understands.

If you have SQL and Python, then you can have DSL for the understood subset without losing migrations, tracking in a single place, etc. of the rest. You could also give Django ownership of the entire database/schema, so that any manual experiments/adjustments would be blown away on migration, they'd have to be documented in code to be persisted.


It’s completely different as git does not store deltas but snapshots of whole files and file trees.


> Django migrations are the gold standard here in my opinion

Intriguing. In JavaWorld it's really only Flyway and Liquibase, and the important self-imposed rule, "Thou shalt never remove a column from a table."


I've always resisted the Django migration approach personally. One of the things I disliked most about it.


But what problem is moving to triggers solving? Is keeping all your logic in a single place actually a problem, or is moving part of it to Postgres tempting because it's nifty? How much training will you have to provide for new hires to be able to run with Postgres triggers instead of just Django logic?


Triggers protect you against bugs.

If your application logic needs to remember to update a denormalized column somewhere it's very easy for a bug to slip in in the future when someone adds a new piece of Python code but forgets to update the denormalized column.

With triggers you can eliminate that source of bugs entirely.

This is even more valuable if you have more than one system interacting with a single database.
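
A rough sketch of the pattern (table and column names invented for the example):

    -- Whatever code path inserts a payment, the denormalized balance stays correct
    CREATE OR REPLACE FUNCTION apply_payment_to_balance() RETURNS trigger AS $$
    BEGIN
        UPDATE wallets
           SET balance = balance + NEW.amount
         WHERE id = NEW.wallet_id;
        RETURN NEW;
    END;
    $$ LANGUAGE plpgsql;

    CREATE TRIGGER payments_apply_balance
        AFTER INSERT ON payments
        FOR EACH ROW EXECUTE FUNCTION apply_payment_to_balance();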


Triggers can also be significantly more performant depending on what you are doing.

But yeah having the business logic split like this is a hard sell. There needs to be an extremely good reason to do it. It adds developer overhead because not only do you have to maintain another test harness and deal with an entirely separate system encapsulating yours, every time you touch business logic you’re now wasting some brainpower deciding where it should live.


I did this with robust DB constraints and selective use of triggers and stored functions. It can feel like a straitjacket long term. Though it did help protect against less competent DBAs breaking things and avoid some footguns.


Table triggers are the ultimate foot gun in this respect. They are highly obscured! Stored procedures split your business logic too, but when you want to go and look at your database logic, it’s at least where you’re expecting to find it, and not built into a table.

I’d highly recommend people avoid them, unless you feel that you really need them _and_ you have very robust development processes. As soon as you deploy your first table trigger, from that moment you have to check every DML statement for unintended side effects.


> from that moment you have to check every DML statement

If that's the case, you have a documentation problem. It should be easy to decide if there are side effects or not just by looking at the DML and the metadata you need for it.

In fact, the case here is that the trigger in the article is an incredibly bad one. It's not a natural consequence of the table, or the database structure. It's probably not even always true to the business rules, which is the one property triggers must have no matter what.


The problem isn’t really that it’s hard to document. The problem is that when you don’t use triggers, you don’t have to consult documentation to know what’s going to happen after an insert (or an update, or delete…). People who haven’t used triggers would assume that a row will simply be inserted. But if you use triggers this is no longer a safe assumption.


Business rules are already often in multiple separated places in a codebase. If you mean "it should be in the same git repository", then triggers should be already in migrations.

But IMO, even migrations are a stopgap solution to this problem. In some cases, such as in ORMs like Rails' ActiveRecord, even the column names aren't present in the model by default; they exist only in migration files.

The real solution is putting the definition of the triggers (and all other database things too) closer to the code that operates on them. In models, for example. And models should enforce that only the defined triggers are present in the database. Of course, since most ORM users aren't really big DB users, this isn't exactly common. Until we get features like this, I agree that triggers can be confusing.


If you are using some JPA implementation, eg Hibernate, then JPA events are more or less what you are talking about except JPA can't enforce the condition that JPA-defined events are the only triggers present. If you write custom triggers then you run the risk of stale entity data being presented to the user.

But otherwise, this fulfills the condition of triggered events residing in the codebase. Of course, this means you have to be comfortable with using an ORM in the first place.


You can say similar things about any API based development though. Triggers are just a part of the database API.

Databases exist to manage data, not just store and retrieve it.


It is also a very restrictive environment: you either only trigger on a single table within the transaction, or you risk having triggers tripping other triggers, which limits how far the approach scales anyway. At which point a well-managed transaction from an active record or a data gateway will do miles better.


You have exactly the same problem in microservice architecture while calling a remote endpoint.


Why is this the top story? This is a major foot gun. Don’t write business logic in the database. You may think you are simplifying things but in fact you are making them more complex.

Instead adopt a solution for structuring your business logic in a sane way, such as using a workflow engine. Your code will become simpler and well organized that way without creating a tangled web of distributed rules, as well as exist all in one place.


> Don’t write business logic in the database. You may think you are simplifying things but in fact you are making them more complex.

Alternatively, write all the business logic in the database. This way you can better leverage the DB features and ensure that logic only needs to be written once.


I have worked with systems with nearly 100k lines code in Oracle stored procs, and another legendary place with around 4 million.

It was a nightmare. Deployments were very difficult, there was little tooling, reasoning about the system was difficult, and of course running so much code in Oracle required very expensive licenses.

And it is much harder to hire hard core PL/SQL devs over Java, C#, Python or whatever.

You really don’t want significant code in the DB. It can be useful for some cases, like automating audit tables, but that is about it.

And for triggers - I feel for anyone maintaining business logic located in triggers. What a debugging hell that can be.


Wow. I was going to say that no one in their right mind would even consider putting all business logic at the database layer. I’m willing to bet there were tons of Oracle-specific features being used too.


Oh, yes, absolutely.

At one point Terradata claimed they had a tool that could auto convert to something else with 99% success rate, or something like that. The reality on our samples we tried was like 40%.

Plus, Pl/SQL code is often very “chatty” with the data, if you try to port it as is you will often suddenly see performance issues as you see the amount of data that has to be sent over the wire from the DB to your code tier.


This is a problem of SQL, not a database problem.


I worked for a short while at a place that tried following a similar dogma. Hiring was incredibly difficult, as was retaining people (such as myself). Writing business logic in code instead of DB functions is much more approachable than keeping it in the DB.


There's also a real friction here with modern devops tooling. We have great off the shelf patterns now for doing blue/green deployments, monitoring, etc. Having to run a migration every time you want to update some business logic feels a lot worse even if it has some marginal benefits in terms of single source of truth.


I spent the better half of a decade trying to rid us of the business logic in the database. At some point, the hole was too deep as we had stored procedures calling each other, such a mess. I could go on and on but also found hiring difficult, we were a small team so a database-only developer was a hard pill to swallow.

I eventually left. We pivoted to a new product and leadership agreed the old system was legacy to remain untouched. Eventually this thinking changed and the old product was to be integrated with the new. Sprocs were back baby! My battle was lost, I was done.


Using SQL for business logic is probably worse than using Cobol in 2023 - good luck with automated tests, debugging, documentation, etc. Yes, I know it all technically can be done, but you really need to be a masochistic and mindless bastard in order to do so.

Rather, let's rejoice with the latest C#, zig, rust, PowerShell or whatever-beautiful-language-and-ecosystem-perfected-in-the-latest-decade we have, instead of horrible SQL.


You don’t have to write business logic in the db. You essentially listen for notifications from workers who implement the business logic depending on the db changes. It is essentially like a poor man’s message queue for us.


Workflow engines are neither simple, nor are they widely available for all programming languages.


Can you give some more detail please?


We use PG Notify at work extensively and it's the source of a lot of pain and suffering. Not because of the functionality itself, more so because what we did to ourselves by using it in the way we did.

I think this could be great for certain projects, but there is a lot of room to put yourself into a situation that's hard to maintain if you/your team doesn't possess the right amount of discipline around documentation, developer tooling, observability, etc.


> more so because what we did to ourselves by using it in the way we did.

what was the cause of pain?


What's old is new again. In 2007 we were using triggers and stored procedures heavily with mysql and a java app. Unfortunately we were also reliant on read replicas. Some of this replication behaviour did not translate well with the mix of these functions and auto incrementing IDs. Sometimes it would result in foreign key constraint violations and all of a sudden our replication would stop. This was even worse when we tried multi master setups. I spent years dealing with this. Ultimately we dropped the use of the most complex queries and shifted them into code which made the replication more stable but at the cost of Dev time.

Moral of the story: use it with caution. I know postgres is different, but when you start turning your database into a ball of mud, things get dangerously difficult to debug and fix.


90% of the time this kind of thing should be done in stored procedures, not triggers. You know when you are calling a stored procedure; you cannot do it accidentally. Triggers can cause things to happen "by magic" if you aren't keenly aware that they are there. They also complicate large updates.

Triggers to do something that's simple and always required, e.g. updating a primary key index for a new row (before autoincrement was available) can be OK, but use them sparingly.

I like putting business logic in the database, because you only write it once and not for each client application. Client applications and platforms and their development languages come and go a lot more frequently than databases. But I use stored procedures almost always, and rarely triggers.
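
For contrast, a sketch of the stored-procedure style (names invented): the client calls one routine explicitly, and a plain INSERT elsewhere has no hidden side effects.

    -- The application calls this on purpose; nothing fires "by magic"
    CREATE OR REPLACE FUNCTION add_payment(p_wallet_id bigint, p_amount numeric)
    RETURNS void AS $$
    BEGIN
        INSERT INTO payments (wallet_id, amount) VALUES (p_wallet_id, p_amount);
        UPDATE wallets SET balance = balance + p_amount WHERE id = p_wallet_id;
    END;
    $$ LANGUAGE plpgsql;

    -- From any client application: SELECT add_payment(42, 19.99);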


I feel like we're coming full circle 30+ years and have to re-learn the perils of db triggers. When to use them and when to not. These days most of the db has been abstracted away from developers using ORMs so it must seem like the discovery of something new when in fact we already know.

Don't write business logic in the datastore.


> Triggers should be used with caution since they can obscure critical logic and create an illusion of automatic processes

Summarizes why in my opinion using triggers is rather risky and confusing. You introduce side effects to operations that one might suspect are CRUDlike. Your code is made non-atomic, in that you need knowledge of what happens elsewhere to guess why it's behaving a certain way. On small projects it's rather tempting, but small projects become large projects that then get given to someone else to maintain, and 4y later someone will spend a week trying to understand why the amount column gets updated to a different value than the one they're pushing.

The only use that I find safe is for database metadata, say if you're using triggers to keep track of write origins, or schema metadata. For everything that's business logic, I'd stay away from them


I love these types of techniques. Need a basic no nonsense queue? Postgres. Need a basic reporting infrastructure? Postgres. Need a document store? Postgres.

But every single time this comes up, people on the engineering teams I've been on all throw their hands up and accuse folks of overengineering or underengineering. You need rabbit or kafka. We should move to mongo. Etc.

That's the part that's hard.


Just had this conversation with one of my juniors. He brought up scalability concerns with a PG queue and wanted to know if e.g. Redis was a better choice. My whole tack was to lay out the actual context for the system—we're going to soft launch, it's for a hardware product, and we'll have someone post-launch to make changes. Then I brought up my values since I'm basically in charge: maintainability, speed to first release, and ease of understanding. He made the decision that a PG queue was better for now because it's easy to change our minds later, we already know PG, and we're not anticipating fast scaling in this application.

I think a lot of the problem in industry is simply lack of rigor. We talk a big game about being software engineers but nobody takes the time to talk about values, requirements, trade-offs or business context for decisions. It’s a shame because these topics are the actual engineering of the system.


Most people on modern engineering teams haven't built systems services themselves. I've seen this cause a lot of insecurity on making choices, because they don't have the experience in how to actually assess what will work or won't. It's far easier to fall back on the industry hive mind, e.g. "nobody ever got fired for buying $X".


I agree. I think good software architecture makes this very viable too. I worked on a system where we had a queue that would hold our jobs to be processed, extremely common. This was a POC product to show to investors so we needed this thing out fast. I implemented the basic queue in Postgres but made sure to write a solid interface around it for the queuing methods, and the queue would only be interacted with via that interface. When we moved to something a bit more heavy duty, we just changed the underlying implementation and kept the interface and everything flowed. Not to mention the fact that it was nice and testable since we'd pass a mocked queue around.

To some this is the most obvious thing to do, but you'd be surprised that some people wouldn't do this (I wouldn't have before reading a few books) and how I even got pushback at first, despite it being a 10-20 minute job to wrap the logic in a class.

A good abstraction for things like this makes it really justifiable to take advantage of Postgres and Redis for things that aren't their forte for the time being, until you eventually need to swap them out for a more robust solution. My experience is at startups mostly, and that ability to make complete but small implementations to get going, and being able to make them more robust over time, is an essential skill.


Depending on the team I'd still opt for using "anything else" rather than Postgres as a queue. The tendency to have long-lived connections combined with a flow of short-lived messages will yield a runaway queue (due to MVCC) at low enough message rates that even an early-stage startup might notice.


Maybe, but at “mid” production scale I’ve used Postgres plenty as the substrate for queues and managing requests for FSMs and their state changes, and didn’t run into these problems.

Like everything, it depends on the application.


It can work great, but if not carefully introduced it's one typo away from a disaster in prod that nobody understands. You just need somebody to introduce a code path with longish transactions interacting with the queue and not have a reasonable prolonged load test in your deployment pipeline. Given how easy other queues are to set up I wouldn't default to Postgres on many teams.


I think you're projecting an implementation of a queue in Postgres, which isn't how most people implement these things. [0] We're not doing table level locks, or creating contention with multiple queue producers or consumers, and they're not "one typo away from disaster in prod".

To do this right, you're using row level locking e.g. SELECT FOR UPDATE/SKIP LOCKED [1], and hopefully you're already using idle_in_transaction_session_timeout to deal with total consumer failures. A properly designed queue in Postgres runs more-or-less in parallel, and supports (really fantastic features like) atomic row locks across all resources needed to serve the queue request.
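
For anyone who hasn't seen the pattern, the consumer side is roughly this (a sketch; the job table shape is assumed):

    -- Each worker claims one job; SKIP LOCKED means concurrent workers never
    -- block on each other, and the delete only commits once the work succeeds.
    BEGIN;
    DELETE FROM job_queue
     WHERE id = (
         SELECT id
           FROM job_queue
          ORDER BY enqueued_at
          LIMIT 1
          FOR UPDATE SKIP LOCKED
     )
    RETURNING id, payload;
    -- ...do the work here, then...
    COMMIT;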

If you need extremely long consumer timeouts, it's also totally fine to use RLLs in addition to state on the job itself.

[0] - https://www.crunchydata.com/blog/message-queuing-using-nativ... [1] - https://www.2ndquadrant.com/en/blog/what-is-select-skip-lock...


Even that first link explicitly calls out MVCC table bloat with that strategy? Like, you can do it right (and I have no comment on your implementation in particular, let's assume for the sake of argument it's fantastic), but it's easy to write a right-looking solution with the property that innocuous-looking one-line changes cause major issues. I wouldn't want an average development team to pursue that approach without good reason, especially when the alternatives are so easy to do right.


The first link also calls out how to deal with said bloat. It’s part of administering a Postgres DB yourself (which may or may not be something a team should be doing).

> I wouldn't want an average development team to pursue that approach without good reason, especially when the alternatives are so easy to do right.

Agreed. Good reasons I’ve had in the past are:

- wanting transactional guarantees across your DB data and queue (which you don’t get if your queue is external)

- not wanting to add more stack complexity (this has been an issue when you have to support on-prem deployments that you aren’t allowed to interact with).

I’m sure there are others.


I'm sort of confused. You're mentioning MVCC and deployment pipelines not having load, but what does MVCC have to do with a queue, postgres or otherwise? Same with deployments? Maybe there's a specific model or deployment strategy that's in use that I am unaware of and it's a blind spot?


Postgres uses multi version concurrency control. The amount of garbage kept around to ensure each transaction can be handled independently is roughly proportional to the length of the transaction multiplied by how much work is happening with any transaction overlap. That's potentially a problem with queues implemented in Postgres because the for-all-intents-and-purposes-dead rows being tracked by MVCC slow down each piece of work on the queue, increasing transaction times, and circularly causing a runaway scenario at lower thresholds than you might expect.

The comment about deployment pipelines was just that some shops explicitly load test with a multiple of prod-like data to find that sort of issue before prod. Doing so for a non-negligible amount of time is important though to catch qualitative shifts in the RDBMS behavior as it approaches a steady or runaway state as a result of any software change.


How do these techniques work with replication and sharding? Or do you just use cloud-managed pg like AWS RDS to not think about it?


Just don't keep all your old data in Postgres forever. Set up a nightly cron job to archive old crap out.

Now you'll never* need to shard your Postgres.

* Unless you work at a very rare company.


'Ate Mongo

Luv Postgres

Simple as

A quick note for anyone thinking about triggers: if there's any case whatsoever that you're going to have more than one row insert into a table per transaction, please use statement level triggers -- especially if you're doing network calls inside the trigger. Triggers execute within the same transaction, they're synchronous, and will soak up resources.

Hell, if you're using network calls in your triggers... please don't. Use `LISTEN/NOTIFY` or queues to offload that to some other process (whether postgres itself or another process), so PG isn't left hanging around waiting for a network return.
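
A sketch of that combination, with invented names: the trigger only notifies, and a separate worker that has run LISTEN does the slow part outside the transaction.

    -- Statement-level trigger: fires once per INSERT statement, not once per row,
    -- and does nothing slower than queueing a notification.
    CREATE OR REPLACE FUNCTION notify_new_payments() RETURNS trigger AS $$
    BEGIN
        PERFORM pg_notify('payments', 'new rows inserted');
        RETURN NULL;  -- return value is ignored for AFTER ... FOR EACH STATEMENT
    END;
    $$ LANGUAGE plpgsql;

    CREATE TRIGGER payments_notify
        AFTER INSERT ON payments
        FOR EACH STATEMENT EXECUTE FUNCTION notify_new_payments();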


There’s so much value in database triggers when they’re done right. Anything where live stats are needed gets a whole lot easier with triggers rather than counts.


Then you discover Materialised Views... :) Magic.


Just wait until we get Incremental Materialized Views!


I was gonna mention that as well. There's a relevant talk from the System distributed convention https://www.youtube.com/watch?v=QtQMWUik0oY It's about caching views or queries and updating them when necessary as well as keeping your cache and DB in sync automatically.


Oracle has done them for 20+ years.


The article mentions it at the very bottom, but I almost never reach for triggers because they are obscure places to put application logic. On more than one occasion I've been burned by not realizing that the code in the backend did not represent the whole picture of business logic. It's more complexity, requiring more documentation, adding another point of failure that probably isn't necessary.


One major disadvantage of triggers is the inability to do canary deployments and vastly increased complexity of rolling deployments. When SQL code lives within the application, we can trivially run multiple variants of such code simultaneously. Running alternate version of a trigger for e.g. 10% of traffic is way harder.

What I would recommend instead is making use of CTEs (Common Table Expressions): DML (modifying queries) inside `WITH` is allowed, and by leveraging the `RETURNING` keyword in both `UPDATE` and `INSERT` we can execute multiple inter-dependent updates within a single query.
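
A sketch of the pattern (table and column names are illustrative):

    -- One statement, one round trip: insert the payment and bump the balance,
    -- feeding the inserted row into the UPDATE via RETURNING.
    WITH new_payment AS (
        INSERT INTO payments (wallet_id, amount)
        VALUES (42, 19.99)
        RETURNING wallet_id, amount
    )
    UPDATE wallets w
       SET balance = w.balance + p.amount
      FROM new_payment p
     WHERE w.id = p.wallet_id;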

With such an approach we can trivially run multiple versions of an application in parallel (during deployment, for canary deployment, etc.) and we have a similar performance advantage of a single roundtrip to the database. An additional advantage is that there is only one statement, which means our query will see a consistent database view (with the very common read committed isolation level it is easy to introduce race conditions unless optimistic locking is used carefully).


I wouldn’t call this “simplified”.

Personally, it’s much more valuable to have all business logic in one place, in a single language, available at a glance. The performance gain isn’t worth the increased complexity in the codebase.


You just unintentionally made your problem distributed, which is a can of worms, and you'll find out later when your project is successful in production.


Wasn't this something that Oracle pushed aggressively like in the 80s or 90s and then everyone agreed it was a maintainability living hell? Is this a thing again for some reason I'm missing?


It's good for vendor lock-in.


Triggers and stored procedures should only handle logic which you do not expect to change. They are notoriously hard to test and debug.

They are coupled tightly with database schema design. Making changes to them needs careful consideration and an extensive testbed environment which cannot be mocked with a fraction of the data.


I worked at a company which relied on significant use of Postgres triggers and it was not simplified in my mind due to:

- Engineers being more comfortable expressing the required business logic in the other languages they were working in than in PL/pgSQL

- Challenging to write tests for the triggers

- Harder to deploy variations for testing if needed


You can write tests using pgTAP https://pgtap.org/
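
For example, a pgTAP test for a balance-maintaining trigger could look roughly like this (a sketch; the trigger, tables, and fixture rows are assumptions):

    -- Run with pg_prove; the transaction is rolled back so the test leaves no data
    BEGIN;
    SELECT plan(2);

    SELECT has_trigger('payments', 'payments_apply_balance',
                       'payments table should have the balance trigger');

    -- assumes a wallet row with id = 1 and balance = 0 exists from fixtures
    INSERT INTO payments (wallet_id, amount) VALUES (1, 10);
    SELECT is(
        (SELECT balance FROM wallets WHERE id = 1),
        10::numeric,
        'inserting a payment updates the wallet balance'
    );

    SELECT * FROM finish();
    ROLLBACK;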


If you plan on never changing away from postgres, never having to shard, never needing to do anything that a trigger can’t support, then it is a good option. Which may be true for the vast majority of the apps. You also need sql expertise in addition to app dev expertise.


All true, but still a bad idea. Splitting your concerns across multiple systems will eventually bite you. It's too hard to reason about, and unless you want to make some very hard yards, there's not even sensible source control.


A colleague of mine talks about the Law of Conservation of Complexity. It boils down to "the complexity will have to go somewhere".

You can make the development of your backend more simple, by shoving the complexity into the database, meaning your backend just does less. That in itself does not make your application any simpler.


For Django there's https://github.com/Opus10/django-pgtrigger that makes it possible to define triggers right in your models, so you have everything in one place.


Business logic in the database screams anti-pattern to me.

How do we know who created the rule, edited the rule? How can we reason about the sequence in which these rules are executed based on larger use cases with complex interactions.

Seems like a fire waiting to happen.


> How do we know who created the rule, edited the rule? How can we reason about the sequence in which these rules are executed based on larger use cases with complex interactions.

You are presumably operating inside of a database, a place where the above concerns can be tracked in ~3 additional columns. More complex rule arrangements can be addressed with additional tables & relations. If you are starting from a blank schema, everything is possible. As noted by others here, you either go all-in, or all-out. The middle ground where half the logic is in the database and half is in GitHub is where things get yucky.

Consider the simplification angle. There are some techniques that allow for running entire apps directly out of the database. You might not even need Node, .NET, Go, Rust, etc. Hypothetically, if 100% of the things are in a database, you can simply record the binary log to S3 and have a perfect log of everything over time. Imagine how easy it would be to set up a snapshot of a given environment on a developer machine. Inversely, you could directly ship a developer's machine to production. You can also do some crazy shit where you merge bin logs from different timelines via marker transactions.

The other major advantage includes being able to update production while its live, even if production is running on a single box. I saw a particular PL/SQL install earlier in my career that was utilized for this exact property - production literally could not ever drop a single transaction or stop servicing them. Latency beyond 2 seconds could be catastrophic for system stability. Production could come down, but it had to be all or nothing. Think - shutting down a nuclear reactor and the amount of time you are locked out due to the subsequent safety and restart checklists. You absolutely need a way to safely & deterministically update in-between live, serialized transactions or you can't run your business effectively.


I'd say depends on the complexity of the logic itself. I would never write triggers with any logical branching, but for simple update table B when table A is updated? I definitely see the value in that.


I'm torn on this subject. It's not a simplification in my view, but just one way to achieve a goal that has pros and cons.

The big pro is that you no longer need to remember to update tableB, which is derived from data in tableA due to performance, in your application code every time you update tableA.

The cons are that:

* You add more state to your DB

* You can't express the logic in your backend language

Thinking more about it, I don't think the cons outweigh the pros. I would prefer this trigger logic to be part of the DBMS, so I can express the logic in my backend language and also avoid increasing the dependency of my application logic on DB state.


One of the reasons we use database triggers is that we have a legacy system running on Rails and a new system in Typescript. The old system has an entity that is similar to the new systems entity but a bit different. While in this limbo of sunsetting the old system, we have triggers on the old entity when it changes to update the new entity. The thing is, these triggers invoke a lambda which does the business logic for migrating old row to new row. We could also have the old system maybe make an API call to the new system and skip triggers altogether.


Using a trigger to kick off a lambda seems like the way to go. You're essentially doing what the NOTIFY command does.


It's easy to start with when your project is small, especially for quick fixes and improvements (e.g. total # of orders made by a user). Later it will become a mess and make your project difficult to maintain, because you usually don't test your database in unit tests, and database migrations are still a thing (from one db to another, from one type of column to another, and so on).


> You should consider the fact that complexity will still be there, but it will be abstracted away inside the database.

You have got to be kidding me.


I used them once when I was a young engineer, then quickly realised how much I hated them because of how much they obscured the logic of my application. I hadn't even started to worry about migrations at that point.

Been first in all my teams since then to loudly voice my opposition when someone suggests it as a quick fix for something more complicated


Heh I vaguely recall at Etsy, predating my time, that a significant amount of business logic was done using stored procedures and triggers.

They migrated away from it at some point, but some of the people who handled that migration were still around when I was there. Didn’t sound fun at all, sounded like a horrific nightmare.


We use triggers and notifications extensively! It is great because we don't have to run a message queue. The only concerns are that a notification has a size limit that is quite small. Also, it is harder to implement multiple workers who each get only some of the notifications in order to load balance.


Using triggers when you have a single-codebase is prone to obscuring where things happen - is it in code, or in a trigger somewhere? However, when you have multiple different codebases touching the same db, it can be great at enforcing things to happen in the same way across these.


We have a trigger that is 1400 lines long. Try debugging that when there is a problem.


Sounds like your problem isn't the trigger per se.


And then someone does something to the database that disables the triggers. Or just one of them. Or someone adds the same trigger a second time. It requires a lot of discipline to keep it sane.


Not about the core of the article, but how often are folks using something other than the id column as primary key and having id be a foreign key? This seems extremely confusing to me.


>You should consider the fact that complexity will still be there, but it will be abstracted away inside the database.

Where Git isn't looking...


Or maybe just use stored procs, which may be better for the wallet example here.


I don't like the idea of moving application backend logic to the database.


When you have one problem, you decided to use Postgres Trigger.

Now you have two problems.


That's just written to trigger me right?

Right?

