This kind of issue always comes up when people put business logic inside the database. Databases are for data. The data goes in and the data goes out, but the data does not get to decide what happens next based on itself. That's what application code is for.
The way you model data and store it in your database is fundamentally a part of your business logic. The same data can be modeled in many different ways, with different trade-offs for different use cases. Especially if you have a large amount of data, you can't just work with it as-is; you need to know how you will use it and model it in a way that makes the common operations fast enough. As your application evolves, this may change, and even require data migrations.
None of this means you have to or even should use stored procedures, triggers, or listen/notify. I'm just making the point that there is no clean separation between "data" and "business logic".
Can't upvote this enough. The point is not that procedures outside the DB are wrong, nor that procedures should always go into the DB. It's that you should look at the context and decide what the best way to solve the problem is.
Agreed. I used triggers frequently for things like incrementing/decrementing count fields for dashboards, because it's the only way to guarantee those numbers are correct even when something in the application bypasses a callback or handler to modify the data.

You only need to cover three scenarios and it's very simple to implement: record added (+1), record removed (-1), record moved (+1 & -1).

If you have counts that are more complicated, it doesn't work, but this solution easily beats semi-frequent COUNT queries.
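For reference, a minimal sketch of what that can look like in Postgres, assuming a hypothetical items table and a dashboard_counts counter table (all names invented for illustration; a "move" between groups would be an UPDATE trigger doing both the +1 and the -1):

    -- keep a running count so dashboards never need COUNT(*)
    -- note: a single counter row serializes concurrent writers
    CREATE OR REPLACE FUNCTION maintain_item_count()
    RETURNS trigger AS $$
    BEGIN
        IF TG_OP = 'INSERT' THEN
            UPDATE dashboard_counts SET n = n + 1 WHERE name = 'items';
            RETURN NEW;
        ELSE  -- TG_OP = 'DELETE'
            UPDATE dashboard_counts SET n = n - 1 WHERE name = 'items';
            RETURN OLD;
        END IF;
    END;
    $$ LANGUAGE plpgsql;

    CREATE TRIGGER item_count
    AFTER INSERT OR DELETE ON items
    FOR EACH ROW EXECUTE FUNCTION maintain_item_count();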
The first thing I did when I saw this article was to check the Postgres docs, because I thought "heh, surely they just didn't read the fine print," but the LISTEN/NOTIFY page has zero mentions of "lock" in the entire content.
I think it's because the locking is part of the transaction commit locking, but yes, it should be mentioned.
But that's an oversimplification of "high queue load f*s up the availability/timings for other DB operations" (and for the queue itself).

And that's a generic problem you have if you put your queue into your DB, even if just from generic CPU/WAL/disk load, and even if that specific lock could somehow be solved with atomic concurrent algorithms or similar (not sure that's even possible).

So in general, make your queue a different service from your storage DB (and your cache, too), even if it uses the same kind of storage. (Though technically there are clever in-between solutions which run their own queue service but still use your DB for final storage, with a ton of caching, in-memory locking, etc. to remove a huge part of the load from the DB.)
One can replace LISTEN/NOTIFY with logical replication / CDC. And it's funny because somehow, somewhere, PG must be serializing the writing of the WAL to some degree. So it's not clear to me why LISTEN/NOTIFY needs additional serialization. Perhaps PG should turn NOTIFY into INSERTs on a special table that a worker process watches and turns those inserts into notifies (and deletes the inserts).
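A rough sketch of that last idea, with a hypothetical pending_events table standing in for NOTIFY (producers just insert; a worker drains the table and calls pg_notify itself, so ordinary transactions never touch the notification queue lock):

    CREATE TABLE pending_events (
        id      bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
        channel text NOT NULL,
        payload text NOT NULL
    );

    -- producers do this instead of NOTIFY; it's an ordinary row insert
    INSERT INTO pending_events (channel, payload)
    VALUES ('jobs', '{"job_id": 42}');

    -- the worker's drain query; SKIP LOCKED lets several workers cooperate
    WITH batch AS (
        DELETE FROM pending_events
        WHERE id IN (
            SELECT id FROM pending_events
            ORDER BY id
            FOR UPDATE SKIP LOCKED
            LIMIT 100
        )
        RETURNING channel, payload
    )
    SELECT pg_notify(channel, payload) FROM batch;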
That may hold to a certain extent for relational databases, where your business model doesn't align well with the physical model (tables). Although you might wonder why stored procedures and triggers were invented.
In databases where your domain is also your physical data model, coupling business logic to the database can work quite well, if the DBMS supports that.
I think an argument can be made that relations, schemas and constraints encode a kind of business logic that is intrinsic to the definition and integrity of the data, while other types of business logic represent processes that may hinge on data but aren’t as tightly coupled to it. Similar to the difference between a primitive type and a function.
I guess some will argue that their business logic is special and really is so tightly coupled to the data definition that it belongs in the database, and I’m not going to claim those use cases don’t exist, but I’ve seen over-coupling far more often than under-coupling.
This is why I say: Applications come and go, but data is forever.
Every company I’ve been at that relied on application code to handle referential integrity had orphaned rows, incidents related to data errors, and absurd pipelines built to recreate what FK constraints and triggers already do.
RDBMS are extremely well-tested pieces of software that do their job incredibly well. To think that you could do better, or even equally as well, is hubris. If you want to trade those guarantees for “velocity” go right ahead, but you also need to take into account the inevitable incidents and recoveries that will occur.
It’s really not about whether code is better or the database is better; it’s mostly about locality: if you want to update thousands of records, you can’t pull those records into a separate process, update them there, and then write them back. So you put your code next to the data, in the database. Stored procedures are just code deployed to a database container…
It's possible, but of course slow, because of https://gist.github.com/jboner/2841832. Data locality matters a lot. Moving data around the network when it doesn't really need to be moved is heresy (unless performance doesn't matter, then who cares). Remember, the computer doesn't care about your religion which says only this can do this and only that can do that.
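To make the locality point concrete, a trivial sketch (hypothetical table): the set-based statement below touches thousands of rows without any of them crossing the network, versus a SELECT, a per-row mutation in the application, and a write-back.

    -- one statement, executed where the data lives
    UPDATE orders
       SET status = 'archived'
     WHERE created_at < now() - interval '1 year';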
Although I'm partial to a SPROC, I do not deploy them, because I understand my colleagues might throw me from a window. But without going full-tilt DB-as-the-application, the DB can make much stronger guarantees about transactions and updates the closer that logic happens to itself. In the world of cloud computing, this can be a cost savings on ingress/egress too.
>> deploy them because I understand my colleagues might throw me from a window
Maybe throw your colleagues out the window instead, if they don't know what they are talking about. I'm not anti- or pro-SPROC at all, but I am anti anti-reality. People who don't understand the vast differences in latency between in-process and out-of-process work should not exist in the industry.
And both of those philosophies will lead to bad engineering.
There are things that work better, are safer and simpler to do on the database, and things that work better, are safer and simpler in code. And those things might change depending on context, technology, requirements, size of project, experience of contributors, etc.
Forcing round pegs into square holes will always lead to brittle code and brittle products, often for more cost (mental and financial!) than actually using each tool correctly.
I am mostly on the side that business logic should live in applications, and that relationships between data types are not business logic so much as just the layout of the data. But I typically access data via an ORM, and ORMs typically don’t have support for triggers and stored procedures. If they did, I would certainly use it, because projects I work on might have multiple people writing application code but everyone uses a single set of database models. This would mean that critical constraints on the shape of the data could be defined and respected at all times, vs. some developer on my team forgetting to include some critical check in their data update routine.
Generally customers don't care about religious views. Make understanding the actual machine and associated latencies your religion instead. The reason to write a stored proc or do some processing in the database is entirely about data locality, not to keep the drooling masses from messing things up. A library is fine for that.
Every ORM I’m aware of allows you to drop down to raw SQL. Write your stored procedure, store it in VCS, add it as a migration, and then call it. If you want to make it friendlier, wrap the call in a function in your language so you can add helpers, better error handling, etc.
What I would prefer is integration at the model definition level. For example, let’s say that I have a Customer model and an Order model. I don’t always want to pull in the customer fields when listing orders. Most ORMs would allow me to create a join and specify the fields from Customer I want when fetching Orders, but those joins add up quickly. I could denormalize the data and put things like the customer name and email onto each order, but if the customer changes either value, the application code now has to remember to update it. And yes, I could put that in the model’s save() method, but that is fragile too, because what if someone runs code that updates stuff at the raw SQL level and doesn’t include these updates?
Now if I could specify that I want Order.customer_name to come from a specific other model and be updated automatically, the ORM could generate a trigger to update that field whenever the customer table is updated.
Obviously this is a very simplistic example, but there are many more, including versioning and soft deletes, that could be incredibly useful. But the key is that the ORM has to generate the code for the triggers and stored procedures. Doing that manually is possible now, but (a) it uses a different language even from regular SQL, which not everyone is familiar with, and (b) there is no type checking for what you are doing. The ORM model definitions are the main source of truth about the shape of your database, so I want to use them as such.
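For what it's worth, the trigger such an ORM might generate is short; a hand-written sketch, assuming hypothetical customer and orders tables with a denormalized customer_name column:

    -- keep orders.customer_name in sync when a customer is renamed
    CREATE OR REPLACE FUNCTION sync_order_customer_name()
    RETURNS trigger AS $$
    BEGIN
        UPDATE orders
           SET customer_name = NEW.name
         WHERE customer_id = NEW.id;
        RETURN NEW;
    END;
    $$ LANGUAGE plpgsql;

    CREATE TRIGGER customer_name_sync
    AFTER UPDATE OF name ON customer
    FOR EACH ROW
    WHEN (OLD.name IS DISTINCT FROM NEW.name)
    EXECUTE FUNCTION sync_order_customer_name();

Which is exactly the problem: this lives outside the ORM's type checking, and having the ORM generate it from the model definitions would close that gap.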
>> I don’t always want to pull in the customer fields when listing orders
Data locality is king. Everything comes down to physical things such as blocks on the SSD, network interconnect, RAM, L3, L2, L1 cache and registers. Are those customer fields in the same page as whatever else you need? If so, most of the work is already done. Yes, you can save some network bandwidth transferring things that aren't needed but does it matter? It might but it might not. The key is to know what matters and reason about things from the perspective of the machines actually doing the work.
I believe that both code and data are kings, under different realms. Code is king of the "what we're doing today" realm. Data is king of the "what's possible tomorrow" realm.
If you want your database to just store bytes, use a key-value store. But SQL gives you schemas and constraints for a reason; they're guardrails for your business logic. Just don’t ask your tables to run the business for you.
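A minimal example of those guardrails (hypothetical table, standard DDL): invalid rows get rejected no matter which application, script, or developer writes them.

    CREATE TABLE orders (
        id          bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
        customer_id bigint NOT NULL REFERENCES customers (id),
        quantity    integer NOT NULL CHECK (quantity > 0),
        status      text NOT NULL
            CHECK (status IN ('pending', 'shipped', 'cancelled'))
    );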
If only different ORMs had more support for triggers and stored procedures. Things would be so much easier if I could do things like denormalize certain frequently accessed fields across tables, with the ability to update them automatically instead of doing it in application code.
I largely agree... though data mapper libraries (such as Dapper for .NET) can be pretty helpful, even if there's a minor disconnect between the SQL used and the POCO/Record definitions. It's far simpler than most ORMs and keeps you a little closer to the DB.
I used to think like this, but over the past decade and a half they have gotten a lot more performant and usable and the speed with which you can develop using them is just unmatched by writing raw SQL. Again, I say this as someone who used to be very much team just write SQL and even created a quasi-ORM that allowed me to write all the queries by hand but returned model instances that could have methods as a sort of in-between solution. I still routinely use raw SQL but only when it is actually necessary.
This is one of those absolute statements that cause the kind of problem stated by grandparent. There are lots of those: "Use Postgres for everything", "No business logic in the DB", "No methods bigger than 10 lines", "Abstractions only after 3 usages".
Back to the topic: Lots of potential bugs and data corruption issues are solved by moving part of the business logic to the database. Other people already covered two things: data validation and queue atomicity.
On the other hand, lots of potential issues can also arise by putting other parts of business logic to the database, for example, calling HTTPS endpoints from inside the DB itself is highly problematic.
The reality is that the world is not black and white, and being an engineer is about navigating this grey area.
So what are your thoughts on constraints, then? Foreign keys? Should those only be handled by the application, like Rails does (or did; I haven't used it in a long time)?
I don't think of those as business logic, per se. They're just validity checks on what the data should look like before it's written to disk - they're not actionable in the way L/N is. That being said, constraints usually end up being duplicated outside the db anyway, but having them where the data rests (so you don't have to assume every client is using the correct constraint code) makes sense.
I see. Further, I have used triggers to automatically populate log tables or aggregate statistics on write. Why do I need fast statistics? For API limits. Customers have N free operations per month and such, so I have to query that on every operation. Do you consider these things business logic that doesn't belong in the database?
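As a sketch of that aggregate-on-write pattern (hypothetical tables; the upsert keeps the limit check down to a single-row lookup):

    CREATE TABLE usage_counts (
        customer_id bigint NOT NULL,
        month       date NOT NULL,
        ops         bigint NOT NULL DEFAULT 0,
        PRIMARY KEY (customer_id, month)
    );

    CREATE OR REPLACE FUNCTION bump_usage()
    RETURNS trigger AS $$
    BEGIN
        INSERT INTO usage_counts (customer_id, month, ops)
        VALUES (NEW.customer_id, date_trunc('month', now())::date, 1)
        ON CONFLICT (customer_id, month)
        DO UPDATE SET ops = usage_counts.ops + 1;
        RETURN NEW;
    END;
    $$ LANGUAGE plpgsql;

    CREATE TRIGGER count_operation
    AFTER INSERT ON operations
    FOR EACH ROW EXECUTE FUNCTION bump_usage();

    -- the per-request limit check is then a single-row read:
    SELECT ops FROM usage_counts
    WHERE customer_id = $1
      AND month = date_trunc('month', now())::date;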
FK constraints on partitioned tables have been a solved problem in Postgres for years. MySQL still doesn’t support them, unfortunately.
For sharding, Vitess kind of supports them; Citus fully supports them.
You’re correct that they do impact performance to an extent, but as a counterargument, if your data is incorrect, it doesn’t matter how quickly you wrote it.
FKs are nothing special. It's just more INSERTs/UPDATEs/DELETEs. If you can't have a few more DMLs in your transactions in your sharded DB then you've already got scaling problems.
Really, FKs are typically implemented internally by RDBMSes as TRIGGERs that do what you expect FKs to do, which means they really are nothing more than syntactic sugar.
You're reaching the wrong conclusion, probably because of confirmation bias. Certainly this LISTEN/NOTIFY problem does not lead to your conclusion, nor does it support it. After all if you were relying on LISTEN/NOTIFY you could instead rely on logical replication decoding / CDC instead. And heck, you could even have a client connected to the same database that uses logical decoding to pick up events worth NOTIFYing about and then does just that, but without burdening any other transactions.
I've seen people who disagree with that statement and say that having a separate back-end component often leads to overfetching and that in-database processing is better. I've worked on some systems where the back end is essentially just passing data to and from stored procedures.
It was blazing fast, but working with it absolutely sucked - though for whatever reason the people who believe that seem to hold those views quite strongly.
It really depends, but it's also a factor of time. That is, "back in the day", databases were designed to serve many different clients; nowadays a common practice is to have a 1:1 relationship between a database and a client application.
Of course, this is sometimes abused and taken to extremes in a microservices architecture where each service has their own database and you end up with nastiness like data duplication and distributed locking.
> Of course, this is sometimes abused and taken to extremes in a microservices architecture where each service has their own database and you end up with nastiness like data duplication and distributed locking.
Not to mention the difficulty in maintaining referential integrity with all of that duplicated data. There are various workarounds, but at that point it feels very much like we’re recreating a shared DB, but shittily, and netting zero benefits.
Disagree; these issues come up when people use more advanced features of DBs without having the requisite DB expertise on staff. I’ll give OP that Postgres’ docs do not mention this gotcha (and props to them for drilling down to source code!), but by and large, these issues are from people operating via tech blogs.
The DB is - or should be - the source of truth for your application. Also, since practically everyone is using cloud RDBMS with (usually) networked storage, the latency is atrocious. Given those, it seems silly to rely on an application to react to and direct changes to related data.
For example, if you want to soft-delete customer data while maintaining the ability to hard-delete, then instead of having an is_deleted and/or deleted_at column, have a duplicate table or tables, and an AFTER DELETE trigger on the originals that moves the tuples to the other tables.
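A sketch of that, assuming a hypothetical customers table mirrored by a customers_deleted archive (the OLD.* idiom follows the trigger examples in the Postgres docs):

    CREATE TABLE customers_deleted (
        deleted_at timestamptz NOT NULL DEFAULT now(),
        LIKE customers
    );

    CREATE OR REPLACE FUNCTION archive_deleted_customer()
    RETURNS trigger AS $$
    BEGIN
        -- move the tuple into the archive table on delete
        INSERT INTO customers_deleted SELECT now(), OLD.*;
        RETURN OLD;
    END;
    $$ LANGUAGE plpgsql;

    CREATE TRIGGER soft_delete_customer
    AFTER DELETE ON customers
    FOR EACH ROW EXECUTE FUNCTION archive_deleted_customer();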
Or if you want to have get_or_create without multiple round trips (and you don’t have Postgres’ MERGE … RETURNING), you can easily accomplish this with a stored procedure.
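And a sketch of get_or_create in a single round trip, using a SQL function rather than a true procedure (procedures can't return rows); this assumes a hypothetical tags table with a unique name column. The no-op DO UPDATE is what makes RETURNING yield the existing row, at the cost of a row lock on conflict.

    CREATE OR REPLACE FUNCTION get_or_create_tag(p_name text)
    RETURNS tags AS $$
        INSERT INTO tags (name)
        VALUES (p_name)
        ON CONFLICT (name)
        DO UPDATE SET name = EXCLUDED.name  -- no-op so RETURNING fires
        RETURNING *;
    $$ LANGUAGE sql;

    -- usage: one round trip whether or not the tag already exists
    SELECT * FROM get_or_create_tag('postgres');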
Using database features shouldn’t be seen as verboten or outdated. What should be discouraged is not treating things like stored procedures and triggers as code. They absolutely should be in VCS, should go through the same review process as anything else, and should be well-documented.
That's purely because nobody knows how to write SQL, let alone stored procedures. If stored procedures had better devex, they'd be used for most of your app.
Postgres lets you write stored procedures out of the box in PL/pgSQL, C, Tcl, Perl, and Python. There are also third-party extensions for most languages you might want, including Rust and JS.
More broadly, not knowing how to write SQL is a very solvable problem, and frankly anyone accessing an RDBMS as a regular part of their job should know it. Even if you’re always using an ORM, you should understand what it’s doing so you can understand the EXPLAIN output you’ll probably be looking at eventually.
Agreed. What’s worse is when they confidently proclaim that they had to scale up N times “to handle the load,” but then a brief reading of their schema and queries reveals that an RPi could probably handle it if they’d designed a better schema and had a basic understanding of B+trees.
I go through this with my wife for every party we throw. She wants the house cleaned, table set, food spread ready, seasonal cocktails mixed, furniture moved around, decorations just so, etc.
I’m like here’s a giant thing of ice cold booze have fun.
I know this struggle, and the best I've been able to do is to push every time for limited scope. Let's just get pizzas instead of cooking 3 different mains and having a cheese plate, 4 bowls of chips, etc. Social media has really done a number on people (see also those omnipresent balloon arches)
It doesn't seem like the extra traffic is the issue. People don't want Google's AI reading and summarizing their data and thus preventing clickthroughs. Why would I click on your site if Google did all the work of giving me the answer ahead of time?
Both are an issue. People don't want AI overviews cannibalizing their website traffic. People also don't want AI bots spamming their website with outrageous numbers of requests every day.
The Danish flag is a white cross on a red background. If you cut out the white cross, you will be left with four rectangles of red, which can be pushed together and sewn up again, forming a solid red flag.
I also ran into this trying to upgrade my company's auth strategy. The hardest part of auth is convincing people that... it's not actually as hard or dangerous as they think it is. It was an uphill and ultimately unsuccessful battle of mine. People can't even divorce JWTs as simple, verifiable json data blobs from the entirety of the OAuth2 spec. You see it on HN, with hundreds of circular comment threads and I've seen it in real life.
I would recommend that people do their own auth, not because of how easy or hard it is to be insecure, but because auth sometimes needs agility. Auth sometimes needs to grow and adapt just like any other part of your product.
Except that auth might not be a core part of your insurance or tax app, and you'd rather spend your energy on the part of "agility" that has to do with the core parts of your app.
On the flip side I was at a startup using auth0, because as you said, not a core part of the business right? Until the traction hit and they had hundreds of thousands of users. Suddenly the auth bill became untenable - users are great but there wasn’t enough revenue to cover these costs. Auth0 didn’t budge. In fact they were outright nasty to deal with. They were holding our user logins and passwords hostage and they knew it.
You don't have to buy into Okta, you can also lean on auth frameworks like auth.js. Either way you're depending on outside labor to adapt.
I worked for a social media company before, and we also rolled our own auth, and we didn't regret it. High user counts are a special case, and you should know about that ahead of time.
But for B2B? Beware. You might get hit with an ask for active directory support.
Yes, people mix up the concepts of authentication and authorization (access control).
Authentication can be really simple if you rely on a standard like JWT.
Why do these companies keep trying to fire remote employees? Is it so hard to let them keep working from home? Is the company's real estate portfolio that important? You could probably even pay remote workers less (or give non-remote employees a "bonus" for coming in) and everyone would still be happy.
Too many issues with remote employees disappearing to do other stuff during the day. Amazon and others are requiring employees to go to any office, not necessarily the one with their teammates, so it's evidently not about in-person collaboration. My manager (different company) told me straight up, the dept is ok with me working remote but not other people.
Older than this is the "open office" thing. We saw over time how this wasn't about collaboration or even space-saving, but about keeping employees under watch.
I suspect this will become increasingly common with companies that allow remote work. Atlassian is a large Australian tech employer that is fully remote, and this is definitely the case there: performance reviews twice yearly and a very high attrition rate.
I can certainly see a company that wants to be remote-first making it work. Probably not one that has already invested tons of money and time in offices filled with well-established employees.
Letting people work at a slow pace for a while, PIPing them, and then ultimately firing them is very expensive. An unproductive teammate not only doesn't get stuff done but also brings down the rest of the team.
Edit: Some companies also don't like the morale effects of sink-or-swim, though Amazon is fine with it I've heard.
They aren't exactly bad hires. Lots of previously solid employees were going kinda AWOL sometimes while remote. Even if someone wants to be responsive, after seeing enough pings go unanswered all day, they might follow suit.
They could redefine the job requirements to be remote-first, but that'd require a lot of firing and hiring.
My guess is that there are tax breaks on the line in most cases. That is, corporations got some tax breaks on the basis of so many humans in downtown (or whatever office). If they don't meet those obligations, they lose some overly generous subsidy.
- Human contact is more important than efficiency gains, hence mandating return-to-office.
- At the same time, efficiency gains are more important than human contact, hence reducing human headcount in favor of increased AI use.
If you read between the lines, you can see how those two points are related: humans find it difficult to feel connected when their communication partner is just pixels on a screen, so that's why remote workers are being replaced with AI.
> Human contact is more important than efficiency gains, hence mandating return-to-office.
First, citation needed. Second, in this day and age companies try whatever they can to ruin in-office employees' morale as well, which runs counter to the position that social quality of life is important for efficiency.
Poe's law strikes again; I should have added "/s" at the end of my message.
If I had to rewrite my previous message in a less subtle way, it would be that companies are constantly contradicting themselves when giving any public "reason" for their actions, and I do not think employees' happiness (which fathomdeez focuses on) is anywhere close to their actual reason.
The endless Q&A sessions between employees and bosses show that it's not about collaboration, and bosses aren't saying what it's really about. None of their answers make sense.