
Postgres is probably the best solution for every type of data store for 95-99% of projects. The operational complexity of maintaining other attached resources far exceeds the benefit they realise over just using Postgres.

You don’t need a separate queue, database, blob store, and cache. You just need Postgres for all of these use cases. Once your project scales past what Postgres can handle along one of these dimensions, swap in a dedicated tool for that piece (but most of the time this will never happen).

It also does wonders for your uptime and SLO.



We collect messages from tens of thousands of devices and use RabbitMQ specifically because it is uncoupled from the Postgres databases. If the shit hits the fan and a database needs to be taken offline the messages can pool up in RabbitMQ until we are in a state where things can be processed again.


Still trivial to get that benefit with just a separate Postgres instance for your queue. Then you have the (very large, IMO) benefit of simplifying your overall tech stack: fewer separate components you have to build knowledge of, keep track of version updates for, etc.


You may well be the 1-5% of projects that need it.


Everyone should use this pattern unless there's a good reason not to, though. Turning what would otherwise be outages into queue backups is a godsend.

It becomes impossible to ever lose in-flight data. The moment you persist to your queue you can ack back to the client.


In my experience, a persistent message queue is just a poor secondary database.

If anything, I prefer to use ZeroMQ and make sure everything can recover from an outage and settle eventually.

To ingest large inputs, I would just use short append-only files and maybe send them to another node over ZeroMQ for a little more reliability, but such high-volume data is rarely that critical.
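Something like this, as a rough sketch with pyzmq (the peer address, spool path, and framing are made up for illustration):

    import zmq

    PEER_ADDR = "tcp://peer-node:5555"            # hypothetical peer address
    SPOOL_PATH = "/var/spool/ingest/current.log"  # hypothetical spool file

    ctx = zmq.Context()
    push = ctx.socket(zmq.PUSH)
    push.connect(PEER_ADDR)

    def ingest(record: bytes) -> None:
        # Append-only local spool: cheap durability for high-volume input.
        with open(SPOOL_PATH, "ab") as f:
            f.write(record + b"\n")
        # Best-effort copy to the peer node. ZeroMQ buffers if the peer is
        # briefly unreachable, but there is no broker and no delivery
        # guarantee -- recovery means replaying the spool files.
        push.send(record)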

There is no free lunch when talking about distributed fault-tolerant systems, and simplicity usually fares rather well.


What if the queue goes down? Without insane engineering, there is always going to be a single point of failure somewhere.


It doesn't, it's two clusters in replication. It's like the least insane engineering?

Are y'all not running your stuff highly available?


Databases run as clusters in replication...


Yes? I don't understand your point. Queueing systems are a kind of database with different semantics and performance characteristics.

The answer to what happens if the queue goes down is that it runs highly available.


Most people don't need another kind of database. A few do but not enough to justify an "everyone should do it this way by default" opinion.


Sure, if you're at the "it's not worth maintaining two dbs or paying for any managed offering" stage, then you should still use the pattern but implement the work queue in your primary db. It'll be slower and you'll have to DIY a little if there aren't ready-made libs to do it, but you still get all the major benefits.
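The usual DIY building block for that is SELECT ... FOR UPDATE SKIP LOCKED. A rough sketch with psycopg2 (the jobs table layout and the process() call are hypothetical):

    import psycopg2

    conn = psycopg2.connect("dbname=app")  # hypothetical DSN

    def claim_and_run_one_job() -> bool:
        with conn:  # one transaction: claim, process, mark done
            with conn.cursor() as cur:
                cur.execute("""
                    SELECT id, payload
                      FROM jobs
                     WHERE status = 'pending'
                     ORDER BY id
                     FOR UPDATE SKIP LOCKED
                     LIMIT 1
                """)
                row = cur.fetchone()
                if row is None:
                    return False  # queue is empty
                job_id, payload = row
                process(payload)  # your actual work goes here (hypothetical)
                cur.execute("UPDATE jobs SET status = 'done' WHERE id = %s",
                            (job_id,))
        return True

If the worker dies mid-job, the transaction rolls back, the row lock is released, and another worker picks the job up -- which is the "can't lose in-flight work" property being discussed above.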


> The moment you persist to your queue you can ack back to the client

You mean like... ACID transactions?

https://www.postgresql.org/docs/current/tutorial-transaction...


You act like "I can persist data" is the part that matters. It's the fact that, from my pool of app servers, I can post a unit of work to be done and be sure it will happen even if the app server gets struck down. It's the architecture of offloading work from your frontend whenever possible into jobs that can be done at your leisure.

Use whatever you like to actually implement the queue, Postgres wouldn't be my first or second choice but it's fine, I've used it for small one-off projects.


> The moment you persist to your queue you can ack back to the client.

Relational databases also have this feature.


And if you store your work inbox in a relational db, then you've invented a queueing system. The point is that queues can ingest messages much, much faster and more cheaply than a db, route messages based on priority, and globally tune the speed of workers to keep your db from getting overwhelmed or to make use of idle time.
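To sketch what those knobs look like with RabbitMQ and Python/pika (queue name, priority levels, and do_work() are illustrative, not anyone's actual setup):

    import pika

    conn = pika.BlockingConnection(pika.ConnectionParameters(host="localhost"))
    ch = conn.channel()

    # Priority queue: the broker delivers higher-priority messages first.
    ch.queue_declare(queue="work", durable=True,
                     arguments={"x-max-priority": 10})

    # Producer side: publishing is cheap, the broker just appends the message.
    ch.basic_publish(
        exchange="",
        routing_key="work",
        body=b'{"task": "resize-image", "id": 123}',
        properties=pika.BasicProperties(delivery_mode=2, priority=5),
    )

    # Consumer side: prefetch_count caps how many unacked messages each worker
    # holds at once, which is how you keep a slow database from being
    # overwhelmed by a burst of work.
    ch.basic_qos(prefetch_count=8)

    def handle(channel, method, properties, body):
        do_work(body)  # hypothetical worker function
        channel.basic_ack(delivery_tag=method.delivery_tag)

    ch.basic_consume(queue="work", on_message_callback=handle)
    ch.start_consuming()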


You only start planning for staying online while your database is down if you have some very rare and impactful requirements.

For way more than 99% of the people, all of that should be irrelevant.


It's not the database I'm worried about.

If you push a change to your frontend server that does all the work itself and it breaks, you have a service interruption. In the same system based on queues, it's just a queue backup and you have breathing room.

If you're doing a longer operation and your app server crashes, that work is lost and the client has to sit there and wait. In a queue system you can return almost immediately, and if there's a crash the work will get picked up by another worker.

Having a return queue and websockets lets you give feedback to the client JS in a way that is impervious to network interruptions and refreshes; once you reconnect, it can catch up.

These aren't theoretical, this architecture has saved my ass on more occasions than I can count.


I believe there are a lot of assumptions baked into your argument that are going undiscussed. Foremost being that queuing all requests is even desirable. For certain workloads that may be the case, like ingesting event data, but for others queuing the requests doesn't make the service any more available than not queuing the requests.


> Postgres is probably the best solution for every type of data store for 95-99% of projects.

I'd say it's more like:

- 95.0%: SQLite

- 4.9%: Postgres

- 0.1%: Other


The way these things go, the 5% of projects that need Postgres or something else will account for 95% of the data.


> 95.0%: SQLite

I'm a bit late to the party (also, why is everyone standing in a circle with their pants down?) but does no one care about high availability with zero-data-loss failover and zero-downtime deployments anymore?


Do you think more than 5% of projects really need to have HA & zero downtime deployments?

(and litestream exists for sqlite if needed)


How do you ensure fault tolerance on SQLite?


In one instance I did that by storing the file on a fault tolerant NAS. Basically I outsourced the issue to NetApp. The NAS was already there to match other requirements so why not lean on IT.


Why?


Careful with using postgres as a blob store. That can go bad fast...


This really depends on the size of your blobs. Having a small jsonb payload will not be a problem; storing 500MB blobs in each row is probably not ideal if you frequently update the rows and they have to be rewritten.


Doesn't Postgres actually use its TOAST system (basically pointers to blobs on disk) for any large value anyway?


Ominous... Care to elaborate?


Why?


more deets?

I want to use Postgres for JSON; I know it has specific functionality for that.

But still, what do you mean by that, and does it apply to JSON? Why or why not?


While it makes sense to use postgres for a queue where latency isn't a big issue, I've always thought that the latency needs of many kinds of caches are such that postgres wouldn't suffice, and that's why people use (say) redis or memcached.

But do you think postgres latency is actually going to be fine for many things people use redis or memcached for?
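For what it's worth, when people do use postgres as a cache the usual trick is an UNLOGGED table, which skips WAL writes in exchange for losing the data on a crash -- an acceptable trade for a cache. A rough psycopg2 sketch (table layout and TTL handling are hypothetical):

    import psycopg2

    conn = psycopg2.connect("dbname=app")  # hypothetical DSN
    conn.autocommit = True

    with conn.cursor() as cur:
        # UNLOGGED: no WAL, faster writes, contents lost after a crash.
        cur.execute("""
            CREATE UNLOGGED TABLE IF NOT EXISTS cache (
                key        text PRIMARY KEY,
                value      bytea NOT NULL,
                expires_at timestamptz NOT NULL
            )
        """)

    def cache_set(key: str, value: bytes, ttl_seconds: int = 300) -> None:
        with conn.cursor() as cur:
            cur.execute("""
                INSERT INTO cache (key, value, expires_at)
                VALUES (%s, %s, now() + make_interval(secs => %s))
                ON CONFLICT (key) DO UPDATE
                   SET value = EXCLUDED.value,
                       expires_at = EXCLUDED.expires_at
            """, (key, value, ttl_seconds))

    def cache_get(key: str):
        with conn.cursor() as cur:
            cur.execute(
                "SELECT value FROM cache WHERE key = %s AND expires_at > now()",
                (key,))
            row = cur.fetchone()
            return row[0] if row else None

It won't match redis on raw latency, but for many "cache an expensive query result" cases, millisecond-range lookups over an existing connection pool may well be plenty.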


Honestly. This is 100% correct.



