
Postgres is probably the best solution for every type of data store for 95-99% of projects. The operational complexity of maintaining other attached resources far exceeds the benefit they realise over just using Postgres.

You don’t need a separate queue, database, blob store, and cache. You just need Postgres for all of these use cases. Once your project scales past what Postgres can handle along one of these dimensions, swap in a dedicated tool for that piece (but most of the time this will never happen).

It also does wonders for your uptime and SLO.



We collect messages from tens of thousands of devices and use RabbitMQ specifically because it is uncoupled from the Postgres databases. If the shit hits the fan and a database needs to be taken offline the messages can pool up in RabbitMQ until we are in a state where things can be processed again.


Still trivial to get that benefit with just a separate Postgres instance for your queue. Then you have the (very large, IMO) benefit of simplifying your overall tech stack: fewer separate components you have to build knowledge of, keep track of version updates for, etc.


You may well be the 1-5% of projects that need it.


Everyone should use this pattern unless there's a good reason not to, though. Turning what would otherwise be outages into queue backups is a godsend.

It becomes impossible to ever lose in-flight data. The moment you persist to your queue you can ack back to the client.


In my experience, a persistent message queue is just a poor secondary database.

If anything, I prefer to use ZeroMQ and make sure everything can recover from an outage and settle eventually.

To ingest large inputs, I would just use short append-only files and maybe send them to another node over ZeroMQ for a little more reliability, but such high-volume data is rarely that critical.
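Something like this, as a rough sketch with pyzmq (the peer address, spool path, and framing are made up for illustration):

    import zmq

    PEER_ADDR = "tcp://peer-node:5555"            # hypothetical peer address
    SPOOL_PATH = "/var/spool/ingest/current.log"  # hypothetical spool file

    ctx = zmq.Context()
    push = ctx.socket(zmq.PUSH)
    push.connect(PEER_ADDR)

    def ingest(record: bytes) -> None:
        # Append-only local spool: cheap durability for high-volume input.
        with open(SPOOL_PATH, "ab") as f:
            f.write(record + b"\n")
        # Best-effort copy to the peer node. ZeroMQ buffers if the peer is
        # briefly unreachable, but there is no broker and no delivery
        # guarantee -- recovery means replaying the spool files.
        push.send(record)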

There is no free lunch when talking about distributed fault-tolerant systems, and simplicity usually fares rather well.


What if the queue goes down? Without insane engineering, there is always going to be a single point of failure somewhere.


It doesn't, it's two clusters in replication. It's like the least insane engineering?

Are y'all not running your stuff highly available?


Databases run as clusters in replication...


Yes? I don't understand your point. Queueing systems are a kind of database with different semantics and performance characteristics.

The answer to what happens if the queue goes down is that it runs highly available.


Most people don't need another kind of database. A few do but not enough to justify an "everyone should do it this way by default" opinion.


Sure, if you're at the "it's not worth maintaining two dbs or paying for any managed offering" stage, then you should still use the pattern but implement the work queue in your primary db. It'll be slower and you'll have to DIY a little if there aren't ready-made libs to do it, but you still get all the major benefits.
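The usual DIY building block for that is SELECT ... FOR UPDATE SKIP LOCKED. A rough sketch with psycopg2 (the jobs table layout and the process() call are hypothetical):

    import psycopg2

    conn = psycopg2.connect("dbname=app")  # hypothetical DSN

    def claim_and_run_one_job() -> bool:
        with conn:  # one transaction: claim, process, mark done
            with conn.cursor() as cur:
                cur.execute("""
                    SELECT id, payload
                      FROM jobs
                     WHERE status = 'pending'
                     ORDER BY id
                     FOR UPDATE SKIP LOCKED
                     LIMIT 1
                """)
                row = cur.fetchone()
                if row is None:
                    return False  # queue is empty
                job_id, payload = row
                process(payload)  # your actual work goes here (hypothetical)
                cur.execute("UPDATE jobs SET status = 'done' WHERE id = %s",
                            (job_id,))
        return True

If the worker dies mid-job, the transaction rolls back, the row lock is released, and another worker picks the job up -- which is the "can't lose in-flight work" property being discussed above.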


> The moment you persist to your queue you can ack back to the client

You mean like... ACID transactions?

https://www.postgresql.org/docs/current/tutorial-transaction...


You act like "I can persist data" is the part that matters. It's the fact that, from my pool of app servers, I can post a unit of work to be done and be sure it will happen even if the app server gets struck down. It's the architecture of offloading work from your frontend whenever possible into jobs that can be done at your leisure.

Use whatever you like to actually implement the queue, Postgres wouldn't be my first or second choice but it's fine, I've used it for small one-off projects.


> The moment you persist to your queue you can ack back to the client.

Relational databases also have this feature.


And if you store your work inbox in a relational db, then you've invented a queueing system. The point is that queues can ingest messages much, much faster and more cheaply than a db, route messages based on priority, and globally tune the speed of workers to keep your db from getting overwhelmed or to make use of idle time.
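To sketch what those knobs look like with RabbitMQ and Python/pika (queue name, priority levels, and do_work() are illustrative, not anyone's actual setup):

    import pika

    conn = pika.BlockingConnection(pika.ConnectionParameters(host="localhost"))
    ch = conn.channel()

    # Priority queue: the broker delivers higher-priority messages first.
    ch.queue_declare(queue="work", durable=True,
                     arguments={"x-max-priority": 10})

    # Producer side: publishing is cheap, the broker just appends the message.
    ch.basic_publish(
        exchange="",
        routing_key="work",
        body=b'{"task": "resize-image", "id": 123}',
        properties=pika.BasicProperties(delivery_mode=2, priority=5),
    )

    # Consumer side: prefetch_count caps how many unacked messages each worker
    # holds at once, which is how you keep a slow database from being
    # overwhelmed by a burst of work.
    ch.basic_qos(prefetch_count=8)

    def handle(channel, method, properties, body):
        do_work(body)  # hypothetical worker function
        channel.basic_ack(delivery_tag=method.delivery_tag)

    ch.basic_consume(queue="work", on_message_callback=handle)
    ch.start_consuming()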


You only start planning for staying online while your database is down if you have some very rare and impactful requirements.

For way more than 99% of the people, all of that should be irrelevant.


It's not the database I'm worried about.

If you push a change to your frontend server that does all the work itself and it breaks, you have a service interruption. In the same system based on queues, it's just a queue backup and you have breathing room.

If you're doing a longer operation and your app server crashes, that work is lost and the client has to sit there and wait. In a queue system you can return almost immediately, and if there's a crash the work will get picked up by another worker.

Having a return queue and websockets lets you give feedback to the client JS in a way that is impervious to network interruptions and refreshes; once you reconnect, it can catch up.

These aren't theoretical, this architecture has saved my ass on more occasions than I can count.


I believe there are a lot of assumptions baked into your argument that are going undiscussed. Foremost being that queuing all requests is even desirable. For certain workloads that may be the case, like ingesting event data, but for others queuing the requests doesn't make the service any more available than not queuing the requests.


> Postgres is probably the best solution for every type of data store for 95-99% of projects.

I'd say it's more like:

- 95.0%: SQLite

- 4.9%: Postgres

- 0.1%: Other


The way these things go, the 5% of projects that need Postgres or something else will account for 95% of the data.


> 95.0%: SQLite

I'm a bit late to the party (also, why is everyone standing in a circle with their pants down?) but does no one care about high availability with zero-data-loss failover and zero-downtime deployments anymore?


Do you think more than 5% of projects really need to have HA & zero downtime deployments?

(and litestream exists for sqlite if needed)


How do you ensure fault tolerance on SQLite?


In one instance I did that by storing the file on a fault tolerant NAS. Basically I outsourced the issue to NetApp. The NAS was already there to match other requirements so why not lean on IT.


Why?


Careful with using postgres as a blob store. That can go bad fast...


This really depends on the size of your blobs. Having a small jsonb payload will not be a problem; storing 500MB blobs in each row is probably not ideal if you frequently update the rows and they have to be rewritten.


Doesn't Postgres actually use its TOAST system (basically pointers to blobs on disk) for any large value anyway?


Ominous... Care to elaborate?


Why?


more deets?

I want to use Postgres for JSON; I know it has specific functionality for that.

But still, what do you mean by that, and does it apply to JSON? Why or why not?


While it makes sense to use postgres for a queue where latency isn't a big issue, I've always thought that the latency needs of many kinds of caches are such that postgres wouldn't suffice, and that's why people use (say) redis or memcached.

But do you think postgres latency is actually going to be fine for many things people use redis or memcached for?
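For what it's worth, when people do use postgres as a cache the usual trick is an UNLOGGED table, which skips WAL writes in exchange for losing the data on a crash -- an acceptable trade for a cache. A rough psycopg2 sketch (table layout and TTL handling are hypothetical):

    import psycopg2

    conn = psycopg2.connect("dbname=app")  # hypothetical DSN
    conn.autocommit = True

    with conn.cursor() as cur:
        # UNLOGGED: no WAL, faster writes, contents lost after a crash.
        cur.execute("""
            CREATE UNLOGGED TABLE IF NOT EXISTS cache (
                key        text PRIMARY KEY,
                value      bytea NOT NULL,
                expires_at timestamptz NOT NULL
            )
        """)

    def cache_set(key: str, value: bytes, ttl_seconds: int = 300) -> None:
        with conn.cursor() as cur:
            cur.execute("""
                INSERT INTO cache (key, value, expires_at)
                VALUES (%s, %s, now() + make_interval(secs => %s))
                ON CONFLICT (key) DO UPDATE
                   SET value = EXCLUDED.value,
                       expires_at = EXCLUDED.expires_at
            """, (key, value, ttl_seconds))

    def cache_get(key: str):
        with conn.cursor() as cur:
            cur.execute(
                "SELECT value FROM cache WHERE key = %s AND expires_at > now()",
                (key,))
            row = cur.fetchone()
            return row[0] if row else None

It won't match redis on raw latency, but for many "cache an expensive query result" cases, millisecond-range lookups over an existing connection pool may well be plenty.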


Honestly. This is 100% correct.



