There are a few mentions of Oban [1] here. Most people don't realise that Oban in fact uses SKIP LOCKED [2] as well.
Oban's been great, especially if you pay for Web UI and Pro for the extra features [3]
The main issue we've noticed though is that due to its simple fetching mechanism using locks, jobs aren't distributed evenly across your workers due to the greedy `SELECT...LIMIT X` [2]
If you have long running and/or resource intensive jobs, this can be problematic. Lets say you have 3 workers with a local limit of 10 per node. If there are only 10 jobs in the queue, the first node to fetch available jobs will grab and lock all 10, with the other 2 nodes sitting idle.
From my understanding, `work_mem` is the maximum available memory per operation and not just per connection. If you have a stored procedure with loops and/or many nested operations, that can quickly get quite big.
One trick worth noting, is that you can override the working memory at the transaction level. If you have a query you know needs more memory (e.g doing a distinct or plain sorting on a large table), within a transaction you can do:
`set local work_mem = '50MB'`
That will override the setting for operations inside this transaction only.
> Bad code in a specific part of the codebase bringing down the whole app, as in our November incident.
This is a non-issue if you're using a Elixir/Erlang monolith given its fault tolerant nature.
The noisy neighbour issue (resource hogging) is still something you need to manage though. If you use something like Oban[1] (for background job queues and cron jobs), you can set both local and global limits. Local being a single node, and global across the cluster.
Operating in a shared cluster (vs split workload deployments) give you the benefit of being much more efficient with your hardware. I've heard many stories of massive infra savings due to moving to an Elixir/Erlang system.