The architecture is remarkable. The lengths they’ve gone to for language version compatibility and protecting app namespaces are especially impressive.
Namespaces aren't so much a concept in Elixir, but this refers to the names used for things like modules. Expert will rewrite the code of its "engine" so that the engine's code and dependencies and those of the application it is embedded into don't overlap.
Ping requires something persistent to check. That requires creating tuples, and most likely deleting them after they’ve been consumed. That puts pressure on the database and requires vacuuming in ways that pubsub doesn’t because it’s entirely ephemeral.
Not to mention that pubsub allows multiple consumers for a single message, whereas FOR UPDATE is single consumer by design.
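To make the fan-out difference concrete, here is a minimal in-memory sketch. It doesn't touch Postgres at all: `PubSub` and `ClaimQueue` are hypothetical stand-ins for an ephemeral notifier and for rows claimed one at a time (the way a `FOR UPDATE` style fetch would hand each row to a single worker).

```python
from collections import deque


class PubSub:
    """Ephemeral broadcast: every subscriber sees every message, nothing is stored."""

    def __init__(self):
        self.subscribers = []

    def subscribe(self, callback):
        self.subscribers.append(callback)

    def publish(self, message):
        for callback in self.subscribers:
            callback(message)


class ClaimQueue:
    """Persistent rows: a message is handed to exactly one claimer, then it is gone."""

    def __init__(self):
        self.rows = deque()

    def insert(self, message):
        self.rows.append(message)

    def claim(self):
        # Stand-in for locking and consuming a single row.
        return self.rows.popleft() if self.rows else None


# Two listeners both receive the same published message.
seen_a, seen_b = [], []
bus = PubSub()
bus.subscribe(seen_a.append)
bus.subscribe(seen_b.append)
bus.publish("job inserted")

# Two claimers compete for one row; only the first gets it.
queue = ClaimQueue()
queue.insert("job inserted")
first_claim = queue.claim()
second_claim = queue.claim()
```

Both `seen_a` and `seen_b` end up with the message, while `second_claim` comes back empty. The queue version is also the one that has to store and later remove something, which is where the vacuum pressure comes from.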
Postgres LISTEN/NOTIFY was a consistent pain point for Oban (background job processing framework for Elixir) for a while. The payload size limitations and connection pooler issues alone would cause subtle breakage.
It was particularly ironic because Elixir has a fantastic distribution and pubsub story thanks to distributed Erlang. That’s much more commonly used in apps now compared to 5 or so years ago, when 40-50% of apps weren’t clustered. The shift is thanks to the rise of platforms like Fly, which made clustering easier, and the decline of Heroku, which made it nearly impossible.
We have Postgres based pubsub, but encourage people to use a distributed Erlang based notifier instead whenever possible. Another important change was removing insert triggers, partially for the exact reasons mentioned in this post.
The problem was with restrictive connections, not DNS based discovery for clustering. It wasn't possible (as far as I'm aware) to connect directly from one dyno to another through tcp/udp.
I have only worked with a product that used it, so no direct experience, but one problem that was often mentioned is split-brains happening very frequently.
That's the issue with goroutines, threads, or any long running chain of processes. The tasks must be broken up into atomic chunks, and the state has to be serialized in some way. That allows failures to be retried, errors to be examined, results to be referenced later, and the whole thing to be distributed between multiple nodes.
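A rough sketch of what that looks like, in Python rather than any real job library's API. Everything here is hypothetical (`Job`, `perform`, the `flaky_resize` worker, the state names); the point is only that the job is plain serialized data between attempts, so a failure can be recorded, inspected, and retried by any node that can read the stored record.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class Job:
    worker: str
    args: dict
    attempt: int = 0
    max_attempts: int = 3
    state: str = "available"
    errors: list = field(default_factory=list)


def dump(job):
    # The serialized form is what would live in a database row.
    return json.dumps(asdict(job))


def load(raw):
    return Job(**json.loads(raw))


calls = {"count": 0}


def flaky_resize(args):
    # Hypothetical worker that fails on its first call only.
    calls["count"] += 1
    if calls["count"] == 1:
        raise RuntimeError("temporary failure")
    return f"resized {args['image_id']}"


WORKERS = {"flaky_resize": flaky_resize}


def perform(raw):
    """Run one atomic attempt and return the updated serialized job."""
    job = load(raw)
    job.attempt += 1
    try:
        WORKERS[job.worker](job.args)
        job.state = "completed"
    except Exception as exc:
        # Keep the error with the job so it can be examined later.
        job.errors.append({"attempt": job.attempt, "error": str(exc)})
        job.state = "retryable" if job.attempt < job.max_attempts else "discarded"
    return dump(job)


stored = dump(Job(worker="flaky_resize", args={"image_id": 42}))
while load(stored).state in ("available", "retryable"):
    stored = perform(stored)

final = load(stored)
```

Because `perform` takes and returns a string, nothing ties an attempt to the process that made the previous one. That's the property a long running goroutine or thread doesn't give you.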
It must, in my view at least, as that's how Oban (https://github.com/oban-bg/oban) in Elixir models this kind of problem. Full disclosure, I'm an author and maintainer of the project.
I can confirm, from firsthand knowledge, that Elixir is used at dozens of Fortune 500 companies in the US.