> Or, if you prefer, you can still fall back to Redis, which is something I would...

josevalim · on Sept 8, 2020

> What if the geoip lookup took 1.5 seconds to look up from a remote API? Is ETS still the right choice?

I would use ETS to cache lookups locally (shared by all cores on the same node), then fall back to Redis to populate the ETS cache on a miss. But again, feel free to skip either ETS or Redis. The point is that ETS gives you one more tool that you may (or may not) use.
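To make the lookup order concrete, here is a minimal, language-neutral sketch in Python. It is not Elixir and it does not use the real ETS or Redis APIs: plain dicts stand in for the node-local cache (ETS) and the shared cache (Redis), and `slow_geoip_lookup` is a hypothetical placeholder for the 1.5-second remote API.

```python
# Hypothetical stand-ins: one dict plays the node-local cache (ETS in the
# comment above), another plays the shared remote cache (Redis).
local_cache = {}
remote_cache = {}
slow_calls = []  # records every call to the slow API so it can be counted


def slow_geoip_lookup(ip):
    """Hypothetical slow remote API; returns placeholder data."""
    slow_calls.append(ip)
    return {"ip": ip, "country": "unknown"}


def lookup(ip):
    # 1. Node-local cache: no network hop.
    if ip in local_cache:
        return local_cache[ip]
    # 2. Shared cache: one network hop, but shared by every node.
    value = remote_cache.get(ip)
    if value is None:
        # 3. Slow source, reached only when both caches miss.
        value = slow_geoip_lookup(ip)
        remote_cache[ip] = value
    # Populate the local cache so later lookups on this node stay local.
    local_cache[ip] = value
    return value
```

Dropping either tier only removes one of the two cache checks; the rest of the flow stays the same.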

> Like, if I wanted to cache a PostgreSQL query that took 1 second to finish. Isn't ETS the primary place for such a thing?

Here is the math you need to consider. Let's say you have M machines with N cores each. Then remember that:

1. ETS is local lookup

2. Redis is distributed lookup

If you cache the data in memory in Ruby/Python, you will have to request this data from PostgreSQL M * N times to fill all of the caches, one per core per node. Given the number of queries, I would most likely resort to Redis.

In Elixir, if you store the data in ETS, which is shared across all cores, you will have to query PostgreSQL only M times, once per node. If I am running two or three nodes in production, then I am not going to bother to run Redis because having two or three different machines populating their own cache is not an issue I would worry about.
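The arithmetic above can be sketched as a small Python function. The function name and the scope labels are made up for illustration, and the "shared" case (a single fill when every node reads from one Redis) is my reading of the argument rather than something stated explicitly.

```python
def origin_queries(machines, cores_per_machine, cache_scope):
    """How many times the slow query runs to warm every cache for one key."""
    if cache_scope == "per_core":
        # One in-process cache per core per node (the Ruby/Python case).
        return machines * cores_per_machine
    if cache_scope == "per_node":
        # One cache shared by all cores on a node (the ETS case).
        return machines
    if cache_scope == "shared":
        # Assumption: one cache shared by all nodes (the Redis case).
        return 1
    raise ValueError(f"unknown cache scope: {cache_scope}")


# Example: M = 3 machines with N = 8 cores each.
print(origin_queries(3, 8, "per_core"))  # 24
print(origin_queries(3, 8, "per_node"))  # 3
print(origin_queries(3, 8, "shared"))    # 1
```

With only two or three nodes, the gap between the per-node and shared cases is two or three queries, which is why running Redis may not be worth it there.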

> > if I need distributed state, I just use the database too.

Apologies, I meant to say "persistent distributed state" as not all distributed state is equal. For ephemeral distributed state, like Phoenix Presence and Phoenix PubSub, there is no need for storage, as they are about what is happening on the cluster right now.

bitwalker · on Sept 7, 2020

My opinion is that this depends entirely on the cost relative to the overall task, and how likely cache hits are to occur. If cache hits are very likely and the task occurs frequently, I'd strongly consider storing it in ETS. If cache hits are unlikely, then it depends purely on how expensive the task is, but generally there isn't a lot of benefit to caching things that are infrequently accessed.
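One rough way to put numbers on that trade-off is the average cost per request with a cache in front of the task. This is a simplified model, not a measurement: it assumes every request pays for the cache lookup and that misses additionally pay for the task, and the function and parameter names are illustrative.

```python
def expected_cost_ms(hit_rate, cache_lookup_ms, task_ms):
    """Average cost per request with a cache in front of the task."""
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    # Every request pays the cache lookup; only misses also pay for the task.
    return cache_lookup_ms + (1.0 - hit_rate) * task_ms
```

With a hit rate near zero the cache only adds its own lookup cost on top of the task, so in this model the benefit comes entirely from how often hits occur and how expensive the task is.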

I wouldn't cache database queries unless the query is expensive, or the results rarely change but are frequently accessed.

Generally though, whether to store something in ETS or not is situational - your best bet is actually measuring things and moving stuff into ETS later when you've identified the areas where it will actually make a meaningful difference.

> This part throws me off because I remember hearing various things in Phoenix work in a distributed fashion without needing Redis.

This is true, but it depends on what kind of consistency model you need for that distributed state. The data you are referring to (I believe) is for Phoenix Presence, and is perfectly fine with an eventually consistent model. If you need stronger guarantees than that, you'll need a different solution than the one used by Phoenix - and for most things that require strong consistency, it's better to rely on the database to provide that for you, rather than reinvent the wheel yourself. There are exceptions to that rule, but for most situations, it just doesn't make sense to avoid hitting the database if you already have one. For use cases that would normally use ETS, but can't due to distribution, Mnesia is an option, but it has its own set of caveats (as does any distributed data store), so it's important to evaluate them against the requirements your system has.