If I grasp the essence of Rama: - "Depots" are event streams (for event sourced ...

nathanmarz · on Aug 15, 2023

From the post:

Individually, none of these concepts are new. I’m sure you’ve seen them all before. You may be tempted to dismiss Rama’s programming model as just a combination of event sourcing and materialized views. But what Rama does is integrate and generalize these concepts to such an extent that you can build entire backends end-to-end without any of the impedance mismatches or complexity that characterize and overwhelm existing systems.

You have the general model correct, but here are a few clarifications:

- PStates are partitioned, durable, replicated indexes that are represented as arbitrary combinations of data structures. A PState can be as simple as an integer per partition, or it can be complex like a map of lists of maps of sets. PStates allow you to shape your indexes to perfectly match your application's use cases.

- I wouldn't call Rama queries an "engine", as it's considerably more straightforward in how it works than something like SQL. The base query API is called "paths", which are an imperative way to concisely reach into one partition of one PState to fetch or aggregate values. There's also "query topologies" which are predefined, on-demand distributed computations that can fetch and aggregate data from many partitions of many PStates.
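
To make the "map of lists of maps of sets" shape and the idea of a path concrete, here is a minimal sketch in plain Java collections. It is an illustration only and does not use Rama's API: the class `PStateShapeSketch`, the methods `samplePartition` and `fetch`, and the sample keys are all hypothetical names, and an ordinary in-memory map stands in for one partition of one PState.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Illustration only: plain Java collections standing in for one partition of a PState.
// None of these names come from Rama's API.
class PStateShapeSketch {

    // A "map of lists of maps of sets": user -> list of posts -> field -> set of values.
    public static Map<String, List<Map<String, Set<String>>>> samplePartition() {
        Map<String, Set<String>> post = new HashMap<>();
        post.put("tags", new TreeSet<>(List.of("rama", "java")));

        Map<String, List<Map<String, Set<String>>>> partition = new HashMap<>();
        partition.put("alice", new ArrayList<>(List.of(post)));
        return partition;
    }

    // A hand-written "path": map key -> list index -> map key.
    // Returns an empty set when any step of the navigation finds nothing.
    public static Set<String> fetch(Map<String, List<Map<String, Set<String>>>> partition,
                                    String user, int index, String field) {
        List<Map<String, Set<String>>> posts = partition.getOrDefault(user, List.of());
        if (index < 0 || index >= posts.size()) {
            return Set.of();
        }
        return posts.get(index).getOrDefault(field, Set.of());
    }

    public static void main(String[] args) {
        // Prints [java, rama] because TreeSet iterates in sorted order.
        System.out.println(fetch(samplePartition(), "alice", 0, "tags"));
    }
}
```

The point of the sketch is only that the index is shaped like the question being asked, so a read is a short navigation rather than a general-purpose query.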

3cats-in-a-coat · on Aug 15, 2023

Thanks, I will read more soon! I'm curious... how do you resolve the "impedance mismatch" between the "canonical" models that business decisions are based upon, which need to be synchronous with the depots (and mutually synchronous with other models sharing fragments of the same data), and the eventually consistent read models, which have a more lax constraint on how up to date they are?

How do you ensure consistency here? How do you organize it in the data flow?

Say I update a user, because that user seems to still be there in the query result/indexes, but actually an event for this user being deleted has happened some time ago?

This can also happen, I suppose, if the depots run queries themselves on PStates in order to determine whether a certain event is valid at all, and how exactly to carry it out.

nathanmarz · on Aug 15, 2023

The impedance mismatches you're used to from using databases are gone because:

- You can finely tune your indexes to be exactly the optimal shape for your application (data structure). You can see this in our Mastodon implementation with the big variety of data structures we used for all the use cases.

- You're generally just using regular Java objects everywhere: appending to depots, during ETL processing, and stored in indexes.

How you coordinate data creation with view updates is a deeper topic, so I'll just summarize one of the basic mechanisms Rama provides for coordinating this. Depot appends can have an "ack level" that determines the conditions before Rama tells you that depot append has completed. The default level is "full ack" which includes all streaming topologies colocated with that depot fully processing that record. With this level, when the depot append completes you know that all associated indexes (PStates) have been updated.

There's also "append ack", which only waits for the depot append to be replicated on the depot, and "no ack", which is fire and forget. These all have their uses depending on the specific needs of an application.
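
As a rough mental model of the three ack levels, here is a toy simulation built on `CompletableFuture`. It is a sketch under stated assumptions, not Rama's API: `AckLevelSketch`, `AckLevel`, `ToyDepot`, and `append` are hypothetical names, an in-memory list stands in for the replicated depot log, and a second list stands in for a PState updated by a colocated streaming topology.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CopyOnWriteArrayList;

// Toy model only: AckLevel and ToyDepot are hypothetical names, not Rama's API.
class AckLevelSketch {

    public enum AckLevel { FULL_ACK, APPEND_ACK, NO_ACK }

    public static class ToyDepot {
        // Stands in for the replicated depot log.
        public final List<String> log = new CopyOnWriteArrayList<>();
        // Stands in for an index (PState) maintained by a colocated topology.
        public final List<String> index = new CopyOnWriteArrayList<>();

        // The returned future completes once the condition for the chosen level holds.
        public CompletableFuture<Void> append(String record, AckLevel level) {
            // Step 1: the record lands in the depot.
            CompletableFuture<Void> appended =
                    CompletableFuture.runAsync(() -> log.add(record));
            // Step 2: the topology processes the record and updates the index.
            CompletableFuture<Void> processed =
                    appended.thenRunAsync(() -> index.add(record.toUpperCase()));

            switch (level) {
                case FULL_ACK:
                    return processed;   // caller waits for the index update too
                case APPEND_ACK:
                    return appended;    // caller waits only for the append
                default:
                    return CompletableFuture.completedFuture(null); // fire and forget
            }
        }
    }

    public static void main(String[] args) {
        ToyDepot depot = new ToyDepot();
        depot.append("follow", AckLevel.FULL_ACK).join();
        // With the full-ack future complete, a read of the index sees the update.
        System.out.println(depot.index.contains("FOLLOW"));
    }
}
```

In this toy model, a caller that needs to read its own write waits on the full-ack future, while a caller that only cares about durability of the event can return after the append step.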

3cats-in-a-coat · on Aug 15, 2023

Thanks! So we can see these ACKs as "wait and synchronize" signals, I suppose? However, how can we ensure "all or nothing" between all parties trying to ACK conditions they're mutually dependent on? I.e. transactionality or atomicity?

dustingetz · on Aug 15, 2023

you're missing automatic/free linear scaling

3cats-in-a-coat · on Aug 15, 2023

Systems that promise "free linear scaling" without qualifiers either withhold or have not analyzed/realized their bottlenecks yet. Say there is eventual consistency: maybe the "eventuality" becomes so long that the service fails at its purpose. Or the communication link bandwidth is exhausted between key business logic (mutation event generating) services, and so on.

The only systems that scale linearly are stateless systems. Mastodon is not stateless. And even stateless systems hit some bottlenecks eventually, as they exist and run in a scale-variant Universe.

So this claim by itself doesn't immediately impress me; it just turns my red lights on, awaiting further investigation. But we can of course discuss why this claim is made and how it is supported. The article is long, so I've not had the chance to read it entirely yet.

But we have X number of event streams mapped through Y number of ETLs to produce Z number of read model indices, in a shape that seems to form a highly interlinked DAG, which eventually loops back on itself in terms of message flow. Just the increased cross-chatter here as we introduce more features suggests non-linear scaling.

dustingetz · on Aug 16, 2023

for example it can scale the way persistent data structures scale, which is to say "O(1) within target operational bounds" (despite technically being log-n with a high branching factor)
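
The arithmetic behind that remark can be sketched as follows. The branching factor of 32 is an assumption chosen for illustration (it is the value used by Clojure's persistent vectors); the class and method names are hypothetical. The function counts how many levels a tree needs so that branching^levels covers n elements.

```java
// Illustrative arithmetic only; the branching factor of 32 is an assumption.
class TrieDepthSketch {

    // Smallest depth d such that branching^d >= n.
    public static int levels(long n, int branching) {
        int depth = 0;
        long capacity = 1;
        while (capacity < n) {
            capacity *= branching;
            depth++;
        }
        return depth;
    }

    public static void main(String[] args) {
        // Prints 2, 4, and 6 levels for a thousand, a million, and a billion elements.
        for (long n : new long[] {1_000L, 1_000_000L, 1_000_000_000L}) {
            System.out.println(n + " elements -> " + levels(n, 32) + " levels");
        }
    }
}
```

Growing the data a millionfold, from a thousand to a billion elements, adds only four levels here, which is the sense in which log-n with a high branching factor behaves like a small constant within practical bounds.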