When you step back and consider the incredible amount of manpower and resources that have been put into these applications, it's amazing how buggy they are. To put it simply, they're buggy because the underlying infrastructure and techniques used to build them are so complex that the implementation is beyond the realm of human understanding.
The way applications are built, and have been built since before I was born, is by combining potentially dozens of narrow tools: databases, computation systems, caches, monitoring tools, etc. There has never been a cohesive model capable of expressing arbitrary backends end-to-end, and every application built has had to be twisted to fit onto the existing narrow pieces.
Rama is a lot more than just "event sourcing" and "materialized views". Those are two concepts at its foundation, but the real breakthrough is that it is that cohesive model capable of expressing diverse backends in their entirety. It took me more than five years of dedicated research to discover this model, and it was extremely difficult.
Yes, I 100% agree with you. I would like something like this to succeed, and agree the problem is real.
But what are the tradeoffs? Nothing comes with a 100x benefit and no tradeoffs.
(side note: I worked on Google Code for a short while in 2008, concurrent with Github's founding ... I think Github moved a lot faster in a large part because they weren't dealing with distributed systems at first -- they had a Rails app, a database, and RAID disks, and grew it from there. We had BigTable and perf reviews :-P )
Eventual consistency is probably one?
Can I specify that comment editing is correct and ACID, while likes/upvotes are eventually consistent? (No is a fine answer, these problems are hard)
I read through much of the doc, and don't see a mention of the word "consistency" at all, which seems like an oversight for something that is unifying what would be in a database with computation.
Rama is a much broader platform than a database, so the consistency semantics you get depend on how you use it. When using Rama, you're not mutating indexes directly like you do with a database, but adding source data that then gets materialized into any number of indexes.
You get read-after-write consistency for any PStates in a streaming ETL colocated with the depot you appended to, as long as you do the depot append with "full acking", which coordinates its response with the completion of colocated streaming ETLs. If you append at a lower level of acking, you get eventual consistency on those PStates in exchange for lower-latency appends.
Microbatching is always eventually consistent, as processing is asynchronous and uncoordinated with depot appends. Microbatching has higher throughput than streaming and simpler fault-tolerance semantics.
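To make that concrete, here's a rough sketch of what the difference looks like from the client side. This is illustrative only: names like clusterDepot, AckLevel, CommentEdit, and Upvote are approximations for the sake of the example, and the actual API is in the docs coming next week.

    // Illustrative sketch only; clusterDepot, AckLevel, and the event
    // classes below are placeholder names, not the exact API.
    Depot depot = cluster.clusterDepot("com.example.CommentsModule", "*commentEvents");

    // Full acking: the append doesn't complete until colocated streaming ETLs
    // have finished processing it, so reads of the PStates they materialize
    // immediately reflect the write (read-after-write consistency).
    depot.append(new CommentEdit("alice", 123, "fixed typo"), AckLevel.ACK);

    // Lower ack level: the append returns once the data is durably written to
    // the depot, before downstream processing runs. Lower-latency appends, but
    // the PStates are only eventually consistent with respect to this append.
    depot.append(new Upvote("bob", 123), AckLevel.APPEND_ACK);

    // Microbatch topologies consume the depot asynchronously in batches, so
    // any PStates they materialize are always eventually consistent.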
You'll be able to read a lot more about this when we release the docs next week.
I think many people are going to have problems programming with this consistency model, as they will with any model that's different from a single machine's. But that's basically "physics", so it's inevitable :)
But it seems like great work within the constraints -- look forward to learning more
I have indeed wondered why none of the cloud platforms have built more forward-looking tech like this -- instead it's copies of AWS and so forth
Hilariously, I went to edit the above comment, and HN was overloaded. Then it served me three or four 500's, AND it served me stale data in between
I was pissed off that I would have to type my comment again, but actually it did save it, and refreshing worked.
From what I understand Hacker News is architected more in-memory, on one big box ... Perhaps similar to the event sourcing model
(not knocking hacker news -- it's generally a very fast site, MUCH better than Reddit. Just that scaling beyond a single machine is difficult and full of gotchas)
StackOverflow also used to run on a single (very beefy) machine for a long time; databases make efficient use of vertical scaling, while horizontal scaling is much harder.
Of course, specialist systems can often do much better.
e.g. they mention "event sourcing" and "materialized views" in the post -- sounds good
But I thought I heard from a few people who were like "we ripped event sourcing out of our codebase" and so forth
And yeah your question is an obvious good one, and the Reddit answer of "write through cache" ... is less than satisfying to me
I FREQUENTLY have the problem where I reload the page and Reddit shows me stale data. It's SUPER buggy.
---
Anyway I definitely look forward to hearing people try this and what their longer term impressions are!
I basically want to know what the tradeoffs are -- it sounds good, but there are always tradeoffs
So is the tradeoff "eventual consistency" ? What are the other tradeoffs?