Oh, I'm handwaving, not talking about the underlying details of star schemata, clever column representations and whatnot.

But I think the analogy is still correct.

Every such system has two functional requirements:

1. Store data.

2. Query data.

And every system has the same non-functional requirements:

1. Storage (write) should be as fast as possible.

2. Queries (read) should be as fast as possible.

However, per an observation I made a while back, complexity in the problem domain is a conserved quantity.

Insofar as your data requires processing to be useful, that complexity cannot be made to go away. You can only decide where the complexity cost is paid.

You can pay it at write time and amortise that cost across reads. You can pay it at read time and spare writes. Or you can pay it in the middle with some sort of ETL pipeline or processing queue.
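To make the first two options concrete, here's a toy sketch using a follower timeline as the example workload. The class names, the `work` counter and the follower/following dicts are all hypothetical illustrations, not anyone's real design. The write-time version copies each post into every follower's timeline (often called fan-out on write); the read-time version stores each post once and merges followed authors on every read. The ETL/queue middle option isn't shown.

```python
from collections import defaultdict


class WriteTimeTimelines:
    """Hypothetical: pay at write time by pushing each post to every follower."""

    def __init__(self, followers):
        self.followers = followers          # author -> list of follower ids
        self.timelines = defaultdict(list)  # user -> list of (seq, author, text)
        self.work = 0                       # crude count of units of work done
        self.seq = 0                        # global ordering of posts

    def post(self, author, text):
        self.seq += 1
        # Expensive write: one copy per follower.
        for follower in self.followers.get(author, []):
            self.timelines[follower].append((self.seq, author, text))
            self.work += 1

    def read(self, user):
        # Cheap read: the timeline is already built, newest last.
        self.work += 1
        return [text for _, _, text in reversed(self.timelines[user])]


class ReadTimeTimelines:
    """Hypothetical: pay at read time by merging followed authors on demand."""

    def __init__(self, following):
        self.following = following      # user -> list of author ids
        self.posts = defaultdict(list)  # author -> list of (seq, text)
        self.work = 0
        self.seq = 0

    def post(self, author, text):
        # Cheap write: store the post once.
        self.seq += 1
        self.posts[author].append((self.seq, text))
        self.work += 1

    def read(self, user):
        # Expensive read: touch every post of every followed author, then sort.
        merged = []
        for author in self.following.get(user, []):
            merged.extend(self.posts[author])
            self.work += len(self.posts[author])
        merged.sort(reverse=True)  # newest first by sequence number
        return [text for _, text in merged]
```

Both return the same timelines; what differs is where the `work` counter climbs. The complexity doesn't go away, it just moves.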

But you must always pay it. The experiences of data warehousing made that bitterly clear.

So really, the job of a software architect is to translate the business requirements into non-functional requirements (the -ilities) and then pick the architecture that fits those NFRs. That includes sacrificing other nice-to-have non-functionals.

Twitter's key non-functional requirement is end-to-end latency of 5 seconds or less, under a very low write:read ratio. That suggests paying the complexity cost up front and amortising it over reads. And that's what they've done.
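A back-of-envelope way to see why a low write:read ratio favours paying at write time. This is a deliberately crude, hypothetical cost model: assume every user follows `k` others and is followed by `k` others on average, a write-time system copies each write `k` times and serves each read with one lookup, and a read-time system stores each write once and merges `k` timelines per read. It ignores the latency target, which is the actual NFR here.

```python
def write_time_cost(writes, reads, k):
    # Hypothetical model: each write copied to k follower timelines; each read is one lookup.
    return writes * k + reads


def read_time_cost(writes, reads, k):
    # Hypothetical model: each write stored once; each read merges k followed timelines.
    return writes + reads * k


def cheaper_strategy(writes, reads, k):
    # W*k + R < W + R*k  <=>  (k - 1) * (W - R) < 0, so for k > 1
    # paying at write time wins exactly when writes are rarer than reads.
    if write_time_cost(writes, reads, k) < read_time_cost(writes, reads, k):
        return "write-time"
    return "read-time"
```

With illustrative numbers of 1 write per 100 reads and `k = 200`, the write-time cost is 1×200 + 100 = 300 units against 1 + 100×200 = 20001 for read-time, which is the amortisation argument in miniature.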