What you're saying applies more or less to any NoSQL database. "Unusable for your use case" is a bit of a big statement. There are plenty of cases where you don't need the power of relational databases, and relational databases typically can't scale horizontally the way NoSQL can; really, NoSQL can precisely because of its restricted attributes. If all you need is key/value lookups or time series, and you're not looking for any joins or random reads that don't agree with the native ordering/partitioning, then it's hard to beat NoSQL scalability- and ingest-wise.
True, which is why I said that Cassandra has its uses.
The problem is that the documentation and the language try really hard to sell you something else, and I have seen multiple people waste way too much time on this.
I agree, but I think the part that's not stated is that when picking such a technology, you basically commit to not needing joins or random reads. If you do, as the application evolves, you're really out of luck. Generally, that commitment cannot be made upfront.
Typically, solutions like time-series or document databases are deployed alongside other components in a much larger system. If you are looking for a single database solution, then maybe NoSQL is not for you.
I don't think those have the same performance characteristics. At the end of the day, you can't have your cake and eat it too. You can't keep large-scale indices that facilitate fast joins without writing them out to disk, and you can't keep them in sync with your data without transactionality/atomicity. This boils down to tradeoffs in terms of locking/consensus/scale/indices/data locality. That said, if you're going to be emulating this over NoSQL anyway, then there's an argument for using something that does it for you (assuming it meets your requirements).
This is definitely about tradeoffs.
But there are two tradeoff spectrums:
1- Consistency (strict serializability + global transactions) vs eventual consistency.
2- Range queries and joins vs key-value only.
FoundationDB provides a consistent database, but by default only a key-value API.
CitusDB and Vitess give you consistency only within a partition, but also give you a nice SQL API.
CockroachDB and Yugabyte provide a SQL API with a consistent database.
Cassandra gives you eventual consistency with a key-value API.
My experience has been that the performance penalty for using a consistent database instead of one that will silently corrupt your data is small (less than 10% slower).
As for the cost of using joins and indexes, no one is forcing you to create indexes or send queries that use joins. It's 100% opt-in.
Still, Google Spanner and CockroachDB made it very cheap to join a parent entity table with a child entity table by using interleaved tables.
And with RocksDB and a good SSD, the cost of index maintenance is not as bad as it appears to be.
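The co-location trick behind interleaved tables can be sketched in a few lines. This is an illustrative model of a single ordered key space, not any database's actual storage format; the table and column names (orders / items) are made up:

```python
# Hypothetical sketch of why interleaved tables make parent/child joins
# cheap: child rows are keyed by their parent's key plus a suffix, so they
# sort right next to the parent, and the "join" becomes one contiguous
# range scan instead of scattered random reads.
from bisect import bisect_left

# One ordered key space, as in an LSM/ordered KV store.
table = {
    ("orders", 7): {"customer": "alice"},
    ("orders", 7, "items", 1): {"sku": "A", "qty": 2},
    ("orders", 7, "items", 2): {"sku": "B", "qty": 1},
    ("orders", 9): {"customer": "bob"},
    ("orders", 9, "items", 1): {"sku": "C", "qty": 5},
}
keys = sorted(table)

def scan_order(order_id):
    """Return an order row plus all its items with one range scan."""
    prefix = ("orders", order_id)
    i = bisect_left(keys, prefix)
    out = []
    while i < len(keys) and keys[i][:2] == prefix:
        out.append((keys[i], table[keys[i]]))
        i += 1
    return out
```

Because the parent and its children share a key prefix, the whole result lives in one contiguous slice of the (on-disk) sort order, which is exactly what makes this class of join cheap.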
Cassandra has tunable consistency. It also has some (terribly slow) lightweight transactions. If you have replication, which you need for HA/DR, then the cost of writes is pretty much the same: data has to be written to all 3 replicas. The question is, for the most part, when you ack the writes. So on a single-site Cassandra you can e.g. write 3 replicas, read 2 back, and you have a consistent database (for a given key). That's not sufficient for ACID though.
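The replica arithmetic here is just the quorum-overlap rule, which is easy to sketch. Function and parameter names are mine for illustration, not Cassandra's API:

```python
# Sketch of the rule behind tunable consistency: with replication factor N,
# a read is guaranteed to see the latest acked write whenever every read
# quorum R must intersect every write quorum W, i.e. W + R > N.

def is_consistent(n_replicas, write_acks, read_replicas):
    """True if every read quorum necessarily overlaps every write quorum."""
    return write_acks + read_replicas > n_replicas

# "write 3 replicas, read 2 back" with RF=3, as in the comment above:
assert is_consistent(3, 3, 2)
# the classic QUORUM/QUORUM setting (W=2, R=2) on RF=3 also overlaps:
assert is_consistent(3, 2, 2)
# W=1, R=1 on RF=3 does not: stale reads are possible
assert not is_consistent(3, 1, 1)
```

As the comment says, this only gives you consistency for a single key; it says nothing about multi-key atomicity, which is why it falls short of ACID.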
I'm honestly not familiar at all with CitusDB or Vitess in terms of their ability to handle failures, scale out, transactions, etc., or where they sit in the CAP story. I'm sure Spanner is paying the tax somewhere, but Google can back it with infinite resources to give you the performance you need.
If there was a database that offered ACID, HA/DR, and scaled horizontally perfectly at a cost of 10% more resources, then I doubt anyone would be using stuff like Cassandra, ScyllaDB or HBase... I don't think that exists, though, but I'll look up those other databases you mentioned.
It will exist next year. Cassandra is getting HA ACID transactions (depending how you define C; we won't be getting foreign key constraints by then) that will be fast. Typically as fast (in latency terms) as a normal eventually consistent operation. Depending on the read/write balance and complexity, they may even result in a reduction in cluster resource utilisation. For many workloads there will be additional costs, but it remains to be discovered exactly how much. I hope it will be on that sort of scale.
>It also has some, terribly slow, lightweight transactions
FWIW, LWTs are much (2x) faster in 4.1 (which has been very slowly making its way out the door for a while now... the project is mostly already more focused on 4.2, so pushing 4.1 out the door is tortuous for procedural reasons)
Cool, good to know. There's still the relational vs. non-relational question for many use cases, but having better transactions opens the door to more of them.
Right, with real transactions you could maintain your own indexes if you really need them.
And to have good performance you should never join entities that have distinct partition keys. Otherwise you end up reading too much data over the network.
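A toy sketch of what "maintain your own index" looks like once writes to multiple keys are atomic. Everything here (the store, the table names, `put_user`) is hypothetical, not a real driver API; the point is only that the row and its index entry commit together:

```python
# Hand-maintained secondary index under atomic multi-key writes: because the
# base row and the index entry are written in one transaction, the index can
# never point at a row that doesn't exist (and vice versa).

class KV:
    """In-memory key space with an all-or-nothing multi-key write."""
    def __init__(self):
        self.data = {}

    def txn(self, writes):
        # In a real store this is where atomicity/isolation come from;
        # here a single dict.update stands in for an atomic batch.
        self.data.update(writes)

def put_user(store, user_id, email):
    # One transaction covers both the row and the index entry.
    store.txn({
        ("users", user_id): {"email": email},
        ("users_by_email", email): user_id,   # hand-maintained index
    })

def find_by_email(store, email):
    """Index lookup, then primary-key lookup."""
    user_id = store.data.get(("users_by_email", email))
    return store.data.get(("users", user_id))
```

Without that atomicity (today's eventually consistent batches), a crash between the two writes leaves the index and the data disagreeing, which is exactly why real transactions matter here.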
I’m honestly very happy Cassandra will get real transactions. But I still need to understand the proposed design. Will it be possible to have serializability when doing range scan queries?
Yes, but probably not initially. The design supports them, it’s just not very pressing to implement since most users today use hash partitioners. There will be a lot of avenues in which to improve transactions after they are delivered and it is hard to predict what will get the investment when.
Proper global indexes are likely to follow on the heels of transactions, as they solve most of the problem. Though transactions being one-shot and needing to declare partition-keys upfront means there’s some optimistic concurrency control required. Interactive transactions supporting pessimistic concurrency and without needing to declare the involved keys will also hopefully follow, at which point indexes are very trivial, but the same caveat as above applies.
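The optimistic concurrency mentioned above can be sketched roughly like this. It's an illustrative in-memory model (not the actual Cassandra design): a one-shot transaction reads versioned cells, declares what it read, and commits only if none of it changed underneath it; otherwise the caller retries:

```python
# Optimistic concurrency control for one-shot transactions: validate at
# commit time that every version observed during the read phase is still
# current; on conflict, abort and let the caller retry.

class VersionedStore:
    def __init__(self):
        self.cells = {}                      # key -> (version, value)

    def read(self, key):
        return self.cells.get(key, (0, None))

    def commit(self, read_set, writes):
        """read_set maps key -> version observed; writes maps key -> value."""
        for key, seen_version in read_set.items():
            if self.read(key)[0] != seen_version:
                return False                 # conflict: a writer got in first
        for key, value in writes.items():
            self.cells[key] = (self.read(key)[0] + 1, value)
        return True

def increment(store, key):
    """One-shot read-modify-write with a retry loop: the optimistic part."""
    while True:
        version, value = store.read(key)
        if store.commit({key: version}, {key: (value or 0) + 1}):
            return
```

Pessimistic, interactive transactions would instead lock the keys up front, which is why they remove the retry loop (and the need to declare keys in advance) at the cost of holding locks.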