zBase – A high-performance, elastic, distributed key-value store (zynga.com)
60 points by slynux on Sept 19, 2013 | 24 comments



I don't understand what this offers that isn't offered by, say, Riak. Riak ticks all the same big feature boxes as zBase (distributed, elastic KV-store that persists to disk) and has the advantage that Someone Else (i.e. Basho) is paying for development.


From what I've read, zBase is a fork of Membase, so zBase is a direct competitor to Couchbase, which is what Membase became.

One of the main advantages of zBase / Membase / Couchbase over many other NoSQL data stores is their strict consistency model (compared to eventual consistency in, for example, Riak).

This means that with a zBase/Membase/Couchbase cluster, if client A writes a new value to a key in the cluster and client B reads from the same key immediately afterwards, client B is guaranteed to (immediately) see what client A wrote.

In eventual consistency models, by contrast, client B might see the old value in that case (for example, because the change has not yet propagated to all servers in the cluster). If client B read again a few minutes later, it might then see the new value from A.

Strict consistency is required in a lot of applications such as game servers (which is why Zynga needs it).

For use cases that are more read-heavy, like holding the contents of a news website, eventual consistency is good enough.
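
To make the difference concrete, here's a toy sketch in plain Python (nothing zBase- or Riak-specific; the two replica dicts and the explicit propagate step are purely illustrative):

    replica_a = {}
    replica_b = {}
    pending = []  # writes acknowledged but not yet propagated to replica_b

    def write(key, value):
        replica_a[key] = value        # client A's write is acknowledged here
        pending.append((key, value))  # replication to replica_b happens later

    def propagate():
        while pending:
            key, value = pending.pop(0)
            replica_b[key] = value

    write("score", 100)
    print(replica_b.get("score"))  # None: client B reads a stale value
    propagate()                    # ...some time passes...
    print(replica_b.get("score"))  # 100: the replicas have converged

A strictly consistent store avoids the stale read, e.g. by routing all reads and writes for a key to the one server that currently owns it, which is the Membase-style approach.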


Riak is extremely slow; perhaps zBase delivers better performance?


opendomain: you're dead. Not sure why (I had a look through the comment history, and couldn't find anything that stood out). FYI.


Are there any benchmarks you can point me to?


If you look at the key features, the following are the attractive points about zBase:

- LRU-based or random-eviction-based cache management.

- Support for multiple disks, and thereby I/O parallelism.

- Incremental backup and restore (you can pack 5x .. 10x the size of RAM into zBase and use incremental backups for node failover).

- Incremental backup enables blob-level restore at hourly, daily, and weekly granularities.

- Cluster manager - zBase partitions the entire data set into virtual buckets (vbuckets), and servers act as containers for these vbuckets. This provides a scalable way to increase or decrease the number of servers in a cluster (see the sketch below).
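
To illustrate the vbucket idea, here's a minimal sketch in plain Python; the vbucket count, hash choice, and server assignment are all made up for the example and are not zBase's actual implementation:

    import hashlib

    NUM_VBUCKETS = 1024  # fixed at cluster creation; value is illustrative

    # The cluster manager owns the vbucket -> server map; this static
    # assignment across four servers is made up for the example.
    vbucket_map = {v: "server-%d" % (v % 4) for v in range(NUM_VBUCKETS)}

    def vbucket_for_key(key):
        # A key always hashes to the same vbucket, regardless of cluster size.
        digest = hashlib.sha1(key.encode()).digest()
        return int.from_bytes(digest[:4], "big") % NUM_VBUCKETS

    def server_for_key(key):
        # Rebalancing moves whole vbuckets between servers and updates the
        # map; keys never need to be re-hashed.
        return vbucket_map[vbucket_for_key(key)]

    print(server_for_key("player:42"))

The point is that growing or shrinking the cluster only changes the vbucket map, never the key-to-vbucket hashing, which is what makes the elasticity cheap.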


That dynamic resharding looks very nice. The big issue I see for using this as a real datastore is the apparent lack of queries and indexes on the data. Keeps it a lot simpler I guess, but so many workloads require the use of queries. I guess you'd load the data into some other system for querying and just use this for storage? Or would you use another database for storing the data, and load it into zBase for quick access to buckets?


It's a distributed key-value store with durability. If you're looking for something to do ad-hoc queries against this is not for you.

Think of it as memcache + disk persistence. (So rather than losing data when a memory slab fills and the cache is purged, you just evict it from memory and read it from disk if it's needed again.)
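
The read path is roughly this (a toy Python model, not zBase internals; file-per-key storage is purely illustrative):

    import json, os

    cache = {}                 # the in-memory, memcache-like layer
    DATA_DIR = "/tmp/kv-demo"  # stand-in for the persistence layer

    def get(key):
        if key in cache:
            return cache[key]               # hot path: served from memory
        path = os.path.join(DATA_DIR, key)
        if os.path.exists(path):            # evicted, but still on disk
            with open(path) as f:
                value = json.load(f)
            cache[key] = value              # repopulate the cache
            return value
        return None                         # a true miss, not an eviction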


I get that - but the usual implementation would be to have a set of databases with indexes (maybe MySQL or MongoDB) where you could store all the data and run ad-hoc queries against it. You'd then put memcache in front of that for fast access to repeated queries where you already know which data you want. If the data isn't in memcache, the request falls through to the underlying DB, which is already on disk.

zBase would have its own full copy of the data already on distributed disks, so it wouldn't need to fall through to some other database. That seems to be the entire point - but surely you'd still need to store the data somewhere you could run ad-hoc queries on it? That means the data is duplicated into two places that would need to be kept in sync. If a transaction fails on one of the data stores, don't you have inconsistent data now?


Currently zBase does not have any indexing capabilities. But its design exposes an incremental replication protocol that can be used to build indexing outside of zBase.

zBase is used as a highly available key-value store for writes and reads. It also offers a few fancy operations, such as get-lock.
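
If you're wondering about get-lock, the semantics are roughly these (a toy Python model with made-up names; the actual zBase API will differ):

    import time

    store = {}  # key -> value
    locks = {}  # key -> (token, lock expiry time)

    def get_with_lock(key, ttl=15.0):
        # Returns the value plus a token; other writers are rejected
        # until the lock expires or the holder writes back.
        held = locks.get(key)
        if held and held[1] > time.time():
            raise RuntimeError("key is already locked")
        token = object()
        locks[key] = (token, time.time() + ttl)
        return store.get(key), token

    def set_with_token(key, value, token):
        held = locks.get(key)
        if held and held[1] > time.time() and held[0] is not token:
            raise RuntimeError("key is locked by another client")
        store[key] = value
        locks.pop(key, None)

    value, token = get_with_lock("player:42")
    set_with_token("player:42", {"coins": 10}, token)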


If your workload is light enough that a set of "full" databases is a cost effective solution, then a distributed KV store is not what you need.


Game workloads don't require ad hoc queries, typically - you're usually just stuffing a save state into the database every few seconds, and you route everything that you might need to query through analytics (except maybe billing, which you'd probably handle another way). Zynga pushes analytics data into their biggish (a few hundred nodes IIRC) Vertica cluster, which is far better for that sort of thing than most random access databases.

In any case most game state data wouldn't be very useful on its own, you typically want to look at event streams and histories of certain values, not snapshots of the current state.


When Cassandra came out, before they built a query language, their suggestion was to manually build indices. Your data would live in one key-value store; an index on the data was just another key-value store, with the indexed value as the key and your data keys as the values.

It works out well enough if you never need ad hoc queries.
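
Concretely, with dicts standing in for the two key-value stores (all names here are illustrative):

    data = {}   # primary store: user_id -> record
    index = {}  # "index" store: city -> set of user_ids

    def put_user(user_id, record):
        old = data.get(user_id)
        if old is not None:  # keep the index consistent on overwrite
            index.get(old["city"], set()).discard(user_id)
        data[user_id] = record
        index.setdefault(record["city"], set()).add(user_id)

    def users_in_city(city):
        # A "query" is one index lookup plus point reads on the primary.
        return [data[uid] for uid in index.get(city, set())]

    put_user("u1", {"name": "Ann", "city": "Oslo"})
    put_user("u2", {"name": "Bo", "city": "Oslo"})
    print(users_in_city("Oslo"))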


Anybody care to compare this to Redis?

EDIT: To clarify, I know Redis, I'm interested in learning how this differs beyond its distributed nature.


Redis clustering and elasticity are still a work in progress. That said, when sharded, or for single nodes, Redis is an outstanding tool, and its roadmap to distribution is very promising: http://redis.io/topics/cluster-spec

Note also that Redis Sentinel http://redis.io/topics/sentinel provides high availability.


Thanks hbbio - I was mostly interested in learning the differences from the point of view of zBase, as I'm a user of Redis.


Sorry for not answering earlier. I don't know zBase either, just Redis...


No problem hbbio - I appreciate it :)


Note that Zynga's workload is typically very write-heavy, and zBase has been designed to support just that. In fact, it's one of the largest NoSQL database deployments, with over 6,000 nodes in production.


Are all of them in one cluster?


No. Many smaller clusters.


This sounds somewhat like RethinkDB, although I don't believe RethinkDB has dynamic resharding.

Other than the dynamic resharding part, how do zBase and RethinkDB compare to each other?


Hey look at all that hard work those people that got fired put in! Yay Zynga!


Looks neat and all, but I have a hard time getting behind anything Zynga is doing...



