GhostDB – A Fast Distributed Cache

dvirsky · on July 9, 2020

Written in Go? I wonder what the GC pressure is going to look like with many small keys. I've written a redis clone in Go a few years ago and the GC pauses when reaching a few GBs of utilization were awful. Granted, the Go GC has improved considerably since then, but I bet it's still going to be noticeable.

jldugger · on July 9, 2020

> I've written a redis clone in Go a few years ago and the GC pauses when reaching a few GBs of utilization were awful.

Probably manageable with a good slab allocator.

dvirsky · on July 9, 2020

Yeah, I've seen people do that, but I've glanced at the code and it's certainly not the case here.

GhostDBCache · on July 9, 2020

We're aware of the issues this could cause and will be looking into relieving GC pressure.
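For readers wondering what a slab allocator buys you in Go: the garbage collector has to scan every object that contains pointers, so millions of small heap-allocated values mean a lot of marking work. A common workaround is to copy items into a few large, pointer-free []byte slabs and index them by a 64-bit hash, using a map whose key and value types contain no pointers (Go's GC does not need to scan the contents of such a map). Below is a minimal sketch of memcached-style slab classes along those lines. It is hypothetical and not GhostDB's code: the names (SlabCache, NewSlabCache), the FNV-1a hash, and the 1 MiB slab size are assumptions for illustration, and it omits eviction, locking, and TTLs.

```go
package main

import "hash/fnv"

const slabSize = 1 << 20 // hypothetical: every slab is one 1 MiB []byte

// sizeClass hands out fixed-size chunks carved from a few large slabs.
type sizeClass struct {
	chunkSize, perSlab int
	slabs              [][]byte // big backing arrays; their bytes hold no pointers
	free               []int    // chunk IDs released by overwrites
	next               int      // next never-used chunk ID
}

func (c *sizeClass) alloc() int {
	if n := len(c.free); n > 0 {
		id := c.free[n-1]
		c.free = c.free[:n-1]
		return id
	}
	id := c.next
	c.next++
	if id/c.perSlab >= len(c.slabs) { // current slabs are full: add one
		c.slabs = append(c.slabs, make([]byte, slabSize))
	}
	return id
}

// chunk returns the bytes backing chunk id.
func (c *sizeClass) chunk(id int) []byte {
	off := (id % c.perSlab) * c.chunkSize
	return c.slabs[id/c.perSlab][off : off+c.chunkSize]
}

// ref locates an item with integers only, so map[uint64]ref contains no
// pointers and the GC does not need to scan its contents.
type ref struct{ class, chunk, klen, vlen int }

// SlabCache is a single-goroutine sketch: no locking, eviction, or TTLs.
type SlabCache struct {
	classes []*sizeClass
	index   map[uint64]ref
}

// NewSlabCache takes chunk sizes in ascending order, each at most slabSize.
func NewSlabCache(chunkSizes ...int) *SlabCache {
	s := &SlabCache{index: make(map[uint64]ref)}
	for _, sz := range chunkSizes {
		s.classes = append(s.classes, &sizeClass{chunkSize: sz, perSlab: slabSize / sz})
	}
	return s
}

func hashKey(key string) uint64 {
	h := fnv.New64a()
	h.Write([]byte(key))
	return h.Sum64()
}

// Set copies key+value into the smallest class that fits and returns false
// if the item exceeds every chunk size. A colliding key (same 64-bit hash)
// replaces the old entry, which is tolerable for a cache.
func (s *SlabCache) Set(key string, val []byte) bool {
	for ci, c := range s.classes {
		if len(key)+len(val) > c.chunkSize {
			continue
		}
		h := hashKey(key)
		if old, ok := s.index[h]; ok { // recycle the previous chunk
			oc := s.classes[old.class]
			oc.free = append(oc.free, old.chunk)
		}
		id := c.alloc()
		buf := c.chunk(id)
		copy(buf, key)
		copy(buf[len(key):], val)
		s.index[h] = ref{ci, id, len(key), len(val)}
		return true
	}
	return false
}

// Get returns a copy of the value, checking the stored key to detect collisions.
func (s *SlabCache) Get(key string) ([]byte, bool) {
	r, ok := s.index[hashKey(key)]
	if !ok {
		return nil, false
	}
	buf := s.classes[r.class].chunk(r.chunk)
	if string(buf[:r.klen]) != key {
		return nil, false
	}
	out := make([]byte, r.vlen)
	copy(out, buf[r.klen:r.klen+r.vlen])
	return out, true
}
```

With chunk sizes 64, 256 and 1024, a 6-byte key plus a 5-byte value lands in the 64-byte class, and anything over 1024 bytes is rejected. Libraries such as bigcache and freecache build on a similar pointer-free-index idea and add eviction and sharded locking.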

dvirsky · on July 9, 2020

Great. Good luck. As someone who's worked on the Redis codebase, I suggest you take a good look at some of the design decisions and optimizations that project made, e.g. wrt persistence and replication.

mesaframe · on July 9, 2020

> in-memory, general purpose key-value caching

> GhostDB can provide you with up to a 25x increase in data retrieval speeds when compared to databases such as MongoDB and MySQL.

Isn't it insincere to say your tech is faster than a DBMS that runs on slower hardware?

WJW · on July 9, 2020

Not associated with the OP, but I don't think so. It really does return your data 25x faster than a disk-based DBMS. The implied reduction in durability from being in-memory is clearly stated up front.

javajosh · on July 9, 2020

Given that you can configure any DB to use extra RAM as a cache, and many datasets will fit in that cache, if I were a fact checker I'd rate the "25x faster" claim at least "partly false".

mesaframe · on July 9, 2020

Yeah, but that speedup comes from the hardware, not from your own effort.

iammru · on July 9, 2020

Doesn't Redis already address all of this?

cmckn · on July 9, 2020

Yes, probably with superior performance. The clustering model seems to differ, but as far as I can tell, the 'distributed' part of this project isn't really implemented.

GhostDBCache · on July 9, 2020

It depends on the use case. If this is being used to reduce load on your database, then there are no noticeable differences between the performance of GhostDB and Redis from the client's end.

totorovirus · on July 9, 2020

Same question. How is this different from Redis?

GhostDBCache · on July 9, 2020

Not too different from Redis at the moment. This is straight out of a university project where we had a tight deadline and had to limit scope considerably, but if you check the roadmap on the repo, you'll see we want to add a lot to it.

boxfire · on July 9, 2020

How does this compare to Anna, which makes similar claims (and seems to actually deliver), https://github.com/hydro-project/anna (up to date link)

GhostDBCache · on July 9, 2020

I've never heard of this so I can't say.

trishume · on July 9, 2020

“microsecond performance” irks me because you can’t get latencies of a microsecond without very expensive special network cards and APIs which they don’t seem to mention and I doubt they use. So they either mean 10s to hundreds of microseconds or they’re taking the reciprocal of their throughput which would be weird and misleading.

If you’ve made a fast product, that’s great! Show us with well-explained benchmarks instead of picking the term that sounds best but can still be hand-waved as justified; that kind of wording makes people suspicious.
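To make the latency-versus-throughput distinction concrete: by Little's law, mean concurrency L equals throughput λ times mean latency W. The numbers below are purely illustrative, not GhostDB measurements: a server completing one million requests per second with 100 requests in flight.

```latex
L = \lambda W
\quad\Longrightarrow\quad
W = \frac{L}{\lambda} = \frac{100}{10^{6}\,\mathrm{s}^{-1}} = 100\,\mu\mathrm{s},
\qquad
\frac{1}{\lambda} = \frac{1}{10^{6}\,\mathrm{s}^{-1}} = 1\,\mu\mathrm{s}.
```

So quoting the reciprocal of throughput as "microsecond performance" would understate each request's latency by a factor of L = 100 in this example.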

dallbee · on July 9, 2020

Or, perhaps, they're taking the non-networked use case.

65536 · on July 9, 2020

I don’t have a stake in this, but the page does say

> delivers microsecond performance at any scale

and then

> a very large hash table that is distributed across multiple machines

So the way I read it, the network will be involved when they say “at any scale”.

nodesocket · on July 9, 2020

Great project. I did not see an explanation of the architecture of a cluster. Is data replicated to all nodes, or is data only stored on a single node (sharded)? What happens when a node in the cluster goes down? Does there have to be a consensus (odd number of nodes) for the cluster to be "healthy"?

cmckn · on July 9, 2020

> GhostDB provides a very large hash table that is distributed across multiple machines.

Sounds like no replication to me, AKA the memcached model. I can't find how to actually configure a cluster (the `Cluster Configuration` section of the docs doesn't contain anything related to hosts). I also can't find anything client-side that would distribute requests to a list of nodes with, for example, a consistent hash. I can't find a client at all, actually.

Still, interesting project, kind of aiming for Redis features and a memcached topology.

GhostDBCache · on July 9, 2020

The clients are the SDKs. Our docs are getting an update now to better explain how to configure clusters.

The SDKs are in separate repos currently (this is due to how university made us structure the project).

GhostDBCache · on July 9, 2020

Currently we follow the memcached model, so there is no replication. Remember, though, that this is straight out of university, where we had to limit scope considerably and were on a tight time constraint; we aim to add data replication, consensus, etc.
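On the consensus question raised above: majority-quorum protocols such as Raft need floor(n/2)+1 of n voting nodes to agree, so a cluster tolerates floor((n-1)/2) failures. The tiny sketch below (function names are illustrative) shows why odd cluster sizes are the usual advice.

```go
package main

// quorum returns the majority needed for agreement among n voting nodes.
func quorum(n int) int { return n/2 + 1 }

// tolerated returns how many nodes can fail while a majority survives.
func tolerated(n int) int { return n - quorum(n) }
```

For n = 3, 4, 5 the quorum is 2, 3, 3 and the tolerated failures are 1, 1, 2: a fourth node raises the quorum without tolerating any extra failure.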

rawoke083600 · on July 9, 2020

Congrats on getting it out there! That is more than half the battle. One thing I'd also like to see is more Redis clones: the Redis-specific parts, not the KV parts (there are lots of options there). I want some alternatives or speed improvements for "Unions" and other set-based operations :). PS. GhostDB looks cool!

GhostDBCache · on July 9, 2020

Adding support for sets is on our roadmap!
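For context on the set operations mentioned above, Redis exposes them as commands such as SUNION, SINTER and SDIFF. A minimal in-memory sketch of a union in Go follows, with a hypothetical Set type; a distributed cache would additionally have to decide what to do when the input sets live on different nodes.

```go
package main

import "sort"

// Set is a string set; struct{} values take no space.
type Set map[string]struct{}

// NewSet builds a Set from its members.
func NewSet(members ...string) Set {
	s := make(Set, len(members))
	for _, m := range members {
		s[m] = struct{}{}
	}
	return s
}

// Union returns a new set with every member of every input, like SUNION.
func Union(sets ...Set) Set {
	out := make(Set)
	for _, s := range sets {
		for m := range s {
			out[m] = struct{}{}
		}
	}
	return out
}

// Members returns the set's contents in sorted order for stable output.
func (s Set) Members() []string {
	out := make([]string, 0, len(s))
	for m := range s {
		out = append(out, m)
	}
	sort.Strings(out)
	return out
}
```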

Irishsteve · on July 9, 2020

It’s listed as a college project that they’d like to turn into something generally available for use. Fair play.

forgotmypw17 · on July 10, 2020

for nojs folk: http://archive.is/U8tTA

OP: FYI, your site fails to load or display anything besides a "loading" spinner, which never goes away if JS is disabled in the browser.

strogonoff · on July 9, 2020

> at any scale

Would it be feasible to bundle GhostDB with an Electron app, and run it on end-user’s machine?

GhostDBCache · on July 9, 2020

I don't know much about Electron, but if it allows you to bundle binaries and execute them, then yes!

matt_f · on July 9, 2020

Could anyone explain a use case for this? (Distributed KV store)

Genuine question, thanks

mbreese · on July 9, 2020

I would imagine instances where you'd use memcached.

Session storage could be one use-case. Another could be caching DB query results for a specified amount of time (1 min, 5 min, etc).

GhostDBCache · on July 9, 2020

In-memory data lookup

Relational and Non-relational database speedup

Managing spikes in web/mobile apps

Session-store

Token caching

Gaming - Player profiles & leaderboards

Web page caching

Global ID or counter generation

Fast access to any suitable data

AlphaSite · on July 9, 2020

Is this similar to Geode/GemFire?

GhostDBCache · on July 9, 2020

I'm not aware of these, so I can't say.

Neoxy · on July 9, 2020

Would love to dig deeper.