OrbitDB – serverless, peer-to-peer database on top of IPFS (github.com/orbitdb)
202 points by imhoguy on Dec 23, 2018 | hide | past | favorite | 57 comments



Here's a quick overview of how OrbitDB works.

A "database" in OrbitDB is a single document or single log, with write permissions fixed at creation. So if you wanted ten users to each be able to write their own log of data, but not touch the others', you would have ten separate OrbitDB "databases". Anyone can read.

Each new piece of data is an "Entry", and is written to its own IPFS address. Each entry contains a bunch of IPFS address pointers - to the last entry on that "database", to any extra entries you found out about, and to a bunch of previous entries (just to speed up reading). So given one entry, you can keep recursively crawling all previous IPFS addresses contained in them, to discover all previously known entries that make up that database to that point. The entries work a lot like git commits. It's a sweet DAG.

In order to know the state of a database, you have to have its latest "head" entry. To get that, you need some way of talking live to other OrbitDB "peers" hosting that database and asking them for their latest head entries. OrbitDB does this through IPFS PubSub: it looks for other peers subscribed to the same database and exchanges latest head entries with them. Importantly, this means that if no one hosting your database is currently online, you can't get its current state.

OrbitDB (and IPFS PubSub) are definitely, absolutely not production ready. But that's another topic.


> Each entry contains a bunch of IPFS address pointers - to the last entry on that "database", to any extra entries you found out about, and to a bunch of previous entries (just to speed up reading)

How bad is the latency to read the entire database state? Even git can become pretty slow fetching and processing objects on large repos, and all of those refs are local after the initial fetch.

Seems like with the additional latency on IPFS, resolving the entire DB state (for snapshotting it or otherwise backing it up, for instance) would be unusably slow for anything even somewhat large, no?


From that description, incremental snapshots should just be reading the latest HEADs, and having a full copy of the DB means also having its complete history. The only gap is the first-time read of the whole DB. One way to solve that might look something like git pack files.


Doesn't every new user have to go through a "first time read"?


It's a valid concern.

In my testing, I saw OrbitDB load 10-40 entries per second (with 1-byte entry payloads). Each entry tries to point to up to 64 previous entries, which allows a lot of parallelism when loading previous entries.
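That fan-out of 64 matters because a loader that fetches everything it holds references to in parallel only needs roughly log-base-64-many sequential round-trips. A back-of-the-envelope sketch (a hypothetical model, not orbit-db's actual loader):

```javascript
// With each entry pointing at up to 64 predecessors, each parallel fetch
// round can reveal up to 64x more entries than the round before it.
function roundsToLoad(entries, fanout = 64) {
  let rounds = 0;
  let reachable = 1; // start from the single head entry
  while (reachable < entries) {
    reachable *= fanout;
    rounds += 1;
  }
  return rounds;
}

console.log(roundsToLoad(1_000_000)); // 4 round-trips for a million entries
```

So latency per round, not total entry count, tends to dominate the first read.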

Data size is also an issue which isn’t helped by parallelism.

And your database only ever grows, both in size and in number of entries.


>> OrbitDB (and IPFS PubSub) are definitely, absolutely not production ready. But that's another topic.

That seems to be the story of the whole blockchain space unfortunately. It's incomplete at every level of the stack, even the IPFS pub/sub layer still needs a lot of work to scale.

A problem with the blockchain community is its ICO-first, do-the-work-later mentality. This is a bit like the startup mentality of Silicon Valley.

I think that what works for centralization won't work for decentralization. You need the people who own the coins to be skilled and for them to directly contribute to the projects; you don't want your network to be made up of random shareholders who don't understand what they're buying and aren't capable of adding real value to it.


Why did you reference ICOs / coins / ...? IPFS and OrbitDB exist without any cryptocurrency. There's Filecoin relying on IPFS, but it's pretty isolated.


Everything is in this state initially.


GUN and SSB are production ready; both now run large communities:

- Patchwork has grown & scaled with spanning trees.

- Notabug.io is handling a large influx of users with daisy-chain ad-hoc mesh networks (DAM).


notabug.io looks like an awful community, but it does seem to work.


Some teams/communities are doing it the right way round ;p but feels like the exception.


You may also be interested in (full disclosure: ours)

https://github.com/amark/gun

The Internet Archive ( https://news.ycombinator.com/item?id=17685682 ) runs it, and D.Tube and notabug.io run it in production and have pushed a terabyte of P2P traffic daily.

They both use CRDTs. The CRDT that GUN uses can handle mutable (and immutable) decentralized, cryptographically secure data (user accounts and all). This is important because indexing data can be done in realtime with P2P updates, without having to rediscover the new hash of changed data (which adds extra latency on immutable-only backed stores).
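The point of a CRDT over mutable keys is that two peers can update the same record and converge without coordinating. GUN's real conflict-resolution algorithm is HAM; the following is only a toy last-write-wins register, just to illustrate the convergence property:

```javascript
// Toy last-write-wins merge: whichever update carries the later timestamp
// wins, with a deterministic tie-break so every peer picks the same winner.
// (Illustrative only - GUN's actual HAM algorithm is more involved.)
function merge(a, b) {
  if (a.timestamp !== b.timestamp) {
    return a.timestamp > b.timestamp ? a : b;
  }
  return a.value > b.value ? a : b; // deterministic tie-break
}

const fromAlice = { value: 'cat', timestamp: 2 };
const fromBob = { value: 'dog', timestamp: 1 };

// Both peers converge on the same state regardless of delivery order.
console.log(merge(fromAlice, fromBob).value); // 'cat'
console.log(merge(fromBob, fromAlice).value); // 'cat'
```

Because the key stays stable while the value changes, indexes can subscribe to the key instead of chasing new content hashes.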


I hope the text is just a little unclear and that: https://gun.eco/docs/Auth

"... Finally, you can then save data to their account that nobody else can write to:

(...)

When it is stored on disk or sent over the wire, it uses cryptographic signatures (see the video explainer), to secure the account and data without relying upon any trusted servers!

And then when you use GUN to read the data, it automatically verifies and decrypts the data for you:..."

Really means to say that data is always signed and encrypted (presumably with authenticated encryption), and not simply signed?

It's one thing to be able to prove who wrote what, another to be able to read what everyone writes... And the latter isn't usually what you want...


SEA automatically signs/verifies.

To encrypt (cipher/decipher) data, you need to call `SEA.encrypt` ( https://gun.eco/docs/SEA#encrypt ).

There are a couple properties it automatically encrypts/decrypts (account data when you login), but beyond that you have to tell it what data is private versus public with `SEA.encrypt`.

Thank you for pointing out those docs, though - they certainly need to be fixed to clarify this more (rather than leaving it to the video explainers). Will do that now. Thank you!


The project's choice of name deters me from even trying this unfortunately.


It would be nice if you were to expand that point. Is it for personal beliefs, or corporate policy, or something else?


Well, it's not a neutral word, and this is an age where flawed filters and people might erroneously flag a search/posting history as firearm-related. It might be a low probability of causing any issue, but it's an unnecessary risk, and one that would continue to niggle at me if I became dependent on the tool.


Glad you posted about this. How many peers does the largest GUN installation run, and what is the maximum? A notable difference between GUN and OrbitDB is that GUN is production-ready while OrbitDB is still alpha.


D.Tube (from internal + external stats like similarweb) has had ~1M monthly uniques.

A full peer can run in every browser, but other components like DAM ( https://gun.eco/docs/DAM ) and AXE (still being developed) optimize network bandwidth, so:

Not all peers need to connect to other peers.

Some peers can be installed on laptops.

Some peers can run in the cloud.

WebRTC fails a lot because Browsers still suck, so a lot of peers still "daisy chain" through IPv6 peers (which have handled these large terabyte loads). We hope AXE will improve the WebRTC situation.

AXE will use a radix DHT to optimize peer connections; the algorithm is explained here (copied from a chat where I explained how it will work):

https://gun.eco/docs/DHT

So I can't give you any good answers yet, but we should get more stats coming up soon as these changes happen!


How far away is AXE from being ready for early adoption? (Btw, hi Mark! Good to see you on HN)


Hey! :D

Yeah, so @rogowski in the community has prototyped 37% of it on the AXE github branch!

I haven't had time yet myself, it would at least be several months, but the more people that team up with him (you up for joining!?) then obviously the faster it could get out.

Part of my delay is that GUN works & scales well enough without AXE for now, so we're focusing on creating better experiences for those millions of users, to then drive more adoption to AXE when things actually start costing more.


Who pays for this? IPFS would normally need a pinning node to keep contents from disappearing.

Does OrbitDB require the same? If yes, it is not serverless. If no, then where will my data get stored if I upload it and then disappear for a year?


This is not a service. IPFS is not a service. Whoever has authorized the software to run on their computer is paying for it.

IPFS is a set of protocols, in the same way that Bitcoin is a set of protocols.

Do you consider a peer-to-peer WebRTC collaborative text editor serverless?

IPFS is just a fancy content addressed value store that allows peers to ask the network if someone has the value for any address, and OrbitDB is a database built on top of it.

> If no, then where will my data get stored if I upload it and then disappear for a year?

On your computer, and it will be available to the network if you turn your computer back on.


(1) IPFS is a service, because it is a single database with consistent information. It is somewhat unusual in that it is widely distributed and is owned by multiple independent actors, but that does not make it any less of a service.

An example of a set of protocols is HTTP. There is no single "HTTP content cloud" -- each HTTP server is unique and provides different content.

(2) I defined "serverless" as "does not need the server to operate" (we traditionally ignore static files servers in this definition). So the case of "collaborative text editor" is a great example, because there are two very distinct products using this name:

- Some of them are designed for throw-away documents and have little expectation of durability, like collabedit -- these could be truly serverless, with all data stored in LocalStorage (of course, a commercial entity would still at least maintain a TURN server, making it not fully serverless)

- Some of them are designed for more permanent purpose, like Google Docs. For those, even if you use peer2peer synchronization, you still need a central server. After all, you don't want to lose your life's work if you cleared your browser state.

My assumption with most databases is that unless mentioned otherwise, they are durable and capable. For OrbitDB, it looks like if you want your database to survive browser cookie clearing and/or to hold more than the LocalStorage limit, you need to maintain a real server on your infrastructure. So it is not serverless.


I think I know what you're on about.

In my mind "Serverless" just means "Requires a server" and the author doesn't particularly think about their terminology.


With the decreasing number of personal computers, isn't the substrate for IPFS just shrinking?


While I prize anything that may help us go distributed for our freedom and safety, I'm not much convinced by IPFS (IMVHO a nearly-unusable monster). And while I understand many of its reasons, I prefer a network that distributes contents to any peer, so that it basically gives a guarantee that contents exist as long as there are enough peers.


What's the incentive to host this content?


Being a society that is free and not tied to a few big guys - for instance, with the best "backup" possible for our valuable data. Apart from that, if you want a distributed network (and we desperately need it), you can choose to participate or not, if it's architecturally designed for that...


Hm.. Am I reading this correctly?

https://github.com/orbitdb/orbit-db/blob/master/GUIDE.md#acc...

Currently, the only access mode is read-write? So as soon as you move beyond the use case of private, single-client data (e.g. browser storage), everyone involved can read and write data?


I'm not sure where you are getting the idea that the only access mode is read-write.

> You can specify the peers that have write-access to a database. You can define a set of peers that can write to a database or allow anyone write to a database. By default and if not specified otherwise, only the creator of the database will be given write-access.

It would appear that the database is read-only for all other peers.
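The model the guide describes - a write allowlist fixed when the database is created, while reads stay open to everyone - can be sketched as a toy in plain JavaScript (illustrative only; this is not the orbit-db API, and the key strings are hypothetical):

```javascript
// A "database" whose write set is fixed at creation time.
function createDatabase(writeKeys) {
  const entries = [];
  return {
    // Anyone can read.
    read: () => [...entries],
    // Only keys in the creation-time allowlist (or '*') may write.
    write: (key, payload) => {
      if (!writeKeys.includes('*') && !writeKeys.includes(key)) {
        throw new Error('not authorized to write');
      }
      entries.push(payload);
    },
  };
}

const db = createDatabase(['alice-public-key']);
db.write('alice-public-key', 'hello');
console.log(db.read()); // [ 'hello' ]
// db.write('bob-public-key', 'hi') would throw: the write set cannot grow
```

Changing the allowlist means creating a new database at a new address, which matches the quote below about access rights being fixed.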


"OrbitDB currently supports only write-access and the keys of the writers need to be known when creating a database. That is, the access rights can't be changed after a database has been created. In the future we'll support read access control and dynamic access control in a way that access rights can be added and removed to a database at any point in time without changing the database address. At the moment, if access rights need to be changed, the address of the database will change."

That sounds like data is either private, or world read-writable.

Edit: but I take it data is world-readable in general? So private data would need encryption?


Does anyone know if a similar thing exists for RDF?

I'm looking for a distributed scalable P2P triple store (or graph database) to store and retrieve RDF using SPARQL.


You should probably check out:

- SOLID (Tim Berners-Lee, RDF)

- GUN (ours, graph)


This is super cool. Love it!


Is this related to OrbitJS?


"serverless"...


Yeah... Peer-to-peer networking is distinct from client-to-server networking. If there are no nodes with roles dedicated to serving content at the request of a client, it's serverless.


Can we all just accept serverless mainly means decentralized and move on from these posts?


in the real sense of the word


I would define anything that responds to a remote request as a server program, which would then make the system that runs that program a server. =)


A more interesting definition is that a server is something that is asymmetric from a client. Things that are peer to peer which are symmetric to each other are something special.


Not really. Each peer is running a client and server and the only difference is the role of each node in a given transaction.


While you can frame peer-to-peer interactions as a series of server-client interactions, with servers and clients swapping roles... you shouldn't. It's a different paradigm with different implications.

Individual transactions might be usefully described as server-client, but the overall system is not server based. So...serverless.


In all cases, one node must send the first packet. The node sending the packet is the client, and the destination of the packet is the server.

Also, the internet is a peer-to-peer system. But "peer-to-peer" is an abstract paradigm, because ultimately, (a) two peers need to know about each other, and (b) one peer needs to initiate every transaction.

I realize we're just arguing semantics here, and I'm not sure what point I'm trying to make, but it's an interesting discussion nonetheless...


> In all cases, one node must send the first packet. The node sending the packet is the client, and the destination of the packet is the server.

I'm pretty sure you are just making up new definitions. It's kind of like saying that whoever says the first word in class is the teacher and the rest are students.

The fact is that both peers send packets and the first packet isn't particularly significant, except from a stateful connection or firewall perspective.


Well, let's break out Wikipedia [0]

> Clients therefore initiate communication sessions with servers which await incoming requests.

> Both client-server and master-slave are regarded as sub-categories of distributed peer-to-peer systems.

[0] https://en.m.wikipedia.org/wiki/Client–server_model


> Clients therefore initiate communication sessions with servers which await incoming requests.

Dogs walk on four feet, therefore anything that walks on four feet is a dog?

> Both client-server and master-slave are regarded as sub-categories of distributed peer-to-peer systems.

Dogs are canids, that doesn't mean all canids are dogs.

The client-server model really entails more than having one party that sends a packet before the other. Some illuminating details and a shallow comparison to the peer-to-peer architecture can be found in the article you linked.


By this definition, nothing that communicates with another process can be called serverless.


That definition is wrong.


There are two separate meanings of the term server.

1) An application which runs on a network and services clients.

2) A dedicated computer which runs server applications.


Serverless as: 3) An application which runs on a network and communicates with others on an equal footing - you cannot distinguish one node from the others based on the source code or the logic it is running. In that sense even Amazon's serverless platform is not serverless; it is server-lite.


What definition would you use?


Some that doesn't make a Bittorrent client or a Bitcoin node a server.


But both of those are servers?


No, both of these are peer to peer applications


.. which have client and server parts of the protocol that can be separately disabled. The fact that the same piece of software can run in client mode and in server mode is irrelevant.

In the bittorrent example, the seeders are pure servers, and those cheating apps which only connect to seeds are pure clients. Yes, many trackers actively try to detect and discourage pure clients, but this is not enforced by the protocol itself.



