I'm wondering why you chose to implement your own cryptography routines instead of using something standard like TLS. Apparently your `DecryptData` and `Encrypt` methods are vulnerable to replay attacks due to the lack of an (EC)DH-style key exchange.
Thanks for the critique! I wanted to use symmetric crypto as it's trivial to use without domains and certificates. The possibility of replays is a non-issue, as the key-value store is implemented as a CRDT and therefore all operations are idempotent.
On the other hand, I didn't anticipate replay attacks in the design and thanks to your comment, I'll keep them in mind should I ever find myself in a scenario where they are undesirable...
It doesn't matter if the operations are idempotent. The point is that an eavesdropper can replay a message that sets a key, for example, overwriting whatever was there previously.
It would be better to use an established cryptography system. You could do self-signed certs with TLS, like Syncthing does. Or just use SSH.
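A minimal Go sketch of the replay point, assuming messages are sealed with AES-GCM under one long-lived pre-shared key. The helpers `newGCM`, `seal` and `open` are hypothetical and are not kvass's `Encrypt`/`DecryptData`. Authentication alone does not stop a byte-for-byte replay, because the captured ciphertext is still a valid message under the same key.

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
	"fmt"
)

// newGCM builds an AES-GCM AEAD from a 16-, 24- or 32-byte key.
func newGCM(key []byte) (cipher.AEAD, error) {
	block, err := aes.NewCipher(key)
	if err != nil {
		return nil, err
	}
	return cipher.NewGCM(block)
}

// seal encrypts and authenticates plaintext with a fresh random nonce,
// returning nonce||ciphertext||tag.
func seal(key, plaintext []byte) ([]byte, error) {
	gcm, err := newGCM(key)
	if err != nil {
		return nil, err
	}
	nonce := make([]byte, gcm.NonceSize())
	if _, err := rand.Read(nonce); err != nil {
		return nil, err
	}
	return gcm.Seal(nonce, nonce, plaintext, nil), nil
}

// open verifies and decrypts a message produced by seal.
func open(key, msg []byte) ([]byte, error) {
	gcm, err := newGCM(key)
	if err != nil {
		return nil, err
	}
	if len(msg) < gcm.NonceSize() {
		return nil, fmt.Errorf("message too short")
	}
	nonce, ct := msg[:gcm.NonceSize()], msg[gcm.NonceSize():]
	return gcm.Open(nil, nonce, ct, nil)
}

func main() {
	key := make([]byte, 32) // the long-lived pre-shared key
	if _, err := rand.Read(key); err != nil {
		panic(err)
	}
	captured, err := seal(key, []byte(`{"key":"Foo","value":"Bar"}`))
	if err != nil {
		panic(err)
	}
	// The receiver accepts the genuine message...
	_, err1 := open(key, captured)
	// ...and an eavesdropper's verbatim copy verifies just as well.
	_, err2 := open(key, captured)
	fmt.Println(err1 == nil, err2 == nil) // true true
}
```

TLS sidesteps this by deriving fresh session keys through its key exchange, so a record captured from one connection does not decrypt in another.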
If the CRDT part is done correctly, then replaying a message that sets a key will not change anything, ever.
If the message is:
Key: Foo
Reference CRDT node ID: 7654321 (the last node that the client knows of that updated the value of ‘Foo’)
Operation: Update
Value: Bar
The ID of this new node: 1122112211
(Omitted for simplicity: Timestamps, hashes, …)
Replaying that message won’t do anything if the target already knows about the existence of that new node.
If the target didn’t know about the node, then I guess you’re helping them sync their own data? Maybe they owe you a thanks? If you knew what each encrypted message contained, you might be able to do some split-state shenanigans; for example: replay the message that sets a “PasswordAuthEnabled” key to “Yes” but deliberately omit the message that changes the “Password” key from its default of “password” to a genuine password. It’s very hard to imagine an actual situation like this occurring, but I guess that’s what makes crypto (and designing secure systems in general) so damn tricky. That and the math. And end users. And…
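The replay argument above can be sketched in Go, assuming each replica keeps every node it has seen, indexed by node ID. The `Update`/`Replica` types and field names are hypothetical, and timestamps and hashes are omitted as in the message above. Applying a node that is already known is a no-op.

```go
package main

import "fmt"

// Update mirrors the message fields listed above (simplified).
type Update struct {
	Key    string
	RefID  uint64 // last node the sender knew of for this key
	NodeID uint64 // ID of the new node this update creates
	Value  string
}

// Replica stores every node it has seen, indexed by node ID.
type Replica struct {
	nodes map[uint64]Update
}

func NewReplica() *Replica { return &Replica{nodes: make(map[uint64]Update)} }

// Apply adds the update's node if it is new and reports whether state changed.
// A node that is already known is ignored, so a replayed message changes nothing.
func (r *Replica) Apply(u Update) bool {
	if _, ok := r.nodes[u.NodeID]; ok {
		return false
	}
	r.nodes[u.NodeID] = u
	return true
}

func main() {
	r := NewReplica()
	msg := Update{Key: "Foo", RefID: 7654321, NodeID: 1122112211, Value: "Bar"}
	fmt.Println(r.Apply(msg)) // true: new node
	fmt.Println(r.Apply(msg)) // false: the replay is ignored
	fmt.Println(len(r.nodes)) // 1
}
```

Note that this only protects messages that go through the CRDT; the split-state scenario above (replaying one node while withholding another) is not addressed by deduplication.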
I see, thanks. I was focusing on the "idempotent" part, but yeah, a CRDT would protect against replays. It's still not a great design though; it leaves you open to issues in case not all messages are part of the CRDT, or if the CRDT implementation is buggy.
It's a shame that the meaning of 'idempotent' has gotten watered down by half-assed implementations. The original NFS paper from Sun [0] claims that write operations are idempotent, but they aren't really. Not if another operation has occurred in between, as in:
write '1' @ 0
write '2' @ 0
write '1' @ 0 (replayed through a duplicated packet)
the duplicated write RPC reverts the second write. Duplicated link and rename RPCs are even worse. They added a replay detection cache in the server later to prevent some common error cases, but it fails if the server reboots in the middle.
Anyway, CRDT correctness is hard enough that I'd be reluctant to trust it against an adversary who can inject replays.
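The NFS write example above can be simulated in a few lines of Go. This is a simplified model, not NFS itself: the file is a single byte, and the hypothetical duplicate request cache is keyed only on a transaction ID (XID), held in memory so it is lost on restart.

```go
package main

import "fmt"

// writeReq is a "write this byte at this offset" RPC tagged with a transaction ID.
type writeReq struct {
	XID    uint32
	Offset int
	Data   byte
}

// server applies writes to an in-memory file.
type server struct {
	file     []byte
	useCache bool
	seen     map[uint32]bool // duplicate request cache; lost if the server restarts
}

func newServer(size int, useCache bool) *server {
	return &server{file: make([]byte, size), useCache: useCache, seen: make(map[uint32]bool)}
}

func (s *server) handle(r writeReq) {
	if s.useCache {
		if s.seen[r.XID] {
			return // duplicate: do not execute it again
		}
		s.seen[r.XID] = true
	}
	s.file[r.Offset] = r.Data
}

// run replays the sequence from the comment and returns the final byte at offset 0.
func run(useCache bool) byte {
	s := newServer(1, useCache)
	w1 := writeReq{XID: 1, Offset: 0, Data: '1'}
	w2 := writeReq{XID: 2, Offset: 0, Data: '2'}
	s.handle(w1)
	s.handle(w2)
	s.handle(w1) // the duplicated packet arrives late
	return s.file[0]
}

func main() {
	fmt.Printf("without cache: %c\n", run(false)) // 1: the second write was reverted
	fmt.Printf("with cache:    %c\n", run(true))  // 2
}
```

Each write is idempotent on its own, yet the interleaved duplicate still changes the outcome; and because `seen` lives in memory, a server restart between the original and the duplicate brings the problem back.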
You stole my idea! I love it. As a dev who spends a big chunk of their day in the shell, this is the kind of tool that I was destined to create myself, but never did thanks to lack of time, laziness, life, etc.
I’m wondering what sorts of use-cases people would use a personal key-value store for. Maybe it’s just a useful foundation for building other tools on top of, like a password manager.
The primary use case is for shuffling around files or clipboards between different computers. I also regularly use the url-sharing capability.
Before that, I had to deal with ephemeral HTTP servers, which I didn't like from an ergonomic perspective.
Ergonomically, I find Redis nice. The problem is that it is in-memory and that encryption is cumbersome. Also, kvass can be used offline, as the kv-store is implemented as a CRDT.
For passwords specifically there's a similar tool, https://www.passwordstore.org/ - but it stores GPG-encrypted plain text files versioned with git, instead of managing a sqlite db
I use a similar setup to store code snippets (certain Java annotations for integration/unit tests, various things like that), vehicle license plates/VINs, internal (but nonsensitive) IDs for test accounts, tons of things like that.
Honestly a password manager would probably be technically better—or a bunch of flat files lol—but there was a certain charm to having it displayed / function exactly as I like it, and lightning quick with nothing I didn’t need.
IDE would be another natural place for a lot of my usages, but I kept finding I needed to leave it in a pull request review or slack conversation or similar, not necessarily programming myself.
I use skate to store secrets used by some personal programs. I have scripts that pull out the secrets and set them as environment variables that are used by the programs. This way I don't have them sitting around in a configuration file in the source directory and can't accidentally commit them to git but they're easy to sync between computers.
"Generally speaking, any site that gets fewer than 100K hits/day should work fine with SQLite. The 100K hits/day figure is a conservative estimate, not a hard upper bound. SQLite has been demonstrated to work with 10 times that amount of traffic."
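For scale, the quoted figure works out to a modest average request rate (assuming traffic is spread evenly over 24 hours; real traffic peaks higher):

$$
\frac{100\,000\ \text{hits/day}}{86\,400\ \text{s/day}} \approx 1.16\ \text{hits/s},
\qquad
\frac{10 \times 100\,000\ \text{hits/day}}{86\,400\ \text{s/day}} \approx 11.6\ \text{hits/s}
$$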
In case anyone is wondering about the name, it's a Slavic fermented bread drink that's much less alcoholic than beer (and commercially canned versions are near zero alcohol). It's one of my favorite chilled summer drinks, and you should be able to find it in Slavic stores in the US as well.
I haven't gotten around to trying it, though; I've only had commercial bottled and canned ones. I imagine if you make it yourself you'll end up with a slightly more alcoholic result.
Cool.
Curious why you chose sqlite instead of something like badger [https://github.com/dgraph-io/badger] given you expose it as a key value database, which badger is.
SQLite allows me to keep multiple versions of the same entry, which is convenient for state merging. Half the sync logic is actually implemented in SQL. Other than that, I’m already familiar with it and the storage backend is not very performance critical for the intended use case.
Mainly self-hosting and generating shareable URLs. If your keys end in ".html", the MIME type is even set accordingly and you can use it for toy websites ;)
This is by no means meant to replace the backend of your app. It's more of an alternative to USB sticks and Google Drive.
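One way to pick a MIME type from a key's extension in Go is the standard `mime.TypeByExtension`. This is a sketch of the idea, not necessarily how kvass does it; `contentTypeForKey` is a hypothetical helper, and the `application/octet-stream` fallback is an assumption.

```go
package main

import (
	"fmt"
	"mime"
	"path"
)

// contentTypeForKey picks a Content-Type from the key's file extension,
// falling back to a generic binary type when the extension is unknown.
func contentTypeForKey(key string) string {
	if t := mime.TypeByExtension(path.Ext(key)); t != "" {
		return t
	}
	return "application/octet-stream"
}

func main() {
	fmt.Println(contentTypeForKey("toy-site/index.html")) // a text/html type
	fmt.Println(contentTypeForKey("notes"))               // application/octet-stream
}
```

Serving unknown keys as `application/octet-stream` means browsers download them instead of rendering them, which is the safer default for arbitrary user data.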
What do you mean it didn't take off? QR code detection is implemented in native iOS camera and IIRC most android implementations too. Almost everyone can use it.
I feel like that has changed over the past few years. Many restaurants in my area started using them for menus, and I recently saw them used to set up Wi-Fi while on vacation.
The only places I see QR codes are on my phone, to share the Wi-Fi password; on products, to scan for competitions; and from time to time on advertising at bus stops.
In my country (BR), this transfer method (Pix), which can be initiated with a QR code, has really picked up. I'm surprised a simple "scam" (replacing the printed QR codes glued to restaurant tables) hasn't caught on yet.
I'm late to reply here but I only now got around to setting Kvass up and testing it out.
I got it running on a free GCP Compute VM and linked it through to my PC so that the VM hosts the Kvass server and my PC (and in future laptop) set/get stuff on there.
I plan on using Kvass to pass things between my laptop and PC - links, files, images... etc. Will see how that goes - perhaps I don't end up using it at all.
If it seems useful, I'll try to hook my web domain in so that I have a more static domain to use it with.
Redis is in-memory, so it's prohibitive for big files. Also, kvass still works if it's disconnected from the server. This is important if you want to use it for config files.
On the other hand, using redis (/skate) for storing files was the inspiration for creating kvass.
I have so many questions about this. Much of the architecture seems off to me. I like the concept, but it doesn't seem as secure as it could be.
For the README, I'd hope to find a bit more information about the way data is stored and transmitted. For example, this seems to just be a SQLite database with values in fields? Is there a separate encryption key for the database itself? Otherwise anyone with access to the file would be able to see all data stored?
The encryption key is only used to encrypt data in transit, but not at rest? And then you're encrypting the full JSON blob instead of only the values? This seems risky to me.
What is the purpose of the ProcessID? It is randomly generated and stored in the database (thus used by all clients too). So, I'm not sure what this is for? I see it's used to resolve conflicts, but these should probably be given out by the server?
Do the clients cache data locally? It looks like you're basically syncing from the server for every request. You're already making a round trip to the server for a request anyway, so why not keep state only on the server? I can understand an offline-only mode, but this would require a significantly more robust sync mechanism. If this was the goal, I'd love to see this discussed more in the README too.
Finally, I don't understand why you're using plain HTTP (no TLS) for communication b/w client and server. I didn't see any authn/authz in the requests. You're also unmarshalling random data from the request w/o confirming that it is valid first. This seems risky to me and could potentially crash the server if I were to send it random data.
This would have been a great use-case for a simple (non-HTTP/JSON) TCP server:
>>> AUTHTOKEN xxx
>>> SET $KEY $LEN $SHA1
>>> <bytes>
<<< OK
>>> AUTHTOKEN xxx
>>> GET $KEY
<<< $LEN $SHA1
<<< <bytes>
Custom protocols have their own security issues, but it can also be easier to see where there are potential issues (like unmarshalling unvalidated blobs). If you wrap something like the above in TLS-PSK, you're set. If you want to use encryption for a session (after you authenticate), that's possible too, but you're at risk of effectively re-creating TLS.
> this seems to just be a SQLite database with values in fields?
SQLite is used as a storage format ("SQLite competes with fopen()"). The key-value pairs are stored as a modified Append-Only CRDT. The LUB operation (to merge two states while syncing) is implemented here: https://github.com/maxmunzel/kvass/blob/e32fdabdc86b039f716c...
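The general idea of an LUB merge for an append-only state can be sketched in Go as a plain set union. The types below are hypothetical, the uniqueness of `(ProcessID, Counter)` per write is an assumption, and kvass's "modified" variant in the linked code may differ.

```go
package main

import "fmt"

// entryID identifies one write; assumed unique per replica and write.
type entryID struct {
	ProcessID uint64
	Counter   uint64
}

// entry is one version of one key. Entries are only ever added, never changed.
type entry struct {
	Key   string
	Value string
}

// State is an append-only set of versions for all keys.
type State map[entryID]entry

// Merge returns the least upper bound of a and b: the union of their entries.
// Union is commutative, associative and idempotent, so replicas converge no
// matter in which order, or how often, they exchange states.
func Merge(a, b State) State {
	out := make(State, len(a)+len(b))
	for id, e := range a {
		out[id] = e
	}
	for id, e := range b {
		out[id] = e
	}
	return out
}

func main() {
	laptop := State{{ProcessID: 1, Counter: 1}: {Key: "Foo", Value: "Bar"}}
	desktop := State{{ProcessID: 2, Counter: 1}: {Key: "Foo", Value: "Baz"}}
	synced := Merge(laptop, desktop)
	fmt.Println(len(synced))                 // 2: both versions of Foo are kept
	fmt.Println(len(Merge(synced, desktop))) // 2: re-merging changes nothing
}
```

Choosing the current value of a key from the kept versions is a separate step that needs a deterministic order on versions.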
> anyone with access to the file would be able to see all data stored?
Yes, attackers with access to your fs are not part of my attacker model. I rely on disk encryption for that matter.
> Do the clients cache data locally? It looks like you're basically syncing from the server for every request. You're already making a round trip to the server for a request anyway, so why not keep state only on the server? I can understand an offline-only mode, but this would require a significantly more robust sync mechanism. If this was the goal, I'd love to see this discussed more in the README too.
The sync mechanism is actually pretty solid, as it's based on CRDTs. One of the applications of kvass is central management of config files, so automatic syncing and offline fallback are important.
> What is the purpose of the ProcessID?
The Counter variable is a rudimentary implementation of Lamport clocks. To get a total order from Lamport clocks, you need ordered, distinct process IDs. The process IDs don't really need to mean anything, and the Lamport clock is itself just a fallback for the case that the wall-clock timestamps collide (see the Max() function), so it's practical to just draw them randomly.
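The ordering described above (wall-clock timestamp first, then the Lamport counter, then the random process ID) can be sketched in Go as a lexicographic comparison. This is an illustration, not kvass's Max() function; `Version`, `Less`, `Winner` and `Observe` are hypothetical names.

```go
package main

import "fmt"

// Version carries the ordering fields described above.
type Version struct {
	Timestamp int64  // wall-clock time, compared first
	Counter   uint64 // Lamport clock, breaks timestamp ties
	ProcessID uint64 // random per replica, breaks the remaining ties
	Value     string
}

// Observe is the Lamport clock update rule: after seeing a remote counter,
// move past both the local and the remote value.
func Observe(local, remote uint64) uint64 {
	if remote > local {
		local = remote
	}
	return local + 1
}

// Less reports whether a orders strictly before b (lexicographic comparison).
func Less(a, b Version) bool {
	if a.Timestamp != b.Timestamp {
		return a.Timestamp < b.Timestamp
	}
	if a.Counter != b.Counter {
		return a.Counter < b.Counter
	}
	return a.ProcessID < b.ProcessID
}

// Winner returns the greatest version in a non-empty slice. As long as no two
// versions share all three fields, every replica picks the same winner.
func Winner(vs []Version) Version {
	best := vs[0]
	for _, v := range vs[1:] {
		if Less(best, v) {
			best = v
		}
	}
	return best
}

func main() {
	a := Version{Timestamp: 100, Counter: 3, ProcessID: 7, Value: "a"}
	b := Version{Timestamp: 100, Counter: 3, ProcessID: 9, Value: "b"}
	fmt.Println(Winner([]Version{a, b}).Value, Winner([]Version{b, a}).Value) // b b
	fmt.Println(Observe(5, 9))                                                // 10
}
```

With 64-bit random process IDs, two replicas drawing the same ID is very unlikely, which is why random IDs are a practical substitute for server-assigned ones here.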
> I didn't see any authn/authz in the requests. You're also unmarshalling random data from the request w/o confirming that it is valid first. This seems risky to me and could potentially crash the server if I were to send it random data.
Authentication is provided by the GCM mode of AES. As I decrypt (and thereby verify) early, I can assume I'm working on trustworthy payloads. GCM is also non-malleable, unlike, for example, CBC or CTR.
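A minimal Go sketch of "decrypt and verify early", assuming a nonce||ciphertext||tag message layout and a hypothetical JSON `payload` type (kvass's actual schema may differ). Tampered bytes are rejected by `gcm.Open` before they ever reach `json.Unmarshal`. As discussed earlier in the thread, this check does not catch a verbatim replay.

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
	"encoding/json"
	"fmt"
)

// payload is a hypothetical request body.
type payload struct {
	Key   string `json:"key"`
	Value string `json:"value"`
}

func newGCM(key []byte) (cipher.AEAD, error) {
	block, err := aes.NewCipher(key)
	if err != nil {
		return nil, err
	}
	return cipher.NewGCM(block)
}

// decodeRequest authenticates and decrypts msg first and only then unmarshals it,
// so forged or corrupted bytes never reach json.Unmarshal.
func decodeRequest(gcm cipher.AEAD, msg []byte) (payload, error) {
	var p payload
	ns := gcm.NonceSize()
	if len(msg) < ns+gcm.Overhead() {
		return p, fmt.Errorf("message too short")
	}
	plain, err := gcm.Open(nil, msg[:ns], msg[ns:], nil)
	if err != nil {
		return p, fmt.Errorf("authentication failed: %w", err)
	}
	if err := json.Unmarshal(plain, &p); err != nil {
		return p, fmt.Errorf("invalid JSON: %w", err)
	}
	return p, nil
}

func main() {
	key := make([]byte, 32)
	if _, err := rand.Read(key); err != nil {
		panic(err)
	}
	gcm, err := newGCM(key)
	if err != nil {
		panic(err)
	}
	nonce := make([]byte, gcm.NonceSize())
	if _, err := rand.Read(nonce); err != nil {
		panic(err)
	}
	msg := gcm.Seal(nonce, nonce, []byte(`{"key":"Foo","value":"Bar"}`), nil)

	p, err := decodeRequest(gcm, msg)
	fmt.Println(p.Key, p.Value, err) // Foo Bar <nil>

	msg[len(msg)-1] ^= 1 // flip one bit of the authentication tag
	_, err = decodeRequest(gcm, msg)
	fmt.Println(err != nil) // true
}
```

The length check before `gcm.Open` keeps slicing safe on short or random input, so garbage requests produce an error instead of a panic.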
As suggested by losfair, I'll switch to PSK TLS as soon as it's available or just put HTTPS in front of the end-points. But that's not high-priority right now.
I just use WinSCP with remote file encryption turned on and have VeraCrypt for the local temp storage.
That way my entire working file system is encrypted at rest, in transit, and while stored remotely - entirely with heavily mature off the shelf open source tools.
This seems unnecessarily snarky. You can make anything sound silly by reducing its functionality to the most basic level possible, ignoring all aspects of ergonomics and packaging. And you could make this comment about any storage engine. Like the infamous Dropbox comment here.
And syncing between file systems across a network is hard. (Before you say it's easy you can just do X, Y, and Z... remember that infamous Dropbox comment.)
It was easy to share public links to values hosted on the file system in 1995 with Apache. It remains easy today with Nginx and other web servers.
Syncing filesystems across networks with rsync has worked well for years.
If you are considering a personal key value store, you are probably already familiar with web servers and rsync. If not, they are two general purpose tools which are likely to be useful for other projects as well.
I was absent the day of the infamous Dropbox comment.
You're just parroting the original comment, which was proven to be so, so wrong in practice. Most people aren't able to / don't want to duct-tape random systems together like this.
I could snarkily ask you what's the point of Nginx? Why not just run a dial-in BBS? Don’t you have the skills to do that? Why do you need this fancy Nginx and why did anyone bother writing it? That’s what you sound like.
There's value in building something that is integrated.
I read it. I'm pretty clear on what it does. I'm still not feeling the why (or the differentiator from other things that store files and give you URLs).
Remember when Dropbox explained itself by telling you you didn't need to carry around USB sticks in your jean pockets that get washed or lost? I thought that was pretty neat.
A distributed file system seems like way more work to set up.
Also, not everything has to follow the Unix philosophy. Plenty of very useful things are better off less Unix-y, e.g. ffmpeg. But this doesn’t seem to do a bad job: it’s a very dedicated tool to do one thing, it just doesn’t store everything as files.
The gist is that the original YC debut of Dropbox had a comment that described a pretty technical way to get the same functionality as Dropbox. It's commonly referenced when folks on Hacker News dismiss a product because they could do the same with 10 Unix commands, not realizing they might not be the target customer. Interestingly, I think this situation is the exact opposite: Kvass seems to be more complicated for a non-technical user than the file-system approach the top-level comment suggested.
> What's wrong with Dropbox comment? ... IMO Dropbox is useless.
That's just repeating the original ignorant Dropbox comment. Over 15 million paying users don't think Dropbox is useless. And hundreds of millions of non-paying users don't either.
There are literally hundreds of distributed networked KV stores used by software developers for all sorts of projects. Showing how to store “hello world” seems like a pretty good intro.
Why can’t people see a use case for this? It maybe doesn’t compare as unique against the hundred other KV stores but it’s also a toy project and a KV store seems to have an obvious use?
Personally, I’m going to try this out since I was actually looking for a similar KV store. Only because I was looking and HN presented it to me, tbh.
My use case is that I have a few Raspberry Pis at home (aka low-powered) that I wanted to have a distributed config on. I wanted something easy to manipulate from the command line that was lightweight (e.g. not Redis or Consul or a password manager). Since it’s for LAN use (or actually Tailscale), the security wasn’t really important.
Heh. Snarky but true. I store just about everything in a “notes” folder which is mostly markdown files. Easily searchable / editable with any tool you like.
The picture in README.MD really shows that the author is aware of kvass (the drink). This repo actually made me google that wiki page to answer the question: "Is this drink really called kvass elsewhere, not only in my country?" Yes, it seems it is.
I expected to learn something from it - especially when it's popped on the front page of HN. Am I expecting too much of HN?
My train of thought when I saw the link "a key-value store": what data-structure are they using? A hashmap? How are they resolving conflicts? Is it in memory? How are they persisting data? Do they support multiple instances? What about concurrency? etc.
Of course I might be a bit disappointed when the project is just 4 web APIs on top of a SQLite table.
If it makes no sense, ask a question or ignore it if it's not interesting. But you can't just one-liner shit on other people's Show HN, whatever you expect.
It's not HN job to entertain you. Someone wrote some software they thought was useful, so they are sharing it. If you don't like it down vote and move on. Get off your high horse.
That's a bit harsh, while I don't agree with OP I don't think that your language helps in promoting a healthy discussion about what is reasonable to expect from a HN front-page post.
Also you say that "it's not HN job to entertain you". What is HN's job then? Because honestly I do come here to read things that entertain me.
It doesn't say 'news' and the fact that HN is not about 'news' is in the first sentence of the guidelines beside being repeated in moderator comments endlessly for more than a decade. If you thought HN was about news because it says 'news' in the URL, you're mistaken.
That's interesting but you're better off going by the actual guidelines of the actual site and the many times they've been restated and commented upon by the actual moderators. Many relevant ones here
Sorry, but you don't get to redefine the word "news". The site is literally called "Hacker News", so by definition it's a news site. Granted, it's news specific to computing and technology, but news just the same. Unless everyone in the world got together behind my back and changed the entire English language, it's a news site.
Fair enough. I did share some of those expectations too tbh and it was indeed surprising that the solution was that simple. But to be fair it does not promise any of that and it does what it sets out to.
What do you mean? It's a project. It has a purpose and it achieves that purpose. If you don't need a lot of code to achieve it, what's the problem? What makes it "toy"?
You could call yeast kvas, but in Slavic languages other nouns are usually used (drozdze, drozdie, kvasok, kvasnice, закваска, дрожжи). Kvas (the drink) is kvas everywhere.