Hacker News
Show HN: Kvass, a personal key-value store (github.com/maxmunzel)
227 points by maxmunzel on July 24, 2022 | 116 comments


Nice project!

I'm wondering why you chose to implement your own cryptography routines instead of using something standard like TLS. Apparently your `DecryptData` and `Encrypt` methods are vulnerable to replay attacks due to a lack of (EC)DH-style key exchange.


Thanks for the critique! I wanted to use symmetric crypto as it's trivial to use without domains and certificates. The possibility of replays is a non-issue, as the key-value store is implemented as a CRDT and therefore all operations are idempotent.

On the other hand, I didn't anticipate replay attacks in the design and thanks to your comment, I'll keep them in mind should I ever find myself in a scenario where they are undesirable...


TLS is available in pre-shared key (PSK) mode. Looks like there is ongoing work to add TLS-PSK to Go's standard library: https://github.com/golang/go/issues/6379#issuecomment-117006...


Cool! If this gets implemented, I'll definitely use it instead of raw AES.


Yeaaah don’t use ECB.

AEAD or gtfo


It doesn't matter if the operations are idempotent. The point is that an eavesdropper can replay a message that sets a key, for example, overwriting whatever was there previously.

It would be better to use an established cryptography system. You could do self-signed certs with TLS, like Syncthing does. Or just use SSH.


If the CRDT part is done correctly, then replaying a message that sets a key will not change anything, ever.

If the message is:

Key: Foo

Reference CRDT node ID: 7654321 (the last node that the client knows of that updated the value of ‘Foo’)

Operation: Update

Value: Bar

The ID of this new node: 1122112211

(Omitted for simplicity: Timestamps, hashes, …)

Replaying that message won’t do anything if the target already knows about the existence of that new node.

If the target didn’t know about the node, then I guess you’re helping them sync their own data? Maybe they owe you a thanks? If you knew what each encrypted message contained, you might be able to do some split-state shenanigans; for example: replay the message that sets a “PasswordAuthEnabled” key to “Yes” but deliberately omit the message that changes the “Password” key from its default of “password” to a genuine password. It’s very hard to imagine an actual situation like this occurring, but I guess that’s what makes crypto (and designing secure systems in general) so damn tricky. That and the math. And end users. And…
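To make the replay argument concrete, here is a minimal sketch of an append-only store in which every update carries its own node ID. The class and field names are hypothetical and are not taken from kvass; the only point is that applying an already-known node a second time leaves the state unchanged.

```python
# Hypothetical append-only store: every update is a node with a unique ID.
# This is an illustration of the argument above, not kvass's actual code.
class AppendOnlyStore:
    def __init__(self):
        self.nodes = {}  # node_id -> (key, parent_id, value)

    def apply(self, node_id, key, parent_id, value):
        """Apply an update message; return True only if it added a new node."""
        if node_id in self.nodes:
            return False  # already known: a replay changes nothing
        self.nodes[node_id] = (key, parent_id, value)
        return True


store = AppendOnlyStore()
msg = dict(node_id=1122112211, key="Foo", parent_id=7654321, value="Bar")
first = store.apply(**msg)     # the original delivery adds the node
replayed = store.apply(**msg)  # the replayed copy is ignored
```

The split-state scenario described above is not covered by this check: a replay that delivers one node while another is withheld still adds a genuinely new node on the target.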


I see, thanks. I was focusing on the "idempotent" part, but yeah, a CRDT would protect against replays. It's still not a great design though: it still opens you up to issues in case not all messages are part of the CRDT, or you have a buggy CRDT implementation.


It's a shame that the meaning of 'idempotent' has gotten watered down by half-assed implementations. The original NFS paper from Sun [0] claims that write operations are idempotent, but they aren't really. Not if another operation has occurred. Like in:

  write '1' @ 0
  write '2' @ 0
  write '1' @ 0 (replayed through a duplicated packet)
the duplicated write RPC reverts the second write. Duplicated link and rename RPCs are even worse. They added a replay detection cache in the server later to prevent some common error cases, but it fails if the server reboots in the middle.
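The three-step sequence above can be replayed against a toy in-memory "disk". This is only a sketch of the ordering problem, with a hypothetical `write` helper standing in for the write RPC; it does not model NFS itself.

```python
# Toy "disk": a bytearray that the server overwrites at a given offset.
def write(disk, offset, data):
    disk[offset:offset + len(data)] = data


disk = bytearray(b"0")
write(disk, 0, b"1")
write(disk, 0, b"2")
state_before_replay = bytes(disk)  # the second write is in effect

write(disk, 0, b"1")               # duplicated packet delivered late
state_after_replay = bytes(disk)   # the second write has been reverted
```

Repeating a write back to back is harmless, but once another write has landed in between, the replay silently undoes it.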

Anyway, CRDT correctness is hard enough that I'd be reluctant to trust it against an adversary who can inject replays.

[0] https://citeseerx.ist.psu.edu/viewdoc/download;jsessionid=75...


You stole my idea! I love it. As a dev who spends a big chunk of their day in the shell, this is the kind of tool that I was destined to create myself, but never did thanks to lack of time, laziness, life, etc.


I’m wondering what sorts of use-cases people would use a personal key-value store for. Maybe it’s just a useful foundation for building other tools on top of, like a password manager.


The primary use case is for shuffling around files or clipboards between different computers. I also regularly use the url-sharing capability.

Prior, I had to deal with ephemeral http servers, which I didn't like from an ergonomic perspective.

Ergonomically, I find redis nice. The problem is that it is in-memory and that encryption is cumbersome. Also, kvass can be used offline, as the kv-store is implemented as a CRDT.


For passwords specifically there's a similar tool, https://www.passwordstore.org/ - but it stores GPG-encrypted plain text files versioned with git, instead of managing a sqlite db

More importantly, it has Firefox and Chrome extensions for auto-filling passwords on the web https://github.com/passff/passff https://github.com/browserpass/browserpass-extension


I use a KV, Hashi vault, so my shell scripts get api keys, secrets, etc and they’re not stored plaintext or in SCM.


I use a similar setup to store code snippets (certain Java annotations for integration/unit tests, various things like that), vehicle license plate/vins, internal (but nonsensitive) ids for test accounts, tons of things like that.

Honestly a password manager would probably be technically better—or a bunch of flat files lol—but there was a certain charm to having it displayed / function exactly as I like it, and lightning quick with nothing I didn’t need.

IDE would be another natural place for a lot of my usages, but I kept finding I needed to leave it in a pull request review or slack conversation or similar, not necessarily programming myself.


I use skate to store secrets used by some personal programs. I have scripts that pull out the secrets and set them as environment variables that are used by the programs. This way I don't have them sitting around in a configuration file in the source directory and can't accidentally commit them to git but they're easy to sync between computers.


I already use my password manager for the problem this tool is trying to solve.


But it's just a wrapper around SQLite. Skip the middleman and just use SQLite.


Or don't skip the middleman and get a simple k/v interface instead of having to deal with a whole sqlite database.


It's clearly not "just a wrapper around SQLite", read through the README and it'll be evident why.


But you can’t access Sqlite over the web.


...and you shouldn't.


Seems to me that for a personal tool like this, sqlite3 is non-problematic.

https://www.sqlite.org/whentouse.html

"Generally speaking, any site that gets fewer than 100K hits/day should work fine with SQLite. The 100K hits/day figure is a conservative estimate, not a hard upper bound. SQLite has been demonstrated to work with 10 times that amount of traffic."


Any concrete reasons? SQLite is probably good enough for 99% of websites / apps.


It's not designed to be straight exposed as a web service.

It's not hardened to handle malicious traffic.


In case anyone is wondering about the name, it's a Slavic fermented bread drink that's much less alcoholic than beer (and commercially canned versions are near zero alcohol). It's one of my favorite chilled summer drinks, and you should be able to find it in Slavic stores in the US as well.


There is a cold cucumber soup that uses kvass as its base. I would recommend giving that a try as well.


Okroshka? Yeah I love that stuff. I'm vegetarian, replacing the sausage with a vegetarian one (or leaving it out) works well. I followed this recipe:

https://www.youtube.com/watch?v=ifE7gDiLDbE

The Life of Boris also has a great video on making Kvass:

https://www.youtube.com/watch?v=k1UTJKBMvgc

though I haven't gotten around to trying it, I've only had commercial bottled and canned ones. I imagine if you make it yourself you'll have a slightly more alcoholic outcome.


What does it do better than Skate? Or what additional things does it do, url and qr codes?


I think it's fair to say that skate is the more mature tool. Kvass on the other hand is more focused and simpler.

Especially self-hosting kvass is even simpler than skate, and I had issues linking/syncing skate in the past.

It would probably be a nice weekend project to port the url/qr features to skate.


Cool. Curious why you chose sqlite instead of something like badger [https://github.com/dgraph-io/badger] given you expose it as a key value database, which badger is.


SQLite allows me to keep multiple versions of the same entry, which is convenient for state merging. Half the sync logic is actually implemented in SQL. Other than that, I’m already familiar with it and the storage backend is not very performance critical for the intended use case.


Cool project! Congrats on launching! What is the benefit compared to Redis or CF Workers KV?


Mainly self-hosting and generating shareable URLs. If your keys end in ".html", the MIME type is even set accordingly and you can use it for toy websites ;)
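As a rough illustration of choosing a MIME type from the key's suffix, here is a sketch using Python's standard `mimetypes` module. The function name `content_type_for` and the fallback type are assumptions made for this example; kvass itself may decide this differently.

```python
import mimetypes


def content_type_for(key: str) -> str:
    """Guess a Content-Type from the key's extension, with a generic fallback."""
    guessed, _encoding = mimetypes.guess_type(key)
    # Keys without a recognizable extension fall back to opaque bytes.
    return guessed or "application/octet-stream"


html_type = content_type_for("index.html")
plain_key_type = content_type_for("notes")
```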

This is by no means meant to replace the backend of your app. It's more of an alternative to usb-sticks and google drive.


I like the idea. What surprised me was the custom network protocol. I expected it using ssh to work with the remote instance.


The built-in server and remote support is pretty nice! API seems solid, and I dig the QR codes.


QR codes are pretty cool; it's just a shame they never really took off.


What do you mean it didn't take off? QR code detection is implemented in native iOS camera and IIRC most android implementations too. Almost everyone can use it.

In that sense it took off more than bitcoin.


That is true, it's just not as widely used as I would have hoped.


I feel like that has changed over the past few years. Many restaurants in my area started using them for menus, and I recently saw them used to set up Wi-Fi while on vacation.


Maybe it's just where I live?

The only places I see QR codes are on my phone to share the Wi-Fi password, on products to scan for competitions, and from time to time on advertising at bus stops.


Very popular in most of South and Southeast Asia.


I just paid my lunch tab by scanning a QR code on a receipt, and then tapping Apple Pay. It was rad.


In my country (BR), a transfer method (Pix) that can be initiated with a QR code has really picked up. I'm surprised a simple "scam", replacing the printed QR codes that are glued to restaurant tables, hasn't caught on yet.


i just have a directory in git and store everything in files

can anyone help explain what i'd use this for?


I'm late to reply here but I only now got around to setting Kvass up and testing it out.

I got it running on a free GCP Compute VM and linked it through to my PC so that the VM hosts the Kvass server and my PC (and in future laptop) set/get stuff on there.

I plan on using Kvass to pass things between my laptop and PC - links, files, images... etc. Will see how that goes - perhaps I don't end up using it at all.

If it seems useful I'll try hook my web domain in so that I have a more static domain to use it with.


What is the benefit compared to the private use of Redis? Redis is under BSD licence and continues to be very actively maintained and used.


Redis is in-memory, so it's prohibitive for big files. Also, kvass still works if it's disconnected from the server. This is important if you want to use it for config files.

On the other hand, using redis (/skate) for storing files was the inspiration for creating kvass.


I thought it’s a password store, also because of the name: “v” pronounced the Spanish way is “b”, so key-bass, which sounds like pass.


It's not Spanish, though, and in quite a few languages there's "w" instead of "v". https://en.wikipedia.org/wiki/Kvass


uh oh leave that drink alone mate.


now all that is missing is a FUSE driver


I don't know much about the other solutions that people are mentioning in the comments, but I have to say... this looks elegant! Great job!


I have so many questions about this. Much of the architecture seems off to me. I like the concept, but it doesn't seem as secure as it could be.

For the README, I'd hope to find a bit more information about the way data is stored and transmitted. For example, this seems to just be a SQLite database with values in fields? Is there a separate encryption key for the database itself? Otherwise anyone with access to the file would be able to see all data stored?

The encryption key is only used to encrypt data in transit, but not at rest? And then you're encrypting the full JSON blob instead of only the values? This seems risky to me.

What is the purpose of the ProcessID? It is randomly generated and stored in the database (thus used by all clients too). So, I'm not sure what this is for? I see it's used to resolve conflicts, but these should probably be given out by the server?

Do the clients cache data locally? It looks like you're basically syncing from the server for every request. You're already making a round trip to the server for a request anyway, so why not keep state only on the server? I can understand an offline-only mode, but this would require a significantly more robust sync mechanism. If this was the goal, I'd love to see this discussed more in the README too.

Finally, I don't understand why you're using plain HTTP (no TLS) for communication b/w client and server. I didn't see any authn/authz in the requests. You're also unmarshalling random data from the request w/o confirming that it is valid first. This seems risky to me and could potentially crash the server if I were to send it random data.

This would have been a great use-case for a simple (non-HTTP/JSON) TCP server:

    >>> AUTHTOKEN xxx
    >>> SET $KEY $LEN $SHA1
    >>> <bytes>
    <<< OK

    >>> AUTHTOKEN xxx
    >>> GET $KEY
    <<< $LEN $SHA1
    <<< <bytes>
Custom protocols have their own security issues, but it can also be easier to see where there are potential issues (like unmarshalling unvalidated blobs). If you wrap something like the above in TLS-PSK, you're set. If you want to use encryption for a session (after you authenticate), that's possible too, but you're at risk of effectively re-creating TLS.
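The `SET $KEY $LEN $SHA1` framing above can be sketched as a pair of encode/decode helpers that validate the header before trusting the body. This is an illustration only: the function names are hypothetical, the header is assumed to be a single newline-terminated line, and keys are assumed to contain no spaces or newlines.

```python
import hashlib


def encode_set(key: str, payload: bytes) -> bytes:
    """Build a 'SET <key> <len> <sha1>' header line followed by the raw bytes."""
    digest = hashlib.sha1(payload).hexdigest()
    header = f"SET {key} {len(payload)} {digest}\n".encode()
    return header + payload


def decode_set(frame: bytes):
    """Parse and validate a SET frame; raise ValueError on any mismatch."""
    header, _, body = frame.partition(b"\n")
    parts = header.decode().split(" ")
    if len(parts) != 4 or parts[0] != "SET":
        raise ValueError("malformed header")
    _, key, length, digest = parts
    if not length.isdigit() or int(length) != len(body):
        raise ValueError("length mismatch")
    if hashlib.sha1(body).hexdigest() != digest:
        raise ValueError("checksum mismatch")
    return key, body


frame = encode_set("greeting", b"hello")
key, body = decode_set(frame)
```

The SHA-1 here only catches accidental corruption. An attacker who can modify the frame can also recompute the digest, which is why the transport would still need something like TLS-PSK around it.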


Hi mbreeze!

> this seems to just be a SQLite database with values in fields?

SQLite is used as a storage format ("SQLite competes with fopen()"). The key-value pairs are stored as a modified append-only CRDT. The LUB operation (to merge two states while syncing) is implemented here: https://github.com/maxmunzel/kvass/blob/e32fdabdc86b039f716c...

> anyone with access to the file would be able to see all data stored?

Yes, attackers with access to your fs are not part of my attacker model. I rely on disk encryption for that matter.

> Do the clients cache data locally? It looks like you're basically syncing from the server for every request. You're already making a round trip to the server for a request anyway, so why not keep state only on the server? I can understand an offline-only mode, but this would require a significantly more robust sync mechanism. If this was the goal, I'd love to see this discussed more in the README too.

The sync mechanism is actually pretty solid, as it's based on CRDTs. One of the applications of kvass is central management of config files, so automatic syncing and offline fallback are important.

> What is the purpose of the ProcessID?

The Counter variable is a rudimentary implementation of Lamport clocks. To get a total order from Lamport clocks, you need ordered, distinct process ids. The process ids don't really need to mean anything, and the Lamport clock is itself just a fallback for the case that the wall-clock timestamps collide (see the Max() function), so it's practical to just draw them randomly.
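A minimal sketch of that tie-breaking order, assuming entries are compared first by wall-clock timestamp, then by Lamport counter, then by process id. The `Entry` type and `newer` function are hypothetical names for this illustration and are not kvass's actual `Max()` implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    timestamp: int   # wall-clock time, e.g. Unix seconds
    counter: int     # Lamport counter, consulted only if timestamps collide
    process_id: int  # randomly drawn; breaks any remaining tie
    value: str


def newer(a: Entry, b: Entry) -> Entry:
    """Return the entry that wins under the (timestamp, counter, process_id) order."""
    key_a = (a.timestamp, a.counter, a.process_id)
    key_b = (b.timestamp, b.counter, b.process_id)
    return a if key_a >= key_b else b


# Same timestamp and counter: the process id decides, on every replica alike.
a = Entry(1658650000, 3, 42, "from laptop")
b = Entry(1658650000, 3, 97, "from desktop")
winner = newer(a, b)
```

Because the comparison is a total order as long as process ids are distinct, both replicas pick the same winner regardless of the order in which they see the two entries.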

> I didn't see any authn/authz in the requests. You're also unmarshalling random data from the request w/o confirming that it is valid first. This seems risky to me and could potentially crash the server if I were to send it random data.

Authentication is provided by the GCM mode of AES. As I decrypt (and thereby verify) early, I can assume I'm working on trustworthy payloads. GCM is also non-malleable, unlike for example CBC or CTR.

As suggested by losfair, I'll switch to PSK TLS as soon as it's available or just put HTTPS in front of the end-points. But that's not high-priority right now.


I just use WinSCP with remote file encryption turned on and have VeraCrypt for the local temp storage.

That way my entire working file system is encrypted at rest, in transit, and while stored remotely - entirely with heavily mature off the shelf open source tools.


Can you also drink it?


Hack Mama


For personal use, I’ve had good luck storing things in files. Then when I need those things, I read the files.


This seems unnecessarily snarky. You can make anything sound silly by reducing its functionality to the most basic level possible, ignoring all aspects of ergonomics and packaging. And you could make this comment about any storage engine. Like the infamous Dropbox comment here.


Fair. The project could do a better job of explaining what benefit it has over the file system API.


For example sharing a public link to a value.

And syncing between file systems across a network is hard. (Before you say it's easy you can just do X, Y, and Z... remember that infamous Dropbox comment.)


It was easy to share public links to values hosted on the file system in 1995 with Apache. It remains easy today with Nginx and other web servers.

Syncing filesystems across networks with rsync has worked well for years.

If you are considering a personal key value store, you are probably already familiar with web servers and rsync. If not, they are two general purpose tools which are likely to be useful for other projects as well.

I was absent the day of the infamous Dropbox comment.


> It remains easy

You're just parroting the original comment, which was proven to be so, so wrong in practice. Most people aren't able to / don't want to duct-tape random systems together like this.

I could snarkily ask you what's the point of Nginx? Why not just run a dial-in BBS? Don’t you have the skills to do that? Why do you need this fancy Nginx and why did anyone bother writing it? That’s what you sound like.

There's value in building something that is integrated.


Mostly the "remote" command as seen in the README.


Dropbox explained itself pretty well.

A simple one-paragraph "why" at the top of this project's README wouldn't be amiss.


For example

> Its trivial to set up and operate kvass across multiple devices

> remember the file we stored earlier? Let's get a shareable url for it!


I read it. I'm pretty clear on what it does. I'm still not feeling the why (or the differentiator from other things that store files and give you URLs).

Remember when Dropbox explained itself by telling you you didn't need to carry around USB sticks in your jeans pockets that get washed or lost? I thought that was pretty neat.


> Its trivial to set up and operate kvass across multiple devices

Still, using a distributed file system is so much better, as its API is supported by basically everything else (including Dropbox!).

I feel that a key-value store goes against the Unix philosophy and is solving an imaginary problem.


A distributed file system seems like way more work to set up.

Also not everything has to follow the Unix philosophy. Plenty of very useful things are better off less Unix-y eg ffmpeg. But this doesn’t seem to do a bad job - it’s a very dedicated tool to do one thing, it just doesn’t store everything as files.


Out of curiosity, what is the infamous Dropbox comment?


https://news.ycombinator.com/item?id=9224

The gist is that the original YC debut of Dropbox had a comment that described a pretty technical way to get the same functionality as Dropbox. It's commonly referenced when folks on Hacker News dismiss a product because they can do the same with 10 Unix commands, not realizing they might not be the target customer. Interestingly, I think this situation is the exact opposite: Kvass seems to be more complicated for a non-technical user than the file system the top-level comment suggested.



What's wrong with Dropbox comment? I still didn't find any use for this service, but rsync works for me almost every day. IMO Dropbox is useless.


> What's wrong with Dropbox comment? ... IMO Dropbox is useless.

That's just repeating the original ignorant Dropbox comment. Over 15 million paying users don't think Dropbox is useless. And hundreds of millions of non-paying users don't either.


Many people do find it useful, and the people who created it have become very wealthy.

I'm in the same boat as you, but there are more kinds of people and situations in the world than just us.


yc and Dropbox realised that people would like to pay for it. in the same logic, a toaster is useless, I can always just heat bread in a pan no?


a browser is useless, you can always send a request through curl and read the html.


Yeah but is your filesystem endorsed by a fun & quirky children's cartoon beaver? Can it do QR codes? Didn't think so.


To your point, the very first examples don’t really demonstrate much value, even if they are the most basic examples of how it works.

It’s a bit like selling a car by showing all the different things you can hold in the cup holders.


There are literally hundreds of distributed networked KV stores used by software developers for all sorts of projects. Showing how to store “hello world” seems like a pretty good intro.

Why can’t people see a use case for this? It maybe doesn’t compare as unique against the hundred other KV stores but it’s also a toy project and a KV store seems to have an obvious use?

Personally, I’m going to try this out since I was actually looking for a similar KV store. Only because I was looking and HN presented it to me tbh.

My use case is that I have a few Raspberry Pis at home (aka low powered) that I wanted to have a distributed config on. I wanted something easy to manipulate with a command line that was lightweight (eg not redis or consul or a password manager). Since it’s for LAN use (or actually Tailscale) the security wasn’t really important.


Heh. Snarky but true. I store just about everything in a “notes” folder which is mostly markdown files. Easily searchable / editable with any tool you like.


This is spot on. Filesystems are more powerful, fast and scalable than people think.


What's the use case for this (besides being a nice learning project)?

I didn't see this on the readme.


I hope this feeling is me catching onto the joke in the name rather than being a first responder


The name can be read as an acronym of ‘Key-Value ASSociative store’, but also alludes to the beverage: https://en.wikipedia.org/wiki/Kvass


The picture in README.MD really tells you that the author is aware of kvass (the drink). This repo actually made me google that wiki page to get an answer: "Is this drink really called kvass elsewhere, not only in my country?". Yes, it seems it is.


the original, personal key value store


Somewhat unrelated: Can one buy kvass starter in the United States, and if so, what is it called?

I'm not interested in bottled kvass, it never tastes like the real thing and you don't get to watch kvass explosions in the bottle as it is being made


Kvass starter? I typically just use yeast, old rye bread, raisins, and sugar. Maybe added lemon for taste at the end.

This recipe is similar to how I make mine (in Russian): https://www.gastronom.ru/recipe/55100/domashnij-kvas-iz-hleb...


TIL.. When I saw kvass being made I was too young to know how it works.


You don't need a starter? You can just make one trivially.

There's a pretty amusing "Life of Boris" video that shows how on YT.


how about:

  echo "value" > "${HOME}/.db/key"
  cat "${HOME}/.db/key" > value
  scp -r ...


[flagged]


Isn't that the whole point of "Show HN"? Or what do you expect when you click on a "Show HN" story?


I expected to learn something from it - especially when it's popped on the front page of HN. Am I expecting too much of HN?

My train of thought when I saw the link "a key-value store": what data-structure are they using? A hashmap? How are they resolving conflicts? Is it in memory? How are they persisting data? Do they support multiple instances? What about concurrency? etc.

Of course I might be a bit disappointed when the project is just 4 web APIs on top of a sqlite table.


Take a look at https://news.ycombinator.com/showhn.html

If it makes no sense, ask a question or ignore it if it's not interesting. But you can't just one-liner shit on other people's Show HN, whatever you expect.


It's not HN job to entertain you. Someone wrote some software they thought was useful, so they are sharing it. If you don't like it down vote and move on. Get off your high horse.


That's a bit harsh, while I don't agree with OP I don't think that your language helps in promoting a healthy discussion about what is reasonable to expect from a HN front-page post.

Also you say that "it's not HN job to entertain you". What is HN's job then? Because honestly I do come here to read things that entertain me.


To be news. I can understand if you missed that, it's not very obvious from visiting the home page.


That is very much not the purpose of HN, it's in

https://news.ycombinator.com/newsguidelines.html


not sure what you were trying to prove with that, but it seems you proved yourself wrong. The link says "news", not "entertainment"


It doesn't say 'news' and the fact that HN is not about 'news' is in the first sentence of the guidelines beside being repeated in moderator comments endlessly for more than a decade. If you thought HN was about news because it says 'news' in the URL, you're mistaken.


> Type of site

> News aggregator

https://wikipedia.org/wiki/Hacker_News


That's interesting but you're better off going by the actual guidelines of the actual site and the many times they've been restated and commented upon by the actual moderators. Many relevant ones here

https://hn.algolia.com/?dateRange=all&page=0&prefix=true&que...

or here

https://hn.algolia.com/?dateRange=all&page=0&prefix=true&que...


sorry but you don't get to redefine the word "news". The site is literally called "hacker news". so by definition, it's a news site. granted, it's news specific to computing and technology, but news just the same. unless everyone in the world got together behind my back, and changed the entire english language, then it's a news site.


It can be a perplexing site, sort of like meeting a kitten named Tiger and yet living to tell the tale.


Fair enough. I did share some of those expectations too tbh and it was indeed surprising that the solution was that simple. But to be fair it does not promise any of that and it does what it sets out to.


Wait, this is just a toy project.


What do you mean? It's a project. It has a purpose and it achieves that purpose. If you don't need a lot of code to achieve it, what's the problem? What makes it "toy"?


Not trolling or trying to downplay anybody here, but honestly - how is “kvass” (read as “k-v-ass” given it is a “key-value” storage) a good name?..


About as good a name as the password manager which I'm unable to read in any other way than keep ass.

(Just in case you are unaware, kvas/kvass is a traditional north-eastern European drink.)


> the password manager which I'm unable to read in any other way than keep ass

I've never read it this way but now I can't unsee it.


Kvas means yeast, could be also a drink name.


Since this is on top of the readme they refer to the drink: https://user-images.githubusercontent.com/5411096/179968508-...

You could call yeast kvas, but in Slavic languages there are usually other nouns used (drozdze, drozdie, kvasok, kvasnice, закваска, дрожжи). Kvas (the drink) is kvas everywhere.



In Norwegian kvass means sharp.



