> Not sure I actually understand why K is pushed here almost daily. Is it just the esoterics? Alright, APL and K are expressive for array computation. Do many people here deal with array computation?
People like k for different reasons. For me, it's the right blend of fast to develop and fast enough to run. Someone recently made a database engine called okon, designed to query HIBP; judging by their github, it took 20+ days of C++ to build. They got lookups under 49µsec, so they pat themselves on the back for it being so good.
But my k(q) solution was written in five minutes and runs in under 5µsec -- almost 10x faster. I used a binary search instead of a b-tree (an approach the okon author considered before going the b-tree route instead), but it's otherwise the same basic strategy.
So for me, the decision is, 5 minutes, or 20 days, and that's a no-brainer.
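For anyone who doesn't read k, a rough Python sketch of that strategy (a flat file of sorted, fixed-width SHA-1 digests plus an ordinary binary search) looks something like this; the file name and layout are assumptions for illustration, not the actual k code:

  # Sketch only: binary search over a memory-mapped file of sorted 20-byte SHA-1 digests.
  # 'hashes.bin' (raw digests concatenated in sorted order) is an assumed layout.
  import hashlib, mmap

  RECORD = 20                       # raw SHA-1 digest width in bytes
  with open('hashes.bin', 'rb') as f:
      data = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
  n = len(data) // RECORD

  def lookup(pw):
      target = hashlib.sha1(pw).digest()
      lo, hi = 0, n
      while lo < hi:                # ordinary binary search over fixed-width records
          mid = (lo + hi) // 2
          if data[mid*RECORD:(mid+1)*RECORD] < target:
              lo = mid + 1
          else:
              hi = mid
      return lo < n and data[lo*RECORD:(lo+1)*RECORD] == target

The whole "database" here is just the memory-mapped file; a lookup touches only a handful of pages.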
> How does this free K from loading and writing data?
Your CPU has a shockingly small amount of directly-attached memory (around 64kb). That's been true since the 1970s, and is likely to be true for another fifty years.
This used to be called ram, but now we call anything the CPU can address directly "ram", and we call this small directly-attached memory "cache" (or L1 or whatever), because that's basically how most people use it.
Now, what your CPU does all day is look at a memory address, and if it's not in the cache, it sends a request to the memory controller and waits for that memory to show up (a simplification, but close enough). Those instructions show up and now your CPU can execute them; those instructions need more memory, so again the CPU sends a request to the memory controller and waits. All day long it alternates between these two wait states: we wait a bit for code, we wait a bit for data, and then we repeat.
Because K is small (the interpreter, your application source code, and all your state variables fit into that 64kb), roughly half of that waiting goes away: you stop waiting on code and only wait on data.
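You can get a feel for how expensive those waits are with a crude experiment: sum a large array read sequentially (the prefetcher hides most of the latency), then sum it gathered in a random order (nearly every access stalls on main memory). This is only an illustration of the principle, in Python/numpy, not a K benchmark; the sizes are arbitrary.

  # Crude illustration of memory-wait cost (not a rigorous benchmark).
  import time
  import numpy as np

  a = np.arange(50_000_000, dtype=np.int64)   # ~400 MB, far larger than any cache
  idx = np.random.permutation(len(a))         # a random visiting order

  t0 = time.perf_counter(); a.sum();      t1 = time.perf_counter()
  # the gather a[idx] also allocates a copy, but the dominant cost is the random reads
  t2 = time.perf_counter(); a[idx].sum(); t3 = time.perf_counter()
  print("sequential: %.3fs   random order: %.3fs" % (t1 - t0, t3 - t2))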
> Is K in fact a thin layer on top of assembler (specifically SIMD and other extensions), for number crunching in the vein of shaders/CUDA?
No. It's a very simple interpreted language that (in most implementations) dispatches directly on the source code (or, in some cases, on an AST).
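To make "dispatches directly on the source code" concrete, here is a toy Python sketch: an evaluator that walks the source text and maps each character straight to a whole-array operation, applied right to left. It illustrates the dispatch style only; it is not how any real K implementation is written.

  # Toy character-dispatch evaluator (illustration only, not real K).
  # Two monadic verbs with K-like meanings: '|' is reverse, '#' is count.
  def run(src, x):
      verbs = {
          '|': lambda v: list(reversed(v)),   # reverse the whole vector
          '#': len,                           # count
      }
      val = x
      for ch in reversed(src):                # K evaluates right to left
          val = verbs[ch](val)
      return val

  run('#|', [1, 2, 3])   # reverse, then count -> 3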
> why those techniques aren't looted for open-source languages.
If you really understand the above, the answer to this question is obvious (hint: think about values, not features). Until then, until you really understand the question you're asking, you're not going to find the answer very useful.
> But my k(q) solution was written in five minutes and runs in under 5µsec -- almost 10x faster. I used a binary search instead of a b-tree (an approach the okon author considered before going the b-tree route instead), but it's otherwise the same basic strategy.
Out of curiosity I did the same in python (with numpy) simply memmap-ing the sorted binary hash list and doing a binary search (again using numpy) on that.
I got around 40us lookup time using that solution.
Not too bad for about 5 minutes of coding.
Also tried using leveldb as the datastore and got around 6us with that (and the solution is also only a couple of lines of code).
The leveldb solution is quite straightforward (running in ipython3 for easy benchmarking with timeit):
  import leveldb, hashlib
  db = leveldb.LevelDB('db')

  def lookup(pw):
      m = hashlib.sha1()
      m.update(pw)
      try:  # the leveldb api is a bit minimalist..
          db.Get(m.digest())   # raises if the key is absent
          return True
      except:
          return False

  %timeit lookup(b'password')
Creating the database was just calling db.Put(x) for every entry.
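For completeness, a sketch of that creation step; the input file name and the HIBP "HEXHASH:count" line format are assumptions about the source data:

  # Sketch: populate the leveldb store from the HIBP password hash list.
  # The file name and the "UPPERCASE-HEX-SHA1:count" line format are assumptions.
  import leveldb, binascii

  db = leveldb.LevelDB('db')
  with open('pwned-passwords-sha1-ordered-by-hash.txt') as f:
      for line in f:
          digest = binascii.unhexlify(line.split(':', 1)[0])  # 20 raw key bytes
          db.Put(digest, b'')                                 # value is unused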
The numpy solution is also quite easy (db3 is essentially your ..|cut|xxd .. file, I think):
  import hashlib
  import numpy as np

  db = np.memmap('db3', dtype=np.dtype('S20'), mode='r')

  def lookup(pw):
      m = hashlib.sha1()
      m.update(pw)
      idx = np.searchsorted(db, m.digest())           # binary search in the sorted digests
      return idx < len(db) and db[idx] == m.digest()  # bounds check in case idx == len(db)

  %timeit lookup(b'password')
And I was running this on a system where not even db3 (which is just the binary hashes concatenated) would fit into RAM, so for "real world" applications the results would be much worse: the lookups are limited more by disk IO than by whatever is running on top. I just reran both and got 3us for the leveldb solution and 13us for the binary search/numpy solution.