Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

The leveldb solution is quite straightforward (running in ipython3 for easy benchmarking with timeit):

  import leveldb,hashlib
  db = leveldb.LevelDB('db')
  def lookup(pw):
    m = hashlib.sha1()
    m.update(pw)
    try: # the leveldb api is a bit minimalist.. 
      db.Get(m.digest())
      return True
    except:
      return False
  %timeit lookup(b'password')
Creating the database was just calling db.Put(x) for every entry.

The numpy solution is also quite easy (db3 is essentially your ..|cut|xxd .. file I think:

  import numpy as np
  db = np.memmap('db3', dtype=np.dtype('S20'), mode='r')
  def lookup(pw):
    m = hashlib.sha1()
    m.update(pw)
    idx = np.searchsorted(db, m.digest())
    return db[idx] == m.digest()
  %timeit lookup(b'password')
And I was running this on a system where not even db3 (which is just the binary hashes concatenated) would fit into my RAM, so for "real world" applications the results would be much worse since it's more limited by the disk IO than what's running on top. I just reran both and got 3us for the leveldb solution and 13us for the binary search/numpy solution.


Cool. What cpu/ram did you make your timings on?

I think the author of the okon benchmarks was using a i7-6700 CPU @ 3.40GHz, 16GB RAM

How long did the import take you?




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: