The leveldb solution is quite straightforward (running in ipython3 for easy benchmarking with timeit):
import hashlib
import leveldb

db = leveldb.LevelDB('db')

def lookup(pw):
    m = hashlib.sha1()
    m.update(pw)
    try:  # the leveldb api is a bit minimalist..
        db.Get(m.digest())  # raises KeyError when the key is missing
        return True
    except KeyError:
        return False

%timeit lookup(b'password')
Creating the database was just a matter of calling db.Put() once for every entry, with the binary hash as the key.
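A rough sketch of that loading step, for illustration only. The helper name load_hashes, the input format (one hex-encoded SHA-1 per line, optionally followed by ":count") and the empty value are all assumptions; only the Put(key, value) call comes from the leveldb binding used above.

```python
def load_hashes(db, hex_lines):
    """Store every hex-encoded SHA-1 in hex_lines as a binary key in db.

    db is anything with a Put(key, value) method, e.g. leveldb.LevelDB('db').
    Returns the number of entries written.
    """
    count = 0
    for line in hex_lines:
        # Assumption: the hash is the first ':'-separated field on the line.
        hex_hash = line.strip().split(":")[0]
        if not hex_hash:
            continue  # skip blank lines
        # Only the key matters for a membership test, so the value stays empty.
        db.Put(bytes.fromhex(hex_hash), b"")
        count += 1
    return count

# Hypothetical usage (the file name is made up):
#   db = leveldb.LevelDB('db')
#   with open('hashes.txt') as f:
#       load_hashes(db, f)
```

Storing the raw 20-byte digest rather than the hex string is what lets lookup() pass m.digest() straight to db.Get().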
The numpy solution is also quite easy (db3 is essentially your ..|cut|xxd .. file, I think):
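import hashlib
import numpy as np

db = np.memmap('db3', dtype=np.dtype('S20'), mode='r')

def lookup(pw):
    digest = hashlib.sha1(pw).digest()
    idx = np.searchsorted(db, digest)
    # idx == len(db) when the digest sorts after every entry; 'S20' items
    # come back without trailing NUL bytes, so strip them from the digest too
    return bool(idx < len(db) and db[idx] == digest.rstrip(b'\x00'))

%timeit lookup(b'password')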
I was running this on a system where not even db3 (which is just the binary hashes concatenated) would fit into my RAM, so for "real world" applications the results would be much worse, since the lookup is limited more by disk I/O than by whatever is running on top. I just reran both and got 3 µs for the leveldb solution and 13 µs for the binary search/numpy solution.
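To make the disk I/O point a bit more concrete, here is a minimal stdlib-only sketch of what the binary search does on a file of concatenated 20-byte digests. This is not the code I benchmarked; the function names and the toy file builder are made up for illustration, and it assumes the file is sorted and non-empty (mmap refuses to map an empty file).

```python
import hashlib
import mmap

RECORD = 20  # size of one raw SHA-1 digest in bytes

def write_sorted_digests(path, passwords):
    """Write the sorted raw SHA-1 digests of passwords back to back (a toy db3)."""
    digests = sorted(hashlib.sha1(pw).digest() for pw in passwords)
    with open(path, "wb") as f:
        f.write(b"".join(digests))

def lookup_in_file(path, pw):
    """Binary-search the fixed-width records in path for the SHA-1 of pw."""
    target = hashlib.sha1(pw).digest()
    with open(path, "rb") as f, \
            mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as buf:
        lo, hi = 0, len(buf) // RECORD
        while lo < hi:
            mid = (lo + hi) // 2
            # Each probe reads one 20-byte record, possibly from a cold page.
            if buf[mid * RECORD:(mid + 1) * RECORD] < target:
                lo = mid + 1
            else:
                hi = mid
        # lo is now the insertion point; it is a hit only if the record matches.
        return lo * RECORD < len(buf) and buf[lo * RECORD:(lo + 1) * RECORD] == target
```

Each lookup probes about log2(n) records, and on a cold cache the early probes land on widely separated pages, which is where the disk dominates.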