> Not sure I actually understand why K is pushed here almost daily. Is it just the esoterics? Alright, APL and K are expressive for array computation. Do many people here deal with array computation?
People like k for different reasons. For me, it's the right blend of fast to develop and fast enough to run. Someone recently made a database engine called okon, designed to query HIBP; judging by their github, it took 20+ days of C++ to build. They got lookups under 49µsec, so they pat themselves on the back for it being so good.
But my k(q) solution was written in five minutes and runs in under 5µsec -- almost 10x faster. I used a binary search instead of a b-tree (an approach the okon author considered before going the b-tree route instead), but it's otherwise the same basic strategy.
So for me, the decision is, 5 minutes, or 20 days, and that's a no-brainer.
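For anyone who doesn't read k, a rough Python sketch of that strategy (a flat file of sorted, fixed-width SHA-1 digests plus an ordinary binary search) looks something like this; the file name and layout are assumptions for illustration, not the actual k code:

  # Sketch only: binary search over a memory-mapped file of sorted 20-byte SHA-1 digests.
  # 'hashes.bin' (raw digests concatenated in sorted order) is an assumed layout.
  import hashlib, mmap

  RECORD = 20                       # raw SHA-1 digest width in bytes
  with open('hashes.bin', 'rb') as f:
      data = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
  n = len(data) // RECORD

  def lookup(pw):
      target = hashlib.sha1(pw).digest()
      lo, hi = 0, n
      while lo < hi:                # ordinary binary search over fixed-width records
          mid = (lo + hi) // 2
          if data[mid*RECORD:(mid+1)*RECORD] < target:
              lo = mid + 1
          else:
              hi = mid
      return lo < n and data[lo*RECORD:(lo+1)*RECORD] == target

The whole "database" here is just the memory-mapped file; a lookup touches only a handful of pages.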
> How does this free K from loading and writing data?
Your CPU has a shockingly small amount of directly-attached memory (around 64kb). That's been true since the 1970s, and is likely to be true for another fifty years.
This used to be called ram, but now we call anything the CPU can address directly "ram", and we call this small directly-attached memory "cache" (or L1 or whatever), because that's basically how most people use it.
Now, what your CPU does all day is look at a memory address, and if it's not in the cache, it sends a request to the memory controller and waits for that memory to show up (a simplification, but close enough). Those instructions show up and now your CPU can execute them; those instructions need more memory, so again the CPU sends a request to the memory controller and waits. All day long it alternates between these two wait states: we wait a bit for code, we wait a bit for data, and then we repeat.
Because K is small (the interpreter, your application source code, and all your state variables fit into that 64kb), roughly half of that waiting goes away: you stop waiting on code and only wait on data.
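You can get a feel for how expensive those waits are with a crude experiment: sum a large array read sequentially (the prefetcher hides most of the latency), then sum it gathered in a random order (nearly every access stalls on main memory). This is only an illustration of the principle, in Python/numpy, not a K benchmark; the sizes are arbitrary.

  # Crude illustration of memory-wait cost (not a rigorous benchmark).
  import time
  import numpy as np

  a = np.arange(50_000_000, dtype=np.int64)   # ~400 MB, far larger than any cache
  idx = np.random.permutation(len(a))         # a random visiting order

  t0 = time.perf_counter(); a.sum();      t1 = time.perf_counter()
  # the gather a[idx] also allocates a copy, but the dominant cost is the random reads
  t2 = time.perf_counter(); a[idx].sum(); t3 = time.perf_counter()
  print("sequential: %.3fs   random order: %.3fs" % (t1 - t0, t3 - t2))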
> Is K in fact a thin layer on top of assembler (specifically SIMD and other extensions), for number crunching in the vein of shaders/CUDA?
No. It's a very simple interpreted language that (in most implementations) dispatches directly on the source code (or, in some cases, on an AST).
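To make "dispatches directly on the source code" concrete, here is a toy Python sketch: an evaluator that walks the source text and maps each character straight to a whole-array operation, applied right to left. It illustrates the dispatch style only; it is not how any real K implementation is written.

  # Toy character-dispatch evaluator (illustration only, not real K).
  # Two monadic verbs with K-like meanings: '|' is reverse, '#' is count.
  def run(src, x):
      verbs = {
          '|': lambda v: list(reversed(v)),   # reverse the whole vector
          '#': len,                           # count
      }
      val = x
      for ch in reversed(src):                # K evaluates right to left
          val = verbs[ch](val)
      return val

  run('#|', [1, 2, 3])   # reverse, then count -> 3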
> why those techniques aren't looted for open-source languages.
If you really understand the above, the answer to this question is obvious (hint: think about values, not features). Until then, until you really understand the question you're asking, you're not going to find the answer very useful.
> But my k(q) solution was written in five minutes and runs in under 5µsec -- almost 10x faster. I used a binary search instead of a b-tree (an approach the okon author considered before going the b-tree route instead), but it's otherwise the same basic strategy.
Out of curiosity I did the same in python (with numpy) simply memmap-ing the sorted binary hash list and doing a binary search (again using numpy) on that.
I got around 40us lookup time using that solution.
Not too bad for about 5 minutes of coding.
Also tried using leveldb as the datastore and got around 6us with that (and the solution is also only a couple of lines of code).
The leveldb solution is quite straightforward (running in ipython3 for easy benchmarking with timeit):
  import leveldb, hashlib
  db = leveldb.LevelDB('db')

  def lookup(pw):
      m = hashlib.sha1()
      m.update(pw)
      try:  # the leveldb api is a bit minimalist..
          db.Get(m.digest())   # raises if the key is absent
          return True
      except:
          return False

  %timeit lookup(b'password')
Creating the database was just calling db.Put(x) for every entry.
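For completeness, a sketch of that creation step; the input file name and the HIBP "HEXHASH:count" line format are assumptions about the source data:

  # Sketch: populate the leveldb store from the HIBP password hash list.
  # The file name and the "UPPERCASE-HEX-SHA1:count" line format are assumptions.
  import leveldb, binascii

  db = leveldb.LevelDB('db')
  with open('pwned-passwords-sha1-ordered-by-hash.txt') as f:
      for line in f:
          digest = binascii.unhexlify(line.split(':', 1)[0])  # 20 raw key bytes
          db.Put(digest, b'')                                 # value is unused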
The numpy solution is also quite easy (db3 is essentially your ..|cut|xxd .. file, I think):
  import hashlib
  import numpy as np

  db = np.memmap('db3', dtype=np.dtype('S20'), mode='r')

  def lookup(pw):
      m = hashlib.sha1()
      m.update(pw)
      idx = np.searchsorted(db, m.digest())           # binary search in the sorted digests
      return idx < len(db) and db[idx] == m.digest()  # bounds check in case idx == len(db)

  %timeit lookup(b'password')
And I was running this on a system where not even db3 (which is just the binary hashes concatenated) would fit into RAM, so for "real world" applications the results would be much worse: the lookups are limited more by disk IO than by whatever is running on top. I just reran both and got 3us for the leveldb solution and 13us for the binary search/numpy solution.