Hi!
Thanks for reading my suggested approach. I can see how map-reduce can process things faster (by processing chunks in parallel and then using reduce to aggregate the results?).
But I can't see how the reduce step (of map-reduce) helps with dropping word-pairs (i.e. in Step 3).
I think it's an imperfect solution, though it can still work well enough in practice. It depends on your reduction strategy, because the drop operation isn't associative, so the order in which partial results are combined changes the outcome. If you do a left fold, the accumulator builds up so much relevance that you quickly start discarding every word coming from the right-hand data set. If you instead split the reduction binarily (a balanced tree of merges), you're much more likely to be OK.
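To make the non-associativity point concrete, here's a minimal Python sketch. The combiner, the threshold, and all the names are made up for illustration -- this is not the implementation I posted, just the shape of the problem:

```python
from functools import reduce

# Made-up combiner: merge two {word_pair: score} maps and drop any pair
# whose accumulated score crosses a threshold. The scoring is illustrative;
# the point is that this combine step is NOT associative, so the shape of
# the reduction changes which pairs survive.
DROP_THRESHOLD = 10.0

def combine(left, right):
    merged = dict(left)
    for pair, score in right.items():
        total = merged.get(pair, 0.0) + score
        if total >= DROP_THRESHOLD:
            merged.pop(pair, None)   # accumulated too much relevance: drop it
        else:
            merged[pair] = total
    return merged

def fold_left(chunks):
    # Left fold: the accumulator carries the full running score for every
    # pair, so pairs contributed by later (right-hand) chunks hit the
    # threshold much sooner and get dropped.
    return reduce(combine, chunks, {})

def tree_reduce(chunks):
    # Binary split: each combine sees partial results of similar size,
    # so scores build up more evenly across the data set.
    if not chunks:
        return {}
    if len(chunks) == 1:
        return chunks[0]
    mid = len(chunks) // 2
    return combine(tree_reduce(chunks[:mid]), tree_reduce(chunks[mid:]))
```

Run both over the same list of per-chunk score maps and you'll generally get different surviving pairs -- that's the "breaking associativity" issue: the left fold discards far more from the later chunks, while the balanced split stays closer to what the map phase intended.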
I used a binary split, and thanks to the sparseness of the data (and the naivety of my algorithm) I didn't run into too many spurious drops. But the implementation I posted is very basic and shows my lack of background in NLP-related pursuits -- I'm lucky it did anything useful at all :)