
Hi! Thanks for reading my suggested approach. I can understand how map-reduce makes the processing faster (by handling chunks in parallel and then using reduce to aggregate the results?), but I can't see how the reduce step of map-reduce helps with dropping word-pairs (i.e. in Step 3).
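
(For concreteness, this is roughly the picture I have in mind -- a toy Python sketch of the map/reduce part, not your actual implementation; the function names are made up.)

  from collections import Counter
  from functools import reduce
  from itertools import combinations

  def map_doc(doc):
      # Map step: emit counts of co-occurring word pairs for one document.
      words = sorted(set(doc.lower().split()))
      return Counter(combinations(words, 2))

  def reduce_counts(a, b):
      # Reduce step: aggregate partial counts (Counter addition is associative).
      return a + b

  docs = ["the quick brown fox", "the lazy brown dog"]
  pair_counts = reduce(reduce_counts, map(map_doc, docs), Counter())
  print(pair_counts.most_common(3))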


I think it's an imperfect solution, though it might still be solid enough in practice. It depends on your reduction strategy (the drop step isn't associative, so the shape of the fold matters). If you do a left fold, the accumulator builds up such high relevance that you'll quickly start discarding every word in the right-hand data set. If you binary-split the folding process instead, you're more likely to be OK.
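
To make the order-dependence concrete, here's a rough sketch -- my own toy code, not the implementation I posted; the merge rule, threshold, and names are made up -- of a merge step that drops pairs far below the current maximum relevance, reduced two ways:

  from functools import reduce

  def merge_and_drop(a, b, keep_ratio=0.1):
      # Combine two relevance dicts, then drop pairs far below the current max.
      # The drop makes the operation non-associative, so reduction order matters.
      combined = dict(a)
      for pair, score in b.items():
          combined[pair] = combined.get(pair, 0) + score
      cutoff = keep_ratio * max(combined.values())
      return {p: s for p, s in combined.items() if s >= cutoff}

  def fold_left(chunks):
      # Left fold: the accumulator's max keeps growing, so pairs from each new
      # right-hand chunk tend to fall under the cutoff and get discarded.
      return reduce(merge_and_drop, chunks)

  def binary_split(chunks):
      # Balanced reduction: merge partial results of comparable size, so drops
      # are less biased toward whichever chunk happens to arrive last.
      if len(chunks) == 1:
          return chunks[0]
      mid = len(chunks) // 2
      return merge_and_drop(binary_split(chunks[:mid]), binary_split(chunks[mid:]))

With the left fold the big accumulator dominates the cutoff; with the balanced split each merge compares partial results of similar magnitude, which is why it behaves better.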

I used a binary split and, due to the sparseness of the data (and the crudeness of my algorithm), didn't run into too many spurious drop cases. But the implementation I posted is very basic and shows my lack of background in NLP-related pursuits -- I'm lucky it did anything useful at all :)



