
MapReduce and MPI are very different. MapReduce doesn't use message passing: the map phase reads inputs from sharded files in a separate disk system, applies map to each input, and writes the mapped outputs to temporary sort files on a separate disk system. The shuffler then sorts those files and writes the results to the appropriate destination shards, again on a separate disk system. Finally the reducer reads those shards, applies reduce, and writes the final outputs, sharded by key, to a separate disk system.
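To make the "everything goes through files" point concrete, here is a rough single-machine sketch of a word count. It is not how any real MapReduce implementation is written: the function names, the file naming scheme, `NUM_REDUCERS`, and the use of local temp directories in place of a distributed filesystem are all assumptions made for illustration. The thing to notice is that the phases never call each other with data, they only read the files the previous phase left behind.

```python
import json
import tempfile
from collections import defaultdict
from itertools import groupby
from pathlib import Path
from zlib import crc32

NUM_REDUCERS = 2  # hypothetical number of output shards


def map_phase(input_shards, map_dir):
    """Each mapper reads one input shard and writes (word, 1) records to its own temp file."""
    for i, shard in enumerate(input_shards):
        with open(shard) as src, open(map_dir / f"map-{i}.jsonl", "w") as dst:
            for line in src:
                for word in line.split():
                    dst.write(json.dumps([word, 1]) + "\n")


def shuffle_phase(map_dir, shuffle_dir):
    """Read every map output file, partition records by key hash, sort, write one file per reducer."""
    partitions = defaultdict(list)
    for path in sorted(map_dir.glob("map-*.jsonl")):
        with open(path) as f:
            for line in f:
                key, value = json.loads(line)
                partitions[crc32(key.encode()) % NUM_REDUCERS].append((key, value))
    for r in range(NUM_REDUCERS):
        with open(shuffle_dir / f"shuffle-{r}.jsonl", "w") as dst:
            for key, value in sorted(partitions[r]):
                dst.write(json.dumps([key, value]) + "\n")


def reduce_phase(shuffle_dir, out_dir):
    """Each reducer reads its sorted shard, sums the values per key, writes a final output shard."""
    for path in sorted(shuffle_dir.glob("shuffle-*.jsonl")):
        with open(path) as f:
            records = [json.loads(line) for line in f]
        out_path = out_dir / path.name.replace("shuffle", "out")
        with open(out_path, "w") as dst:
            # Records are already sorted by key, so equal keys are adjacent.
            for key, group in groupby(records, key=lambda kv: kv[0]):
                dst.write(json.dumps([key, sum(v for _, v in group)]) + "\n")


def run_job(shards):
    """Stand-in for the coordinator: runs the phases in order over a temp directory tree."""
    with tempfile.TemporaryDirectory() as root:
        root = Path(root)
        dirs = {name: root / name for name in ("input", "map", "shuffle", "out")}
        for d in dirs.values():
            d.mkdir()
        inputs = []
        for i, lines in enumerate(shards):
            path = dirs["input"] / f"in-{i}.txt"
            path.write_text("\n".join(lines) + "\n")
            inputs.append(path)
        map_phase(inputs, dirs["map"])
        shuffle_phase(dirs["map"], dirs["shuffle"])
        reduce_phase(dirs["shuffle"], dirs["out"])
        # Collect the output shards into one dict just to make the result easy to inspect.
        result = {}
        for path in sorted(dirs["out"].glob("out-*.jsonl")):
            with open(path) as f:
                for line in f:
                    key, total = json.loads(line)
                    result[key] = total
        return result
```

Calling `run_job([["a b a"], ["b c"]])` runs two mappers and two reducers' worth of files. In a real cluster each loop iteration would be a separate worker process on a separate machine, but the data flow between phases would still be through files rather than messages.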

The mappers, shufflers, and reducers are all independent of each other, reading from and writing to the filesystem, and are managed by a coordinator. There's nothing like MPI involved, other than the use of the Stubby RPC system, which sort of resembles MPI but has completely different distributed communication semantics.


