In the final solution at the end of the article, there are only two pipes:
1. A pipe to feed the file names into xargs for starting up parallel `mawk` processes.
2. A pipe to a final `mawk` process which aggregates the data from the parallel processes.
There's still some performance that could be gained by using a single process with threads and shared memory, but this is pretty good for something that can be whipped together quickly.
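A minimal sketch of that two-pipe shape, assuming the workload is counting lines matching a pattern across many files; the file pattern and the per-file mawk program are illustrative placeholders, not the article's exact command:

```sh
find . -type f -name '*.pgn' -print0 |
  xargs -0 -n1 -P4 mawk '        # pipe 1: file names -> parallel mawk workers
    /Result/ { n++ }
    END { print n }
  ' |
  mawk '{ total += $1 }          # pipe 2: per-worker counts -> one aggregator
    END { print total }'
```

Each worker prints a single summary line, so the final mawk process only has to sum a handful of numbers rather than re-scan the data.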
Yeah, it's not bad. The final command basically leverages mawk for everything, which works out well since there are fewer pipes.
But in this case it's about replacing Hadoop with mawk, basically. Which is indeed a good point as well - and incidentally also confirms my own comment =)