Thanks for the tip, always nice to see the author of tools commenting on hn :-)
The (old) version of parallel packaged with Ubuntu 16.04 (Windows Subsystem for Linux) doesn't have --pipe-part, but with the upstream version the speed is more reasonable:
$ time (./parallel-20170522/src/parallel -a ngrams.tsv \
--pipe-part --block -1 -j4 mawk -f map.awk \
| mawk -f reduce.awk )
max_key: 2006 sum: 22569013
real 0m2.265s
user 0m4.672s
sys 0m1.672s
(Tried a few variants with/without -jN -- and this seems typical for the fast end of the spectrum).
$ time (cat ngrams.tsv \
| mawk -f map.awk \
| mawk -f reduce.awk )
max_key: 2006 sum: 22569013
real 0m3.472s
user 0m2.891s
sys 0m2.406s
[ed: btw, did a double-take when I saw your Gnu Privacy Guard id: 0x88888888 :-) ]
> Granted, it's always possible that the parallel on $PATH might not be the one you're thinking of, but... Whatevs.
For exactly this reason, GNU Parallel has the option --minversion.
So in your script you put:
parallel --minversion 20140722 || exit
if the rest of your script depends on functionality only present from version 20140722.
If the parallel in the $PATH is not GNU Parallel it will fail (and thus exit). If it _is_ GNU Parallel it will fail if the version is < 20140722 and succeed otherwise.
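A minimal sketch of that guard as a reusable function, in case it helps. The name require_gnu_parallel and its optional second argument (the command to probe) are hypothetical and exist only so the check can be tried against a stand-in command; the only GNU Parallel feature relied on is --minversion and its exit status.

```shell
#!/bin/sh
# Hypothetical helper, not part of GNU Parallel.
# Succeeds only if the probed command exists and accepts
# --minversion <version> with exit status 0.
#   $1 = minimum version required, e.g. 20140722
#   $2 = command to probe (assumption: defaults to "parallel")
require_gnu_parallel() {
    min_version=$1
    parallel_cmd=${2:-parallel}

    # Nothing by that name on $PATH: fail early.
    command -v "$parallel_cmd" >/dev/null 2>&1 || return 1

    # A non-GNU parallel should reject the option; GNU Parallel
    # older than $min_version exits non-zero as well.
    "$parallel_cmd" --minversion "$min_version" >/dev/null 2>&1
}

if require_gnu_parallel 20140722; then
    echo "GNU Parallel >= 20140722 found"
else
    echo "GNU Parallel >= 20140722 not found" >&2
fi
```

In a real script you would probably replace the final echo in the else branch with an exit, which gives the same effect as the one-liner above.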
Except that is not entirely true: for instance, according to the author, '{= perl expression =}' will probably never be supported.
(Full disclosure: I am the author of GNU Parallel. I fully support building other parallelizing tools, but to avoid user confusion, I would recommend calling them something other than 'parallel' if they are not actually compatible with GNU Parallel).
Rust-parallel _is_ fast, and there is clearly a niche here that GNU Parallel is unlikely to fill: by design GNU Parallel will never require a compiler, so that you can use GNU Parallel on old systems with no compilers (think of an old, dusty AIX box that people have forgotten the root password to). This design decision limits how fast GNU Parallel can be compared to compiled alternatives.
But the main problem with rust-parallel is that it is not compatible with GNU Parallel (and according to the author, it probably never will be 100% compatible). If you use rust-parallel to walk through GNU Parallel's tutorial (man parallel_tutorial) you will see it fails very quickly.
(Full disclosure: I am the author of GNU Parallel. I fully support building other parallelizing tools, but to avoid user confusion, I would recommend calling them something other than 'parallel' if they are not actually compatible with GNU Parallel. History has shown that using the same name will lead to a lot of unnecessary grief: e.g. GNU Parallel vs. Parallel from moreutils).
The page with differences to other tools will be moved to 'man parallel_alternatives' in the next version. It made sense when the section was short, but it has grown so big that it can make the man page harder to navigate.
Thanks for the input.
(The trick to find the examples: LESS=+/EXAMPLE: man parallel )