ole_tange's comments

The simple_gpu_scheduler examples, each followed by its GNU Parallel equivalent:

    simple_gpu_scheduler --gpus 0 1 2 < gpu_commands.txt
    parallel -j3 --shuf CUDA_VISIBLE_DEVICES='{=1 $_=slot()-1 =} {=uq;=}' < gpu_commands.txt

    simple_hypersearch "python3 train_dnn.py --lr {lr} --batch_size {bs}" -p lr 0.001 0.0005 0.0001 -p bs 32 64 128 | simple_gpu_scheduler --gpus 0,1,2
    parallel --header : --shuf -j3 -v CUDA_VISIBLE_DEVICES='{=1 $_=slot()-1 =}' python3 train_dnn.py --lr {lr} --batch_size {bs} ::: lr 0.001 0.0005 0.0001 ::: bs 32 64 128

    simple_hypersearch "python3 train_dnn.py --lr {lr} --batch_size {bs}" --n-samples 5 -p lr 0.001 0.0005 0.0001 -p bs 32 64 128 | simple_gpu_scheduler --gpus 0,1,2
    parallel --header : --shuf -j3 CUDA_VISIBLE_DEVICES='{=1 $_=slot()-1; seq() > 5 and skip() =}' python3 train_dnn.py --lr {lr} --batch_size {bs} ::: lr 0.001 0.0005 0.0001 ::: bs 32 64 128

    touch gpu.queue
    tail -f -n 0 gpu.queue | simple_gpu_scheduler --gpus 0,1,2 &
    echo "my_command_with | and stuff > logfile" >> gpu.queue

    touch gpu.queue
    tail -f -n 0 gpu.queue | parallel -j3 CUDA_VISIBLE_DEVICES='{=1 $_=slot()-1 =} {=uq;=}' &
    # Needed to fill job slots once
    seq 3 | parallel echo true >> gpu.queue
    # Add jobs
    echo "my_command_with | and stuff > logfile" >> gpu.queue
    # Needed to flush output from completed jobs 
    seq 3 | parallel echo true >> gpu.queue


Fun fact: GNU Parallel used to be a wrapper that generated Makefiles and ran them with make -j: https://www.gnu.org/software/parallel/history.html

Also see this if you have a system without GNU Parallel: http://oletange.blogspot.dk/2013/04/why-not-install-gnu-para...


--pipe is well known for being slow.

Try --pipe-part instead:

    parallel -a ngrams.tsv --pipe-part --block -1 awk -f map.awk |
      awk -f reduce.awk
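map.awk and reduce.awk are not shown here. Below is a hypothetical sketch of what they could look like, assuming the task is "find the key in column 2 with the largest total in column 3" (this fits the "max_key: ... sum: ..." output in the reply, but the column layout is an assumption). The map step emits partial sums for whatever chunk it is given, and the reduce step adds those partial sums together. Because partial sums can be combined in any order, the result should not depend on how --pipe-part cuts the file into blocks. The two-chunk run at the end shows this without needing GNU Parallel at all.

```shell
#!/bin/sh
# Hypothetical map.awk / reduce.awk. Assumed layout: tab-separated,
# key in column 2, numeric value in column 3. Work in a scratch directory.
cd "$(mktemp -d)" || exit 1

cat > map.awk <<'EOF'
# Map: sum column 3 per key (column 2) within the chunk awk is given,
# and emit one "key<TAB>partial_sum" line per key.
BEGIN { FS = OFS = "\t" }
{ sum[$2] += $3 }
END { for (k in sum) print k, sum[k] }
EOF

cat > reduce.awk <<'EOF'
# Reduce: merge the partial sums from all chunks, then report the largest.
BEGIN { FS = "\t" }
{ sum[$1] += $2 }
END {
  for (k in sum)
    if (!found || sum[k] > max) { found = 1; max = sum[k]; max_key = k }
  print "max_key: " max_key " sum: " max
}
EOF

# Tiny made-up sample: ngram, key, value.
printf 'a\t2005\t10\nb\t2006\t7\nc\t2005\t1\nd\t2006\t9\n' > sample.tsv

# Whole file as a single chunk:
awk -f map.awk sample.tsv | awk -f reduce.awk
# prints: max_key: 2006 sum: 16

# Two chunks mapped separately (roughly what --pipe-part does), same result:
{ head -n 2 sample.tsv | awk -f map.awk
  tail -n +3 sample.tsv | awk -f map.awk; } | awk -f reduce.awk
# prints: max_key: 2006 sum: 16
```

With scripts shaped like these, the parallel command above runs one map.awk per block, and the single reduce.awk at the end of the pipe merges the results.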


Thanks for the tip; it's always nice to see the authors of tools commenting on HN :-)

The (old) version of parallel packaged with Ubuntu 16.04 (Windows Subsystem for Linux) doesn't have --pipe-part, but when running the upstream version, the speed is more reasonable:

  $ time (./parallel-20170522/src/parallel -a ngrams.tsv \
    --pipe-part --block -1 -j4 mawk -f map.awk \
    | mawk -f reduce.awk )
  max_key: 2006 sum: 22569013

  real    0m2.265s
  user    0m4.672s
  sys     0m1.672s

(I tried a few variants with and without -jN; this seems typical for the fast end of the spectrum.)

  $ time (cat ngrams.tsv \
     | mawk -f map.awk \
     | mawk -f reduce.awk )
  max_key: 2006 sum: 22569013

  real    0m3.472s
  user    0m2.891s
  sys     0m2.406s

[ed: by the way, I did a double-take when I saw your GNU Privacy Guard ID: 0x88888888 :-) ]


--line-buffer may or may not give an additional speedup.


> Granted, it's always possible that the parallel on $PATH might not be the one you're thinking of, but... Whatevs.

For exactly this reason, GNU Parallel has the option --minversion.

So in your script you put:

    parallel --minversion 20140722 || exit

if the rest of your script depends on functionality that is only present in version 20140722 and later.

If the parallel in $PATH is not GNU Parallel, it will fail (and thus the script exits). If it _is_ GNU Parallel, it will fail if the version is older than 20140722 and succeed otherwise.
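If you want a clearer error message than a bare exit, the same check can be wrapped in a small function. This is a minimal sketch: the name require_gnu_parallel is made up for illustration and is not part of GNU Parallel. It relies only on the --minversion behaviour described above. Output is discarded because --minversion also prints the installed version.

```shell
#!/bin/sh
# require_gnu_parallel is a hypothetical helper, not part of GNU Parallel.
# It returns non-zero (with a message on stderr) unless the 'parallel' found
# on $PATH is GNU Parallel of at least the given version (YYYYMMDD).
require_gnu_parallel() {
  min_version=$1
  if ! command -v parallel >/dev/null 2>&1; then
    echo "error: no 'parallel' found on \$PATH" >&2
    return 1
  fi
  # --minversion prints the installed version and fails if it is older than
  # min_version; a non-GNU 'parallel' is expected to reject the option and fail.
  if ! parallel --minversion "$min_version" >/dev/null 2>&1; then
    echo "error: GNU Parallel $min_version or newer is required" >&2
    return 1
  fi
}

# Typical use near the top of a script:
#   require_gnu_parallel 20140722 || exit 1
```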


> Same syntax applies.

Except that is not entirely true: for instance, according to that tool's author, '{= perl expression =}' will probably never be supported.

(Full disclosure: I am the author of GNU Parallel. I fully support building other parallelizing tools, but to avoid user confusion, I would recommend calling them something other than 'parallel' if they are not actually compatible with GNU Parallel).


Rust-parallel _is_ fast, and there is clearly a niche here that GNU Parallel is unlikely to fill: by design, GNU Parallel will never require a compiler, so that you can use it on old systems with no compilers (think of an old, dusty AIX box that people have forgotten the root password to). This design decision limits how fast GNU Parallel can be compared to compiled alternatives.

But the main problem with rust-parallel is that it is not compatible with GNU Parallel (and according to its author, it probably never will be 100% compatible). If you use rust-parallel to walk through GNU Parallel's tutorial (man parallel_tutorial), you will see it fail very quickly.

(Full disclosure: I am the author of GNU Parallel. I fully support building other parallelizing tools, but to avoid user confusion, I would recommend calling them something other than 'parallel' if they are not actually compatible with GNU Parallel. History has shown that using the same name leads to a lot of unnecessary grief, e.g. GNU Parallel vs. parallel from moreutils.)


The page with differences to other tools will be moved to 'man parallel_alternatives' in the next version. Keeping it in the main man page made sense when the section was short, but it has grown so big that it makes the man page harder to navigate.

Thanks for the input.

(The trick to find the examples: LESS=+/EXAMPLE: man parallel )


Except for GNU Parallel --pipe. When you use '<', what you typically mean is '--pipepart -a file'.


Sure do. Makes you feel that the future is now!


Did you start by reading the "Reader's guide" which is presented before the first option is introduced in the man page?

