GNU Parallel
1. LAB MEETING—TECHNICAL TALK
COBY VINER
USE CASES | BASIC EXAMPLES | BASIC SYNTAX | ADDITIONAL SYNTAX | MORE EXAMPLES | REAL EXAMPLES
LAB MEETING—TECHNICAL TALK
GNU PARALLEL
O. Tange, “GNU Parallel - The Command-Line Power Tool”, ;login: The USENIX Magazine, vol. 36, no. 1, pp. 42–47, Feb. 2011
Coby Viner
Hoffman Lab
Wednesday, April 13, 2016
2. OVERVIEW
WHY USE GNU PARALLEL?
BASIC EXAMPLES FROM THE TUTORIAL
BASIC ELEMENTS OF SYNTAX [FROM THE TUTORIAL]
MUCH MORE SYNTAX FOR MANY OTHER TASKS
MORE TUTORIAL EXAMPLES
SOME EXAMPLES OF MY GNU PARALLEL USAGE
9. WHY USE GNU PARALLEL?
A shell tool for executing jobs in parallel using one or more computers.
Easily parallelize perfectly (embarrassingly) parallel tasks:
  For each chromosome...
  For each sex, for each technical replicate, for each hyper-parameter setting
Job submission scripts within a for loop
Improved, cleaner syntax (for the programmer), even in serial
Facile interleaving of tasks, in the order one is thinking about them
10. A BASIC [MAN PAGE] EXAMPLE: “WORKING AS XARGS -N1. ARGUMENT APPENDING”
find . -name '*.html' | parallel gzip --best
17. ANOTHER BASIC [MAN PAGE] EXAMPLE: “INSERTING MULTIPLE ARGUMENTS”
With very many .log files, mv *.log destdir can fail with:
bash: /bin/mv: Argument list too long
ls | grep -E '\.log$' | parallel mv {} destdir
ls | grep -E '\.log$' | parallel -m mv {} destdir # -m: as many file names per mv as fit
19. BASIC ELEMENTS OF SYNTAX [FROM THE TUTORIAL]
Input:
parallel echo ::: A B C # command line
cat abc-file | parallel echo # from STDIN
parallel -a abc-file echo # from a file
Output [line order may vary]:
A
B
C
21. BASIC ELEMENTS OF SYNTAX [FROM THE TUTORIAL]
Multiple inputs.
Input:
parallel echo ::: A B C ::: D E F
cat abc-file | parallel -a - -a def-file echo
parallel -a abc-file -a def-file echo
cat abc-file | parallel echo :::: - def-file # alt. file
parallel echo ::: A B C :::: def-file # mix cmd. and file
Output [line order may vary]:
A D
A E
A F
B D
B E
B F
C D
C E
C F
23. BASIC ELEMENTS OF SYNTAX [FROM THE TUTORIAL]
Matching input.
Input:
parallel --xapply echo ::: A B C ::: D E F
Output [line order may vary]:
A D
B E
C F
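--xapply wraps around if one input source is shorter than another, reusing its values from the start. (Newer GNU Parallel releases name this option --link; --xapply is kept as an alias.)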
26. BASIC ELEMENTS OF SYNTAX [FROM THE TUTORIAL]
Replacement strings: The 7 predefined replacement strings
Input:
parallel echo {} ::: A/B.C
parallel echo {.} ::: A/B.C
Output:
A/B.C
A/B
Rep. string   Result (for input A/B.C)
{}            unchanged: A/B.C
{.}           remove ext.: A/B
{/}           remove path: B.C
{//}          only path: A
{/.}          remove ext. and path: B
{#}           job number
{%}           job slot number
28. BASIC ELEMENTS OF SYNTAX [FROM THE TUTORIAL]
Customizing replacement strings
--extensionreplace (--er) to change {.} etc.
Shorthand custom replacement strings, defined by Perl expressions (incl. regexes) with --rpl
GNU Parallel's 7 replacement strings, as --rpl definitions:
--rpl '{} '
--rpl '{#} $_=$job->seq()'
--rpl '{%} $_=$job->slot()'
--rpl '{/} s:.*/::'
--rpl '{//} $Global::use{"File::Basename"} ||= eval "use File::Basename; 1;"; $_ = dirname($_);'
--rpl '{/.} s:.*/::; s:\.[^/.]+$::;'
--rpl '{.} s:\.[^/.]+$::'
29. BASIC ELEMENTS OF SYNTAX [FROM THE TUTORIAL]
Multiple input sources and positional replacement:
parallel echo {1} and {2} ::: A B ::: C D
Always try to define replacements explicitly, with the {...} syntax (e.g. {1}, {2}).
Test with --dry-run first.
38. MUCH MORE SYNTAX FOR MANY OTHER TASKS
--pipe: instead of using STDIN lines as command arguments, send the data itself to the STDIN of the command
  e.g. command_A | command_B | command_C, where command_B is slow, becomes command_A | parallel --pipe command_B | command_C
Remote execution, to parallelize directly over multiple machines
Working directly with a SQL database
Shebang: usually cat input_file | parallel command, but one can also write #!/usr/bin/parallel --shebang -r echo
As a counting semaphore: parallel --semaphore, or sem
  Default is one slot: a mutex
39. ANOTHER [MAN PAGE] EXAMPLE: “AGGREGATING CONTENT OF FILES”
parallel --header : echo x{X}y{Y}z{Z} \> x{X}y{Y}z{Z} \
::: X {1..5} ::: Y {01..10} ::: Z {1..5}
parallel eval 'cat {=s/y01/y*/=} > {=s/y01//=}' ::: *y01*
This runs: cat x1y*z1 > x1z1, ∀x∀z
41. POST-MEME2IMAGES INKSCAPE CONVERSIONS FOR PUBLICATION-READY CENTRIMO PLOTS AND SEQUENCE LOGOS
parallel inkscape --vacuum-defs --export-pdf={.}.pdf {} \
::: "$centrimo_eps_1" "$centrimo_eps_2"