<?xml version="1.0" ?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>GNU Parallel book</title>
<meta http-equiv="content-type" content="text/html; charset=utf-8" />
<link rev="made" href="mailto:root@localhost" />
</head>
<body>
<ul id="index">
<li><a href="#Why-should-you-read-this-book">Why should you read this book?</a></li>
<li><a href="#Learn-GNU-Parallel-in-5-minutes">Learn GNU Parallel in 5 minutes</a>
<ul>
<li><a href="#Input-sources">Input sources</a></li>
<li><a href="#Building-the-command-line">Building the command line</a></li>
<li><a href="#Controlling-the-output">Controlling the output</a></li>
<li><a href="#Controlling-the-execution">Controlling the execution</a></li>
<li><a href="#Pipe-mode">Pipe mode</a></li>
<li><a href="#Thats-it">That's it</a></li>
</ul>
</li>
<li><a href="#Learn-GNU-Parallel-in-an-hour">Learn GNU Parallel in an hour</a>
<ul>
<li><a href="#Input-sources1">Input sources</a>
<ul>
<li><a href="#Linking-input-sources">Linking input sources</a></li>
</ul>
</li>
<li><a href="#Building-the-command-line1">Building the command line</a>
<ul>
<li><a href="#The-command">The command</a></li>
<li><a href="#The-replacement-strings">The replacement strings</a>
<ul>
<li><a href="#The-positional-replacement-strings">The positional replacement strings</a></li>
</ul>
</li>
</ul>
</li>
</ul>
</li>
<li><a href="#Replacement-strings">Replacement strings</a>
<ul>
<li><a href="#Defining-replacement-strings">Defining replacement strings</a></li>
<li><a href="#Copying-environment">Copying environment</a></li>
<li><a href="#Controlling-the-output1">Controlling the output</a>
<ul>
<li><a href="#parset">parset</a></li>
</ul>
</li>
<li><a href="#Controlling-the-execution1">Controlling the execution</a></li>
<li><a href="#Remote-execution">Remote execution</a>
<ul>
<li><a href="#Workers">Workers</a></li>
<li><a href="#transferfile">--transferfile</a></li>
<li><a href="#return">--return</a></li>
<li><a href="#cleanup">--cleanup</a></li>
</ul>
</li>
<li><a href="#Pipe-mode1">Pipe mode</a></li>
<li><a href="#Thats-it1">That's it</a></li>
</ul>
</li>
<li><a href="#Advanced-usage">Advanced usage</a>
<ul>
<li><a href="#Controlling-the-execution2">Controlling the execution</a></li>
<li><a href="#Remote-execution1">Remote execution</a></li>
</ul>
</li>
</ul>
<h1 id="Why-should-you-read-this-book">Why should you read this book?</h1>
<p>If you write shell scripts that run the same processing on different inputs, then GNU <b>parallel</b> will make your life easier and make your scripts run faster.</p>
<p>The book is written so you get the juicy parts first: the goal is that you read just enough to get going. GNU <b>parallel</b> has an overwhelming number of special features to help in different situations, and to avoid overloading you with information, the most used features are presented first.</p>
<p>All the examples are tested in Bash. Most will also work in other shells, but there are a few exceptions, so we recommend using Bash while trying out the examples.</p>
<h1 id="Learn-GNU-Parallel-in-5-minutes">Learn GNU Parallel in 5 minutes</h1>
<p>You just need to run commands in parallel. You do not care about fine tuning.</p>
<p>To get going please run this to make some example files:</p>
<pre><code># If your system does not have 'seq', replace 'seq' with 'jot'
seq 5 | parallel seq {} '>' example.{}</code></pre>
<h2 id="Input-sources">Input sources</h2>
<p>GNU <b>parallel</b> reads values from input sources. One input source is the command line. The values are put after <b>:::</b> :</p>
<pre><code>parallel echo ::: 1 2 3 4 5</code></pre>
<p>This makes it easy to run the same program on some files:</p>
<pre><code>parallel wc ::: example.*</code></pre>
<p>If you give multiple <b>:::</b>s, GNU <b>parallel</b> will generate all combinations:</p>
<pre><code>parallel wc ::: -l -c ::: example.*</code></pre>
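<p>To see which combinations are generated, it can help to replace the real command with <b>echo</b>. The following is a small sketch: the values <b>A B</b> and <b>1 2</b> are made up for illustration, and <b>-k</b> (described under "Controlling the output") is only used to make the order predictable:</p>

```shell
# Two input sources: every value of the first source is paired
# with every value of the second source (2 x 2 = 4 jobs).
combos=$(parallel -k echo ::: A B ::: 1 2)
printf '%s\n' "$combos"
# Prints:
# A 1
# A 2
# B 1
# B 2
```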
<p>GNU <b>parallel</b> can also read the values from stdin (standard input):</p>
<pre><code>seq 5 | parallel echo</code></pre>
<h2 id="Building-the-command-line">Building the command line</h2>
<p>The command line is put before the <b>:::</b>. It can contain a command and options for the command:</p>
<pre><code>parallel wc -l ::: example.*</code></pre>
<p>The command can contain multiple programs. Just remember to quote characters that are interpreted by the shell (such as <b>;</b>):</p>
<pre><code>parallel echo counting lines';' wc -l ::: example.*</code></pre>
<p>The value will normally be appended to the command, but can be placed anywhere by using the replacement string <b>{}</b>:</p>
<pre><code>parallel echo counting {}';' wc -l {} ::: example.*</code></pre>
<p>When using multiple input sources you use the positional replacement strings <b>{1}</b> and <b>{2}</b>:</p>
<pre><code>parallel echo count {1} in {2}';' wc {1} {2} ::: -l -c ::: example.*</code></pre>
<p>You can check what will be run with <b>--dry-run</b>:</p>
<pre><code>parallel --dry-run echo count {1} in {2}';' wc {1} {2} ::: -l -c ::: example.*</code></pre>
<p>It is a good idea to do this for every command until you are comfortable with GNU <b>parallel</b>.</p>
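<p>A small sketch of what <b>--dry-run</b> prints. The file names <b>a</b> and <b>b</b> are made up and do not need to exist, because nothing is executed; <b>-k</b> only keeps the listing in input order:</p>

```shell
# --dry-run prints each command line instead of running it.
planned=$(parallel --dry-run -k echo count {1} in {2} ::: -l -c ::: a b)
printf '%s\n' "$planned"
# Prints:
# echo count -l in a
# echo count -l in b
# echo count -c in a
# echo count -c in b
```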
<h2 id="Controlling-the-output">Controlling the output</h2>
<p>The output will be printed as soon as the command completes. This means the output may come in a different order than the input:</p>
<pre><code>parallel sleep {}';' echo {} done ::: 5 4 3 2 1</code></pre>
<p>You can force GNU <b>parallel</b> to print in the order of the values with <b>--keep-order</b>/<b>-k</b>. This will still run the commands in parallel. The output of later jobs is held back until the output of the earlier jobs has been printed:</p>
<pre><code>parallel -k sleep {}';' echo {} done ::: 5 4 3 2 1</code></pre>
<h2 id="Controlling-the-execution">Controlling the execution</h2>
<p>If your jobs are compute intensive, you will most likely run one job for each core in the system. This is the default for GNU <b>parallel</b>.</p>
<p>But sometimes you want more jobs running. You control the number of job slots with <b>-j</b>. Give <b>-j</b> the number of jobs to run in parallel:</p>
<pre><code>parallel -j50 \
wget https://ftpmirror.gnu.org/parallel/parallel-{1}{2}22.tar.bz2 \
::: 2012 2013 2014 2015 2016 \
::: 01 02 03 04 05 06 07 08 09 10 11 12</code></pre>
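<p>To see the effect of <b>-j</b>, you can compare with <b>-j1</b>, which runs a single job at a time. This is a sketch with made-up sleep times: because only one job runs at a time, the output comes in input order even though the first job sleeps the longest:</p>

```shell
# With one job slot the jobs run one after another,
# so the output order matches the input order.
serial=$(parallel -j1 sleep 0.{}';' echo {} done ::: 3 2 1)
printf '%s\n' "$serial"
# Prints:
# 3 done
# 2 done
# 1 done
```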
<h2 id="Pipe-mode">Pipe mode</h2>
<p>GNU <b>parallel</b> can also pass blocks of data to commands on stdin (standard input):</p>
<pre><code>seq 1000000 | parallel --pipe wc</code></pre>
<p>This can be used to process big text files. By default GNU <b>parallel</b> splits on \n (newline) and passes a block of around 1 MB to each job.</p>
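<p>Because each job only sees its own block, the per-job results usually have to be combined afterwards. A minimal sketch: count the lines in each block with <b>wc -l</b> and add the partial counts with <b>awk</b> (the <b>awk</b> step is an addition for illustration and not part of GNU <b>parallel</b>):</p>

```shell
# Each job counts the lines in one block of about 1 MB;
# awk adds the per-block counts back together.
total=$(seq 1000000 | parallel --pipe wc -l | awk '{ sum += $1 } END { print sum }')
echo "$total"
# Prints: 1000000
```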
<h2 id="Thats-it">That's it</h2>
<p>You have now learned the basic use of GNU <b>parallel</b>. This will probably cover most of your use cases.</p>
<p>The rest of this document goes into more detail on each of the sections and covers special use cases.</p>
<h1 id="Learn-GNU-Parallel-in-an-hour">Learn GNU Parallel in an hour</h1>
<p>In this part we will dive deeper into what you learned in the first 5 minutes.</p>
<p>To get going please run this to make some example files:</p>
<pre><code>seq 6 > seq6
seq 6 -1 1 > seq-6</code></pre>
<h2 id="Input-sources1">Input sources</h2>
<p>In addition to the command line, input sources can also be stdin (standard input or '-'), files, and fifos, and they can be mixed. Files are given after <b>-a</b> or <b>::::</b>. So these all do the same:</p>
<pre><code>parallel echo Dice1={1} Dice2={2} ::: 1 2 3 4 5 6 ::: 6 5 4 3 2 1
parallel echo Dice1={1} Dice2={2} :::: <(seq 6) :::: <(seq 6 -1 1)
parallel echo Dice1={1} Dice2={2} :::: seq6 seq-6
parallel echo Dice1={1} Dice2={2} :::: seq6 :::: seq-6
parallel -a seq6 -a seq-6 echo Dice1={1} Dice2={2}
parallel -a seq6 echo Dice1={1} Dice2={2} :::: seq-6
parallel echo Dice1={1} Dice2={2} ::: 1 2 3 4 5 6 :::: seq-6
cat seq-6 | parallel echo Dice1={1} Dice2={2} :::: seq6 -</code></pre>
<p>If stdin (standard input) is the only input source, you do not need the '-':</p>
<pre><code>cat seq6 | parallel echo Dice1={1}</code></pre>
<h3 id="Linking-input-sources">Linking input sources</h3>
<p>You can link multiple input sources with <b>:::+</b> and <b>::::+</b>:</p>
<pre><code>parallel echo {1}={2} ::: I II III IV V VI :::+ 1 2 3 4 5 6
parallel echo {1}={2} ::: I II III IV V VI ::::+ seq6</code></pre>
<p>The <b>:::+</b> (and <b>::::+</b>) will link each value to the corresponding value in the previous input source, so value number 3 from the first input source will be linked to value number 3 from the second input source.</p>
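<p>A shortened sketch of the linking described above, using only three values per input source so the result is easy to check; <b>-k</b> is only there to keep the output in input order:</p>

```shell
# :::+ pairs value number n of one source with value number n
# of the previous source, so 3 + 3 values give 3 jobs, not 9.
linked=$(parallel -k echo {1}={2} ::: I II III :::+ 1 2 3)
printf '%s\n' "$linked"
# Prints:
# I=1
# II=2
# III=3
```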
<p>You can combine <b>:::+</b> and <b>:::</b>, so you link 2 input sources, but generate all combinations with other input sources:</p>
<pre><code>parallel echo Dice1={1}={2} Dice2={3}={4} ::: I II III IV V VI ::::+ seq6 \
::: VI V IV III II I ::::+ seq-6</code></pre>
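<p>The same idea on a smaller scale can be sketched as follows. The values <b>a b</b> are made up for illustration; the first two input sources are linked, and the third is combined with every linked pair:</p>

```shell
# 2 linked pairs combined with 2 values give 2 x 2 = 4 jobs.
mixed=$(parallel -k echo {1}={2} {3} ::: I II :::+ 1 2 ::: a b)
printf '%s\n' "$mixed"
# Prints:
# I=1 a
# I=1 b
# II=2 a
# II=2 b
```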
<h2 id="Building-the-command-line1">Building the command line</h2>
<h3 id="The-command">The command</h3>
<p>The command can be a script, a binary or a Bash function if the function is exported using <b>export -f</b>:</p>
<pre><code># Works only in Bash
my_func() {
echo in my_func "$1"
}
export -f my_func
parallel my_func ::: 1 2 3</code></pre>
<p>If the command is complex, it often improves readability to make it into a function.</p>
<h3 id="The-replacement-strings">The replacement strings</h3>
<p>GNU <b>parallel</b> has some replacement strings to make it easier to refer to the input read from the input sources.</p>
<p>If the input is <b>mydir/mysubdir/myfile.myext</b> then:</p>
<pre><code>{} = mydir/mysubdir/myfile.myext
{.} = mydir/mysubdir/myfile
{/} = myfile.myext
{//} = mydir/mysubdir
{/.} = myfile
{#} = the sequence number of the job
{%} = the job slot number</code></pre>
<p>When a job is started it gets a sequence number that starts at 1 and increases by 1 for each new job. The job also gets assigned a slot number. This number is from 1 to the number of jobs running in parallel. It is unique between the running jobs, but is re-used as soon as a job finishes.</p>
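<p>The replacement strings above can be tried out with <b>echo</b>. This is a small sketch: the labels (<b>full=</b>, <b>noext=</b> and so on) and the values <b>a b c</b> are made up for illustration. <b>{%}</b> is left out because the slot a job gets depends on timing:</p>

```shell
# Show how each replacement string rewrites the same input value.
parts=$(parallel echo full={} noext={.} base={/} dir={//} name={/.} \
  ::: mydir/mysubdir/myfile.myext)
echo "$parts"
# Prints:
# full=mydir/mysubdir/myfile.myext noext=mydir/mysubdir/myfile base=myfile.myext dir=mydir/mysubdir name=myfile

# {#} is the sequence number: 1 for the first value, 2 for the next, ...
jobs=$(parallel -k echo job {#} got {} ::: a b c)
printf '%s\n' "$jobs"
# Prints:
# job 1 got a
# job 2 got b
# job 3 got c
```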
<h4 id="The-positional-replacement-strings">The positional replacement strings</h4>
<p>The replacement strings have corresponding positional replacement strings. If the value from the 3rd input source is <b>mydir/mysubdir/myfile.myext</b>:</p>
<pre><code>{3} = mydir/mysubdir/myfile.myext
{3.} = mydir/mysubdir/myfile
{3/} = myfile.myext
{3//} = mydir/mysubdir
{3/.} = myfile</code></pre>
<p>So the number of the input source is simply prepended inside the {}'s.</p>
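<p>A short sketch with two input sources instead of three; the values <b>first</b> and <b>second</b> are made up for illustration. <b>{1}</b> takes the value from the first input source unchanged, while <b>{2//}</b> and <b>{2/.}</b> rewrite the value from the second:</p>

```shell
# {2//} is the directory and {2/.} the basename without extension
# of the value from the second input source.
pos=$(parallel -k echo {1} {2//} {2/.} ::: first second ::: mydir/mysubdir/myfile.myext)
printf '%s\n' "$pos"
# Prints:
# first mydir/mysubdir myfile
# second mydir/mysubdir myfile
```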
<h1 id="Replacement-strings">Replacement strings</h1>
<p>--plus replacement strings</p>
<p>Changing the replacement strings (-I, --extensionreplace, --basenamereplace, --dirnamereplace, --basenameextensionreplace, --seqreplace, --slotreplace)</p>
<p>--header with named replacement string</p>
<p>{= =}</p>
<p>Dynamic replacement strings</p>
<h2 id="Defining-replacement-strings">Defining replacement strings</h2>
<h2 id="Copying-environment">Copying environment</h2>
<p>env_parallel</p>
<h2 id="Controlling-the-output1">Controlling the output</h2>
<h3 id="parset">parset</h3>
<p><b>parset</b> is a shell function to get the output from GNU <b>parallel</b> into shell variables.</p>
<p><b>parset</b> is fully supported for <b>Bash/Zsh/Ksh</b> and partially supported for <b>ash/dash</b>. I will assume you run <b>Bash</b>.</p>
<p>To activate <b>parset</b> you have to run:</p>
<pre><code>. `which env_parallel.bash`</code></pre>
<p>(replace <b>bash</b> with your shell's name).</p>
<p>Then you can run:</p>
<pre><code>parset a,b,c seq ::: 4 5 6
echo "$c"</code></pre>
<p>or:</p>
<pre><code>parset 'a b c' seq ::: 4 5 6
echo "$c"</code></pre>
<p>If you give a single variable, this will become an array:</p>
<pre><code>parset arr seq ::: 4 5 6
echo "${arr[1]}"</code></pre>
<p><b>parset</b> has one limitation: If it reads from a pipe, the output will be lost.</p>
<pre><code>echo This will not work | parset myarr echo
echo Nothing: "${myarr[*]}"</code></pre>
<p>Instead you can do this:</p>
<pre><code>echo This will work > tempfile
parset myarr echo < tempfile
echo "${myarr[*]}"</code></pre>
<p>sql csv</p>
<h2 id="Controlling-the-execution1">Controlling the execution</h2>
<p>--dryrun -v</p>
<h2 id="Remote-execution">Remote execution</h2>
<p>For this section you must have <b>ssh</b> access with no password to 2 servers: <b>$server1</b> and <b>$server2</b>.</p>
<pre><code>server1=server.example.com
server2=server2.example.net</code></pre>
<p>So you must be able to do this:</p>
<pre><code>ssh $server1 echo works
ssh $server2 echo works</code></pre>
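<p>It can be set up by running <b>ssh-keygen -t ed25519</b> with an empty passphrase, followed by <b>ssh-copy-id $server1</b> and <b>ssh-copy-id $server2</b>. (DSA keys, as created by <b>ssh-keygen -t dsa</b>, are no longer accepted by current OpenSSH versions.) Or you can use <b>ssh-agent</b>.</p>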
<h3 id="Workers">Workers</h3>
<h3 id="transferfile">--transferfile</h3>
<p><b>--transferfile</b> <i>filename</i> will transfer <i>filename</i> to the worker. <i>filename</i> can contain a replacement string:</p>
<pre><code>parallel -S $server1,$server2 --transferfile {} wc ::: example.*
parallel -S $server1,$server2 --transferfile {2} \
echo count {1} in {2}';' wc {1} {2} ::: -l -c ::: example.*</code></pre>
<p>A shorthand for <b>--transferfile {}</b> is <b>--transfer</b>.</p>
<h3 id="return">--return</h3>
<h3 id="cleanup">--cleanup</h3>
<p>A shorthand for <b>--transfer --return {} --cleanup</b> is <b>--trc {}</b>.</p>
<h2 id="Pipe-mode1">Pipe mode</h2>
<p>--pipepart</p>
<h2 id="Thats-it1">That's it</h2>
<h1 id="Advanced-usage">Advanced usage</h1>
<p>parset fifo, cmd substitution, arrayelements, array with var names and cmds, env_parset</p>
<p>env_parallel</p>
<p>Interfacing with R.</p>
<p>Interfacing with JSON/jq</p>
<pre><code>4dl() {
  board="$(printf -- '%s' "${1}" | cut -d '/' -f4)"
  thread="$(printf -- '%s' "${1}" | cut -d '/' -f6)"
  wget -qO- "https://a.4cdn.org/${board}/thread/${thread}.json" |
    jq -r '
      .posts
      | map(select(.tim != null))
      | map((.tim | tostring) + .ext)
      | map("https://i.4cdn.org/'"${board}"'/"+.)[]
    ' |
    parallel --gnu -j 0 wget -nv
}</code></pre>
<p>Interfacing with XML/?</p>
<p>Interfacing with HTML/?</p>
<h2 id="Controlling-the-execution2">Controlling the execution</h2>
<p>--termseq</p>
<h2 id="Remote-execution1">Remote execution</h2>
<pre><code>seq 10 | parallel --sshlogin 'ssh -i "key.pem" a@b.com' echo
seq 10 | PARALLEL_SSH='ssh -i "key.pem"' parallel --sshlogin a@b.com echo
seq 10 | parallel --ssh 'ssh -i "key.pem"' --sshlogin a@b.com echo</code></pre>
<p>ssh-agent</p>
<p>The sshlogin file format</p>
<p>Check if servers are up</p>
</body>
</html>