HOME


sh-3ll 1.0
DIR:/usr/local/src/parallel-20231122/src/
Upload File :
Current File : //usr/local/src/parallel-20231122/src/parallel_book.html
<?xml version="1.0" ?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>GNU Parallel book</title>
<meta http-equiv="content-type" content="text/html; charset=utf-8" />
<link rev="made" href="mailto:root@localhost" />
</head>

<body>



<ul id="index">
  <li><a href="#Why-should-you-read-this-book">Why should you read this book?</a></li>
  <li><a href="#Learn-GNU-Parallel-in-5-minutes">Learn GNU Parallel in 5 minutes</a>
    <ul>
      <li><a href="#Input-sources">Input sources</a></li>
      <li><a href="#Building-the-command-line">Building the command line</a></li>
      <li><a href="#Controlling-the-output">Controlling the output</a></li>
      <li><a href="#Controlling-the-execution">Controlling the execution</a></li>
      <li><a href="#Pipe-mode">Pipe mode</a></li>
      <li><a href="#Thats-it">That&#39;s it</a></li>
    </ul>
  </li>
  <li><a href="#Learn-GNU-Parallel-in-an-hour">Learn GNU Parallel in an hour</a>
    <ul>
      <li><a href="#Input-sources1">Input sources</a>
        <ul>
          <li><a href="#Linking-input-sources">Linking input sources</a></li>
        </ul>
      </li>
      <li><a href="#Building-the-command-line1">Building the command line</a>
        <ul>
          <li><a href="#The-command">The command</a></li>
          <li><a href="#The-replacement-strings">The replacement strings</a>
            <ul>
              <li><a href="#The-positional-replacement-strings">The positional replacement strings</a></li>
            </ul>
          </li>
        </ul>
      </li>
    </ul>
  </li>
  <li><a href="#Replacement-strings">Replacement strings</a>
    <ul>
      <li><a href="#Defining-replacement-strings">Defining replacement strings</a></li>
      <li><a href="#Copying-environment">Copying environment</a></li>
      <li><a href="#Controlling-the-output1">Controlling the output</a>
        <ul>
          <li><a href="#parset">parset</a></li>
        </ul>
      </li>
      <li><a href="#Controlling-the-execution1">Controlling the execution</a></li>
      <li><a href="#Remote-execution">Remote execution</a>
        <ul>
          <li><a href="#Workers">Workers</a></li>
          <li><a href="#transferfile">--transferfile</a></li>
          <li><a href="#return">--return</a></li>
          <li><a href="#cleanup">--cleanup</a></li>
        </ul>
      </li>
      <li><a href="#Pipe-mode1">Pipe mode</a></li>
      <li><a href="#Thats-it1">That&#39;s it</a></li>
    </ul>
  </li>
  <li><a href="#Advanced-usage">Advanced usage</a>
    <ul>
      <li><a href="#Controlling-the-execution2">Controlling the execution</a></li>
      <li><a href="#Remote-execution1">Remote execution</a></li>
    </ul>
  </li>
</ul>

<h1 id="Why-should-you-read-this-book">Why should you read this book?</h1>

<p>If you write shell scripts to do the same processing for different input, then GNU <b>parallel</b> will make your life easier and make your scripts run faster.</p>

<p>The book is written so you get the juicy parts first: The goal is that you read just enough to get you going. GNU <b>parallel</b> has an overwhelming amount of special features to help in different situations, and to avoid overloading you with information, the most used features are presented first.</p>

<p>All the examples are tested in Bash, and most will work in other shells, too, but there are a few exceptions. So you are recommended to use Bash while testing out the examples.</p>

<h1 id="Learn-GNU-Parallel-in-5-minutes">Learn GNU Parallel in 5 minutes</h1>

<p>You just need to run commands in parallel. You do not care about fine tuning.</p>

<p>To get going please run this to make some example files:</p>

<pre><code># If your system does not have &#39;seq&#39;, replace &#39;seq&#39; with &#39;jot&#39;
seq 5 | parallel seq {} &#39;&gt;&#39; example.{}</code></pre>

<h2 id="Input-sources">Input sources</h2>

<p>GNU <b>parallel</b> reads values from input sources. One input source is the command line. The values are put after <b>:::</b> :</p>

<pre><code>parallel echo ::: 1 2 3 4 5</code></pre>

<p>This makes it easy to run the same program on some files:</p>

<pre><code>parallel wc ::: example.*</code></pre>

<p>If you give multiple <b>:::</b>s, GNU <b>parallel</b> will generate all combinations:</p>

<pre><code>parallel wc ::: -l -c ::: example.*</code></pre>

<p>GNU <b>parallel</b> can also read the values from stdin (standard input):</p>

<pre><code>seq 5 | parallel echo</code></pre>

<h2 id="Building-the-command-line">Building the command line</h2>

<p>The command line is put before the <b>:::</b>. It can contain contain a command and options for the command:</p>

<pre><code>parallel wc -l ::: example.*</code></pre>

<p>The command can contain multiple programs. Just remember to quote characters that are interpreted by the shell (such as <b>;</b>):</p>

<pre><code>parallel echo counting lines&#39;;&#39; wc -l ::: example.*</code></pre>

<p>The value will normally be appended to the command, but can be placed anywhere by using the replacement string <b>{}</b>:</p>

<pre><code>parallel echo counting {}&#39;;&#39; wc -l {} ::: example.*</code></pre>

<p>When using multiple input sources you use the positional replacement strings <b>{1}</b> and <b>{2}</b>:</p>

<pre><code>parallel echo count {1} in {2}&#39;;&#39; wc {1} {2} ::: -l -c ::: example.*</code></pre>

<p>You can check what will be run with <b>--dry-run</b>:</p>

<pre><code>parallel --dry-run echo count {1} in {2}&#39;;&#39; wc {1} {2} ::: -l -c ::: example.*</code></pre>

<p>This is a good idea to do for every command until you are comfortable with GNU <b>parallel</b>.</p>

<h2 id="Controlling-the-output">Controlling the output</h2>

<p>The output will be printed as soon as the command completes. This means the output may come in a different order than the input:</p>

<pre><code>parallel sleep {}&#39;;&#39; echo {} done ::: 5 4 3 2 1</code></pre>

<p>You can force GNU <b>parallel</b> to print in the order of the values with <b>--keep-order</b>/<b>-k</b>. This will still run the commands in parallel. The output of the later jobs will be delayed, until the earlier jobs are printed:</p>

<pre><code>parallel -k sleep {}&#39;;&#39; echo {} done ::: 5 4 3 2 1</code></pre>

<h2 id="Controlling-the-execution">Controlling the execution</h2>

<p>If your jobs are compute intensive, you will most likely run one job for each core in the system. This is the default for GNU <b>parallel</b>.</p>

<p>But sometimes you want more jobs running. You control the number of job slots with <b>-j</b>. Give <b>-j</b> the number of jobs to run in parallel:</p>

<pre><code>parallel -j50 \
  wget https://ftpmirror.gnu.org/parallel/parallel-{1}{2}22.tar.bz2 \
  ::: 2012 2013 2014 2015 2016 \
  ::: 01 02 03 04 05 06 07 08 09 10 11 12</code></pre>

<h2 id="Pipe-mode">Pipe mode</h2>

<p>GNU <b>parallel</b> can also pass blocks of data to commands on stdin (standard input):</p>

<pre><code>seq 1000000 | parallel --pipe wc</code></pre>

<p>This can be used to process big text files. By default GNU <b>parallel</b> splits on \n (newline) and passes a block of around 1 MB to each job.</p>

<h2 id="Thats-it">That&#39;s it</h2>

<p>You have now learned the basic use of GNU <b>parallel</b>. This will probably cover most cases of your use of GNU <b>parallel</b>.</p>

<p>The rest of this document will go into more details on each of the sections and cover special use cases.</p>

<h1 id="Learn-GNU-Parallel-in-an-hour">Learn GNU Parallel in an hour</h1>

<p>In this part we will dive deeper into what you learned in the first 5 minutes.</p>

<p>To get going please run this to make some example files:</p>

<pre><code>seq 6 &gt; seq6
seq 6 -1 1 &gt; seq-6</code></pre>

<h2 id="Input-sources1">Input sources</h2>

<p>On top of the command line, input sources can also be stdin (standard input or &#39;-&#39;), files and fifos and they can be mixed. Files are given after <b>-a</b> or <b>::::</b>. So these all do the same:</p>

<pre><code>parallel echo Dice1={1} Dice2={2} ::: 1 2 3 4 5 6 ::: 6 5 4 3 2 1
parallel echo Dice1={1} Dice2={2} :::: &lt;(seq 6) :::: &lt;(seq 6 -1 1)
parallel echo Dice1={1} Dice2={2} :::: seq6 seq-6
parallel echo Dice1={1} Dice2={2} :::: seq6 :::: seq-6
parallel -a seq6 -a seq-6 echo Dice1={1} Dice2={2}
parallel -a seq6 echo Dice1={1} Dice2={2} :::: seq-6
parallel echo Dice1={1} Dice2={2} ::: 1 2 3 4 5 6 :::: seq-6
cat seq-6 | parallel echo Dice1={1} Dice2={2} :::: seq6 -</code></pre>

<p>If stdin (standard input) is the only input source, you do not need the &#39;-&#39;:</p>

<pre><code>cat seq6 | parallel echo Dice1={1}</code></pre>

<h3 id="Linking-input-sources">Linking input sources</h3>

<p>You can link multiple input sources with <b>:::+</b> and <b>::::+</b>:</p>

<pre><code>parallel echo {1}={2} ::: I II III IV V VI :::+ 1 2 3 4 5 6
parallel echo {1}={2} ::: I II III IV V VI ::::+ seq6</code></pre>

<p>The <b>:::+</b> (and <b>::::+</b>) will link each value to the corresponding value in the previous input source, so value number 3 from the first input source will be linked to value number 3 from the second input source.</p>

<p>You can combine <b>:::+</b> and <b>:::</b>, so you link 2 input sources, but generate all combinations with other input sources:</p>

<pre><code>parallel echo Dice1={1}={2} Dice2={3}={4} ::: I II III IV V VI ::::+ seq6 \
  ::: VI V IV III II I ::::+ seq-6</code></pre>

<h2 id="Building-the-command-line1">Building the command line</h2>

<h3 id="The-command">The command</h3>

<p>The command can be a script, a binary or a Bash function if the function is exported using <b>export -f</b>:</p>

<pre><code># Works only in Bash
my_func() {
  echo in my_func &quot;$1&quot;
}
export -f my_func
parallel my_func ::: 1 2 3</code></pre>

<p>If the command is complex, it often improves readability to make it into a function.</p>

<h3 id="The-replacement-strings">The replacement strings</h3>

<p>GNU <b>parallel</b> has some replacement strings to make it easier to refer to the input read from the input sources.</p>

<p>If the input is <b>mydir/mysubdir/myfile.myext</b> then:</p>

<pre><code>{} = mydir/mysubdir/myfile.myext
{.} = mydir/mysubdir/myfile
{/} = myfile.myext
{//} = mydir/mysubdir
{/.} = myfile
{#} = the sequence number of the job
{%} = the job slot number</code></pre>

<p>When a job is started it gets a sequence number that starts at 1 and increases by 1 for each new job. The job also gets assigned a slot number. This number is from 1 to the number of jobs running in parallel. It is unique between the running jobs, but is re-used as soon as a job finishes.</p>

<h4 id="The-positional-replacement-strings">The positional replacement strings</h4>

<p>The replacement strings have corresponding positional replacement strings. If the value from the 3rd input source is <b>mydir/mysubdir/myfile.myext</b>:</p>

<pre><code>{3} = mydir/mysubdir/myfile.myext
{3.} = mydir/mysubdir/myfile
{3/} = myfile.myext
{3//} = mydir/mysubdir
{3/.} = myfile</code></pre>

<p>So the number of the input source is simply prepended inside the {}&#39;s.</p>

<h1 id="Replacement-strings">Replacement strings</h1>

<p>--plus replacement strings</p>

<p>change the replacement string (-I --extensionreplace --basenamereplace --basenamereplace --dirnamereplace --basenameextensionreplace --seqreplace --slotreplace</p>

<p>--header with named replacement string</p>

<p>{= =}</p>

<p>Dynamic replacement strings</p>

<h2 id="Defining-replacement-strings">Defining replacement strings</h2>

<h2 id="Copying-environment">Copying environment</h2>

<p>env_parallel</p>

<h2 id="Controlling-the-output1">Controlling the output</h2>

<h3 id="parset">parset</h3>

<p><b>parset</b> is a shell function to get the output from GNU <b>parallel</b> into shell variables.</p>

<p><b>parset</b> is fully supported for <b>Bash/Zsh/Ksh</b> and partially supported for <b>ash/dash</b>. I will assume you run <b>Bash</b>.</p>

<p>To activate <b>parset</b> you have to run:</p>

<pre><code>. `which env_parallel.bash`</code></pre>

<p>(replace <b>bash</b> with your shell&#39;s name).</p>

<p>Then you can run:</p>

<pre><code>parset a,b,c seq ::: 4 5 6
echo &quot;$c&quot;</code></pre>

<p>or:</p>

<pre><code>parset &#39;a b c&#39; seq ::: 4 5 6
echo &quot;$c&quot;</code></pre>

<p>If you give a single variable, this will become an array:</p>

<pre><code>parset arr seq ::: 4 5 6
echo &quot;${arr[1]}&quot;</code></pre>

<p><b>parset</b> has one limitation: If it reads from a pipe, the output will be lost.</p>

<pre><code>echo This will not work | parset myarr echo
echo Nothing: &quot;${myarr[*]}&quot;</code></pre>

<p>Instead you can do this:</p>

<pre><code>echo This will work &gt; tempfile
parset myarr echo &lt; tempfile
echo ${myarr[*]}</code></pre>

<p>sql cvs</p>

<h2 id="Controlling-the-execution1">Controlling the execution</h2>

<p>--dryrun -v</p>

<h2 id="Remote-execution">Remote execution</h2>

<p>For this section you must have <b>ssh</b> access with no password to 2 servers: <b>$server1</b> and <b>$server2</b>.</p>

<pre><code>server1=server.example.com
server2=server2.example.net</code></pre>

<p>So you must be able to do this:</p>

<pre><code>ssh $server1 echo works
ssh $server2 echo works</code></pre>

<p>It can be setup by running &#39;ssh-keygen -t dsa; ssh-copy-id $server1&#39; and using an empty passphrase. Or you can use <b>ssh-agent</b>.</p>

<h3 id="Workers">Workers</h3>

<h3 id="transferfile">--transferfile</h3>

<p><b>--transferfile</b> <i>filename</i> will transfer <i>filename</i> to the worker. <i>filename</i> can contain a replacement string:</p>

<pre><code>parallel -S $server1,$server2 --transferfile {} wc ::: example.*
parallel -S $server1,$server2 --transferfile {2} \
   echo count {1} in {2}&#39;;&#39; wc {1} {2} ::: -l -c ::: example.*</code></pre>

<p>A shorthand for <b>--transferfile {}</b> is <b>--transfer</b>.</p>

<h3 id="return">--return</h3>

<h3 id="cleanup">--cleanup</h3>

<p>A shorthand for <b>--transfer --return {} --cleanup</b> is <b>--trc {}</b>.</p>

<h2 id="Pipe-mode1">Pipe mode</h2>

<p>--pipepart</p>

<h2 id="Thats-it1">That&#39;s it</h2>

<h1 id="Advanced-usage">Advanced usage</h1>

<p>parset fifo, cmd substitution, arrayelements, array with var names and cmds, env_parset</p>

<p>env_parallel</p>

<p>Interfacing with R.</p>

<p>Interfacing with JSON/jq</p>

<p>4dl() { board=&quot;$(printf -- &#39;%s&#39; &quot;${1}&quot; | cut -d &#39;/&#39; -f4)&quot; thread=&quot;$(printf -- &#39;%s&#39; &quot;${1}&quot; | cut -d &#39;/&#39; -f6)&quot; wget -qO- &quot;https://a.4cdn.org/${board}/thread/${thread}.json&quot; | jq -r &#39; .posts | map(select(.tim != null)) | map((.tim | tostring) + .ext) | map(&quot;https://i.4cdn.org/&#39;&quot;${board}&quot;&#39;/&quot;+.)[] &#39; | parallel --gnu -j 0 wget -nv }</p>

<p>Interfacing with XML/?</p>

<p>Interfacing with HTML/?</p>

<h2 id="Controlling-the-execution2">Controlling the execution</h2>

<p>--termseq</p>

<h2 id="Remote-execution1">Remote execution</h2>

<p>seq 10 | parallel --sshlogin &#39;ssh -i &quot;key.pem&quot; a@b.com&#39; echo</p>

<p>seq 10 | PARALLEL_SSH=&#39;ssh -i &quot;key.pem&quot;&#39; parallel --sshlogin a@b.com echo</p>

<p>seq 10 | parallel --ssh &#39;ssh -i &quot;key.pem&quot;&#39; --sshlogin a@b.com echo</p>

<p>ssh-agent</p>

<p>The sshlogin file format</p>

<p>Check if servers are up</p>


</body>

</html>