1. General
----------

This program splits data from stdin into blocks, which are sent as input to
parallel invocations of a command. The output of those invocations is then
concatenated in the right order and sent to stdout.

Splitting up and parallelizing jobs like this might be useful to speed up
compression using multiple CPU cores or even multiple computers.

For this approach to be useful, the compressed format needs to allow multiple
compressed files to be concatenated. This is the case for gzip, bzip2 and xz.
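
As a small illustration of this property (a sketch using plain gzip rather
than splitjob itself; the file names are made up for the example), two
independently compressed pieces can be joined with cat and still decompress
to the original data:

```shell
# Work in a scratch directory so the example leaves nothing behind.
dir=$(mktemp -d)

# Compress two pieces of data independently.
printf 'first block\n'  | gzip > "$dir/part1.gz"
printf 'second block\n' | gzip > "$dir/part2.gz"

# Concatenate the compressed files and decompress the result in one go.
result=$(cat "$dir/part1.gz" "$dir/part2.gz" | gzip -dc)
printf '%s\n' "$result"

rm -r "$dir"
```

The decompressed result contains both blocks in the order in which the
compressed files were concatenated.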

2. Installation
---------------

Step 1, unpack the archive:

tar -xJvf splitjob*.tar.xz

Step 2, compile:

cd splitjob-*
make

Step 3, become root and install:

su (and give password)
make install

3. Examples
-----------

Example 1, use multiple local cores:
splitjob -j 4 bzip2 < bigfile > bigfile.bz2

Example 2, use remote machines:
splitjob "ssh host1 gzip" "ssh host2 gzip" < f > f.gz

The above example assumes that ssh is configured to allow logins without asking
for a password. See the manpage for ssh-keygen, or search the web for examples
of how to accomplish this.

Example 3, use bigger blocks to reduce overhead:
splitjob -j 2 -b 10M gzip < file > file.gz

Example 4, parallel decompression:
splitjob -X -r 10 -j 10 -b 384M "xz -d -" < file.xz > file

4. Documentation
----------------

There is a manpage for splitjob, and a brief help text is shown by typing:

splitjob -h

5. Known problems
-----------------

Splitjob does its best to detect and avoid any problems. If a subcommand
fails, splitjob will by default retry a number of times before giving up and
exiting with a non-zero return value. However, like pbzip2, mpibzip2 and
bzip2smp, I would like to say: Use at your own risk! Verify the contents of
compressed files before relying on them. If splitjob exits with any return
value other than 0, its output should be discarded!
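
One possible way to follow this advice is sketched below. The helper
verify_roundtrip is hypothetical and not part of splitjob; it assumes the
original file is still available and that the decompressor writes to stdout.
It decompresses the result and compares it byte for byte with the original:

```shell
# verify_roundtrip ORIGINAL COMPRESSED DECOMPRESS_CMD [ARGS...]
# (hypothetical helper, not part of splitjob)
# Succeeds only if decompressing COMPRESSED reproduces ORIGINAL exactly.
verify_roundtrip() {
    orig=$1
    comp=$2
    shift 2
    # Run the decompressor on the compressed file and compare its output
    # with the original. cmp -s prints nothing; the exit status of the
    # pipeline, and thus of this function, is the exit status of cmp.
    "$@" < "$comp" | cmp -s - "$orig"
}

# Intended use, based on Example 1 (commented out because it needs splitjob
# and a file named bigfile):
#
# if splitjob -j 4 bzip2 < bigfile > bigfile.bz2.tmp &&
#    verify_roundtrip bigfile bigfile.bz2.tmp bzip2 -dc
# then
#     mv bigfile.bz2.tmp bigfile.bz2
# else
#     rm -f bigfile.bz2.tmp    # non-zero exit or mismatch: discard output
# fi
```

Writing to a temporary name first means that a file with the final name only
ever appears after both the exit status and the contents have been checked.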

During parallel decompression there is a risk that the compressed data contains
the magic bytes used to separate compressed blocks. This could happen by
coincidence, but more likely because the compression has been used recursively,
e.g. a compressed tar file or disk image file containing files compressed with
the same algorithm. Since version 3.1, splitjob attempts to avoid such
failures: when magic bytes are used to separate blocks and a failure is
detected, the retry merges the block with data from the next job. These
attempts might still end in failure if:

* A single block of compressed data contains more occurrences of the magic
  bytes than the selected number of retries. This will give the error message
  "Failed again, giving up!" and can be avoided by increasing the number of
  retries with the -r switch.

* A job has already sent some of its data to stdout and no longer keeps it
  in its buffer. This will give the error message "Got too much data and
  failed!" and can be avoided by increasing the block size with the -b switch.
