pcp is a parallel copy utility for files
and directories. it enables parallel copy
operations between network shares mounted
on each cluster node. the execution model
allows two levels of parallelism:

a) node-level (process) parallelism via a machinefile, and
b) thread-level parallelism within each copy process.
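a minimal sketch of how the two levels could split a list of files, assuming a simple round-robin policy: files are dealt first to copy processes (level a) and then, within each process, to its threads (level b). the function name `assign_work` and the round-robin policy are illustrative assumptions, not pcp's actual scheduler.

```python
from typing import Dict, List, Tuple


def assign_work(files: List[str], n_procs: int,
                n_threads: int) -> Dict[Tuple[int, int], List[str]]:
    """Hypothetical two-level round-robin split: files -> processes -> threads."""
    if n_procs < 1 or n_threads < 1:
        raise ValueError("need at least one process and one thread")
    plan: Dict[Tuple[int, int], List[str]] = {
        (p, t): [] for p in range(n_procs) for t in range(n_threads)
    }
    for i, path in enumerate(files):
        proc = i % n_procs                    # level a): which copy process
        thread = (i // n_procs) % n_threads   # level b): which thread inside it
        plan[(proc, thread)].append(path)
    return plan
```

with six files, two processes and two threads, process 0 / thread 0 gets `file0` and `file4`, while process 1 / thread 1 gets only `file3`. round-robin ignores file sizes, so a real scheduler would likely balance by bytes instead.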

todo:

each thread currently has only one copy buffer,
so reading and writing cannot overlap. add a
second buffer per thread and use asynchronous
writes to the destination.
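the proposed change is classic double buffering: while one buffer is being written to the destination, the next chunk is read into the other. a minimal sketch, assuming plain local file objects; a background writer thread stands in for true asynchronous writes, and the function name and default chunk size are illustrative only.

```python
import queue
import threading


def copy_double_buffered(src_path: str, dst_path: str,
                         chunk_size: int = 1 << 20) -> int:
    """Copy src to dst so that reading the next chunk overlaps writing the last one."""
    free = queue.Queue()    # empty buffers the reader may fill
    filled = queue.Queue()  # (buffer, length) pairs waiting to be written
    errors = []
    for _ in range(2):      # exactly two buffers: one being read into, one being written
        free.put(bytearray(chunk_size))

    def writer(dst) -> None:
        while True:
            item = filled.get()
            if item is None:          # sentinel: the reader is done
                return
            buf, n = item
            try:
                if not errors:
                    dst.write(memoryview(buf)[:n])
            except OSError as exc:
                errors.append(exc)    # remember the failure, keep recycling buffers
            free.put(buf)             # hand the buffer back to the reader

    total = 0
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        t = threading.Thread(target=writer, args=(dst,))
        t.start()
        try:
            while not errors:
                buf = free.get()      # blocks until the writer releases a buffer
                n = src.readinto(buf)
                if not n:             # end of file
                    break
                total += n
                filled.put((buf, n))
        finally:
            filled.put(None)
            t.join()
    if errors:
        raise errors[0]
    return total
```

the writer always returns a buffer to the free queue, even after an error, so the reader can never block forever waiting for one.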



installation:

unpack or copy the tarball into your network share.
add the binary folder to your PATH environment
variable; both the folder and the PATH setting
must be visible on all cluster nodes.

requirements:

stable, passwordless ssh access to all compute nodes.
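a quick way to check this requirement before a run: read the host names from the machinefile and try a non-interactive `ssh <host> true` on each one. `BatchMode=yes` and `ConnectTimeout` are standard OpenSSH client options (batch mode makes ssh fail instead of prompting for a password). the machinefile format assumed here (one host per line, optional `:slots` suffix or extra fields, `#` comments) and both function names are hypothetical.

```python
import subprocess
from typing import Callable, List


def read_machinefile(path: str) -> List[str]:
    """Return unique host names (assumed format: host[:slots] [extra fields], # comments)."""
    hosts: List[str] = []
    with open(path) as fh:
        for line in fh:
            line = line.split("#", 1)[0].strip()
            if not line:
                continue
            host = line.split()[0].split(":", 1)[0]
            if host not in hosts:
                hosts.append(host)
    return hosts


def hosts_without_passwordless_ssh(hosts: List[str],
                                   run: Callable = subprocess.run) -> List[str]:
    """Return the hosts on which a non-interactive 'ssh host true' fails."""
    failed = []
    for host in hosts:
        # BatchMode=yes: fail immediately instead of asking for a password
        cmd = ["ssh", "-o", "BatchMode=yes", "-o", "ConnectTimeout=5", host, "true"]
        result = run(cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        if result.returncode != 0:
            failed.append(host)
    return failed
```

for example, `hosts_without_passwordless_ssh(read_machinefile("/tmp/machinefile"))` should return an empty list before you start `pcp_run`.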

notes:

don't use wildcards (e.g. the asterisk) in file and directory names.
they are usually not expanded on the remote nodes.
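if you do want pattern matching, one workaround is to expand the patterns on the local node first and pass only explicit paths to `pcp_run`. a minimal sketch, assuming the argument order from the usage line below; the helper names are hypothetical, and the sketch only builds the command list without running it.

```python
import glob
from typing import List, Sequence


def expand_sources(patterns: Sequence[str]) -> List[str]:
    """Expand wildcards locally; fail loudly if a pattern matches nothing."""
    paths: List[str] = []
    for pattern in patterns:
        matches = sorted(glob.glob(pattern))
        if not matches:
            raise FileNotFoundError(f"no source matches {pattern!r}")
        paths.extend(matches)
    return paths


def build_pcp_command(machinefile: str, patterns: Sequence[str], dest_dir: str,
                      options: Sequence[str] = ()) -> List[str]:
    """Assemble a pcp_run argument list that contains only explicit paths."""
    return ["pcp_run", "-m", machinefile, *options,
            *expand_sources(patterns), dest_dir]
```

the resulting list can be handed to `subprocess.run(..., check=True)`. note that `glob` skips hidden files for `*`, just as most shells do.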

use thread-level parallelism only if the operating system supports
thread-safe i/o operations (open, read, write, etc.).
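one way threads can share an open file safely is positional i/o: POSIX pread()/pwrite() read and write at an explicit offset without moving the descriptor's file position, so threads do not race on a shared seek pointer. a minimal sketch using Python's `os.pread`/`os.pwrite` (unix only); the function name and parameters are hypothetical, and the sketch does not describe how pcp.bin itself is implemented.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def copy_with_threads(src_path: str, dst_path: str, n_threads: int = 4,
                      chunk_size: int = 1 << 20) -> None:
    """Copy one file with several threads using positional I/O (POSIX only)."""
    size = os.path.getsize(src_path)
    src_fd = os.open(src_path, os.O_RDONLY)
    dst_fd = os.open(dst_path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        os.ftruncate(dst_fd, size)  # set the final file size up front

        def copy_chunk(offset: int) -> None:
            length = min(chunk_size, size - offset)
            done = 0
            while done < length:
                # explicit offsets: no shared file position, no lseek() race
                data = os.pread(src_fd, length - done, offset + done)
                if not data:
                    raise OSError(f"unexpected end of file at {offset + done}")
                written = 0
                while written < len(data):  # pwrite may write fewer bytes
                    written += os.pwrite(dst_fd, data[written:],
                                         offset + done + written)
                done += len(data)

        with ThreadPoolExecutor(max_workers=n_threads) as pool:
            # list() forces completion and re-raises a worker's exception
            list(pool.map(copy_chunk, range(0, size, chunk_size)))
    finally:
        os.close(src_fd)
        os.close(dst_fd)
```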

usage: 

pcp_run -m <machinefile> [SCRIPT OPTIONS] [pcp.bin OPTIONS] SOURCE(S) DEST_DIRECTORY

start script options:

-n    start <procs> processes on the hosts from the machinefile
-h    show help
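this README does not spell out how `-n` places processes on the machinefile hosts. a common convention, assumed for this sketch, is to cycle through the listed hosts until <procs> processes have been placed. the function name `place_processes` is hypothetical.

```python
from itertools import cycle, islice
from typing import List


def place_processes(hosts: List[str], n_procs: int) -> List[str]:
    """Return the host for each of n_procs processes, cycling through the machinefile."""
    if not hosts:
        raise ValueError("machinefile lists no hosts")
    return list(islice(cycle(hosts), n_procs))
```

with three hosts and `n_procs=5`, the first two hosts receive two processes each and the third receives one.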

pcp options:

-t    number of threads per copy process
-s    partition copy threshold
-c    chunk size for copy operations
-l    show thread statistics
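the exact semantics of `-s` and `-c` are not documented here. one plausible reading, sketched below purely as an assumption, is that files at or below the threshold are copied whole by a single thread, while larger files are split into chunk-sized partitions that threads can copy independently. `plan_copy` is hypothetical, and both values are assumed to be in bytes.

```python
from typing import List, Tuple


def plan_copy(file_size: int, threshold: int,
              chunk_size: int) -> List[Tuple[int, int]]:
    """Return (offset, length) partitions for one file under the assumed -s/-c semantics."""
    if chunk_size <= 0:
        raise ValueError("chunk size must be positive")
    if file_size <= threshold:
        return [(0, file_size)]  # small file: one partition, copied by one thread
    return [(off, min(chunk_size, file_size - off))
            for off in range(0, file_size, chunk_size)]
```

for example, a 1100-byte file with threshold 1000 and chunk size 512 becomes three partitions: (0, 512), (512, 512) and (1024, 76).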

examples:

# machinefile (process) parallelism
pcp_run -m /tmp/machinefile /netshare1/bigData /netshare1/many_files /netshare2/scratch
# machinefile (process) plus intra-process (thread) parallelism, 8 threads per process
pcp_run -m /tmp/machinefile -t 8 /netshare1/bigData /netshare1/many_files /netshare2/scratch


enjoy, cl.



