DataMover User Guide
 
Lawrence Berkeley National Laboratory
Scientific Data Management Research Group
Junmin Gu, Arie Shoshani, Alex Sim
 
April 28, 2004
 
All the licenses are specified in the DataMover package.
 
DataMover consists of three core components and six utility components.
srm-get, srm-put and srm-copy are core components, and srm-status, srm-ls, srm-mkdir, srm-stage, srm-ping, and srm-abort are utility components.
 
The following sections describe the options and functions of each component, with examples (the examples are illustrative, not real).
 
Notes: 
1. Where a source URL is required, use one of the following formats:
srm://host:port/file_path 
srm://host:port/file_path&msshost=mss_host&mssport=mss_port&remoteobj=corba_obj 
gsiftp://host:port/file_path
file:/local_file_path  or file:////absolute_file_path
 
If the source URL contains & or =, enclose it in double quotes.
 
e.g. srm://garchive.nersc.gov/nersc/gc5/asim/test
e.g. srm://sleepy.ccs.ornl.gov:7031/home/ACPI/B06.44/instant_6/B06.44.atm.1996-01_hb.nc?remoteobj=HRMServerLBNL&msshost=hpss.ccs.ornl.gov&mssport=2121
e.g. file:/home/dm/srm/data3/hrm/hrmcache/myfile
e.g. gsiftp://dataportal.ucar.edu/tmp/file_name
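
The quoting rule in Note 1 can be checked in a script before a command line is assembled. The following is a minimal sketch, assuming a POSIX shell; the helper names needs_quoting and shell_safe are hypothetical and are not part of DataMover. Note that shlex.quote wraps the URL in single quotes rather than double quotes, which protects & and = equally well.

```python
import shlex


def needs_quoting(url: str) -> bool:
    """Return True if the URL contains '&' or '=' (the rule stated in Note 1)."""
    return "&" in url or "=" in url


def shell_safe(url: str) -> str:
    """Quote the URL for a POSIX shell; safe strings are returned unchanged."""
    return shlex.quote(url)


# URLs taken from the examples in Note 1.
plain = "srm://garchive.nersc.gov/nersc/gc5/asim/test"
with_query = (
    "srm://sleepy.ccs.ornl.gov:7031/home/ACPI/B06.44/instant_6/"
    "B06.44.atm.1996-01_hb.nc?remoteobj=HRMServerLBNL"
    "&msshost=hpss.ccs.ornl.gov&mssport=2121"
)

for url in (plain, with_query):
    print(needs_quoting(url), shell_safe(url))
```
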
 
 
2. Where an input file is required, each line has the following format:
sourceURL  fileSize(optional)  targetURL(optional)
Fields are separated by a space. sourceURL is required; fileSize and targetURL are optional.
 
e.g. the input file can contain the following on a line:
srm://sleepy.ccs.ornl.gov:7031/home/ACPI/B06.44/B06.44.atm.2_hb.nc?remoteobj=HRMServerORNL&msshost=hpss.ccs.ornl.gov 352334560 srm://garchive.nersc.gov/nersc/asim/hrm/copy/B06.44.atm.2_hb.nc
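
An input file in the format of Note 2 can be validated before it is submitted. The following is a minimal sketch, not DataMover's own parser; the names Entry and parse_line are hypothetical. It assumes that a second field made only of digits is the file size, that any other trailing field is the target URL, and that blank lines are skipped.

```python
from typing import NamedTuple, Optional


class Entry(NamedTuple):
    source_url: str
    file_size: Optional[int]   # bytes, if given
    target_url: Optional[str]  # if given


def parse_line(line: str) -> Optional[Entry]:
    """Split one input-file line into sourceURL, optional size, optional targetURL."""
    fields = line.split()
    if not fields:
        return None  # blank line (assumption: skipped)
    source, rest = fields[0], fields[1:]
    size = None
    if rest and rest[0].isdigit():
        size = int(rest[0])
        rest = rest[1:]
    target = rest[0] if rest else None
    return Entry(source, size, target)


# A line in the style of the sample input file shown under SRM-COPY, Example 8.
sample = (
    "srm://sleepy.ccs.ornl.gov:7031/home/ACPI/B06.44/B06.44.atm.1995-02_hb.nc"
    "?remoteobj=HRMServerORNL&msshost=hpss.ccs.ornl.gov 352334560"
)
entry = parse_line(sample)
print(entry.source_url)
print(entry.file_size, entry.target_url)
```

A URL that was accidentally split by a stray space shows up here as an unexpected target URL, which makes such errors easy to spot.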
 
 
 
SRM-COPY
 
A component to request a file, a set of files or a directory to be copied over to the designated target directory or as a target file.
 
-help        
print this message
-d
debugging messages
-f path      
input file path for request
-s url        
source URL for request
-t url        
target URL for request
-sd path
input directory path for request
-td url       
target directory on HPSS for request (required)
-g string     
logical file name for source URL for request
-b int       
exact file size of source URL in bytes (default 2GB)
-i           
true if target is not on HPSS (default false)
-l path
log file
-c path      
path to the configuration file (default: ./hrm.rc)
-u string    
user id (default login@host)
-w           
wait until file is archived to MSS (default false)
-x int       
Block size for gridftp
-y int       
TCP buffer size for gridftp
-z int       
Number of parallel streams for gridftp
-r           
Recursive directory (default false)
-o option    
Option for overwrite mode [yes, no, diff, diff_size, less_size] (default: no). diff: no overwriting, only differences in directory listings.
-ai path    
Path for Storage information if necessary (no default) in case of KERBEROS or SSH, e.g. $HOME/.ssh/identity.pub
-al login   
Storage login name if necessary (no default)
-ap passwd
Storage passwd if necessary (no default)
-at type    
Storage login type when provided (default GSI) [PLAIN, GSI, ENCRYPT, KERBEROS, SSH, SCP, NONE]. GSI, SSH, ENCRYPT and NONE are implemented so far. To work with the SSH type, srm-copy has to run on the target host, e.g. dataportal.ucar.edu.
-aj login   
Storage project id if necessary (no default)
-ad int     
Storage retention period in days if necessary (no default)
-ar string   
Storage read passwd if necessary (no default)
-aw string   
Storage write passwd if necessary (no default)
-ei path    
Path for Storage information if necessary (no default) in case of KERBEROS or SSH, e.g. $HOME/.ssh/identity.pub
-el login   
Storage login name if necessary (no default)
-ep passwd  
Storage passwd if necessary (no default)
-et type    
Storage login type when provided (default GSI) [PLAIN, GSI, ENCRYPT, KERBEROS, SSH, SCP, NONE] GSI, SSH, ENCRYPT and NONE are implemented so far
-ej login   
Storage project id if necessary (no default)
-ed int     
Storage retention period in days if necessary (no default)
-er string   
Storage read passwd if necessary (no default)
-ew string   
Storage write passwd if necessary (no default)
 
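Note: As the examples below show, the -a* options describe the source storage and the -e* options describe the target storage. To work with the SSH type, srm-copy has to run on the target host, e.g. dataportal.ucar.edu.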
  
Example 1: srm-copy.sol -f /tmp/inputfile -td gsiftp://archive.nersc.gov/nersc/asim  
          This example will copy the entries in /tmp/inputfile to the specified destination.
 
Example 2: srm-copy.sol -s "srm://srm.lbl.gov:5537/nersc/junmin/small_217?obj=HRMServerLBNL&msshost=garchive.nersc.gov" -t srm://garchive.nersc.gov/nersc/asim/test/small.217.dat -b 1234567 -at GSI -et GSI
          This example will copy /nersc/junmin/small_217 on garchive.nersc.gov to /nersc/asim/test/small.217.dat on garchive.nersc.gov via HRMServerLBNL, with GSI authentication on both ends.
             
Example 3: srm-copy.sol -s "srm://srm.lbl.gov:7031/nersc/junmin/small_217?obj=HRMServerLBNL&msshost=archive.nersc.gov" -t srm://archive.nersc.gov/nersc/asim/test/small.217.dat -b 1234567 -at ENCRYPT -et ENCRYPT
            This example will copy /nersc/junmin/small_217 on archive.nersc.gov to /nersc/asim/test/small.217.dat on archive.nersc.gov via HRMServerLBNL, with ENCRYPT authentication on both ends.
 
  Example 4:  srm-copy.sol -s "srm://sleepy.ccs.ornl.gov:7031/home/ACPI/B06.44/B06.44.atm.1996-01_hb.nc?remoteobj=HRMServerORNL&msshost=hpss.ccs.ornl.gov" -t srm://garchive.nersc.gov/nersc/asim/test/B06.44.atm.1996-01_hb.nc -b 390083812 -at PLAIN -et GSI
            This example will copy /home/ACPI/B06.44/B06.44.atm.1996-01_hb.nc on hpss.ccs.ornl.gov to /nersc/asim/test/B06.44.atm.1996-01_hb.nc on garchive.nersc.gov via HRMServerORNL, with PLAIN authentication at the source and GSI authentication at the target.
 
  Example 5:  srm-copy.sol -s "srm://srm.lbl.gov:7031/nersc/junmin/test/small_217?obj=HRMServerLBNL&msshost=garchive.nersc.gov" -t srm://dataportal.ucar.edu:7031/ASIM/test2/mytest.file -b 1234567 -at GSI -et SSH
            This example will copy /nersc/junmin/test/small_217 on garchive.nersc.gov to /ASIM/test2/mytest.file on dataportal.ucar.edu via HRMServerLBNL, with GSI authentication at the source and SSH authentication at the target.
 
 Example 6: srm-copy.sol -sd "srm://sleepy.ccs.ornl.gov:6161/home/ACPI/B06.44/instant_6?remoteobj=HRMServerORNL&msshost=hpss.ccs.ornl.gov"  -td srm://garchive.nersc.gov/nersc/asim/test2 -d -w -at plain
            This example will copy all files in /home/ACPI/B06.44/instant_6  on hpss.ccs.ornl.gov to /nersc/asim/test2  directory on garchive.nersc.gov  via HRMServerORNL with PLAIN  authentication at the source and GSI authentication at the target.
 
 Example 7: srm-copy.sol -sd "srm://sleepy.ccs.ornl.gov:6161/home/ACPI/B06.44/instant_6/*.nc?remoteobj=HRMServerORNL&msshost=hpss.ccs.ornl.gov"  -td srm://garchive.nersc.gov/nersc/asim/test2 -d -w -at plain
            This example will copy all files matching *.nc in /home/ACPI/B06.44/instant_6 on hpss.ccs.ornl.gov to the /nersc/asim/test2 directory on garchive.nersc.gov via HRMServerORNL, with PLAIN authentication at the source and GSI authentication at the target.
 
 Example 8: srm-copy.sol -f ./sample/sample.mcopy.ornl.lbnl -t srm://garchive.nersc.gov/nersc/asim/hrm/mcopy -d -w  
                  Where  ./sample/sample.mcopy.ornl.lbnl is: 
srm://sleepy.ccs.ornl.gov:7031/home/ACPI/B06.44/B06.44.atm.1995-02_hb.nc?remoteobj=HRMServerORNL&msshost=hpss.ccs.ornl.gov 352334560
srm://sleepy.ccs.ornl.gov:7031/home/ACPI/B06.44/B06.44.atm.1995-03_hb.nc?remoteobj=HRMServerORNL&msshost=hpss.ccs.ornl.gov 390083812
            This example will copy all file entries in ./sample/sample.mcopy.ornl.lbnl  to /nersc/asim/hrm/mcopy directory on garchive.nersc.gov  with GSI authentication at the target. Source instructions are in the source URL itself.
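
When srm-copy is launched from a script rather than typed at a shell, passing the arguments as a list avoids the quoting issue with & and = altogether. The following is a minimal sketch that only assembles the argument list and does not run anything. The helper name build_copy_args is hypothetical; the executable name srm-copy.sol and the options -s, -t, -b, -at and -et are taken from the option list and examples above, and the reading of -at as the source login type and -et as the target login type follows the example descriptions.

```python
import shlex
from typing import List, Optional


def build_copy_args(source: str, target: str,
                    size: Optional[int] = None,
                    source_auth: Optional[str] = None,
                    target_auth: Optional[str] = None) -> List[str]:
    """Assemble an srm-copy argument list for a single-file copy."""
    args = ["srm-copy.sol", "-s", source, "-t", target]
    if size is not None:
        args += ["-b", str(size)]     # exact file size in bytes
    if source_auth:
        args += ["-at", source_auth]  # login type at the source (assumption)
    if target_auth:
        args += ["-et", target_auth]  # login type at the target (assumption)
    return args


# Values taken from Example 4 above.
args = build_copy_args(
    "srm://sleepy.ccs.ornl.gov:7031/home/ACPI/B06.44/B06.44.atm.1996-01_hb.nc"
    "?remoteobj=HRMServerORNL&msshost=hpss.ccs.ornl.gov",
    "srm://garchive.nersc.gov/nersc/asim/test/B06.44.atm.1996-01_hb.nc",
    size=390083812, source_auth="PLAIN", target_auth="GSI",
)

# shlex.join (Python 3.8+) renders the list as a copy-and-paste shell command.
print(shlex.join(args))
```

The list can be handed to subprocess.run(args) as is; no shell is involved, so the source URL needs no extra quoting.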
 
 
 
SRM-GET (deprecated)
 
A component to request a file, a set of files or a directory to be copied over to the designated target directory or as a target file.
 
-help
Print options list
-d
debugging messages
-f path
input file path for the request
-s url      
Single source URL 
-sd url
source directory URL on MSS (format: srm://host:port/dir/path)
-sourced url
source directory URL on MSS (format: srm://host:port/dir/path)
-t url      
Single target URL for the request (default: file:/$PWD/same_file_name)
-td url 
target directory URL for files. If no target is provided, the following format will be used by default: file:/$PWD/dir_path/same_file_name 
-tkdir       
do not create target directory structure (default: false)
By default, target directory structure will be created recursively if there is any from the source.
-trdir  
no target directory structure, and put all files into one flat directory (default: false)
-g string   
Logical file name for source URL for a single file request (optional)
-b int      
file size of the source URL in bytes if known (default: 2GB)
-l path      
log file path
-c path      
path to the configuration file (default: hrm.rc)
-u string    
user id (default: login@host)
-v int
max number of concurrent file transfers to targets (default: 2)
-x int       
Block size for gridftp
-y int
TCP buffer size for gridftp
-z int       
Number of parallel streams for gridftp
-r 
recursive directory (default false)
-al login   
Storage login name if necessary (no default)
-ap passwd  
Storage password if necessary (no default)
-at type    
Storage login type when provided (default GSI)  [PLAIN, GSI, ENCRYPT, KERBEROS, SSH, SCP, NONE]. GSI, ENCRYPT, PLAIN and NONE are implemented so far
-aj login   
Storage project id if necessary (no default)
-ad int
Storage Retention period in days if necessary (no default)
-ar string   
Storage read password if necessary (no default)
-aw string   
Storage write password if necessary (no default)
-ai path
Path for Storage information if necessary (no default) in case of KERBEROS or SSH
 
  example 1: srm-get.sol -f /tmp/myinputfile -t file:/tmp/my_directory_path -at gsi 
                  format of input file: sourceURL fileSize(optional) targetURL(optional)
                  where sourceURL is required, and default file size is max 2GB.
 
  example 2: srm-get.sol -sd srm://garchive.nersc.gov/nersc/asim/test -t file:/tmp/path -at gsi 
 
  example 3: srm-get.sol -conf hrm.rc -sd "srm://srm.lbl.gov:6169/nersc/gc5/asim/esg?obj=myOBJ&host=garchive.nersc.gov" -td file:/data/srm/client -z 4 -x 1000000 -d -r -tkdir 
 
  example 4: srm-get.sol -conf hrm.rc -sd "srm://garchive.nersc.gov/nersc/asim/esg" -td file:/data/srm/client -z 4 -x 1000000 -d -r -trdir -tkdir 
 
  example 5: srm-get.sol -s srm://archive.nersc.gov/nersc/gc5/userid/path/file2.dat -t file:/tmp/myfile.dat -at gsi -z 4 -x 1000000 -d
 
  example 6:  srm-get.sol -conf hrm.rc -sd srm://garchive.nersc.gov/nersc/asim/esg/review -td file:/home/dm/srm/data2/srm/client -z 4 -x 1000000 -u arie -d
 
  example 7:  srm-get.sol -conf hrm.rc -f ./sample/sample.get -td file:/home/dm/srm/data2/srm/data -z 4 -x 1000000 -d
                where  sample/sample.get is :
srm://garchive.nersc.gov/nersc/ccsm/b20.007/ice/b20.007.csim.i.1003-01-01-00000.nc 34425272 file:/home/data2/srm/client/b20.007.csim.i.1003-01-01-00000.nc
srm://garchive.nersc.gov/nersc/ccsm/b20.007/ice/b20.007.csim.i.1004-01-01-00000.nc 34425272 file:/home/data2/srm/client/sub/b20.007.csim.i.1004-01-01-00000.nc
 
 
SRM-PUT (deprecated)
 
A component to request a file, a set of local files or a local directory to be copied over to the designated target directory or as a target file.
Note: Due to a current limitation in gridftp (http://bugzilla.globus.org/bugzilla/show_bug.cgi?id=906), this command may hang unexpectedly. Please use SRM-COPY in the meantime.
 
-help        
print this message
-d           
debugging messages
-f path      
input file path for request to put
-sd path     
input path for all files in the directory
-td url      
target directory on HPSS for request to put (optionally required)
-s url       
source URL for request to put (required)
-g string
logical file name for source URL for request to put (optional)
-t url       
target URL on HPSS for request to put  (required for HPSS access)
-b int       
exact file size of source URL in bytes  (required)
-l path      
log file
-c path      
path to the configuration file (default: hrm.rc)
-w
wait until file is archived to MSS (default false)
-m
waive target URL option for DRM put (default false)
-u string    
user id (default login@host)
-x int       
Block size for gridftp
-y int       
TCP buffer size for gridftp
-z int       
Number of parallel streams for gridftp
-r option    
Option for overwrite mode [yes, no, diff_size, less_size]  (default: no)
-ai path    
Path for Storage information if necessary (no default) in case of KERBEROS or SSH, e.g. $HOME/.ssh/identity.pub
-al login   
Storage login name if necessary (no default)
-ap passwd  
Storage passwd if necessary (no default)
-at type    
Storage login type when provided (default GSI) [PLAIN, GSI, ENCRYPT, KERBEROS, SSH, SCP, NONE] GSI, SSH, ENCRYPT and NONE are implemented so far
-aj login   
Storage project id if necessary (no default)
-ad int     
Storage retention period in days if necessary (no default)
-ar string   
Storage read passwd if necessary (no default)
-aw string   
Storage write passwd if necessary (no default)
 
  
Example 1: srm-put.sol -f /tmp/myinputfile -t srm://garchive.nersc.gov/nersc/userid/path -at gsi 
                format of input file: sourceURL fileSize(optional) targetURL(optional)
                where sourceURL is required, and the default file size is 2GB
 
example 2: srm-put.sol -s file:/tmp/mydirectory -t srm://garchive.nersc.gov/nersc/userid/path -at gsi 
 
example 3: srm-put.sol -conf hrm.rc -s file:/tmp/asim.test -t srm://garchive.nersc.gov/nersc/asim/test/new.test  -d 
 
example 4:  srm-put.sol -conf hrm.rc -s file:/home/data3/srm -t srm://garchive.nersc.gov/nersc/asim/hrm/mput -z 4 -x 1000000 -d 
 
example 5:  srm-put.sol -conf hrm.rc -s file:/home/data3/hrmcache -t srm://garchive.nersc.gov/nersc/asim/hrm/mput -z 4 -x 1000000 -d 
 
 
 
SRM-ABORT
 
A component to abort the previous request.
 
 
-help       
print this message
-d          
debugging messages
-l path     
log file
-c path     
path to the configuration file (default: hrm.rc)
-u string   
user id (default: login@host)
-r string   
request id (no default)
 
  example: srm-abort.sol -r u12345-123 
 
 
 
SRM-LS
 
A component to browse a file or a directory as specified.
Note: “*” (wild card) works as part of the filename in the source URL. However, wild card cannot be used with the recursive directory option.
 
-help       
print this message
-d          
debugging messages (default false)
-q          
quiet, no debugging messages (default true)
-x          
xml output format (default false)
-xd         
xml output format including directories with <dir> (default false)
-r          
recursive directory (default false)
-s url      
source URL for request to get (required)
-o path     
local output file path for the listing (default no output)
-l path     
log file
-c path     
path to the configuration file (default: hrm.rc)
-u string   
user id (default: login@host)
-ai path
Path for Storage information if necessary (no default) in case of KERBEROS or SSH
-al login   
Storage login name if necessary (no default)
-ap passwd  
Storage passwd if necessary (no default)
-at type    
Storage login type [PLAIN, GSI, ENCRYPT, KERBEROS, SSH, NONE] when provided (default GSI)
-aj login   
Storage project id if necessary (no default)
-ar login   
Storage read passwd if necessary (no default)
-aw login   
Storage write passwd if necessary (no default)
 
 
  Example 1: srm-ls.sol -s srm://garchive.nersc.gov/nersc/userid/path -at gsi
 
  Example 2:  srm-ls.sol -s srm://garchive.nersc.gov/nersc/userid/path -at gsi -r -xml -xd -o /tmp/myoutput
 
  sample output 1 with -xml option:
    <list>
        <file path="B06.44.atm.1995-02_hb.nc" size="352334560"/>
        <file path="B06.44.atm.1995-03_hb.nc" size="390083812"/>
        <file path="ncar/B06.81.atm.1890.nc" size="327575548"/>
        <file path="review/B06.81.atm.1890.nc" size="327575548"/>
    </list>
 
  sample output 2 with -xml -xd or -xmld option:
    <list>
        <file path="B06.44.atm.1995-02_hb.nc" size="352334560"/>
        <file path="B06.44.atm.1995-03_hb.nc" size="390083812"/>
        <dir path="ncar/">
            <file path="B06.81.atm.1890.nc" size="327575548"/>
        </dir>
        <dir path="review/">
            <file path="B06.81.atm.1890.nc" size="327575548"/>
        </dir>
    </list>
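
The XML listing is convenient for post-processing, for example to total the bytes in a directory before requesting a copy. The following is a minimal sketch that reads a listing in the style of sample output 1 using Python's standard xml.etree.ElementTree module; the function name read_listing is hypothetical. With the -xd form, file paths are relative to the enclosing <dir> element, which this sketch does not reconstruct.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

# Listing text in the style of sample output 1 above.
listing = """<list>
    <file path="B06.44.atm.1995-02_hb.nc" size="352334560"/>
    <file path="B06.44.atm.1995-03_hb.nc" size="390083812"/>
    <file path="ncar/B06.81.atm.1890.nc" size="327575548"/>
    <file path="review/B06.81.atm.1890.nc" size="327575548"/>
</list>"""


def read_listing(xml_text: str) -> List[Tuple[str, int]]:
    """Return (path, size in bytes) for every <file> element in the listing."""
    root = ET.fromstring(xml_text)
    return [(f.get("path"), int(f.get("size"))) for f in root.iter("file")]


files = read_listing(listing)
total_bytes = sum(size for _, size in files)
print(len(files), "files,", total_bytes, "bytes")
```
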
 
 
 
 
SRM-MKDIR
 
A component to make a directory at the target with recursive option.
 
 
-help       
print this message
-d          
debugging messages
-s url      
source URL for creating directories (required)
-r url      
directory URL for starting point in SOURCE url (default: very beginning)
-l path     
log file
-c path     
path to the configuration file (default: hrm.rc)
-u string   
user id (default: login@host)
-ai path    
Path for Storage information if necessary (no default) in case of KERBEROS or SSH
-al login   
Storage login name if necessary (no default)
-ap passwd  
Storage passwd if necessary (no default)
-at type    
Storage login type [PLAIN, GSI, ENCRYPT, KERBEROS, SSH] when provided (default GSI)
-aj login   
Storage project id if necessary (no default)
-ar login   
Storage read passwd if necessary (no default)
-aw login   
Storage write passwd if necessary (no default)
 
 
  example: srm-mkdir.sol -s srm://garchive.nersc.gov/nersc/userid/path1/new_path -r srm://garchive.nersc.gov/nersc/userid/path1 
 
 
 
 
SRM-STAGE (deprecated)
 
A component to request a file, a set of files or a directory to be copied over to the designated target directory or as a target file. Basic functionality is the same as SRM-GET, except that data is not pulled to the user’s local path. In other words, the target must be a location that SRM can access.
 
 
-help        
print this message
-d           
debugging messages
-f path      
input file path for request to get
-sd url
source URL on MSS (format: srm://host:port/dir/path)
-sourced url
source URL on MSS (format: srm://host:port/dir/path)
-td url       
target URL for files (default: SRM cache)
-tkdir       
do not create target directory structure (default: false)
-trdir       
no target directory structure, and put all files into one flat directory (default: false)
-s url      
source URL for request to get
-g string   
Logical file name for source URL for request to get (optional)
-t url      
target URL for request to get (default: file:/$PWD/same_file_name)
-b int      
file size of source URL in bytes (default: 2GB)
-l path      
log file
-o option    
Option for overwrite mode [yes, no, diff_size, less_size] (default: no) diff: no overwriting, only differences in dir listings
-e           
release after staging (default false)
-r           
recursive directory (default false)
-w
no waiting after the request (default false). By default, srm-stage waits until the request is completed and checks the messages.
-c path
path to the configuration file (default: hrm.rc)
-u string    
user id (default: login@host)
-x int       
Block size for gridftp
-y int       
TCP buffer size for gridftp
-z int       
Number of parallel streams for gridftp
-ai path    
Path for Storage information if necessary (no default) in case of KERBEROS or SSH
-al login   
Storage login name if necessary (no default)
-ap passwd  
Storage passwd if necessary (no default)
-at type    
Storage login type when provided (default GSI)  [PLAIN, GSI, ENCRYPT, KERBEROS, SSH, SCP, NONE] GSI, ENCRYPT and NONE are implemented so far
-aj login   
Storage project id if necessary (no default)
-ad int     
Storage Retention period in days if necessary (no default)
-ar string   
Storage read passwd if necessary (no default)
-aw string   
Storage write passwd if necessary (no default)
 
  example 1: srm-stage.sol -f /tmp/myinputfile -t file:/tmp/my_directory_path -at gsi 
                format of input file: sourceURL fileSize(optional) targetURL(optional)
                where sourceURL is required, and max default file size is 2GB. 
 
  example 2: srm-stage.sol -sd srm://garchive.nersc.gov/nersc/asim/test -at gsi 
 
  example 3: srm-stage.sol -conf ./hrm.rc -sd "srm://srm.lbl.gov:6169/nersc/asim/esg?remoteobj=myOBJ&msshost=garchive.nersc.gov" -td gsiftp://srm.lbl.gov/tmp/client -z 4 -x 1000000 -d -r -tkdir 
 
  example 4: srm-stage.sol -conf ./hrm.rc -sd "srm://garchive.nersc.gov/nersc/asim/esg" -td gsiftp://srm.lbl.gov/tmp/client -z 4 -x 1000000 -d -r -trdir -tkdir 
 
  example 5: srm-stage.sol -conf ./hrm.rc -sd "srm://garchive.nersc.gov/nersc/asim/esg" -td gsiftp://srm.lbl.gov/tmp/client -z 4 -x 1000000 -d -r -e 
 
  example 6: srm-stage.sol -s srm://garchive.nersc.gov/nersc/userid/path/file2.dat -t gsiftp://srm.lbl.gov/tmp/client.data -z 4 -at gsi 
 
 
 
SRM-STATUS
 
A component to request the status of the previously submitted request. 
 
-help        
print this message
-d           
debugging messages
-f path
input file path for request to get
-s url       
source URL for request
-sd url      
source directory URL for request, if applicable
-t url       
target URL for request
-g string    
logical file name for source URL for request (optional)
-l path      
log file
-c path      
path to the configuration file (default: hrm.rc)
-u userid    
user ID for request (required)
-r requestid
request ID  (required)
-x int       
Block size for gridftp
-y int       
TCP buffer size for gridftp
-z int       
Number of parallel streams for gridftp
 
 
  example: srm-status.sol -f /tmp/myinputfile
                 srm-status.sol -s gsiftp://archive.nersc.gov/nersc/usrid/path 
                 srm-status.sol -r my_request_id 
 
           format of input file: sourceURL fileSize(optional) targetURL(optional)
           where sourceURL is required and max file size is 2GB.
 
 
SRM-PING
 
A component to check the health of HRM/DRM. 
 
 
-help       
print this message
-d          
debugging messages
-c path     
path to the configuration file (default: hrm.rc)
 
 
  example: srm-ping.sol -conf hrm.rc 
 
 
Configuration File:
 
DataMover programs use a configuration file to get connector information with the “hrm*” prefix. The configuration file can be shared with other HRM or DataMover programs.
 
NSHost and NSPort specify the naming service to connect to.
hrm*NSHost=srm.lbl.gov
hrm*NSPort=6171
 
HRMName specifies the CORBA object name to submit the request via NSHost:NSPort.
hrm*HRMName=HPSSResourceManagerASIM
 
HRMLogFile specifies the path to the event log, which is written when EnableLogging is set to true.
hrm*EnableLogging=true
hrm*HRMLogFile=/home/dm/srm/data2/srm/log/out.client.log

 
 
Sample Configuration File
 
 
Note: Blank lines and lines starting with the # character are ignored.
 
Notes on prefixes:
common* : will be read by all components
drm* : will be read only by the DRM
trm* : will be read only by the TRM
hrm* : will be read only by the DataMover client programs.
 
 
Sample hrm.rc
####################################################
common*NSHost=srm.lbl.gov
common*NSPort=6171
common*VerboseConfig=true
####################################################
# For DataMover client programs
hrm*HRMLogFile=/srm/data2/srm/log/out.client.log
hrm*EnableLogging=true
hrm*HPSSHostName=garchive.nersc.gov
hrm*HRMName=DRMServerOnHRMASIM
hrm*MSSMaximumAllowed=5
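
The configuration rules above (prefix*Key=value entries, with blank lines and # lines ignored) are simple enough to check with a few lines of code before a file is handed to the client programs. The following is a minimal sketch, not DataMover's own reader; the function name parse_config is hypothetical. It assumes that a client program reads the common* and hrm* entries, that a later entry for the same key replaces an earlier one, and that lines without = are skipped.

```python
from typing import Dict, Tuple


def parse_config(text: str, prefixes: Tuple[str, ...] = ("common", "hrm")) -> Dict[str, str]:
    """Collect Key=value settings whose prefix (the part before '*') is in prefixes."""
    settings: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and comment lines are ignored
        name, sep, value = line.partition("=")
        if not sep:
            continue  # not a Key=value line (assumption: skipped)
        prefix, star, key = name.partition("*")
        if star and prefix.strip() in prefixes:
            settings[key.strip()] = value.strip()
    return settings


# A shortened configuration in the style of the sample hrm.rc;
# the drm* line is a made-up entry showing that other prefixes are skipped.
sample_rc = """\
####################################################
common*NSHost=srm.lbl.gov
common*NSPort=6171

# For DataMover client programs
hrm*EnableLogging=true
hrm*HRMName=DRMServerOnHRMASIM
drm*SomeServerOnlyKey=ignored
"""

settings = parse_config(sample_rc)
for key, value in sorted(settings.items()):
    print(key, "=", value)
```
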