DataMover User Guide
Lawrence Berkeley National Laboratory
Scientific Data Management Research Group
Junmin Gu, Arie Shoshani, Alex Sim
April 28, 2004
All the licenses are specified in the DataMover package.
DataMover consists of three core components and six utility components.
The core components are srm-get, srm-put and srm-copy; the utility components are srm-status, srm-ls, srm-mkdir, srm-stage, srm-ping and srm-abort.
The following sections describe the options and functions of each component, with examples (the examples are illustrative, not real).
Notes:
1. When a source URL is needed, use one of the following formats:
srm://host:port/file_path
srm://host:port/file_path?msshost=mss_host&mssport=mss_port&remoteobj=corba_obj
gsiftp://host:port/file_path
file:/local_file_path or file:////absolute_file_path
If the source URL contains & or =, enclose it in double quotes.
e.g. srm://garchive.nersc.gov/nersc/gc5/asim/test
e.g. srm://sleepy.ccs.ornl.gov:7031/home/ACPI/B06.44/instant_6/B06.44.atm.1996-01_hb.nc?remoteobj=HRMServerLBNL&msshost=hpss.ccs.ornl.gov&mssport=2121
e.g. file:/home/dm/srm/data3/hrm/hrmcache/myfile
e.g. gsiftp://dataportal.ucar.edu/tmp/file_name
2. When an input file is needed, each line has the following format:
sourceURL fileSize(optional) targetURL(optional)
where the fields are separated by spaces; sourceURL is required, while fileSize and targetURL are optional.
e.g. the input file can contain the following line:
srm://sleepy.ccs.ornl.gov:7031/home/ACPI/B06.44/B06.44.atm.2_hb.nc?remoteobj=HRMServerORNL&msshost=hpss.ccs.ornl.gov 352334560 srm://garchive.nersc.gov/nersc/asim/hrm/copy/B06.44.atm.2_hb.nc
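The two formats in the notes above can be checked before a request is submitted. The following is a minimal Python sketch, not part of DataMover: the names Entry, parse_entry and describe_source are hypothetical, and it assumes that a purely numeric second field is the file size (the guide does not say how a line with a target URL but no size is read).

```python
from typing import NamedTuple, Optional
from urllib.parse import parse_qs, urlsplit


class Entry(NamedTuple):
    source_url: str
    file_size: Optional[int]
    target_url: Optional[str]


def parse_entry(line: str) -> Entry:
    """Split one input-file line into sourceURL, fileSize and targetURL."""
    fields = line.split()
    if not fields:
        raise ValueError("sourceURL is required")
    size = None
    target = None
    rest = fields[1:]
    # Assumption: a purely numeric second field is the file size in bytes.
    if rest and rest[0].isdigit():
        size = int(rest[0])
        rest = rest[1:]
    if rest:
        target = rest[0]
    return Entry(fields[0], size, target)


def describe_source(url: str) -> dict:
    """Break a source URL into scheme, host, port, path and ?key=value pairs."""
    parts = urlsplit(url)
    return {
        "scheme": parts.scheme,
        "host": parts.hostname,
        "port": parts.port,
        "path": parts.path,
        "params": {k: v[0] for k, v in parse_qs(parts.query).items()},
    }
```

Calling describe_source on the source URL of a parsed entry shows which host, port and MSS parameters the request would use, which helps catch a stray space or a missing msshost before the transfer starts.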
SRM-COPY
A component that requests a file, a set of files or a directory to be copied to the designated target directory or to a target file.
-help | print this message
-d | debugging messages
-f path | input file path for request
-s url | source URL for request
-t url | target URL for request
-sd path | input directory path for request
-td url | target directory on HPSS for request (required)
-g string | logical file name for source URL for request
-b int | exact file size of source URL in bytes (default 2GB)
-i | true if target is not on HPSS (default false)
-l path | log file
-c path | path to the configuration file (default: ./hrm.rc)
-u string | user id (default login@host)
-w | wait until file is archived to MSS (default false)
-x int | Block size for gridftp
-y int | TCP buffer size for gridftp
-z int | Number of parallel streams for gridftp
-r | Recursive directory (default false)
-o option | Option for overwrite mode [yes, no, diff, diff_size, less_size] (default: no). diff: no overwriting, only differences in dir listings
-ai path | Path for source storage information if necessary (no default), in case of KERBEROS or SSH, e.g. $HOME/.ssh/identity.pub
-al login | Source storage login name if necessary (no default)
-ap passwd | Source storage passwd if necessary (no default)
-at type | Source storage login type when provided (default GSI) [PLAIN, GSI, ENCRYPT, KERBEROS, SSH, SCP, NONE]. GSI, SSH, ENCRYPT and NONE are implemented so far.
-aj login | Source storage project id if necessary (no default)
-ad int | Source storage retention period in days if necessary (no default)
-ar string | Source storage read passwd if necessary (no default)
-aw string | Source storage write passwd if necessary (no default)
-ei path | Path for target storage information if necessary (no default), in case of KERBEROS or SSH, e.g. $HOME/.ssh/identity.pub
-el login | Target storage login name if necessary (no default)
-ep passwd | Target storage passwd if necessary (no default)
-et type | Target storage login type when provided (default GSI) [PLAIN, GSI, ENCRYPT, KERBEROS, SSH, SCP, NONE]. GSI, SSH, ENCRYPT and NONE are implemented so far.
-ej login | Target storage project id if necessary (no default)
-ed int | Target storage retention period in days if necessary (no default)
-er string | Target storage read passwd if necessary (no default)
-ew string | Target storage write passwd if necessary (no default)
Note: To use the SSH login type, srm-copy must run on the target host, e.g. dataportal.ucar.edu.
Example 1: srm-copy.sol -f /tmp/inputfile -td gsiftp://archive.nersc.gov/nersc/asim
This example will copy the files listed in /tmp/inputfile to the specified target directory.
Example 2: srm-copy.sol -s "srm://srm.lbl.gov:5537/nersc/junmin/small_217?obj=HRMServerLBNL&msshost=garchive.nersc.gov" -t srm://garchive.nersc.gov/nersc/asim/test/small.217.dat -b 1234567 -at GSI -et GSI
This example will copy /nersc/junmin/small_217 on garchive.nersc.gov to /nersc/asim/test/small.217.dat on garchive.nersc.gov via HRMServerLBNL, with GSI authentication on both ends.
Example 3: srm-copy.sol -s "srm://srm.lbl.gov:7031/nersc/junmin/small_217?obj=HRMServerLBNL&msshost=archive.nersc.gov" -t srm://archive.nersc.gov/nersc/asim/test/small.217.dat -b 1234567 -at ENCRYPT -et ENCRYPT
This example will copy /nersc/junmin/small_217 on archive.nersc.gov to /nersc/asim/test/small.217.dat on archive.nersc.gov via HRMServerLBNL, with ENCRYPT authentication on both ends.
Example 4: srm-copy.sol -s "srm://sleepy.ccs.ornl.gov:7031/home/ACPI/B06.44/B06.44.atm.1996-01_hb.nc?remoteobj=HRMServerORNL&msshost=hpss.ccs.ornl.gov" -t srm://garchive.nersc.gov/nersc/asim/test/B06.44.atm.1996-01_hb.nc -b 390083812 -at PLAIN -et GSI
This example will copy /home/ACPI/B06.44/B06.44.atm.1996-01_hb.nc on hpss.ccs.ornl.gov to /nersc/asim/test/B06.44.atm.1996-01_hb.nc on garchive.nersc.gov via HRMServerORNL with PLAIN authentication at the source and GSI authentication at the target.
Example 5: srm-copy.sol -s "srm://srm.lbl.gov:7031/nersc/junmin/test/small_217?obj=HRMServerLBNL&msshost=garchive.nersc.gov" -t srm://dataportal.ucar.edu:7031/ASIM/test2/mytest.file -b 1234567 -at GSI -et SSH
This example will copy /nersc/junmin/test/small_217 on garchive.nersc.gov to /ASIM/test2/mytest.file on dataportal.ucar.edu via HRMServerLBNL with GSI authentication at the source and SSH authentication at the target.
Example 6: srm-copy.sol -sd "srm://sleepy.ccs.ornl.gov:6161/home/ACPI/B06.44/instant_6?remoteobj=HRMServerORNL&msshost=hpss.ccs.ornl.gov" -td srm://garchive.nersc.gov/nersc/asim/test2 -d -w -at plain
This example will copy all files in /home/ACPI/B06.44/instant_6 on hpss.ccs.ornl.gov to /nersc/asim/test2 directory on garchive.nersc.gov via HRMServerORNL with PLAIN authentication at the source and GSI authentication at the target.
Example 7: srm-copy.sol -sd "srm://sleepy.ccs.ornl.gov:6161/home/ACPI/B06.44/instant_6/*.nc?remoteobj=HRMServerORNL&msshost=hpss.ccs.ornl.gov" -td srm://garchive.nersc.gov/nersc/asim/test2 -d -w -at plain
This example will copy all files matching *.nc in /home/ACPI/B06.44/instant_6 on hpss.ccs.ornl.gov to /nersc/asim/test2 directory on garchive.nersc.gov via HRMServerORNL with PLAIN authentication at the source and GSI authentication at the target.
Example 8: srm-copy.sol -f ./sample/sample.mcopy.ornl.lbnl -t srm://garchive.nersc.gov/nersc/asim/hrm/mcopy -d -w
Where ./sample/sample.mcopy.ornl.lbnl is:
srm://sleepy.ccs.ornl.gov:7031/home/ACPI/B06.44/B06.44.atm.1995-02_hb.nc?remoteobj=HRMServerORNL&msshost=hpss.ccs.ornl.gov 352334560
srm://sleepy.ccs.ornl.gov:7031/home/ACPI/B06.44/B06.44.atm.1995-03_hb.nc?remoteobj=HRMServerORNL&msshost=hpss.ccs.ornl.gov 390083812
This example will copy all file entries in ./sample/sample.mcopy.ornl.lbnl to the /nersc/asim/hrm/mcopy directory on garchive.nersc.gov with GSI authentication at the target. The source access parameters are given in the source URLs themselves.
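When srm-copy is driven from a script, passing the arguments as a list avoids the quoting issue with & and = in source URLs. The following is a minimal Python sketch under stated assumptions: build_srm_copy is a hypothetical helper, the program name srm-copy.sol is taken from the examples above, and only the -s, -t, -b, -at and -et options from the option list are used.

```python
import shlex
from typing import List, Optional


def build_srm_copy(source_url: str, target_url: str,
                   size_bytes: Optional[int] = None,
                   source_auth: Optional[str] = None,
                   target_auth: Optional[str] = None,
                   program: str = "srm-copy.sol") -> List[str]:
    """Build an argument list for a single-file srm-copy request."""
    argv = [program, "-s", source_url, "-t", target_url]
    if size_bytes is not None:
        argv += ["-b", str(size_bytes)]   # exact source size in bytes
    if source_auth:
        argv += ["-at", source_auth]      # login type at the source
    if target_auth:
        argv += ["-et", target_auth]      # login type at the target
    return argv


argv = build_srm_copy(
    "srm://sleepy.ccs.ornl.gov:7031/home/ACPI/B06.44/B06.44.atm.1996-01_hb.nc"
    "?remoteobj=HRMServerORNL&msshost=hpss.ccs.ornl.gov",
    "srm://garchive.nersc.gov/nersc/asim/test/B06.44.atm.1996-01_hb.nc",
    size_bytes=390083812, source_auth="PLAIN", target_auth="GSI")

# shlex.join renders a line that is safe to paste into a POSIX shell.
print(shlex.join(argv))
```

The list can be handed directly to subprocess.run(argv), in which case no shell is involved and no quoting is needed. shlex.join uses single quotes rather than the double quotes shown in this guide; both protect & from the shell.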
SRM-GET (deprecated)
A component that requests a file, a set of files or a directory to be copied to the designated target directory or to a target file.
-help | Print options list
-d | debugging messages
-f path | input file path for the request
-s url | Single source URL
-sd url | source directory URL on MSS (format: srm://host:port/dir/path)
-sourced url | source directory URL on MSS (format: srm://host:port/dir/path)
-t url | Single target URL for the request (default: file:/$PWD/same_file_name)
-td url | target directory URL for files. If no target is provided, the following format will be used by default: file:/$PWD/dir_path/same_file_name
-tkdir | do not create target directory structure (default: false). By default, the target directory structure will be created recursively if there is any from the source.
-trdir | no target directory structure, and put all files into one flat directory (default: false)
-g string | Logical file name for source URL for a single file request (optional)
-b int | file size of the source URL in bytes if known (default: 2GB)
-l path | log file path
-c path | path to the configuration file (default: hrm.rc)
-u string | user id (default: login@host)
-v int | max number of concurrent file transfers to targets (default: 2)
-x int | Block size for gridftp
-y int | TCP buffer size for gridftp
-z int | Number of parallel streams for gridftp
-r | recursive directory (default false)
-al login | Storage login name if necessary (no default)
-ap passwd | Storage password if necessary (no default)
-at type | Storage login type when provided (default GSI) [PLAIN, GSI, ENCRYPT, KERBEROS, SSH, SCP, NONE]. GSI, ENCRYPT, PLAIN and NONE are implemented so far
-aj login | Storage project id if necessary (no default)
-ad int | Storage retention period in days if necessary (no default)
-ar string | Storage read password if necessary (no default)
-aw string | Storage write password if necessary (no default)
-ai path | Path for Storage information if necessary (no default) in case of KERBEROS or SSH
Example 1: srm-get.sol -f /tmp/myinputfile -t file:/tmp/my_directory_path -at gsi
Format of the input file: sourceURL fileSize(optional) targetURL(optional)
where sourceURL is required, and the default (maximum) file size is 2GB.
Example 2: srm-get.sol -sd srm://garchive.nersc.gov/nersc/asim/test -t file:/tmp/path -at gsi
Example 3: srm-get.sol -conf hrm.rc -sd "srm://srm.lbl.gov:6169/nersc/gc5/asim/esg?obj=myOBJ&host=garchive.nersc.gov" -td file:/data/srm/client -z 4 -x 1000000 -d -r -tkdir
Example 4: srm-get.sol -conf hrm.rc -sd "srm://garchive.nersc.gov/nersc/asim/esg" -td file:/data/srm/client -z 4 -x 1000000 -d -r -trdir -tkdir
Example 5: srm-get.sol -s srm://archive.nersc.gov/nersc/gc5/userid/path/file2.dat -t file:/tmp/myfile.dat -at gsi -z 4 -x 1000000 -d
Example 6: srm-get.sol -conf hrm.rc -sd srm://garchive.nersc.gov/nersc/asim/esg/review -td file:/home/dm/srm/data2/srm/client -z 4 -x 1000000 -u arie -d
Example 7: srm-get.sol -conf hrm.rc -f ./sample/sample.get -td file:/home/dm/srm/data2/srm/data -z 4 -x 1000000 -d
where sample/sample.get contains:
srm://garchive.nersc.gov/nersc/ccsm/b20.007/ice/b20.007.csim.i.1003-01-01-00000.nc 34425272 file:/home/data2/srm/client/b20.007.csim.i.1003-01-01-00000.nc
srm://garchive.nersc.gov/nersc/ccsm/b20.007/ice/b20.007.csim.i.1004-01-01-00000.nc 34425272 file:/home/data2/srm/client/sub/b20.007.csim.i.1004-01-01-00000.nc
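An input file such as the one above can be generated rather than typed by hand. The following is a minimal Python sketch; format_entry and render_input_file are hypothetical names, and the only rule taken from this guide is the space-separated layout "sourceURL fileSize(optional) targetURL(optional)". Rejecting fields that contain whitespace is an extra safeguard assumed here, since a space inside a URL would be read as a field separator.

```python
from typing import Iterable, Optional, Tuple

# (sourceURL, fileSize or None, targetURL or None)
Row = Tuple[str, Optional[int], Optional[str]]


def format_entry(source_url: str, file_size: Optional[int] = None,
                 target_url: Optional[str] = None) -> str:
    """Format one input-file line from its three fields."""
    fields = [source_url]
    if file_size is not None:
        fields.append(str(file_size))
    if target_url is not None:
        fields.append(target_url)
    for field in fields:
        # Assumed safeguard: whitespace would split the field in two.
        if not field or any(ch.isspace() for ch in field):
            raise ValueError("empty field or whitespace in field: %r" % field)
    return " ".join(fields)


def render_input_file(rows: Iterable[Row]) -> str:
    """Return the full input-file text, one request per line."""
    return "".join(format_entry(*row) + "\n" for row in rows)
```

The returned text can be written to a file and passed with the -f option.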
SRM-PUT (deprecated)
A component that requests a file, a set of local files or a local directory to be copied to the designated target directory or to a target file.
Note: Due to a current limitation in gridftp (http://bugzilla.globus.org/bugzilla/show_bug.cgi?id=906), this command may hang unexpectedly. Please use SRM-COPY in the meantime.
-help | print this message
-d | debugging messages
-f path | input file path for request to put
-sd path | input path for all files in the directory
-td url | target directory on HPSS for request to put (optionally required)
-s url | source URL for request to put (required)
-g string | logical file name for source URL for request to put (optional)
-t url | target URL on HPSS for request to put (required for HPSS access)
-b int | exact file size of source URL in bytes (required)
-l path | log file
-c path | path to the configuration file (default: hrm.rc)
-w | wait until file is archived to MSS (default false)
-m | waive target URL option for DRM put (default false)
-u string | user id (default login@host)
-x int | Block size for gridftp
-y int | TCP buffer size for gridftp
-z int | Number of parallel streams for gridftp
-r option | Option for overwrite mode [yes, no, diff_size, less_size] (default: no)
-ai path | Path for Storage information if necessary (no default) in case of KERBEROS or SSH, e.g. $HOME/.ssh/identity.pub
-al login | Storage login name if necessary (no default)
-ap passwd | Storage passwd if necessary (no default)
-at type | Storage login type when provided (default GSI) [PLAIN, GSI, ENCRYPT, KERBEROS, SSH, SCP, NONE]. GSI, SSH, ENCRYPT and NONE are implemented so far
-aj login | Storage project id if necessary (no default)
-ad int | Storage retention period in days if necessary (no default)
-ar string | Storage read passwd if necessary (no default)
-aw string | Storage write passwd if necessary (no default)
Example 1: srm-put.sol -f /tmp/myinputfile -t srm://garchive.nersc.gov/nersc/userid/path -at gsi
Format of the input file: sourceURL fileSize(optional) targetURL(optional)
where sourceURL is required, and the default file size is 2GB.
Example 2: srm-put.sol -s file:/tmp/mydirectory -t srm://garchive.nersc.gov/nersc/userid/path -at gsi
Example 3: srm-put.sol -conf hrm.rc -s file:/tmp/asim.test -t srm://garchive.nersc.gov/nersc/asim/test/new.test -d
Example 4: srm-put.sol -conf hrm.rc -s file:/home/data3/srm -t srm://garchive.nersc.gov/nersc/asim/hrm/mput -z 4 -x 1000000 -d
Example 5: srm-put.sol -conf hrm.rc -s file:/home/data3/hrmcache -t srm://garchive.nersc.gov/nersc/asim/hrm/mput -z 4 -x 1000000 -d
SRM-ABORT
A component to abort a previously submitted request.
-help | print this message
-d | debugging messages
-l path | log file
-c path | path to the configuration file (default: hrm.rc)
-u string | user id (default: login@host)
-r string | request id (no default)
Example: srm-abort.sol -r u12345-123
SRM-LS
A component to browse a file or a directory as specified.
Note: The wildcard "*" can be used as part of the file name in the source URL. However, it cannot be used together with the recursive directory option.
-help | print this message
-d | debugging messages (default false)
-q | quiet, no debugging messages (default true)
-x | xml output format (default false)
-xd | xml output format including directories with <dir> (default false)
-r | recursive directory (default false)
-s url | source URL for request to get (required)
-o path | local output file path for the listing (default no output)
-l path | log file
-c path | path to the configuration file (default: hrm.rc)
-u string | user id (default: login@host)
-ai path | Path for Storage information if necessary (no default) in case of KERBEROS or SSH
-al login | Storage login name if necessary (no default)
-ap passwd | Storage passwd if necessary (no default)
-at type | Storage login type [PLAIN, GSI, ENCRYPT, KERBEROS, SSH, NONE] when provided (default GSI)
-aj login | Storage project id if necessary (no default)
-ar login | Storage read passwd if necessary (no default)
-aw login | Storage write passwd if necessary (no default)
Example 1: srm-ls.sol -s srm://garchive.nersc.gov/nersc/userid/path -at gsi
Example 2: srm-ls.sol -s srm://garchive.nersc.gov/nersc/userid/path -at gsi -r -xml -xd -o /tmp/myoutput
sample output 1 with -xml option:
<list>
<file path="B06.44.atm.1995-02_hb.nc" size="352334560"/>
<file path="B06.44.atm.1995-03_hb.nc" size="390083812"/>
<file path="ncar/B06.81.atm.1890.nc" size="327575548"/>
<file path="review/B06.81.atm.1890.nc" size="327575548"/>
</list>
sample output 2 with -xml -xd or -xmld option:
<list>
<file path="B06.44.atm.1995-02_hb.nc" size="352334560"/>
<file path="B06.44.atm.1995-03_hb.nc" size="390083812"/>
<dir path="ncar/">
<file path="B06.81.atm.1890.nc" size="327575548"/>
</dir>
<dir path="review/">
<file path="B06.81.atm.1890.nc" size="327575548"/>
</dir>
</list>
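The XML listing is convenient for post-processing, for example to total the bytes in a directory tree before requesting a copy. The following is a minimal Python sketch using the standard xml.etree.ElementTree module; iter_files is a hypothetical helper, and it assumes that file paths inside a <dir> element are relative to that directory, as a comparison of the two sample outputs suggests. The embedded listing is illustrative data shaped like sample output 2.

```python
import xml.etree.ElementTree as ET
from typing import Iterator, Tuple

# Illustrative listing with nested <dir> elements.
SAMPLE = """<list>
<file path="B06.44.atm.1995-02_hb.nc" size="352334560"/>
<file path="B06.44.atm.1995-03_hb.nc" size="390083812"/>
<dir path="ncar/">
<file path="B06.81.atm.1890.nc" size="327575548"/>
</dir>
<dir path="review/">
<file path="B06.81.atm.1890.nc" size="327575548"/>
</dir>
</list>"""


def iter_files(node: ET.Element, prefix: str = "") -> Iterator[Tuple[str, int]]:
    """Yield (path, size) for every <file>, descending into <dir> elements."""
    for child in node:
        if child.tag == "file":
            yield prefix + child.get("path"), int(child.get("size"))
        elif child.tag == "dir":
            # Assumption: nested paths are relative to the enclosing <dir>.
            yield from iter_files(child, prefix + child.get("path"))


files = list(iter_files(ET.fromstring(SAMPLE)))
total_bytes = sum(size for _, size in files)
for path, size in files:
    print(path, size)
print("total bytes:", total_bytes)
```

The same function also reads the flat listing of sample output 1, since that form simply has no <dir> elements to descend into.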
SRM-MKDIR
A component to create a directory at the target, with a recursive option.
-help | print this message
-d | debugging messages
-s url | source URL for creating directories (required)
-r url | directory URL to use as the starting point within the source URL (default: the very beginning of the path)
-l path | log file
-c path | path to the configuration file (default: hrm.rc)
-u string | user id (default: login@host)
-ai path | Path for Storage information if necessary (no default) in case of KERBEROS or SSH
-al login | Storage login name if necessary (no default)
-ap passwd | Storage passwd if necessary (no default)
-at type | Storage login type [PLAIN, GSI, ENCRYPT, KERBEROS, SSH] when provided (default GSI)
-aj login | Storage project id if necessary (no default)
-ar login | Storage read passwd if necessary (no default)
-aw login | Storage write passwd if necessary (no default)
example: srm-mkdir.sol -s srm://garchive.nersc.gov/nersc/userid/path1/new_path -r srm://garchive.nersc.gov/nersc/userid/path1
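This guide does not spell out exactly which directories srm-mkdir creates between the -r starting point and the -s URL. One plausible reading is that every path component below the starting point is created in turn. The following Python sketch illustrates only that reading; dirs_to_create is a hypothetical helper and does not contact any server.

```python
from typing import List


def dirs_to_create(source_url: str, start_url: str) -> List[str]:
    """List directory URLs from just below start_url down to source_url.

    Assumption: the -r URL must be a path prefix of the -s URL.
    """
    src = source_url.rstrip("/")
    start = start_url.rstrip("/")
    if src != start and not src.startswith(start + "/"):
        raise ValueError("start URL is not a prefix of the source URL")
    created = []
    current = start
    for part in src[len(start):].split("/"):
        if part:
            current = current + "/" + part
            created.append(current)
    return created


for url in dirs_to_create(
        "srm://garchive.nersc.gov/nersc/userid/path1/new_path/sub",
        "srm://garchive.nersc.gov/nersc/userid/path1"):
    print(url)
```

Under this reading, the example command above would create the single directory new_path below path1.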
SRM-STAGE (deprecated)
A component that requests a file, a set of files or a directory to be copied to the designated target directory or to a target file. The basic functionality is the same as SRM-GET, except that data is not pulled to the user's local path. In other words, the target must be a location that SRM can access.
-help | print this message
-d | debugging messages
-f path | input file path for request to get
-sd url | source directory URL on MSS (format: srm://host:port/dir/path)
-sourced url | source directory URL on MSS (format: srm://host:port/dir/path)
-td url | target URL for files (default: SRM cache)
-tkdir | do not create target directory structure (default: false)
-trdir | no target directory structure, and put all files into one flat directory (default: false)
-s url | source URL for request to get
-g string | Logical file name for source URL for request to get (optional)
-t url | target URL for request to get (default: file:/$PWD/same_file_name)
-b int | file size of source URL in bytes (default: 2GB)
-l path | log file
-o option | Option for overwrite mode [yes, no, diff_size, less_size] (default: no). diff: no overwriting, only differences in dir listings
-e | release after staging (default false)
-r | recursive directory (default false)
-w | no waiting after the request (default false). By default, srm-stage waits until the request is completed and checks the messages.
-c path | path to the configuration file (default: hrm.rc)
-u string | user id (default: login@host)
-x int | Block size for gridftp
-y int | TCP buffer size for gridftp
-z int | Number of parallel streams for gridftp
-ai path | Path for Storage information if necessary (no default) in case of KERBEROS or SSH
-al login | Storage login name if necessary (no default)
-ap passwd | Storage passwd if necessary (no default)
-at type | Storage login type when provided (default GSI) [PLAIN, GSI, ENCRYPT, KERBEROS, SSH, SCP, NONE]. GSI, ENCRYPT and NONE are implemented so far
-aj login | Storage project id if necessary (no default)
-ad int | Storage retention period in days if necessary (no default)
-ar string | Storage read passwd if necessary (no default)
-aw string | Storage write passwd if necessary (no default)
Example 1: srm-stage.sol -f /tmp/myinputfile -t file:/tmp/my_directory_path -at gsi
Format of the input file: sourceURL fileSize(optional) targetURL(optional)
where sourceURL is required, and the default (maximum) file size is 2GB.
Example 2: srm-stage.sol -sd srm://garchive.nersc.gov/nersc/asim/test -at gsi
Example 3: srm-stage.sol -conf ./hrm.rc -sd "srm://srm.lbl.gov:6169/nersc/asim/esg?remoteobj=myOBJ&msshost=garchive.nersc.gov" -td gsiftp://srm.lbl.gov/tmp/client -z 4 -x 1000000 -d -r -tkdir
Example 4: srm-stage.sol -conf ./hrm.rc -sd "srm://garchive.nersc.gov/nersc/asim/esg" -td gsiftp://srm.lbl.gov/tmp/client -z 4 -x 1000000 -d -r -trdir -tkdir
Example 5: srm-stage.sol -conf ./hrm.rc -sd "srm://garchive.nersc.gov/nersc/asim/esg" -td gsiftp://srm.lbl.gov/tmp/client -z 4 -x 1000000 -d -r -e
Example 6: srm-stage.sol -s srm://garchive.nersc.gov/nersc/userid/path/file2.dat -t gsiftp://srm.lbl.gov/tmp/client.data -z 4 -at gsi
SRM-STATUS
A component to request the status of a previously submitted request.
-help | print this message
-d | debugging messages
-f path | input file path for request to get
-s url | source URL for request
-sd url | source directory URL for request, if applicable
-t url | target URL for request
-g string | logical file name for source URL for request (optional)
-l path | log file
-c path | path to the configuration file (default: hrm.rc)
-u userid | user ID for request (required)
-r requestid | request ID (required)
-x int | Block size for gridftp
-y int | TCP buffer size for gridftp
-z int | Number of parallel streams for gridftp
Examples:
srm-status.sol -f /tmp/myinputfile
srm-status.sol -s gsiftp://archive.nersc.gov/nersc/userid/path
srm-status.sol -r my_request_id
Format of the input file: sourceURL fileSize(optional) targetURL(optional)
where sourceURL is required, and the maximum file size is 2GB.
SRM-PING
A component to check the health of HRM/DRM.
-help | print this message
-d | debugging messages
-c path | path to the configuration file (default: hrm.rc)
Example: srm-ping.sol -conf hrm.rc
Configuration File:
DataMover programs use a configuration file to get connection information from entries with the "hrm*" prefix. The configuration file can be shared with other HRM or DataMover programs.
NSHost and NSPort specify the host and port of the naming service to connect to.
hrm*NSHost=srm.lbl.gov
hrm*NSPort=6171
HRMName specifies the CORBA object name to submit the request via NSHost:NSPort.
hrm*HRMName=HPSSResourceManagerASIM
HRMLogFile specifies the path to the event log, which is written when EnableLogging is set to true.
hrm*EnableLogging=true
hrm*HRMLogFile=/home/dm/srm/data2/srm/log/out.client.log
Sample Configuration File
Note: Blank lines and lines starting with the # character are ignored.
Notes on prefixes:
common* : will be read by all components
drm* : will be read only by the DRM
trm* : will be read only by the TRM
hrm* : will be read only by the DataMover client programs.
Sample hrm.rc
####################################################
common*NSHost=srm.lbl.gov
common*NSPort=6171
common*VerboseConfig=true
####################################################
# For DataMover client programs
hrm*HRMLogFile=/srm/data2/srm/log/out.client.log
hrm*EnableLogging=true
hrm*HPSSHostName=garchive.nersc.gov
hrm*HRMName=DRMServerOnHRMASIM
hrm*MSSMaximumAllowed=5
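The prefix rules above can be illustrated with a small reader for this file format. The following is a minimal Python sketch, not the parser DataMover itself uses: parse_rc and settings_for are hypothetical names, lines that do not match the prefix*Key=value shape are skipped silently, and a component-specific value is assumed to override a common* value of the same name (the guide does not state the precedence).

```python
from typing import Dict

SAMPLE = """\
####################################################
common*NSHost=srm.lbl.gov
common*NSPort=6171
common*VerboseConfig=true

# For DataMover client programs
hrm*HRMLogFile=/srm/data2/srm/log/out.client.log
hrm*EnableLogging=true
hrm*HPSSHostName=garchive.nersc.gov
hrm*HRMName=DRMServerOnHRMASIM
hrm*MSSMaximumAllowed=5
"""


def parse_rc(text: str) -> Dict[str, Dict[str, str]]:
    """Group prefix*Key=value entries by prefix, skipping blanks and # lines."""
    sections: Dict[str, Dict[str, str]] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        name, eq, value = line.partition("=")
        prefix, star, key = name.partition("*")
        if not eq or not star:
            continue  # assumption: ignore lines that are not prefix*Key=value
        sections.setdefault(prefix.strip(), {})[key.strip()] = value.strip()
    return sections


def settings_for(sections: Dict[str, Dict[str, str]], prefix: str) -> Dict[str, str]:
    """Return the common* entries plus the entries for one prefix."""
    merged = dict(sections.get("common", {}))
    # Assumption: a component-specific entry overrides a common* entry.
    merged.update(sections.get(prefix, {}))
    return merged


client = settings_for(parse_rc(SAMPLE), "hrm")
print(client["NSHost"], client["NSPort"], client["HRMName"])
```

With the sample file, a client reading the "hrm" prefix sees both the naming service settings from common* and its own HRMName, while a component reading "drm" would see only the common* entries.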