Instructions to download and run stand-alone DataMoverLite (DML-4) 
January 26, 2012
 
Questions? Email: [email protected]
 
1) Go to https://codeforge.lbl.gov/projects/dml/ and click "download DML" to save the file to any directory or your desktop.
It will download a gzipped file.
 
2) Unzip the file and extract its contents into any directory or your desktop.
This creates a directory named "dml".
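If you prefer to script steps 1 and 2, the extraction can be sketched in Python as below. This assumes the downloaded file is a gzip-compressed tar archive whose top-level directory is dml; the function name extract_dml and all paths are hypothetical, not part of DML.

```python
import tarfile
from pathlib import Path


def extract_dml(archive: str, dest: str) -> Path:
    """Extract the downloaded archive into dest and return the dml directory.

    Assumption: the download is a .tar.gz whose top-level folder is "dml".
    """
    dest_path = Path(dest)
    dest_path.mkdir(parents=True, exist_ok=True)
    # "r:gz" opens a gzip-compressed tar archive for reading
    with tarfile.open(archive, "r:gz") as tar:
        if hasattr(tarfile, "data_filter"):
            # Newer Pythons support a safety filter that rejects unsafe paths
            tar.extractall(dest_path, filter="data")
        else:
            tar.extractall(dest_path)
    dml_dir = dest_path / "dml"
    if not dml_dir.is_dir():
        raise FileNotFoundError(f"expected a 'dml' directory under {dest_path}")
    return dml_dir
```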

 

3) Locate the Java directory (usually under Program Files on Windows).

Check that JRE 1.5.0 or later is installed; if not, download it from java.sun.com.

 

[Download the latest version of the JRE for Windows from http://www.oracle.com/technetwork/java/javase/downloads/index.html]
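To check the installed Java version from a script instead of browsing directories, a minimal Python sketch can run `java -version` (which prints its output to stderr) and compare the result against 1.5. It assumes java is on your PATH; the helper names are hypothetical.

```python
import re
import subprocess


def parse_java_version(text: str) -> tuple:
    """Extract (major, minor) from `java -version` output.

    Handles old-style strings such as "1.5.0_11" (major 1, minor 5)
    and newer ones such as "11.0.2" or "17".
    """
    m = re.search(r'version "(\d+)(?:\.(\d+))?', text)
    if m is None:
        raise ValueError("no Java version string found")
    return int(m.group(1)), int(m.group(2) or 0)


def meets_minimum(text: str) -> bool:
    """True if the reported version is 1.5 or later."""
    major, minor = parse_java_version(text)
    # Java 9 and later dropped the "1." prefix, so any major > 1 is newer
    return major > 1 or minor >= 5


def installed_java_version_text() -> str:
    # Assumes `java` is on the PATH; `java -version` writes to stderr
    result = subprocess.run(["java", "-version"], capture_output=True, text=True)
    return result.stderr
```

On a machine with Java installed, `meets_minimum(installed_java_version_text())` tells you whether step 3 is already satisfied.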

 

4) For Windows: in the dml directory, edit the file win-dml.bat.

Open it with Notepad and add C:\Program Files\Java\jre1.5.0_11\bin\ in front of the first word, "java"; then put double quotes (") before and after C:\Program Files\Java\jre1.5.0_11\bin\java, since the path contains spaces. Adjust the path to match the JRE version installed on your machine.

(For Unix, skip this step).

 

5) For Windows: double-click win-dml.bat; a DML window will open.

   For Unix: double-click dml; a DML window will open.

 

6) On the main DML screen, click "browse" to choose a target directory for the downloaded files. It can be any directory.

 

8) You need a wget-script.sh or input-xml file listing the files to transfer, in the format explained in Appendix D below. An example file that uses three different transfer protocols is mixedrequest.xml under dml/samples.

 

9) Now, on the main DML screen, click "Open", select your wget-script.sh or input-xml file, and click "open". Information about the file is shown on the DML screen. (To try a single transfer and verify that the installation is OK, select the file "sample.http.xml" from the dml/samples directory.)

 

10) On the main DML screen, click "transfer". The screen will show the progress of the file downloads into your target directory.

By default, files are downloaded into the target directory preserving their directory hierarchy. If you want them downloaded flat instead, choose Menu -> Preferences and uncheck the DownInHierachy checkbox.

 

11) Advanced parameters can be set to enable concurrent file transfers, which can speed up the total transfer rate, and to tune GridFTP's window buffer size and number of parallel streams. See Appendix E.

 

Appendices

 

A. Obtaining an ESGF credential from the MyProxy server

Click "Get Credential", open the "Login/Password" tab, and select the "ESGF Credential" radio button.

Choose the gateway from the pull-down menu.

Enter your login and passphrase, then click "Get Credential".

 

B. Using OpenID for login

Click "Get Credential", open the "OpenId" tab, and select the "OpenID login" radio button.

Enter your OpenID.

Enter your password, then click "Get Credential".

If your login name differs from the last part of your OpenID, change the login name in the text field.

 

C. Catalog Browsing in DML for HTTPS transfers

DML now supports catalog browsing; currently it can browse the NASA and PCMDI catalogs.

Click "BrowseCatalog" and choose the desired site.

Browse and search the catalog, and click the files that you want to transfer. To transfer all files, select "TransferAll"; otherwise select "Transfer Selected". The selected files move to the transfer panel; then click "Transfer".

 

D. The format of the input-xml file

 

The input-xml file contains a list of files and their sizes in XML format.

For example:

<files>
    <sourceurl> http://www.lbl.gov/index.html </sourceurl>
    <size> 24576 </size>
</files>

Note that the file name is given as a URL; in this case the protocol is "http".

Note also that the size is in bytes.
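An entry like the one above can also be generated programmatically. The sketch below uses Python's standard xml.etree.ElementTree; the function names are hypothetical, and it writes the values without the surrounding spaces shown in the example, on the assumption that DML accepts either form (a quick test transfer will confirm).

```python
import xml.etree.ElementTree as ET


def make_files_entry(url: str, size_bytes: int) -> ET.Element:
    """Build one <files> element holding a <sourceurl> and its <size> in bytes."""
    files = ET.Element("files")
    ET.SubElement(files, "sourceurl").text = url
    ET.SubElement(files, "size").text = str(size_bytes)
    return files


def entry_to_text(entry: ET.Element) -> str:
    # encoding="unicode" returns a str without an <?xml ...?> declaration
    return ET.tostring(entry, encoding="unicode")


# Reproduces the example entry (the size must be given in bytes)
print(entry_to_text(make_files_entry("http://www.lbl.gov/index.html", 24576)))
```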

 

Multiple <files> entries must be wrapped in a <request> element, as shown, for example, in mixedrequest.xml.
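To sanity-check a multi-file request before opening it in DML, you can list its entries and total size. This sketch assumes a <request> root containing several <files> elements, each with one <sourceurl> and one <size>; the real mixedrequest.xml may contain further elements, which this code simply ignores. The second URL and the function name are placeholders.

```python
import xml.etree.ElementTree as ET


def read_request(xml_text: str) -> list:
    """Return (url, size_in_bytes) pairs for every <files> entry."""
    root = ET.fromstring(xml_text.strip())
    entries = []
    for files in root.iter("files"):
        # strip() because the example format puts spaces around the values
        url = files.findtext("sourceurl", default="").strip()
        size = int(files.findtext("size", default="0").strip())
        entries.append((url, size))
    return entries


request_xml = """
<request>
  <files>
    <sourceurl> http://www.lbl.gov/index.html </sourceurl>
    <size> 24576 </size>
  </files>
  <files>
    <sourceurl> http://example.org/data/file2.nc </sourceurl>
    <size> 1048576 </size>
  </files>
</request>
"""

entries = read_request(request_xml)
total = sum(size for _, size in entries)
print(len(entries), "files,", total, "bytes")
```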

 

E. Speeding up the file transfers

 

DML can transfer multiple files concurrently to achieve a higher aggregate transfer rate. For example, if the transfer protocol limits a single file transfer to 1 MB/s, 5 concurrent file transfers should provide 5 MB/s, provided the network connection can sustain it. However, some machines (such as ordinary laptops) cannot support many concurrent transfers, since each one requires its own buffer space.
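The arithmetic in this example can be written as a simple idealized model: n concurrent transfers at r MB/s each give about n * r MB/s, capped by what the network link can carry. The function below is hypothetical and ignores per-machine limits such as buffer memory, which is why concurrency should be increased gradually.

```python
def aggregate_rate(per_transfer_mb_s: float, concurrency: int, link_mb_s: float) -> float:
    """Idealized aggregate rate: transfers add up until the link saturates."""
    if concurrency < 1:
        raise ValueError("concurrency must be at least 1")
    return min(per_transfer_mb_s * concurrency, link_mb_s)


# The example from the text: 1 MB/s per file, 5 concurrent transfers
print(aggregate_rate(1.0, 5, link_mb_s=100.0))  # 5.0
print(aggregate_rate(1.0, 5, link_mb_s=3.0))    # 3.0, limited by the link
```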

DML also implements a new feature, SplitTransfer, which can speed up a normal transfer by up to 5 times: for each file, 5 connections are opened and the file is streamed to the target directory. SplitTransfer can also be combined with concurrency.
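This document does not say how SplitTransfer divides a file across its 5 connections. One common general technique (used, for example, by HTTP clients that issue Range requests) is to give each connection a disjoint, contiguous byte range. The sketch below only computes such ranges; it illustrates that technique, not DML's implementation, and the function name is hypothetical.

```python
def split_ranges(size_bytes: int, parts: int = 5) -> list:
    """Split a file into `parts` contiguous, inclusive (start, end) byte ranges.

    When size_bytes is not divisible by parts, the first ranges get one
    extra byte each, so every byte is covered exactly once.
    """
    if size_bytes <= 0 or parts < 1:
        raise ValueError("size_bytes must be positive and parts at least 1")
    parts = min(parts, size_bytes)  # never create empty ranges
    base, extra = divmod(size_bytes, parts)
    ranges = []
    start = 0
    for i in range(parts):
        length = base + (1 if i < extra else 0)
        ranges.append((start, start + length - 1))
        start += length
    return ranges


# Each pair could become an HTTP header; the first prints as bytes=0-4915
for start, end in split_ranges(24576):
    print(f"bytes={start}-{end}")
```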

 

The default "concurrency" is 1. To set a higher concurrency, click "Preferences -> concurrent transfers" and change it to the desired level. Start with a few to see whether your machine can handle them, and increase the number until there is no further benefit or operations slow down.

 

If you are using GridFTP, two parameters can be set. The first is "parallel streams", which instructs GridFTP to send the same file over multiple streams. This is useful when the files being transferred are very large, on the order of many GBs. It is similar to concurrent transfers but applies to a single file transfer. Again, too many streams tend to give diminishing returns; it is generally advisable to use no more than 6-7 parallel streams.

 

The default "parallelism" level is 1. To set a higher parallelism level, click "options -> parallelism" and change it to the desired level. Start with a few streams (2-3) to see whether your machine can handle them, and increase the number until there is no further benefit or operations slow down.

 

The second parameter that can be set for GridFTP is the "window buffer size". This tells the GridFTP transfer software to move data in chunks of a certain size. Larger window sizes are better when large files are transferred.

 

The default "window buffer size" is about 1 MB. To set a higher or lower buffer size, click "options -> buffer size" and change it to the desired level. 1 MB is the commonly recommended level, but if the receiving end can handle larger buffer sizes, increasing it to 2 MB or more can speed up transfers.
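A widely used rule of thumb for sizing a TCP window (general networking guidance, not something specific to DML) is the bandwidth-delay product: link bandwidth multiplied by round-trip time gives the number of bytes that must be in flight to keep the path full. The sketch below computes it; the 100 Mbit/s and 80 ms figures and the function name are hypothetical.

```python
def bandwidth_delay_product(bandwidth_mbit_s: float, rtt_ms: float) -> int:
    """Bytes in flight needed to fill a path: bandwidth (bit/s) * RTT (s) / 8."""
    # Mbit/s -> bit/s is * 1_000_000; ms -> s is / 1000; bits -> bytes is / 8
    return int(bandwidth_mbit_s * 1_000_000 * rtt_ms / (1000 * 8))


# Hypothetical path: 100 Mbit/s with an 80 ms round-trip time
print(bandwidth_delay_product(100, 80))  # 1000000, roughly the 1 MB default
```

A faster or longer path (for example 1 Gbit/s at the same 80 ms) yields a proportionally larger product, which is when raising the buffer above 1 MB is most likely to help, provided both ends can handle it.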