Use Cases

We describe below two use cases: data movement between nodes and data movement from a node to a client (user) site. We expect that each will require a different software tool. However, several scenarios within a use case can use the same tool. For this reason, we labeled the three scenarios in use case 1 as scenarios 1.1, 1.2, and 1.3, and the two scenarios in use case 2 as scenarios 2.1 and 2.2.

Use Case 1: Move data between nodes

Scenario 1.1

A data node publishes a new dataset to its gateway
Another node needs to get some subset of the data
Example: PCMDI node needs to get 10% of a published dataset from the NCAR node (in order to generate or augment the core dataset)
Target node (PCMDI) pulls the data; the pull can be initiated by a person or by a service
A requestID is returned, and the transfer starts asynchronously (background job)
Target administrator can check status of request using requestID
Size: increments are typically 10-20 TBs.
Note - a subsequent step by the replica service is: the PCMDI node gets the corresponding metadata, publishes it to its gateway, and links the metadata to the physical files
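The request pattern above (a requestID returned immediately, the transfer running as a background job, and status checked later by requestID) can be sketched as follows. This is a minimal in-memory illustration, not the actual transfer service: the class `TransferService`, its `submit`, `status`, and `wait` methods, the `Status` values, and the `copy_file` callback are all hypothetical names chosen for the example.

```python
import threading
import uuid
from enum import Enum


class Status(Enum):
    QUEUED = "queued"
    RUNNING = "running"
    DONE = "done"
    FAILED = "failed"


class TransferService:
    """Hypothetical service: submit() returns a requestID at once and the
    copy runs in a background thread (the "background job")."""

    def __init__(self):
        self._status = {}
        self._threads = {}
        self._lock = threading.Lock()

    def submit(self, files, copy_file):
        # Hand back the requestID before any data is moved.
        request_id = str(uuid.uuid4())
        self._set(request_id, Status.QUEUED)
        worker = threading.Thread(
            target=self._run, args=(request_id, list(files), copy_file), daemon=True
        )
        self._threads[request_id] = worker
        worker.start()
        return request_id

    def _run(self, request_id, files, copy_file):
        self._set(request_id, Status.RUNNING)
        try:
            for f in files:
                copy_file(f)  # hypothetical per-file transfer callback
        except Exception:
            self._set(request_id, Status.FAILED)
        else:
            self._set(request_id, Status.DONE)

    def _set(self, request_id, status):
        with self._lock:
            self._status[request_id] = status

    def status(self, request_id):
        # What the target administrator queries with the requestID.
        with self._lock:
            return self._status[request_id]

    def wait(self, request_id, timeout=None):
        # Block until the background job for this request finishes.
        self._threads[request_id].join(timeout)
```

A pull initiated by a person and one initiated by a service would both go through `submit`; only the caller differs.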

Scenario 1.2

Data needs to be mirrored from source node to a target node
Example: BADC node (UK) needs to mirror all or part of the core data from the PCMDI node
Mirroring (target) node is notified, and initiates bulk transfer "manually" - i.e. invokes a local client to pull data
A requestID is returned, and the transfer starts asynchronously (background job)
Target administrator can check status of request using requestID
Note: the above step may be automated at a later stage
Size: 10-500 TBs depending on what target node requests
Note - a subsequent step by target node administrator or some service:
Target node gets the corresponding metadata, publishes it to its gateway, and links the metadata to the physical files
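Before pulling, a mirroring node has to work out which files it still lacks, whether it mirrors all of the core data or only part of it. The sketch below assumes each node can produce a file listing with path, size, and checksum; the `FileEntry` type, the `plan_mirror` and `total_bytes` functions, and the prefix-based selection of "part of" the data are illustrative assumptions, not part of any existing tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileEntry:
    path: str
    size: int       # bytes
    checksum: str


def plan_mirror(source, target, prefixes=None):
    """Return the source entries the mirroring (target) node still needs.

    prefixes: optional path prefixes selecting part of the data;
    None means mirror everything. A file already present at the target
    with a different checksum is treated as stale and fetched again.
    """
    have = {(e.path, e.checksum) for e in target}
    wanted = [
        e for e in source
        if prefixes is None or any(e.path.startswith(p) for p in prefixes)
    ]
    return [e for e in wanted if (e.path, e.checksum) not in have]


def total_bytes(entries):
    # Useful for checking a plan against the 10-500 TB range above.
    return sum(e.size for e in entries)
```

The resulting list is what the target node's local client would then pull in the bulk transfer.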

Scenario 1.3

A data node wishes to collect data from multiple source nodes in order to generate summary or visualization products
The target node initiates the data transfer by pulling data from the source nodes
A requestID is returned, and the transfer starts asynchronously (background job)
Target administrator can check status of request using requestID
Size: probably a few TBs - depending on purpose
Issue: should the client at the target be able to issue a single request to all source nodes?
In principle, this is desirable and a natural extension of the bulk transfer client, but initially we assume this will be done one source node at a time, or by invoking the data movement client multiple times

Use Case 2: Move data from a node to a client (user) site

Scenario 2.1

A user contacts a gateway, and uses metadata to identify data he/she wishes to get
Note: case not considered here - if the volume of data requested is small (a few files), the user can pull these files directly (using wget, HTTPS, ...). The gateway may have to pull the files into its cache first.
Assume the data volume is large: many GBs to a few TBs
Gateway returns a requestID to user
User can check status of request using requestID
User can start "request client" right away
Gateway gets files into user's allocated space
User's request client pulls the data
Note: gateway may need to get files from remote locations/archives to user's allocated space
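Because the user can start the request client right away, the client has to cope with files that are still being staged. One way to picture this is a producer/consumer pair: the gateway places each file in the user's allocated space and announces it, and the request client pulls files as they appear. The functions `gateway_stage`, `request_client`, and `run_request`, and the `fetch`/`pull` callbacks, are hypothetical; a real gateway would signal readiness through its status interface rather than an in-process queue.

```python
import queue
import threading

_DONE = object()  # sentinel: the gateway has finished staging


def gateway_stage(files, fetch, staged):
    """Gateway side: bring each file (possibly from a remote location or
    archive) into the user's allocated space, then announce it."""
    try:
        for f in files:
            fetch(f)
            staged.put(f)
    finally:
        staged.put(_DONE)  # always release the client, even on failure


def request_client(staged, pull):
    """User side: started right away; pulls each file once it is staged."""
    pulled = []
    while True:
        f = staged.get()
        if f is _DONE:
            return pulled
        pull(f)
        pulled.append(f)


def run_request(files, fetch, pull):
    staged = queue.Queue()
    gateway = threading.Thread(target=gateway_stage, args=(files, fetch, staged))
    gateway.start()
    pulled = request_client(staged, pull)
    gateway.join()
    return pulled
```

If staging fails partway, the client returns only the files it managed to pull, which the user could compare against the request status.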

Scenario 2.2

A user contacts a gateway, and uses metadata to identify data he/she wishes to get
Note: case not considered here - if the volume of data requested is small (a few files), the user can pull these files directly (using wget, HTTPS, ...). The gateway may have to pull the files into its cache first.
Assume the data volume is large: many GBs to a few TBs
Gateway returns a requestID to user
User can start "request client" right away
User can check status of request using requestID
Request client pulls files directly from source nodes
Suggestion: consider this a task for a later stage
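For the request client to pull files directly from source nodes, it first needs to know which node serves each file. Assuming the gateway hands back one URL per file, a minimal sketch groups them by host so the client can run one transfer per source node. The function `group_by_source` and the example host names are hypothetical.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def group_by_source(urls):
    """Group file URLs by the host (source node) that serves them,
    preserving the order in which the gateway listed them."""
    groups = defaultdict(list)
    for url in urls:
        groups[urlsplit(url).netloc].append(url)
    return dict(groups)
```

Each group can then be pulled independently, so a slow or unavailable source node does not hold up transfers from the others.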