Use Cases
We describe two use cases below: data movement between nodes, and data movement from a node to a client (user) site. We expect that each will require a different software tool. However, several scenarios can use the same tool. For this reason, we label the three scenarios in use case 1 as scenarios 1.1, 1.2, and 1.3. Similarly, we label the two scenarios in use case 2 as scenarios 2.1 and 2.2.
Use Case 1: Move data between nodes
Scenario 1.1
- A data node publishes a new dataset to its gateway
- Another node needs to get some subset of the data
- Example: PCMDI node needs to get 10% of a published dataset from NCAR node (in order to generate or augment the core dataset)
- Target node (PCMDI) pulls data (can be initiated by a person or a service)
- A requestID is returned, and the transfer starts asynchronously (background job)
- Target administrator can check status of request using requestID
- Size: increments are typically 10-20 TBs.
- Note - a subsequent step by the replica service is: PCMDI node gets the corresponding metadata, publishes it to its gateway, and links the metadata to the physical files
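
The requestID pattern in Scenario 1.1 (submit a pull, get an ID back immediately, check status later) can be sketched as follows. This is a minimal in-memory illustration, not the actual data movement tool: the `TransferService` class, its method names, and the status values QUEUED/RUNNING/DONE are all assumptions made for the example, and the file transfer itself is replaced by a short sleep.

```python
import threading
import time
import uuid


class TransferService:
    """Toy stand-in for a bulk data movement service (hypothetical API)."""

    def __init__(self):
        self._status = {}    # requestID -> "QUEUED" | "RUNNING" | "DONE"
        self._threads = {}   # requestID -> background worker thread
        self._lock = threading.Lock()

    def submit(self, source_node, dataset, files):
        """Register a pull request and return its requestID immediately."""
        request_id = uuid.uuid4().hex
        with self._lock:
            self._status[request_id] = "QUEUED"
        worker = threading.Thread(
            target=self._run, args=(request_id, files), daemon=True
        )
        self._threads[request_id] = worker
        worker.start()       # the transfer proceeds as a background job
        return request_id

    def _run(self, request_id, files):
        with self._lock:
            self._status[request_id] = "RUNNING"
        for _ in files:
            time.sleep(0.01)  # placeholder for the real file transfer
        with self._lock:
            self._status[request_id] = "DONE"

    def status(self, request_id):
        """What the target administrator would call with the requestID."""
        with self._lock:
            return self._status[request_id]

    def wait(self, request_id, timeout=None):
        """Block until the background job ends (or the timeout expires)."""
        self._threads[request_id].join(timeout)
        return self.status(request_id)


service = TransferService()
request_id = service.submit("NCAR", "example-dataset", ["f1.nc", "f2.nc"])
print(request_id, service.status(request_id))  # status depends on timing
print(service.wait(request_id))
```

The point of the sketch is that `submit` never blocks on the transfer, so the same call works whether a person or a service initiates the pull.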
Scenario 1.2
- Data needs to be mirrored from source node to a target node
- Example: BADC node (UK) needs to mirror all or part of the core data from PCMDI node
- Mirroring (target) node is notified, and initiates bulk transfer "manually" - i.e. invokes a local client to pull data
- A requestID is returned, and the transfer starts asynchronously (background job)
- Target administrator can check status of request using requestID
- Note: the above step may be automated at a later stage
- Size: 10-500 TBs depending on what target node requests
- Note - a subsequent step by the target node administrator or some service is: target node gets the corresponding metadata, publishes it to its gateway, and links the metadata to the physical files
Scenario 1.3
- A data node wishes to collect data from multiple source nodes in order to generate summary or visualization products
- The target node initiates the data transfer by pulling data from the source nodes
- A requestID is returned, and the transfer starts asynchronously (background job)
- Target administrator can check status of request using requestID
- Size: probably a few TBs - depending on purpose
- Issue: should the client at the target be able to issue a single request to all source nodes?
- In principle, this is desirable and a natural extension of the bulk transfer client, but initially we assume it will be done for one source node at a time, or by invoking the data movement client multiple times
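
The initial approach for Scenario 1.3 (invoking the data movement client once per source node) could look roughly like the sketch below. The function names, the requestID format, and the status values are hypothetical, and `submit_pull` is only a stub standing in for the real single-source client.

```python
import itertools

_counter = itertools.count(1)


def submit_pull(source_node, files):
    """Stub for the single-source client call; returns a made-up requestID."""
    return f"req-{next(_counter)}-{source_node}"


def pull_from_all(plan, submit=submit_pull):
    """Invoke the single-source client once per source node.

    `plan` maps a source node name to the list of files wanted from it.
    Returns a mapping of source node name to requestID.
    """
    return {source: submit(source, files) for source, files in plan.items()}


def overall_status(statuses):
    """Combine per-source statuses into one status for the whole collection."""
    values = set(statuses.values())
    if "FAILED" in values:
        return "FAILED"
    if values == {"DONE"}:
        return "DONE"
    return "IN_PROGRESS"


plan = {"NCAR": ["a.nc"], "PCMDI": ["b.nc", "c.nc"]}
request_ids = pull_from_all(plan)
print(request_ids)
```

A later single-request client could keep the same shape and simply move the loop in `pull_from_all` behind one call, returning one combined requestID.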
Use Case 2: Move data from a node to a client (user) site
Scenario 2.1
- A user contacts a gateway, and uses metadata to identify data he/she wishes to get
- Note: case not considered here - if the volume of data requested is small (a few files), the user can pull these files directly (using wget, https, ...). Gateway may have to pull files into its cache first.
- Assume the data volume is large: many GBs to a few TBs
- Gateway returns a requestID to user
- User can check status of request using requestID
- User can start "request client" right away
- Gateway gets files into user's allocated space
- User's request client pulls data
- Note: gateway may need to get files from remote locations/archives to user's allocated space
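
Because the user can start the request client right away, the client has to cope with files arriving in the user's allocated space over time. The sketch below illustrates that interaction with two local directories standing in for the gateway's staging space and the user's site. All names are hypothetical, and a real client would also need a way to tell that a file has been fully staged before copying it.

```python
import shutil
import tempfile
from pathlib import Path


def stage_files(names, staging_dir):
    """Gateway side: place requested files into the user's allocated space."""
    for name in names:
        (staging_dir / name).write_text(f"contents of {name}\n")


def pull_staged(staging_dir, local_dir, already_pulled):
    """Client side: copy staged files not yet pulled; return their names."""
    new_files = []
    for path in sorted(staging_dir.iterdir()):
        if path.is_file() and path.name not in already_pulled:
            shutil.copy2(path, local_dir / path.name)
            already_pulled.add(path.name)
            new_files.append(path.name)
    return new_files


with tempfile.TemporaryDirectory() as tmp:
    staging = Path(tmp, "staging")
    local = Path(tmp, "local")
    staging.mkdir()
    local.mkdir()
    pulled = set()

    stage_files(["a.nc"], staging)                # gateway stages one file
    first = pull_staged(staging, local, pulled)   # client picks it up
    stage_files(["b.nc", "c.nc"], staging)        # more files arrive later
    second = pull_staged(staging, local, pulled)  # client pulls only new ones
    print(first, second)
```

Tracking the set of already pulled names lets the client be run repeatedly without copying a file twice.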
Scenario 2.2
- A user contacts a gateway, and uses metadata to identify data he/she wishes to get
- Note: case not considered here - if the volume of data requested is small (a few files), the user can pull these files directly (using wget, https, ...). Gateway may have to pull files into its cache first.
- Assume the data volume is large: many GBs to a few TBs
- Gateway returns a requestID to user
- User can start "request client" right away
- User can check status of request using requestID
- Request client pulls files directly from source nodes
- Suggestion: consider this scenario a task for a later stage