In this section, we provide a detailed description of three data transfer planes for end-to-end resource provisioning: control plane, data plane, and management plane.
The control plane consists of a storage resource management component, network provisioning components, and a generic network plug-in with its underlying library, which allow transfer applications to interact directly with heterogeneous storage and network resource provisioning systems to create end-to-end paths among storage sites. The storage resource management component is expected to dynamically allocate caching space from storage to facilitate data transfer, and to manage data moving into and out of the storage media via the allocated caching space. We will leverage the Storage Resource Manager (SRM) protocol standard [SRM-OGF] and reference implementations from storage communities to allocate caching space, request the target data if it is not already in the space, pin files in the space during data transfer, and release files after the transfer is finished.
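The caching lifecycle described above (allocate space, stage a file if it is absent, pin it during transfer, release it afterwards) can be illustrated with a small in-memory model. This is only a sketch: `CacheSpace` and its methods are hypothetical names chosen for illustration and are not the SRM interface or the BeStMan API.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical in-memory model of a cache space; not the SRM or BeStMan API. */
class CacheSpace {
    private final long capacityBytes;
    private long usedBytes = 0;
    // file name -> size of each file currently staged in the space
    private final Map<String, Long> staged = new HashMap<>();
    // file name -> number of active pins held by ongoing transfers
    private final Map<String, Integer> pins = new HashMap<>();

    CacheSpace(long capacityBytes) {
        this.capacityBytes = capacityBytes;
    }

    /** Stages a file if it is not already present; returns false when it does not fit. */
    boolean stage(String file, long sizeBytes) {
        if (staged.containsKey(file)) return true;
        if (usedBytes + sizeBytes > capacityBytes) return false;
        staged.put(file, sizeBytes);
        usedBytes += sizeBytes;
        return true;
    }

    /** Pins a staged file so that it cannot be evicted during a transfer. */
    void pin(String file) {
        if (!staged.containsKey(file)) throw new IllegalStateException("not staged: " + file);
        pins.merge(file, 1, Integer::sum);
    }

    /** Releases one pin after a transfer finishes. */
    void release(String file) {
        Integer count = pins.get(file);
        if (count == null) throw new IllegalStateException("not pinned: " + file);
        if (count == 1) pins.remove(file); else pins.put(file, count - 1);
    }

    /** Evicts an unpinned file and frees its space; returns false if it is pinned or absent. */
    boolean evict(String file) {
        if (pins.containsKey(file) || !staged.containsKey(file)) return false;
        usedBytes -= staged.remove(file);
        return true;
    }

    long freeBytes() {
        return capacityBytes - usedBytes;
    }
}
```

In this model a pinned file survives eviction attempts, which mirrors the purpose of pinning during an active transfer; a real storage resource manager would additionally track pin and space lifetimes.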
The current GridFTP and other transfer protocols assume best-effort IP networks and improve performance over connections with long round-trip times by using a large number of parallel TCP streams. Fairness and efficiency both suffer in such a brute-force data transfer mode. One primary goal is to move beyond best-effort data transfer, which offers no delivery time guarantee, and to extend quality of service and performance guarantees to higher-level data transfer applications using existing network and storage provisioning systems. To simplify the implementation and encapsulate the network and storage co-scheduling required by the control plane, we will extend the existing storage scheduling and reservation module in BeStMan to reserve end-to-end network paths. This allows BeStMan to make intelligent trade-offs among storage space volume, reserved space lifetime, network bandwidth, reservation lifetime, and reliability, thus reducing the “impedance mismatch” between end user data transfer applications, storage, and the networks. The role of this enhanced BeStMan is to continuously match application needs with currently reserved network connection capabilities. The intended outcome is to enable scientific teams to benefit from both shared and dedicated high-speed connections. We will create system-level “bridging” technologies to: 1) effectively share a bandwidth reservation among the different communication needs of a single distributed application, and 2) balance the interaction between the constant bandwidth of dedicated channels and the variable capacity of IP networks to improve application performance.
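One way to picture the co-scheduling trade-off is to compute the bandwidth a transfer needs given the window in which both the storage space and the network path are reserved. The following is a minimal sketch under that assumption; `CoScheduler`, its method names, and the formula are illustrative and are not part of BeStMan.

```java
/** Hypothetical co-scheduling helper; names and formula are illustrative assumptions. */
final class CoScheduler {
    private CoScheduler() {}

    /**
     * Minimum bandwidth (bits per second) needed to move dataBytes within the
     * window in which both the storage space and the network path are reserved.
     */
    static double requiredBitsPerSecond(long dataBytes, long spaceLifetimeSec, long pathLifetimeSec) {
        // The usable window is bounded by whichever reservation expires first.
        long window = Math.min(spaceLifetimeSec, pathLifetimeSec);
        if (window <= 0) throw new IllegalArgumentException("reservation window must be positive");
        return dataBytes * 8.0 / window;
    }

    /** True if the reserved bandwidth can deliver the data within the usable window. */
    static boolean fits(long dataBytes, long spaceLifetimeSec, long pathLifetimeSec, double reservedBps) {
        return requiredBitsPerSecond(dataBytes, spaceLifetimeSec, pathLifetimeSec) <= reservedBps;
    }
}
```

A scheduler built along these lines could respond to a failed `fits` check by extending the shorter reservation or by requesting more bandwidth, which is the kind of trade-off the paragraph above refers to.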
More specifically, we will add network provisioning plug-ins and libraries that enable BeStMan to make web services calls to either TeraPaths or OSCARS, and to exchange specialized messages for path bandwidth reservation and modification. The following piece of Java pseudo code provides a conceptual description of how to invoke TeraPaths inside BeStMan. It shows that BeStMan already has a complete view of the information regarding a data transfer, and that the parameters needed for network provisioning can be derived from this view. Therefore, BeStMan provides a natural and efficient gateway for invoking network provisioning and QoS. The actual implementation will be more complex because it needs to interact with multiple provisioning systems and support more of the proposed features. We will develop a novel, intelligent plug-in that can work with multiple, heterogeneous network provisioning systems to query individual network domains and to make realistic bandwidth requests. Additional features can be added to the plug-in architecture, such as advance reservations, path negotiation, prioritization and reprioritization among multiple network requests, path extensions, and revocation. This network provisioning plug-in for BeStMan provides a critical hinge that couples storage management and network provisioning for upper-level data transfer applications.
    // Initialization: load the QoS library and create a QoS plug-in
    private initialization {
        if (configuration.getQosPluginClass() != null)
            this.qosPlugin = QOSPluginFactory.createInstance(configuration);
    }

    private void makeQosReservation(int i) throws MalformedURLException, SRMException {
        try {
            CopyFileRequest cfr = (CopyFileRequest) fileRequests[i];
            RequestCredential credential = RequestCredential.getRequestCredential(credentialId);

            // Build a ticket from information BeStMan already holds:
            // requester, file size, and source/destination endpoints.
            QOSTicket qosTicket = qosPlugin.createTicket(
                credential.getCredentialName(),
                (storage.getFileMetaData((SRMUser) getUser(), cfr.getFromPath())).size,
                from_urls[i].getURL(),
                from_urls[i].getPort(),
                from_urls[i].getPort(),
                from_urls[i].getProtocol(),
                to_urls[i].getURL(),
                to_urls[i].getPort(),
                to_urls[i].getPort(),
                to_urls[i].getProtocol());

            qosPlugin.addTicket(qosTicket);

            // Submit the reservation and attach the ticket to the request on success
            if (qosPlugin.submit()) {
                cfr.setQOSTicket(qosTicket);
                say("QOS Ticket Received " + qosPlugin.toString());
            }
        } catch (Exception e) {
            say("Could not create QOS reservation: " + e.getMessage());
        }
    }
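The idea of querying individual network domains before making a realistic bandwidth request can be sketched as follows. This is an assumption-laden illustration: `ProvisioningDomain`, `FixedDomain`, and `PathPlanner` are hypothetical names and do not correspond to the TeraPaths or OSCARS interfaces. The sketch simply caps the request at what the most constrained domain on the path reports as available.

```java
import java.util.List;

/** Hypothetical view of one provisioning system or network domain; not a real TeraPaths or OSCARS API. */
interface ProvisioningDomain {
    String name();

    /** Bandwidth (bits per second) the domain reports as currently reservable. */
    double availableBps();
}

/** Stub domain with a fixed answer, used only to exercise the sketch. */
final class FixedDomain implements ProvisioningDomain {
    private final String name;
    private final double availableBps;

    FixedDomain(String name, double availableBps) {
        this.name = name;
        this.availableBps = availableBps;
    }

    @Override public String name() { return name; }
    @Override public double availableBps() { return availableBps; }
}

final class PathPlanner {
    private PathPlanner() {}

    /** An end-to-end path can carry no more than its most constrained domain offers. */
    static double realisticRequestBps(double desiredBps, List<? extends ProvisioningDomain> domains) {
        double request = desiredBps;
        for (ProvisioningDomain d : domains) {
            request = Math.min(request, d.availableBps());
        }
        return request;
    }
}
```

A real plug-in would replace `FixedDomain` with adapters that call each provisioning system's web service, and would likely also negotiate start times and reservation lifetimes rather than bandwidth alone.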
The management plane includes: monitoring, error handling/fault tolerance, and security.
In the management plane, monitoring serves the following crucial purposes:
Several monitoring systems in use today, notably the FTP progress heartbeat, the BeStMan storage resource usage monitor, perfSONAR [perf-web], and MonALISA [Mona-web], provide a wealth of information about each individual component in the data plane. We plan to obtain monitoring data from such systems and process them through the existing software plug-ins. We also anticipate the need to extend the range of information made available to automated data transfer instances, both by modifying existing mechanisms and by adding new data collection mechanisms. These mechanisms will collect specific data that are not available by default but are nevertheless useful for end-to-end resource management. As required, we will develop data-transfer-specific monitoring components in the context of the widely supported perfSONAR framework, to leverage its capabilities, to provide interoperability with other existing perfSONAR sites, and to integrate with existing BeStMan file transfer progress and storage usage monitoring. Such a monitoring framework would provide a complete, global view into the data plane between the end-to-end transfer parties. BeStMan can use this information to make intelligent optimization decisions and avoid resource mismatches, and to provide faster error diagnosis and fault tolerance than any individual monitoring component or system could.
Due to the distributed and dynamic nature of an end-to-end data transfer environment, several failure scenarios have to be considered in order to mitigate the impact of such failures, recover automatically where possible, and improve the overall reliability of the system. Network device failures, software crashes, data server failures, and even individual end systems and network domains joining and leaving the resource pool for any reason can all have a range of adverse effects on a data transfer. To detect failures quickly, the data transfer application and BeStMan will rely on both passive network monitoring information collected from the monitoring framework (such as perfSONAR) and active, periodic, user-level probing (nothing is more effective than using the provisioned service directly, as users do). To extend fault restoration within the end-to-end data plane, it is essential to maintain overall status awareness and to have all involved physical network domains and both transfer ends cooperate and coordinate in the recovery process. We plan to work closely with the BeStMan, OSCARS, Internet2 DCN, and TeraPaths collaborators to design a mechanism for notifying responsible parties of failure alarms, as well as a coordinated service recovery procedure. Failure recovery can fail over to redundant backup storage and network services, reserved either in advance or in real time when a failure appears, or, in the worst case, fall back to best-effort networking and opportunistic storage space.
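The recovery order described above (backup reserved in advance, then a reservation made in real time, then best-effort service) can be expressed as a simple fallback chain. The sketch below is illustrative only; `RecoveryTier` and `FailureRecovery` are hypothetical names, and the two suppliers stand in for whatever calls would activate or create a reservation.

```java
import java.util.function.BooleanSupplier;

/** Recovery tiers in the order the text describes; names are hypothetical. */
enum RecoveryTier { RESERVED_BACKUP, ON_DEMAND_RESERVATION, BEST_EFFORT }

final class FailureRecovery {
    private FailureRecovery() {}

    /**
     * Tries each recovery option in order of preference and reports which one succeeded.
     * Each supplier returns true if its option could be activated.
     */
    static RecoveryTier recover(BooleanSupplier activateReservedBackup, BooleanSupplier reserveOnDemand) {
        if (activateReservedBackup.getAsBoolean()) return RecoveryTier.RESERVED_BACKUP;
        if (reserveOnDemand.getAsBoolean()) return RecoveryTier.ON_DEMAND_RESERVATION;
        // Worst case: continue over best-effort networking and opportunistic storage.
        return RecoveryTier.BEST_EFFORT;
    }
}
```

Because the second supplier is only evaluated when the first fails, an expensive real-time reservation attempt is skipped whenever a pre-reserved backup is available.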
Data transfers traverse heterogeneous components, each of which may have different usage policies. These policies, including authentication, authorization, and application constraints, are critical for maintaining resource ownership and security, and for allocating and prioritizing resources based on user organization, roles, and groups. There is no single security mechanism, in terms of authentication and authorization, that can be applied to all application domains and resources with different security requirements. The end-to-end resource provisioning tools have to support multiple security mechanisms, including Kerberos, simple login/password, SSL, and the Grid Security Infrastructure (GSI), even within the same data transfer activity. For example, end sites may require Kerberos while the network provisioning system requires a plain password. We will leverage the existing security stack in BeStMan and GUMS [GUMS-Paper, GUMS-Web], a generic Grid User Management System that was initiated by one of the co-PIs.
BeStMan will be enhanced to implement a pluggable authentication and access control stack that allows users to choose from a wide variety of security mechanisms, including anonymous access, protected password, X.509 certificates, Kerberos, and SSL. To support complex environments with stringent security requirements, BeStMan can choose to delegate authentication and authorization to GUMS, where the GUMS service is used to map user or end system credentials to resource-specific identities or credentials (e.g., UNIX accounts or Kerberos principals) in accordance with the site's resource usage policy. GUMS can be configured to map users dynamically for each request submitted. GUMS is already used by all Open Science Grid (OSG) [OSG-web] sites to manage security and authentication, and to mitigate the policy heterogeneity and complexity arising from the distributed environment.
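A pluggable authentication stack with delegated identity mapping, as described above, might be structured like the following sketch. All names (`Authenticator`, `SecurityStack`) are hypothetical, and the in-memory map is only a stand-in for a GUMS-style mapping service; this is not the BeStMan or GUMS API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

/**
 * Hypothetical authenticator plug-in: returns the authenticated user name,
 * or empty if this mechanism cannot vouch for the credential.
 */
interface Authenticator {
    Optional<String> authenticate(String credential);
}

/** Tries each configured mechanism in order, then maps the user to a local identity. */
final class SecurityStack {
    private final List<Authenticator> mechanisms = new ArrayList<>();
    // Stand-in for a GUMS-style mapping service: user name -> local account.
    private final Map<String, String> identityMap = new HashMap<>();

    void addMechanism(Authenticator a) {
        mechanisms.add(a);
    }

    void mapUser(String user, String localAccount) {
        identityMap.put(user, localAccount);
    }

    /** Returns the local account for the credential, or empty if unauthenticated or unmapped. */
    Optional<String> resolveLocalAccount(String credential) {
        for (Authenticator a : mechanisms) {
            Optional<String> user = a.authenticate(credential);
            if (user.isPresent()) {
                return Optional.ofNullable(identityMap.get(user.get()));
            }
        }
        return Optional.empty();
    }
}
```

Separating authentication (who the requester is) from mapping (which local identity the requester receives) lets a site change its usage policy without touching the individual mechanisms.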