Future extension: Resource Co-scheduling Component

In the DOE petascale science environment, the resources located at each site (e.g., computing power and storage space) must be allocated jointly with network resources to achieve cost-effective data transfer. For instance, a site with rich storage resources may not be a good candidate for data backup if its network connectivity to other sites is poor. Furthermore, in an environment where users share and compete for resources, good co-scheduling schemes are critical to efficient resource utilization.

Therefore, we propose to study a general resource co-scheduling (RCS) problem as follows: given a set of limited resources of different types and a variety of data-intensive applications, determine how to optimally allocate and schedule the resources required by each application. Take as an example an application that requires only a simple end-to-end data transfer: it may ask for a bandwidth-guaranteed circuit for reliable data transfer, and also a number of CPUs and hard disk drives to process and store the data. All of these resource types must be jointly allocated and co-scheduled.

We will develop analytical models of resource co-scheduling for two purposes: first, to serve as the basis for actual schedulers, and second, to evaluate the performance of candidate scheduling algorithms. In addition, we propose to study efficient scheduling algorithms to solve several specializations of the general RCS problem listed below:

As mentioned, the schedulers to be developed should be able to handle multiple sources and targets of large-volume data over large-scale networks, multiple classes of load, and multiple transfers per request. Time-optimal schedules often require perfect information about system state (e.g., available processor and link speeds), so there is value in studying both schedulers that can operate with incomplete information and heuristic schedulers. Optimal schedulers can then serve as a benchmark against which to compare these heuristic schedulers.
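
To make the idea of a heuristic co-scheduler concrete, the following is a minimal sketch and not the project's scheduler. It assumes time is divided into discrete slots, that each resource type (circuit bandwidth, CPUs, disks) is a single pooled capacity per slot, and that network topology can be ignored. The names `Request` and `co_schedule` and all numbers are hypothetical, chosen only for illustration. The heuristic is greedy first-fit: each request receives the earliest window in which all of its resource types are available at the same time.

```python
from dataclasses import dataclass


@dataclass
class Request:
    """Hypothetical transfer request; all fields are illustrative assumptions."""
    name: str
    bandwidth: int  # circuit bandwidth needed in every slot (e.g., Gb/s)
    cpus: int       # CPUs needed in every slot
    disks: int      # disks needed in every slot
    duration: int   # number of consecutive time slots required


def co_schedule(requests, capacity, horizon):
    """Greedy first-fit co-scheduling over discrete time slots.

    A request is placed at the earliest start slot where bandwidth, CPUs,
    and disks are all available for its whole duration. Requests that do
    not fit within the horizon are rejected (mapped to None).
    """
    # free[t] holds the remaining capacity of each resource type in slot t.
    free = [dict(capacity) for _ in range(horizon)]
    schedule = {}
    for req in requests:
        need = {"bandwidth": req.bandwidth, "cpus": req.cpus, "disks": req.disks}
        for start in range(horizon - req.duration + 1):
            window = free[start:start + req.duration]
            # Joint feasibility: every resource type must fit in every slot.
            if all(slot[k] >= v for slot in window for k, v in need.items()):
                for slot in window:
                    for k, v in need.items():
                        slot[k] -= v
                schedule[req.name] = start
                break
        else:
            schedule[req.name] = None  # no feasible window was found
    return schedule


# Illustrative capacities and requests (made-up numbers).
capacity = {"bandwidth": 10, "cpus": 8, "disks": 4}
requests = [
    Request("A", bandwidth=6, cpus=4, disks=2, duration=2),
    Request("B", bandwidth=6, cpus=2, disks=1, duration=1),
    Request("C", bandwidth=4, cpus=4, disks=2, duration=1),
]
result = co_schedule(requests, capacity, horizon=4)
print(result)
```

In this example, request B has enough CPUs and disks in slot 0 but is delayed to slot 2 because the circuit bandwidth is already committed to A, which is the kind of cross-resource coupling that co-scheduling has to account for. A first-fit heuristic like this is not guaranteed to be time optimal; it is the type of scheduler that an optimal schedule could be used to benchmark.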