Welcome to the Scientific Data Management Center, a SciDAC Center of the Department of Energy
The scientific community has identified the management of scientific data
as one of its most important emerging needs, because of the sheer volume
and increasing complexity of the data being collected. Effectively generating,
managing, and analyzing this information requires a comprehensive, end-to-end
approach to data management that encompasses every stage, from
initial data acquisition to final analysis. Based on
community input, we have identified three significant requirements. First,
more efficient access to storage systems is needed. In particular, parallel
file systems must be improved so that large volumes of data can be written
and read without slowing a simulation, analysis, or visualization engine.
Second, scientists require technologies that help them better understand
their data, in particular the ability to perform complex data analyses and
searches over large data sets efficiently. Specialized feature discovery,
parallel statistical analysis, and efficient indexing are needed before
the data can be understood or visualized. Finally, generating the data,
collecting and storing the results, post-processing the data, and analyzing
the results together form a tedious, fragmented process. Workflow tools that
automate this process in a robust, tractable, and recoverable fashion are
required to enhance scientific exploration.
We have organized our activities into three layers that abstract the end-to-end data flow described above: Storage Efficient Access (SEA), Data Mining and Analytics (DMA), and Scientific Process Automation (SPA). The SEA layer sits directly on top of the hardware, operating systems, file systems, and mass storage systems, and provides parallel data access technology and transparent access to archival storage. The DMA layer, which builds on the functionality of the SEA layer, consists of indexing, feature selection, and parallel statistical analysis technology. The SPA layer, which sits on top of the DMA layer, provides the ability to compose scientific workflows from the components in the DMA layer as well as from application-specific modules. Figure 1 shows this organization and the components developed by the center and applied to various scientific applications.
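To make the three-layer organization concrete, here is a minimal Python sketch under toy, in-memory assumptions. The class names `StorageEfficientAccess`, `DataMiningAnalytics`, and `Workflow`, and all of their methods, are hypothetical illustrations rather than the Center's software. The SEA stand-in simply keeps arrays in a dictionary instead of performing parallel I/O, the DMA stand-in uses a linear scan where a real system would consult an index, and the SPA stand-in runs steps in order while recording which ones finished.

```python
import statistics
from typing import Callable, Dict, List


# Hypothetical SEA stand-in: a real layer would provide parallel file I/O
# and transparent archival access; here arrays just live in a dict.
class StorageEfficientAccess:
    def __init__(self) -> None:
        self._store: Dict[str, List[float]] = {}

    def write(self, name: str, values: List[float]) -> None:
        self._store[name] = list(values)

    def read(self, name: str) -> List[float]:
        return list(self._store[name])


# Hypothetical DMA stand-in: it talks only to the SEA interface.
class DataMiningAnalytics:
    def __init__(self, sea: StorageEfficientAccess) -> None:
        self.sea = sea

    def select(self, name: str, threshold: float) -> List[int]:
        # Linear scan; a real DMA layer would answer this with an index.
        return [i for i, v in enumerate(self.sea.read(name)) if v > threshold]

    def summarize(self, name: str) -> Dict[str, float]:
        data = self.sea.read(name)
        return {"mean": statistics.mean(data), "max": max(data)}


# Hypothetical SPA stand-in: composes DMA components and
# application-specific steps into an ordered workflow.
Step = Callable[[dict], dict]


class Workflow:
    def __init__(self) -> None:
        self.steps: List[Step] = []

    def add(self, step: Step) -> "Workflow":
        self.steps.append(step)
        return self  # allow chaining

    def run(self, context: dict) -> dict:
        context = dict(context)
        context.setdefault("completed", [])
        for step in self.steps:
            context = step(context)
            # Record each finished step as a minimal provenance trail.
            context["completed"].append(step.__name__)
        return context


# Usage: one application-specific step followed by two DMA steps.
sea = StorageEfficientAccess()
dma = DataMiningAnalytics(sea)


def run_simulation(ctx: dict) -> dict:
    # Application-specific module: produces data through the SEA layer.
    sea.write("temperature", [290.0, 305.5, 310.2, 288.1])
    return ctx


def find_hot_cells(ctx: dict) -> dict:
    ctx["hot"] = dma.select("temperature", 300.0)
    return ctx


def summarize_temperature(ctx: dict) -> dict:
    ctx["summary"] = dma.summarize("temperature")
    return ctx


result = (
    Workflow()
    .add(run_simulation)
    .add(find_hot_cells)
    .add(summarize_temperature)
    .run({})
)
print(result["hot"])        # [1, 2]
print(result["completed"])  # ['run_simulation', 'find_hot_cells', 'summarize_temperature']
```

Because each layer depends only on the interface of the layer directly beneath it, the dictionary-backed storage stand-in could in principle be replaced by a real parallel I/O back end without changing the workflow definition.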