Abstract
Scientific data sets, which are growing rapidly in volume, are often accompanied by plentiful metadata, such as information about the associated experiment or simulation. Without this metadata, the data become difficult to use and their value is lost over time. Ideally, metadata should be managed together with its corresponding data by a single storage system, where it can be accessed and updated directly. However, existing storage systems in high-performance computing (HPC) environments, such as the Lustre parallel file system, still use a static metadata structure composed of a fixed, non-extensible set of attributes. The burden of metadata management therefore falls on end users, who must develop ad hoc metadata management software.
With the advent of "object-centric" storage systems, there is an opportunity to solve this issue. In this effort, we present SoMeta, a scalable and decentralized metadata management approach for object-centric storage in HPC systems. SoMeta provides a flat namespace that is dynamically partitioned, a tagging approach to managing metadata that can be efficiently searched and updated, and a lightweight, fault-tolerant management strategy. In our experiments, SoMeta achieves up to a 3.7X speedup over Lustre on common metadata operations, and is up to 16X faster than SciDB and MongoDB on advanced metadata operations, namely tagging and searching. Additionally, unlike existing storage systems, SoMeta offers scalable user-space metadata management by letting users choose the number of metadata servers to match their workload.
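
To make the ideas of a partitioned flat namespace and tag-based metadata more concrete, the following is a minimal illustrative sketch in Python. It is not SoMeta's implementation or API: the class names (MetadataServer, MetadataCluster), the method names, and the use of a static hash to assign objects to servers are all assumptions made for illustration. In particular, static hashing is a simplification and does not model the dynamic partitioning or fault tolerance described above.

```python
import hashlib
from collections import defaultdict


class MetadataServer:
    """Hypothetical in-memory server holding one slice of a flat namespace."""

    def __init__(self):
        self.objects = {}                  # object name -> {tag key: tag value}
        self.tag_index = defaultdict(set)  # (tag key, tag value) -> object names

    def create(self, name, tags):
        self.objects[name] = {}
        for key, value in tags.items():
            self.set_tag(name, key, value)

    def set_tag(self, name, key, value):
        """Add or update one tag; tag values are assumed to be hashable."""
        tags = self.objects[name]
        if key in tags:
            # Drop the stale index entry before recording the new value.
            self.tag_index[(key, tags[key])].discard(name)
        tags[key] = value
        self.tag_index[(key, value)].add(name)

    def search(self, key, value):
        return set(self.tag_index.get((key, value), ()))


class MetadataCluster:
    """Hypothetical client view: a user-chosen number of metadata servers."""

    def __init__(self, num_servers):
        self.servers = [MetadataServer() for _ in range(num_servers)]

    def _owner(self, name):
        # Assumption: a static hash of the object name picks the server.
        digest = hashlib.sha1(name.encode("utf-8")).digest()
        return self.servers[int.from_bytes(digest[:8], "big") % len(self.servers)]

    def create(self, name, **tags):
        self._owner(name).create(name, tags)

    def set_tag(self, name, key, value):
        self._owner(name).set_tag(name, key, value)

    def get(self, name):
        return dict(self._owner(name).objects[name])

    def search(self, key, value):
        # A tag search must consult every server, since tags are not partitioned.
        matches = set()
        for server in self.servers:
            matches |= server.search(key, value)
        return sorted(matches)
```

In this sketch, a caller would construct MetadataCluster(num_servers=4), register objects with create("run42/temperature", experiment="climate"), and find them again with search("experiment", "climate"). Lookups by name touch a single server, while tag searches fan out to all of them, which is one reason the number of servers is worth tuning to the workload.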

Publications

  • Houjun Tang, Suren Byna, Bin Dong, Jialin Liu, and Quincey Koziol, "SoMeta: Scalable Object-centric Metadata Management for High Performance Computing", The IEEE Cluster Conference 2017 [Preprint version]