Investigators: Suren Byna (LBNL), P. Carns (ANL)
Program Manager: Lucy Nowell
LBNL researchers developed SoMeta, a scalable, object-centric metadata management system for high-performance computing (HPC) systems. In searches over the metadata of ~277,000 data objects from the Sloan Digital Sky Survey (SDSS), SoMeta ran 15X to 40X faster than Lustre and 10X to 90X faster than distributed database technologies such as SciDB.
An overview of the SoMeta system: Scalable service threads are placed in user space, one per compute node. SoMeta uses a combination of a distributed hash table and a Bloom filter to accelerate the metadata object search process.
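The pairing of a distributed hash table with a Bloom filter can be illustrated with a minimal Python sketch. It is not SoMeta's implementation: the class names (`BloomFilter`, `MetadataServer`, `MetadataService`), the filter sizes, the object names, and the attribute values are all hypothetical, and the "servers" are in-process objects standing in for per-node service threads. The sketch assumes each object name is hashed to one owning server, and that each server consults its Bloom filter before touching its local store, so lookups for names that were never inserted are usually rejected without a search.

```python
import hashlib


class BloomFilter:
    """Probabilistic set: may report false positives, never false negatives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for simplicity

    def _positions(self, key):
        # Derive num_hashes bit positions by salting the key with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = 1

    def might_contain(self, key):
        return all(self.bits[pos] for pos in self._positions(key))


class MetadataServer:
    """Hypothetical stand-in for one per-node metadata service thread."""

    def __init__(self):
        self.bloom = BloomFilter()
        self.objects = {}  # object name -> metadata attributes

    def put(self, name, attrs):
        self.bloom.add(name)
        self.objects[name] = attrs

    def get(self, name):
        # Cheap negative check first; skip the store if the name is absent.
        if not self.bloom.might_contain(name):
            return None
        return self.objects.get(name)


class MetadataService:
    """Routes each object name to one server by hashing (a toy hash table)."""

    def __init__(self, num_servers):
        self.servers = [MetadataServer() for _ in range(num_servers)]

    def _owner(self, name):
        digest = hashlib.sha256(name.encode()).digest()
        index = int.from_bytes(digest[:8], "big") % len(self.servers)
        return self.servers[index]

    def put(self, name, attrs):
        self._owner(name).put(name, attrs)

    def get(self, name):
        return self._owner(name).get(name)


# Usage with made-up object names and placeholder attribute values.
service = MetadataService(num_servers=4)
service.put("sdss/object-0001", {"ra": 10.5, "dec": -1.2})
found = service.get("sdss/object-0001")
missing = service.get("sdss/object-9999")
```

In this sketch a Bloom filter false positive costs only one extra dictionary lookup, which still returns nothing for an absent name, so correctness does not depend on the filter's size. A real deployment would also need to handle network communication, server membership changes, and searches by attribute rather than by name, none of which are modeled here.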
Performance of SoMeta: In searching SDSS metadata objects, SoMeta outperforms Lustre by up to 40X, and SciDB and MongoDB by up to 90X (the y-axis is on a log scale).
Scientific experiments and simulations produce massive numbers of data files with numerous variables. When the files and data variables are stored as objects with extensive metadata attached to them, existing file systems cannot locate the data that scientists require in a scalable way.