Autonomous Data Management and Integrating Distributed Data

Alex Sim

Autonomous Data Management

We have developed a number of techniques, such as automatic indexing and reshaping of arrays, for efficiently accessing information in large data sets. To better support scientific data analytics without placing a burden on scientists, we are currently working on strategies for autonomous and efficient parallelization of workloads, exchange of data with neighbors, and execution of analytics operations over deep memory and storage hierarchies.

Integrating Distributed Data

On top of autonomous data management, we are working on methods for data integration that collect features and patterns into searchable indexes. The goal is to support feature- and pattern-based searches over a variety of data types and a range of query operations, using novel indexing and querying methods. For example, we have applied one of our pattern-based data search methods to data reduction.

Pattern-Based Streaming Data Reduction

Applications such as power grid sensor monitoring generate large volumes of data very quickly, which strains both storage and network transmission capacity. Our pattern-based statistical similarity method can be applied to data reduction, acting as a lossy compression method for repeated information. Conventional lossy compression methods are primarily designed to minimize the Euclidean distance between the original data and the compressed data, and this measure of distance severely limits either reconstruction quality or compression performance.

Our method, IDEALEM, is based on block-based pattern search and redefines the distance measure using a statistical concept known as exchangeability. In our study with a set of power grid monitoring data, it reduced the volume of data far more than the best-known compression method while maintaining the quality of the compressed data, with compression ratios far exceeding 100. It also has a small memory footprint (64K memory).
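
To make the idea of block-based reduction with a statistical distance more concrete, here is a minimal Python sketch. It is not the IDEALEM implementation. It assumes a two-sample Kolmogorov-Smirnov statistic as a stand-in for the similarity test, since the text above does not say which test is used. The function names (ks_statistic, reduce_stream, reconstruct) and the parameters (block_size, threshold, max_dict) are hypothetical and chosen only for illustration.

```python
from bisect import bisect_right


def ks_statistic(a, b):
    """Largest gap between the empirical CDFs of two samples.

    The value depends only on the values in each block, not on their
    order, which is the sense in which the comparison treats samples
    within a block as exchangeable.
    """
    sa, sb = sorted(a), sorted(b)
    gap = 0.0
    for x in sa + sb:
        fa = bisect_right(sa, x) / len(sa)
        fb = bisect_right(sb, x) / len(sb)
        gap = max(gap, abs(fa - fb))
    return gap


def reduce_stream(samples, block_size=32, threshold=0.3, max_dict=16):
    """Encode a stream as ('raw', block), ('ref', index), or ('tail', block).

    A block that is statistically similar to a stored block is replaced
    by a short reference. The dictionary is bounded (max_dict entries)
    so that memory use stays small. All defaults are assumptions.
    """
    dictionary = []  # blocks stored verbatim, in order of first appearance
    encoded = []
    for start in range(0, len(samples), block_size):
        block = samples[start:start + block_size]
        if len(block) < block_size:
            encoded.append(("tail", block))  # incomplete final block, kept as-is
            break
        match = next(
            (i for i, ref in enumerate(dictionary)
             if ks_statistic(block, ref) <= threshold),
            None,
        )
        if match is not None:
            encoded.append(("ref", match))
        else:
            encoded.append(("raw", block))
            if len(dictionary) < max_dict:
                dictionary.append(block)
    return encoded


def reconstruct(encoded, max_dict=16):
    """Rebuild a stream; referenced blocks are replaced by the stored block."""
    dictionary, out = [], []
    for kind, payload in encoded:
        if kind == "ref":
            out.extend(dictionary[payload])
        else:
            out.extend(payload)
            if kind == "raw" and len(dictionary) < max_dict:
                dictionary.append(payload)
    return out
```

In this sketch, a block whose values are a reordering of a stored block is encoded as a reference, even though its Euclidean distance to the stored block may be large. The reconstruction is therefore lossy at the level of individual samples while preserving the distribution of values within each block.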