Arie Shoshani

Head, Scientific Data Management Group                                                                   Tel: (510) 486-5171

High Performance Computing Research Department                                                   Fax: (510) 486-4004

Computational Research Division                                                                                Email: [email protected]

Lawrence Berkeley National Laboratory                                                                      http://www.lbl.gov/~arie

 

 

Education

·       Ph.D. Computer Sciences, Princeton University, 1969.

·       M. A. Computer Sciences, Princeton University, 1967.

·       B. S. (Summa cum Laude), Control Engineering, Technion -- Israel Institute of Technology, 1965.

Position History

·       Senior Staff Computer Scientist, Lawrence Berkeley National Laboratory (LBNL), Berkeley, California, 1976-present.

·       Computer Systems Specialist, System Development Corporation, Santa Monica, California, 1969-1976.

Awards and Honors

·       R&D 100 2008 award, for FastBit index, with Kesheng “John” Wu, Ekow Otoo, and Kurt Stockinger

·       Keynote speaker, Efficient Indexing Technology for Data Mining of Scientific Data, Fifth IEEE International Conference on Data Mining, November 2005.

·       Best paper award, International Supercomputer Conference, Germany, 2005, with John Wu, et al.

·       Patent, Word-Aligned Hybrid compression method.  US Patent 6,831,575. 2004, with John Wu and E. Otoo.

·       Hottest Infrastructure Award – SuperComputing 2000 Network Challenge - A Data Management Infrastructure for Climate Modeling Research (a collaboration of several laboratories)

·       Elected to the Very Large Data Bases (VLDB) Endowment Board, 1988-1996.

·       Elected Vice-President of the Very Large Data Bases (VLDB) Endowment Board, 1997-1998.

·       Best paper award, ACM-SIGMOD Conference on Management of Data, 1987.

·       Chairman of Steering Committee for the Scientific and Statistical Data Base Management (SSDBM) Conference, 1982-present.

·       General Chairman of the Fourteenth VLDB Conference, 1998.

·       Associate editor for the ACM Transactions on Database Systems (TODS), 1982 – 1986.

·       Keynote Speaker, 5th International Conference on Information and Knowledge Management (CIKM), 1996.

·       Invited tutorial speaker, Symposium on Principles of Database Systems (PODS), 1997.

Research Interests

Semantic data models, query languages, temporal data, efficient access from tertiary storage, statistical and OLAP databases, and database techniques for scientific database applications.

Narrative

I have been the head of the Scientific Data Management Research Group at LBNL since 1978.  Our research activities focus on the development of algorithms and software for the organization, access, and manipulation of scientific databases (SDBs).  Our areas of research fall into three main categories: logical modeling and user interfaces (which include modeling of SDBs, query languages, graphical user interfaces, and modeling of temporal, sequence, and multi-dimensional data), physical organization and access methods (which include bitmap indexing of scientific data, temporal data structures, and multi-dimensional data structures), and algorithms for special SDB operators (such as sampling, transposition, and aggregation).  Currently, I am the director of a Scientific Data Management (SDM) Integrated Software Infrastructure Center (budget $3 million) at the Department of Energy.  In this capacity, I coordinate the work of collaborators from four DOE laboratories and four universities (see http://sdmcenter.lbl.gov).

The Scientific Data Management research group that I head has been very productive and visible in the research community.  Our group has been and continues to be involved in practical projects (such as the Human Genome, climate modeling, combustion modeling, High Energy Physics, and others), and has applied its research results by providing prototype software for real scientific data management problems.  Our work has established the fields of Statistical Data Management and Scientific Data Management as important research areas with unique, challenging problems.  We initiated the conferences on Statistical and Scientific Data Base Management (SSDBM), and I continue to serve as the chair of the steering committee for this conference.  In 1998, a product developed in my group, called the OPM database tools, was commercialized, and it has been used by biotech and pharmaceutical companies.  We continue to use this product in projects in my group.  More recently, a patent was awarded to two members of my group and me for developing a highly efficient specialized bitmap indexing method, which is deployed in various projects.

In addition to management and administrative duties, my own technical work is mainly in the characterization of the unique requirements of SDBs, query languages, modeling of statistical data, temporal data, sequence data, multi-dimensional data, and data compression.  More recently, I have been involved with several scientific projects, including Storage Resource Management (SRM) for the Grid, a microbial meta-database, distributed access to climate modeling data, and bitmap indexing and organization of High Energy Physics data on tertiary storage.  I have been and continue to be involved in many professional activities outside the Laboratory, including chairing and participating on various program committees.  I have published over 70 papers in refereed journals and conferences.

Selected Publications

·       Rishi Rakesh Sinha, Marianne Winslett, Kesheng Wu, Kurt Stockinger, Arie Shoshani: Adaptive Bitmap Indexes for Space-Constrained Systems. ICDE 2008: 1418-1420

·       Kesheng Wu, Kurt Stockinger, Arie Shoshani: Breaking the Curse of Cardinality on Bitmap Indexes. SSDBM 2008: 348-365

·       A. Shoshani, et al: Storage Resource Managers: Recent International Experience on Requirements and Multiple Co-Operating Implementations. MSST 2007: 47-59

·       Frederick Reiss, Kurt Stockinger, Kesheng Wu, Arie Shoshani, Joseph M. Hellerstein: Enabling Real-Time Querying of Live and Historical Stream Data. SSDBM 2007: 28

·       Elaheh Pourabbas, Arie Shoshani: Efficient estimation of joint queries from multiple OLAP databases. ACM Trans. Database Syst. 32(1): 2 (2007)

·       Kesheng Wu, Ekow J. Otoo, Arie Shoshani: Optimizing bitmap indices with efficient compression. ACM Trans. Database Syst. 31(1): 1-38 (2006)

·       Grid Collector: Facilitating Efficient Selective Access from Data Grids, Kesheng Wu, Junmin Gu, Jerome Lauret, Arthur M. Poskanzer, Arie Shoshani, Alexander Sim, and Wei-Ming Zhang, In Proceedings of International Supercomputer Conference 2005 (Best Paper Award)

·       Impact of Admission and Cache Replacement Policies on Response Times of Jobs on Data Grids, Ekow Otoo, Doron Rotem and Arie Shoshani, Cluster Computing Journal, Springer, October 2005, pp. 293-303.

·       RRS: Replica Registration Service for Data Grids, Arie Shoshani, Alex Sim, Kurt Stockinger, Proceedings of VLDB Workshop on Data Management in Grids (VLDB-Grids'05), September 2005.

·       Co-Scheduling of Computation and Data on Computer Clusters, A. Romosan, D. Rotem, A. Shoshani and D. Wright, Proceedings of the Conference on Scientific and Statistical Database Management (SSDBM 2005).

·       On the performance of bitmap indices for high cardinality attributes, Kesheng Wu, Ekow J. Otoo, Arie Shoshani, International conference on Very Large Data Bases (VLDB 2004) 24-35

·       DataMover: Robust Terabyte-Scale Multi-file Replication over Wide-Area Networks, A. Sim, J. Gu, A. Shoshani, V. Natarajan, Scientific and Statistical Database Management conference (SSDBM 2004), 403-411.

·       Storage Resource Managers: Essential Components for the Grid, Arie Shoshani, Alexander Sim, and Junmin Gu, chapter in book: Grid Resource Management: State of the Art and Future Trends, Edited by Jarek Nabrzyski, Jennifer M. Schopf, Jan Weglarz, Kluwer Academic Publishers, 2003.

·       Using Bitmap Index for Interactive Exploration of Large Datasets. Kesheng Wu, Wendy S. Koegler, Jacqueline Chen, Arie Shoshani, Scientific and Statistical Database Management conference (SSDBM 2003), 65-74.

·       A Performance Comparison of bitmap indexes, Kesheng Wu, Ekow J. Otoo, Arie Shoshani, ACM International Conference on Information and Knowledge Management (CIKM’01), 559-561.

·       Storage Resource Managers: Middleware Components for Grid Storage, Arie Shoshani, Alex Sim, Junmin Gu, Nineteenth IEEE Symposium on Mass Storage Systems, 2002 (MSS '02).

·       Extending OLAP Querying to External Object Databases, T. Pedersen, A. Shoshani, J. Gu, C.S. Jensen, 9th International Conference on Information and Knowledge Management (CIKM'00).

·       Coordinating Simultaneous Caching of File Bundles from Tertiary Storage, A. Shoshani, A. Sim, L. M. Bernardo, H. Nordberg, Scientific and Statistical Database Management conference (SSDBM 2000).

·       Storage Management Techniques for Very Large Multidimensional Datasets, A. Shoshani, L. M. Bernardo, H. Nordberg, D. Rotem, and A. Sim, Eleventh International Conference on Scientific and Statistical Database Management (SSDBM 1999).

·       Determining the Optimal File Size on Tertiary Storage Systems Based on the Distribution of Query Sizes, L. Bernardo, H. Nordberg, D. Rotem, and A. Shoshani, Tenth International Conference on Scientific and Statistical Database Management, (SSDBM 1998).

·       Summarizability in OLAP and Statistical Databases, (with H. Lenz), Ninth International Conference on Scientific and Statistical Database Management (SSDBM 1997).

·       OLAP and Statistical Databases: Similarities and Differences, in Proceedings of the Symposium on Principles of Database Systems (PODS) 1997 (invited tutorial).

·       A Temporal Data Model Based on Time Sequences, (with A. Segev), book chapter in Temporal Databases: Theory, Design, and Implementation, Edited by A. Tansel, J. Clifford, S. Gadia, S. Jajodia, A. Segev, and R. Snodgrass, Benjamin/Cummings, 1993.

·       Representing Extended Entity-Relationship Structures in Relational Databases: A Modular Approach, (with V. Markowitz), ACM Trans. on Database Systems, 17, 3 (September 1992), pp. 423-464.

·       Logical Modeling of Temporal Data, (with A. Segev), Proceedings of the International Conference on Management of Data (SIGMOD), May 1987 (Best Paper Award).