Arie Shoshani

Head, Scientific Data Management Group                                                                                    Tel: (510) 486-5171

High Performance Computing Research Department                                                                 Fax: (510) 486-4004

Computational Research Division                                                                                                   Email: [email protected]

Lawrence Berkeley National Laboratory                                                                                       http://www.lbl.gov/~arie

                                                          

Education

·         Ph.D. Computer Sciences, Princeton University, 1969.

·         M. A. Computer Sciences, Princeton University, 1967.

·         B. S. (Summa cum Laude), Control Engineering, Technion -- Israel Institute of Technology, 1965.

Position History

·         Senior Staff Computer Scientist, Lawrence Berkeley National Laboratory (LBNL), Berkeley, California, 1976-present.

·         Computer Systems Specialist, System Development Corporation, Santa Monica, California, 1969-1976.

Awards and Honors

·         R&D 100 2008 award, for FastBit index, with Kesheng “John” Wu, Ekow Otoo, and Kurt Stockinger.

·         Plenary speaker, Scientific Data Management, SIAM Computational Science & Engineering (CSE'07)

·         Keynote speaker, “Efficient Indexing Technology for Data Mining of Scientific Data”, Fifth IEEE International Conference on Data Mining, November 2005.

·         Best paper award, International Supercomputer Conference, Germany, 2005, with John Wu, et al.

·         Patent, Word-Aligned Hybrid compression method.  US Patent 6,831,575. 2004, with John Wu and Ekow Otoo.

·         Elected to the Very Large Data Bases (VLDB) Endowment Board, 1988-1996.

·         Elected Vice-President of the Very Large Data Bases (VLDB) Endowment Board, 1997-1998.

·         Best paper award, ACM-SIGMOD Conference on Management of Data, 1987.

·         Chairman of Steering Committee for the Scientific and Statistical Data Base Management (SSDBM) Conference, 1982-present.

·         General Chairman of the Fourteenth VLDB Conference, 1988.

·         Associate editor for the ACM Transactions on Database Systems (TODS), 1982 – 1986.

·         Editor, Computational Science & Discovery, 2008-2012.

·         Keynote Speaker, 5th International Conference on Information and Knowledge Management (CIKM), 1996.

·         Invited tutorial speaker, Symposium on Principles of Database Systems (PODS), 1997.

Research Interests

Semantic data models, query languages, temporal data, efficient access from tertiary storage, statistical and OLAP databases, and database techniques for scientific database applications.

Narrative

I have been the head of the Scientific Data Management Research Group at LBNL since 1978.  Our research activities focus on the development of algorithms and software for the organization, access, and manipulation of scientific databases (SDBs).  Our areas of research fall into three main categories: logical modeling and user interfaces (which include modeling of SDBs, query languages, graphical user interfaces, and modeling of temporal, sequence, and multi-dimensional data), physical organization and access methods (which include bitmap indexing of scientific data, temporal data structures, and multi-dimensional data structures), and algorithms for special SDB operators (such as sampling, transposition, and aggregation).  I am currently the director of the Scalable Data Management, Analysis, and Visualization (SDAV) institute (budget $5 million per year), funded by the Department of Energy.  The institute includes members from 6 DOE laboratories and 7 universities (see: http://sdav-scidac.org).  Over the last 10 years, I was the director of a Scientific Data Management Center for Enabling Technology (budget $3.3 million per year).  In this capacity, I coordinated the work of collaborators from 5 DOE laboratories and 5 universities (see: http://sdmcenter.lbl.gov).

The Scientific Data Management research group that I head has been very productive and visible in the research community.  Our group has been and continues to be involved in practical projects (such as microbial genome, climate modeling, combustion modeling, High Energy Physics, and others), and has been applying its research results by providing prototype software for real scientific data management problems.  Our work has established the fields of Statistical Data Management and Scientific Data Management as important research areas with unique, challenging problems.  In 1982 I initiated the conference series on Statistical and Scientific Data Base Management (SSDBM), and I continue to serve as the chair of the steering committee for this conference.  Recently a patent was awarded to two members of my group and me for developing a highly efficient specialized bitmap indexing method, called FastBit (see: https://sdm.lbl.gov/fastbit/), which has been deployed in many projects.  In addition to management and administrative duties, my own technical work is mainly in the characterization of the unique requirements of SDBs, query languages, modeling of statistical and scientific data, temporal data, sequence data, multi-dimensional data, and data compression.  More recently, I have been involved with several scientific projects, including Storage Resource Management (SRM) for the Grid, a microbial meta-database, distributed access to climate modeling data, and bitmap indexing and access of High Energy Physics data from tertiary storage.  I have been and continue to be involved in many professional activities outside the Laboratory, including chairing and participating on various program committees.  I have published over 150 papers in refereed journals and conferences.

Selected Publications

·         M. Balman, E. Chaniotakis, A. Shoshani, A. Sim. “An Efficient Reservation Algorithm for Advanced Network Provisioning”, ACM/IEEE Supercomputing Conference 2010 (SC10)

·         E. Pourabbas and A. Shoshani, Improving Estimation Accuracy of Aggregate Queries on Data Cubes, Data & Knowledge Engineering 69 (2010) 50–72.

·         Arie Shoshani and Doron Rotem, Editors, Scientific Data Management: Challenges, Technology, and Deployment, (Chapman & Hall/CRC Computational Science), December 2009.

·         Rishi Rakesh Sinha, Marianne Winslett, Kesheng Wu, Kurt Stockinger, Arie Shoshani: Adaptive Bitmap Indexes for Space-Constrained Systems. ICDE 2008: 1418-1420

·         Kesheng Wu, Kurt Stockinger, Arie Shoshani: Breaking the Curse of Cardinality on Bitmap Indexes. SSDBM 2008: 348-365

·         A. Shoshani et al: Storage Resource Managers: Recent International Experience on Requirements and Multiple Co-Operating Implementations. MSST 2007: 47-59

·         Frederick Reiss, Kurt Stockinger, Kesheng Wu, Arie Shoshani, Joseph M. Hellerstein: Enabling Real-Time Querying of Live and Historical Stream Data. SSDBM 2007: 28

·         Elaheh Pourabbas, Arie Shoshani: Efficient estimation of joint queries from multiple OLAP databases. ACM Trans. Database Syst. 32(1): 2 (2007)

·         Kesheng Wu, Ekow J. Otoo, Arie Shoshani: Optimizing bitmap indices with efficient compression. ACM Trans. Database Syst. 31(1): 1-38 (2006)

·         Grid Collector: Facilitating Efficient Selective Access from Data Grids, Kesheng Wu, Junmin Gu, Jerome Lauret, Arthur M. Poskanzer, Arie Shoshani, Alexander Sim, and Wei-Ming Zhang, In Proceedings of International Supercomputer Conference 2005 (Best Paper Award)

·         Impact of Admission and Cache Replacement Policies on Response Times of Jobs on Data Grids, Ekow Otoo, Doron Rotem and Arie Shoshani, Cluster Computing Journal, Springer, October 2005, pp. 293-303.

·         Co-Scheduling of Computation and Data on Computer Clusters, A. Romosan, D. Rotem, A. Shoshani and D. Wright, Proceedings of the Conference on Scientific and Statistical Database Management (SSDBM 2005).

·         On the performance of bitmap indices for high cardinality attributes, Kesheng Wu, Ekow J. Otoo, Arie Shoshani, International conference on Very Large Data Bases (VLDB 2004) 24-35

·         DataMover: Robust Terabyte-Scale Multi-file Replication over Wide-Area Networks, A. Sim, J. Gu, A. Shoshani, V. Natarajan, Scientific and Statistical Database Management conference (SSDBM 2004), 403-411.

·         Storage Resource Managers: Essential Components for the Grid, Arie Shoshani, Alexander Sim, and Junmin Gu, chapter in book: Grid Resource Management: State of the Art and Future Trends, Edited by Jarek Nabrzyski, Jennifer M. Schopf, Jan Weglarz, Kluwer Academic Publishers, 2003.

·         Using Bitmap Index for Interactive Exploration of Large Datasets. Kesheng Wu, Wendy S. Koegler, Jacqueline Chen, Arie Shoshani, Scientific and Statistical Database Management conference (SSDBM 2003), 65-74.

·         Summarizability in OLAP and Statistical Databases, (with H. Lenz), Ninth International Conference on Scientific and Statistical Database Management (SSDBM 1997).

·         OLAP and Statistical Databases: Similarities and Differences, in Proceedings of the Symposium on Principles of Database Systems (PODS) 1997 (invited tutorial).

·         A Temporal Data Model Based on Time Sequences, (with A. Segev), book chapter in Temporal Databases: Theory, Design, and Implementation, Edited by A. Tansel, J. Clifford, S. Gadia, S. Jajodia, A. Segev, and R. Snodgrass, Benjamin/Cummings, 1993.

·         Logical Modeling of Temporal Data, (with A. Segev), Proceedings of the International Conference on Management of Data (SIGMOD), May 1987 (Best Paper Award).