Arie Shoshani
Head, Scientific Data Management Group
High Performance Computing Research Department
Computational Research Division
Lawrence Berkeley National Laboratory
Tel: (510) 486-5171
Fax: (510) 486-4004
Email: [email protected]
http://www.lbl.gov/~arie
Education
· Ph.D. Computer Sciences,
· M. A. Computer Sciences,
· B. S. (Summa cum Laude), Control Engineering, Technion -- Israel Institute of Technology, 1965.
Position History
· Senior Staff Computer Scientist, Lawrence Berkeley National Laboratory (LBNL), Berkeley, California, 1976-present.
· Computer Systems Specialist, System Development Corporation, Santa Monica, California, 1969-1976.
Awards and Honors
· R&D 100 Award, 2008, for the FastBit index, with Kesheng “John” Wu, Ekow Otoo, and Kurt Stockinger.
· Keynote speaker, "Efficient Indexing Technology for Data Mining of Scientific Data," Fifth IEEE International Conference on Data Mining, November 2005.
· Best paper award, International Supercomputer Conference, 2005.
· Patent, Word-Aligned Hybrid compression method, US Patent 6,831,575, 2004, with John Wu and E. Otoo.
· Hottest Infrastructure Award – SuperComputing 2000 Network Challenge - A Data Management Infrastructure for Climate Modeling Research (a collaboration of several laboratories)
· Elected to the Very Large Data Bases (VLDB) Endowment Board, 1988-1996.
· Elected Vice-President of the Very Large Data Bases (VLDB) Endowment Board, 1997-1998.
· Best paper award, ACM-SIGMOD Conference on Management of Data, 1987.
· Chairman of Steering Committee for the Scientific and Statistical Data Base Management (SSDBM) Conference, 1982-present.
· General Chairman of the Fourteenth VLDB Conference, 1988.
· Associate editor for the ACM Transactions on Database Systems (TODS), 1982 – 1986.
· Keynote speaker, 5th Conference on Information and Knowledge Management (CIKM), 1996.
· Invited tutorial speaker, Symposium on Principles of Database Systems (PODS), 1997.
Research Interests
Semantic data models, query languages, temporal data, efficient access from tertiary storage, statistical and OLAP databases, and database techniques for scientific database applications.
Narrative
I have been the head of the Scientific Data Management Research Group at LBNL since 1978. Our research activities focus on the development of algorithms and software for the organization, access, and manipulation of scientific databases (SDBs). Our areas of research fall into three main categories: logical modeling and user interfaces (which include modeling of SDBs, query languages, graphical user interfaces, and modeling of temporal, sequence, and multi-dimensional data); physical organization and access methods (which include bitmap indexing of scientific data, temporal data structures, and multi-dimensional data structures); and algorithms for special SDB operators (such as sampling, transposition, and aggregation). Currently, I am the director of a Scientific Data Management (SDM)
The Scientific Data Management research group that I head has been very productive and visible in the research community. Our group has been and continues to be involved in practical projects (such as the Human Genome, climate modeling, combustion modeling, High Energy Physics, and others), and has applied its research results by providing prototype software for real scientific data management problems. Our work has established the fields of Statistical Data Management and Scientific Data Management as important research areas with unique, challenging problems. We initiated the conferences on Statistical and Scientific Data Base Management (SSDBM), and I continue to serve as chair of the steering committee for this conference. In 1998, a product developed in my group, called the OPM database tools, was commercialized and has been used by biotech and pharmaceutical companies. We continue to use this product in projects in my group. More recently, a patent was awarded to two members of my group and me for developing a highly efficient specialized bitmap indexing method, which is deployed in various projects.
In addition to management and administrative duties, my own technical work is mainly in the characterization of the unique requirements of SDBs, query languages, modeling of statistical data, temporal data, sequence data, multi-dimensional data, and data compression. More recently, I have been involved with several scientific projects, including Storage Resource Management (SRM) for the Grid, a microbial meta-database, distributed access to climate modeling data, and bitmap indexing and organization of High Energy Physics data on tertiary storage.
I have been and continue to be involved in many professional activities outside the Laboratory, including chairing and participating on various program committees. I have published over 70 papers in refereed journals and conferences.
Selected Publications
· Rishi Rakesh Sinha, Marianne Winslett, Kesheng Wu, Kurt Stockinger, Arie Shoshani: Adaptive Bitmap Indexes for Space-Constrained Systems. ICDE 2008: 1418-1420
· Kesheng Wu, Kurt Stockinger, Arie Shoshani: Breaking the Curse of Cardinality on Bitmap Indexes. SSDBM 2008: 348-365
· A. Shoshani, et al.: Storage Resource Managers: Recent International Experience on Requirements and Multiple Co-Operating Implementations. MSST 2007: 47-59
· Frederick Reiss, Kurt Stockinger, Kesheng Wu, Arie Shoshani, Joseph M. Hellerstein: Enabling Real-Time Querying of Live and Historical Stream Data. SSDBM 2007: 28
· Elaheh Pourabbas, Arie Shoshani: Efficient estimation of joint queries from multiple OLAP databases. ACM Trans. Database Syst. 32(1): 2 (2007)
· Kesheng Wu, Ekow J. Otoo, Arie Shoshani: Optimizing bitmap indices with efficient compression. ACM Trans. Database Syst. 31(1): 1-38 (2006)
· Grid Collector: Facilitating Efficient Selective Access from Data Grids, Kesheng Wu, Junmin Gu, Jerome Lauret, Arthur M. Poskanzer, Arie Shoshani, Alexander Sim, and Wei-Ming Zhang, in Proceedings of International Supercomputer Conference 2005 (Best Paper Award).
· Impact of Admission and Cache Replacement Policies on Response Times of Jobs on Data Grids, Ekow Otoo, Doron Rotem and Arie Shoshani, Cluster Computing Journal, Springer, October 2005, pp. 293-303.
· RRS: Replica Registration Service for Data Grids, Arie Shoshani, Alex Sim, Kurt Stockinger, Proceedings of VLDB Workshop on Data Management in Grids (VLDB-Grids'05), September 2005.
· Co-Scheduling of Computation and Data on Computer Clusters, A. Romosan, D. Rotem, A. Shoshani and D. Wright, Proceedings of the Conference on Scientific and Statistical Database Management (SSDBM 2005).
· On the performance of bitmap indices for high cardinality attributes, Kesheng Wu, Ekow J. Otoo, Arie Shoshani, International Conference on Very Large Data Bases (VLDB 2004), 24-35.
· DataMover: Robust Terabyte-Scale Multi-file Replication over Wide-Area Networks, A. Sim, J. Gu, A. Shoshani, V. Natarajan, Scientific and Statistical Database Management conference (SSDBM 2004), 403-411.
· Storage Resource Managers: Essential Components for the Grid, Arie Shoshani, Alexander Sim, and Junmin Gu, chapter in book: Grid Resource Management: State of the Art and Future Trends, Edited by Jarek Nabrzyski, Jennifer M. Schopf, Jan Weglarz, Kluwer Academic Publishers, 2003.
· Using Bitmap Index for Interactive Exploration of Large Datasets, Kesheng Wu, Wendy S. Koegler, Jacqueline Chen, Arie Shoshani, Scientific and Statistical Database Management conference (SSDBM 2003), 65-74.
· A Performance Comparison of bitmap indexes, Kesheng Wu, Ekow J. Otoo, Arie Shoshani, ACM International Conference on Information and Knowledge Management (CIKM’01), 559-561.
· Storage Resource Managers: Middleware Components for Grid Storage, Arie Shoshani, Alex Sim, Junmin Gu, Nineteenth IEEE Symposium on Mass Storage Systems, 2002 (MSS '02).
· Extending OLAP Querying to External Object Databases, T. Pedersen, A. Shoshani, J. Gu, C.S. Jensen, 9th International Conference on Information and Knowledge Management (CIKM'00).
· Coordinating Simultaneous Caching of File Bundles from Tertiary Storage, A. Shoshani, A. Sim, L. M. Bernardo, H. Nordberg, Scientific and Statistical Database Management conference (SSDBM 2000).
· Storage Management Techniques for Very Large Multidimensional Datasets, A. Shoshani, L. M. Bernardo, H. Nordberg, D. Rotem, and A. Sim, Eleventh International Conference on Scientific and Statistical Database Management (SSDBM 1999).
· Determining the Optimal File Size on Tertiary Storage Systems Based on the Distribution of Query Sizes, L. Bernardo, H. Nordberg, D. Rotem, and A. Shoshani, Tenth International Conference on Scientific and Statistical Database Management, (SSDBM 1998).
· Summarizability in OLAP and Statistical Databases, (with H. Lenz), Ninth International Conference on Scientific and Statistical Database Management (SSDBM 1997).
· OLAP and Statistical Databases: Similarities and Differences, in Proceedings of the Symposium on Principles of Database Systems (PODS) 1997 (invited tutorial).
· A Temporal Data Model Based on Time Sequences, (with A. Segev), book chapter in Temporal Databases: Theory, Design, and Implementation, Edited by A. Tansel, J. Clifford, S. Gadia, S. Jajodia, A. Segev, and R. Snodgrass, Benjamin/Cummings, 1993.
· Representing Extended Entity-Relationship Structures in Relational Databases: A Modular Approach, (with V. Markowitz), ACM Trans. on Database Systems, 17, 3 (September 1992), pp. 423-464.
· Logical Modeling of Temporal Data, (with A. Segev), Proceedings of the International Conference on Management of Data (SIGMOD), May 1987 (Best Paper Award).