Arie Shoshani
Head, Scientific Data Management Group
High Performance Computing Research Department
Computational Research Division
Lawrence Berkeley National Laboratory
Tel: (510) 486-5171
Fax: (510) 486-4004
Email: [email protected]
http://www.lbl.gov/~arie
Education
· Ph.D. Computer Sciences,
· M. A. Computer Sciences,
· B. S. (Summa cum Laude), Control Engineering, Technion -- Israel Institute of Technology, 1965.
Position History
· Senior Staff Computer Scientist, Lawrence Berkeley National Laboratory (LBNL), Berkeley, California, 1976-present.
· Computer Systems Specialist, System Development Corporation, Santa Monica, California, 1969-1976.
Awards and Honors
· R&D 100 Award, 2008, for the FastBit index, with Kesheng “John” Wu, Ekow Otoo, and Kurt Stockinger.
· Plenary speaker, Scientific Data Management, SIAM Computational Science & Engineering (CSE'07)
· Keynote speaker, Efficient Indexing Technology for Data Mining of Scientific Data, Fifth IEEE International Conference on Data Mining, November 2005.
· Best paper award, International Supercomputer Conference,
· Patent, Word-Aligned Hybrid compression method, US Patent 6,831,575, 2004, with John Wu and E. Otoo.
· Elected to the Very Large Data Bases (VLDB) Endowment Board, 1988-1996.
· Elected Vice-President of the Very Large Data Bases (VLDB) Endowment Board, 1997-1998.
· Best paper award, ACM-SIGMOD Conference on Management of Data, 1987.
· Chairman of Steering Committee for the Scientific and Statistical Data Base Management (SSDBM) Conference, 1982-present.
· General Chairman of the Fourteenth VLDB Conference, 1988.
· Associate editor for the ACM Transactions on Database Systems (TODS), 1982 – 1986.
· Editor, Computational Science & Discovery, 2008-2012.
· Keynote Speaker, 5th Conference on Information and Knowledge Management (CIKM), 1996.
· Invited tutorial speaker, Symposium on Principles of Database Systems (PODS), 1997.
Research Interests
Semantic data models, query languages, temporal data, efficient access from tertiary storage, statistical and OLAP databases, and database techniques for scientific database applications.
Narrative
I have been the head of the Scientific Data Management Research Group at LBNL since 1978. Our research activities focus on the development of algorithms and software for the organization, access, and manipulation of scientific databases (SDBs). Our areas of research fall into three main categories: logical modeling and user interfaces (which include modeling of SDBs, query languages, graphical user interfaces, and modeling of temporal, sequence, and multi-dimensional data), physical organization and access methods (which include bitmap indexing of scientific data, temporal data structures, and multi-dimensional data structures), and algorithms for special SDB operators (such as sampling, transposition, and aggregation).
I am currently the director of the Scalable Data Management, Analysis, and Visualization (SDAV) institute (budget $5 million per year), funded by the Department of Energy. The institute includes members from 6 DOE laboratories and 7 universities (see http://sdav-scidac.org). Over the last 10 years, I was the director of a Scientific Data Management Center for Enabling Technology (budget $3.3 million per year). In this capacity, I coordinated the work of collaborators from 5 DOE laboratories and 5 universities (see http://sdmcenter.lbl.gov).
The Scientific Data Management research group that I head has been very productive and visible in the research community. Our group has been and continues to be involved in practical projects (such as the microbial genome, climate modeling, combustion modeling, High Energy Physics, and others), and has been applying its research results by providing prototype software for real scientific data management problems. Our work has established the fields of Statistical Data Management and Scientific Data Management as important research areas with unique, challenging problems. In 1982 I initiated the conference series on Statistical and Scientific Data Base Management (SSDBM), and I continue to serve as the chair of its steering committee. Recently a patent was awarded to two members of my group and me for developing a highly efficient specialized bitmap indexing method, called FastBit (see https://sdm.lbl.gov/fastbit/), which has been deployed in many projects. In addition to management and administrative duties, my own technical work is mainly in the characterization of the unique requirements of SDBs, query languages, modeling of statistical and scientific data, temporal data, sequence data, multi-dimensional data, and data compression. More recently, I have been involved with several scientific projects, including Storage Resource Management (SRM) for the Grid, a microbial meta-database, distributed access of climate modeling data, and bitmap indexing and access of High Energy Physics data from tertiary storage. I have been and continue to be involved in many professional activities outside the Laboratory, including chairing and participating on various program committees. I have published over 150 papers in refereed journals and conferences.
Selected Publications
· M. Balman, E. Chaniotakis, A. Shoshani, A. Sim. “An Efficient Reservation Algorithm for Advanced Network Provisioning”, ACM/IEEE Supercomputing Conference 2010 (SC10)
· E. Pourabbas and A. Shoshani, Improving Estimation Accuracy of Aggregate Queries on Data Cubes, Data & Knowledge Engineering 69 (2010) 50–72.
· Arie Shoshani and Doron Rotem, Editors, Scientific Data Management: Challenges, Technology, and Deployment, (Chapman & Hall/CRC Computational Science), December 2009.
· Rishi Rakesh Sinha, Marianne Winslett, Kesheng Wu, Kurt Stockinger, Arie Shoshani: Adaptive Bitmap Indexes for Space-Constrained Systems. ICDE 2008: 1418-1420
· Kesheng Wu, Kurt Stockinger, Arie Shoshani: Breaking the Curse of Cardinality on Bitmap Indexes. SSDBM 2008: 348-365
· A. Shoshani et al: Storage Resource Managers: Recent International Experience on Requirements and Multiple Co-Operating Implementations. MSST 2007: 47-59
· Frederick Reiss, Kurt Stockinger, Kesheng Wu, Arie Shoshani, Joseph M. Hellerstein: Enabling Real-Time Querying of Live and Historical Stream Data. SSDBM 2007: 28
· Elaheh Pourabbas, Arie Shoshani: Efficient estimation of joint queries from multiple OLAP databases. ACM Trans. Database Syst. 32(1): 2 (2007)
· Kesheng Wu, Ekow J. Otoo, Arie Shoshani: Optimizing bitmap indices with efficient compression. ACM Trans. Database Syst. 31(1): 1-38 (2006)
· Grid Collector: Facilitating Efficient Selective Access from Data Grids, Kesheng Wu, Junmin Gu, Jerome Lauret, Arthur M. Poskanzer, Arie Shoshani, Alexander Sim, and Wei-Ming Zhang, In Proceedings of International Supercomputer Conference 2005 (Best Paper Award)
· Impact of Admission and Cache Replacement Policies on Response Times of Jobs on Data Grids, Ekow Otoo, Doron Rotem and Arie Shoshani, Cluster Computing Journal, Springer, October 2005, pp. 293-303.
· Co-Scheduling of Computation and Data on Computer Clusters, A. Romosan, D. Rotem, A. Shoshani and D. Wright, Proceedings of the Conference on Scientific and Statistical Database Management (SSDBM 2005).
· On the performance of bitmap indices for high cardinality attributes, Kesheng Wu, Ekow J. Otoo, Arie Shoshani, International conference on Very Large Data Bases (VLDB 2004) 24-35
· DataMover: Robust Terabyte-Scale Multi-file Replication over Wide-Area Networks, A. Sim, J. Gu, A. Shoshani, V. Natarajan, Scientific and Statistical Database Management conference (SSDBM 2004), 403-411.
· Storage Resource Managers: Essential Components for the Grid, Arie Shoshani, Alexander Sim, and Junmin Gu, chapter in book: Grid Resource Management: State of the Art and Future Trends, Edited by Jarek Nabrzyski, Jennifer M. Schopf, Jan Weglarz, Kluwer Academic Publishers, 2003
· Using Bitmap Index for Interactive Exploration of Large Datasets. Kesheng Wu, Wendy S. Koegler, Jacqueline Chen, Arie Shoshani, Scientific and Statistical Database Management conference (SSDBM 2003), 65-74.
· Summarizability in OLAP and Statistical Databases, (with H. Lenz), Ninth International Conference on Scientific and Statistical Database Management (SSDBM 1997).
· OLAP and Statistical Databases: Similarities and Differences, in Proceedings of the Symposium on Principles of Database Systems (PODS) 1997 (invited tutorial).
· A Temporal Data Model Based on Time Sequences, (with A. Segev), book chapter in Temporal Databases: Theory, Design, and Implementation, Edited by A. Tansel, J. Clifford, S. Gadia, S. Jajodia, A. Segev, and R. Snodgrass, Benjamin/Cummings, 1993.
· Logical Modeling of Temporal Data, (with A. Segev), Proceedings of the International Conference on Management of Data (SIGMOD), May 1987 (Best Paper Award).