Efficient Analysis of Live and Historical Streaming Data
and its Application to Cybersecurity

Frederick Reiss, Kurt Stockinger, Kesheng Wu, Arie Shoshani, Joseph M. Hellerstein


This paper describes our experiences building a coherent framework for efficient simultaneous querying of live and archived stream data. This work was motivated by the need to analyze the network traffic patterns of research laboratories funded by the U.S. Department of Energy. We review the requirements of such a system and implement a prototype based on the TelegraphCQ streaming query processor and the FastBit bitmap index. The combined system uses TelegraphCQ to analyze streams of traffic information and FastBit to correlate current behaviors with historical trends. We present a detailed performance analysis of our system based on a complex query workload and real network traffic collected at Lawrence Berkeley National Laboratory (Berkeley Lab). Our experiments identify key performance bottlenecks for stream query processing systems that incorporate historical data. We also identify strategies for mitigating these bottlenecks. With these strategies in place, we demonstrate that it is possible for our system to analyze the entire traffic of the DOE lab network on a small cluster of machines.

Full text of LBNL-61080 (Joe's copy)

Published version in SSDBM 2007.