Data Analysis and Machine Learning Efforts at SDM Group
For papers under review, send an email to Alex Sim (asim at lbl dot gov)
Network Pattern searching and classification
- J. Kim, A. Sim, J. Kim, K. Wu, “Botnets Detection Using Recurrent Variational Autoencoder”, IEEE Global Communications Conference (Globecom 2020), 2020.
- B. Weinger, J. Kim, A. Sim, M. Nakashima, N. Moustafa, K. Wu, “Enhancing IoT Anomaly Detection Performance for Federated Learning”, The 16th International Conference on Mobility, Sensing and Networking (MSN2020), 2020.
- Automatic Detection of Network Traffic Anomalies and Changes
ACM Workshop on Systems and Network Telemetry and Analytics (SNTA), 2019
- A New Approach to Online, Multivariate Network Traffic Analysis
Journal of Computer Science and Technology (JCST), 2019.
- Spatio-temporal Analysis of HPC I/O and Connection Data
International Workshop on Scalable Network Traffic Analytics (SNTA 2018), 2018
- Predicting Network Traffic Using TCP Anomalies, Poster
IEEE International Conference on Big Data (Big Data), 2018
- Multivariate Network Traffic Analysis using Clustered Patterns
Journal of Computing, Springer, 2018
- A New Approach to Online, Multivariate Network Traffic Analysis
Network Security Analytics and Automation (NSAA) 2017
- An Approach to Online Network Monitoring Using Clustered Patterns
International Conference on Computing, Networking and Communications (ICNC) 2017
- A Lightweight Network Anomaly Detection Technique
International Workshop on Computing, Networking and Communications (CNC) 2017
Network Performance analysis and prediction
- G. R. Ghosal, D. Ghosal, A. Sim, A. V. Thakur, K. Wu, “A Deep Deterministic Policy Gradient Based Network Scheduler For Deadline-Driven Data Transfers”, International Federation for Information Processing (IFIP) Networking Conference (NETWORKING 2020), pp. 253-261, ISBN 978-3-903176-28-7, 2020.
- Time-series Forecast Modeling on High-Bandwidth Wide Area Network Measurements
Journal of Grid Computing, 2016
- Network Bandwidth Utilization Forecast Model on High Bandwidth Networks
International Conference on Computing, Networking and Communications (ICNC) 2015
- Demo
- Best Predictive Generalized Linear Mixed Model with Predictive Lasso for High-Speed Network Data Analysis
International Journal of Statistics and Probability (IJSP), 2016
- Estimating and Forecasting Network Traffic Performance based on Statistical Patterns Observed in SNMP data
International Conference on Machine Learning and Data Mining (MLDM) 2013
Machine Learning Based Data Analysis, Classification and Prediction
- Performance Prediction for Data Transfers in LCLS Workflow
ACM Workshop on Systems and Network Telemetry and Analytics (SNTA), 2019
- Consensus Ensemble System for Traffic Flow Prediction
IEEE Transactions on Intelligent Transportation Systems, 2018
- Predicting Baseline for Analysis of Electricity Pricing
International Journal of Big Data Intelligence, Special Issue on Data to Decision, 2018.
- Parallel Variable Selection for Effective Performance Prediction
The 17th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid) 2017
- Comparison of Clustering Techniques for Residential Energy Behavior using Smart Meter Data
International Workshop on Artificial Intelligence for Smart Grids and Smart Buildings, in conjunction with AAAI 2017
- Machine Learning Based Job Status Prediction in Scientific Clusters
IEEE SAI Computing Conference 2016
- Extracting Baseline Electricity Usage Using Gradient Tree Boosting
International Conference on Big Data Intelligence and Computing (DataCom) 2015, Best Paper Award
- Statistical Overfitting and Backtest Performance
Quantitative Finance, 2015, and SSRN open paper, 2014
- Invariant Representation and Classification of Fruits from X-ray Images
International Journal of Imaging Systems and Technology, 1996
- Invariant Representation and Hierarchical Network for Inspection of Nuts from X-ray Images
IEEE International Conference on Neural Networks, 1995
- Machine Vision Inspection of Insect Infested Pistachio Nuts from X-ray Images
Vision Interface, 1995
Statistical online streaming data pattern searching, data reduction - IDEALEM
- Similarity-based Compression with Multidimensional Pattern Matching
ACM Workshop on Systems and Network Telemetry and Analytics (SNTA), 2019
- Multidimensional Compression and Pattern Matching, Poster
Data Compression Conference (DCC), 2019
- Dynamic Online Performance Optimization in Streaming Data Compression
IEEE International Conference on Big Data (Big Data), 2018
- Statistical Data Reduction for Streaming Data
2017 New York Scientific Data Summit (NYSDS), Data-Driven Discovery in Science and Industry, 2017
- Improving Statistical Similarity Based Data Reduction for Non-Stationary Data
International Conference on Scientific and Statistical Database Management (SSDBM) 2017
- Expanding Statistical Similarity Based Data Reduction to Capture Diverse Patterns, Poster
Data Compression Conference (DCC) 2017
- Novel Data Reduction Based on Statistical Similarity
International Conference on Scientific and Statistical Database Management (SSDBM) 2016
- Relational Dynamic Bayesian Networks with Locally Exchangeable Measures
LBNL 6341E, 2013
- Demo
- IDEALEM is an implementation of the data reduction and pattern searching algorithm for streaming data based on Locally Exchangeable Measures.
US Patent no. 10,366,078, "DATA REDUCTION METHODS, SYSTEMS, AND DEVICES", 2019.
All Other Data Analysis
- Evaluating the Effects of Missing Values and Mixed Data Types on Social Sequence Clustering Using t-SNE Visualization
ACM Journal of Data and Information Quality, 2019
- Identifying Anomalous File Transfer Events in LCLS Workflow
Workshop in Autonomous Infrastructure for Science (AI-Science 2018), 2018
- Modeling Data Transfers: Change Point and Anomaly Detection
International Workshop on Scalable Network Traffic Analytics (SNTA 2018), 2018
- Detecting Anomalies in the LCLS Workflow
The 3rd Workshop on Open Science in Big Data (OSBD 2018), in conjunction with IEEE International Conference on Big Data, 2018
- Convolutional Filtering for Accurate Signal Timing from Noisy Streaming Data
IEEE International Conference on Big Data Intelligence and Computing (DataCom2017), 2017
- Parameter Analysis of the VPIN (Volume-Synchronized Probability of Informed Trading) Metric
Quantitative Financial Risk Management: Theory and Practice, 2015, doi:10.1002/9781119080305.ch13
- A Big Data Approach to Analyzing Market Volatility
Algorithmic Finance, 2013, doi:10.3233/AF-13030
- Efficient Operational Profiling of Systems Using Arrays on Execution Logs
ISSRE, doi:10.1109/ISSRE.2008.45
- Statistical tests for deterministic effects in broad time series
Physica D, 1993, doi:10.1016/0167-2789(93)90188-7
- Posters
- Joint Sequence Analysis Challenges: How to Handle Missing Values and Mixed Variable Types
SIAM Conference on Computational Science and Engineering (CSE19), 2019
- Network Traffic Performance Prediction with Multivariate Clusters in Time Windows
SIAM Conference on Computational Science and Engineering (CSE19), 2019.
- Identification of Network Data Transfer Bottlenecks in HPC Systems
International Conference for High Performance Computing, Networking, Storage and Analysis (SC'18), 2018
- Accurate Signal Timing from High Frequency Streaming Data
IEEE International Conference on Big Data (Big Data), 2017.
- Feature Engineering and Classification Models for Partial Discharge Events in Power Transformers
10th IEEE/ACM International Conference on Utility and Cloud Computing (UCC 2017), 2017
- Diagnosing Parallel I/O Bottlenecks in HPC Applications
International Conference for High Performance Computing, Networking, Storage and Analysis (SC'17), 2017 Student Research Competition, 1st place winner.
- Analysis of Variable Selection Methods on Scientific Cluster Measurement Data
International Conference for High Performance Computing, Networking, Storage and Analysis (SC'16), 2016 Student Research Competition, 2nd place winner.
- Discovering Energy Resource Usage Patterns on Scientific Clusters
International Conference for High Performance Computing, Networking, Storage and Analysis (SC'16), 2016 Student Research Competition, 3rd place winner.
- I/O Performance Analysis Framework on Measurement Data from Scientific Clusters
International Conference for High Performance Computing, Networking, Storage and Analysis (SC'15), 2015
- Real-Time Outlier Detection Algorithm for Finding Blob-Filaments in Plasma
International Conference for High Performance Computing, Networking, Storage and Analysis (SC'14), 2014
- Patents
- Methods, systems, and devices for accurate signal timing of power component events,
US Patent application no. 2019/0138371 A1, 2019.
- Data reduction methods, systems and devices,
US Patent pending serial no. 14/555,365, 2014.
- Co-scheduling of network resource provisioning and host-to-host bandwidth reservation on high-performance network and storage systems,
US Patent 8,705,342, 2014.