| Data Analysis and Machine Learning Efforts at SDM Group
For papers under review, send an email to Alex Sim (asim at lbl dot gov).
 
 Network Pattern searching and classification
 
J. Kim, A. Sim, J. Kim, K. Wu, “Botnets Detection Using Recurrent Variational Autoencoder”, IEEE Global Communications Conference (Globecom 2020), 2020.
B. Weinger, J. Kim, A. Sim, M. Nakashima, N. Moustafa, K. Wu, “Enhancing IoT Anomaly Detection Performance for Federated Learning”, The 16th International Conference on Mobility, Sensing and Networking (MSN2020), 2020.
Automatic Detection of Network Traffic Anomalies and Changes, ACM Workshop on Systems and Network Telemetry and Analytics (SNTA), 2019
A New Approach to Online, Multivariate Network Traffic Analysis, Journal of Computer Science and Technology (JCST), 2019.
Spatio-temporal Analysis of HPC I/O and Connection Data, International Workshop on Scalable Network Traffic Analytics (SNTA 2018), 2018
Predicting Network Traffic Using TCP Anomalies, Poster, IEEE International Conference on Big Data (Big Data), 2018
Multivariate Network Traffic Analysis using Clustered Patterns, Journal of Computing, Springer, 2018
A New Approach to Online, Multivariate Network Traffic Analysis, Network Security Analytics and Automation (NSAA) 2017
An Approach to Online Network Monitoring Using Clustered Patterns, International Conference on Computing, Networking and Communications (ICNC) 2017
A Lightweight Network Anomaly Detection Technique, International Workshop on Computing, Networking and Communications (CNC) 2017
 Network Performance analysis and prediction
 
G. R. Ghosal, D. Ghosal, A. Sim, A. V. Thakur, K. Wu, “A Deep Deterministic Policy Gradient Based Network Scheduler For Deadline-Driven Data Transfers”, International Federation for Information Processing (IFIP) Networking Conference (NETWORKING 2020), pp. 253-261, ISBN 978-3-903176-28-7, 2020.
Time-series Forecast Modeling on High-Bandwidth Wide Area Network Measurements, Journal of Grid Computing, 2016
Network Bandwidth Utilization Forecast Model on High Bandwidth Networks, International Conference on Computing, Networking and Communications (ICNC) 2015
Demo
Best Predictive Generalized Linear Mixed Model with Predictive Lasso for High-Speed Network Data Analysis, International Journal of Statistics and Probability (IJSP), 2016
Estimating and Forecasting Network Traffic Performance based on Statistical Patterns Observed in SNMP data, International Conference on Machine Learning and Data Mining (MLDM) 2013
 Machine Learning Based Data Analysis, Classification and Prediction
 
Performance Prediction for Data Transfers in LCLS Workflow, ACM Workshop on Systems and Network Telemetry and Analytics (SNTA), 2019
Consensus Ensemble System for Traffic Flow Prediction, IEEE Transactions on Intelligent Transportation Systems, 2018
Predicting Baseline for Analysis of Electricity Pricing, International Journal of Big Data Intelligence, Special Issue on Data to Decision, 2018.
Parallel Variable Selection for Effective Performance Prediction, the 17th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid) 2017
Comparison of Clustering Techniques for Residential Energy Behavior using Smart Meter Data, International Workshop on Artificial Intelligence for Smart Grids and Smart Buildings, in conjunction with AAAI 2017
Machine Learning Based Job Status Prediction in Scientific Clusters, IEEE SAI Computing Conference 2016
Extracting Baseline Electricity Usage Using Gradient Tree Boosting, International Conference on Big Data Intelligence and Computing (DataCom) 2015, Best Paper Award
Statistical Overfitting and Backtest Performance, Quantitative Finance, 2015, and SSRN open paper, 2014
Invariant Representation and Classification of Fruits from X-ray Images, International Journal of Imaging Systems and Technology, 1996
Invariant Representation and Hierarchical Network for Inspection of Nuts from X-ray Images, IEEE International Conference on Neural Networks, 1995
Machine Vision Inspection of Insect Infested Pistachio Nuts from X-ray Images, Vision Interface, 1995
 Statistical online streaming data pattern searching, data reduction - IDEALEM
 
Similarity-based Compression with Multidimensional Pattern Matching, ACM Workshop on Systems and Network Telemetry and Analytics (SNTA), 2019
Multidimensional Compression and Pattern Matching, Poster, Data Compression Conference (DCC), 2019
Dynamic Online Performance Optimization in Streaming Data Compression, IEEE International Conference on Big Data (Big Data), 2018
Statistical Data Reduction for Streaming Data, 2017 New York Scientific Data Summit (NYSDS), Data-Driven Discovery in Science and Industry, 2017
Improving Statistical Similarity Based Data Reduction for Non-Stationary Data, International Conference on Scientific and Statistical Database Management (SSDBM) 2017
Expanding Statistical Similarity Based Data Reduction to Capture Diverse Patterns, Poster, Data Compression Conference (DCC) 2017
Novel Data Reduction Based on Statistical Similarity, International Conference on Scientific and Statistical Database Management (SSDBM) 2016
Relational Dynamic Bayesian Networks with Locally Exchangeable Measures, LBNL 6341E, 2013
Demo
IDEALEM is an implementation of the data reduction and pattern searching algorithm for streaming data based on Locally Exchangeable Measures.
US Patent no. 10,366,078, "DATA REDUCTION METHODS, SYSTEMS, AND DEVICES", 2019.
 All Other Data Analysis
 
Evaluating the Effects of Missing Values and Mixed Data Types on Social Sequence Clustering Using t-SNE Visualization, ACM Journal of Data and Information Quality, 2019
Identifying Anomalous File Transfer Events in LCLS Workflow, Workshop in Autonomous Infrastructure for Science (AI-Science 2018), 2018
Modeling Data Transfers: Change Point and Anomaly Detection, International Workshop on Scalable Network Traffic Analytics (SNTA 2018), 2018
Detecting Anomalies in the LCLS Workflow, the 3rd workshop on Open Science in Big Data (OSBD 2018), in conjunction with IEEE International Conference on Big Data, 2018
Convolutional Filtering for Accurate Signal Timing from Noisy Streaming Data, IEEE International Conference on Big Data Intelligence and Computing (DataCom2017), 2017
Parameter Analysis of the VPIN (Volume-Synchronized Probability of Informed Trading) Metric, Quantitative Financial Risk Management: Theory and Practice, 2015, doi:10.1002/9781119080305.ch13
A Big Data Approach to Analyzing Market Volatility, Algorithmic Finance, 2013, doi:10.3233/AF-13030
Efficient Operational Profiling of Systems Using Arrays on Execution Logs, ISSRE, doi:10.1109/ISSRE.2008.45
Statistical tests for deterministic effects in broad time series, Physica D, 1993, doi:10.1016/0167-2789(93)90188-7
Posters
Joint Sequence Analysis Challenges: How to Handle Missing Values and Mixed Variable Types, SIAM Conference on Computational Science and Engineering (CSE19), 2019
Network Traffic Performance Prediction with Multivariate Clusters in Time Windows, SIAM Conference on Computational Science and Engineering (CSE19), 2019.
Identification of Network Data Transfer Bottlenecks in HPC Systems, International Conference for High Performance Computing, Networking, Storage and Analysis (SC'18), 2018
Accurate Signal Timing from High Frequency Streaming Data, IEEE International Conference on Big Data (Big Data), 2017.
Feature Engineering and Classification Models for Partial Discharge Events in Power Transformers, 10th IEEE/ACM International Conference on Utility and Cloud Computing (UCC 2017), 2017
Diagnosing Parallel I/O Bottlenecks in HPC Applications, International Conference for High Performance Computing, Networking, Storage and Analysis (SC'17), 2017
 Student Research Competition, 1st place winner.
Analysis of Variable Selection Methods on Scientific Cluster Measurement Data, International Conference for High Performance Computing, Networking, Storage and Analysis (SC'16), 2016
 Student Research Competition, 2nd place winner.
Discovering Energy Resource Usage Patterns on Scientific Clusters, International Conference for High Performance Computing, Networking, Storage and Analysis (SC'16), 2016
 Student Research Competition, 3rd place winner.
I/O Performance Analysis Framework on Measurement Data from Scientific Clusters, International Conference for High Performance Computing, Networking, Storage and Analysis (SC'15), 2015
Real-Time Outlier Detection Algorithm for Finding Blob-Filaments in Plasma, International Conference for High Performance Computing, Networking, Storage and Analysis (SC'14), 2014
Patents
Methods, systems, and devices for accurate signal timing of power component events, US Patent application no. 2019/0138371 A1, 2019.
Data reduction methods, systems and devices, US Patent pending serial no. 14/555,365, 2014.
Co-scheduling of network resource provisioning and host-to-host bandwidth reservation on high-performance network and storage systems, US Patent 8,705,342, 2014.
 |