Designing PhelkStat: Big Data Analytics for System Event Logs

dc.contributor.author: Salman, Mohammed (en)
dc.contributor.author: Welch, Brian (en)
dc.contributor.author: Raymond, David Richard (en)
dc.contributor.author: Marchany, Randolph C. (en)
dc.contributor.author: Tront, Joseph G. (en)
dc.contributor.department: Electrical and Computer Engineering (en)
dc.date.accessioned: 2017-04-11T13:35:12Z (en)
dc.date.available: 2017-04-11T13:35:12Z (en)
dc.date.issued: 2017-01-04 (en)
dc.description.abstract: With wider adoption of microservice-based architectures in cloud and distributed systems, logging and monitoring costs have become increasingly relevant topics of research. Many log analysis tools exist, such as the ELK (Elasticsearch, Logstash, and Kibana) stack, Apache Spark, Sumo Logic, and Loggly. These tools have been deployed to perform anomaly detection, diagnose threats, optimize performance, and troubleshoot systems. Due to the real-time and distributed nature of logging, there will always be a need to optimize the performance of these tools; this performance can be quantified in terms of compute, storage, and network utilization. As part of the Information Technology Security Lab at Virginia Tech, we have the unique ability to leverage production data from the university network for research and testing. We analyzed the workload variations from two production systems at Virginia Tech and found that the maximum workload is about four times the average workload. A static configuration can therefore lead to an inefficient use of resources. To address this, we propose PhelkStat: a tool that evaluates the temporal and spatial attributes of system workloads and uses clustering algorithms to categorize the current workload. Using PhelkStat, system parameters can be tuned automatically based on the workload. This paper reviews publicly available system event log datasets from supercomputing clusters and presents a statistical analysis of these datasets. We also show a correlation between these attributes and the runtime performance. (en)
dc.description.notes: The paper was accepted as a long paper at HICSS 50 and was presented in the Symposium on Cybersecurity and Data Analytics on January 4, 2017. (en)
dc.format.mimetype: application/pdf (en)
dc.identifier.uri: http://hdl.handle.net/10919/77388 (en)
dc.language.iso: en (en)
dc.publisher: HICSS Symposium on Cybersecurity Big Data Analytics (en)
dc.rights: Creative Commons Attribution-NonCommercial 3.0 United States (en)
dc.rights.uri: http://creativecommons.org/licenses/by-nc/3.0/us/ (en)
dc.subject: Log Analysis (en)
dc.subject: Data Mining (en)
dc.subject: Cloud Computing (en)
dc.subject: Cybersecurity (en)
dc.subject: ELK Stack (en)
dc.title: Designing PhelkStat: Big Data Analytics for System Event Logs (en)
dc.type: Other (en)
dc.type.dcmitype: Text (en)
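
The abstract describes measuring workload variation (a peak roughly four times the average) and clustering workloads into categories, but this record does not say which attributes or which clustering algorithm the paper uses. The following is only a minimal sketch of the general idea, assuming events per fixed time window as the temporal attribute and a plain one-dimensional k-means as the clustering step. All function names, the window size, and the sample numbers are hypothetical and are not taken from PhelkStat.

```python
from collections import Counter
from statistics import mean


def events_per_window(timestamps, window_s=60):
    """Count log events per fixed-size time window (assumed temporal attribute)."""
    if not timestamps:
        return []
    start = min(timestamps)
    counts = Counter(int((t - start) // window_s) for t in timestamps)
    return [counts.get(i, 0) for i in range(max(counts) + 1)]


def peak_to_average(rates):
    """Ratio of the busiest window to the mean window."""
    return max(rates) / mean(rates)


def kmeans_1d(values, k=3, iters=50):
    """Lloyd's k-means on scalars (k >= 2), seeded evenly in ascending order."""
    lo, hi = min(values), max(values)
    centroids = [lo + (hi - lo) * i / (k - 1) for i in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        # An empty cluster keeps its previous centroid.
        updated = [mean(c) if c else centroids[i] for i, c in enumerate(clusters)]
        if updated == centroids:
            break
        centroids = updated
    return centroids


def classify(rate, centroids):
    """Index of the nearest centroid, i.e. the workload category of this rate."""
    return min(range(len(centroids)), key=lambda i: abs(rate - centroids[i]))


# Hypothetical per-window event counts, for illustration only.
rates = [10, 12, 11, 9, 50, 52, 48, 100, 98, 11, 10]
centroids = kmeans_1d(rates, k=3)
category = classify(95, centroids)
```

A configuration layer could then map each category index to a preset (for example, batch size or worker count), which is one way the automatic parameter tuning mentioned in the abstract might be wired up; that mapping is likewise an assumption here.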
Files
Original bundle
Name: designing-phelkstat-big.pdf
Size: 449.8 KB
Format: Adobe Portable Document Format