Browsing by Author "Salman, Mohammed"
Now showing 1 - 3 of 3
- Designing PhelkStat: Big Data Analytics for System Event Logs. Salman, Mohammed; Welch, Brian; Raymond, David Richard; Marchany, Randolph C.; Tront, Joseph G. (HICSS Symposium on Cybersecurity Big Data Analytics, 2017-01-04). With wider adoption of micro-service based architectures in cloud and distributed systems, logging and monitoring costs have become increasingly relevant topics of research. There are a large number of log analysis tools, such as the ELK (Elasticsearch, Logstash, and Kibana) stack, Apache Spark, Sumo Logic, and Loggly, among many others. These tools have been deployed to perform anomaly detection, diagnose threats, optimize performance, and troubleshoot systems. Due to the real-time and distributed nature of logging, there will always be a need to optimize the performance of these tools; this performance can be quantified in terms of compute, storage, and network utilization. As part of the Information Technology Security Lab at Virginia Tech, we have the unique ability to leverage production data from the university network for research and testing. We analyzed the workload variations from two production systems at Virginia Tech, finding that the maximum workload is about four times the average workload. Therefore, a static configuration can lead to an inefficient use of resources. To address this, we propose PhelkStat: a tool to evaluate the temporal and spatial attributes of system workloads, using clustering algorithms to categorize the current workload. Using PhelkStat, system parameters can be automatically tweaked based on the workload. This paper reviews publicly available system event log datasets from supercomputing clusters and presents a statistical analysis of these datasets. We also show a correlation between these attributes and the runtime performance. (An illustrative workload-clustering sketch appears after this listing.)
- Spark on the ARC - Big data analytics frameworks on HPC clusters. DeYoung, Mark E.; Salman, Mohammed; Bedi, Himanshu; Raymond, David Richard; Tront, Joseph G. (ACM, 2017-07). In this paper we document our approach to overcoming service discovery and configuration of the Apache Hadoop and Spark frameworks with dynamic resource allocations in a batch-oriented Advanced Research Computing (ARC) High Performance Computing (HPC) environment. ARC efforts have produced a wide variety of HPC architectures. A common HPC architectural pattern is multi-node compute clusters with low-latency, high-performance interconnect fabrics and shared central storage. This pattern enables processing of workloads with high data co-dependency, frequently solved with message passing interface (MPI) programming models and executed as batch jobs. Unfortunately, many HPC programming paradigms are not well suited to big data workloads, which are often easily separable. Our approach lowers barriers of entry to HPC environments by enabling end users to utilize the Apache Hadoop and Spark frameworks, which support big-data-oriented programming paradigms appropriate for separable workloads in batch-oriented HPC environments. (An illustrative cluster-bootstrap sketch appears after this listing.)
- Towards Improving Endurance and Performance in Flash Storage Clusters. Salman, Mohammed (Virginia Tech, 2017-06-22). NAND flash-based Solid State Devices (SSDs) provide high performance and energy efficiency, and their capacity continues to grow at an unprecedented rate. As a result, SSDs are increasingly being used in high-end computing systems such as supercomputing clusters. However, one of the biggest impediments to large-scale deployment is the limited number of erase cycles in flash devices. The natural skew in I/O workloads can result in wear imbalance, which has a significant impact on the reliability, performance, and lifetime of the cluster. Current load balancers for storage systems are designed primarily to optimize performance. Data migration techniques can handle wear balancing, but they suffer from high metadata overhead and extra erasures. To overcome these problems, we propose an endurance-aware write off-loading technique (EWO) for balancing wear across flash-based servers with minimal extra cost. Extant wear-leveling algorithms are designed for a single flash device; with the use of flash devices in enterprise server storage, wear-leveling algorithms need to account for the variance of wear at the cluster level. EWO exploits the out-of-place update feature of flash memory by off-loading writes across flash servers instead of moving data across them, mitigating the extra-wear cost. To evenly distribute erasures across flash servers, EWO off-loads writes from the flash servers with high erase cycles to the ones with low erase cycles, first quantitatively calculating the amount of writes to off-load based on the frequency of garbage collection. To reduce the metadata overhead caused by write off-loading, EWO employs a hot-slice off-loading policy to explore the trade-offs between extra-wear cost and metadata overhead. Evaluation on 50- to 200-node SSD clusters shows that EWO outperforms data-migration-based wear-balancing techniques, reducing aggregate extra erase cycles by up to 70% while improving write performance by up to 20% compared to data migration. (An illustrative off-loading sketch appears after this listing.)
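For the PhelkStat entry, the following is a minimal, hypothetical sketch of the kind of workload categorization the abstract describes: clustering per-hour event-log statistics so that a workload class, rather than a single static configuration, can drive parameter tuning. The feature set (events per hour, mean message size), the choice of k-means with three clusters, and the synthetic data are assumptions made for illustration; the paper does not prescribe these specifics.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic stand-in for per-hour workload statistics parsed from event logs:
# events per hour and mean log-message size (bytes). Real input would come
# from the production syslog/ELK data described in the paper.
events_per_hour = np.concatenate([
    rng.normal(20_000, 2_000, 24),   # light overnight load
    rng.normal(80_000, 5_000, 24),   # peak load (roughly 4x the average)
    rng.normal(45_000, 4_000, 24),   # moderate evening load
])
mean_msg_size = rng.normal(300, 30, events_per_hour.size)
features = np.column_stack([events_per_hour, mean_msg_size])

# Normalize the features, then cluster hours into workload classes. The class
# label for the current hour could select a parameter preset (e.g., worker
# counts or flush intervals) instead of one static configuration.
X = StandardScaler().fit_transform(features)
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

for cls in range(3):
    hours = np.flatnonzero(labels == cls)
    print(f"class {cls}: {hours.size} hours, "
          f"mean rate {events_per_hour[hours].mean():,.0f} events/hr")
```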
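For the Spark on the ARC entry, the sketch below is a hedged illustration of the general pattern behind running Spark inside a batch HPC allocation, not the authors' actual ARC integration scripts: discover the Slurm-allocated hosts, start a standalone Spark master and workers on them, and report the master URL for spark-submit. The paths and script names (start-master.sh, start-slave.sh, SPARK_HOME) follow a stock Spark 2.x distribution and are assumptions here.

```python
import os
import subprocess

def allocated_hosts():
    """Expand the Slurm node list (e.g. 'node[01-04]') into hostnames."""
    nodelist = os.environ["SLURM_JOB_NODELIST"]
    out = subprocess.run(["scontrol", "show", "hostnames", nodelist],
                         capture_output=True, text=True, check=True)
    return out.stdout.split()

def start_spark_standalone(spark_home):
    """Start a master on the first allocated node and one worker per remaining node."""
    hosts = allocated_hosts()
    master, workers = hosts[0], hosts[1:]
    master_url = f"spark://{master}:7077"

    subprocess.run(["ssh", master, f"{spark_home}/sbin/start-master.sh"], check=True)
    for host in workers:
        subprocess.run(["ssh", host,
                        f"{spark_home}/sbin/start-slave.sh", master_url], check=True)
    return master_url

if __name__ == "__main__":
    url = start_spark_standalone(os.environ.get("SPARK_HOME", "/opt/spark"))
    print(f"Submit jobs with: spark-submit --master {url} <app.py>")
```

Run inside a Slurm batch job, this kind of bootstrap gives end users a per-job Spark cluster that is torn down when the allocation ends, which is the separable-workload use case the abstract targets.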
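For the flash storage clusters thesis, the toy sketch below illustrates the core idea of endurance-aware write off-loading: redirect a fraction of incoming writes from highly worn flash servers to lightly worn ones, weighting the decision by observed garbage-collection frequency. The imbalance threshold, weighting formula, and data structures are illustrative assumptions; the thesis's actual EWO policy, including hot-slice selection, is more involved.

```python
from dataclasses import dataclass

@dataclass
class FlashServer:
    name: str
    erase_cycles: int      # cumulative block erasures on this server
    gc_per_hour: float     # observed garbage-collection frequency

def plan_offload(servers, imbalance_threshold=0.10):
    """Return (source, target, fraction_of_writes_to_offload) tuples."""
    mean_wear = sum(s.erase_cycles for s in servers) / len(servers)
    sources = [s for s in servers
               if s.erase_cycles > mean_wear * (1 + imbalance_threshold)]
    targets = sorted((s for s in servers if s.erase_cycles < mean_wear),
                     key=lambda s: s.erase_cycles)
    plan = []
    for src in sorted(sources, key=lambda s: -s.erase_cycles):
        if not targets:
            break
        dst = targets.pop(0)
        # Off-load more writes from servers that garbage-collect often, since
        # frequent GC signals write amplification and imminent extra wear.
        fraction = min(0.5, src.gc_per_hour /
                       (src.gc_per_hour + dst.gc_per_hour + 1e-9))
        plan.append((src.name, dst.name, round(fraction, 2)))
    return plan

cluster = [FlashServer("ssd-1", 9_500, 40.0),
           FlashServer("ssd-2", 4_200, 12.0),
           FlashServer("ssd-3", 8_900, 35.0),
           FlashServer("ssd-4", 3_800, 10.0)]
print(plan_offload(cluster))
```

Because only newly arriving writes are redirected (exploiting flash's out-of-place updates), no existing data has to be migrated, which is how the approach avoids the extra erasures and metadata cost that migration-based wear balancing incurs.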