Browsing by Author "Li, Dong"
Now showing 1 - 4 of 4
- Energy-aware Thread and Data Management in Heterogeneous Multi-Core, Multi-Memory Systems
  Su, Chun-Yi (Virginia Tech, 2015-02-03)
  By 2004, microprocessor design focused on multicore scaling (increasing the number of cores per die in each generation) as the primary strategy for improving performance. These multicore processors are typically equipped with multiple memory subsystems to improve data throughput. In addition, these systems employ heterogeneous processors such as GPUs and heterogeneous memories such as non-volatile memory to improve performance, capacity, and energy efficiency. With the increasing volume of hardware resources and the system complexity caused by heterogeneity, future systems will require intelligent ways to manage hardware resources. Early research to improve performance and energy efficiency on heterogeneous, multi-core, multi-memory systems focused on tuning a single primitive or at best a few primitives in the systems. The key limitation of past efforts is their lack of a holistic approach to resource management that balances the tradeoff between performance and energy consumption. In addition, the shift from simple, homogeneous systems to these heterogeneous, multi-core, multi-memory systems requires an in-depth understanding of efficient resource management for scalable execution, including new models that capture the interchange between performance and energy, smarter resource management strategies, and novel low-level performance/energy tuning primitives and runtime systems. Tuning an application to control available resources efficiently has become a daunting challenge; automating resource management is still a dark art, since the tradeoffs among programming, energy, and performance remain insufficiently understood.
In this dissertation, I have developed theories, models, and resource management techniques to enable energy-efficient execution of parallel applications through thread and data management in these heterogeneous multi-core, multi-memory systems. I study the effect of dynamic concurrent throttling on the performance and energy of multi-core, non-uniform memory access (NUMA) systems. I use critical path analysis to quantify memory contention in the NUMA memory system and determine thread mappings. I then implement a runtime system that combines concurrent throttling and a novel thread mapping algorithm to manage thread resources and improve energy-efficient execution in multi-core, NUMA systems. I also propose an analytical model, based on the queuing method, that captures important factors in multi-core, multi-memory systems to quantify the tradeoff between performance and energy. The model considers the effect of these factors holistically, providing a general view of performance and energy consumption in contemporary systems. Finally, I focus on resource management of future heterogeneous memory systems, which may combine two heterogeneous memories to scale out memory capacity while maintaining reasonable power use. I present a new memory controller design that combines the best aspects of two baseline heterogeneous page management policies to migrate data between two heterogeneous memories so as to optimize performance and energy.
- Power Saving Experiments for Large Scale Global Optimization
  Cao, Zhenwei; Easterling, David R.; Watson, Layne T.; Li, Dong; Cameron, Kirk W.; Feng, Wu-chun (Department of Computer Science, Virginia Polytechnic Institute & State University, 2009)
  Green computing, an emerging field of research that seeks to reduce excess power consumption in high performance computing (HPC), is gaining popularity among researchers. Research in this field often relies on simulation or only uses a small cluster, typically 8 or 16 nodes, because of the lack of hardware support. In contrast, System G at Virginia Tech is a 2592-processor supercomputer equipped with power-aware components suitable for large-scale green computing research. DIRECT is a deterministic global optimization algorithm, implemented in the mathematical software package VTDIRECT95. This paper explores the potential energy savings for the parallel implementation of DIRECT, called pVTdirect, when used with a large-scale computational biology application, parameter estimation for a budding yeast cell cycle model, on System G. Two power-aware approaches for pVTdirect are developed and compared against the CPUSPEED power saving system tool. The results show that knowledge of the parallel workload of the underlying application is beneficial for power management.
- Scalable and Energy Efficient Execution Methods for Multicore Systems
  Li, Dong (Virginia Tech, 2011-01-26)
  Multicore architectures impose great pressure on resource management. The exploration spaces available for resource management increase explosively, especially for large-scale high-end computing systems. The availability of abundant parallelism causes scalability concerns at all levels. Multicore architectures also impose pressure on power management. Growth in the number of cores causes continuous growth in power. In this dissertation, we introduce methods and techniques to enable scalable and energy-efficient execution of parallel applications on multicore architectures. We study strategies and methodologies that combine dynamic concurrency throttling (DCT) and dynamic voltage and frequency scaling (DVFS) for the hybrid MPI/OpenMP programming model. Our algorithms yield substantial energy saving (8.74% on average and up to 13.8%) with either negligible performance loss or performance gain (up to 7.5%). To save additional energy for high-end computing systems, we propose a power-aware MPI task aggregation framework. The framework predicts the performance effect of task aggregation in both computation and communication phases and its impact in terms of execution time and energy of MPI programs. Our framework provides accurate predictions that lead to substantial energy saving through aggregation (64.87% on average and up to 70.03%) with tolerable performance loss (under 5%). As we aggregate multiple MPI tasks within the same node, we have the scalability concern of memory registration for high performance networking. We propose a new memory registration/deregistration strategy to reduce registered memory on multicore architectures with helper threads. We investigate design policies and performance implications of the helper thread approach. Our method efficiently reduces registered memory (23.62% on average and up to 49.39%) and avoids memory registration/deregistration costs for reused communication memory. Our system enables the execution of application input sets that could not otherwise run to completion under the memory registration limitation.
- Stormwater biofilter response to high nitrogen loading under transient flow conditions: Ammonium and nitrate fates, and nitrous oxide emissions
  Feraud, Marina; Ahearn, Sean P.; Parker, Emily A.; Avasarala, Sumant; Rugh, Megyn B.; Hung, Wei-Cheng; Li, Dong; Van De Werfhorst, Laurie C.; Kefela, Timnit; Hemati, Azadeh; Mehring, Andrew S.; Cao, Yiping; Jay, Jennifer A.; Liu, Haizhou; Grant, Stanley B.; Holden, Patricia A. (Pergamon-Elsevier, 2022-12-17)
  Nitrogen (N) in urban runoff is often treated with green infrastructure including biofilters. However, N fates across biofilters are insufficiently understood because prior studies emphasize low N loading under laboratory conditions, or use “steady-state” flow regimes over short time scales. Here, we tested field-scale biofilter N fates during simulated storms delivering realistic transient flows with high N loading. Biofilter outflow ammonium (NH₄⁺-N) was 60.7 to 92.3% lower than that of the inflow. Yet the characteristic times for nitrification (days to weeks) and denitrification (days) relative to N residence times (7 to 30 h) suggested low N transformation across the biofilters. Still, across 7 successive storms, total outflow nitrate (NO₃⁻-N) greatly exceeded (3100 to 3900%) inflow nitrate, a result only explainable by biofilter soil N nitrification occurring between storms. Archaeal and bacterial amoA gene copies (2.1 × 10⁵ to 1.2 × 10⁶ gc g soil⁻¹), nitrifier presence by 16S rRNA gene sequencing, and outflow δ¹⁸O-NO₃⁻ values (−3.0 to 17.1 ‰) reinforced that nitrification was occurring. A ratio of δ¹⁸O-NO₃⁻ to δ¹⁵N-NO₃⁻ of 1.83 for soil eluates indicated additional processes: N assimilation and N mineralization. Denitrification potential was suggested by enzyme activities and soil denitrifying gene copies (nirK + nirS: 3.0 × 10⁶ to 1.8 × 10⁷; nosZ: 5.0 × 10⁵ to 2.2 × 10⁶ gc g soil⁻¹). However, nitrous oxide (N₂O-N) emissions (13.5 to 84.3 μg N m⁻² h⁻¹) and N₂O export (0.014 g N) were low, and soil nitrification enzyme activities (0.45 to 1.63 mg N kg soil⁻¹ day⁻¹) exceeded those for denitrification (0.17 to 0.49 mg N kg soil⁻¹ day⁻¹). Taken together, chemical, bacterial, and isotopic metrics evidenced that storm inflow NH₄⁺ sorbs and, along with mineralized soil N, nitrifies during biofilter dry-down; little denitrification and associated N₂O emissions ensue, and thus subsequent storms export copious NO₃⁻-N. As such, pulsed pass-through biofilters require redesign to promote plant assimilation and/or denitrification of mineralized and nitrified N, to minimize NO₃⁻-N generation and export.