Browsing by Author "Sosonkina, Masha"
- Adjusting process count on demand for petascale global optimization
  Radcliffe, Nicholas R.; Watson, Layne T.; Sosonkina, Masha; Haftka, Raphael T.; Trosset, Michael W. (Department of Computer Science, Virginia Polytechnic Institute & State University, 2011)
  There are many challenges that need to be met before efficient and reliable computation at the petascale is possible. Many scientific and engineering codes running at the petascale are likely to be memory intensive, which makes thrashing a serious problem for many petascale applications. One way to overcome this challenge is to use a dynamic number of processes, so that the total amount of memory available for the computation can be increased on demand. This paper describes modifications made to the massively parallel global optimization code pVTdirect in order to allow for a dynamic number of processes. In particular, the modified version of the code monitors memory use and spawns new processes if the amount of available memory is determined to be insufficient. The primary design challenges are discussed, and performance results are presented and analyzed.
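The spawn-on-demand policy this abstract describes can be sketched as a simple threshold test. The function names, the safety factor, and the per-process capacity model below are illustrative assumptions, not pVTdirect's actual logic:

```python
import math

def should_spawn(available_bytes, needed_bytes, safety_factor=1.5):
    """True when the projected memory demand, padded by a safety factor,
    exceeds what is currently available (hypothetical policy)."""
    return needed_bytes * safety_factor > available_bytes

def processes_to_add(total_need_bytes, per_process_bytes, current_procs):
    """Smallest number of extra processes whose aggregate memory covers
    the total demand; zero if the current pool already suffices."""
    required = math.ceil(total_need_bytes / per_process_bytes)
    return max(required - current_procs, 0)
```

In an MPI setting the positive decision would typically be acted on with dynamic process management (e.g. spawning workers into a new intercommunicator), but that machinery is omitted here.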
- Algorithm XXX: VTDIRECT95: Serial and Parallel Codes for the Global Optimization Algorithm DIRECT
  He, Jian; Watson, Layne T.; Sosonkina, Masha (Department of Computer Science, Virginia Polytechnic Institute & State University, 2007)
  VTDIRECT95 is a Fortran 95 implementation of D.R. Jones' deterministic global optimization algorithm called DIRECT, which is widely used in multidisciplinary engineering design, biological science, and physical science applications. The package includes both a serial code and a data-distributed massively parallel code for different problem scales and optimization (exploration vs. exploitation) goals. Dynamic data structures are used to organize local data, handle unpredictable memory requirements, reduce the memory usage, and share the data across multiple processors. The parallel code employs multilevel functional and data parallelism to boost concurrency and mitigate the data dependency, thus improving the load balancing and scalability. In addition, checkpointing features are integrated into both versions to provide fault tolerance and hot restarts. Important algorithm modifications and design considerations are discussed regarding data structures, parallel schemes, error handling, and portability. Using several benchmark functions and real-world applications, the software is evaluated on different systems in terms of optimization effectiveness, data structure efficiency, parallel performance, and checkpointing overhead. The package organization and usage are also described in detail.
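As a rough illustration of DIRECT's divide-and-sample idea, here is a heavily simplified one-dimensional sketch that greedily trisects the interval whose sampled center value is lowest. The full algorithm selects a Pareto set of "potentially optimal" boxes by both size and value (balancing exploration against exploitation); this greedy toy keeps only the value criterion:

```python
def direct_1d(f, lo, hi, iterations=30):
    """Heavily simplified 1-D DIRECT-style search: repeatedly trisect
    the interval whose sampled center value is lowest.  Not VTDIRECT95's
    algorithm; a value-greedy caricature of its trisection step."""
    intervals = [(lo, hi, f((lo + hi) / 2))]
    for _ in range(iterations):
        # Pick the interval with the best (lowest) center value.
        i = min(range(len(intervals)), key=lambda j: intervals[j][2])
        a, b, _ = intervals.pop(i)
        w = (b - a) / 3
        # Trisect it and sample the three new centers.
        for k in range(3):
            s, t = a + k * w, a + (k + 1) * w
            intervals.append((s, t, f((s + t) / 2)))
    a, b, fc = min(intervals, key=lambda iv: iv[2])
    return (a + b) / 2, fc
```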
- Design and Implementation of a Massively Parallel Version of DIRECT
  He, Jian; Verstak, Alex; Watson, Layne T.; Sosonkina, Masha (Department of Computer Science, Virginia Polytechnic Institute & State University, 2006)
  This paper describes several massively parallel implementations of the global search algorithm DIRECT. Two parallel schemes take different approaches to address DIRECT's design challenges imposed by memory requirements and data dependency. Three design aspects in topology, data structures, and task allocation are compared in detail. The goal is to analytically investigate the strengths and weaknesses of these parallel schemes, identify several key sources of inefficiency, and experimentally evaluate a number of improvements in the latest parallel DIRECT implementation. The performance studies demonstrate improved data structure efficiency and load balancing on a 2200-processor cluster.
- HOMPACK90: A Suite of FORTRAN 90 Codes for Globally Convergent Homotopy Algorithms
  Watson, Layne T.; Sosonkina, Masha; Melville, Robert C.; Morgan, Alexander P.; Walker, Homer F. (Department of Computer Science, Virginia Polytechnic Institute & State University, 1996-07-01)
  HOMPACK90 is a FORTRAN 90 version of the FORTRAN 77 package HOMPACK (Algorithm 652), a collection of codes for finding zeros or fixed points of nonlinear systems using globally convergent probability-one homotopy algorithms. Three qualitatively different algorithms - ordinary differential equation based, normal flow, quasi-Newton augmented Jacobian matrix - are provided for tracking homotopy zero curves, as well as separate routines for dense and sparse Jacobian matrices. A high level driver for the special case of polynomial systems is also provided. Changes to HOMPACK include numerous minor improvements, simpler and more elegant interfaces, use of modules, new end games, support for several sparse matrix data structures, and new iterative algorithms for large sparse Jacobian matrices.
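The homotopy idea behind HOMPACK90 can be illustrated with a scalar toy: deform the trivially solvable equation x - a = 0 into f(x) = 0 and follow the solution as the deformation parameter moves from 0 to 1. HOMPACK90's trackers parameterize the zero curve in arclength with predictor-corrector steps; the naive step-in-t version below is only a stand-in for illustration:

```python
def track_homotopy(f, df, a, steps=100, newton_iters=5):
    """Track the zero curve of H(x, t) = (1 - t)*(x - a) + t*f(x) from
    t = 0 (where x = a solves H = 0) to t = 1 (where H = f), correcting
    with Newton's method at each t step.  Toy sketch: real trackers
    such as HOMPACK90's parameterize in arclength, not in t."""
    x = a
    for i in range(1, steps + 1):
        t = i / steps
        for _ in range(newton_iters):
            h = (1 - t) * (x - a) + t * f(x)
            dh = (1 - t) + t * df(x)
            x -= h / dh
    return x
```

For example, starting from a = 1 with f(x) = x^2 - 2, the tracker follows the curve to the root sqrt(2).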
- A New Adaptive GMRES Algorithm for Achieving High Accuracy
  Sosonkina, Masha; Watson, Layne T.; Kapania, Rakesh K. (Department of Computer Science, Virginia Polytechnic Institute & State University, 1996-05-01)
  GMRES(k) is widely used for solving nonsymmetric linear systems. However, it is inadequate either when it converges only for k close to the problem size or when numerical error in the modified Gram-Schmidt process used in the GMRES orthogonalization phase dramatically affects the algorithm performance. An adaptive version of GMRES(k) which tunes the restart value k based on criteria estimating the GMRES convergence rate for the given problem is proposed here. This adaptive GMRES(k) procedure outperforms standard GMRES(k), several other GMRES-like methods, and QMR on actual large scale sparse structural mechanics postbuckling and analog circuit simulation problems. There are some applications, such as homotopy methods for high Reynolds number viscous flows, solid mechanics postbuckling analysis, and analog circuit simulation, where very high accuracy in the linear system solutions is essential. In this context, the modified Gram-Schmidt process in GMRES can fail, causing the entire GMRES iteration to fail. It is shown that the adaptive GMRES(k) with the orthogonalization performed by Householder transformations succeeds whenever GMRES(k) with the orthogonalization performed by the modified Gram-Schmidt process fails, and the extra cost of computing Householder transformations is justified for these applications.
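For reference, here is a minimal modified Gram-Schmidt orthogonalization, the process whose finite-precision loss of orthogonality motivates the Householder variant discussed above. This is a generic textbook sketch, not the GMRES code's implementation:

```python
def mgs(vectors):
    """Modified Gram-Schmidt: orthogonalize each vector against the
    already-computed q's one at a time.  More stable than classical
    Gram-Schmidt, but it can still lose orthogonality in finite
    precision for ill-conditioned bases, which is what motivates the
    Householder alternative used in the adaptive GMRES(k) above."""
    qs = []
    for v in vectors:
        w = list(v)
        for q in qs:
            # Project out the q-component using the *updated* w,
            # which is the 'modified' part of modified Gram-Schmidt.
            c = sum(wi * qi for wi, qi in zip(w, q))
            w = [wi - c * qi for wi, qi in zip(w, q)]
        norm = sum(wi * wi for wi in w) ** 0.5
        qs.append([wi / norm for wi in w])
    return qs
```

Householder-based orthogonalization replaces these projections with applications of exactly orthogonal reflectors, so orthogonality of the basis does not degrade the same way.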
- Note on the End Game in Homotopy Zero Curve Tracking
  Sosonkina, Masha; Watson, Layne T.; Stewart, David E. (Department of Computer Science, Virginia Polytechnic Institute & State University, 1995-03-01)
  Homotopy algorithms to solve a nonlinear system of equations f(x)=0 involve tracking the zero curve of a homotopy map p(a,theta,x) from theta=0 until theta=1. When the algorithm nears or crosses the hyperplane theta=1, an "end game" phase is begun to compute the solution x(bar) satisfying p(a,theta,x(bar))=f(x(bar))=0. This note compares several end game strategies, including the one implemented in the normal flow code FIXPNF in the homotopy software package HOMPACK.
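The simplest conceivable end game, linear interpolation in theta between the last two curve points straddling theta = 1, can be sketched as follows; the strategies compared in the note, such as the one in FIXPNF, are considerably more refined:

```python
def endgame_linear(p0, p1):
    """Toy 'end game': given two (theta, x) points on the zero curve
    straddling the hyperplane theta = 1, linearly interpolate x at
    theta = 1.  Illustrative only; production end games use higher-order
    interpolation and Newton refinement of the result."""
    (t0, x0), (t1, x1) = p0, p1
    return x0 + (x1 - x0) * (1.0 - t0) / (t1 - t0)
```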
- Parallel Adaptive GMRES Implementations for Homotopy Methods
  Sosonkina, Masha; Allison, Donald C. S.; Watson, Layne T. (Department of Computer Science, Virginia Polytechnic Institute & State University, 1997-11-01)
  The success of homotopy methods in solving large-scale optimization problems and nonlinear systems of equations depends heavily on the solution of large sparse nonsymmetric linear systems on parallel architectures. Iterative solution techniques, such as GMRES(k), favor parallel implementations. However, their straightforward parallelization usually leads to a poor parallel performance because of global communication incurred by processors. One variation of GMRES(k) considered here is to adapt the restart value k for any given problem and use Householder reflections in the orthogonalization phase to achieve high accuracy and reduce the communication overhead.
- Parallel Cost Analysis of Adaptive GMRES Implementations for Homotopy Methods
  Sosonkina, Masha; Allison, Donald C. S.; Watson, Layne T. (Department of Computer Science, Virginia Polytechnic Institute & State University, 1997-12-01)
  The success of homotopy methods in solving large-scale optimization problems and nonlinear systems of equations depends heavily on the solution of large sparse nonsymmetric linear systems on parallel architectures. Iterative solution techniques, such as GMRES(k), favor parallel implementations. However, their straightforward parallelization usually leads to a poor parallel performance because of global communication incurred by processors. One variation on GMRES(k) considered here is to adapt the restart value k for any given problem and use Householder reflections in the orthogonalization phase to achieve high accuracy and to reduce the communication overhead. The Householder transformations can be performed without global communications and modified to utilize an arbitrary row distribution of the coefficient matrix. The effect of this modification on the GMRES(k) performance is discussed here, as well as the abilities of parallel GMRES implementations using Householder reflections to maintain fixed efficiency with an increase in problem size and number of processors. Theoretical communication cost and isoefficiency analyses are compared with experimental results on an Intel Paragon, Cray T3E, and IBM SP2.
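The quantities that drive this kind of cost analysis can be illustrated with toy formulas: parallel efficiency E = T_s / (p * T_p), and a latency-plus-bandwidth model for one global reduction. The constants alpha and beta below are placeholder machine parameters for illustration, not values measured in the paper:

```python
import math

def efficiency(t_serial, t_parallel, p):
    """Parallel efficiency E = T_s / (p * T_p).  Isoefficiency analysis
    asks how fast the problem size must grow to hold E fixed as the
    processor count p grows."""
    return t_serial / (p * t_parallel)

def comm_cost(p, n, alpha=1e-4, beta=1e-8):
    """Toy latency-plus-bandwidth model for one global reduction across
    p processors, each holding n/p of the data: a latency term growing
    like log2(p) plus a bandwidth term shrinking like n/p.  alpha and
    beta are hypothetical constants, not measurements."""
    return alpha * math.log2(p) + beta * (n / p)
```

Models of this shape make the trade-off visible: the latency term grows with p while the per-processor bandwidth term shrinks, so efficiency can only be held fixed by growing the problem.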
- Performance Modeling and Analysis of a Massively Parallel DIRECT, Part 1
  He, Jian; Verstak, Alex; Watson, Layne T.; Sosonkina, Masha (Department of Computer Science, Virginia Polytechnic Institute & State University, 2007)
  Modeling and analysis techniques are used to investigate the performance of a massively parallel version of DIRECT, a global search algorithm widely used in multidisciplinary design optimization applications. Several high-dimensional benchmark functions and real-world problems are used to test the design effectiveness under various problem structures. Theoretical and experimental results are compared for two parallel clusters with different system scale and network connectivity. The present work aims at studying the performance sensitivity to important parameters for problem configurations, parallel schemes, and system settings. The performance metrics include memory usage, load balancing, parallel efficiency, and scalability. An analytical bounding model is constructed to measure the load balancing performance under different schemes. Additionally, linear regression models are used to characterize two major overhead sources (interprocessor communication and processor idleness), and are also applied to the isoefficiency functions in scalability analysis. For a variety of high-dimensional problems and large-scale systems, the massively parallel design has achieved reasonable performance. The results of the performance study provide guidance for efficient problem and scheme configuration. More importantly, the generalized design considerations and analysis techniques are beneficial for transforming many global search algorithms into effective large-scale parallel optimization tools.
- Performance Modeling and Analysis of a Massively Parallel DIRECT, Part 2
  He, Jian; Verstak, Alex; Sosonkina, Masha; Watson, Layne T. (Department of Computer Science, Virginia Polytechnic Institute & State University, 2007)
  Modeling and analysis techniques are used to investigate the performance of a massively parallel version of DIRECT, a global search algorithm widely used in multidisciplinary design optimization applications. Several high-dimensional benchmark functions and real-world problems are used to test the design effectiveness under various problem structures. In this second part of a two-part work, theoretical and experimental results are compared for two parallel clusters with different system scale and network connectivity. The first part studied performance sensitivity to important parameters for problem configurations and parallel schemes, using performance metrics such as memory usage, load balancing, and parallel efficiency. Here linear regression models are used to characterize two major overhead sources (interprocessor communication and processor idleness) and are also applied to the isoefficiency functions in scalability analysis. For a variety of high-dimensional problems and large-scale systems, the massively parallel design has achieved reasonable performance. The results of the performance study provide guidance for efficient problem and scheme configuration. More importantly, the design considerations and analysis techniques generalize to the transformation of other global search algorithms into effective large-scale parallel optimization tools.
- POLSYS GLP: A Parallel General Linear Product Homotopy Code for Solving Polynomial Systems of Equations
  Su, Hai-Jun; McCarthy, J. Michael; Sosonkina, Masha; Watson, Layne T. (Department of Computer Science, Virginia Polytechnic Institute & State University, 2004)
  Globally convergent, probability-one homotopy methods have proven to be very effective for finding all the isolated solutions to polynomial systems of equations. After many years of development, homotopy path trackers based on probability-one homotopy methods are reliable and fast. Now, theoretical advances reducing the number of homotopy paths that must be tracked, and in the handling of singular solutions, have made probability-one homotopy methods even more practical. POLSYS GLP consists of Fortran 95 modules for finding all isolated solutions of a complex coefficient polynomial system of equations. The package is intended to be used on a distributed memory multiprocessor in conjunction with HOMPACK90 (Algorithm 777), and makes extensive use of Fortran 95 derived data types and MPI to support a general linear product (GLP) polynomial system structure. GLP structure is intermediate between the partitioned linear product structure used by POLSYS PLP (Algorithm 801) and the BKK-based structure used by PHCPACK. The code requires a GLP structure as input, and although finding the optimal GLP structure is a difficult combinatorial problem, generally physical or engineering intuition about a problem yields a very good GLP structure. POLSYS GLP employs a sophisticated power series end game for handling singular solutions, and provides support for problem definition both at a high level and via hand-crafted code. Different GLP structures and their corresponding Bezout numbers can be systematically explored before committing to root finding.
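The coarsest of the root counts involved here, the total-degree Bezout bound, is just the product of the polynomial degrees and is trivial to compute. GLP and BKK structures yield tighter bounds (and hence fewer homotopy paths to track) via a harder combinatorial computation that is not reproduced in this sketch:

```python
def total_degree_bezout(degrees):
    """Total-degree Bezout bound on the number of isolated complex
    solutions of a polynomial system: the product of the polynomial
    degrees.  GLP structures, as in POLSYS GLP, give tighter bounds
    from the chosen variable partition; that combinatorial computation
    is not reproduced here."""
    out = 1
    for d in degrees:
        out *= d
    return out
```

For instance, a system of one quadratic and two cubics has total-degree bound 2 * 3 * 3 = 18 paths, while a good GLP structure may track far fewer.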
- Scalability Analysis of Parallel GMRES Implementations
  Sosonkina, Masha; Allison, Donald C. S.; Watson, Layne T. (Department of Computer Science, Virginia Polytechnic Institute & State University, 2001)
  Applications involving large sparse nonsymmetric linear systems encourage parallel implementations of robust iterative solution methods, such as GMRES(k). Two parallel versions of GMRES(k) based on different data distributions and using Householder reflections in the orthogonalization phase, and variations of these which adapt the restart value k, are analyzed with respect to scalability (their ability to maintain fixed efficiency with an increase in problem size and number of processors). A theoretical algorithm-machine model for scalability is derived and validated by experiments on three parallel computers, each with different machine characteristics.