Browsing by Author "Van Mullekom, Jennifer H."
- Advanced Nonparametric Bayesian Functional ModelingGao, Wenyu (Virginia Tech, 2020-09-04)Functional analyses have gained more interest as we have easier access to massive data sets. However, such data sets often contain large heterogeneities, noise, and dimensionalities. When generalizing the analyses from vectors to functions, classical methods might not work directly. This dissertation considers noisy information reduction in functional analyses from two perspectives: functional variable selection to reduce the dimensionality and functional clustering to group similar observations and thus reduce the sample size. The complicated data structures and relations can be easily modeled by a Bayesian hierarchical model, or developed from a more generic one by changing the prior distributions. Hence, this dissertation focuses on the development of Bayesian approaches for functional analyses due to their flexibilities. A nonparametric Bayesian approach, such as the Dirichlet process mixture (DPM) model, has a nonparametric distribution as the prior. This approach provides flexibility and reduces assumptions, especially for functional clustering, because the DPM model has an automatic clustering property, so the number of clusters does not need to be specified in advance. Furthermore, a weighted Dirichlet process mixture (WDPM) model allows for more heterogeneities from the data by assuming more than one unknown prior distribution. It also gathers more information from the data by introducing a weight function that assigns different candidate priors, such that the less similar observations are more separated. Thus, the WDPM model will improve the clustering and model estimation results. In this dissertation, we used an advanced nonparametric Bayesian approach to study functional variable selection and functional clustering methods. 
We proposed 1) a stochastic search functional selection method with application to 1-M matched case-crossover studies for aseptic meningitis, to examine the time-varying unknown relationship and find out important covariates affecting disease contractions; 2) a functional clustering method via the WDPM model, with application to three pathways related to genetic diabetes data, to identify essential genes distinguishing between normal and disease groups; and 3) a combined functional clustering, with the WDPM model, and variable selection approach with application to high-frequency spectral data, to select wavelengths associated with breast cancer racial disparities.
- Age-dependent ventilator-induced lung injury: Mathematical modeling, experimental data, and statistical analysisHay, Quintessa; Grubb, Christopher; Minucci, Sarah; Valentine, Michael S.; Van Mullekom, Jennifer H.; Heise, Rebecca L.; Reynolds, Angela M. (PLOS, 2024-02-22)A variety of pulmonary insults can prompt the need for life-saving mechanical ventilation; however, misuse, prolonged use, or an excessive inflammatory response can result in ventilator-induced lung injury. Past research has observed an increased instance of respiratory distress in older patients and differences in the inflammatory response. To address this, we performed high pressure ventilation on young (2-3 months) and old (20-25 months) mice for 2 hours and collected data for macrophage phenotypes and lung tissue integrity. Large differences in macrophage activation at baseline and airspace enlargement after ventilation were observed in the old mice. The experimental data was used to determine plausible trajectories for a mathematical model of the inflammatory response to lung injury which includes variables for the innate inflammatory cells and mediators, epithelial cells in varying states, and repair mediators. Classification methods were used to identify influential parameters separating the parameter sets associated with the young or old data and separating the response to ventilation, which was measured by changes in the epithelial state variables. Classification methods ranked parameters involved in repair and damage to the epithelial cells and those associated with classically activated macrophages to be influential. Sensitivity results were used to determine candidate in-silico interventions and these interventions were most impactful for transients associated with the old data, specifically those with poorer lung health prior to ventilation. 
Model results identified dynamics involved in M1 macrophages as a focus for further research, potentially driving the age-dependent differences in all macrophage phenotypes. The model also supported the pro-inflammatory response as a potential indicator of age-dependent differences in response to ventilation. This mathematical model can serve as a baseline model for incorporating other pulmonary injuries.
- Deep Gaussian Process Surrogates for Computer ExperimentsSauer, Annie Elizabeth (Virginia Tech, 2023-04-27)Deep Gaussian processes (DGPs) upgrade ordinary GPs through functional composition, in which intermediate GP layers warp the original inputs, providing flexibility to model non-stationary dynamics. Recent applications in machine learning favor approximate, optimization-based inference for fast predictions, but applications to computer surrogate modeling - with an eye towards downstream tasks like Bayesian optimization and reliability analysis - demand broader uncertainty quantification (UQ). I prioritize UQ through full posterior integration in a Bayesian scheme, hinging on elliptical slice sampling of latent layers. I demonstrate how my DGP's non-stationary flexibility, combined with appropriate UQ, allows for active learning: a virtuous cycle of data acquisition and model updating that departs from traditional space-filling designs and yields more accurate surrogates for fixed simulation effort. I propose new sequential design schemes that rely on optimization of acquisition criteria through evaluation of strategically allocated candidates instead of numerical optimizations, with a motivating application to contour location in an aeronautics simulation. Alternatively, when simulation runs are cheap and readily available, large datasets present a challenge for full DGP posterior integration due to cubic scaling bottlenecks. For this case I introduce the Vecchia approximation, popular for ordinary GPs in spatial data settings. I show that Vecchia-induced sparsity of Cholesky factors allows for linear computational scaling without compromising DGP accuracy or UQ. I vet both active learning and Vecchia-approximated DGPs on numerous illustrative examples and real computer experiments. I provide open-source implementations in the "deepgp" package for R on CRAN.
- Effects of establishment fertilization on Landsat-assessed leaf area development of loblolly pine standsHouse, Matthew N.; Wynne, Randolph H.; Thomas, Valerie A.; Cook, Rachel L.; Carter, David R.; Van Mullekom, Jennifer H.; Rakestraw, Jim; Schroeder, Todd A. (Elsevier, 2024-03-15)Loblolly pine (Pinus taeda L.) plantations in the southeastern United States are among the world's most intensively managed forest plantations. Under intensive management, a common practice is fertilizing at establishment. The objective of this study was to investigate the effect of establishment fertilization on leaf area development of loblolly pine plantation stands (n = 3997) over 16 years compared to stands that did not receive nutrient additions at planting. Leaf area index (LAI) is a meaningful biophysical indicator of vigor and an important functional and structural element of a planted stand. The study area was stratified by plant hardiness zone to account for climatic differences and soil type (texture and drainage class), using the Cooperative Research in Forest Fertilization (CRIFF) groupings. LAI was estimated from Landsat imagery to create trajectories of mean stand LAI over 16 years. Establishment fertilization, on average, (1) increased stand LAI beginning at year two, with a peak at years six and seven, and (2) decreased the time required for a stand to reach a winter LAI of 1.5 by almost two years. Fertilization responses varied by climate zone and soil drainage class, where the warmest zones benefited the most, particularly in poorly drained soils. Past year 10, the differences in LAI between fertilized and unfertilized stands were not practically important. Using Landsat data in a cloud-computing environment, we demonstrated the benefits of establishment fertilization to stand LAI development using a large sample over the native range of loblolly pine.
- Inference for Populations: Uncertainty Propagation via Bayesian Population SynthesisGrubb, Christopher Thomas (Virginia Tech, 2023-08-16)In this dissertation, we develop a new type of prior distribution, specifically for populations themselves, which we denote the Dirichlet Spacing prior. This prior solves a specific problem that arises when attempting to create synthetic populations from a known subset: the unfortunate reality that assuming independence between population members means that every synthetic population will be essentially the same. This is a problem because any model which only yields one result (several very similar results), when we have very incomplete information, is fundamentally flawed. We motivate our need for this new class of priors using Agent-based Models, though this prior could be used in any situation requiring synthetic populations.
- Mechanistic insights into the effect of humidity on airborne influenza virus survival, transmission and incidenceMarr, Linsey C.; Tang, Julian W.; Van Mullekom, Jennifer H.; Lakdawala, Seema S. (Royal Society Publishing, 2019-01-16)Influenza incidence and seasonality, along with virus survival and transmission, appear to depend at least partly on humidity, and recent studies have suggested that absolute humidity (AH) is more important than relative humidity (RH) in modulating observed patterns. In this perspective article, we re-evaluate studies of influenza virus survival in aerosols, transmission in animal models and influenza incidence to show that the combination of temperature and RH is equally valid as AH as a predictor. Collinearity must be considered, as higher levels of AH are only possible at higher temperatures, where it is well established that virus decay is more rapid. In studies of incidence that employ meteorological data, outdoor AH may be serving as a proxy for indoor RH in temperate regions during the wintertime heating season. Finally, we present a mechanistic explanation based on droplet evaporation and its impact on droplet physics and chemistry for why RH is more likely than AH to modulate virus survival and transmission.
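The collinearity point in the entry above (higher AH is only reachable at higher temperatures) can be illustrated with a short calculation. This is a minimal sketch, not the article's method: it assumes the Magnus approximation for saturation vapor pressure with one commonly used set of coefficients, and the function names are hypothetical.

```python
import math

def saturation_vapor_pressure_hpa(temp_c):
    """Saturation vapor pressure over water in hPa (Magnus approximation; coefficients are an assumption)."""
    return 6.112 * math.exp(17.67 * temp_c / (temp_c + 243.5))

def absolute_humidity_g_m3(temp_c, rh_percent):
    """Absolute humidity in g/m^3 from temperature (deg C) and relative humidity (%)."""
    vapor_pressure_hpa = saturation_vapor_pressure_hpa(temp_c) * rh_percent / 100.0
    # Ideal-gas conversion for water vapor: AH = 216.7 * e / T, with e in hPa and T in kelvin.
    return 216.7 * vapor_pressure_hpa / (temp_c + 273.15)

# Same relative humidity, different temperatures: AH rises with temperature.
for temp_c in (10, 20, 30):
    print(temp_c, round(absolute_humidity_g_m3(temp_c, 50), 1))
```

Because AH is a function of both temperature and RH, a data set that varies temperature will show AH and temperature moving together, which is why the two cannot be separated as predictors without accounting for that dependence.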
- Metabolic Reprogramming of Ovarian Cancer Spheroids during AdhesionCompton, Stephanie L. E.; Grieco, Joseph P.; Gollamudi, Benita; Bae, Eric; Van Mullekom, Jennifer H.; Schmelz, Eva M. (MDPI, 2022-03-09)Ovarian cancer remains a deadly disease and its recurrence is due in part to the presence of disseminating ovarian cancer aggregates not removed by debulking surgery. During dissemination in a dynamic ascitic environment, the spheroid cells’ metabolism is characterized by low respiration and fragmented mitochondria, a metabolic phenotype that may not support secondary outgrowth after adhesion. Here, we investigated how adhesion affects cellular respiration and substrate utilization of spheroids mimicking early stages of secondary metastasis. Using different glucose and oxygen levels, we investigated cellular metabolism at early time points of adherence (24 h and less) comparing slow and fast-developing disease models. We found that adhesion over time showed changes in cellular energy metabolism and substrate utilization, with a switch in the utilization of mostly glutamine to glucose but no changes in fatty acid oxidation. Interestingly, low glucose levels had less of an impact on cellular metabolism than hypoxia. A resilience to culture conditions and the capacity to utilize a broader spectrum of substrates more efficiently distinguished the highly aggressive cells from the cells representing slow-developing disease, suggesting a flexible metabolism contributes to the stem-like properties. These results indicate that adhesion to secondary sites initiates a metabolic switch in the oxidation of substrates that could support outgrowth and successful metastasis.
- Multiple immunity-related genes control susceptibility of Arabidopsis thaliana to the parasitic weed Phelipanche aegyptiacaClarke, Christopher R.; Park, So-Yon; Tuosto, Robert; Jia, Xiaoyan; Yoder, Amanda; Van Mullekom, Jennifer H.; Westwood, James H. (2020-06-08)Parasitic weeds represent a major threat to agricultural production across the world. Little is known about which host genetic pathways determine compatibility for any host-parasitic plant interaction. We developed a quantitative assay to characterize the growth of the parasitic weed Phelipanche aegyptiaca on 46 mutant lines of the host plant Arabidopsis thaliana to identify host genes that are essential for susceptibility to the parasite. A. thaliana host plants with mutations in genes involved in jasmonic acid biosynthesis/signaling or the negative regulation of plant immunity were less susceptible to P. aegyptiaca parasitization. In contrast, A. thaliana plants with a mutant allele of the putative immunity hub gene Pfd6 were more susceptible to parasitization. Additionally, quantitative PCR revealed that P. aegyptiaca parasitization leads to transcriptional reprograming of several hormone signaling pathways. While most tested A. thaliana lines were fully susceptible to P. aegyptiaca parasitization, this work revealed several host genes essential for full susceptibility or resistance to parasitism. Altering these pathways may be a viable approach for limiting host plant susceptibility to parasitism.
- Optimal weight settings in locally weighted regression: A guidance through cross-validation approachPuri, Roshan (Virginia Tech, 2023)Locally weighted regression is a powerful tool that allows the estimation of different sets of coefficients for each location in the underlying data, challenging the assumption of stationary regression coefficients across a study region. The accuracy of LWR largely depends on how a researcher establishes the relationship across locations, which is often constructed using a weight matrix or function. This paper explores the different kernel functions used to assign weights to observations, including Gaussian, bi-square, and tri-cubic, and how the choice of weight variables and window size affects the accuracy of the estimates. We guide this choice through the cross-validation approach and show that the bi-square function outperforms the choice of other kernel functions. Our findings demonstrate that an optimal window size for LWR models depends on the cross-validation (CV) approach employed. In our empirical application, the full-sample CV guides the choice of a higher window-size case, and CV by proxy guides the choice of a lower window size. Since the CV by Proxy approach focuses on the predictive ability of the model in the vicinity of one specific point (usually a policy point/site), we note that guiding a model choice through this approach makes more intuitive sense when the aim of the researcher is to predict the outcome in one specific site (policy or target point). To identify the optimal weight variables, while we suggest exploring various combinations of weight variables, we argue that an efficient alternative is to merge all continuous variables in the dataset into a single weight variable.
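The three kernels named in the entry above can be sketched as follows. This is an illustrative sketch under stated assumptions, not the author's implementation: the textbook forms of the Gaussian, bi-square, and tri-cube kernels are used, the bandwidth `h` stands in for the window size, and `local_weighted_mean` is a hypothetical helper showing how the weights enter a local estimate.

```python
import math

def gaussian_weight(d, h):
    """Gaussian kernel: weight decays smoothly and never reaches exactly zero."""
    return math.exp(-0.5 * (d / h) ** 2)

def bisquare_weight(d, h):
    """Bi-square kernel: weight falls to zero at distance h and stays zero beyond it."""
    u = abs(d) / h
    return (1 - u ** 2) ** 2 if u < 1 else 0.0

def tricube_weight(d, h):
    """Tri-cube kernel: flatter than bi-square near zero, also zero beyond h."""
    u = abs(d) / h
    return (1 - u ** 3) ** 3 if u < 1 else 0.0

def local_weighted_mean(x0, xs, ys, h, kernel):
    """Kernel-weighted average of ys around the target point x0 (hypothetical helper)."""
    weights = [kernel(x - x0, h) for x in xs]
    total = sum(weights)
    if total == 0:
        raise ValueError("no observations fall inside the window")
    return sum(w * y for w, y in zip(weights, ys)) / total

xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 2.0, 4.0, 8.0, 16.0]
for kernel in (gaussian_weight, bisquare_weight, tricube_weight):
    print(kernel.__name__, round(local_weighted_mean(2.0, xs, ys, 2.5, kernel), 3))
```

A cross-validation comparison of the kind the entry describes would repeat such a fit over a grid of `h` values and kernels, holding out either all observations in turn or only those near the target site.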
- The predictive capability of immunohistochemistry and DNA sequencing for determining TP53 functional mutation status: a comparative study of 41 glioblastoma patientsRoshandel, Aarash K.; Busch, Christopher M.; Van Mullekom, Jennifer H.; Cuoco, Joshua A.; Rogers, Cara M.; Apfel, Lisa S.; Marvin, Eric A.; Sontheimer, Harald; Umans, Robyn A. (Impact Journals, 2019-10-22)Tumor protein 53 (p53) regulates fundamental pathways of cellular growth and differentiation. Aberrant p53 expression in glioblastoma multiforme, a terminal brain cancer, has been associated with worse patient outcomes and decreased chemosensitivity. Therefore, correctly identifying p53 status in glioblastoma is of great clinical significance. p53 immunohistochemistry is used to detect pathological presence of the TP53 gene product. Here, we examined the relationship between p53 immunoreactivity and TP53 mutation status by DNA Sanger sequencing in adult glioblastoma. Of 41 histologically confirmed samples, 27 (66%) were immunopositive for a p53 mutation via immunohistochemistry. Utilizing gene sequencing, we identified only eight samples (20%) with TP53 functional mutations and one sample with a silent mutation. Therefore, a ≥10% p53 immunohistochemistry threshold for predicting TP53 functional mutation status in glioma is insufficient. Implementing this ≥10% threshold, we demonstrated a remarkably low positive predictive value (30%). Furthermore, the sensitivity and specificity with ≥10% p53 immunohistochemistry to predict TP53 functional mutation status were 100% and 42%, respectively. Our data suggests that unless reliable sequencing methodology is available for confirming TP53 status, raising the immunoreactivity threshold would increase positive and negative predictive values as well as the specificity without changing the sensitivity of the immunohistochemistry assay.
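The predictive values quoted in the entry above follow from a 2×2 table that can be rebuilt from the reported counts. The sketch below is a reconstruction under two assumptions not stated as counts in the abstract: all eight functionally mutated samples were immunopositive (implied by the 100% sensitivity), and the sample with a silent mutation is counted as non-mutated.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Standard screening metrics from a 2x2 confusion table."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }

total_samples = 41    # histologically confirmed samples (from the abstract)
immunopositive = 27   # positive at the >=10% IHC threshold (from the abstract)
functional_mutants = 8  # TP53 functional mutations by sequencing (from the abstract)

tp = functional_mutants          # assumption: every functional mutant was immunopositive
fn = functional_mutants - tp     # 0 under that assumption
fp = immunopositive - tp         # immunopositive without a functional mutation
tn = total_samples - immunopositive - fn

metrics = diagnostic_metrics(tp, fp, fn, tn)
for name, value in metrics.items():
    print(f"{name}: {value:.0%}")
```

Under these assumptions the table is 8 true positives, 19 false positives, 0 false negatives, and 14 true negatives, which gives the rounded 30% positive predictive value and 42% specificity reported in the abstract.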
- Relationship of dietary antioxidant intake, antioxidant serum capacity, physical activity and inflammation in breast cancer survivors and individuals without a history of cancerMozhi, Dimple Aneka (Virginia Tech, 2018-07-02)Background: Dietary and serum antioxidants and physical activity can affect inflammation, which is associated with breast cancer risk and recurrence. This study investigated the relationship between diet, serum antioxidant capacity, physical activity, and inflammation in breast cancer survivors and individuals without cancer. Methods: Existing demographic, dietary intake, and physical activity data of 78 breast cancer survivors and 30 individuals without cancer from the Day and Night Study conducted at Virginia Commonwealth University were used. Participants were recruited from southern Virginia. Metabolic equivalents were calculated through type, intensity, and duration of physical activity. Dietary antioxidant intake (FRAP) was calculated from Harvard Food Frequency Questionnaire data. Serum samples were analyzed for inflammation (hsCRP, IL-6, IL-1, and TNF-α) and serum antioxidant capacity (ORAC) at Virginia Tech. Results: Anthropometrics and inflammation were higher, and FRAP and ORAC lower in breast cancer survivors compared to individuals without cancer, although not significant. There was a significant direct relationship between FRAP and ORAC and inverse relationship between FRAP and hsCRP. Breast cancer survivors 6+ years since diagnosis showed significant direct FRAP and IL-1 association, and inverse ORAC and TNF-α association. BMI was directly associated with IL-6 and CRP. Inflammation was not associated with METs or weekly activity, although there was an increasing inverse relation between METs, IL-1 and TNF-α with increasing ORAC. Conclusion: There is a significant relationship between dietary antioxidant intake and serum antioxidant capacity and inflammation. Increased body mass index increases inflammation. 
Diets high in antioxidants and maintaining a healthy weight may help reduce inflammation in breast cancer survivors.
- The role of statistical distributions in vulnerability to poverty analysisPoghosyan, Armine (Virginia Tech, 2024-04-11)In regions characterized by semi-arid climates where households’ welfare primarily relies on rainfed agricultural activities, extreme weather events such as droughts can present existential challenges to their livelihoods. To mitigate these risks, numerous social protection programs have been established to assist vulnerable households affected by weather events. Despite efforts to monitor environmental changes through remotely sensed technology, estimating the impact of weather variability on livelihoods remains challenging. This is compounded by the need to select an appropriate statistical distribution for weather anomaly measures and household characteristics. We address these challenges by analyzing household consumption data from the Living Standards Measurement Study survey in Niger and systematically evaluating how each input factor affects vulnerability estimates. Our findings show that the choice of statistical distribution can significantly alter outcomes. For instance, using an alternative statistical distribution for vegetation index readings could lead to differences of up to 0.7%, which means around 150,000 more households might be misclassified as not vulnerable. Similarly, variations in household characteristics could result in differences of up to 10 percentage points, equivalent to approximately 2 million households. Understanding these sensitivities helps policymakers refine targeting and intervention strategies effectively. By tailoring assistance programs more precisely to the needs of vulnerable households, policymakers ensure that resources are directed where they can make the most impact in lessening the adverse effects of extreme weather events. This enhances the resilience of communities in semi-arid regions.
- Worlds Collide through Gaussian Processes: Statistics, Geoscience and Mathematical ProgrammingChristianson, Ryan Beck (Virginia Tech, 2023-05-04)Gaussian process (GP) regression is the canonical method for nonlinear spatial modeling among the statistics and machine learning communities. Geostatisticians use a subtly different technique known as kriging. I shall highlight key similarities and differences between GPs and kriging through the use of large scale gold mining data. Most importantly, GPs are largely hands-off, automatically learning from the data, whereas kriging requires an expert human in the loop to guide analysis. To emphasize this, I show an imputation method for left censored values frequently seen in mining data. Oftentimes geologists ignore censored values due to the difficulty of imputing with kriging, but GPs execute imputation with relative ease, leading to better estimates of the gold surface. My hope is that this research can serve as a springboard to encourage the mining community to consider using GPs over kriging for diverse utility after GP model fitting. Another common use of GPs that would be inefficient for kriging is Bayesian Optimization (BO). Traditionally BO is designed to find a global optimum by sequentially sampling from a function of interest using an acquisition function. When two or more local or global optima of the function of interest have similar objective values, it often makes some sense to target the more "robust" solution with a wider domain of attraction. However, traditional BO weighs these solutions the same, favoring whichever has a slightly better objective value. By combining the idea of expected improvement (EI) from the BO community with mathematical programming's concept of an adversary, I introduce a novel algorithm to target robust solutions called robust expected improvement (REI). The adversary penalizes "peaked" areas of the objective function making those values appear less desirable. 
REI performs acquisitions using EI on the adversarial space yielding data sets focused on the robust solution that exhibit EI's already proven excellent balance of exploration and exploitation.
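For readers unfamiliar with the acquisition function that the last entry builds on, the standard closed-form expected improvement for minimization under a Gaussian predictive distribution can be sketched as below. This shows ordinary EI only; the adversarial step that defines REI is not reproduced, since the abstract does not give its details. The function names are hypothetical.

```python
import math

def normal_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def expected_improvement(mu, sigma, f_min):
    """EI at a candidate with predictive mean mu and standard deviation sigma,
    relative to the best (lowest) objective value observed so far, f_min."""
    if sigma <= 0.0:
        # No predictive uncertainty: improvement is deterministic.
        return max(f_min - mu, 0.0)
    z = (f_min - mu) / sigma
    return (f_min - mu) * normal_cdf(z) + sigma * normal_pdf(z)

# Two candidates with the same predicted mean: the more uncertain one scores higher.
print(expected_improvement(mu=0.0, sigma=1.0, f_min=0.0))
print(expected_improvement(mu=0.0, sigma=2.0, f_min=0.0))
```

The first term rewards candidates predicted to beat the incumbent (exploitation) and the second rewards predictive uncertainty (exploration), which is the balance the abstract refers to.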