Scholarly Works, Statistics

Research articles, presentations, and other scholarship

Recent Submissions

  • A framework for developing a real-time lake phytoplankton forecasting system to support water quality management in the face of global change
    Carey, Cayelan C.; Calder, Ryan S. D.; Figueiredo, Renato J.; Gramacy, Robert B.; Lofton, Mary E.; Schreiber, Madeline E.; Thomas, R. Quinn (Springer, 2024-09-20)
    Phytoplankton blooms create harmful toxins, scums, and taste and odor compounds and thus pose a major risk to drinking water safety. Climate and land use change are increasing the frequency and severity of blooms, motivating the development of new approaches for preemptive, rather than reactive, water management. While several real-time phytoplankton forecasts have been developed to date, none are both automated and quantify uncertainty in their predictions, which is critical for manager use. In response to this need, we outline a framework for developing the first automated, real-time lake phytoplankton forecasting system that quantifies uncertainty, thereby enabling managers to adapt operations and mitigate blooms. Implementation of this system calls for new, integrated ecosystem and statistical models; automated cyberinfrastructure; effective decision support tools; and training for forecasters and decision makers. We provide a research agenda for the creation of this system, as well as recommendations for developing real-time phytoplankton forecasts to support management.
  • Augmenting a Simulation Campaign for Hybrid Computer Model and Field Data Experiments
    Koermer, Scott; Loda, Justin; Noble, Aaron; Gramacy, Robert B. (Taylor & Francis, 2024-05-24)
    The Kennedy and O’Hagan (KOH) calibration framework uses coupled Gaussian processes (GPs) to meta-model an expensive simulator (first GP), tune its “knobs” (calibration inputs) to best match observations from a real physical/field experiment and correct for any modeling bias (second GP) when predicting under new field conditions (design inputs). There are well-established methods for placement of design inputs for data-efficient planning of a simulation campaign in isolation, that is, without field data: space-filling, or via a criterion such as minimum integrated mean-squared prediction error (IMSPE). Analogues within the coupled GP KOH framework are mostly absent from the literature. Here we derive a closed-form IMSPE criterion for sequentially acquiring new simulator data for KOH. We illustrate how acquisitions space-fill in design space, but concentrate in calibration space. Closed-form IMSPE precipitates a closed-form gradient for efficient numerical optimization. We demonstrate that our KOH-IMSPE strategy leads to a more efficient simulation campaign on benchmark problems, and conclude with a showcase on an application to equilibrium concentrations of rare earth elements for a liquid–liquid extraction reaction.
  • Semiparametric change points detection using single index spatial random effects model in environmental epidemiology study
    Mahmoud, Hamdy F. F.; Kim, Inyoung (Public Library of Science, 2024-12-12)
    Environmental health studies are of great interest in research to evaluate the mortality-temperature relationship by adjusting spatially correlated random effects as well as identifying significant change points in temperature. However, this relationship is often not expressed using parametric models, which makes identifying change points an even more challenging problem. This paper proposes a unified semiparametric approach to simultaneously identify the nonlinear mortality-temperature relationship and detect spatially-dependent change points. A unified method is proposed for the model estimation, spatially dependent change points detection, and testing whether they are significant simultaneously by a permutation-based test. We operate under the assumption that change points remain constant, yet acknowledge the uncertainty regarding their precise number. These change points are influenced by the smoothing of an unknown function, which in turn relies on a smoothing variable and spatial random effects. Consequently, the detection of change points may be influenced by spatial effects. In this paper, several simulation studies are conducted to evaluate the performance of our proposed approach. The advantages of this unified approach are demonstrated using epidemiological data on mortality and temperature.
  • Plasma SOMAmer proteomics of postoperative delirium
    Leung, Jacqueline M.; Rojas, Julio C.; Sands, Laura P.; Chan, Brandon; Rajbanshi, Binita; Du, Zhiyuan; Du, Pang (Wiley, 2024-02-12)
    Background: Postoperative delirium is prevalent in older adults and has been shown to increase the risk of long-term cognitive decline. Plasma biomarkers to identify the risk for postoperative delirium and the risk of Alzheimer's disease and related dementias are needed. Methods: This biomarker discovery case–control study aimed to identify plasma biomarkers associated with postoperative delirium. Patients aged ≥65 years undergoing major elective noncardiac surgery were recruited. The preoperative plasma proteome was interrogated with SOMAmer-based technology targeting 1433 biomarkers. Results: In 40 patients (20 with vs. 20 without postoperative delirium), a preoperative panel of 12 biomarkers discriminated patients with postoperative delirium with an accuracy of 97.5%. The final model of five biomarkers delivered a leave-one-out cross-validation accuracy of 80%. Represented biological pathways included lysosomal and immune response functions. Conclusion: In older patients who have undergone major surgery, plasma SOMAmer proteomics may provide a relatively non-invasive benchmark to identify biomarkers associated with postoperative delirium.
  • Identifying Barriers and Bridging Gaps Between Researchers and Decision Makers in Water Quality Modeling
    Chowdhury, Mahabub; Carey, Cayelan C.; Figueiredo, Renato; Gramacy, Robert; Hoffman, Kathryn; Lofton, Mary; Patil, Parul; Schreiber, Madeline; Thomas, R. Quinn; Calder, Ryan S. D. (2024-12-12)
  • Predicting patients with septic shock and sepsis through analyzing whole-blood expression of NK cell-related hub genes using an advanced machine learning framework
    Du, Chao; Tan, Stephanie C.; Bu, Heng-Fu; Subramanian, Saravanan; Geng, Hua; Wang, Xiao; Xie, Hehuang; Wu, Xiaowei; Zhou, Tingfa; Liu, Ruijin; Xu, Zhen; Liu, Bing; Tan, Xiao-Di (Frontiers, 2024-11-28)
    Background: Sepsis is a life-threatening condition that causes millions of deaths globally each year. The need for biomarkers to predict the progression of sepsis to septic shock remains critical, with rapid, reliable methods still lacking. Transcriptomics data has recently emerged as a valuable resource for disease phenotyping and endotyping, making it a promising tool for predicting disease stages. Therefore, we aimed to establish an advanced machine learning framework to predict sepsis and septic shock using transcriptomics datasets with rapid turnaround methods. Methods: We retrieved four NCBI GEO transcriptomics datasets previously generated from peripheral blood samples of healthy individuals and patients with sepsis and septic shock. The datasets were processed for bioinformatic analysis and supplemented with a series of bench experiments, leading to the identification of a hub gene panel relevant to sepsis and septic shock. The hub gene panel was used to establish a novel prediction model to distinguish sepsis from septic shock through a multistage machine learning pipeline, incorporating linear discriminant analysis, risk score analysis, and an ensemble method combined with Least Absolute Shrinkage and Selection Operator analysis. Finally, we validated the prediction model with the hub gene dataset generated by RT-qPCR using peripheral blood samples from newly recruited patients. Results: Our analysis identified six hub genes (GZMB, PRF1, KLRD1, SH2D1A, LCK, and CD247) that are related to NK cell cytotoxicity and septic shock, collectively termed 6-HubGss. Using this panel, we created SepxFindeR, a machine learning model that demonstrated high accuracy in predicting sepsis and septic shock and distinguishing septic shock from sepsis in a cross-database context. Remarkably, the SepxFindeR model proved compatible with RT-qPCR datasets based on the 6-HubGss panel, facilitating the identification of newly recruited patients with sepsis and septic shock. Conclusions: Our bioinformatic approach led to the discovery of the 6-HubGss biomarker panel and the development of the SepxFindeR machine learning model, enabling accurate prediction of septic shock and distinction from sepsis with rapid processing capabilities.
  • Near-term ecological forecasting for climate change action
    Dietze, Michael; White, Ethan P.; Abeyta, Antoinette; Boettiger, Carl; Bueno Watts, Nievita; Carey, Cayelan C.; Chaplin-Kramer, Rebecca; Emanuel, Ryan E.; Ernest, S. K. Morgan; Figueiredo, Renato J.; Gerst, Michael D.; Johnson, Leah R.; Kenney, Melissa A.; McLachlan, Jason S.; Paschalidis, Ioannis Ch.; Peters, Jody A.; Rollinson, Christine R.; Simonis, Juniper; Sullivan-Wiley, Kira; Thomas, R. Quinn; Wardle, Glenda M.; Willson, Alyssa M.; Zwart, Jacob (Springer Nature, 2024-11-08)
    A substantial increase in predictive capacity is needed to anticipate and mitigate the widespread change in ecosystems and their services in the face of climate and biodiversity crises. In this era of accelerating change, we cannot rely on historical patterns or focus primarily on long-term projections that extend decades into the future. In this Perspective, we discuss the potential of near-term (daily to decadal) iterative ecological forecasting to improve decision-making on actionable time frames. We summarize the current status of ecological forecasting and focus on how to scale up, build on lessons from weather forecasting, and take advantage of recent technological advances. We also highlight the need to focus on equity, workforce development, and broad cross-disciplinary and non-academic partnerships.
  • Selective Reduction of Socioeconomic Disparities in the Experimental Tobacco Marketplace: Effects of Cigarette and E-cigarette Flavor Restrictions
    Freitas-Lemos, Roberta; Tegge, Allison N.; Shevorykin, Alina; Tomlinson, Devin C.; Athamneh, Liqa N.; Stein, Jeffrey S.; Sheffer, Christine E.; Shields, Peter G.; Hatsukami, Dorothy K. (Oxford University Press, 2024-06)
    Introduction: Cigarette smoking accounts for >30% of the socioeconomic gap in life expectancy. Flavor restrictions claim to promote equity; however, no previous studies have compared the effect of cigarette and e-cigarette flavor restrictions among individuals who smoke with lower and higher socioeconomic status (SES). Aims and Methods: In a between-group within-subject design, individuals with lower (n = 155) and higher (n = 125) SES completed hypothetical purchasing trials in the experimental tobacco marketplace (ETM). Conditions were presented in a 2 × 2 factorial design (cigarette flavors restricted or unrestricted and e-cigarette flavors restricted or unrestricted) with increasing cigarette prices across trials. Results: Results show (1) SES differences in cigarette, e-cigarette, and NRT purchases under unrestricted policies, with lower SES showing higher cigarette demand and lower e-cigarette and NRT substitution than higher SES, (2) cigarette restrictions decreased cigarette and increased NRT purchases among lower SES, but no significant changes among higher SES, (3) decreased SES differences in cigarette demand under cigarette restrictions, but persistence under e-cigarette restrictions or their combination, (4) persistence of SES differences in e-cigarette purchases when all restrictions were enforced, and (5) waning of SES differences in NRT purchasing under all restrictions. Conclusions: Flavor restrictions differentially affected individuals based on SES. Within-group comparisons demonstrated restrictions significantly impacted lower SES, but not higher SES. Between-group comparisons showed SES differences in cigarette purchasing decreased under cigarette restrictions, but persisted under e-cigarette restrictions or their combination. Additionally, SES differences in NRT substitution decreased under flavor restrictions. These findings highlight the utility of the ETM to investigate SES disparities. Implications: With increasing trends of socioeconomic differences in smoking prevalence and cessation rates, smoking-related health disparities are expected to continue to widen. Restricting menthol flavor in cigarettes while enhancing the availability and affordability of NRT has the potential to alleviate SES disparities in tobacco use, thereby positively impacting health equity. However, this effect may depend on flavor availability in other tobacco products.
  • Local knowledge reconstructs historical resource use
    Castello, Leandro; Martins, Eduardo G.; Sorice, Michael G.; Smith, Eric P.; Almedia, Morgana; Bastos, Gastao C.C.; Gardoso, Luis G.; Clauzet, Mariana; Dopona, Alisson P.; Ferreira, Beatrice; Haimovic, Manuel; Jorge, Marcelo; Mendonça, Jocemar; Ávila-da-Silva, Antonio O.; Roman, Ana P.O.; Ramires, Milena; de Miranda, Laura V.; Lopes, Priscila F.M. (Wiley, 2024-03-07)
    Information on natural resource exploitation is vital for conservation but scarce in developing nations, which encompass most of the world and often lack the capacity to produce it. A growing approach to generate information about resource use in the context of developing nations relies on surveys of resource users about their recollections (recall) of past harvests. However, the reliability of harvest recalls remains unclear. Here, we show that harvest recalls can be as accurate as data collected by standardized protocols, even though recalls are variable and affected by the age of the recollecting person and the length of time elapsed since the event. Samples of harvest recalls permit relatively reliable reconstruction of harvests for up to 39 years in the past. Harvest recalls therefore have strong potential to inform data-poor resource systems and curb shifting baselines around the world at a fraction of the cost of conventional approaches.
  • Learning Common Knowledge Networks Via Exponential Random Graph Models
    Liu, Xueying; Hu, Zhihao; Deng, Xinwei; Kuhlman, Chris (ACM, 2023-11-06)
    Common knowledge (CK) is a phenomenon where each individual within a group knows the same information and everyone knows that everyone knows the information, infinitely recursively. CK spreads information as a contagion through social networks in ways different from other models like the susceptible-infectious-recovered (SIR) model. In a model of CK on Facebook, the biclique serves as the characterizing graph substructure for generating CK, as all nodes within a biclique share CK through their walls. To understand the effects of network structure on CK-based contagion, it is necessary to control the numbers and sizes of bicliques in networks. Thus, learning how to generate these CK networks (CKNs) is important. Consequently, we develop an exponential random graph model (ERGM) that constructs networks while controlling for bicliques. Our method offers powerful prediction and inference, reduces computational costs significantly, and has proven its merit in contagion dynamics through numerical experiments.
  • Antibiotic exposure is associated with decreased risk of psychiatric disorders
    Kerman, Ilan A.; Glover, Matthew E.; Lin, Yezhe; West, Jennifer L.; Hanlon, Alexandra L.; Kablinger, Anita S.; Clinton, Sarah M. (Frontiers, 2024-01-08)
    Objective: This study sought to investigate the relationship between antibiotic exposure and subsequent risk of psychiatric disorders. Methods: This retrospective cohort study used a national database of 69 million patients from 54 large healthcare organizations. We identified a cohort of 20,214 (42.5% male; 57.9 ± 15.1 years old [mean ± SD]) adults without prior neuropsychiatric diagnoses who received antibiotics during hospitalization. Matched controls included 41,555 (39.6% male; 57.3 ± 15.5 years old) hospitalized adults without antibiotic exposure. The two cohorts were balanced for potential confounders, including demographics and variables with potential to affect the microbiome, mental health, medical comorbidity, and overall health status. Data were stratified by age and by sex, and outcome measures were assessed starting 6 months after hospital discharge. Results: Antibiotic exposure was consistently associated with a significant decrease in the risk of novel mood disorders and anxiety and stressor-related disorders in men (mood (OR 0.84, 95% CI 0.77, 0.91), anxiety (OR 0.88, 95% CI 0.82, 0.95)), women (mood (OR 0.94, 95% CI 0.89, 1.00), anxiety (OR 0.93, 95% CI 0.88, 0.98)), those who are 26–49 years old (mood (OR 0.87, 95% CI 0.80, 0.94), anxiety (OR 0.90, 95% CI 0.84, 0.97)), and in those ≥50 years old (mood (OR 0.91, 95% CI 0.86, 0.97), anxiety (OR 0.92, 95% CI 0.87, 0.97)). Risk of intentional harm and suicidality was decreased in men (OR 0.73, 95% CI 0.55, 0.98) and in those ≥50 years old (OR 0.67, 95% CI 0.49, 0.92). Risk of psychotic disorders was also decreased in subjects ≥50 years old (OR 0.83, 95% CI 0.69, 0.99). Conclusion: Use of antibiotics in the inpatient setting is associated with protective effects against multiple psychiatric outcomes in an age- and sex-dependent manner.
  • Effects of establishment fertilization on Landsat-assessed leaf area development of loblolly pine stands
    House, Matthew N.; Wynne, Randolph H.; Thomas, Valerie A.; Cook, Rachel L.; Carter, David R.; Van Mullekom, Jennifer H.; Rakestraw, Jim; Schroeder, Todd A. (Elsevier, 2024-03-15)
    Loblolly pine (Pinus taeda L.) plantations in the southeastern United States are among the world's most intensively managed forest plantations. Under intensive management, a common practice is fertilizing at establishment. The objective of this study was to investigate the effect of establishment fertilization on leaf area development of loblolly pine plantation stands (n = 3997) over 16 years compared to stands that did not receive nutrient additions at planting. Leaf area index (LAI) is a meaningful biophysical indicator of vigor and an important functional and structural element of a planted stand. The study area was stratified by plant hardiness zone to account for climatic differences and soil type (texture and drainage class), using the Cooperative Research in Forest Fertilization (CRIFF) groupings. LAI was estimated from Landsat imagery to create trajectories of mean stand LAI over 16 years. Establishment fertilization, on average, (1) increased stand LAI beginning at year two, with a peak at years six and seven, and (2) decreased the time required for a stand to reach a winter LAI of 1.5 by almost two years. Fertilization responses varied by climate zone and soil drainage class, where the warmest zones benefited the most, particularly in poorly drained soils. Past year 10, the differences in LAI between fertilized and unfertilized stands were not practically important. Using Landsat data in a cloud-computing environment, we demonstrated the benefits of establishment fertilization to stand LAI development using a large sample over the native range of loblolly pine.
  • Age-dependent ventilator-induced lung injury: Mathematical modeling, experimental data, and statistical analysis
    Hay, Quintessa; Grubb, Christopher; Minucci, Sarah; Valentine, Michael S.; Van Mullekom, Jennifer H.; Heise, Rebecca L.; Reynolds, Angela M. (PLOS, 2024-02-22)
    A variety of pulmonary insults can prompt the need for life-saving mechanical ventilation; however, misuse, prolonged use, or an excessive inflammatory response can result in ventilator-induced lung injury. Past research has observed an increased incidence of respiratory distress in older patients and differences in the inflammatory response. To address this, we performed high pressure ventilation on young (2-3 months) and old (20-25 months) mice for 2 hours and collected data for macrophage phenotypes and lung tissue integrity. Large differences in macrophage activation at baseline and airspace enlargement after ventilation were observed in the old mice. The experimental data was used to determine plausible trajectories for a mathematical model of the inflammatory response to lung injury which includes variables for the innate inflammatory cells and mediators, epithelial cells in varying states, and repair mediators. Classification methods were used to identify influential parameters separating the parameter sets associated with the young or old data and separating the response to ventilation, which was measured by changes in the epithelial state variables. Classification methods ranked parameters involved in repair and damage to the epithelial cells and those associated with classically activated macrophages to be influential. Sensitivity results were used to determine candidate in-silico interventions, and these interventions were most impactful for transients associated with the old data, specifically those with poorer lung health prior to ventilation. Model results identified dynamics involved in M1 macrophages as a focus for further research, potentially driving the age-dependent differences in all macrophage phenotypes. The model also supported the pro-inflammatory response as a potential indicator of age-dependent differences in response to ventilation. This mathematical model can serve as a baseline model for incorporating other pulmonary injuries.
  • Nonparametric Bayesian Functional Clustering with Applications to Racial Disparities in Breast Cancer
    Gao, Wenyu; Kim, Inyoung; Nam, Wonil; Ren, Xiang; Zhou, Wei; Agah, Masoud (Wiley, 2024-01)
    As we have easier access to massive data sets, functional analyses have gained more interest. However, such data sets often contain substantial heterogeneity, noise, and high dimensionality. When generalizing the analyses from vectors to functions, classical methods might not work directly. This paper considers noisy information reduction in functional analyses from two perspectives: functional clustering to group similar observations and thus reduce the sample size, and functional variable selection to reduce the dimensionality. The complicated data structures and relations can be easily modeled by a Bayesian hierarchical model due to its flexibility. Hence, this paper proposes a nonparametric Bayesian functional clustering and peak point selection method via weighted Dirichlet process mixture (WDPM) modeling that automatically clusters and provides accurate estimations, together with a conditional Laplace prior, which is a conjugate variable selection prior. The proposed method is named WDPM-VS for short, and is able to simultaneously perform the following tasks: (1) Automatically cluster without specifying the number of clusters or cluster centers beforehand; (2) Cluster heterogeneously behaved functions; (3) Select vibrational peak points; and (4) Reduce noisy information from the two perspectives: sample size and dimensionality. The method greatly outperforms its comparison methods in root mean squared error. Based on this proposed method, we are able to identify biological factors that can explain the breast cancer racial disparities.
  • Head Impact Exposure in Youth and Collegiate American Football
    Choi, Grace B.; Smith, Eric P.; Duma, Stefan M.; Rowson, Steven; Campolettano, Eamon; Kelley, Mireille E.; Jones, Derek A.; Stitzel, Joel D.; Urban, Jillian E.; Genemaras, Amaris; Beckwith, Jonathan G.; Greenwald, Richard M.; Maerlender, Arthur; Crisco, Joseph J. (Springer, 2022-05-04)
    The relationship between head impact and subsequent brain injury for American football players is not well-defined, especially for youth. The objective of this study is to quantify and assess Head Impact Exposure (HIE) metrics among youth and collegiate football players. This multi-season study enrolled 639 unique athletes (354 collegiate; 285 youth, ages 9–14), recording 476,209 head impacts (367,337 collegiate; 108,872 youth) over 971 sessions (480 collegiate; 491 youth). Youth players experienced 43 and 65% fewer impacts per competition and practice, respectively, and lower impact magnitudes compared to collegiate players (95th percentile peak linear acceleration (PLA, g) competition: 45.6 vs 61.9; 95th percentile PLA practice: 42.6 vs 58.8; 95th percentile peak rotational acceleration (PRA, rad·s−2) competition: 2262 vs 4422; 95th percentile PRA practice: 2081 vs 4052; 95th percentile HITsp competition: 25.4 vs 32.8; 95th percentile HITsp practice: 23.9 vs 30.2). Impacts during competition were more frequent and of greater magnitude than during practice at both levels. Quantified comparisons of head impact frequency and magnitude between youth and collegiate athletes reveal HIE differences as a function of age, and expanded insight better informs the development of age-appropriate guidelines for helmet design, prevention measures, standardized testing, brain injury diagnosis, and recovery management.
  • Public and industry knowledge and perceptions of US swine industry castration practices
    Neary, Jessica M.; Guthrie, Adeline P.; Jacobs, Leonie (Cambridge University Press, 2023-12-22)
    In the United States (US), surgical castration of male piglets is typically performed without any form of analgesia. This may raise concerns with the public; however, there is no information regarding current public knowledge on swine industry practices in the US. In this study we gained insight into public knowledge and perception on castration with and without analgesia in comparison to knowledge of industry stakeholders on these same topics. Through an online survey, 119 respondents were asked four questions about castration in the US swine industry. Industry respondents were contacted via social media and networking. The general public sample was accessed through Mechanical Turk. Survey responses were categorised by experience (industry vs public). Industry respondents were more aware of practices compared to the general public. Most public respondents were unaware of castration practices and the lack of analgesia use. Respondents from rural communities were more aware of castration practices than (sub)urban communities and more aware of analgesia use than those from urban communities. Those with more education had greater awareness of castration practices (occurrence not frequency). Based on the results from this first US sample, knowledge on industry practices was especially lacking for public respondents, but also for a minority of industry respondents, indicating opportunities for education and further research on the topic.
  • Paradoxes
    Datta, Jyotishka (International Indian Statistical Association, 2023-12-29)
  • Merging Two Cultures: Deep and Statistical Learning
    Bhadra, Anindya; Datta, Jyotishka; Polson, Nick; Sokolov, Vadim; Xu, Jianeng (2021-10-21)
    Merging the two cultures of deep and statistical learning provides insights into structured high-dimensional data. Traditional statistical modeling is still a dominant strategy for structured tabular data. Deep learning can be viewed through the lens of generalized linear models (GLMs) with composite link functions. Sufficient dimensionality reduction (SDR) and sparsity perform nonlinear feature engineering. We show that prediction, interpolation and uncertainty quantification can be achieved using probabilistic methods at the output layer of the model. Thus a general framework for machine learning arises that first generates nonlinear features (a.k.a. factors) via sparse regularization and stochastic gradient optimisation and second uses a stochastic output layer for predictive uncertainty. Rather than using shallow additive architectures as in many statistical models, deep learning uses layers of semi-affine input transformations to provide a predictive rule. Applying these layers of transformations leads to a set of attributes (a.k.a. features) to which predictive statistical methods can be applied. Thus we achieve the best of both worlds: scalability and fast predictive rule construction together with uncertainty quantification. Sparse regularisation with unsupervised or supervised learning finds the features. We clarify the duality between shallow and wide models such as PCA, PPR, RRR and deep but skinny architectures such as autoencoders, MLPs, CNN, and LSTM. The connection with data transformations is of practical importance for finding good network architectures. By incorporating probabilistic components at the output level we allow for predictive uncertainty. We use deep Gaussian processes for interpolation and ReLU trees for classification. We provide applications to regression, classification and interpolation. Finally, we conclude with directions for future research.
  • Nonparametric Bayes multiresolution testing for high-dimensional rare events
    Datta, Jyotishka; Banerjee, Sayantan; Dunson, David B. (2024-01)
    In a variety of application areas, there is interest in assessing evidence of differences in the intensity of event realizations between groups. For example, in cancer genomic studies collecting data on rare variants, the focus is on assessing whether and how the variant profile changes with the disease subtype. Motivated by this application, we develop multiresolution nonparametric Bayes tests for differential mutation rates across groups. The multiresolution approach yields fast and accurate detection of spatial clusters of rare variants, and our nonparametric Bayes framework provides great flexibility for modeling the intensities of rare variants. Some theoretical properties are also assessed, including weak consistency of our Dirichlet Process-Poisson-Gamma mixture over multiple resolutions. Simulation studies illustrate excellent small sample properties relative to competitors, and we apply the method to detect rare variants related to common variable immunodeficiency from whole exome sequencing data on 215 patients and over 60,027 control subjects.
  • Ultra-Fast Approximate Inference Using Variational Functional Mixed Models
    Huo, Shuning; Morris, Jeffrey S.; Zhu, Hongxiao (Taylor & Francis, 2023-04-03)
    While Bayesian functional mixed models have been shown to be effective for modeling functional data with various complex structures, their application to extremely high-dimensional data is limited due to computational challenges involved in posterior sampling. We introduce a new computational framework that enables ultra-fast approximate inference for high-dimensional data in functional form. This framework adopts a parsimonious basis to represent functional observations, which facilitates efficient compression and parallel computing in basis space. Instead of performing expensive Markov chain Monte Carlo sampling, we approximate the posterior distribution using variational Bayes and adopt a fast iterative algorithm to estimate parameters of the approximate distribution. Our approach facilitates a fast multiple testing procedure in basis space, which can be used to identify significant local regions that reflect differences across groups of samples. We perform two simulation studies to assess the performance of approximate inference, and demonstrate applications of the proposed approach by using a proteomic mass spectrometry dataset and a brain imaging dataset. Supplementary materials for this article are available online.