Browsing by Author "Du, Pang"
Now showing 1 - 20 of 45
- Advanced Nonparametric Bayesian Functional Modeling
Gao, Wenyu (Virginia Tech, 2020-09-04)
Functional analyses have gained more interest as massive data sets have become easier to access. However, such data sets often contain substantial heterogeneity, noise, and high dimensionality. When generalizing analyses from vectors to functions, classical methods might not work directly. This dissertation considers noisy information reduction in functional analyses from two perspectives: functional variable selection to reduce the dimensionality, and functional clustering to group similar observations and thus reduce the sample size. Complicated data structures and relations can be easily modeled by a Bayesian hierarchical model, or developed from a more generic one by changing the prior distributions. Hence, this dissertation focuses on the development of Bayesian approaches for functional analyses because of their flexibility. A nonparametric Bayesian approach, such as the Dirichlet process mixture (DPM) model, uses a nonparametric distribution as the prior. This approach provides flexibility and reduces assumptions, especially for functional clustering, because the DPM model has an automatic clustering property: the number of clusters does not need to be specified in advance. Furthermore, a weighted Dirichlet process mixture (WDPM) model accommodates more heterogeneity in the data by assuming more than one unknown prior distribution. It also gathers more information from the data by introducing a weight function that assigns the candidate priors, such that less similar observations are more separated. Thus, the WDPM model improves the clustering and model estimation results. In this dissertation, we used advanced nonparametric Bayesian approaches to study functional variable selection and functional clustering methods.
We proposed 1) a stochastic search functional selection method, with application to 1-M matched case-crossover studies of aseptic meningitis, to examine the time-varying unknown relationship and identify important covariates affecting disease contraction; 2) a functional clustering method via the WDPM model, with application to three pathways related to genetic diabetes data, to identify essential genes distinguishing between normal and disease groups; and 3) a combined functional clustering (via the WDPM model) and variable selection approach, with application to high-frequency spectral data, to select wavelengths associated with breast cancer racial disparities.
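The automatic clustering property of the DPM prior described above can be sketched with a truncated Dirichlet process mixture. The example below uses scikit-learn's `BayesianGaussianMixture` on synthetic data; it illustrates only the DP clustering idea, not the dissertation's WDPM model, and the data and prior settings are hypothetical.

```python
import numpy as np
from sklearn.mixture import BayesianGaussianMixture

# Illustrative data: three well-separated groups (not the dissertation's data)
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=c, scale=0.3, size=(50, 2)) for c in (-2.0, 0.0, 2.0)])

# Truncated Dirichlet process mixture: the component cap is generous, and the
# DP prior shrinks unused components toward zero weight, so the effective
# number of clusters is inferred rather than fixed in advance.
dpm = BayesianGaussianMixture(
    n_components=10,
    weight_concentration_prior_type="dirichlet_process",
    weight_concentration_prior=0.1,
    max_iter=500,
    random_state=0,
).fit(X)

labels = dpm.predict(X)
effective = int(np.sum(dpm.weights_ > 0.01))  # components carrying real weight
```

Because the DP prior prunes unneeded components, raising `n_components` further would not keep adding clusters.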
- Alterations in the molecular composition of COVID-19 patient urine, detected using Raman spectroscopic/computational analysis
Robertson, John L.; Senger, Ryan S.; Talty, Janine; Du, Pang; Sayed-Issa, Amr; Avellar, Maggie L.; Ngo, Lacy T.; Gomez de la Espriella, Mariana; Fazili, Tasaduq N.; Jackson-Akers, Jasmine Y.; Guruli, Georgi; Orlando, Giuseppe (PLOS, 2022-07-01)
We developed and tested a method to detect COVID-19 disease using urine specimens. The technology is based on Raman spectroscopy and computational analysis. It does not detect SARS-CoV-2 virus or viral components, but rather a urine 'molecular fingerprint' representing systemic metabolic, inflammatory, and immunologic reactions to infection. We analyzed voided urine specimens from 46 symptomatic COVID-19 patients with positive real-time polymerase chain reaction (RT-PCR) tests for infection or household contact with test-positive patients. We compared their urine Raman spectra with urine Raman spectra from healthy individuals (n = 185), peritoneal dialysis patients (n = 20), and patients with active bladder cancer (n = 17), collected between 2016 and 2018 (i.e., pre-COVID-19). We also compared all urine Raman spectra with those of urine specimens collected from healthy, fully vaccinated volunteers (n = 19) from July to September 2021. Disease severity (primarily respiratory) ranged among mild (n = 25), moderate (n = 14), and severe (n = 7). Seventy percent of patients sought evaluation within 14 days of onset. One severely affected patient was hospitalized; the remainder were managed with home/ambulatory care. Twenty patients had clinical pathology profiling. Seven of 20 patients had mildly elevated serum creatinine values (>0.9 mg/dl; range 0.9–1.34 mg/dl), and 6/7 of these patients also had estimated glomerular filtration rates (eGFR) <90 mL/min/1.73 m² (range 59–84 mL/min/1.73 m²). We could not determine whether any of these patients had antecedent clinical pathology abnormalities.
Our technology (Raman Chemometric Urinalysis, Rametrix®) had an overall prediction accuracy of 97.6% for detecting complex, multimolecular fingerprints in urine associated with COVID-19 disease. The sensitivity of this model for detecting COVID-19 was 90.9%. The specificity was 98.8%, the positive predictive value was 93.0%, and the negative predictive value was 98.4%. In assessing severity, the method accurately classified symptoms as mild, moderate, or severe (random chance = 33%) based on the urine multimolecular fingerprint. Finally, a fingerprint of 'Long COVID-19' symptoms (defined as lasting longer than 30 days) was identified in urine. Our methods detected this fingerprint with 70.0% sensitivity and 98.7% specificity in leave-one-out cross-validation analysis. Further validation testing will include sampling more patients, examining correlations with disease severity and/or duration, and employing metabolomic analysis (Gas Chromatography-Mass Spectrometry [GC-MS], High Performance Liquid Chromatography [HPLC]) to identify individual components contributing to COVID-19 molecular fingerprints.
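The sensitivity, specificity, PPV, and NPV reported above are all simple functions of confusion-matrix counts. A minimal sketch; the counts below are hypothetical, chosen only to be on the scale of this cohort, not the study's actual tallies:

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Standard diagnostic-accuracy summaries from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),   # true positive rate
        "specificity": tn / (tn + fp),   # true negative rate
        "ppv": tp / (tp + fp),           # positive predictive value
        "npv": tn / (tn + fn),           # negative predictive value
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
    }

# Hypothetical counts (not the study's tallies)
m = diagnostic_metrics(tp=40, fp=3, tn=238, fn=4)
```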
- Anomaly Detection in Heterogeneous Data Environments with Applications to Mechanical Engineering Signals & Systems
Milo, Michael William (Virginia Tech, 2013-11-08)
Anomaly detection is a relevant problem in the field of Mechanical Engineering, because the analysis of mechanical systems often relies on identifying deviations from what is considered "normal". The mechanical sciences are represented by a heterogeneous collection of data types: some systems may be highly dimensional, may contain exclusively spatial or temporal data, may be spatiotemporally linked, or may be non-deterministic and best described probabilistically. Given the broad range of data types in this field, it is not possible to propose a single processing method that will be appropriate, or even usable, for all of them. This has left human observation as a common, albeit costly and inefficient, approach to detecting anomalous signals or patterns in mechanical data. The advantages of automated anomaly detection in mechanical systems include reduced monitoring costs, increased reliability of fault detection, and improved safety for users and operators. This dissertation proposes a hierarchical framework for anomaly detection through machine learning and applies it to three distinct and heterogeneous data types: state-based data, parameter-driven data, and spatiotemporal sensor network data. In time-series data, anomaly detection was robust for synthetic data generated using multiple simulation algorithms, as well as for experimental data from rolling element bearings, with highly accurate detection rates (>99% detection, <1% false alarm). For parameter-driven data, significant improvements were demonstrated by reducing both the sample sizes necessary for analysis and the time required for computation.
The event-space model extends previous work to a geospatial sensor network, demonstrates applications of this type of event modeling at various timescales, and compares the model to results obtained using other approaches. Each data type is processed in a unique way relative to the others, but all are fitted to the same hierarchical structure for system modeling. This hierarchical model is the key development proposed by this dissertation and makes novel and significant contributions to the fields of mechanical analysis and data processing. This work demonstrates the effectiveness of the developed approaches, details how they differ from other relevant industry-standard methods, and concludes with a proposal for additional research into other data types.
- Cluster-Based Profile Monitoring in Phase I Analysis
Chen, Yajuan (Virginia Tech, 2014-03-26)
Profile monitoring is a well-known approach used in statistical process control where the quality of the product or process is characterized by a profile, or a relationship between a response variable and one or more explanatory variables. Profile monitoring is conducted over two phases, labeled Phase I and Phase II. In Phase I profile monitoring, regression methods are used to model each profile and to detect the possible presence of out-of-control profiles in the historical data set (HDS). The out-of-control profiles can be detected using the T² statistic. However, previous methods of calculating the T² statistic are based on all the data in the HDS, including data from the out-of-control process. Consequently, the effectiveness of this method can be distorted if the HDS contains data from the out-of-control process. This work provides a new profile monitoring methodology for Phase I analysis. The proposed method, referred to as the cluster-based profile monitoring method, incorporates a cluster analysis phase before calculating the T² statistic. Before introducing the proposed cluster-based method in profile monitoring, the cluster-based method is demonstrated to work efficiently in robust regression, where it is referred to as cluster-based bounded influence regression, or CBI. It is demonstrated that the CBI method provides a robust, efficient, and high-breakdown regression parameter estimator. The CBI method first represents the data space via a special set of points, referred to as anchor points. Then a collection of single-point-added ordinary least squares regression estimators forms the basis of a metric used in defining the similarity between any two observations. Cluster analysis then yields a main cluster containing at least half the observations, with the remaining observations comprising one or more minor clusters.
An initial regression estimator arises from the main cluster, with a group-additive DFFITS argument used to carefully activate the minor clusters through a bounded influence regression framework. CBI achieves a 50% breakdown point; is regression, scale, and affine equivariant; and is asymptotically normal. Case studies and Monte Carlo results demonstrate the performance advantage of CBI over other popular robust regression procedures with regard to coefficient stability, scale estimation, and standard errors. The cluster-based method in Phase I profile monitoring first replaces the data from each sampled unit with an estimated profile, using an appropriate regression method. The estimated parameters for parametric profiles are obtained from parametric models, while the estimated parameters for nonparametric profiles are obtained from the p-spline model. The cluster phase clusters the profiles based on their estimated parameters, yielding an initial main cluster that contains at least half the profiles. The initial estimated parameters for the population average (PA) profile are obtained by fitting a mixed model (parametric or nonparametric) to the profiles in the main cluster. Profiles not contained in the initial main cluster are iteratively added to the main cluster provided their T² statistics are "small", and the mixed model (parametric or nonparametric) is used to update the estimated parameters for the PA profile. Profiles contained in the final main cluster are considered to result from the in-control process, while those not included are considered to result from an out-of-control process. This cluster-based method has been applied to monitor both parametric and nonparametric profiles.
A simulated example, a Monte Carlo study, and an application to a real data set illustrate the details of the algorithm and demonstrate the performance advantage of the proposed method over a non-cluster-based method with respect to more accurate estimates of the PA parameters and improved classification performance criteria. When the profiles can be represented by vectors, the profile monitoring process is equivalent to the detection of multivariate outliers. For this reason, we also compared our proposed method to a popular method used to identify outliers when dealing with a multivariate response. Our study demonstrated that when the out-of-control process corresponds to a sustained shift, the cluster-based method using the successive difference estimator is clearly the superior method, among those we considered, based on all performance criteria. In addition, the influence of accurate Phase I estimates on the performance of Phase II control charts is presented to show a further advantage of the proposed method. A simple example and Monte Carlo results show that more accurate estimates from Phase I provide more efficient Phase II control charts.
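As a sketch of the multivariate-outlier view mentioned above, the function below computes Hotelling-type T² statistics using the successive-difference covariance estimator, which resists inflation from a sustained shift because differencing cancels a shifted mean everywhere except at the change point. This is a simplified illustration on assumed synthetic data, not the dissertation's implementation.

```python
import numpy as np

def t2_successive_difference(X):
    """Phase I T^2 statistics with the successive-difference covariance
    estimator S = (1 / (2 (m - 1))) * sum of d_i d_i^T, d_i = x_{i+1} - x_i."""
    X = np.asarray(X, dtype=float)
    m, p = X.shape
    xbar = X.mean(axis=0)
    D = np.diff(X, axis=0)             # successive differences between rows
    S = (D.T @ D) / (2.0 * (m - 1))    # shift-resistant covariance estimate
    Sinv = np.linalg.inv(S)
    dev = X - xbar
    # T^2_i = (x_i - xbar)' S^{-1} (x_i - xbar), one value per observation
    return np.einsum("ij,jk,ik->i", dev, Sinv, dev)

rng = np.random.default_rng(1)
X = rng.normal(size=(30, 3))
X[25:] += 3.0                          # sustained shift in the last 5 rows
t2 = t2_successive_difference(X)
```

The shifted observations receive much larger T² values than the in-control ones, which is what a Phase I screen exploits.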
- Cure Rate Model with Spline Estimated Components
Wang, Lu (Virginia Tech, 2010-07-13)
In survival analyses of medical studies, there are often long-term survivors who can be considered permanently cured. The goals in these studies are to estimate the cure probability of the whole population and the hazard rate of the non-cured subpopulation. Existing methods for cure rate models have been limited to parametric and semiparametric models. More specifically, the hazard function is estimated by a parametric or semiparametric model in which the effect of the covariate takes a parametric form, and the cure rate is often estimated by a parametric logistic regression model. We introduce a nonparametric model employing smoothing splines that provides nonparametric smooth estimates for both the hazard function and the cure rate. By introducing a latent cure status variable, we implement the method using a smooth EM algorithm. Louis' formula for covariance estimation in an EM algorithm is generalized to yield point-wise confidence intervals for both functions. A simple model selection procedure based on the Kullback-Leibler geometry is derived for the proposed cure rate model. Numerical studies demonstrate excellent performance of the proposed method in estimation, inference, and model selection. The application of the method is illustrated by the analysis of a melanoma study.
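The latent-cure-status EM idea can be sketched in a much simpler setting than the spline-estimated components above: assume a constant (exponential) hazard for the non-cured group and a covariate-free cure probability. Everything here (the parametrization and the toy data) is an assumption for illustration, not the model of the dissertation.

```python
import math

def em_cure_exponential(times, events, n_iter=200):
    """EM for a two-component mixture cure model with population survival
    S(t) = pi + (1 - pi) * exp(-lam * t): pi is the cure fraction, lam the
    constant hazard of the non-cured. `events` holds 1 for an observed
    event, 0 for right-censoring."""
    n = len(times)
    pi, lam = 0.5, n / sum(times)  # crude starting values
    for _ in range(n_iter):
        # E-step: posterior probability each subject is non-cured
        w = []
        for t, d in zip(times, events):
            if d:
                w.append(1.0)  # an observed event implies non-cured
            else:
                su = math.exp(-lam * t)
                w.append((1 - pi) * su / (pi + (1 - pi) * su))
        # M-step: update cure fraction and exponential hazard rate
        pi = 1.0 - sum(w) / n
        lam = sum(events) / sum(wi * t for wi, t in zip(w, times))
    return pi, lam

# Toy data: 20 events at t = 1 plus 10 subjects censored late at t = 10
times = [1.0] * 20 + [10.0] * 10
events = [1] * 20 + [0] * 10
pi_hat, lam_hat = em_cure_exponential(times, events)
```

On this toy data the algorithm recovers a cure fraction near 1/3 and a hazard rate near 1, since the late-censored subjects are almost surely cured.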
- Cure Rate Models with Nonparametric Form of Covariate Effects
Chen, Tianlei (Virginia Tech, 2015-06-02)
This thesis focuses on the development of spline-based hazard estimation models for cure rate data. Such data can be found in survival studies with long-term survivors. Consequently, the population consists of susceptible and non-susceptible sub-populations, with the latter termed "cured". The modeling of both the cure probability and the hazard function of the susceptible sub-population is of practical interest. Here we propose two smoothing-spline-based models falling respectively into the popular classes of two-component mixture cure rate models and promotion time cure rate models. Under the framework of the two-component mixture cure rate model, Wang, Du and Liang (2012) developed a nonparametric model in which the covariate effects on both the cure probability and the hazard component are estimated by smoothing splines. Our first development falls under the same framework but estimates the hazard component based on the accelerated failure time model instead of the proportional hazards model in Wang, Du and Liang (2012). Our new model has better interpretability in practice. The promotion time cure rate model, motivated by a simplified biological interpretation of cancer metastasis, was first proposed only a few decades ago. Nonetheless, it has quickly become a competitor to the mixture models. Our second development aims to provide a nonparametric alternative to the existing parametric and semiparametric promotion time models.
- Development of Predictability and Condition Assessability Indices for PCCP Water Mains
Kola, Rajyalakshmi (Virginia Tech, 2010-01-25)
The condition of water and wastewater pipelines has been deteriorating with time, and since this infrastructure is out of sight, its assessment has been neglected over the years. The advancement of technology in various fields has provided a pathway for the development of several technologies for assessing the condition of pipeline systems. However, there is no standard guidance or tool for utilities to use these technologies appropriately, and utilities are often unaware of the present state-of-the-art technologies. The predictability and condition assessability indices will help utilities predict a probable failure and take steps to prevent it. The predictability index will indicate the inherent, theoretical predictability of key types of pipe failures. The pipe failure predictability index is a score calculated by identifying high-priority pipe types; characterizing their failure modes, mechanisms, conditions, and indicators; and assessing the reliability of the indicators, the lead time of the indicators, and other factors. The condition assessability index will indicate the technically and economically feasible methods of preventing key types of pipe failures. The pipe failure condition assessability index is similar to the predictability index, but it takes into account the capability of existing inspection technologies to measure the required failure indicator parameters. Prestressed Concrete Cylinder Pipes (PCCP) are complex composite pipes used in large-diameter water pipelines throughout the United States to convey large volumes of water. Prediction and prevention of failure in these pipelines is therefore complex and requires a better understanding of the system. This research concentrates on the development of Predictability and Condition Assessability Indices for PCCP pipelines.
- Effect of antibiotic use and composting on antibiotic resistance gene abundance and resistome risks of soils receiving manure-derived amendments
Chen, Chaoqi; Pankow, Christine A.; Oh, Min; Heath, Lenwood S.; Zhang, Liqing; Du, Pang; Xia, Kang; Pruden, Amy (Elsevier, 2019-05-03)
Manure-derived amendments are commonly applied to soil, raising questions about whether antibiotic use in livestock could influence the soil resistome (the collective antibiotic resistance genes (ARGs)) and ultimately contribute to the spread of antibiotic resistance to humans during food production. Here, we examined the metagenomes of soils amended with raw or composted manure generated from dairy cows administered pirlimycin and cephapirin (antibiotic) or no antibiotics (control), relative to unamended soils. Initial amendment (Day 1) with manure or compost significantly increased the diversity (richness) of ARGs in soils (p < 0.01) and resulted in distinct abundances of individual ARG types. Notably, initial amendment with antibiotic-manure significantly increased the total ARG relative abundance (per 16S rRNA gene) in the soils (2.21× that of unamended soils, p < 0.001). After incubating 120 days, to simulate a wait period before crop harvest, 282 ARGs decreased 4.33-fold (median) up to 307-fold, while 210 ARGs increased 2.89-fold (median) up to 76-fold in the antibiotic-manure-amended soils, resulting in total ARG relative abundances equivalent to those of the unamended soils. We further assembled the metagenomic data and calculated resistome risk scores, a recently defined relative index based on the co-occurrence of sequences corresponding to ARGs, mobile genetic elements, and putative pathogens on the same scaffold. Initial amendment with manure significantly increased the soil resistome risk scores, especially when the manure was generated by cows administered antibiotics, while composting reduced these effects and resulted in soil resistomes more similar to the background.
The risk scores of manure-amended soils decreased to levels comparable to the unamended soils after 120 days. Overall, this study provides an integrated, high-resolution examination of the effects of prior antibiotic use, composting, and a 120-day wait period on soil resistomes following manure-derived amendment, demonstrating that all three management practices have measurable effects and should be taken into consideration in the development of policy and practice for mitigating the spread of antibiotic resistance.
- Evaluating Time-varying Effect in Single-type and Multi-type Semi-parametric Recurrent Event Models
Chen, Chen (Virginia Tech, 2015-12-11)
This dissertation aims to develop statistical methodologies for estimating the effects of time-fixed and time-varying factors in the recurrent events modeling context. The research is motivated by the traffic safety question of evaluating the influence of crashes on driving risk and driver behavior. The methodologies developed, however, are general and can be applied to other fields. Four alternative approaches based on various data settings are elaborated and applied to the 100-Car Naturalistic Driving Study in the following chapters. Chapter 1 provides a general introduction and background for each method, with a sketch of the 100-Car Naturalistic Driving Study. In Chapter 2, I assessed the impact of crashes on driving behavior by comparing the frequency of distraction events in pre-defined windows, using a count-based approach built on mixed-effect binomial regression models. In Chapter 3, I introduced intensity-based recurrent event models by treating the number of Safety Critical Incidents and Near Crashes over time as a counting process. Recurrent event models fit the natural generation scheme of the data in this study. Four semi-parametric models are explored: the Andersen-Gill model, the Andersen-Gill model with stratified baseline functions, the frailty model, and the frailty model with stratified baseline functions. I derived the model estimation procedure and conducted model comparison via simulation and application. The recurrent event models in Chapter 3 are all based on the proportionality assumption, under which effects are constant. However, the change of effects over time is often of primary interest. In Chapter 4, I developed a time-varying coefficient model using penalized B-spline functions to approximate the varying coefficients. Shared frailty terms were used to incorporate correlation within subjects. Inference and statistical tests are also provided.
A frailty representation was proposed to link the time-varying coefficient model with the regular frailty model. In Chapter 5, I further extended the framework to accommodate multi-type recurrent events with time-varying coefficients. Two types of recurrent-event models were developed; these models incorporate correlation among the intensity functions of different event types through correlated frailty terms. Chapter 6 gives a general review of the contributions of this dissertation and a discussion of future research directions.
- Functional Data Models for Raman Spectral Data and Degradation Analysis
Do, Quyen Ngoc (Virginia Tech, 2022-08-16)
Functional data analysis (FDA) studies data in the form of measurements over a domain, treated as whole entities. Our first focus is on post-hoc analysis, with pairwise and contrast comparisons, for the popular functional ANOVA model comparing groups of functional data. Existing contrast tests assume independent functional observations within a group. In reality, this assumption may not be satisfactory, since functional data are often collected continually over time on a subject. In this work, we introduce a new linear contrast test that accounts for time dependency among functional group members. When a contrast test is significant, it can be beneficial to identify the region of significant difference. In the second part, we propose a nonparametric regression procedure to obtain a locally sparse estimate of a functional contrast. Our work is motivated by a biomedical study using Raman spectroscopy to monitor hemodialysis treatment in near real time. With the contrast test and sparse estimation, practitioners can monitor the progress of a hemodialysis session and identify chemicals important for dialysis adequacy monitoring. In the third part, we propose a functional data model for degradation analysis of functional data. Motivated by the degradation analysis of rechargeable Li-ion batteries, we combine state-of-the-art functional linear models to produce fully functional predictions for curves on heterogeneous domains. Simulation studies and data analysis demonstrate the advantage of the proposed method over an existing aggregation-based method in predicting the degradation measure.
- GLR Control Charts for Monitoring a Proportion
Huang, Wandi (Virginia Tech, 2011-12-06)
Generalized likelihood ratio (GLR) control charts are studied for monitoring the proportion of defective or nonconforming items in a process. The type of process change considered is an abrupt sustained increase in the process proportion, which implies deterioration of process quality. The objective is to effectively detect a wide range of shift sizes. For the first part of this research, we assume samples are collected using rational subgrouping with sample size n > 1, and the binomial GLR statistic is constructed from a moving window of past sample statistics that follow a binomial distribution. Steady-state performance is evaluated for the binomial GLR chart and the other widely used binomial charts. We find that, in terms of overall performance, the binomial GLR chart is at least as good as the other charts. In addition, since it has only two charting parameters, both of which can be easily obtained using the approach we propose, less effort is required to design the binomial GLR chart for practical applications. The second part of this research develops a Bernoulli GLR chart to monitor processes under continuous inspection, in which case samples of size n = 1 are observed. A constant upper bound is imposed on the estimate of the process shift, preventing the corresponding Bernoulli GLR statistic from being undefined. Performance comparisons between the Bernoulli GLR chart and the other charts show that the Bernoulli GLR chart has better overall performance than its competitors, especially for detecting small shifts.
- GLR Control Charts for Process Monitoring with Sequential Sampling
Peng, Yiming (Virginia Tech, 2014-11-06)
The objective of this dissertation is to investigate GLR control charts based on a sequential sampling scheme (SS GLR charts). Phase II monitoring is considered, and the goal is to quickly detect a wide range of changes in the mean and/or variance of a univariate normal process. The performance of the SS GLR charts is evaluated, and design guidelines are provided so that practitioners can easily apply SS GLR charts in practice. More specifically, the structure of this dissertation is as follows. We first develop a two-sided SS GLR chart for monitoring the mean μ of a normal process. The performance of the SS GLR chart is evaluated and compared with that of other control charts. The SS GLR chart performs much better than the fixed sampling rate GLR chart, and its overall performance is also better than that of the variable sampling interval (VSI) GLR chart and the variable sampling rate (VSR) CUSUM chart. The SS GLR chart has the additional advantage that it requires fewer parameters to be specified than other VSR charts. The optimal parameter choices are given, and regression equations are provided to find the limits for the SS GLR chart. If detecting one-sided shifts in μ is of interest, the SS GLR chart can be modified into a one-sided chart, and the performance of this modified chart is investigated. Next we develop an SS GLR chart for simultaneously monitoring the mean μ and the variance 𝜎² of a normal process. The performance and properties of this chart are evaluated. The design methodology and illustrative examples are provided so that the SS GLR chart can be easily used in applications.
The optimal parameter choices are given, and the performance of the SS GLR chart remains very good as long as the chosen parameters are not too far from the optimal choices.
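The core GLR computation behind charts like these can be sketched for the Bernoulli case: maximize the log-likelihood ratio over candidate change points in a window of past observations, with the shift estimate bounded above so the statistic stays defined. The function below is a simplified illustration with assumed inputs, not the dissertation's exact chart design (window management and control limits are omitted).

```python
import math

def bernoulli_glr(window, p0, p_max=0.9):
    """GLR statistic for an increase in a Bernoulli proportion.
    For each candidate change point tau in a window of 0/1 observations,
    estimate the post-change proportion (bounded above by p_max so the
    log-likelihood stays finite, and below by p0 since only increases
    signal deterioration), and take the maximum log-likelihood ratio."""
    n = len(window)
    best = 0.0
    for tau in range(n):
        seg = window[tau:]
        k, m = sum(seg), len(seg)
        p1 = min(max(k / m, p0), p_max)  # bounded MLE of shifted proportion
        if p1 <= p0:
            continue
        llr = k * math.log(p1 / p0) + (m - k) * math.log((1 - p1) / (1 - p0))
        best = max(best, llr)
    return best
```

A chart would signal when `bernoulli_glr(window, p0)` exceeds a control limit chosen to give the desired in-control average run length.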
- Investigating Selection Criteria of Constrained Cluster Analysis: Applications in Forestry
Corral, Gavin Richard (Virginia Tech, 2014-10-29)
Forest measurements are inherently spatial. Soil productivity varies spatially at fine scales, and tree growth responds through changes in growth-age trajectories. Measuring spatial variability is a prerequisite to more effective analysis and statistical testing. In this study, the techniques of partial redundancy analysis and constrained cluster analysis are used to explore how spatial variables determine structure in a managed, regularly spaced plantation. We test for spatial relationships in the data and then explore how those relationships are manifested as spatially recognizable structures. The objectives of this research are to measure, test, and map spatial variability in simulated forest plots. Partial redundancy analysis was found to be a good method for detecting the presence or absence of spatial relationships (~95% accuracy). We found that the Calinski-Harabasz method consistently performed better at detecting the correct number of clusters than several other methods. While more work remains to be done, we believe that constrained cluster analysis has promising applications in forestry and that the Calinski-Harabasz criterion will be the most useful.
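The Calinski-Harabasz criterion mentioned above is easy to demonstrate: fit candidate cluster counts and keep the one that maximizes the score. The sketch below uses plain (unconstrained) k-means via scikit-learn on synthetic data, which differs from the spatially constrained clustering studied here but shows how the criterion ranks candidate values of k.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import calinski_harabasz_score

# Illustrative data: three well-separated groups (not the simulated forest plots)
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=c, scale=0.4, size=(60, 2)) for c in (0.0, 4.0, 8.0)])

# Score each candidate number of clusters; higher Calinski-Harabasz is better
scores = {}
for k in range(2, 7):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    scores[k] = calinski_harabasz_score(X, labels)
best_k = max(scores, key=scores.get)
```

The criterion is a ratio of between-cluster to within-cluster dispersion, so for well-separated groups it peaks at the true number of clusters.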
- Investigating the performance of process-observation-error-estimator and robust estimators in surplus production model: a simulation study
He, Qing (Virginia Tech, 2010-05-13)
This study investigated the performance of three estimators for the surplus production model: the process-observation-error estimator with normal distribution (POE_N), the observation-error estimator with normal distribution (OE_N), and the process-error estimator with normal distribution (PE_N). Estimators with fat-tailed distributions, including Student's t distribution and the Cauchy distribution, were also proposed, and their performance was compared with that of the estimators with normal distributions. This study used the Bayesian approach with a revised Metropolis-Hastings within Gibbs sampling algorithm (MHGS), previously used to fit POE_N (Millar and Meyer, 2000), developed the MHGS for the other estimators, and developed methodologies that enable all the estimators to handle data containing multiple indices based on catch-per-unit-effort (CPUE). A simulation study was conducted based on parameter estimates from two example fisheries: the Atlantic weakfish (Cynoscion regalis) and the southern stock of black sea bass (Centropristis striata). Our results indicated that POE_N performs best among all six estimators with regard to both accuracy and precision in most cases. POE_N is also robust to outliers, atypical values, and autocorrelated errors. OE_N is the second-best estimator, while PE_N is often imprecise. Estimators with fat-tailed distributions usually produce estimates more biased than those with normal distributions. The performance of POE_N and OE_N can be improved by fitting multiple indices. Our study suggested that POE_N be used for population dynamics models in future stock assessments, that multiple indices from valid surveys be incorporated into stock assessment models, and that OE_N be considered when multiple indices are available.
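Surplus production models of the kind studied above are commonly built on the Schaefer process equation B[t+1] = B[t] + r*B[t]*(1 - B[t]/K) - C[t]; the estimators differ in where they place process versus observation error around this dynamic (e.g., a CPUE index modeled as I[t] = q*B[t] with observation error). A minimal deterministic projection sketch, with illustrative parameter values rather than either fishery's estimates:

```python
def schaefer_project(b0, r, K, catches):
    """Project biomass forward under the deterministic Schaefer surplus
    production dynamic: B[t+1] = B[t] + r*B[t]*(1 - B[t]/K) - C[t].
    Biomass is floored at a tiny positive value to avoid negative stock."""
    biomass = [b0]
    for c in catches:
        b = biomass[-1]
        biomass.append(max(b + r * b * (1.0 - b / K) - c, 1e-8))
    return biomass

# Illustrative values: stock starts at carrying capacity, constant catch
traj = schaefer_project(b0=1000.0, r=0.5, K=1000.0, catches=[50.0] * 10)
```

Starting at carrying capacity, surplus production is zero, so the first step removes the full catch; as biomass falls below K, production partially offsets the catch.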
- Likelihood Ratio Combination of Multiple Biomarkers and Change Point Detection in Functional Time SeriesDu, Zhiyuan (Virginia Tech, 2024-09-24)Utilizing multiple biomarkers in medical research is crucial for the diagnostic accuracy of detecting diseases. An optimal method for combining these biomarkers is essential to maximize the Area Under the Receiver Operating Characteristic (ROC) Curve (AUC). The optimality of the likelihood ratio has been proven but the challenges persist in estimating the likelihood ratio, primarily on the estimation of multivariate density functions. In this study, we propose a non-parametric approach for estimating multivariate density functions by utilizing Smoothing Spline density estimation to approximate the full likelihood function for both diseased and non-diseased groups, which compose the likelihood ratio. Simulation results demonstrate the efficiency of our method compared to other biomarker combination techniques under various settings for generated biomarker values. Additionally, we apply the proposed method to a real-world study aimed at detecting childhood autism spectrum disorder (ASD), showcasing its practical relevance and potential for future applications in medical research. Change point detection for functional time series has attracted considerable attention from researchers. Existing methods either rely on FPCA, which may perform poorly with complex data, or use bootstrap approaches in forms that fall short in effectively detecting diverse change functions. In our study, we propose a novel self-normalized test for functional time series implemented via a non-overlapping block bootstrap to circumvent reliance on FPCA. The SN factor ensures both monotonic power and adaptability for detecting diverse change functions on complex data. We also demonstrate our test's robustness in detecting changes in the autocovariance operator. 
Simulation studies confirm the superior performance of our test across various settings, and real-world applications further illustrate its practical utility.
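The likelihood-ratio combination step described in this entry is simple once the two group densities are estimated. A minimal sketch on synthetic two-marker data, using Gaussian kernel density estimation as a stand-in for the paper's smoothing spline density estimator (the marker distributions and `likelihood_ratio` helper are hypothetical):

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)

# Hypothetical biomarker values: two markers per subject
diseased = rng.normal([1.0, 0.8], 1.0, size=(200, 2))
healthy = rng.normal([0.0, 0.0], 1.0, size=(200, 2))

# Stand-in for smoothing-spline density estimation: kernel density
# estimates of each group's joint density (gaussian_kde expects (d, n))
f_d = gaussian_kde(diseased.T)
f_h = gaussian_kde(healthy.T)

def likelihood_ratio(x):
    """Combine markers into one score via the estimated ratio f_d / f_h."""
    x = np.atleast_2d(x)
    return f_d(x.T) / f_h(x.T)

# Empirical AUC of the combined score via the Wilcoxon rank-sum identity:
# AUC = P(score of a diseased subject > score of a healthy subject)
s_d = likelihood_ratio(diseased)
s_h = likelihood_ratio(healthy)
auc = (s_d[:, None] > s_h[None, :]).mean()
print(f"AUC of LR-combined score: {auc:.3f}")
```

By the Neyman-Pearson argument the abstract invokes, no other scalar combination of the markers can achieve a higher ROC curve than the true likelihood ratio, so the quality of the density estimates is what drives the achieved AUC.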
- Methods for Evaluating Aquifer-System Parameters from a Cumulative Compaction RecordVanhaitsma, Amanda Joy (Virginia Tech, 2016-08-12)Although many efforts and strategies have been implemented to reduce over-pumping of aquifer systems, land subsidence remains a serious issue worldwide. Accurate aquifer characterization is critical to understanding the response of an aquifer system to prolonged pumping but is often difficult and expensive to conduct. The purpose of this thesis is to determine the validity of estimating aquifer-system parameters from a single cumulative compaction record and corresponding nested water-level data deconvolved into temporal components. Over a decade of compaction and water-level data were collected from an extensometer and a multi-level piezometer at the Lorenzi site in Las Vegas Valley; when graphed, yearly, seasonal, and daily signals are observed. Each temporal signal reflects different characteristics of the aquifer system, including the distinction between aquifer and aquitard parameters, because the three temporal stresses influence the compaction record in unique ways. Maximum cross-correlation was used to determine the hydrodynamic lag between changing water levels and subsidence within the seasonal signal, while principal components analysis was used to statistically verify the presence of the three temporal signals. Although assumptions had to be made, nearly all estimated Lorenzi-site aquifer-system parameters fell within reasonable ranges or were similar in magnitude to parameter values estimated in previous studies. Principal components analysis, however, was unable to detect the three temporal signals. A cumulative compaction record may be difficult to obtain, but analyzing the precise measurements of an extensometer yields precise aquifer-system parameters, and as the precision of those parameters increases, so does the ability to manage groundwater sustainably.
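The maximum cross-correlation step can be illustrated on synthetic signals (illustrative sinusoids with an imposed 20-day delay, not the Lorenzi-site data; `best_lag` is a hypothetical helper):

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic daily series: compaction responds to the seasonal water-level
# cycle after a hydrodynamic delay of 20 days (values are illustrative)
t = np.arange(365 * 3)
water_level = np.sin(2 * np.pi * t / 365) + rng.normal(0, 0.05, t.size)
true_lag = 20
compaction = (-np.sin(2 * np.pi * (t - true_lag) / 365)
              + rng.normal(0, 0.05, t.size))

def best_lag(driver, response, max_lag=60):
    """Lag (in samples) at which |corr(response[t], driver[t - lag])| peaks."""
    scores = {}
    for k in range(max_lag + 1):
        a = response[k:]
        b = driver[:driver.size - k] if k else driver
        scores[k] = abs(np.corrcoef(a, b)[0, 1])
    return max(scores, key=scores.get)

lag = best_lag(water_level, compaction)
print("estimated hydrodynamic lag (days):", lag)
```

The absolute value is taken because falling water levels drive compaction, so the two series are negatively correlated at the true lag.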
- MOSS—Multi-Modal Best Subset Modeling in Smart ManufacturingWang, Lening; Du, Pang; Jin, Ran (MDPI, 2021-01-01)Smart manufacturing, which integrates a multi-sensing system with physical manufacturing processes, has been widely adopted in industry to support online, real-time decision making that improves manufacturing quality. A multi-sensing system for each manufacturing process can efficiently collect in situ process variables from different sensor modalities to reflect process variations in real time. In practice, however, cost considerations rarely allow equipping each manufacturing process with many sensors, and it is also important that the model clearly interpret the relationship between the sensing modalities and the quality variables. It is therefore necessary to model the quality-process relationship by selecting, from the multi-modal sensing system, the sensor modalities most relevant to the specific quality measurement. In this research, we adopted the concept of best subset variable selection and proposed a new model called Multi-mOdal beSt Subset modeling (MOSS). MOSS effectively selects the important sensor modalities and improves accuracy in quality-process modeling via functional norms that characterize the overall effects of individual modalities. The significance of sensor modalities can be used to determine the sensor placement strategy in smart manufacturing, and the selected modalities improve the interpretability of the quality-process model by identifying the root causes most correlated with quality variations. The merits of the proposed model are illustrated by both simulations and a real case study of an additive manufacturing (i.e., fused deposition modeling) process.
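The best-subset-over-modalities idea can be sketched with a simplified scalar analogue: exhaustively score every subset of sensor modality groups and keep the one minimizing BIC (the modality names, data, and `bic_of` helper are hypothetical; the paper itself works with functional norms rather than this plain least-squares criterion):

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(7)

# Hypothetical multi-modal data: 3 sensor modalities, each with a few channels
n = 120
modalities = {
    "vibration": rng.normal(size=(n, 3)),
    "temperature": rng.normal(size=(n, 2)),
    "acoustic": rng.normal(size=(n, 4)),
}
# Quality variable driven, by construction, by only two modalities
y = (modalities["vibration"] @ [1.0, -0.5, 0.3]
     + modalities["temperature"] @ [0.8, 0.6]
     + rng.normal(0, 0.3, n))

def bic_of(subset):
    """Least-squares fit on the chosen modalities, scored by BIC."""
    X = np.column_stack([modalities[m] for m in subset] + [np.ones((n, 1))])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = ((y - X @ beta) ** 2).sum()
    return n * np.log(rss / n) + X.shape[1] * np.log(n)

names = list(modalities)
best = min((s for k in range(1, len(names) + 1)
            for s in combinations(names, k)), key=bic_of)
print("selected modalities:", best)
```

Because selection operates on whole modality groups rather than individual channels, the result maps directly onto a sensor placement decision: an unselected modality is a sensor that need not be installed.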
- Neighborhood change in metropolitan AmericaWei, Fang (Virginia Tech, 2013-01-24)This dissertation presents an integrated framework that was developed to examine trajectories of neighborhood change, mechanisms of suburban diversity, and the relationships between neighborhood change and employment accessibility. First, this dissertation extends the study of neighborhood change to a greater time and spatial span, systematically examining the trajectories of neighborhood change at the census tract level. The results show that neighborhood change is complicated and exhibits various trajectories. The dominant patterns do not always conform to classical models of neighborhood change, providing counterpoints to some long-established assumptions. This dissertation also provides evidence of the mechanisms through which metropolitan and suburban characteristics influence suburban diversity. Most importantly, it highlights a remarkable increase in suburban diversity with respect to neighborhood composition. Finally, this dissertation investigates the relationships between neighborhood change, spatial transformation, and employment accessibility in the North Carolina Piedmont region during the last three decades. Spatial patterns of the neighborhood distributions suggest that job accessibility varies by neighborhood typology. A detailed analysis of the trajectories of neighborhood change shows interesting patterns in both central city and suburban ecological succession and transformation. These geographical shifts of neighborhoods were shown to be associated with changes in job accessibility to a certain extent. In sum, by introducing an integrated framework including social, spatial, and employment factors, this dissertation develops a more balanced understanding of neighborhood change in the United States.
- Objective Bayesian Analysis of Kullback-Leibler Divergence of two Multivariate Normal Distributions with Common Covariance Matrix and Star-shape Gaussian Graphical ModelLi, Zhonggai (Virginia Tech, 2008-06-18)This dissertation consists of four independent but related parts, each in its own chapter. The first part is introductory; it provides background and preparation for the later parts. The second part discusses two multivariate normal populations with a common covariance matrix. The goal of this part is to derive objective/non-informative priors for the parameterizations and to use these priors to build constructive random posteriors of the Kullback-Leibler (KL) divergence of the two populations, which is proportional to the distance between the two means, weighted by the common precision matrix. We use the Cholesky decomposition to re-parameterize the precision matrix. The KL divergence is a true distance measure of the divergence between two multivariate normal populations with a common covariance matrix. Frequentist properties of the Bayesian procedure using these objective priors are studied through analytical and numerical tools. The third part considers the star-shape Gaussian graphical model, a special case of undirected Gaussian graphical models. It is a multivariate normal distribution in which the variables are grouped into one "global" variable set and several "local" variable sets; when conditioned on the global variable set, the local variable sets are independent of each other. We again adopt the Cholesky decomposition to re-parameterize the precision matrix and derive Jeffreys' prior, the reference prior, and invariant priors for the new parameterizations. The frequentist properties of the Bayesian procedure using these objective priors are also studied.
The last part discusses objective Bayesian analysis of the partial correlation coefficient and its application to multivariate Gaussian models.
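The proportionality the abstract mentions is a standard identity: when the two populations share a covariance matrix, the trace and log-determinant terms of the KL divergence cancel, leaving half the Mahalanobis distance between the means,

```latex
KL\bigl(N_p(\mu_1,\Sigma)\,\|\,N_p(\mu_2,\Sigma)\bigr)
  = \tfrac{1}{2}\,(\mu_1-\mu_2)^{\mathsf T}\,\Sigma^{-1}\,(\mu_1-\mu_2),
```

which is symmetric in the two means, so in this common-covariance case the divergence behaves as a genuine distance.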
- On Independent Reference PriorsLee, Mi Hyun (Virginia Tech, 2007-12-05)In Bayesian inference, the choice of prior has been of great interest. Subjective priors are ideal when sufficient prior information is available, but in practice such information often cannot be collected, and objective priors are then a good substitute. This dissertation examines an independent reference prior based on a class of objective priors: a reference prior derived by assuming that the parameters are independent. The independent reference prior introduced by Sun and Berger (1998) is extended and generalized. We provide an iterative algorithm to derive the general independent reference prior, and we propose a sufficient condition under which a closed form of the independent reference prior can be derived without going through the iterations. The independent reference prior is then shown to be useful with respect to invariance and the first-order matching property: it is proven to be invariant under a type of one-to-one transformation of the parameters, and under a sufficient condition it is a first-order probability matching prior. We derive independent reference priors for various examples and observe that in most of them they are first-order matching priors and coincide with the reference priors. We also study an independent reference prior in some types of non-regular cases considered by Ghosal (1997).
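A textbook instance of the properties discussed (stated for context, not taken from the dissertation): for the normal model $N(\mu,\sigma^2)$ with $\mu$ and $\sigma$ treated as independent, the resulting objective prior is

```latex
\pi(\mu,\sigma) \propto \frac{1}{\sigma},
```

which is invariant under location-scale transformations of the data and is a first-order probability matching prior for $\mu$: one-sided posterior credible intervals for $\mu$ attain their nominal frequentist coverage to first order.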