Browsing by Author "Wang, Chen"
- From network to pathway: integrative network analysis of genomic data
Wang, Chen (Virginia Tech, 2011-06-14)
The advent of various types of high-throughput genomic data has enabled researchers to investigate complex biological systems in a systemic way and has started to shed light on the underlying molecular mechanisms in cancers. To analyze huge amounts of genomic data, effective statistical and machine learning tools are clearly needed; more importantly, integrative approaches are especially needed to combine different types of genomic data for a network or pathway view of biological systems. Motivated by such needs, this dissertation develops an integrative framework for pathway analysis. Specifically, we dissect the molecular pathway into two parts: the protein-DNA interaction network and the protein-protein interaction network. Several novel approaches are proposed to integrate gene expression data with various forms of biological knowledge, such as protein-DNA interaction and protein-protein interaction, for reliable molecular network identification. The first part of this dissertation seeks to infer condition-specific transcriptional regulatory networks by integrating gene expression data and protein-DNA binding information. Protein-DNA binding information provides initial relationships between transcription factors (TFs) and their target genes, and this information is essential to derive biologically meaningful integrative algorithms. Based on the availability of this information, we discuss the inference task in two different situations: (a) if protein-DNA binding information of multiple TFs is available: based on the protein-DNA data of multiple TFs, which are derived from sequence analysis between DNA motifs and gene promoter regions, we can construct an initial connection matrix and solve the network inference using a constrained least-squares approach named motif-guided network component analysis (mNCA).
However, the connection matrix usually contains a considerable number of false positives and false negatives that make inference results questionable. To circumvent this problem, we propose a knowledge-based stability analysis (kSA) approach to test the conditional relevance of individual TFs, by checking the discrepancy of multiple estimations of transcription factor activity with respect to different perturbations on the connections. The rationale behind stability analysis is that the consistency of observed gene expression and true network connection should remain stable after small perturbations are applied to the initial connection matrix. With condition-specific TFs prioritized by kSA, we further propose to use multivariate regression to highlight condition-specific target genes. Through simulation studies comparing with several competing methods, we show that the proposed schemes are more sensitive in detecting relevant TFs and target genes for network inference. Experimentally, we have applied stability analysis to a yeast cell cycle experiment and further to a series of anti-estrogen breast cancer studies. In both experiments, not only are biologically relevant regulators highlighted, but condition-specific transcriptional regulatory networks are also constructed, which could provide further insights into the corresponding cellular mechanisms. (b) if only a single TF's protein-DNA information is available: this happens when the protein-DNA binding relationship of an individual TF is measured through experiments. Since the original mNCA requires a complete connection matrix to perform estimation, it is not applicable when only incomplete knowledge of a single TF is available. Moreover, binding information derived from experiments could still be inconsistent with gene expression levels.
To overcome these limitations, we propose a linear extraction scheme called regulatory component analysis (RCA), which can infer underlying regulation relationships even with partial biological knowledge. Numerical simulations show significant improvement of RCA over other traditional methods in identifying target genes, not only in low signal-to-noise-ratio situations but also when the given biological knowledge is incomplete and inconsistent with the data. Furthermore, biological experiments on Escherichia coli regulatory network inference are performed for a fair comparison with traditional methods, and the effectiveness and superior performance of RCA are confirmed. The second part of the dissertation moves from the protein-DNA interaction network up to the protein-protein interaction (PPI) network, to identify dys-regulated protein sub-networks by integrating gene expression data and protein-protein interaction information. Specifically, we propose a statistically principled method, namely Metropolis random walk on graph (MRWOG), to highlight condition-specific PPI sub-networks in a probabilistic way. The method is based on Markov chain Monte Carlo (MCMC) theory to generate a series of samples that will eventually converge to a desired equilibrium distribution, and each sample indicates the selection of one particular sub-network during the process of the Metropolis random walk. The central idea of MRWOG is that the essentiality of a gene to a sub-network depends not only on its expression but also on its topological importance. In contrast to most existing methods, which construct sub-networks in a deterministic way and therefore lack a relevance score for each protein, MRWOG is capable of assessing the importance of each individual protein node in a global way, not only reflecting its individual association with clinical outcome but also indicating its topological role (hub, bridge) in connecting other important proteins.
Moreover, each protein node is associated with a sampling frequency score, which enables the statistical justification of each individual node and flexible scaling of sub-network results. Based on the MRWOG approach, we further propose two strategies: one is bootstrapping, used for assessing the statistical confidence of detected sub-networks; the other is graphic division, which separates a large sub-network into several smaller sub-networks to facilitate interpretation. MRWOG is easy to use, with only two parameters to adjust: the beta value for performing the random walk and the quantile level for calculating the truncated posterior mean. Through extensive simulations, we show that the proposed scheme is not sensitive to these two parameters over a relatively wide range. We also compare MRWOG with deterministic approaches for identifying sub-networks and prioritizing topologically important proteins; in both cases MRWOG outperforms existing methods in terms of both precision and recall. By utilizing the MRWOG-generated node/edge sampling frequency, which is actually the posterior mean of the corresponding protein node/interaction edge, we illustrate that condition-specific nodes/interactions can be better prioritized than with schemes based on scores of individual nodes/interactions. Experimentally, we have applied MRWOG to a yeast knockout experiment for galactose utilization pathways to reveal important components of the corresponding biological functions; we also applied MRWOG to breast cancer patient prognosis problems, where the sub-network analysis could lead to an understanding of the molecular mechanisms of antiestrogen resistance in breast cancer. Finally, we conclude this dissertation with a summary of the original contributions and of future work for deepening the theoretical justification of the proposed methods and broadening their potential biological applications, such as cancer studies.
- Knowledge-guided gene ranking by coordinative component analysis
Wang, Chen; Xuan, Jianhua; Li, Huai; Wang, Yue; Zhan, Ming; Hoffman, Eric P.; Clarke, Robert (2010-03-30)
Background: In cancer, gene networks and pathways often exhibit dynamic behavior, particularly during the process of carcinogenesis. Thus, it is important to prioritize those genes that are strongly associated with the functionality of a network. Traditional statistical methods are often ill-suited to identifying biologically relevant member genes, motivating researchers to incorporate biological knowledge into gene ranking methods. However, current integration strategies are often heuristic and fail to incorporate fully the true interplay between biological knowledge and gene expression data. Results: To improve knowledge-guided gene ranking, we propose a novel method called coordinative component analysis (COCA) in this paper. COCA explicitly captures those genes within a specific biological context that are likely to be expressed in a coordinative manner. Formulated as an optimization problem to maximize the coordinative effort, COCA is designed to first extract the coordinative components based on partial guidance from knowledge genes and then rank the genes according to their participation strengths. An embedded bootstrapping procedure is implemented to improve the statistical robustness of the solutions. COCA was initially tested on simulation data and then on published gene expression microarray data to demonstrate its improved performance as compared to traditional statistical methods. Finally, the COCA approach has been applied to stem cell data to identify biologically relevant genes in signaling pathways. As a result, the COCA approach uncovers novel pathway members that may shed light on pathway deregulation in cancers. Conclusion: We have developed a new integrative strategy to combine biological knowledge and microarray data for gene ranking.
The method uses knowledge genes as guidance to first extract coordinative components, and then ranks the genes according to their contribution to a network or pathway. The experimental results show that such a knowledge-guided strategy can provide context-specific gene ranking with improved performance in pathway member identification.
- Knowledge-guided multi-scale independent component analysis for biomarker identification
Chen, Li; Xuan, Jianhua; Wang, Chen; Shih, Ie-Ming; Wang, Yue; Zhang, Zhen; Hoffman, Eric P.; Clarke, Robert (2008-10-06)
Background: Many statistical methods have been proposed to identify disease biomarkers from gene expression profiles. However, from gene expression profile data alone, statistical methods often fail to identify biologically meaningful biomarkers related to a specific disease under study. In this paper, we develop a novel strategy, namely knowledge-guided multi-scale independent component analysis (ICA), to first infer regulatory signals and then identify biologically relevant biomarkers from microarray data. Results: Since gene expression levels reflect the joint effect of several underlying biological functions, disease-specific biomarkers may be involved in several distinct biological functions. To identify disease-specific biomarkers that provide unique mechanistic insights, a meta-data "knowledge gene pool" (KGP) is first constructed from multiple data sources to provide important information on the likely functions (such as gene ontology information) and regulatory events (such as promoter responsive elements) associated with potential genes of interest. The gene expression and biological meta data associated with the members of the KGP can then be used to guide subsequent analysis. ICA is then applied to multi-scale gene clusters to reveal regulatory modes reflecting the underlying biological mechanisms. Finally, disease-specific biomarkers are extracted by their weighted connectivity scores associated with the extracted regulatory modes. A statistical significance test is used to evaluate the significance of transcription factor enrichment for the extracted gene set based on motif information. We applied the proposed method to yeast cell cycle microarray data and Rsf-1-induced ovarian cancer microarray data.
The results show that our knowledge-guided ICA approach can extract biologically meaningful regulatory modes and outperform several baseline methods for biomarker identification. Conclusion: We have proposed a novel method, namely knowledge-guided multi-scale ICA, to identify disease-specific biomarkers. The goal is to infer knowledge-relevant regulatory signals and then identify corresponding biomarkers through a multi-scale strategy. The approach has been successfully applied to two expression profiling experiments to demonstrate its improved performance in extracting biologically meaningful and disease-related biomarkers. More importantly, the proposed approach shows promise for inferring novel biomarkers for ovarian cancer and extending current knowledge.
- Lateral Motion Prediction of On-Road Preceding Vehicles: A Data-Driven Approach
Wang, Chen; Delport, Jacques; Wang, Yan (MDPI, 2019-05-07)
Drivers’ behaviors and decision making on the road directly affect the safety of themselves, other drivers, and pedestrians. However, as distinct entities, people cannot predict the motions of surrounding vehicles and they have difficulty in performing safe reactionary driving maneuvers in a short time period. To overcome the limitations of making an immediate prediction, in this work, we propose a two-stage data-driven approach: classifying driving patterns of on-road surrounding vehicles using Gaussian mixture models (GMM); and predicting vehicles’ short-term lateral motions (i.e., left/right turn and left/right lane change) based on real-world vehicle mobility data, provided by the U.S. Department of Transportation, with different ensemble decision trees. We considered several important kinetic features and higher order kinematic variables. The research results of our proposed approach demonstrate the effectiveness of pattern classification and on-road lateral motion prediction. This methodological framework has the potential to be incorporated into current data-driven collision warning systems, to enable more practical on-road preprocessing in intelligent vehicles, and to be applied in autopilot-driving scenarios.
- Modeling multi-attribute utility theory with object-oriented programming
Wang, Chen (Virginia Tech, 1994-12-12)
System complexity has continued to increase with the development and application of new technologies. This increased complexity has created great concern about the potential impact of a system on its ecological environment, including plants, wildlife, and clean air. A complete awareness of the potential impact requires a thorough understanding of how a system interacts with its ecological environment, and the results are dependent on the expertise of the engineer who is responsible for the design of the system and the analyst who evaluates the system. Due to the complexity of these interactions and the difficulty in measuring the appropriate cause-and-effect relationships, a system's impact on its ecological environment has not received due attention. The above complexity and difficulty have led to two deficiencies in the current research on a system's environmental impact. One is the insufficient evaluation of its qualitative attributes. The other is an unstructured evaluation process in which the analyst has to rely on qualitative attributes as major inputs while his/her expertise cannot be modeled. As a consequence, the current research and evaluation process is deficient because of biases and lack of clarity. This report seeks to instill the necessary clarity into the decision-making process by structuring the decision maker's subjective knowledge. It is concluded that subjective preferences can be quantified and evaluated through utility function assessment. Alternatives are ranked and a final choice is made based on their utility. The modeling process described herein is made much more efficient and economical by computer software that integrates the assessment mechanisms into a user-friendly operational environment.
After the deficiencies in the current evaluation process are identified, possible solutions are explored. The effectiveness of the Analytic Hierarchy Process (AHP), Multi-attribute Value Theory (MAVT), and Multi-attribute Utility Theory (MAUT) is compared. MAUT is the preferred approach based on solution requirements.
- Motif-directed network component analysis for regulatory network inference
Wang, Chen; Xuan, Jianhua; Chen, Li; Zhao, Po; Wang, Yue; Clarke, Robert; Hoffman, Eric P. (2008-02-13)
Background: Network Component Analysis (NCA) has shown its effectiveness in discovering regulators and inferring transcription factor activities (TFAs) when both microarray data and ChIP-on-chip data are available. However, an NCA scheme is not applicable to many biological studies due to the limited topology information available, such as a lack of ChIP-on-chip data. We propose a new approach, motif-directed NCA (mNCA), to integrate motif information and gene expression data to infer regulatory networks. Results: We develop motif-directed NCA (mNCA) to incorporate motif information into NCA for regulatory network inference. While motif information is readily available from knowledge databases, it is a "noisy" source of network topology information consisting of many false positives. To overcome this problem, we develop a stability analysis procedure embedded in mNCA to resolve the inconsistency between motif information and gene expression data, and to enable the identification of stable TFAs. The mNCA approach has been applied to a time course microarray data set of muscle regeneration. The experimental results show that the inferred TFAs are not only numerically stable but also biologically relevant to the muscle differentiation process. In particular, several inferred TFAs like those of MyoD, myogenin and YY1 are well supported by biological experiments. Conclusion: A novel computational approach, mNCA, has been developed to integrate motif information and gene expression data for regulatory network reconstruction. Specifically, motif analysis is used to obtain initial network topology, and stability analysis is developed and applied with mNCA to extract stable TFAs.
Experimental results on muscle regeneration microarray data have demonstrated that mNCA is a practical and reliable computational method for regulatory network inference and pathway discovery.
- Renewable Energy Integrated Power System Stability Assessment with Validated System Model Based on PMU Measurements
Wang, Chen (Virginia Tech, 2019-06-14)
Renewable energy is playing an increasingly significant role in power system operation and stability assessment as its penetration expands. This is due not only to its uncertain power output and inverter-based equipment structures but also to its operating characteristics, such as Low Voltage Ride Through (LVRT). It is thus necessary to take these characteristics into consideration and to find more adaptive schemes to implement them for more effective analysis and safer power system operation. All of this depends on accurate identification of the system's fundamental information. In this dissertation, a systematic approach is proposed to find a valid system model by estimating the transmission line parameters in the system with PMU measurements. The system transient stability assessment is conducted based on this validated model. The constrained stability region is estimated with a method based on a family of Lyapunov functions in the center of angles reference frame, considering renewables' LVRT as operation limits. In order to integrate the LVRT constraints, a polytopic inner approximation mechanism is introduced to linearize and organize the transformed constraints in state space, which makes the whole process considerably more scalable. From the voltage stability perspective, an approach to adaptively adjust the LVRT settings of the renewable energy sources in the system is formulated to guarantee the system load margin and thus the voltage security. A voltage prediction method is introduced for identifying critical renewable energy sources. Estimation methods based on interpolation and sensitivities are developed to reduce the computational effort required by continuation power flows. Multiple test cases are studied using the proposed approaches, and the results are demonstrated.
- Transmission Lines Positive Sequence Parameters Estimation and Instrument Transformers Calibration Based on PMU Measurement Error Model
Wang, Chen; Centeno, Virgilio A.; Jones, Kevin David; Yang, Duotong (IEEE, 2019-10-17)
Phasor Measurement Unit (PMU) measurement data have been widely used in today's power system applications, in both steady-state and dynamic analysis. The performance of these applications running in utilities' energy management systems depends heavily on an accurate positive sequence power system model. However, it is impractical to find this accurate model with transmission line parameters calculated directly from the PMU measurements, due to ratio errors introduced by instrument transformers and communication errors introduced by PMUs. Therefore, a methodology is proposed in this paper to estimate the actual transmission line parameters throughout the whole system and, at the same time, calibrate the corresponding instrument transformers. A PMU positive sequence measurement error model is proposed targeting the aforementioned errors, which is applicable to both transposed and un-transposed transmission lines. A single line parameters estimation method is designed based on Least Squares Estimation and this error model. This method requires only one set of reference measurements, and the accuracy can be propagated throughout the whole network along with the topology acquired by the introduced Edge-based Breadth-first Search algorithm. The IEEE 118-bus system and the Texas 2000-bus system are used to demonstrate the effectiveness and efficiency of the proposed method. The potential for real-world deployment is also discussed.