Reconstruction of metabolic pathways by the exploration of gene expression data with factor analysis
Microarray gene expression data for thousands of genes is rapidly becoming available for many organisms, and it can give the experimental biologist powerful information. This data may clarify the regulatory linkages between genes within a single metabolic pathway, reveal alternative pathway routes under different environmental conditions, or lead to the identification of genes for selection in animal and plant genetic improvement programs or of targets for drug therapy. Many methods for extracting this information have been proposed and used, but they have not been evaluated under known conditions (e.g., simulations). In this dissertation, an analysis method for identifying independent and linked metabolic pathways is proposed, evaluated, and compared with a popular alternative method. The same method is also investigated for its ability to identify regulatory linkages within a single metabolic pathway. Finally, a variant of the method is used to analyze time series microarray data.
In Chapter 2, factor analysis is shown to identify genes and group them by membership in independent metabolic pathways using steady-state microarray gene expression data. In some cases, however, not every gene was allocated to a pathway. A competing method, hierarchical clustering, performed poorly when negatively correlated genes were treated as unrelated, but its performance improved when the sign of the correlation coefficient was ignored.
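The effect of the correlation sign on hierarchical clustering can be illustrated with a minimal sketch, assuming average-linkage agglomerative clustering with a distance of 1 - r (sign kept) or 1 - |r| (sign ignored). The linkage rule, both distance definitions, the function names (`pearson`, `distance`, `average_linkage`) and the four expression profiles are hypothetical choices for illustration, not the settings used in Chapter 2.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def distance(x, y, ignore_sign):
    """1 - r treats r = -1 as maximally distant; 1 - |r| treats it as identical."""
    r = pearson(x, y)
    return 1.0 - abs(r) if ignore_sign else 1.0 - r

def average_linkage(profiles, k, ignore_sign):
    """Agglomerative clustering with average linkage, stopped at k clusters.

    Returns the clusters as a sorted list of sorted gene-index lists.
    """
    clusters = [[i] for i in range(len(profiles))]
    while len(clusters) > k:
        best = None  # (average distance, index a, index b)
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                pairs = [(i, j) for i in clusters[a] for j in clusters[b]]
                avg = sum(distance(profiles[i], profiles[j], ignore_sign)
                          for i, j in pairs) / len(pairs)
                if best is None or avg < best[0]:
                    best = (avg, a, b)
        _, a, b = best
        clusters[a] = sorted(clusters[a] + clusters[b])
        del clusters[b]  # b > a, so index a is unaffected
    return sorted(clusters)

# Hypothetical profiles for four genes across five arrays.
profiles = [
    [1, 2, 3, 4, 5],    # gene 0
    [2, 4, 6, 8, 10],   # gene 1: rises with gene 0
    [5, 4, 3, 2, 1],    # gene 2: falls as gene 0 rises (e.g., repressed)
    [3, 1, 4, 1, 5],    # gene 3: only weakly related to the others
]

print("signed  :", average_linkage(profiles, 2, ignore_sign=False))
# → signed  : [[0, 1, 3], [2]]
print("unsigned:", average_linkage(profiles, 2, ignore_sign=True))
# → unsigned: [[0, 1, 2], [3]]
```

With the signed distance, the perfectly anticorrelated gene 2 is the most distant gene and ends up alone; ignoring the sign places it with genes 0 and 1, which mirrors the behavior reported for Chapter 2 on this toy example only.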
In Chapter 3, factor analysis is shown to identify regulatory relationships between genes within a single metabolic pathway. These relationships can be explained with metabolic control analysis, together with external knowledge of the pathway structure and of transcriptional activation and inhibition. Metabolic control analysis is also used in this chapter to explain why factor analysis can group genes by metabolic pathway.
In Chapter 4, a Bayesian exploratory factor analysis is developed and used to analyze microarray gene expression data. Unlike a previous implementation, this Bayesian model is purely exploratory and can be used with vague or uninformative priors. In addition, a 95% highest posterior density (HPD) region can be calculated for each factor loading to aid interpretation of the loadings. A correlated Bayesian exploratory factor analysis model is also developed in this chapter for time series microarray gene expression data. Although this model is appropriate for the analysis of correlated observation vectors, it fails to group genes by metabolic pathway for simulated time series data.
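A 95% HPD interval for a single loading can be estimated from posterior (e.g., MCMC) draws as the narrowest interval containing 95% of the sorted draws. The sketch below assumes a unimodal marginal posterior; the function name `hpd_interval`, the simulated draws, and the rule of checking whether the interval excludes zero are hypothetical illustrations, not the implementation used in Chapter 4.

```python
import math
import random

def hpd_interval(draws, prob=0.95):
    """Narrowest interval containing at least `prob` of the posterior draws.

    Assumes a unimodal marginal posterior. For a multimodal posterior the
    HPD region can be a union of intervals, which this function ignores.
    """
    xs = sorted(draws)
    n = len(xs)
    m = min(n, math.ceil(prob * n))  # number of draws the interval must cover
    # Slide a window of m consecutive sorted draws and keep the narrowest one.
    start = min(range(n - m + 1), key=lambda i: xs[i + m - 1] - xs[i])
    return xs[start], xs[start + m - 1]

# Hypothetical posterior draws for one factor loading.
rng = random.Random(42)
loading_draws = [rng.gauss(0.6, 0.1) for _ in range(5000)]
lo, hi = hpd_interval(loading_draws)
print(f"95% HPD: ({lo:.3f}, {hi:.3f}); excludes zero: {lo > 0 or hi < 0}")
```

For a skewed posterior, this interval shifts toward the mode, and it is never wider than an equal-tailed interval covering the same number of sorted draws, because that interval is one of the windows searched.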