Browsing by Author "Miller, David J."
- Asymmetric independence modeling identifies novel gene-environment interactions
  Yu, Guoqiang; Miller, David J.; Wu, Chiung-Ting; Hoffman, Eric P.; Liu, Chunyu; Herrington, David M.; Wang, Yue (Springer Nature, 2019-02-21)
  Most genetic or environmental factors work together in determining complex disease risk. Detecting gene-environment interactions may allow us to elucidate novel and targetable molecular mechanisms by which environmental exposures modify genetic effects. Unfortunately, standard logistic regression (LR) assumes a convenient mathematical structure for the null hypothesis that nevertheless results in both poor detection power and poor type I error control, and is also susceptible to confounding from missing factors, imperfect surrogates, and disease heterogeneity. Here we describe a new baseline framework, the asymmetric independence model (AIM) in case-control studies, and provide mathematical proofs and simulation studies verifying its validity across a wide range of conditions. We show that AIM, unlike LR, mathematically preserves the asymmetric nature of maintaining health versus acquiring a disease, and is thus more powerful and robust in detecting synergistic interactions. We present examples from four clinically discrete domains where AIM identified interactions that were previously either inconsistent or recognized with less statistical certainty.
- Automated Functional Analysis of Astrocytes from Chronic Time-Lapse Calcium Imaging Data
  Wang, Yinxue; Shi, Guilai; Miller, David J.; Wang, Yizhi; Wang, Congchao; Broussard, Gerard J.; Wang, Yue; Tian, Lin; Yu, Guoqiang (Frontiers, 2017-07-14)
  Recent discoveries that astrocytes exert proactive regulatory effects on neural information processing and that they are deeply involved in normal brain development and disease pathology have stimulated broad interest in understanding astrocyte functional roles in brain circuits. Measuring astrocyte functional status is now technically feasible, due to recent advances in modern microscopy and ultrasensitive cell-type specific genetically encoded Ca²⁺ indicators for chronic imaging. However, there is a big gap between the capability of generating large datasets via calcium imaging and the availability of sophisticated analytical tools for decoding astrocyte function. Current practice is essentially manual, which not only limits analysis throughput but also risks introducing bias and missing important information latent in complex, dynamic big data. Here, we report a suite of computational tools, called Functional AStrocyte Phenotyping (FASP), for automatically quantifying the functional status of astrocytes. Considering the complex nature of Ca²⁺ signaling in astrocytes and the low signal-to-noise ratio, FASP is designed with data-driven and probabilistic principles, to flexibly account for various patterns and to perform robustly with noisy data. In particular, FASP explicitly models signal propagation, which rules out the applicability of tools designed for other types of data. We demonstrate the effectiveness of FASP using extensive synthetic and real data sets. The findings by FASP were verified by manual inspection. FASP also detected signals that were missed by purely manual analysis but could be confirmed by more careful manual examination under the guidance of automatic analysis. All algorithms and the analysis pipeline are packaged into a plugin for Fiji (ImageJ), with the source code freely available online at https://github.com/VTcbil/FASP.
- caBIG VISDA: modeling, visualization, and discovery for cluster analysis of genomic data
  Zhu, Yitan; Li, Huai; Miller, David J.; Wang, Zuyi; Xuan, Jianhua; Clarke, Robert; Hoffman, Eric P.; Wang, Yue (2008-09-18)
  Background: The main limitations of most existing clustering methods used in genomic data analysis include heuristic or random algorithm initialization, the potential of finding poor local optima, the lack of cluster number detection, an inability to incorporate prior/expert knowledge, black-box and non-adaptive designs, in addition to the curse of dimensionality and the discernment of uninformative, uninteresting cluster structure associated with confounding variables.
  Results: In an effort to partially address these limitations, we develop the VIsual Statistical Data Analyzer (VISDA) for cluster modeling, visualization, and discovery in genomic data. VISDA performs progressive, coarse-to-fine (divisive) hierarchical clustering and visualization, supported by hierarchical mixture modeling, supervised/unsupervised informative gene selection, supervised/unsupervised data visualization, and user/prior knowledge guidance, to discover hidden clusters within complex, high-dimensional genomic data. The hierarchical visualization and clustering scheme of VISDA uses multiple local visualization subspaces (one at each node of the hierarchy) and consequent subspace data modeling to reveal both global and local cluster structures in a "divide and conquer" scenario. Multiple projection methods, each sensitive to a distinct type of clustering tendency, are used for data visualization, which increases the likelihood that cluster structures of interest are revealed. Initialization of the full dimensional model is based on first learning models with user/prior knowledge guidance on data projected into the low-dimensional visualization spaces. Model order selection for the high dimensional data is accomplished by Bayesian theoretic criteria and user justification applied via the hierarchy of low-dimensional visualization subspaces. Based on its complementary building blocks and flexible functionality, VISDA is generally applicable for gene clustering, sample clustering, and phenotype clustering (wherein phenotype labels for samples are known), albeit with minor algorithm modifications customized to each of these tasks.
  Conclusion: VISDA achieved robust and superior clustering accuracy, compared with several benchmark clustering schemes. The model order selection scheme in VISDA was shown to be effective for high dimensional genomic data clustering. On muscular dystrophy data and muscle regeneration data, VISDA identified biologically relevant co-expressed gene clusters. VISDA also captured the pathological relationships among different phenotypes revealed at the molecular level, through phenotype clustering on muscular dystrophy data and multi-category cancer data.
- Comparative analysis of methods for detecting interacting loci
  Chen, Li; Yu, Guoqiang; Langefeld, Carl D.; Miller, David J.; Guy, Richard T.; Raghuram, Jayaram; Yuan, Xiguo; Herrington, David M.; Wang, Yue (BioMed Central, 2011-07-05)
  Background: Interactions among genetic loci are believed to play an important role in disease risk. While many methods have been proposed for detecting such interactions, their relative performance remains largely unclear, mainly because different data sources, detection performance criteria, and experimental protocols were used in the papers introducing these methods and in subsequent studies. Moreover, there have been very few studies strictly focused on comparison of existing methods. Given the importance of detecting gene-gene and gene-environment interactions, a rigorous, comprehensive comparison of performance and limitations of available interaction detection methods is warranted.
  Results: We report a comparison of eight representative methods, of which seven were specifically designed to detect interactions among single nucleotide polymorphisms (SNPs), with the last a popular main-effect testing method used as a baseline for performance evaluation. The selected methods, multifactor dimensionality reduction (MDR), full interaction model (FIM), information gain (IG), Bayesian epistasis association mapping (BEAM), SNP harvester (SH), maximum entropy conditional probability modeling (MECPM), logistic regression with an interaction term (LRIT), and logistic regression (LR), were compared on a large number of simulated data sets, each, consistent with complex disease models, embedding multiple sets of interacting SNPs, under different interaction models. The assessment criteria included several relevant detection power measures, family-wise type I error rate, and computational complexity. There are several important results from this study. First, while some SNPs in interactions with strong effects are successfully detected, most of the methods miss many interacting SNPs at an acceptable rate of false positives. In this study, the best-performing method was MECPM. Second, the statistical significance assessment criteria, used by some of the methods to control the type I error rate, are quite conservative, thereby limiting their power and making it difficult to fairly compare them. Third, as expected, power varies for different models and as a function of penetrance, minor allele frequency, linkage disequilibrium and marginal effects. Fourth, the analytical relationships between power and these factors are derived, aiding in the interpretation of the study results. Fifth, for these methods the magnitude of the main effect influences the power of the tests. Sixth, most methods can detect some ground-truth SNPs but have modest power to detect the whole set of interacting SNPs.
  Conclusion: This comparison study provides new insights into the strengths and limitations of current methods for detecting interacting loci. This study, along with freely available simulation tools we provide, should help support development of improved methods. The simulation tools are available at: http://code.google.com/p/simulationtool-bmc-ms9169818735220977/downloads/list.
- Convex Analysis of Mixtures for Separating Non-negative Well-grounded Sources
  Zhu, Yitan; Wang, Niya; Miller, David J.; Wang, Yue (Springer Nature, 2016-12-06)
  Blind Source Separation (BSS) is a powerful tool for analyzing composite data patterns in many areas, such as computational biology. We introduce a novel BSS method, Convex Analysis of Mixtures (CAM), for separating non-negative well-grounded sources, which learns the mixing matrix by identifying the lateral edges of the convex data scatter plot. We propose and prove a sufficient and necessary condition for identifying the mixing matrix through edge detection in the noise-free case, which enables CAM to identify the mixing matrix not only in the exact-determined and over-determined scenarios, but also in the under-determined scenario. We show the optimality of the edge detection strategy, even for cases where source well-groundedness is not strictly satisfied. The CAM algorithm integrates plug-in noise filtering using sector-based clustering, an efficient geometric convex analysis scheme, and stability-based model order selection. The superior performance of CAM against a panel of benchmark BSS techniques is demonstrated on numerically mixed gene expression data of ovarian cancer subtypes. We apply CAM to dissect dynamic contrast-enhanced magnetic resonance imaging data taken from breast tumors and time-course microarray gene expression data derived from in-vivo muscle regeneration in mice, both producing biologically plausible decomposition results.
- Effect of oil age on polyaromatic hydrocarbon emissions from automobiles
  Miller, David J. (Virginia Polytechnic Institute and State University, 1986)
  Automobiles are known to emit polyaromatic hydrocarbons. The literature indicates that the emission levels of these compounds are correlated with oil age, and it has been hypothesized that entry of oil into the combustion chamber is a major cause of these emissions. This experiment investigated the relationship between oil age and these polyaromatic hydrocarbon emissions. It was found that the three polyaromatics of interest seem to be emitted inconsistently and irregularly. It is possible that this was due to a buildup of these compounds on the combustion chamber walls: polyaromatics are formed in the quench layer near these walls and can accumulate there until dynamic equilibrium is reached. Equilibrium may not have been reached at the time of the investigation since the engine was relatively new. This would be of interest for future investigations.
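
The edge-detection idea in the Convex Analysis of Mixtures entry above can be illustrated with a toy, noise-free case. The sketch below is not the authors' CAM algorithm: it assumes only two mixtures and two sources, uses a hypothetical mixing matrix (`A1`, `A2`) and synthetic random sources, and omits the noise filtering, sector-based clustering, and model order selection the abstract mentions. It shows only that, when each source has at least one point where it alone is active, scaling every mixed point to unit sum places the data on a segment whose end points match the mixing-matrix columns, up to scale and ordering.

```python
import random


def estimate_mixing_columns(observations):
    """Estimate two unit-sum mixing columns from noise-free 2-channel mixtures.

    Each observation is (x1, x2) = s1 * a1 + s2 * a2 with non-negative sources.
    After scaling a point to unit sum, its first coordinate is a convex
    combination of the first coordinates of the unit-sum columns, so the
    smallest and largest values mark the two edges of the data scatter.
    """
    shares = []
    for x1, x2 in observations:
        total = x1 + x2
        if total > 0:  # all-zero points carry no direction information
            shares.append(x1 / total)
    low, high = min(shares), max(shares)
    return (low, 1.0 - low), (high, 1.0 - high)


# Hypothetical mixing-matrix columns, chosen to sum to 1 for easy comparison.
A1 = (0.8, 0.2)
A2 = (0.3, 0.7)

random.seed(0)
sources = [(random.random(), random.random()) for _ in range(200)]
# Well-groundedness assumption: each source is active alone at least once.
sources += [(1.0, 0.0), (0.0, 1.0)]

mixed = [
    (s1 * A1[0] + s2 * A2[0], s1 * A1[1] + s2 * A2[1])
    for s1, s2 in sources
]

edge_low, edge_high = estimate_mixing_columns(mixed)
print("estimated columns:", edge_low, edge_high)
```

With noisy data the extreme points would no longer coincide with the true edges, which is presumably why the abstract pairs edge detection with noise filtering and stability-based model order selection.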