Browsing by Author "Wang, Xiao"
- BADGE: A novel Bayesian model for accurate abundance quantification and differential analysis of RNA-Seq data
  Gu, Jinghua; Wang, Xiao; Hilakivi-Clarke, Leena; Clarke, Robert; Xuan, Jianhua (2014-09-10)
  Background: Recent advances in RNA sequencing (RNA-Seq) technology have offered unprecedented scope and resolution for transcriptome analysis. However, precise quantification of mRNA abundance and identification of differentially expressed genes are complicated by biological and technical variations in RNA-Seq data. Results: We systematically study the variation in count data and dissect the sources of variation into between-sample variation and within-sample variation. A novel Bayesian framework is developed for joint estimation of gene-level mRNA abundance and differential state, which models the intrinsic variability in RNA-Seq to improve the estimation. Specifically, a Poisson-Lognormal model is incorporated into the Bayesian framework to model within-sample variation; a Gamma-Gamma model is then used to model between-sample variation, which accounts for over-dispersion of read counts among multiple samples. Simulation studies, where sequencing counts are synthesized based on parameters learned from real datasets, have demonstrated the advantage of the proposed method in both quantification of mRNA abundance and identification of differentially expressed genes. Moreover, performance comparison on data from the Sequencing Quality Control (SEQC) Project with ERCC spike-in controls has shown that the proposed method outperforms existing RNA-Seq methods in differential analysis. Application to a breast cancer dataset has further illustrated that the proposed Bayesian model can 'blindly' estimate sources of variation caused by sequencing biases. Conclusions: We have developed a novel Bayesian hierarchical approach to investigate within-sample and between-sample variations in RNA-Seq data. Simulation and real data applications have validated the desirable performance of the proposed method. The software package is available at http://www.cbil.ece.vt.edu/software.htm.
- A Bayesian approach for accurate de novo transcriptome assembly
  Shi, Xu; Wang, Xiao; Neuwald, Andrew F.; Hilakivi-Clarke, Leena; Clarke, Robert; Xuan, Jianhua (2021-09-03)
  De novo transcriptome assembly from billions of RNA-seq reads is very challenging due to alternative splicing and various levels of expression, which often leads to incorrect, mis-assembled transcripts. BayesDenovo addresses this problem by using both a read-guided strategy to accurately reconstruct splicing graphs from the RNA-seq data and a Bayesian strategy to estimate, from these graphs, the probability of transcript expression without penalizing poorly expressed transcripts. Simulation and cell line benchmark studies demonstrate that BayesDenovo is very effective in reducing false positives and achieves much higher accuracy than other assemblers, especially for alternatively spliced genes and for highly or poorly expressed transcripts. Moreover, BayesDenovo is more robust on multiple replicates by assembling a larger portion of common transcripts. When applied to breast cancer data, BayesDenovo identifies phenotype-specific transcripts associated with breast cancer recurrence.
- BMRF-MI: integrative identification of protein interaction network by modeling the gene dependency
  Shi, Xu; Wang, Xiao; Shajahan, Ayesha; Hilakivi-Clarke, Leena; Clarke, Robert; Xuan, Jianhua (2015-06-11)
  Background: Identification of protein interaction networks is a very important step for understanding the molecular mechanisms in cancer. Several methods have been developed to integrate protein-protein interaction (PPI) data with gene expression data for network identification. However, they often fail to model the dependency between genes in the network, which leaves many important genes, especially the upstream genes, unidentified. It is necessary to develop a method to improve the network identification performance by incorporating the dependency between genes. Results: We proposed an approach for identifying protein interaction networks by incorporating mutual information (MI) into a Markov random field (MRF) based framework to model the dependency between genes. MI is widely used in information theory to measure the mutual dependence between random variables. Unlike the traditional Pearson correlation test, MI is capable of capturing both linear and non-linear relationships between random variables. Among all the existing MI estimators, we choose to use the k-nearest neighbor MI (kNN-MI) estimator, which has been proved to have minimum bias. The estimated MI is integrated with an MRF framework to model the gene dependency in the context of the network. Maximum a posteriori (MAP) estimation is applied to the MRF-based model to estimate the network score. In order to reduce the computational complexity of finding the optimal network, a probabilistic searching algorithm is implemented. We further increase the robustness and reproducibility of the results by applying a non-parametric bootstrapping method to measure the confidence level of the identified genes. To evaluate the performance of the proposed method, we test the method on simulation data under different conditions. The experimental results show an improved accuracy in terms of subnetwork identification compared to existing methods. Furthermore, we applied our method to real breast cancer patient data; the identified protein interaction network shows a close association with the recurrence of breast cancer, which is supported by functional annotation. We also show that the identified subnetworks can be used to predict the recurrence status of cancer patients by survival analysis. Conclusions: We have developed an integrated approach for protein interaction network identification, which combines a Markov random field framework and mutual information to model the gene dependency in the PPI network. Improvements in subnetwork identification have been demonstrated with simulation datasets compared to existing methods. We then apply our method to breast cancer patient data to identify recurrence-related subnetworks. The experimental results show that the identified genes are enriched in the pathway and functional categories relevant to progression and recurrence of breast cancer. Finally, the survival analysis based on identified subnetworks achieves a good result in classifying the recurrence status of cancer patients.
- Computational Modeling for Differential Analysis of RNA-seq and Methylation data
  Wang, Xiao (Virginia Tech, 2016-08-16)
  Computational systems biology is an inter-disciplinary field that aims to develop computational approaches for a system-level understanding of biological systems. Advances in high-throughput biotechnology offer broad scope and high resolution in multiple disciplines. However, it is still a major challenge to extract biologically meaningful information from the overwhelming amount of data generated from biological systems. Effective computational approaches are urgently needed to reveal the functional components. Thus, in this dissertation work, we aim to develop computational approaches for differential analysis of RNA-seq and methylation data to detect aberrant events associated with cancers. We develop a novel Bayesian approach, BayesIso, to identify differentially expressed isoforms from RNA-seq data. BayesIso features a joint model of the variability of RNA-seq data and the differential state of isoforms. BayesIso not only accounts for the variability of RNA-seq data but also incorporates the differential states of isoforms as hidden variables for differential analysis. The differential states of isoforms are estimated jointly with other model parameters through a sampling process, providing improved performance in detecting isoforms that are less differentially expressed. We also develop a novel probabilistic approach, DM-BLD, in a Bayesian framework to identify differentially methylated genes. The DM-BLD approach features a hierarchical model, built upon Markov random field models, to capture both the local dependency of measured loci and the dependency of methylation change. A Gibbs sampling procedure is designed to estimate the posterior distribution of the methylation change of CpG sites. Then, the differential methylation score of a gene is calculated from the estimated methylation changes of the involved CpG sites, and the significance of genes is assessed by permutation-based statistical tests. We have demonstrated the advantage of the proposed Bayesian approaches over conventional methods for differential analysis of RNA-seq data and methylation data. The joint estimation of the posterior distributions of the variables and model parameters using a sampling procedure has demonstrated its advantage in detecting isoforms or methylated genes that are less differential. The applications to breast cancer data shed light on the molecular mechanisms underlying breast cancer recurrence, with the aim of identifying new molecular targets for breast cancer treatment.
- Evaluation of Aerogel Spheres Derived from Salix psammophila in Removal of Heavy Metal Ions in Aqueous Solution
  Zhong, Yuan; An, Yuhong; Wang, Kebing; Zhang, Wanqi; Hu, Zichu; Chen, Zhangjing; Wang, Sunguo; Wang, Boyun; Wang, Xiao; Li, Xinran; Zhang, Xiaotao; Wang, Ximing (MDPI, 2022-01-04)
  Heavy metal wastewater treatment is a major problem, and the utilization of Salix psammophila resources produced by flat stubble is low. Therefore, it is very important to develop high-value products from Salix psammophila resources and apply them to the removal of heavy metals from effluent. In this work, we extracted the cellulose from Salix psammophila, and cellulose nanofibers (CNFs) were prepared through TEMPO oxidation/ultrasound. The aerogel spheres derived from Salix psammophila (ASSP) were prepared with the hanging drop method. The experimental results showed that the Cu(II) adsorption capacity of the ASSP composite (267.64 mg/g) doped with TOCNF was significantly higher than that of pure cellulose aerogel spheres (52.75 mg/g). The presence of carboxyl and hydroxyl groups in ASSP enhanced the adsorption capacity for heavy metals. ASSP is an excellent heavy metal adsorbent, and its maximum adsorption values for Cu(II), Mn(II), and Zn(II) were found to be 272.69, 253.25, and 143.00 mg/g, respectively. The abandoned sand shrub resource of SP was used to adsorb heavy metals from effluent, which provides an important reference for the development of forestry in this sandy area and has great application potential in the adsorption of heavy metals in soil and antibiotics in water.
- Optimization and prediction of the electron-nuclear dipolar and scalar interaction in 1H and 13C liquid state dynamic nuclear polarization
  Wang, Xiao; Isley, William C., III; Salido, Sandra I.; Sun, Z.; Song, Li; Tsai, K. H.; Cramer, Christopher J.; Dorn, Harry C. (The Royal Society of Chemistry, 2015-07-29)
  During the last 10–15 years, dynamic nuclear polarization (DNP) has evolved as a powerful tool for hyperpolarization of NMR and MRI nuclides. However, it is not as well appreciated that solution-state dynamic nuclear polarization is a powerful approach to study intermolecular interactions in solution. For solutions and fluids, the 1H nuclide is usually dominated by an Overhauser dipolar enhancement and can be significantly increased by decreasing the correlation time (τc) of the substrate/nitroxide interaction by utilizing supercritical fluids (SF CO2). For molecules containing the ubiquitous 13C nuclide, the Overhauser enhancement is usually a profile of both scalar and dipolar interactions. For carbon atoms without an attached hydrogen, a dipolar enhancement usually dominates, as we illustrate for sp2 hybridized carbons in the fullerenes, C60 and C70. However, the scalar interaction is dependent on a Fermi contact interaction which does not have the magnetic field dependence inherent in the dipolar interaction. For a comprehensive range of molecular systems we show that molecules that exhibit weakly acidic complexation interaction(s) with nitroxides provide corresponding large scalar enhancements. For the first time, we report that sp hybridized (H–C) alkyne systems, for example the phenylacetylene–nitroxide system, exhibit very large scalar dominated enhancements. Finally, we demonstrate for a wide range of molecular systems that the Fermi contact interaction can be computationally predicted via electron–nuclear hyperfine coupling and correlated with experimental 13C DNP enhancements.
- Simulation of Simultaneously Negative Medium Metamaterials
  Wang, Xiao (Virginia Tech, 2009-09-21)
  Metamaterials are artificial materials, so named by those who work in the microwave materials area. According to existing documentation, metamaterials have relative permittivity and/or relative permeability of values less than 1, including negative values. If the material has negative permittivity and permeability at the same time, it is also referred to as a simultaneously negative medium (DNG medium). Such a medium has several features that no natural medium possesses: negative refraction, backward phase, and evanescent wave amplification [5]. Though the medium does not exist in nature, it seems that it can be artificially made by synthesizing metallic insertions inside natural dielectrics [2]. Because negative refraction is a unique feature not found in any reported natural medium, the concept of making perfect lenses with metamaterials has attracted attention in recent years. However, a number of questions need to be answered: How can we quantify the refractive index of the metamaterial given that the permittivity and permeability are known, or vice versa? Can the metamaterial be made an isotropic medium under different incident angles? The answer to the first question will help us to define the dimensions of the lenses more efficiently, and the answer to the latter question will help determine whether such a medium can be used to make lenses. Previous publications from others demonstrated the negative refraction phenomenon of metamaterials, though this phenomenon is restricted to a very narrow band [4] [11]. The derivation of the negative refractive index through full-wave simulation, and its comparison with the value calculated from the simulated negative permittivity and permeability obtained from the simulated scattering matrix, have not been reported. The work carried out in this thesis explores ways to address this gap and to answer the questions raised above. To fully understand the negative refraction effect of metamaterials, the author built a mathematical geometric model to calculate the refractive index of a rectangular metamaterial slab. With this approach, the refractive index can be obtained provided that the incident and peak-receive angles are known. In order to achieve a metamaterial with the isotropy property, the author also presented three different types of metamaterial slabs: parallel-arranged, vertical-arranged and cross-arranged slabs of capacitive-loaded-loops (CLL) in front of standing probes or posts, which are also called CLL-P slabs. The three arrangements are differentiated by the way the unit cell is oriented. With the geometric model, the author obtained refractive indexes for the three metamaterial slabs at different incident angles through numerical simulation. The refractive indexes have negative values in all cases, which shows the negative refraction phenomenon unique to the metamaterial. Unlike the other two CLL-P slabs, the cross-arranged CLL-P slab has a near-constant refractive index and constant received amplitude regardless of incident angle. This result can be attributed to the symmetrical topology of the unit cell in the x-y plane. To better explain the refractive effects observed for the three CLL-P slabs, the author also employed a way to calculate the effective permittivity and permeability using the scattering matrix. Based on the effective permittivity and permeability obtained, the analytical values of the refractive indexes have been calculated at the resonance point. To check the refractive indexes calculated from the two methods (the Snell's Law based geometric approach and the permittivity/permeability obtained from the scattering matrix), the two results are compared against each other and agree well. Knowing the effective permittivity and permeability is very useful for calculating other parameters of the CLL-P slab, such as wave impedance and mismatch loss. From the simulation results for the parallel-arranged, vertical-arranged, and cross-arranged CLL-P slabs, it is found that the cross-arranged slab has the property of isotropy at different incident angles, since the coupling between the incident magnetic field and the CLL loop remains constant. As a validation process, the CLL-P simulation result in a parallel waveguide is compared with prior simulation (HFSS) and measurements of refractive focusing of the same structure, and both simulation results agree with measurements. The full-wave simulation tool FEKO, which employs the Method of Moments (MoM), is used in both ways of estimating the negative refractive index of the medium.
- Social Turing Tests: Crowdsourcing Sybil Detection
  Wang, Gang Alan; Mohanlal, Manish; Wilson, Christo; Wang, Xiao; Metzger, Miriam; Zheng, Haitao; Zhao, Ben Y. (Internet Society, 2013-02)
  As popular tools for spreading spam and malware, Sybils (or fake accounts) pose a serious threat to online communities such as Online Social Networks (OSNs). Today, sophisticated attackers are creating realistic Sybils that effectively befriend legitimate users, rendering most automated Sybil detection techniques ineffective. In this paper, we explore the feasibility of a crowdsourced Sybil detection system for OSNs. We conduct a large user study on the ability of humans to detect today’s Sybil accounts, using a large corpus of ground-truth Sybil accounts from the Facebook and Renren networks. We analyze detection accuracy by both “experts” and “turkers” under a variety of conditions, and find that while turkers vary significantly in their effectiveness, experts consistently produce near-optimal results. We use these results to drive the design of a multi-tier crowdsourcing Sybil detection system. Using our user study data, we show that this system is scalable, and can be highly effective either as a standalone system or as a complementary technique to current tools.