Browsing by Author "Zhang, Liqing"
Now showing 1 - 20 of 133
- AgroSeek: a system for computational analysis of environmental metagenomic data and associated metadata
Liang, Xiao; Akers, Kyle; Keenum, Ishi M.; Wind, Lauren L.; Gupta, Suraj; Chen, Chaoqi; Aldaihani, Reem; Pruden, Amy; Zhang, Liqing; Knowlton, Katharine F.; Xia, Kang; Heath, Lenwood S. (2021-03-10)
Background: Metagenomics is gaining attention as a powerful tool for identifying how agricultural management practices influence human and animal health, especially in terms of potential to contribute to the spread of antibiotic resistance. However, comparing the distribution and prevalence of antibiotic resistance genes (ARGs) across multiple studies and environments is currently impossible without a complete re-analysis of published datasets. This challenge must be addressed for metagenomics to realize its potential for helping guide effective policy and practice measures relevant to agricultural ecosystems, for example, identifying critical control points for mitigating the spread of antibiotic resistance.
Results: Here we introduce AgroSeek, a centralized web-based system that provides computational tools for analysis and comparison of metagenomic data sets tailored specifically to researchers and other users in the agricultural sector interested in tracking and mitigating the spread of ARGs. AgroSeek draws from rich, user-provided metagenomic data and metadata to facilitate analysis, comparison, and prediction in a user-friendly fashion. Further, AgroSeek draws from publicly contributed data sets to provide a point of comparison and context for data analysis. To incorporate metadata into our analysis and comparison procedures, we provide flexible metadata templates, including user-customized metadata attributes to facilitate data sharing, while maintaining the metadata in a comparable fashion for the broader user community and to support large-scale comparative and predictive analysis.
Conclusion: AgroSeek provides an easy-to-use tool for environmental metagenomic analysis and comparison, based on both gene annotations and associated metadata, with this initial demonstration focusing on control of antibiotic resistance in agricultural ecosystems. AgroSeek creates a space for metagenomic data sharing and collaboration to assist policy makers, stakeholders, and the public in decision-making. AgroSeek is publicly available at https://agroseek.cs.vt.edu/.
- Analysis of the Fitness Effect of Compensatory Mutations
Zhang, Liqing; Watson, Layne T. (Department of Computer Science, Virginia Polytechnic Institute & State University, 2008)
We extend our previous work on the fitness effect of the fixation of deleterious mutations on a population by incorporating the effect of compensatory mutations. Compensatory mutations are important in the sense that they make the deleterious mutations less deleterious, thus reducing the genetic load of the population. The essential phenomenon underlying compensatory mutations is the nonindependence of mutations in biological systems. Therefore, it is an important phenomenon that cannot be ignored when considering the fixation and fitness effect of deleterious mutations. Since having compensatory mutations essentially changes the distributional shapes of deleterious mutations, we can consider the effect of compensatory mutations by comparing two distributions where one distribution reflects the reduced fitness effects of deleterious mutations with the influence of compensatory mutations. We compare different distributions of deleterious mutations without compensatory mutations to those with compensatory mutations, and study the effect of population sizes, the shape of the distribution, and the mutation rates of the population on the total fitness reduction of the population.
- Antibiotic Resistance Characterization in Human Fecal and Environmental Resistomes using Metagenomics and Machine Learning
Gupta, Suraj (Virginia Tech, 2021-11-03)
Antibiotic resistance is a global threat that can severely imperil public health. To curb the spread of antibiotic resistance, it is imperative that efforts commensurate with a “One Health” approach are undertaken. Given that interconnectivities among ecosystems can serve as conduits for the proliferation and dissemination of antibiotic resistance, it is increasingly being recognized that a robust global environmental surveillance framework is required to promote One Health. The ideal aim would be to develop approaches that inform global distribution of antibiotic resistance, help prioritize monitoring targets, present robust data analysis frameworks to profile resistance, and ultimately help build strategies to curb the dissemination of antibiotic resistance. The work described in this dissertation was aimed at evaluating and developing different data analysis paradigms and their applications in investigating and characterizing antibiotic resistance across different resistomes. The applications presented in Chapter 2 illustrate challenges associated with various environmental data types (especially metagenomics data) and present a path to advance incorporation of data analytics approaches in Environmental Science and Engineering research and applications. Chapter 3 presents a novel approach, ExtrARG, that identifies discriminatory ARGs among resistomes based on factors of interest. The results in Chapter 4 provide insight into the global distribution of ARGs across human fecal and sewage resistomes from different socioeconomic settings. Chapter 5 demonstrates a data analysis paradigm using machine learning algorithms that helps bridge the gap between information obtained via culturing and metagenomic sequencing. Lastly, the results of Chapter 6 illustrate the contribution of phages to antibiotic resistance.
Overall, the findings provide guidance and approaches for profiling antibiotic resistance using metagenomics and machine learning. The results reported further expand the knowledge on the distribution of antibiotic resistance across different resistomes.
- Antimicrobial Resistance Mitigation [ARM] Concept Paper
Vikesland, Peter J.; Alexander, Kathleen A.; Badgley, Brian D.; Krometis, Leigh-Anne H.; Knowlton, Katharine F.; Gohlke, Julia M.; Hall, Ralph P.; Hawley, Dana M.; Heath, Lenwood S.; Hession, W. Cully; Hull, Robert Bruce IV; Moeltner, Klaus; Ponder, Monica A.; Pruden, Amy; Schoenholtz, Stephen H.; Wu, Xiaowei; Xia, Kang; Zhang, Liqing (Virginia Tech, 2017-05-15)
The development of viable solutions to the global threat of antimicrobial resistance requires a transdisciplinary approach that simultaneously considers the clinical, biological, social, economic, and environmental drivers responsible for this emerging threat. The vision of the Antimicrobial Resistance Mitigation (ARM) group is to build upon and leverage the present strengths of Virginia Tech in ARM research and education using a multifaceted systems approach. Such a framework will empower our group to recognize the interconnectedness and interdependent nature of this threat and enable the delineation, development, and testing of resilient approaches for its mitigation. We seek to develop innovative and sustainable approaches that radically advance detection, characterization, and prevention of antimicrobial resistance emergence and dissemination in human-dominated and natural settings...
- Apigenin Impacts the Growth of the Gut Microbiota and Alters the Gene Expression of Enterococcus
Wang, Minqian; Firrman, Jenni; Zhang, Liqing; Arango-Argoty, Gustavo; Tomasula, Peggy; Liu, Lin Shu; Xiao, Weidong; Yam, Kit (MDPI, 2017-08-03)
Apigenin is a major dietary flavonoid with many bioactivities, widely distributed in plants. Apigenin reaches the colon region intact and interacts there with the human gut microbiota; however, there is little research on how apigenin affects the gut bacteria. This study investigated the effect of pure apigenin on human gut bacteria, at both the single strain and community levels. The effect of apigenin on the single gut bacteria strains Bacteroides galacturonicus, Bifidobacterium catenulatum, Lactobacillus rhamnosus GG, and Enterococcus caccae was examined by measuring their anaerobic growth profiles. The effect of apigenin on a gut microbiota community was studied by culturing a fecal inoculum under in vitro conditions simulating the human ascending colon. 16S rRNA gene sequencing and GC-MS analysis quantified changes in the community structure. Single molecule RNA sequencing was used to reveal the response of Enterococcus caccae to apigenin. Enterococcus caccae was effectively inhibited by apigenin when cultured alone; however, the genus Enterococcus was enhanced when tested in a community setting. Single molecule RNA sequencing found that Enterococcus caccae responded to apigenin by up-regulating genes involved in DNA repair, stress response, cell wall synthesis, and protein folding. Taken together, these results demonstrate that apigenin affects both the growth and gene expression of Enterococcus caccae.
- Applications of Machine Learning in Source Attribution and Gene Function Prediction
Chinnareddy, Sandeep (Virginia Tech, 2024-06-07)
This research investigates the application of machine learning techniques in computational genomics across two distinct domains: (1) predicting the source of bacterial pathogens using whole genome sequencing data, and (2) the functional annotation of genes using single-cell RNA sequencing data. This work proposes the development of a bioinformatics pipeline tailored for identifying genomic variants, including gene presence/absence and single nucleotide polymorphisms. This methodology is applied to specific strains such as Salmonella enterica serovar Typhimurium and the Ralstonia solanacearum species complex. Phylogenetic analyses along with pan-genome and positive selection studies show that genomic variants and evolutionary patterns of S. Typhimurium vary across sources, which suggests that sources can be accurately attributed based on genomic variants empowered by machine learning. We benchmarked seven traditional machine learning algorithms, achieving a notable accuracy of 94.6% in host prediction for S. Typhimurium using the Random Forest model, underscored by SHAP value analyses which elucidated key predictive features. Next, the focus is shifted to the prediction of Gene Ontology terms for Arabidopsis genes using single-cell RNA-seq data. This analysis offers a detailed comparison of gene expression in root versus shoot tissues, juxtaposed with insights from bulk RNA-seq data. The integration of regulatory network data from DAP-seq significantly enhances the prediction accuracy of gene functions.
- ARGem: a new metagenomics pipeline for antibiotic resistance genes: metadata, analysis, and visualization
Liang, Xiao; Zhang, Jingyi; Kim, Yoonjin; Ho, Josh; Liu, Kevin; Keenum, Ishi M.; Gupta, Suraj; Davis, Benjamin; Hepp, Shannon L.; Zhang, Liqing; Xia, Kang; Knowlton, Katharine F.; Liao, Jingqiu; Vikesland, Peter J.; Pruden, Amy; Heath, Lenwood S. (Frontiers, 2023-09-15)
Antibiotic resistance is of crucial interest to both human and animal medicine. It has been recognized that increased environmental monitoring of antibiotic resistance is needed. Metagenomic DNA sequencing is becoming an attractive method to profile antibiotic resistance genes (ARGs), including a special focus on pathogens. A number of computational pipelines are available and under development to support environmental ARG monitoring; the pipeline we present here is promising for general adoption for the purpose of harmonized global monitoring. Specifically, ARGem is a user-friendly pipeline that provides full-service analysis, from the initial DNA short reads to the final visualization of results. The capture of extensive metadata is also facilitated to support comparability across projects and broader monitoring goals. The ARGem pipeline offers efficient analysis of a modest number of samples along with affordable computational components, though the throughput could be increased through cloud resources, based on the user’s configuration. The pipeline components were carefully assessed and selected to satisfy tradeoffs, balancing efficiency and flexibility. It was essential to provide a step to perform short read assembly in a reasonable time frame to ensure accurate annotation of identified ARGs. Comprehensive ARG and mobile genetic element databases are included in ARGem for annotation support.
ARGem further includes an expandable set of analysis tools that include statistical and network analysis and supports various useful visualization techniques, including Cytoscape visualization of co-occurrence and correlation networks. The performance and flexibility of the ARGem pipeline are demonstrated with analysis of aquatic metagenomes. The pipeline is freely available at https://github.com/xlxlxlx/ARGem.
- An Atlas of the Speed of Copy Number Changes in Animal Gene Families and Its Implications
Pan, Deng; Zhang, Liqing (PLOS, 2009-10-23)
The notion that gene duplications generate new genes and functions is commonly accepted in evolutionary biology. However, this assumption is based more on theory than on evidence from genome-wide studies. Here, we generated an atlas of the rate of copy number changes (CNCs) in all the gene families of ten animal genomes. We grouped the gene families with similar CNC dynamics into rate pattern groups (RPGs) and annotated their function using a novel bottom-up approach. By comparing CNC rate patterns, we showed that most of the species-specific CNC rate groups are formed by gene duplication rather than gene loss, and most of the changes in rates of CNCs may be the result of adaptive evolution. We also found that the functions of many RPGs match their biological significance well. Our work confirmed the role of gene duplication in generating novel phenotypes, and the results can serve as a guide for researchers to connect the phenotypic features to certain gene duplications.
- A Biclustering Approach to Combinatorial Transcription Control
Srinivasan, Venkataraghavan (Virginia Tech, 2005-07-06)
Combinatorial control of transcription is a well established phenomenon in the cell. Multiple transcription factors often bind to the same transcriptional control region of a gene and interact with each other to control the expression of the gene. It is thus necessary to consider the joint conservation of sequence pairs in order to identify combinations of binding sites to which the transcription factors bind. Conventional motif finding algorithms fail to address this issue. We propose a novel biclustering algorithm based on random sampling to identify candidate binding site combinations. We establish bounds on the various parameters to the algorithm and study the conditions under which the algorithm is guaranteed to identify candidate binding sites. We analyzed a yeast cell cycle gene expression data set using our algorithm and recovered certain novel combinations of binding sites, besides those already reported in the literature.
- Bioflow: A web based workflow management system for design and execution of genomics pipelines
Puthige, Ashwin Acharya (Virginia Tech, 2014-01-11)
The cost required for the process of sequencing genomes has decreased drastically in the last few years. The knowledge of full genomes has increased the pace of the advancements in the field of functional genomics. Computational genomics, which analyses these sequences, has seen a similar growth. The multitude of sequencing technologies has resulted in various formats for storing the sequences. This has resulted in the creation of many tools for DNA analysis. There are various tools for sorting, indexing, analyzing read groups and other tasks. The analysis of genomics often requires the creation of pipelines, which process the DNA sequences by chaining together many tools. This results in the creation of complex scripts that glue together these tools and pass the output from one stage to the other. Also, there are tools which allow creation of these pipelines with a graphical user interface. But these are complex to use and it is difficult to quickly add the new tools being developed to existing workflows. To solve these issues, we developed BioFlow, a web-based genomic workflow management system. The use of BioFlow does not require any programming skills. The integrated workflow designer allows creating and saving workflows. The pipeline is created by connecting the tools with a visual connector. BioFlow provides an easy and simple interface that allows users to quickly add tools for use in any workflow. Audit logs are maintained at each stage, which helps users to easily identify errors and fix them.
- Bioinformatic Analysis of Wastewater Metagenomes Reveals Microbial Ecological and Evolutionary Phenomena Underlying Associations of Antibiotic Resistance with Antibiotic Use
Brown, Connor L. (Virginia Tech, 2024-01-17)
Antibiotic resistance (AR) is a pervasive crisis that is intricately woven into social and environmental systems. Its escalation is fueled by factors such as overuse, poverty, climate change, and the heightened interconnectedness characteristic of our era of globalization. In this dissertation, the impact of antibiotic usage is addressed from the perspective of wastewater-based surveillance (WBS) at the wastewater treatment plant (WWTP) and microbial ecology. Antibiotic usage and contamination were found to influence the prevalence of antibiotic resistance genes (ARGs) and resistant bacteria in both lab-scale and full-scale wastewater treatment settings. Through application of novel bioinformatic approaches developed herein, metagenomics revealed associations between sewage-associated microbes and community antibiotic use that were in part mediated by microbial ecological processes and horizontal gene transfer (HGT). In sum, this dissertation increases the arsenal of bioinformatic tools for AR surveillance in wastewater environments and advances knowledge with respect to the contribution of antibiotic use to the spread of antibiotic resistance at the community-scale. Three studies served to evaluate and/or develop bioinformatic resources for molecular characterization of AR in wastewater. Hybrid assembly combining emerging long read DNA sequencing and short read sequencing was evaluated and found to improve accuracy relative to assembly of long or short reads alone. A novel database of mobile genetic element (MGE) marker genes, mobileOG-db, was compiled in order to address shortcomings with pre-existing resources.
A pipeline, Kairos, was created to facilitate the detection of HGT in metagenome assemblies, which greatly amplified coverage of ARGs. In Chapter 5, a lab-scale study of WWTP bioreactors revealed that elevated antibiotic contamination was correlated with increased prevalence of corresponding ARGs. In addition, multiple in situ HGT events of ARGs encoding resistance to the elevated antibiotics were predicted, including one HGT event likely mediated by a novel bacteriophage. In Chapter 6, influent and effluent from a full-scale municipal WWTP were collected twice-weekly for one year and subjected to deep shotgun metagenomic sequencing. In parallel, collaboration with clinicians enabled statistical modeling of antibiotic usage and resistance, revealing associations between antibiotic prescription patterns in the region and resistance at the WWTP. Finally, Chapter 7 details bioinformatic recovery of diverse extended-spectrum beta-lactamase genes from the influent and effluent metagenomes, shedding light on the dynamics of circulating resistance genes. In sum, this dissertation identifies bioinformatic evidence for the selection of AR in wastewater environments as a result of antibiotic use in the community and advances hypotheses for explaining the mechanisms of the observed phenomena.
- A Broad Analysis of Tandemly Arrayed Genes in the Genomes of Human, Mouse, and Rat
Shoja, Valia (Virginia Tech, 2006-11-10)
Tandemly arrayed genes (TAGs) play an important functional and physiological role in the genome. Most previous studies have focused on individual TAG families in a few species, yet a broad characterization of TAGs is not available. We identified all the TAGs in the genomes of human, chimp, mouse, and rat and performed a comprehensive analysis of TAG distribution, TAG sizes, TAG gene orientations and intergenic distances, and TAG gene functions. TAGs account for about 14-17% of all the genomic genes and nearly one third of all the duplicated genes in the four genomes, highlighting the predominant role that tandem duplication plays in gene duplication. For all species, TAG distribution is highly heterogeneous along chromosomes and some chromosomes are enriched with TAG forests while others are enriched with TAG deserts. The majority of TAGs are of size two for all genomes, similar to the previous findings in C. elegans, A. thaliana, and O. sativa, suggesting that it is a rather general phenomenon in eukaryotes. The comparison with the genome patterns shows that TAG members have a significantly higher proportion of parallel gene orientation in all species, corroborating Graham's claim that parallel orientation is the preferred form of orientation in TAGs. Moreover, TAG members with parallel orientation tend to be closer to each other than all neighboring genes with parallel orientation in the genome. The analysis of GO function indicates that genes with receptor or binding activities are significantly over-represented by TAGs. Simulation reveals that random gene rearrangements have little effect on the statistics of TAGs for all genomes.
Notably, gene family sizes are significantly correlated with the extent of tandem duplication, suggesting that tandem duplication is a preferred form of duplication, especially in large families. There has not been any systematic study of TAG genes' expression patterns in the genome. Taking advantage of recent large-scale microarray data, we were able to study expression divergence of some of the TAGs of size two in human and mouse for which expression data are available and examine the effect of sequence divergence, gene orientation, and physical proximity on the divergence of gene expression patterns. Our results show that there is a weak negative correlation between sequence divergence and expression similarity between the two members of a TAG, and also a weak negative correlation between physical proximity of two genes and their expression similarity. No significant relationship was detected between gene orientation and expression similarity. Moreover, we compared the expression breadth of upstream and downstream duplicate copies and found that the downstream duplicate does not show significantly narrower expression breadth. We also compared TAG gene pairs with their neighboring non-TAG pairs for both physical proximity and expression similarity. Our results show that TAG gene pairs do not show any distinct differences in the two aspects from their neighboring gene pairs, suggesting that sufficient divergence has occurred to these duplicated genes during evolution and their original similarity conferred by duplication has decayed to a level that is comparable to their surrounding regions.
- Burst of Young Retrogenes and Independent Retrogene Formation in Mammals
Pan, Deng; Zhang, Liqing (PLOS, 2009-03-27)
Retroposition and retrogenes gain increasing attention as recent studies show that they play an important role in human new gene formation. Here we examined the patterns of retrogene distribution in 8 mammalian genomes using 4 non-mammalian genomes as a contrast. There has been a burst of young retrogenes not only in primate lineages as suggested in a recent study, but also in other mammalian lineages. In mammals, most of the retrofamilies (the gene families that have retrogenes) are shared between species. In these shared retrofamilies, 14%–18% of functional retrogenes may have originated independently in multiple mammalian species. Notably, in the independently originated retrogenes, there is an enrichment of ribosome related gene function. In sharp contrast, none of these patterns hold in non-mammals. Our results suggest that the recruitment of the specific L1 retrotransposons in mammals might have been an important evolutionary event for the split of mammals and non-mammals and retroposition continues to be an important active process in shaping the dynamics of mammalian genomes, as compared to being rather inert in non-mammals.
- CAN-zip – Centroid Based Delta Compression of Next Generation Sequencing Data
Steere, Edward; An, Lin; Zhang, Liqing (Department of Computer Science, Virginia Polytechnic Institute & State University, 2015-11-09)
We present CANzip, a novel algorithm for compressing short read DNA sequencing data in FastQ format. CANzip is based on delta compression, a process in which only the differences of a specific data stream relative to a given reference stream are stored. However, CANzip uniquely assumes no given reference stream. Instead it creates artificial references for different clusters of reads, by constructing an artificial representative sequence for each given cluster. Each cluster sequence is then recoded to indicate only how it differs relative to this artificially created reference sequence. Remodeling the data in this way greatly improves the compression ratio achieved when used in conjunction with commodity tools such as bzip2. Our results indicate that CANzip outperforms gzip on average and that it can outperform bzip2.
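The delta-compression idea in the CANzip abstract can be illustrated with a small sketch. This is not the CANzip implementation: it assumes reads of equal length that have already been grouped into one cluster, it builds the artificial reference as a per-position majority base, and the function names `consensus`, `delta_encode`, and `delta_decode` are hypothetical.

```python
from collections import Counter


def consensus(reads):
    """Build an artificial reference: the most common base at each position.

    Assumption: all reads in the cluster have the same length.
    """
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*reads))


def delta_encode(read, reference):
    """Store only the (position, base) pairs where the read differs from the reference."""
    return [(i, base) for i, (base, ref) in enumerate(zip(read, reference)) if base != ref]


def delta_decode(delta, reference):
    """Rebuild the original read by applying the stored differences to the reference."""
    bases = list(reference)
    for i, base in delta:
        bases[i] = base
    return "".join(bases)


# Toy cluster of three similar reads (illustrative data only).
cluster = ["ACGTACGT", "ACGTACGA", "ACGTTCGT"]
ref = consensus(cluster)
deltas = [delta_encode(read, ref) for read in cluster]
print(ref)     # ACGTACGT
print(deltas)  # [[], [(7, 'A')], [(4, 'T')]]
```

Reads close to the reference produce short or empty difference lists, which is the kind of redundancy a general-purpose compressor such as bzip2 can then exploit.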
- Combinatorial Algorithms for Server Allocation Problem
Sowle, Rachita (Virginia Tech, 2024-09-05)
Motivated by problems in logistics, image recognition, and statistics, we consider the server allocation problem. In this problem, we are given $k$ servers (with capacities) and $n$ requests, which are points in a metric space. A server serves a request by moving to the request location, and the goal is to serve all requests while minimizing the total movement of servers, subject to the constraint that the number of requests served by a server cannot exceed its capacity. When the server capacity is $1$, and for the Euclidean metric, the problem reduces to the Euclidean bipartite matching problem. When the capacity is $\infty$ and we are also provided with the order in which requests are to be served, the problem is the $k$-first come first served routing problem. We also consider a generalization of the $k$-first come first served routing problem to the taxi allocation problem, where each request is associated with a pickup location, dropoff location, and pickup time, and the server's velocity is also given as input. We present new algorithms for the Euclidean bipartite matching problem, showing improvements over existing algorithms. In particular, for two point sets $A, B \subset \mathbb{R}^d$ with $|A| = |B| = n$ and dimension $d > 1$ being constant, we developed: (1) a faster algorithm that computes an $\varepsilon$-approximate minimum-cost perfect matching in $O(n(\varepsilon^{-O(d^3)}\log\log n + \varepsilon^{-O(d)}\log^4 n\log^5\log n))$ time, an improvement over previous algorithms, which took $n(\varepsilon^{-1}\log n)^{\Omega(d)}$ time; and (2) an algorithm that boosts the accuracy of any $\varepsilon$-additive approximation algorithm, achieving an expected additive error of $\min\{\varepsilon, (d\log\log n)w^*\}$ from the optimal matching cost $w^*$ in $O(T(n, \varepsilon/d)\log\log n)$ time, where $T(n, \varepsilon)$ is the time complexity of any given $\varepsilon$-additive approximation algorithm.
For the $k$-first come first served routing problem, we present the following results. (1) The online version of the $k$-first come first served routing problem is the celebrated $k$-server problem. The best-known online algorithm for this problem is the Work Function algorithm. We present a new implementation of the work function algorithm, where processing the $i$th request takes $O((i+k)^2)$ time, improving on the previous methods that take $\Omega(k(i+k)^2)$ time. (2) For the offline setting, we show that the $k$-first come first served routing problem and the taxi allocation problem can be reduced to the minimum-cost bipartite matching problem. Using this reduction, (a) we develop a time-based divide-and-conquer algorithm to obtain an optimal solution in $\tilde{O}(kn^2)$ time, which can be further improved to $\tilde{O}(kn)$ when the requests and servers are in two-dimensional Euclidean space, and (b) we apply a recently presented geometric divide-and-conquer algorithm to obtain an optimal solution for the taxi routing problem in a two-dimensional Euclidean space. As a result, we obtain significant empirical performance improvements for the taxi allocation problem in a two-dimensional space where the cost of moving from one location to another is lower bounded by the Euclidean cost.
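To make the capacity-1 case concrete, the following is a minimal sketch of Euclidean bipartite matching by brute force. It is not one of the dissertation's algorithms: it tries every assignment, so it is only usable on toy inputs, and the function name `min_cost_matching` and the sample points are hypothetical.

```python
from itertools import permutations
from math import dist


def min_cost_matching(servers, requests):
    """Return (cost, assignment) minimizing total Euclidean movement.

    Each server serves exactly one request (capacity 1); assignment[i] is the
    index of the request served by server i. Exhaustive search, exponential time.
    """
    best_cost, best_assignment = float("inf"), None
    for perm in permutations(range(len(requests))):
        cost = sum(dist(servers[i], requests[j]) for i, j in enumerate(perm))
        if cost < best_cost:
            best_cost, best_assignment = cost, perm
    return best_cost, best_assignment


# Two servers and two requests on a line (illustrative data only).
servers = [(0.0, 0.0), (10.0, 0.0)]
requests = [(9.0, 0.0), (1.0, 0.0)]
cost, assignment = min_cost_matching(servers, requests)
print(cost, assignment)  # 2.0 (1, 0)
```

Matching each server to its nearby request costs 2 in total, whereas the other assignment costs 18; the approximation algorithms described in the abstract aim to find near-optimal matchings of this kind without enumerating assignments.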
- Comparative Genomics Insights into Speciation and Evolution of Hawaiian Drosophila
Kang, Lin (Virginia Tech, 2017-05-01)
Speciation and adaptation have always been of great interest to biologists. The Hawaiian archipelago provides a natural arena for understanding adaptive radiation and speciation, and genomics and bioinformatics offer new approaches for studying these fundamental processes. The mode of speciation should have profound impacts on the genomic architecture and patterns of reproductive isolation of new species. The Hawaiian Drosophila are a spectacular example of sequential colonization, adaptive radiation, and speciation in the islands with nearly 1,000 estimated species, of which more than 500 have been described to date. This dissertation gives an overview of the Hawaiian Drosophila system (Chapter 1), new insights into genomes of three recently diverged species of Hawaiian picture-winged Drosophila (Chapter 2), as well as estimated gene flow patterns (Chapter 3). Additionally, I present a new approach of mapping genomic scaffolds onto chromosomes, based on NextGen sequencing from chromosomal microdissections (Chapter 4), and gene expression profiles of backcross hybrids and their parental forms (Chapter 5). Overall, obtained results were used to address such fundamental questions as the role of adaptive changes, founder effects (small effective population size in isolation), and genetic admixture during speciation.
- Comparison of Whole-Genome Sequences of Legionella pneumophila in Tap Water and in Clinical Strains, Flint, Michigan, USA, 2016
Garner, Emily; Brown, Connor L.; Schwake, David Otto; Rhoads, William J.; Arango-Argoty, Gustavo; Zhang, Liqing; Jospin, Guillaume; Coil, David A.; Eisen, Jonathan A.; Edwards, Marc A.; Pruden, Amy (Centers for Disease Control and Prevention, 2019-11)
During the water crisis in Flint, Michigan, USA (2014–2015), 2 outbreaks of Legionnaires’ disease occurred in Genesee County, Michigan. We compared whole-genome sequences of 10 clinical Legionella pneumophila isolates submitted to a laboratory in Genesee County during the second outbreak with 103 water isolates collected the following year. We documented a genetically diverse range of L. pneumophila strains across clinical and water isolates. Isolates belonging to 1 clade (3 clinical isolates, 3 water isolates from a Flint hospital, 1 water isolate from a Flint residence, and the reference Paris strain) had a high degree of similarity (2–1,062 single-nucleotide polymorphisms), all L. pneumophila sequence type 1, serogroup 1. Serogroup 6 isolates belonging to sequence type 2518 were widespread in Flint hospital water samples but bore no resemblance to available clinical isolates. L. pneumophila strains in Flint tap water after the outbreaks were diverse and similar to some disease-causing strains.
- Comprehensive off-target analysis of dCas9-SAM-mediated HIV reactivation via long noncoding RNA and mRNA profilingZhang, Yonggang; Arango-Argoty, Gustavo; Li, Fang; Xiao, Xiao; Putatunda, Raj; Yu, Jun; Yang, Xiao-Feng; Wang, Hong; Watson, Layne T.; Zhang, Liqing; Hu, Wenhui (2018-09-10)Background CRISPR/CAS9 (epi)genome editing revolutionized the field of gene and cell therapy. Our previous study demonstrated that a rapid and robust reactivation of the HIV latent reservoir by a catalytically-deficient Cas9 (dCas9)-synergistic activation mediator (SAM) via HIV long terminal repeat (LTR)-specific MS2-mediated single guide RNAs (msgRNAs) directly induces cellular suicide without additional immunotherapy. However, potential off-target effect remains a concern for any clinical application of Cas9 genome editing and dCas9 epigenome editing. After dCas9 treatment, potential off-target responses have been analyzed through different strategies such as mRNA sequence analysis, and functional screening. In this study, a comprehensive analysis of the host transcriptome including mRNA, lncRNA, and alternative splicing was performed using human cell lines expressing dCas9-SAM and HIV-targeting msgRNAs. Results The control scrambled msgRNA (LTR_Zero), and two LTR-specific msgRNAs (LTR_L and LTR_O) groups show very similar expression profiles of the whole transcriptome. Among 839 identified lncRNAs, none exhibited significantly different expression in LTR_L vs. LTR_Zero group. In LTR_O group, only TERC and scaRNA2 lncRNAs were significantly decreased. Among 142,791 mRNAs, four genes were differentially expressed in LTR_L vs. LTR_Zero group. There were 21 genes significantly downregulated in LTR_O vs. either LTR_Zero or LTR_L group and one third of them are histone related. The distributions of different types of alternative splicing were very similar either within or between groups. 
There were no apparent changes in any of the lncRNA and mRNA transcripts between the LTR_L and LTR_Zero groups. Conclusion This is an extremely comprehensive study demonstrating the rare off-target effects of the HIV-specific dCas9-SAM system in human cells. This finding is encouraging for the safe application of dCas9-SAM technology to induce target-specific reactivation of latent HIV for an effective “shock-and-kill” strategy.
- Computational Analysis of Viruses in Metagenomic DataTithi, Saima Sultana (Virginia Tech, 2019-10-24)Viruses have a huge impact on controlling diseases and regulating many key ecosystem processes. Because metagenomic data can contain many microbiomes, including many viruses, analyzing metagenomic data lets us study many viruses at the same time. The first step towards analyzing metagenomic data is to identify and quantify the viruses present in the data. To address this task, we developed a computational pipeline, FastViromeExplorer. FastViromeExplorer leverages a pseudoalignment-based approach, which is faster than the traditional alignment-based approach, to quickly align millions or billions of reads. Application of FastViromeExplorer to both human gut samples and environmental samples shows that our tool can identify viruses and quantify their abundances quickly and accurately, even for a large data set. Although viruses have received increased attention in recent times, most viruses are still unknown or uncategorized. To discover novel viruses from metagenomic data, we developed a computational pipeline named FVE-novel. FVE-novel leverages a hybrid of reference-based and de novo assembly approaches to recover novel viruses from metagenomic data. By applying FVE-novel to an ocean metagenome sample, we successfully recovered two novel viruses and two different strains of known phages. Analysis of viral assemblies from metagenomic data reveals that they often contain assembly errors such as chimeric sequences, in which more than one viral genome is incorrectly assembled together. To identify and fix these types of assembly errors, we developed a computational tool called VirChecker. Our tool can identify and fix assembly errors due to chimeric assembly. VirChecker also extends the assembly as much as possible to complete it and then annotates the extended and improved assembly. 
Application of VirChecker to viral scaffolds collected from an ocean metagenome sample shows that our tool successfully fixes the assembly errors and extends two novel virus genomes and two strains of known phage genomes.
- A Computational Framework for Interacting with Physical Molecular Models of the Polypeptide ChainChakraborty, Promita (Virginia Tech, 2014-05-08)Although nonflexible, scaled molecular models like Pauling-Corey's and its descendants have made significant contributions in structural biology research and pedagogy, recent technical advances in 3D printing and electronics make it possible to go one step further in designing physical models of biomacromolecules: to make them conformationally dynamic. We report the design, construction, and validation of a flexible, scaled, physical model of the polypeptide chain, which accurately reproduces the bond rotational degrees-of-freedom in the peptide backbone. The coarse-grained backbone model consists of repeating amide and alpha-carbon units, connected by mechanical bonds (corresponding to phi and psi angles) that include realistic barriers to rotation that closely approximate those found at the molecular scale. Longer-range hydrogen-bonding interactions are also incorporated, allowing the chain to easily fold into stable secondary structures. This physical model can serve as the basis for linking tangible bio-macromolecular models directly to the vast array of existing computational tools to provide an enhanced and interactive human-computer interface. We have explored the boundaries of this direction at the interface of computational tools and physical models of biological macromolecules at the nano-scale. Using a CAD-biocomputational framework, we have provided a methodology to design and build physical protein models focusing on shape and dynamics. We have also developed a workflow and an interface implemented for such bio-modeling tools. This physical-digital interface paradigm, at the intersection of native state proteins (P), computational models (C) and physical models (P), provides new opportunities for building an interactive computational modeling tool for protein folding and drug design. 
Furthermore, this model is easily constructed with readily obtainable parts and promises to be a tremendous educational aid to the intuitive understanding of chain folding as the basis for macromolecular structure.