Browsing by Author "Liang, Xiao"
Now showing 1 - 4 of 4
- AgroSeek: a system for computational analysis of environmental metagenomic data and associated metadata
Liang, Xiao; Akers, Kyle; Keenum, Ishi M.; Wind, Lauren L.; Gupta, Suraj; Chen, Chaoqi; Aldaihani, Reem; Pruden, Amy; Zhang, Liqing; Knowlton, Katharine F.; Xia, Kang; Heath, Lenwood S. (2021-03-10)
Background: Metagenomics is gaining attention as a powerful tool for identifying how agricultural management practices influence human and animal health, especially in terms of their potential to contribute to the spread of antibiotic resistance. However, comparing the distribution and prevalence of antibiotic resistance genes (ARGs) across multiple studies and environments is currently impossible without a complete re-analysis of published datasets. This challenge must be addressed for metagenomics to realize its potential for helping guide effective policy and practice measures relevant to agricultural ecosystems, for example, identifying critical control points for mitigating the spread of antibiotic resistance.
Results: Here we introduce AgroSeek, a centralized web-based system that provides computational tools for analysis and comparison of metagenomic data sets, tailored specifically to researchers and other users in the agricultural sector interested in tracking and mitigating the spread of ARGs. AgroSeek draws from rich, user-provided metagenomic data and metadata to facilitate analysis, comparison, and prediction in a user-friendly fashion. Further, AgroSeek draws from publicly contributed data sets to provide a point of comparison and context for data analysis. To incorporate metadata into our analysis and comparison procedures, we provide flexible metadata templates, including user-customized metadata attributes to facilitate data sharing, while maintaining the metadata in a comparable fashion for the broader user community and to support large-scale comparative and predictive analysis.
Conclusion: AgroSeek provides an easy-to-use tool for environmental metagenomic analysis and comparison, based on both gene annotations and associated metadata, with this initial demonstration focusing on control of antibiotic resistance in agricultural ecosystems. AgroSeek creates a space for metagenomic data sharing and collaboration to assist policy makers, stakeholders, and the public in decision-making. AgroSeek is publicly available at https://agroseek.cs.vt.edu/.
- ARGem: a new metagenomics pipeline for antibiotic resistance genes: metadata, analysis, and visualization
Liang, Xiao; Zhang, Jingyi; Kim, Yoonjin; Ho, Josh; Liu, Kevin; Keenum, Ishi M.; Gupta, Suraj; Davis, Benjamin; Hepp, Shannon L.; Zhang, Liqing; Xia, Kang; Knowlton, Katharine F.; Liao, Jingqiu; Vikesland, Peter J.; Pruden, Amy; Heath, Lenwood S. (Frontiers, 2023-09-15)
Antibiotic resistance is of crucial interest to both human and animal medicine. It has been recognized that increased environmental monitoring of antibiotic resistance is needed. Metagenomic DNA sequencing is becoming an attractive method to profile antibiotic resistance genes (ARGs), including a special focus on pathogens. A number of computational pipelines are available and under development to support environmental ARG monitoring; the pipeline we present here is promising for general adoption for the purpose of harmonized global monitoring. Specifically, ARGem is a user-friendly pipeline that provides full-service analysis, from the initial DNA short reads to the final visualization of results. The capture of extensive metadata is also facilitated to support comparability across projects and broader monitoring goals. The ARGem pipeline offers efficient analysis of a modest number of samples along with affordable computational components, though the throughput could be increased through cloud resources, based on the user’s configuration. The pipeline components were carefully assessed and selected to satisfy tradeoffs, balancing efficiency and flexibility. It was essential to provide a step to perform short read assembly in a reasonable time frame to ensure accurate annotation of identified ARGs. Comprehensive ARG and mobile genetic element databases are included in ARGem for annotation support.
ARGem further includes an expandable set of analysis tools, including statistical and network analysis, and supports various useful visualization techniques, including Cytoscape visualization of co-occurrence and correlation networks. The performance and flexibility of the ARGem pipeline are demonstrated with analysis of aquatic metagenomes. The pipeline is freely available at https://github.com/xlxlxlx/ARGem.
- Collection Management Webpages
Eagan, Mackenzie; Liang, Xiao; Michael, Louis; Patil, Supritha (Virginia Polytechnic Institute and State University, 2017-12-25)
The Collection Management Webpages team is responsible for collecting, processing, and storing webpages from different sources. Our team worked on familiarizing ourselves with the necessary tools and data required to produce the specified output that was used by other teams in this class (Fall 2017 CS 5604). Input includes URLs generated by the Event Focused Crawler (EFC), URLs obtained from tweets by the Collection Management Tweets team, and webpage content from Web Archive (WARC) files from the Internet Archive or other sources. Our team fetches raw HTML from the obtained URLs and extracts HTML from WARC files. From this raw data, we obtain metadata information about the corresponding webpage. The raw data is also cleaned and processed for other teams' consumption. This processing is accomplished using various Python libraries. The cleaned information is made available in a variety of formats, including tokens, stemmed or lemmatized text, and text tagged with parts of speech. Both the raw and processed webpage data are stored in HBase and intermediately in HDFS (Hadoop Distributed File System). Our team successfully executed all individual portions of our proposed process. We successfully ran the EFC and obtained URLs from these runs. Using these URLs, we created WARC files. We obtained the raw HTML, extracted metadata information from it, and cleaned and processed the webpage information before uploading it to HBase. We iteratively expanded on the functionality of our cleaning and processing scripts in order to provide more relevant information to other groups. We processed and cleaned information from WARC files provided by the instructor in a similar manner. We have acquired webpage data from URLs obtained by the Collection Management Tweets (CMT) team.
At this time, however, there is no end-to-end process in place. Due to the volume of data our team has been dealing with, we explored various methods for parallelizing and speeding up our processes. Our team used the PySpark library for obtaining information from URLs and the multiprocessing library in Python for processing information stored in WARC files.
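As a rough illustration of the cleaning and parallel-processing steps described in this abstract, the following minimal Python sketch strips markup from raw HTML, tokenizes the remaining text, and distributes pages across worker processes with the standard multiprocessing library. It is not the team's code: the function names (clean_html, tokenize, process_page, process_pages) are hypothetical, only the standard library is used, and stemming, lemmatization, part-of-speech tagging, WARC parsing, and HBase storage are all omitted.

```python
import re
from html.parser import HTMLParser
from multiprocessing import Pool


class _TextExtractor(HTMLParser):
    """Collects visible text, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.chunks.append(data)


def clean_html(raw_html):
    """Return the visible text of an HTML document with whitespace collapsed."""
    parser = _TextExtractor()
    parser.feed(raw_html)
    parser.close()
    text = " ".join(parser.chunks)
    return re.sub(r"\s+", " ", text).strip()


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def process_page(raw_html):
    """Clean one page and return its tokens (hypothetical per-page step)."""
    return tokenize(clean_html(raw_html))


def process_pages(pages, workers=2):
    """Process many pages in parallel; 'workers' is an illustrative default."""
    with Pool(processes=workers) as pool:
        return pool.map(process_page, pages)
```

When run as a script, process_pages should be called from under an `if __name__ == "__main__":` guard so that worker processes can import the module safely on platforms that spawn rather than fork.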
- Computational Insights into Evolutionary Dynamics of Human and Primate Genes
Liang, Xiao (Virginia Tech, 2024-06-06)
The evolutionary history of genes across different species is a subject of research interest. For human genes, there is a particular focus on investigating the possible origins of genes. However, there has been limited research on the development process from an evolutionary perspective. Additionally, most previous studies have focused on model organisms and representative organisms from various eras, with less attention given to primates, which are evolutionarily more closely related to humans. With the advancement of whole genome sequencing of primates, investigating the genes of various primate species has become a viable possibility. This dissertation work integrates computational insights into the topics of primate and human gene emergence, conservation, and loss. Specifically, this series of studies contributes to three aspects of the topic: (1) the environmental conditions in evolutionary history that are associated with the emergence of primate and human de novo genes, (2) the evolutionary dynamics of human cancer genes in primates, and (3) gene conservation and loss in primates. Results reveal that primate and human de novo genes and cancer genes share similarities in the time of emergence, peaking later than random human genes and tending to occur in local warm periods in the context of an overall trend of decreasing global surface temperature. Cancer genes are more conserved in their evolutionary origins than random human genes, with two peaks of emergence, one before primates and the other within 20 million years, and have different patterns within the two time periods. Genes with high expression in the human brain exhibit more conservation in their evolutionary origins than those in the immune system or random genes. On the other hand, genes expressed highly in the mouse brain tend to be either prevalent in primates or specific to mouse.
Overall, this dissertation work charts the evolutionary history of a number of distinct primate and human genes, elucidates the potential association of ancient environmental factors with primate genomes, provides insights into the origin, conservation, and emergence of cancer genes in primates, and examines the conservation and loss of genes in different tissues. The hope is that these results will contribute to a greater understanding of gene evolution in primate and human genomes.