Computational Tools for Annotating Antibiotic Resistance in Metagenomic Data

Arango Argoty, Gustavo Alonso

Computational Tools for Annotating Antibiotic Resistance in Metagenomic Data

dc.contributor.author	Arango Argoty, Gustavo Alonso	en
dc.contributor.committeechair	Zhang, Liqing	en
dc.contributor.committeemember	Heath, Lenwood S.	en
dc.contributor.committeemember	Xiao, Weidong	en
dc.contributor.committeemember	Pruden, Amy	en
dc.contributor.committeemember	Meng, Na	en
dc.contributor.department	Computer Science	en
dc.date.accessioned	2019-04-16T08:00:50Z	en
dc.date.available	2019-04-16T08:00:50Z	en
dc.date.issued	2019-04-15	en
dc.description.abstract	Metagenomics has become a reliable tool for the analysis of the microbial diversity and the molecular mechanisms carried out by microbial communities. By the use of next generation sequencing, metagenomic studies can generate millions of short sequencing reads that are processed by computational tools. However, with the rapid adoption of metagenomics a large amount of data has been generated. This situation requires the development of computational tools and pipelines to manage the data scalability, accessibility, and performance. In this thesis, several strategies varying from command line, web-based platforms to machine learning have been developed to address these computational challenges. Interpretation of specific information from metagenomic data is especially a challenge for environmental samples as current annotation systems only offer broad classification of microbial diversity and function. Therefore, I developed MetaStorm, a public web-service that facilitates customization of computational analysis for metagenomic data. The identification of antibiotic resistance genes (ARGs) from metagenomic data is carried out by searches against curated databases producing a high rate of false negatives. Thus, I developed DeepARG, a deep learning approach that uses the distribution of sequence alignments to predict over 30 antibiotic resistance categories with a high accuracy. Curation of ARGs is a labor intensive process where errors can be easily propagated. Thus, I developed ARGminer, a web platform dedicated to the annotation and inspection of ARGs by using crowdsourcing. Effective environmental monitoring tools should ideally capture not only ARGs, but also mobile genetic elements and indicators of co-selective forces, such as metal resistance genes. Here, I introduce NanoARG, an online computational resource that takes advantage of the long reads produced by nanopore sequencing technology to provide insights into mobility, co-selection, and pathogenicity. Sequence alignment has been one of the preferred methods for analyzing metagenomic data. However, it is slow and requires high computing resources. Therefore, I developed MetaMLP, a machine learning approach that uses a novel representation of protein sequences to perform classifications over protein functions. The method is accurate, is able to identify a larger number of hits compared to sequence alignments, and is >50 times faster than sequence alignment techniques.	en
dc.description.abstractgeneral	Antimicrobial resistance (AMR) is one of the biggest threats to human public health. It has been estimated that the number of deaths caused by AMR will surpass the ones caused by cancer on 2050. The seriousness of these projections requires urgent actions to understand and control the spread of AMR. In the last few years, metagenomics has stand out as a reliable tool for the analysis of the microbial diversity and the AMR. By the use of next generation sequencing, metagenomic studies can generate millions of short sequencing reads that are processed by computational tools. However, with the rapid adoption of metagenomics, a large amount of data has been generated. This situation requires the development of computational tools and pipelines to manage the data scalability, accessibility, and performance. In this thesis, several strategies varying from command line, web-based platforms to machine learning have been developed to address these computational challenges. In particular, by the development of computational pipelines to process metagenomics data in the cloud and distributed systems, the development of machine learning and deep learning tools to ease the computational cost of detecting antibiotic resistance genes in metagenomic data, and the integration of crowdsourcing as a way to curate and validate antibiotic resistance genes.	en
dc.description.degree	Doctor of Philosophy	en
dc.format.medium	ETD	en
dc.identifier.other	vt_gsexam:19779	en
dc.identifier.uri	http://hdl.handle.net/10919/88987	en
dc.publisher	Virginia Tech	en
dc.rights	In Copyright	en
dc.rights.uri	http://rightsstatements.org/vocab/InC/1.0/	en
dc.subject	bioinformatics	en
dc.subject	metagenomics	en
dc.subject	antibiotic resistance	en
dc.subject	Machine learning	en
dc.title	Computational Tools for Annotating Antibiotic Resistance in Metagenomic Data	en
dc.type	Dissertation	en
thesis.degree.discipline	Computer Science and Applications	en
thesis.degree.grantor	Virginia Polytechnic Institute and State University	en
thesis.degree.level	doctoral	en
thesis.degree.name	Doctor of Philosophy	en

Files

Original bundle

Now showing 1 - 2 of 2

Name:: Arango_Argoty_GA_D_2019.pdf
Size:: 5.68 MB
Format:: Adobe Portable Document Format

Download

Name:: Arango_Argoty_GA_D_2019_support_1.pdf
Size:: 539.83 KB
Format:: Adobe Portable Document Format
Description:: Supporting documents

Download

Collections

Doctoral Dissertations