dc.contributor.author: Arango Argoty, Gustavo Alonso [en]
dc.date.accessioned: 2019-04-16T08:00:50Z [en]
dc.date.available: 2019-04-16T08:00:50Z [en]
dc.date.issued: 2019-04-15 [en]
dc.identifier.other: vt_gsexam:19779 [en]
dc.identifier.uri: http://hdl.handle.net/10919/88987 [en]
dc.description.abstract: Metagenomics has become a reliable tool for analyzing microbial diversity and the molecular mechanisms carried out by microbial communities. Using next-generation sequencing, metagenomic studies can generate millions of short sequencing reads that are processed by computational tools. However, the rapid adoption of metagenomics has generated a large amount of data. This situation requires computational tools and pipelines that address data scalability, accessibility, and performance. In this thesis, several strategies, ranging from command-line tools and web-based platforms to machine learning, have been developed to address these computational challenges. Interpreting specific information from metagenomic data is especially challenging for environmental samples, as current annotation systems offer only a broad classification of microbial diversity and function. Therefore, I developed MetaStorm, a public web service that facilitates customization of computational analysis for metagenomic data. The identification of antibiotic resistance genes (ARGs) from metagenomic data is carried out by searches against curated databases, which produces a high rate of false negatives. Thus, I developed DeepARG, a deep learning approach that uses the distribution of sequence alignments to predict over 30 antibiotic resistance categories with high accuracy. Curation of ARGs is a labor-intensive process in which errors can easily propagate. Thus, I developed ARGminer, a web platform dedicated to the annotation and inspection of ARGs through crowdsourcing. Effective environmental monitoring tools should ideally capture not only ARGs, but also mobile genetic elements and indicators of co-selective forces, such as metal resistance genes.
Here, I introduce NanoARG, an online computational resource that takes advantage of the long reads produced by nanopore sequencing technology to provide insights into mobility, co-selection, and pathogenicity. Sequence alignment has been one of the preferred methods for analyzing metagenomic data. However, it is slow and requires substantial computing resources. Therefore, I developed MetaMLP, a machine learning approach that uses a novel representation of protein sequences to classify protein functions. The method is accurate, identifies a larger number of hits than sequence alignment, and is >50 times faster than sequence alignment techniques. [en]
dc.format.medium: ETD [en]
dc.publisher: Virginia Tech [en]
dc.rights: In Copyright [en]
dc.rights.uri: http://rightsstatements.org/vocab/InC/1.0/ [en]
dc.subject: bioinformatics [en]
dc.subject: metagenomics [en]
dc.subject: antibiotic resistance [en]
dc.subject: Machine learning [en]
dc.title: Computational Tools for Annotating Antibiotic Resistance in Metagenomic Data [en]
dc.type: Dissertation [en]
dc.contributor.department: Computer Science [en]
dc.description.degree: Doctor of Philosophy [en]
thesis.degree.name: Doctor of Philosophy [en]
thesis.degree.level: doctoral [en]
thesis.degree.grantor: Virginia Polytechnic Institute and State University [en]
thesis.degree.discipline: Computer Science and Applications [en]
dc.contributor.committeechair: Zhang, Liqing [en]
dc.contributor.committeemember: Heath, Lenwood S. [en]
dc.contributor.committeemember: Xiao, Weidong [en]
dc.contributor.committeemember: Pruden, Amy [en]
dc.contributor.committeemember: Meng, Na [en]
dc.description.abstractgeneral: Antimicrobial resistance (AMR) is one of the biggest threats to human public health. It has been estimated that deaths caused by AMR will surpass those caused by cancer by 2050. The seriousness of these projections calls for urgent action to understand and control the spread of AMR. In the last few years, metagenomics has stood out as a reliable tool for analyzing microbial diversity and AMR. Using next-generation sequencing, metagenomic studies can generate millions of short sequencing reads that are processed by computational tools. However, the rapid adoption of metagenomics has generated a large amount of data. This situation requires computational tools and pipelines that address data scalability, accessibility, and performance. In this thesis, several strategies, ranging from command-line tools and web-based platforms to machine learning, have been developed to address these computational challenges. In particular, these include computational pipelines that process metagenomic data in the cloud and on distributed systems, machine learning and deep learning tools that ease the computational cost of detecting antibiotic resistance genes in metagenomic data, and the integration of crowdsourcing as a way to curate and validate antibiotic resistance genes. [en]

