DeepARG+ - A Computational Pipeline for the Prediction of Antibiotic Resistance

dc.contributor.author: Kulkarni, Rutwik Shashank
dc.contributor.committeechair: Zhang, Liqing
dc.contributor.committeemember: Pruden, Amy
dc.contributor.committeemember: Karpatne, Anuj
dc.contributor.department: Computer Science
dc.date.accessioned: 2022-12-09T07:00:39Z
dc.date.available: 2022-12-09T07:00:39Z
dc.date.issued: 2021-06-16
dc.description.abstract: The global spread of antibiotic resistance warrants concerted surveillance in the clinic and in the environment. The widespread use of metagenomics has generated a large amount of sequencing data. Next-generation sequencing of microbial communities provides an opportunity for proactive detection of emerging antibiotic resistance genes (ARGs) from such data, but at present only a limited number of pipelines enable the identification of novel ARGs belonging to diverse antibiotic classes. There is therefore a need for computational pipelines that can identify these putative novel ARGs. Such pipelines should be scalable, accessible and have good performance. To address this problem, we develop a new method for predicting novel ARGs from genomic or metagenomic sequences, leveraging known ARGs of different resistance categories. Our method takes into account the physicochemical properties that are intrinsic to different ARG families. Traditionally, new ARGs are predicted by aligning sequences against existing ARG reference databases and calculating sequence similarity, which can be very time-consuming. Here we introduce an alignment-free deep learning prediction method that incorporates both the primary protein sequences of ARGs and their physicochemical properties. We compare our method with existing pipelines, including the hidden Markov model-based Resfams and fARGene, the sequence alignment and machine learning-based DeepARG-LS, and the homology modelling-based Pairwise Comparative Modelling. We also use our model to detect novel ARGs from various environments, including human gut, soil, activated sludge and influent samples collected from a wastewater treatment plant. Results show that our method achieves greater accuracy than existing models for the prediction of ARGs and enables the detection of putative novel ARGs, providing promising targets for experimental characterization to the scientific community.
dc.description.abstractgeneral: Various bacteria contain genes that allow them to survive and grow even after the application of antibiotics. Such genes are called antibiotic resistance genes (ARGs). Each ARG has properties that make it resistant to a particular class of antibiotics. This class is called the resistance class or category of the gene. Antimicrobial resistance (AMR) is one of the biggest current challenges to public health. It has been projected that a large number of deaths might occur due to AMR in the future. Therefore, there is a need for monitoring AMR in various environments. Currently developed methods use a sequence's similarity to entries in existing databases as a feature for ARG prediction. Some tools also use the 3D structure of proteins as a feature for ARG prediction. In this thesis, we develop a tool that incorporates both the sequence similarity and the structural information of proteins for ARG prediction. The structural information is encoded with physicochemical properties of the amino acids (such as hydrophobicity and molecular weight). Our results show the efficacy of the pipeline in various environments. Results also show that our method achieves greater accuracy than existing models for the prediction of ARGs from metagenomic data. It also enables the detection of putative novel ARGs, providing promising targets for experimental characterization to the scientific community.
dc.description.degree: Master of Science
dc.format.medium: ETD
dc.identifier.other: vt_gsexam:31271
dc.identifier.uri: http://hdl.handle.net/10919/112828
dc.publisher: Virginia Tech
dc.rights: In Copyright
dc.rights.uri: http://rightsstatements.org/vocab/InC/1.0/
dc.subject: Antibiotic Resistance
dc.subject: Deep Learning
dc.subject: Machine Learning
dc.subject: Protein Structure
dc.title: DeepARG+ - A Computational Pipeline for the Prediction of Antibiotic Resistance
dc.type: Thesis
thesis.degree.discipline: Computer Science and Applications
thesis.degree.grantor: Virginia Polytechnic Institute and State University
thesis.degree.level: masters
thesis.degree.name: Master of Science
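
The abstract in this record mentions encoding amino acids with physicochemical properties such as hydrophobicity and molecular weight. As a rough illustration only, the sketch below maps each residue of a protein sequence to a two-value feature row. It assumes the Kyte-Doolittle hydropathy scale and approximate free amino acid molecular weights (g/mol); the function name `encode_sequence` and the choice of exactly these two properties are hypothetical and are not taken from the thesis, whose actual feature set may differ.

```python
from typing import List

# Kyte-Doolittle hydropathy index per amino acid (assumed choice of scale).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

# Approximate molecular weight of each free amino acid in g/mol.
MOLECULAR_WEIGHT = {
    "A": 89.09, "R": 174.20, "N": 132.12, "D": 133.10, "C": 121.16,
    "Q": 146.15, "E": 147.13, "G": 75.07, "H": 155.16, "I": 131.17,
    "L": 131.17, "K": 146.19, "M": 149.21, "F": 165.19, "P": 115.13,
    "S": 105.09, "T": 119.12, "W": 204.23, "Y": 181.19, "V": 117.15,
}


def encode_sequence(seq: str) -> List[List[float]]:
    """Return one [hydropathy, molecular_weight] row per residue.

    Residues outside the 20 standard amino acids (for example 'X')
    are mapped to [0.0, 0.0] as a neutral placeholder.
    """
    rows = []
    for residue in seq.upper():
        if residue in KYTE_DOOLITTLE:
            rows.append([KYTE_DOOLITTLE[residue], MOLECULAR_WEIGHT[residue]])
        else:
            rows.append([0.0, 0.0])
    return rows


if __name__ == "__main__":
    # Short hypothetical peptide, used only to show the output shape.
    for row in encode_sequence("MKV"):
        print(row)
```

A per-residue matrix of this kind could be fed to a sequence model alongside the primary sequence itself; how the thesis combines the two is described in the PDF listed below, not here.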

Files

Original bundle
Name: Kulkarni_RS_T_2021.pdf
Size: 1.44 MB
Format: Adobe Portable Document Format
