Kulkarni, Rutwik Shashank
2022-12-09
2022-12-09
2021-06-16
vt_gsexam:31271
http://hdl.handle.net/10919/112828

The global spread of antibiotic resistance warrants concerted surveillance in the clinic and in the environment. The widespread use of metagenomics has generated a large amount of sequencing data. Next-generation sequencing of microbial communities provides an opportunity for proactive detection of emerging antibiotic resistance genes (ARGs) from such data, but at present only a limited number of pipelines enable the identification of novel ARGs belonging to diverse antibiotic classes. There is therefore a need for computational pipelines that can identify these putative novel ARGs. Such pipelines should be scalable, accessible, and perform well. To address this problem, we develop a new method for predicting novel ARGs from genomic or metagenomic sequences, leveraging known ARGs of different resistance categories. Our method takes into account the physicochemical properties that are intrinsic to different ARG families. Traditionally, new ARGs are predicted by aligning sequences against existing ARG reference databases and calculating sequence similarity, which can be very time-consuming. Here we introduce an alignment-free, deep learning-based prediction method that incorporates both the primary protein sequences of ARGs and their physicochemical properties. We compare our method with existing pipelines, including the hidden Markov model-based Resfams and fARGene, the sequence alignment and machine learning-based DeepARG-LS, and the homology modelling-based Pairwise Comparative Modelling. We also use our model to detect novel ARGs from various environments, including human gut, soil, activated sludge, and influent samples collected from a wastewater treatment plant.
Results show that our method predicts ARGs more accurately than existing models and enables the detection of putative novel ARGs, providing the scientific community with promising targets for experimental characterization.

ETD
In Copyright
Antibiotic Resistance
Deep Learning
Machine Learning
Protein Structure
DeepARG+ - A Computational Pipeline for the Prediction of Antibiotic Resistance
Thesis