Metagenomic Data Analysis Using Extremely Randomized Tree Algorithm

dc.contributor.author: Gupta, Suraj (en)
dc.contributor.committeechair: Vikesland, Peter J. (en)
dc.contributor.committeemember: Edwards, Marc A. (en)
dc.contributor.committeemember: Pruden, Amy (en)
dc.contributor.department: Civil and Environmental Engineering (en)
dc.date.accessioned: 2019-12-19T07:00:53Z (en)
dc.date.available: 2019-12-19T07:00:53Z (en)
dc.date.issued: 2018-06-26 (en)
dc.description.abstract: Antibiotic resistance genes (ARGs) conferring resistance to a broad range of antibiotics are frequently detected in aquatic environments such as untreated and treated wastewater, rivers, and surface water. ARG proliferation in the aquatic environment may depend on factors such as geospatial variation, the type of aquatic body, and the type of wastewater (untreated or treated) discharged into it. Likewise, the strong interconnectivity of aquatic systems may accelerate the spread of ARGs through them. Hence, a comparative and holistic study of different aquatic environments is required to properly understand the problem of antibiotic resistance. Many studies approach this issue using molecular techniques such as metagenomic sequencing and metagenomic data analysis. Such analyses compare the broad spectrum of ARGs in water and wastewater samples, but the comparisons are limited to similarity/dissimilarity analyses, which may not identify the discriminatory ARGs (the ARGs driving the similarity/dissimilarity measures). Consequently, what drives the dissimilarities among samples goes unidentified, and the reasons for antibiotic resistance proliferation may not be clearly understood. In this study, a methodology using the Extremely Randomized Trees (ET) algorithm was formulated and demonstrated to capture such ARG variations and identify discriminatory ARGs among environmentally derived metagenomes. Data were grouped by geographic location (to understand the spread of ARGs globally), untreated vs. treated wastewater (to assess the effectiveness of wastewater treatment plants (WWTPs) in removing ARGs), and aquatic habitat (to understand the impact and spread of ARGs within aquatic habitats).
Certain ARGs were found to be specific to wastewater samples from particular locations, suggesting that site-specific factors can influence ARG profiles. Comparing untreated and treated wastewater samples from different WWTPs revealed that biological treatment has a definite impact on the ARG profile. While several ARGs were removed by treatment, some ARGs increased in relative abundance irrespective of location and treatment-plant-specific variables. When different aquatic environments were compared, the algorithm identified ARGs specific to certain environments, including ARGs specific to hospital discharges. The proposed method was efficient in identifying the discriminatory ARGs that classify samples according to their groups. It was also effective in capturing low-level variations that are generally overshadowed in the analysis by highly abundant genes. The results of this study suggest that the proposed method is effective for comprehensive analyses and can provide valuable information to better understand antibiotic resistance. (en)
dc.description.abstractgeneral: Antibiotic resistance is a natural and primordial process that predates the use of antibiotics in humans for disease treatment. It occurs when a bacterium evolves to render ineffective the drugs, chemicals, or other agents meant to cure or prevent infections. Antibiotic resistance genes (ARGs) conferring resistance to a wide range of antibiotics have been widely found in rivers, surface waters, and hospital and farm wastewater discharges. Even treated wastewater is a concern: ARGs have frequently been detected in effluent discharges, which raises questions about the effectiveness of treatment plants in removing ARGs. Because these systems are interconnected, ARGs may disseminate and proliferate through them, which may pose a serious threat to human health. Hence, it is desirable to perform comparative studies among these aquatic habitats. In previous studies, researchers compared different habitats to show how similar or dissimilar the environments are in terms of the ARGs present in the samples. While these analyses are important, they do not reveal which ARGs are unique or which ARGs are responsible for the similarities or dissimilarities. This information is crucial for understanding water environments in terms of the occurrence of ARGs and the risk they pose, and for identifying the factors responsible for resistance gene proliferation. In this research, a methodology was developed that uses data analysis algorithms to capture such ARG variations in environmental samples. The methodology was then demonstrated on environmental samples: wastewater samples from different geographical locations (to understand the spread of ARGs globally), untreated vs. treated wastewater (to understand the effectiveness of treatment plants in removing ARGs), and different aquatic habitats (to understand the impact and spread of ARGs within these habitats).
The proposed method was efficient in differentiating samples and identifying discriminatory ARGs. The comparison between environmental samples showed that wastewater samples from certain locations carried ARGs unique to those locations, suggesting that site-specific factors can influence ARG profiles. Comparing untreated and treated samples revealed that treatment plants were able to remove certain ARGs, but some ARGs proliferated after treatment irrespective of location and treatment-plant-specific variables. When different environments were analyzed, the approach identified ARGs specific to certain environments. The results of this study suggest that the proposed method is effective for comprehensive analyses and can provide valuable information to better understand antibiotic resistance. In essence, it is a valuable addition for improved surveillance of antibiotic resistance pollution and for the framing of best management practices. (en)
dc.description.degree: MS (en)
dc.format.medium: ETD (en)
dc.identifier.other: vt_gsexam:16015 (en)
dc.identifier.uri: http://hdl.handle.net/10919/96025 (en)
dc.publisher: Virginia Tech (en)
dc.rights: In Copyright (en)
dc.rights.uri: http://rightsstatements.org/vocab/InC/1.0/ (en)
dc.subject: Antibiotic resistance genes (en)
dc.subject: ARGs (en)
dc.subject: aquatic environments (en)
dc.subject: ensemble learning (en)
dc.subject: extremely randomized trees (en)
dc.subject: wastewater (en)
dc.title: Metagenomic Data Analysis Using Extremely Randomized Tree Algorithm (en)
dc.type: Thesis (en)
thesis.degree.discipline: Civil Engineering (en)
thesis.degree.grantor: Virginia Polytechnic Institute and State University (en)
thesis.degree.level: masters (en)
thesis.degree.name: MS (en)

Files

Original bundle
Name: Gupta_S_T_2018.pdf
Size: 1.73 MB
Format: Adobe Portable Document Format
