Browsing by Author "Gupta, Suraj"
- AgroSeek: a system for computational analysis of environmental metagenomic data and associated metadata
Liang, Xiao; Akers, Kyle; Keenum, Ishi M.; Wind, Lauren L.; Gupta, Suraj; Chen, Chaoqi; Aldaihani, Reem; Pruden, Amy; Zhang, Liqing; Knowlton, Katharine F.; Xia, Kang; Heath, Lenwood S. (2021-03-10)
Background: Metagenomics is gaining attention as a powerful tool for identifying how agricultural management practices influence human and animal health, especially in terms of potential to contribute to the spread of antibiotic resistance. However, comparing the distribution and prevalence of antibiotic resistance genes (ARGs) across multiple studies and environments is currently impossible without a complete re-analysis of published datasets. This challenge must be addressed for metagenomics to realize its potential for helping guide effective policy and practice measures relevant to agricultural ecosystems, for example, identifying critical control points for mitigating the spread of antibiotic resistance.
Results: Here we introduce AgroSeek, a centralized web-based system that provides computational tools for analysis and comparison of metagenomic data sets tailored specifically to researchers and other users in the agricultural sector interested in tracking and mitigating the spread of ARGs. AgroSeek draws from rich, user-provided metagenomic data and metadata to facilitate analysis, comparison, and prediction in a user-friendly fashion. Further, AgroSeek draws from publicly contributed data sets to provide a point of comparison and context for data analysis. To incorporate metadata into our analysis and comparison procedures, we provide flexible metadata templates, including user-customized metadata attributes to facilitate data sharing, while maintaining the metadata in a comparable fashion for the broader user community and to support large-scale comparative and predictive analysis.
Conclusion: AgroSeek provides an easy-to-use tool for environmental metagenomic analysis and comparison, based on both gene annotations and associated metadata, with this initial demonstration focusing on control of antibiotic resistance in agricultural ecosystems. AgroSeek creates a space for metagenomic data sharing and collaboration to assist policy makers, stakeholders, and the public in decision-making. AgroSeek is publicly available at https://agroseek.cs.vt.edu/ .
- Antibiotic Resistance Characterization in Human Fecal and Environmental Resistomes using Metagenomics and Machine Learning
Gupta, Suraj (Virginia Tech, 2021-11-03)
Antibiotic resistance is a global threat that can severely imperil public health. To curb the spread of antibiotic resistance, it is imperative that efforts commensurate with a “One Health” approach are undertaken. Given that interconnectivities among ecosystems can serve as conduits for the proliferation and dissemination of antibiotic resistance, it is increasingly being recognized that a robust global environmental surveillance framework is required to promote One Health. The ideal aim would be to develop approaches that inform global distribution of antibiotic resistance, help prioritize monitoring targets, present robust data analysis frameworks to profile resistance, and ultimately help build strategies to curb the dissemination of antibiotic resistance. The work described in this dissertation was aimed at evaluating and developing different data analysis paradigms and their applications in investigating and characterizing antibiotic resistance across different resistomes. The applications presented in Chapter 2 illustrate challenges associated with various environmental data types (especially metagenomics data) and present a path to advance incorporation of data analytics approaches in Environmental Science and Engineering research and applications. Chapter 3 presents a novel approach, ExtrARG, that identifies discriminatory ARGs among resistomes based on factors of interest. The results in Chapter 4 provide insight into the global distribution of ARGs across human fecal and sewage resistomes from different socioeconomic settings. Chapter 5 demonstrates a data analysis paradigm using machine learning algorithms that helps bridge the gap between information obtained via culturing and metagenomic sequencing. Lastly, the results of Chapter 6 illustrate the contribution of phages to antibiotic resistance.
Overall, the findings provide guidance and approaches for profiling antibiotic resistance using metagenomics and machine learning. The results reported further expand the knowledge on the distribution of antibiotic resistance across different resistomes.
- ARGem: a new metagenomics pipeline for antibiotic resistance genes: metadata, analysis, and visualization
Liang, Xiao; Zhang, Jingyi; Kim, Yoonjin; Ho, Josh; Liu, Kevin; Keenum, Ishi M.; Gupta, Suraj; Davis, Benjamin; Hepp, Shannon L.; Zhang, Liqing; Xia, Kang; Knowlton, Katharine F.; Liao, Jingqiu; Vikesland, Peter J.; Pruden, Amy; Heath, Lenwood S. (Frontiers, 2023-09-15)
Antibiotic resistance is of crucial interest to both human and animal medicine. It has been recognized that increased environmental monitoring of antibiotic resistance is needed. Metagenomic DNA sequencing is becoming an attractive method to profile antibiotic resistance genes (ARGs), including a special focus on pathogens. A number of computational pipelines are available and under development to support environmental ARG monitoring; the pipeline we present here is promising for general adoption for the purpose of harmonized global monitoring. Specifically, ARGem is a user-friendly pipeline that provides full-service analysis, from the initial DNA short reads to the final visualization of results. The capture of extensive metadata is also facilitated to support comparability across projects and broader monitoring goals. The ARGem pipeline offers efficient analysis of a modest number of samples along with affordable computational components, though the throughput could be increased through cloud resources, based on the user’s configuration. The pipeline components were carefully assessed and selected to satisfy tradeoffs, balancing efficiency and flexibility. It was essential to provide a step to perform short read assembly in a reasonable time frame to ensure accurate annotation of identified ARGs. Comprehensive ARG and mobile genetic element databases are included in ARGem for annotation support.
ARGem further includes an expandable set of analysis tools that include statistical and network analysis and supports various useful visualization techniques, including Cytoscape visualization of co-occurrence and correlation networks. The performance and flexibility of the ARGem pipeline are demonstrated with analysis of aquatic metagenomes. The pipeline is freely available at https://github.com/xlxlxlx/ARGem.
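As an illustration of the correlation networks mentioned in the ARGem abstract, the following is a minimal sketch of how an edge list for such a network might be derived from per-sample abundances using Spearman rank correlation. It is not taken from the ARGem code base: the function names, the 0.8 threshold, and the abundance values are all assumptions made for this example.

```python
from itertools import combinations


def ranks(values):
    """Return 1-based average ranks, giving tied values the same rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / (vx * vy) ** 0.5


def spearman(x, y):
    """Spearman correlation is the Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def correlation_edges(abundance, threshold=0.8):
    """Return (gene_a, gene_b, rho) for gene pairs with |rho| >= threshold."""
    edges = []
    for a, b in combinations(sorted(abundance), 2):
        rho = spearman(abundance[a], abundance[b])
        if abs(rho) >= threshold:
            edges.append((a, b, round(rho, 3)))
    return edges


# Hypothetical relative abundances of three genes across five samples.
abundance = {
    "sul1": [0.10, 0.20, 0.30, 0.40, 0.50],
    "intI1": [0.05, 0.09, 0.20, 0.22, 0.31],
    "tetM": [0.50, 0.10, 0.40, 0.20, 0.30],
}
edges = correlation_edges(abundance)
print(edges)
```

An edge list of this shape could be written out as a table and loaded into a network viewer such as Cytoscape. A real analysis would also need significance testing and multiple-comparison correction, which this sketch omits.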
- Demonstrating a Comprehensive Wastewater-Based Surveillance Approach That Differentiates Globally Sourced Resistomes
Prieto Riquelme, Maria Virginia; Garner, Emily; Gupta, Suraj; Metch, Jake; Zhu, Ni; Blair, Matthew F.; Arango-Argoty, Gustavo; Maile-Moskowitz, Ayella; Li, An-dong; Flach, Carl-Fredrik; Aga, Diana S.; Nambi, Indumathi M.; Larsson, D. G. Joakim; Bürgmann, Helmut; Zhang, Tong; Pruden, Amy; Vikesland, Peter J. (ACS, 2022-06-27)
Wastewater-based surveillance (WBS) for disease monitoring is highly promising but requires consistent methodologies that incorporate predetermined objectives, targets, and metrics. Herein, we describe a comprehensive metagenomics-based approach for global surveillance of antibiotic resistance in sewage that enables assessment of 1) which antibiotic resistance genes (ARGs) are shared across regions/communities; 2) which ARGs are discriminatory; and 3) factors associated with overall trends in ARGs, such as antibiotic concentrations. Across an internationally sourced transect of sewage samples collected using a centralized, standardized protocol, ARG relative abundances (16S rRNA gene-normalized) were highest in Hong Kong and India and lowest in Sweden and Switzerland, reflecting national policy, measured antibiotic concentrations, and metal resistance genes. Asian versus European/US resistomes were distinct, with macrolide-lincosamide-streptogramin, phenicol, quinolone, and tetracycline versus multidrug resistance ARGs being discriminatory, respectively. Regional trends in measured antibiotic concentrations differed from trends expected from public sales data. This could reflect unaccounted uses, captured only by the WBS approach. If properly benchmarked, antibiotic WBS might complement public sales and consumption statistics in the future. The WBS approach defined herein demonstrates multisite comparability and sensitivity to local/regional factors.
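The abstract above reports ARG relative abundances that are 16S rRNA gene-normalized but does not give the formula. The sketch below shows one common way such a normalization can be computed, dividing length-corrected ARG read counts by length-corrected 16S rRNA gene read counts. The length correction, the default gene length of roughly 1,500 bp, the function name, and the numbers are assumptions for illustration, not the authors' stated method.

```python
def normalized_abundance(arg_reads, arg_length, rrna_reads, rrna_length=1500):
    """Approximate ARG copies per copy of the 16S rRNA gene.

    Read counts are divided by gene length so that long genes, which
    attract more reads, are not over-counted. The default 16S rRNA gene
    length of 1500 bp is an approximation assumed for this sketch.
    """
    if arg_length <= 0 or rrna_length <= 0:
        raise ValueError("gene lengths must be positive")
    if rrna_reads <= 0:
        raise ValueError("16S rRNA read count must be positive")
    return (arg_reads / arg_length) / (rrna_reads / rrna_length)


# Hypothetical sample: 120 reads hit a 1,200 bp ARG; 3,000 reads hit 16S rRNA genes.
value = normalized_abundance(arg_reads=120, arg_length=1200, rrna_reads=3000)
print(f"{value:.3f} ARG copies per 16S rRNA gene copy")
```

Normalizing to the 16S rRNA gene lets samples with different sequencing depths and bacterial loads be compared on a per-bacterial-marker basis, which is what makes a multi-country comparison of this kind possible.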
- Evaluation of Metagenomic-Enabled Antibiotic Resistance Surveillance at a Conventional Wastewater Treatment Plant
Majeed, Haniyyah J.; Riquelme, Maria V.; Davis, Benjamin C.; Gupta, Suraj; Angeles, Luisa F.; Aga, Diana S.; Garner, Emily; Pruden, Amy; Vikesland, Peter J. (Frontiers, 2021-05-13)
Wastewater treatment plants (WWTPs) receive a confluence of sewage containing antimicrobials, antibiotic resistant bacteria, antibiotic resistance genes (ARGs), and pathogens and thus are a key point of interest for antibiotic resistance surveillance. WWTP monitoring has the potential to inform with respect to the antibiotic resistance status of the community served as well as the potential for ARGs to escape treatment. However, there is a lack of agreement regarding suitable sampling frequencies and monitoring targets to facilitate comparison within and among individual WWTPs. The objective of this study was to comprehensively evaluate patterns in metagenomic-derived indicators of antibiotic resistance through various stages of treatment at a conventional WWTP for the purpose of informing local monitoring approaches that are also informative for global comparison. Relative abundance of total ARGs decreased by ∼50% from the influent to the effluent, with each sampling location defined by a unique resistome (i.e., total ARG) composition. However, 90% of the ARGs found in the effluent were also detected in the influent, while the effluent ARG-pathogen taxonomic linkage patterns identified in assembled metagenomes were more similar to patterns in regional clinical surveillance data than the patterns identified in the influent. Analysis of core and discriminatory resistomes and general ARG trends across the eight sampling events (i.e., tendency to be removed, increase, decrease, or be found in the effluent only), along with quantification of ARGs of clinical concern, aided in identifying candidate ARGs for surveillance.
Relative resistome risk characterization further provided a comprehensive metric for predicting the relative mobility of ARGs and likelihood of being carried in pathogens and can help to prioritize where to focus future monitoring and mitigation. Most antibiotics that were subject to regional resistance testing were also found in the WWTP, with the total antibiotic load decreasing by ∼40–50%, but no strong correlations were found between antibiotics and corresponding ARGs. Overall, this study provides insight into how metagenomic data can be collected and analyzed for surveillance of antibiotic resistance at WWTPs, suggesting that effluent is a beneficial monitoring point with relevance both to the local clinical condition and for assessing efficacy of wastewater treatment in reducing risk of disseminating antibiotic resistance.
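The summary statistics quoted in the WWTP abstract (percent reduction in total ARG relative abundance, the share of effluent ARGs also seen in the influent, and per-gene trends) can be sketched as below. The function names, the gene labels, and the abundance values are hypothetical; the study's actual calculations may differ, for example in how detection limits are handled.

```python
def percent_reduction(influent_total, effluent_total):
    """Percent decrease in total ARG relative abundance, influent to effluent."""
    if influent_total <= 0:
        raise ValueError("influent total must be positive")
    return 100.0 * (influent_total - effluent_total) / influent_total


def fraction_shared(effluent_args, influent_args):
    """Fraction of ARGs detected in the effluent that were also in the influent."""
    effluent_args = set(effluent_args)
    if not effluent_args:
        return 0.0
    return len(effluent_args & set(influent_args)) / len(effluent_args)


def trend(influent, effluent):
    """Classify one ARG by how its abundance changes across treatment."""
    if influent == 0 and effluent == 0:
        return "not detected"
    if effluent == 0:
        return "removed"
    if influent == 0:
        return "effluent only"
    if effluent > influent:
        return "increase"
    if effluent < influent:
        return "decrease"
    return "unchanged"


# Hypothetical relative abundances (arbitrary units) for four placeholder genes.
influent = {"argA": 4.0, "argB": 3.0, "argC": 1.0, "argD": 0.0}
effluent = {"argA": 1.0, "argB": 0.0, "argC": 2.0, "argD": 1.0}

detected_in = {gene for gene, value in influent.items() if value > 0}
detected_out = {gene for gene, value in effluent.items() if value > 0}

print(percent_reduction(sum(influent.values()), sum(effluent.values())))
print(fraction_shared(detected_out, detected_in))
print({gene: trend(influent[gene], effluent[gene]) for gene in influent})
```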
- Identification of discriminatory antibiotic resistance genes among environmental resistomes using extremely randomized tree algorithm
Gupta, Suraj; Arango-Argoty, Gustavo; Zhang, Liqing; Pruden, Amy; Vikesland, Peter J. (2019-08-29)
Background: The interconnectivities of built and natural environments can serve as conduits for the proliferation and dissemination of antibiotic resistance genes (ARGs). Several studies have compared the broad spectrum of ARGs (i.e., “resistomes”) in various environmental compartments, but there is a need to identify unique ARG occurrence patterns (i.e., “discriminatory ARGs”), characteristic of each environment. Such an approach will help to identify factors influencing ARG proliferation, facilitate development of relative comparisons of the ARGs distinguishing various environments, and help pave the way towards ranking environments based on their likelihood of contributing to the spread of clinically relevant antibiotic resistance. Here we formulate and demonstrate an approach using an extremely randomized tree (ERT) algorithm combined with a Bayesian optimization technique to capture ARG variability in environmental samples and identify the discriminatory ARGs. The potential of ERT for identifying discriminatory ARGs was first evaluated using in silico metagenomic datasets (simulated metagenomic Illumina sequencing data) with known variability. The application of ERT was then demonstrated through analyses using publicly available and in-house metagenomic datasets associated with (1) different aquatic habitats (e.g., river, wastewater influent, hospital effluent, and dairy farm effluent) to compare resistomes between distinct environments and (2) different river samples (i.e., Amazon, Kalamas, and Cam Rivers) to compare resistome characteristics of similar environments.
Results: The approach was found to readily identify discriminatory ARGs in the in silico datasets. Also, it was not found to be biased towards ARGs with high relative abundance, which is a common limitation of feature projection methods, and instead only captured those ARGs that elicited significant profiles. Analyses of publicly available metagenomic datasets further demonstrated that the ERT approach can effectively differentiate real-world environmental samples and identify discriminatory ARGs based on pre-defined categorizing schemes.
Conclusions: Here a new methodology was formulated to characterize and compare variances in ARG profiles between metagenomic data sets derived from similar/dissimilar environments. Specifically, discriminatory ARGs among samples representing various environments can be identified based on factors of interest. The methodology could prove to be a particularly useful tool for ARG surveillance and the assessment of the effectiveness of strategies for mitigating the spread of antibiotic resistance. The Python package is hosted in the Git repository: https://github.com/gaarangoa/ExtrARG
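To make the idea of ranking ARGs with extremely randomized trees concrete, here is a small pure-Python sketch. It is not the ExtrARG implementation and it omits the Bayesian optimization of hyperparameters described in the abstract. It assumes fully grown trees, one random threshold per feature at every node, and importance measured as total Gini impurity decrease; the feature names and sample values are hypothetical. Libraries such as scikit-learn provide a production-grade `ExtraTreesClassifier` for real analyses.

```python
import random


def gini(labels):
    """Gini impurity of a non-empty list of class labels."""
    counts = {}
    for label in labels:
        counts[label] = counts.get(label, 0) + 1
    return 1.0 - sum((c / len(labels)) ** 2 for c in counts.values())


def grow(X, y, rows, rng, importance):
    """Grow one tree over `rows`, adding impurity decreases to `importance`."""
    parent = gini([y[i] for i in rows])
    if parent == 0.0:
        return  # node is pure, nothing left to split
    best = None
    for f in range(len(X[0])):
        values = [X[i][f] for i in rows]
        low, high = min(values), max(values)
        if low == high:
            continue
        cut = rng.uniform(low, high)  # random threshold, not an optimized one
        left = [i for i in rows if X[i][f] < cut]
        right = [i for i in rows if X[i][f] >= cut]
        if not left or not right:
            continue
        child = (len(left) * gini([y[i] for i in left])
                 + len(right) * gini([y[i] for i in right])) / len(rows)
        if best is None or parent - child > best[0]:
            best = (parent - child, f, left, right)
    if best is None:
        return
    gain, f, left, right = best
    importance[f] += gain * len(rows)
    grow(X, y, left, rng, importance)
    grow(X, y, right, rng, importance)


def discriminatory_features(X, y, names, n_trees=200, seed=0):
    """Rank features by normalized impurity decrease over the whole ensemble."""
    rng = random.Random(seed)
    importance = [0.0] * len(names)
    for _ in range(n_trees):
        # Extra-trees use the full sample for every tree (no bootstrap).
        grow(X, y, list(range(len(X))), rng, importance)
    total = sum(importance) or 1.0
    ranked = sorted(zip(names, importance), key=lambda p: p[1], reverse=True)
    return [(name, score / total) for name, score in ranked]


# Hypothetical relative abundances for three placeholder ARGs in eight samples.
names = ["low_abundance_discriminatory", "high_abundance_shared", "noise"]
X = [
    [0.90, 100.0, 0.30], [1.00, 98.0, 0.70], [1.10, 103.0, 0.50], [0.80, 101.0, 0.60],
    [0.10, 99.0, 0.40], [0.20, 102.0, 0.60], [0.15, 100.5, 0.45], [0.05, 97.0, 0.65],
]
y = ["river"] * 4 + ["wastewater"] * 4

ranking = discriminatory_features(X, y, names)
for name, score in ranking:
    print(f"{name}: {score:.2f}")
```

In this toy data the gene that separates the two groups has a far lower abundance than the shared gene, yet it should rank first, which mirrors the abstract's point that the approach is not driven by high relative abundance alone.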
- Integrated Metagenomic Assessment of Multiple Pre-harvest Control Points on Lettuce Resistomes at Field-Scale
Wind, Lauren L.; Keenum, Ishi M.; Gupta, Suraj; Ray, Partha P.; Knowlton, Katharine F.; Ponder, Monica A.; Hession, W. Cully; Pruden, Amy; Krometis, Leigh-Anne H. (Frontiers, 2021-07-09)
An integrated understanding of factors influencing the occurrence, distribution, and fate of antibiotic resistance genes (ARGs) in vegetable production systems is needed to inform the design and development of strategies for mitigating the potential for antibiotic resistance propagation in the food chain. The goal of the present study was to holistically track antibiotic resistance and associated microbiomes at three distinct pre-harvest control points in an agroecosystem in order to identify the potential impacts of key agricultural management strategies. Samples were collected over the course of a single growing season (67 days) from field-scale plots amended with various organic and inorganic amendments at agronomic rates. Dairy-derived manure and compost amendment samples (n = 14), soil samples (n = 27), and lettuce samples (n = 12) were analyzed via shotgun metagenomics to assess multiple pre-harvest factors as hypothetical control points that shape lettuce resistomes. Pre-harvest factors of interest included manure collection during/post antibiotic use, manure composting, and soil amended with organic (stockpiled manure/compost) versus chemical fertilizer. Microbial community resistome and taxonomic compositions were unique from amendment to soil to lettuce surface according to dissimilarity analysis. The highest resistome alpha diversity (i.e., unique ARGs, n = 642) was detected in amendment samples prior to soil application, while the composted manure had the lowest total ARG relative abundance (i.e., 16S rRNA gene-normalized).
Regardless of amendment type, soils acted as an apparent ecological buffer, i.e., soil resistome and taxonomic profiles returned to background conditions 67 d after amendment application. Effects of amendment conditions surprisingly re-emerged in lettuce phyllosphere resistomes, with the highest total ARG relative abundances recovered on the surface of lettuce plants grown in organically fertilized soils (i.e., compost- and manure-amended soils). Co-occurrence analysis identified 55 unique ARGs found both in the soil amendments and on lettuce surfaces. Among these, arnA and pmrF were the most abundant ARGs co-occurring with mobile genetic elements (MGE). Other prominent ARG-MGE co-occurrences throughout this pre-harvest lettuce production chain included: TetM to transposon (Clostridioides difficile) in the manure amendment and TriC to plasmid (Ralstonia solanacearum) on the lettuce surfaces. This suggests that, even with imposing manure management and post-amendment wait periods in agricultural systems, ARGs originating from manure can still be found on crop surfaces. This study demonstrates a comprehensive approach to identifying key control points for the propagation of ARGs in vegetable production systems, identifying potential ARG-MGE combinations that could inform future surveillance. The findings suggest that additional pre-harvest and potentially post-harvest interventions may be warranted to minimize risk of propagating antibiotic resistance in the food chain.
- Integration and Implementation (INT) CS 5604 F2020
Hicks, Alexander; Thazhath, Mohit; Gupta, Suraj; Long, Xingyu; Poland, Cherie; Hsieh, Hsinhan; Mahajan, Yash (Virginia Tech, 2020-12-18)
The first major goal of this project is to build a state-of-the-art information storage, retrieval, and analysis system that utilizes the latest technology and industry methods. This system is leveraged to accomplish another major goal, supporting modern search and browse capabilities for a large collection of tweets from the Twitter social media platform, web pages, and electronic theses and dissertations (ETDs). The backbone of the information system is a Docker container cluster running with Rancher and Kubernetes. Information retrieval and visualization are accomplished with containers in a pipelined fashion, whether in the cluster or on virtual machines, for Elasticsearch and Kibana, respectively. In addition to traditional searching and browsing, the system supports full-text and metadata searching. Search results include facets as a modern means of browsing among related documents. The system supports text analysis and machine learning to reveal new properties of collection data. These new properties assist in the generation of available facets. Recommendations are also presented with search results based on associations among documents and with logged user activity. The information system is co-designed by five teams of Virginia Tech graduate students, all members of the same computer science class, CS 5604. Although the project is an academic exercise, it is the practice of the teams to work and interact as though they are groups within a company developing a product. The teams on this project include three collection management groups -- Electronic Theses and Dissertations (ETD), Tweets (TWT), and Web-Pages (WP) -- as well as the Front-end (FE) group and the Integration (INT) group to help provide the overarching structure for the application.
This submission focuses on the work of the Integration (INT) team, which creates and administers Docker containers for each team in addition to administering the cluster infrastructure. Each container is a customized application environment that is specific to the needs of the corresponding team. Each team will have several of these containers set up in a pipeline formation to allow scaling and extension of the current system. The INT team also contributes to a cross-team effort for exploring the use of Elasticsearch and its internally associated database. The INT team administers the integration of the Ceph data storage system into the CS Department Cloud and provides support for interactions between containers and the Ceph filesystem. During formative stages of development, the INT team also has a role in guiding team evaluations of prospective container components and workflows. The INT team is responsible for the overall project architecture and facilitating the tools and tutorials that assist the other teams in deploying containers in a development environment according to mutual specifications agreed upon with each team. The INT team maintains the status of the Kubernetes cluster, deploying new containers and pods as needed by the collection management teams as they expand their workflows. This team is responsible for utilizing a continuous integration process to update existing containers. During the development stage the INT team collaborates specifically with the collection management teams to create the pipeline for the ingestion and processing of new collection documents, crossing services between those teams as needed. The INT team develops a reasoner engine to construct workflows with an information goal as input, which are then programmatically authored, scheduled, and monitored using Apache Airflow.
The INT team is responsible for the flow, management, and logging of system performance data and making any adjustments necessary based on the analysis of testing results. The INT team has established a GitLab repository for archival code related to the entire project and has provided the other groups with the documentation to deposit their code in the repository. This repository will be expanded using GitLab CI in order to provide continuous integration and testing once it is available. Finally, the INT team will provide a production distribution that includes all embedded Docker containers and sub-embedded Git source code repositories. The INT team will archive this distribution on the Virginia Tech Docker Container Registry and deploy it on the Virginia Tech CS Cloud. The INT-2020 team owes a sincere debt of gratitude to the work of the INT-2019 team. This is a very large undertaking and the wrangling of all of the products and processes would not have been possible without their guidance in both direct and written form. We have relied heavily on the foundation they and their predecessors have provided for us. We continue their work with systematic improvements, but also want to acknowledge their efforts. Without them, our progress to date would not have been possible.
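The abstract describes a reasoner engine that takes an information goal and constructs a workflow, which is then run under Apache Airflow. The report's actual design is not given here, so the following is only a sketch of one way such goal-driven planning could work: back-chain from the goal through services that declare what they need and what they produce, then order the selected services topologically. The service names, the artifact names, and the `plan_workflow` function are all hypothetical, and the standard-library `graphlib` module (Python 3.9+) stands in for the scheduling that Airflow would perform.

```python
from graphlib import TopologicalSorter

# Hypothetical service catalog: each service needs some artifacts and makes one.
SERVICES = {
    "parse_etd": {"needs": ["raw_pdf"], "makes": "text"},
    "extract_keywords": {"needs": ["text"], "makes": "keywords"},
    "index_documents": {"needs": ["text", "keywords"], "makes": "search_index"},
    "cluster_documents": {"needs": ["text"], "makes": "clusters"},
}


def plan_workflow(goal, available, services=SERVICES):
    """Return the services to run, in order, to produce the `goal` artifact."""
    producers = {spec["makes"]: name for name, spec in services.items()}
    graph = {}  # service -> set of services that must run before it
    pending = [goal]
    while pending:
        artifact = pending.pop()
        if artifact in available:
            continue  # already on hand, no service required
        if artifact not in producers:
            raise ValueError(f"no service produces {artifact!r}")
        service = producers[artifact]
        if service in graph:
            continue  # already planned
        needs = services[service]["needs"]
        graph[service] = {
            producers[a] for a in needs if a in producers and a not in available
        }
        pending.extend(needs)
    # static_order() yields each service after all of its prerequisites.
    return list(TopologicalSorter(graph).static_order())


steps = plan_workflow("search_index", available={"raw_pdf"})
print(steps)
```

Only the services needed for the goal are selected, so `cluster_documents` is left out of this plan. In an Airflow deployment each planned step would typically become a task and each prerequisite a task dependency.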
- Metagenomic Data Analysis Using Extremely Randomized Tree Algorithm
Gupta, Suraj (Virginia Tech, 2018-06-26)
Many antibiotic resistance genes (ARGs) conferring resistance to a broad range of antibiotics have often been detected in aquatic environments such as untreated and treated wastewater, river and surface water. ARG proliferation in the aquatic environment could depend upon various factors such as geospatial variations, the type of aquatic body, and the type of wastewater (untreated or treated) discharged into these aquatic environments. Likewise, the strong interconnectivity of aquatic systems may accelerate the spread of ARGs through them. Hence a comparative and holistic study of different aquatic environments is required to appropriately comprehend the problem of antibiotic resistance. Many studies approach this issue using molecular techniques such as metagenomic sequencing and metagenomic data analysis. Such analyses compare the broad spectrum of ARGs in water and wastewater samples, but these studies use comparisons which are limited to similarity/dissimilarity analyses. However, in such analyses, the discriminatory ARGs (associated ARGs driving such similarity/dissimilarity measures) may not be identified. Consequently, the factors that drive the dissimilarities among the samples would not be identified and the reason for antibiotic resistance proliferation may not be clearly understood. In this study, an effective methodology, using the Extremely Randomized Trees (ET) algorithm, was formulated and demonstrated to capture such ARG variations and identify discriminatory ARGs among environmentally derived metagenomes. In this study, data were grouped by: geographic location (to understand the spread of ARGs globally), untreated vs. treated wastewater (to see the effectiveness of WWTPs in removing ARGs), and different aquatic habitats (to understand the impact and spread within aquatic habitats).
It was observed that certain ARGs were specific to wastewater samples from certain locations, suggesting that site-specific factors can affect ARG profiles. Comparing untreated and treated wastewater samples from different WWTPs revealed that biological treatments have a definite impact on shaping the ARG profile. While several ARGs were removed after the treatment, some ARGs showed an increase in relative abundance irrespective of location and treatment plant specific variables. On comparing different aquatic environments, the algorithm identified ARGs which were specific to certain environments. The algorithm captured certain ARGs which were specific to hospital discharges when compared with other aquatic environments. It was determined that the proposed method was efficient in identifying the discriminatory ARGs which could classify the samples according to their groups. Further, it was also effective in capturing low-level variations which generally get overshadowed in the analysis due to highly abundant genes. The results of this study suggest that the proposed method is an effective method for comprehensive analyses and can provide valuable information to better understand antibiotic resistance.
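The thesis abstract contrasts similarity/dissimilarity analyses with methods that identify which ARGs drive the differences. The sketch below uses Bray-Curtis dissimilarity, one widely used measure chosen here as an assumption since the abstract does not name a specific one, to show why a single dissimilarity value can hide a low-abundance gene. The gene labels and abundances are hypothetical.

```python
def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two abundance profiles (dicts)."""
    genes = set(a) | set(b)
    total = sum(a.get(g, 0) + b.get(g, 0) for g in genes)
    if total == 0:
        return 0.0
    return sum(abs(a.get(g, 0) - b.get(g, 0)) for g in genes) / total


def per_gene_contribution(a, b):
    """Share of the Bray-Curtis dissimilarity contributed by each gene."""
    genes = set(a) | set(b)
    total = sum(a.get(g, 0) + b.get(g, 0) for g in genes)
    if total == 0:
        return {g: 0.0 for g in genes}
    return {g: abs(a.get(g, 0) - b.get(g, 0)) / total for g in genes}


# Hypothetical profiles: the rare gene occurs in only one sample, yet the
# abundant gene's small relative fluctuation dominates the dissimilarity.
sample_1 = {"abundant_ARG": 100, "rare_ARG": 0}
sample_2 = {"abundant_ARG": 90, "rare_ARG": 2}

print(bray_curtis(sample_1, sample_2))
print(per_gene_contribution(sample_1, sample_2))
```

Here the rare gene is the only one that is present in one sample and absent from the other, but it accounts for a small part of the overall dissimilarity. This is the kind of low-level variation that the abstract says tends to be overshadowed by highly abundant genes.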
- mobileOG-db: a Manually Curated Database of Protein Families Mediating the Life Cycle of Bacterial Mobile Genetic Elements
Brown, Connor L.; Mullet, James; Hindi, Fadi; Stoll, James E.; Gupta, Suraj; Choi, Minyoung; Keenum, Ishi M.; Vikesland, Peter J.; Pruden, Amy; Zhang, Liqing (American Society for Microbiology, 2022-08-29)
Bacterial mobile genetic elements (MGEs) encode functional modules that perform both core and accessory functions for the element, the latter of which are often only transiently associated with the element. The presence of these accessory genes, which are often close homologs to primarily immobile genes, incurs high rates of false positives and, therefore, limits the usability of existing MGE databases for annotation. To overcome this limitation, we analyzed 10,776,849 protein sequences derived from eight MGE databases to compile a comprehensive set of 6,140 manually curated protein families that are linked to the “life cycle” (integration/excision, replication/recombination/repair, transfer, stability/transfer/defense, and phage-specific processes) of plasmids, phages, integrative, transposable, and conjugative elements. We overlay experimental information where available to create a tiered annotation scheme of high-quality annotations and annotations inferred exclusively through bioinformatic evidence. We additionally provide an MGE-class label for each entry (e.g., plasmid or integrative element), and assign to each entry a major and minor category. The resulting database, mobileOG-db (for mobile orthologous groups), comprises over 700,000 deduplicated sequences encompassing five major mobileOG categories and more than 50 minor categories, providing a structured language and interpretable basis for an array of MGE-centered analyses. mobileOG-db can be accessed at mobileogdb.flsi.cloud.vt.edu/, where users can select, refine, and analyze custom subsets of the dynamic mobilome.