Browsing by Author "Shukla, Maulik"
- Antimicrobial Resistance Prediction in PATRIC and RAST
  Davis, James J.; Boisvert, Sebastien; Brettin, Thomas; Kenyon, Ronald W.; Mao, Chunhong; Olson, Robert D.; Overbeek, Ross; Santerre, John; Shukla, Maulik; Wattam, Alice R.; Will, Rebecca; Xia, Fangfang; Stevens, Rick L. (Springer Nature, 2016-06-14)
  The emergence and spread of antimicrobial resistance (AMR) mechanisms in bacterial pathogens, coupled with the dwindling number of effective antibiotics, has created a global health crisis. Being able to identify the genetic mechanisms of AMR and predict the resistance phenotypes of bacterial pathogens prior to culturing could inform clinical decision-making and improve reaction time. At PATRIC (http://patricbrc.org/), we have been collecting bacterial genomes with AMR metadata for several years. In order to advance phenotype prediction and the identification of genomic regions relating to AMR, we have updated the PATRIC FTP server to enable access to genomes that are binned by their AMR phenotypes, as well as metadata including minimum inhibitory concentrations. Using this infrastructure, we custom built AdaBoost (adaptive boosting) machine learning classifiers for identifying carbapenem resistance in Acinetobacter baumannii, methicillin resistance in Staphylococcus aureus, and beta-lactam and co-trimoxazole resistance in Streptococcus pneumoniae with accuracies ranging from 88-99%. We also did this for isoniazid, kanamycin, ofloxacin, rifampicin, and streptomycin resistance in Mycobacterium tuberculosis, achieving accuracies ranging from 71-88%. This set of classifiers has been used to provide an initial framework for species-specific AMR phenotype and genomic feature prediction in the RAST and PATRIC annotation services.
- Comparative Genomics of Early-Diverging Brucella Strains Reveals a Novel Lipopolysaccharide Biosynthesis Pathway
  Wattam, Alice R.; Inzana, Thomas J.; Williams, Kelly P.; Mane, Shrinivasrao P.; Shukla, Maulik; Almeida, Nalvo F.; Dickerman, Allan W.; Mason, Steven; Moriyon, Ignacio; O'Callaghan, David; Whatmore, Adrian M.; Sobral, Bruno; Tiller, Rebekah V.; Hoffmaster, Alex R.; Frace, Michael A.; De Castro, Cristina; Molinaro, Antonio; Boyle, Stephen M.; De, Barun K.; Setubal, Joao C. (American Society for Microbiology, 2012-11)
  Brucella species are Gram-negative bacteria that infect mammals. Recently, two unusual strains (Brucella inopinata BO1T and B. inopinata-like BO2) have been isolated from human patients, and their similarity to some atypical brucellae isolated from Australian native rodent species was noted. Here we present a phylogenomic analysis of the draft genome sequences of BO1T and BO2 and of the Australian rodent strains 83-13 and NF2653 that shows that they form two groups well separated from the other sequenced Brucella spp. Several important differences were noted. Both BO1T and BO2 did not agglutinate significantly when live or inactivated cells were exposed to monospecific A and M antisera against O-side chain sugars composed of N-formyl-perosamine. While BO1T maintained the genes required to synthesize a typical Brucella O-antigen, BO2 lacked many of these genes but still produced a smooth LPS (lipopolysaccharide). Most missing genes were found in the wbk region involved in O-antigen synthesis in classic smooth Brucella spp. In their place, BO2 carries four genes that other bacteria use for making a rhamnose-based O-antigen. Electrophoretic, immunoblot, and chemical analyses showed that BO2 carries an antigenically different O-antigen made of repeating hexose-rich oligosaccharide units that made the LPS water-soluble, which contrasts with the homopolymeric O-antigen of other smooth brucellae that have a phenol-soluble LPS. The results demonstrate the existence of a group of early-diverging brucellae with traits that depart significantly from those of the Brucella species described thus far.
  IMPORTANCE: This report examines differences between genomes from four new Brucella strains and those from the classic Brucella spp. Our results show that the four new strains are outliers with respect to the previously known Brucella strains and yet are part of the genus, forming two new clades. The analysis revealed important information about the evolution and survival mechanisms of Brucella species, helping reshape our knowledge of this important zoonotic pathogen. One discovery of special importance is that one of the strains, BO2, produces an O-antigen distinct from any that has been seen in any other Brucella isolates to date.
- GeneSieve: A Probe Selection Strategy for cDNA Microarrays
  Shukla, Maulik (Virginia Tech, 2004-08-24)
  The DNA microarray is a powerful tool to study expression levels of thousands of genes simultaneously. Often, cDNA libraries representing expressed genes of an organism are available, along with expressed sequence tags (ESTs). ESTs are widely used as the probes for microarrays. Designing custom microarrays, rich in genes relevant to the experimental objectives, requires selection of probes based on their sequence. We have designed a probe selection method, called GeneSieve, to select EST probes for custom microarrays. To assign annotations to the ESTs, we cluster them into contigs using PHRAP. The larger contig sequences are then used for similarity search against known proteins in model organisms such as Arabidopsis thaliana. We have designed three different methods to assign annotations to the contigs: bidirectional hits (BH), bidirectional best hits (BBH), and unidirectional best hits (UBH). We apply these methods to pine and potato EST sets. Results show that the UBH method assigns unambiguous annotations to a large fraction of contigs in an organism. Hence, we use UBH to assign annotations to ESTs in GeneSieve. To select a single EST from a contig, GeneSieve assigns a quality score to each EST based on its protein homology (PH), cross hybridization (CH), and relative length (RL). We use this quality score to rank ESTs according to seven different measures: length, 3' proximity, 5' proximity, protein homology, cross hybridization, relative length, and overall quality score. Results for pine and potato EST sets indicate that EST probes selected by quality score are relatively long and give better values for protein homology and cross hybridization. Results of the GeneSieve protocol are stored in a database and linked with sequence databases and known functional category schemes such as MIPS and GO. The database is made available via a web interface. A biologist is able to select a large number of EST probes based on annotations or functional categories in a quick and easy way.
- Improvements to PATRIC, the all-bacterial Bioinformatics Database and Analysis Resource Center
  Wattam, Alice R.; Davis, James J.; Assaf, Rida; Boisvert, Sebastien; Brettin, Thomas; Bun, Christopher; Conrad, Neal; Dietrich, Emily M.; Disz, Terry L.; Gabbard, Joseph L.; Gerdes, Svetlana; Henry, Christopher S.; Kenyon, Ronald W.; Machi, Dustin; Mao, Chunhong; Nordberg, Eric K.; Olsen, Gary J.; Murphy-Olson, Daniel E.; Olson, Robert D.; Overbeek, Ross; Parrello, Bruce; Pusch, Gordon D.; Shukla, Maulik; Vonstein, Veronika; Warren, Andrew S.; Xia, Fangfang; Yoo, Hyunseung; Stevens, Rick L. (2017-01-04)
  The Pathosystems Resource Integration Center (PATRIC) is the bacterial Bioinformatics Resource Center (https://www.patricbrc.org). Recent changes to PATRIC include a redesign of the web interface and some new services that provide users with a platform that takes them from raw reads to an integrated analysis experience. The redesigned interface allows researchers direct access to tools and data, and the emphasis has changed to user-created genome groups, with detailed summaries and views of the data that researchers have selected. Perhaps the biggest change has been the enhanced capability for researchers to analyze their private data and compare it to the available public data. Researchers can assemble their raw sequence reads and annotate the contigs using RASTtk. PATRIC also provides services for RNA-Seq, variation, model reconstruction and differential expression analysis, all delivered through an updated private workspace. Private data can be compared by 'virtual integration' to any of PATRIC's public data. The number of genomes available for comparison in PATRIC has expanded to over 80 000, with a special emphasis on genomes with antimicrobial resistance data. PATRIC uses this data to improve both subsystem annotation and k-mer classification, and tags new genomes as having signatures that indicate susceptibility or resistance to specific antibiotics.
- The PATRIC Bioinformatics Resource Center: expanding data and analysis capabilities
  Davis, James J.; Wattam, Alice R.; Aziz, Ramy K.; Brettin, Thomas; Butler, Ralph; Butler, Rory M.; Chlenski, Philippe; Conrad, Neal; Dickerman, Allan W.; Dietrich, Emily M.; Gabbard, Joseph L.; Gerdes, Svetlana; Guard, Andrew; Kenyon, Ronald W.; Machi, Dustin; Mao, Chunhong; Murphy-Olson, Daniel E.; Nguyen, Marcus; Nordberg, Eric K.; Olsen, Gary J.; Olson, Robert D.; Overbeek, Jamie C.; Overbeek, Ross; Parrello, Bruce; Pusch, Gordon D.; Shukla, Maulik; Thomas, Chris; VanOeffelen, Margo; Vonstein, Veronika; Warren, Andrew S.; Xia, Fangfang; Xie, Dawen; Yoo, Hyunseung; Stevens, Rick L. (2020-01-08)
  The PathoSystems Resource Integration Center (PATRIC) is the bacterial Bioinformatics Resource Center funded by the National Institute of Allergy and Infectious Diseases (https://www.patricbrc.org). PATRIC supports bioinformatic analyses of all bacteria with a special emphasis on pathogens, offering a rich comparative analysis environment that provides users with access to over 250 000 uniformly annotated and publicly available genomes with curated metadata. PATRIC offers web-based visualization and comparative analysis tools, a private workspace in which users can analyze their own data in the context of the public collections, services that streamline complex bioinformatic workflows and command-line tools for bulk data analysis. Over the past several years, as genomic and other omics-related experiments have become more cost-effective and widespread, we have observed considerable growth in the usage of and demand for easy-to-use, publicly available bioinformatic tools and services. Here we report the recent updates to the PATRIC resource, including new web-based comparative analysis tools, eight new services and the release of a command-line interface to access, query and analyze data.
- PATRIC, the bacterial bioinformatics database and analysis resource
  Wattam, Alice R.; Abraham, David; Dalay, Oral; Disz, Terry L.; Driscoll, Timothy; Gabbard, Joseph L.; Gillespie, Joseph J.; Gough, Roger; Hix, Deborah; Kenyon, Ronald W.; Machi, Dustin; Mao, Chunhong; Nordberg, Eric K.; Olson, Robert; Overbeek, Ross; Pusch, Gordon D.; Shukla, Maulik; Schulman, Julie; Stevens, Rick L.; Sullivan, Daniel E.; Vonstein, Veronika; Warren, Andrew S.; Will, Rebecca; Wilson, Meredith J. C.; Yoo, Hyunseung; Zhang, Chengdong; Zhang, Yan; Sobral, Bruno (2014-01)
  The Pathosystems Resource Integration Center (PATRIC) is the all-bacterial Bioinformatics Resource Center (BRC) (http://www.patricbrc.org). A joint effort by two of the original National Institute of Allergy and Infectious Diseases-funded BRCs, PATRIC provides researchers with an online resource that stores and integrates a variety of data types [e.g. genomics, transcriptomics, protein-protein interactions (PPIs), three-dimensional protein structures and sequence typing data] and associated metadata. Datatypes are summarized for individual genomes and across taxonomic levels. All genomes in PATRIC, currently more than 10 000, are consistently annotated using RAST, the Rapid Annotations using Subsystems Technology. Summaries of different data types are also provided for individual genes, where comparisons of different annotations are available, and also include available transcriptomic data. PATRIC provides a variety of ways for researchers to find data of interest and a private workspace where they can store both genomic and gene associations, and their own private data. Both private and public data can be analyzed together using a suite of tools to perform comparative genomic or transcriptomic analysis. PATRIC also includes integrated information related to disease and PPIs. All the data and integrated analysis and visualization tools are freely available. This manuscript describes updates to PATRIC since its initial report in the 2007 NAR Database Issue.
- PATRIC: The VBI PathoSystems Resource Integration Center
  Snyder, E. E.; Kampanya, N.; Lu, J.; Nordberg, E. K.; Karur, H. R.; Shukla, Maulik; Soneja, J.; Tian, Y.; Xue, T.; Yoo, H.; Zhang, F.; Dharmanolla, C.; Dongre, N. V.; Gillespie, J. J.; Hamelius, J.; Hance, M.; Huntington, K. I.; Jukneliene, D.; Koziski, J.; Mackasmiel, L.; Mane, S. P.; Nguyen, V.; Purkayastha, A.; Shallom, J.; Yu, G.; Guo, Y.; Gabbard, Joseph L.; Hix, D.; Azad, A. F.; Baker, S. C.; Boyle, Stephen M.; Khudyakov, Y.; Meng, Xiang-Jin; Rupprecht, C.; Vinje, J.; Crasta, Oswald R.; Czar, M. J.; Dickerman, Allan W.; Eckart, J. D.; Kenyon, R.; Will, R.; Setubal, Joao C.; Sobral, Bruno (2007-01)
  The PathoSystems Resource Integration Center (PATRIC) is one of eight Bioinformatics Resource Centers (BRCs) funded by the National Institute of Allergy and Infectious Diseases (NIAID) to create a data and analysis resource for selected NIAID priority pathogens, specifically proteobacteria of the genera Brucella, Rickettsia and Coxiella, and corona-, calici- and lyssaviruses and viruses associated with hepatitis A and E. The goal of the project is to provide a comprehensive bioinformatics resource for these pathogens, including consistently annotated genome, proteome and metabolic pathway data to facilitate research into counter-measures, including drugs, vaccines and diagnostics. The project's curation strategy has three prongs: 'breadth first' beginning with whole-genome and proteome curation using standardized protocols, a 'targeted' approach addressing the specific needs of researchers and an integrative strategy to leverage high-throughput experimental data (e.g. microarrays, proteomics) and literature. The PATRIC infrastructure consists of a relational database, analytical pipelines and a website which supports browsing, querying, data visualization and the ability to download raw and curated data in standard formats. At present, the site warehouses complete sequences for 17 bacterial and 332 viral genomes. The PATRIC website (https://patric.vbi.vt.edu) will continually grow with the addition of data, analysis and functionality over the course of the project.
- PATtyFams: Protein Families for the Microbial Genomes in the PATRIC Database
  Davis, James J.; Gerdes, Svetlana; Olsen, Gary J.; Olson, Robert; Pusch, Gordon D.; Shukla, Maulik; Vonstein, Veronika; Wattam, Alice R.; Yoo, Hyunseung (Frontiers, 2016-02-08)
  The ability to build accurate protein families is a fundamental operation in bioinformatics that influences comparative analyses, genome annotation, and metabolic modeling. For several years we have been maintaining protein families for all microbial genomes in the PATRIC database (Pathosystems Resource Integration Center, patricbrc.org) in order to drive many of the comparative analysis tools that are available through the PATRIC website. However, due to the burgeoning number of genomes, traditional approaches for generating protein families are becoming prohibitive. In this report, we describe a new approach for generating protein families, which we call PATtyFams. This method uses the k-mer-based function assignments available through RAST (Rapid Annotation using Subsystem Technology) to rapidly guide family formation, and then differentiates the function-based groups into families using a Markov Cluster algorithm (MCL). This new approach for generating protein families is rapid, scalable and has properties that are consistent with alignment-based methods.
- RASTtk: A modular and extensible implementation of the RAST algorithm for building custom annotation pipelines and annotating batches of genomes
  Brettin, Thomas; Davis, James J.; Disz, Terry; Edwards, Robert A.; Gerdes, Svetlana; Olsen, Gary J.; Olson, Robert; Overbeek, Ross; Parrello, Bruce; Pusch, Gordon D.; Shukla, Maulik; Thomason, James A., III; Stevens, Rick L.; Vonstein, Veronika; Wattam, Alice R.; Xia, Fangfang (Springer Nature, 2015-02-10)
  The RAST (Rapid Annotation using Subsystem Technology) annotation engine was built in 2008 to annotate bacterial and archaeal genomes. It works by offering a standard software pipeline for identifying genomic features (i.e., protein-encoding genes and RNA) and annotating their functions. Recently, in order to make RAST a more useful research tool and to keep pace with advancements in bioinformatics, it has become desirable to build a version of RAST that is both customizable and extensible. In this paper, we describe the RAST tool kit (RASTtk), a modular version of RAST that enables researchers to build custom annotation pipelines. RASTtk offers a choice of software for identifying and annotating genomic features as well as the ability to add custom features to an annotation job. RASTtk also accommodates the batch submission of genomes and the ability to customize annotation protocols for batch submissions. This is the first major software restructuring of RAST since its inception.
- Rickettsia Phylogenomics: Unwinding the Intricacies of Obligate Intracellular Life
  Gillespie, Joseph J.; Williams, Kelly; Shukla, Maulik; Snyder, Eric E.; Nordberg, Eric K.; Ceraul, Shane M.; Dharmanolla, Chitti; Rainey, Daphne; Soneja, Jeetendra; Shallom, Joshua M.; Vishnubhat, Nataraj Dongre; Wattam, Rebecca; Purkayastha, Anjan; Czar, Michael; Crasta, Oswald; Setubal, João C.; Azad, Abdu F.; Sobral, Bruno (Public Library of Science, 2008-04-16)
  Background: Completed genome sequences are rapidly increasing for Rickettsia, obligate intracellular α-proteobacteria responsible for various human diseases, including epidemic typhus and Rocky Mountain spotted fever. In light of phylogeny, the establishment of orthologous groups (OGs) of open reading frames (ORFs) will distinguish the core rickettsial genes and other group specific genes (class 1 OGs or C1OGs) from those distributed indiscriminately throughout the rickettsial tree (class 2 OG or C2OGs).
  Methodology/Principal Findings: We present 1823 representative (no gene duplications) and 259 non-representative (at least one gene duplication) rickettsial OGs. While the highly reductive (~1.2 MB) Rickettsia genomes range in predicted ORFs from 872 to 1512, a core of 752 OGs was identified, depicting the essential Rickettsia genes. Unsurprisingly, this core lacks many metabolic genes, reflecting the dependence on host resources for growth and survival. Additionally, we bolster our recent reclassification of Rickettsia by identifying OGs that define the AG (ancestral group), TG (typhus group), TRG (transitional group), and SFG (spotted fever group) rickettsiae. OGs for insect-associated species, tick-associated species and species that harbor plasmids were also predicted. Through superimposition of all OGs over robust phylogeny estimation, we discern between C1OGs and C2OGs, the latter depicting genes either decaying from the conserved C1OGs or acquired laterally. Finally, scrutiny of non-representative OGs revealed high levels of split genes versus gene duplications, with both phenomena confounding gene orthology assignment. Interestingly, non-representative OGs, as well as OGs comprised of several gene families typically involved in microbial pathogenicity and/or the acquisition of virulence factors, fall predominantly within C2OG distributions.
  Conclusion/Significance: Collectively, we determined the relative conservation and distribution of 14354 predicted ORFs from 10 rickettsial genomes across robust phylogeny estimation. The data, available at PATRIC (PathoSystems Resource Integration Center), provide novel information for unwinding the intricacies associated with Rickettsia pathogenesis, expanding the range of potential diagnostic, vaccine and therapeutic targets.
- The SEED and the Rapid Annotation of microbial genomes using Subsystems Technology (RAST)
  Overbeek, Ross; Olson, Robert; Pusch, Gordon D.; Olsen, Gary J.; Davis, James J.; Disz, Terry; Edwards, Robert A.; Gerdes, Svetlana; Parrello, Bruce; Shukla, Maulik; Vonstein, Veronika; Wattam, Alice R.; Xia, Fangfang; Stevens, Rick L. (2014-01)
  In 2004, the SEED (http://pubseed.theseed.org/) was created to provide consistent and accurate genome annotations across thousands of genomes and as a platform for discovering and developing de novo annotations. The SEED is a constantly updated integration of genomic data with a genome database, web front end, API and server scripts. It is used by many scientists for predicting gene functions and discovering new pathways. In addition to being a powerful database for bioinformatics research, the SEED also houses subsystems (collections of functionally related protein families) and their derived FIGfams (protein families), which represent the core of the RAST annotation engine (http://rast.nmpdr.org/). When a new genome is submitted to RAST, genes are called and their annotations are made by comparison to the FIGfam collection. If the genome is made public, it is then housed within the SEED and its proteins populate the FIGfam collection. This annotation cycle has proven to be a robust and scalable solution to the problem of annotating the exponentially increasing number of genomes. To date, >12 000 users worldwide have annotated >60 000 distinct genomes using RAST. Here we describe the interconnectedness of the SEED database and RAST, the RAST annotation pipeline and updates to both resources.