Browsing by Author "Setubal, Joao C."
- Comparative Genomics of Early-Diverging Brucella Strains Reveals a Novel Lipopolysaccharide Biosynthesis Pathway. Wattam, Alice R.; Inzana, Thomas J.; Williams, Kelly P.; Mane, Shrinivasrao P.; Shukla, Maulik; Almeida, Nalvo F.; Dickerman, Allan W.; Mason, Steven; Moriyon, Ignacio; O'Callaghan, David; Whatmore, Adrian M.; Sobral, Bruno; Tiller, Rebekah V.; Hoffmaster, Alex R.; Frace, Michael A.; De Castro, Cristina; Molinaro, Antonio; Boyle, Stephen M.; De, Barun K.; Setubal, Joao C. (American Society for Microbiology, 2012-11). Brucella species are Gram-negative bacteria that infect mammals. Recently, two unusual strains (Brucella inopinata BO1T and B. inopinata-like BO2) have been isolated from human patients, and their similarity to some atypical brucellae isolated from Australian native rodent species was noted. Here we present a phylogenomic analysis of the draft genome sequences of BO1T and BO2 and of the Australian rodent strains 83-13 and NF2653, showing that they form two groups well separated from the other sequenced Brucella spp. Several important differences were noted. Neither BO1T nor BO2 agglutinated significantly when live or inactivated cells were exposed to monospecific A and M antisera against O-side chain sugars composed of N-formyl-perosamine. While BO1T retained the genes required to synthesize a typical Brucella O-antigen, BO2 lacked many of these genes but still produced a smooth LPS (lipopolysaccharide). Most of the missing genes were in the wbk region, which is involved in O-antigen synthesis in classic smooth Brucella spp. In their place, BO2 carries four genes that other bacteria use to make a rhamnose-based O-antigen. Electrophoretic, immunoblot, and chemical analyses showed that BO2 carries an antigenically different O-antigen made of repeating hexose-rich oligosaccharide units that make the LPS water-soluble, in contrast to the homopolymeric O-antigen of other smooth brucellae, whose LPS is phenol-soluble.
The results demonstrate the existence of a group of early-diverging brucellae with traits that depart significantly from those of the Brucella species described thus far. IMPORTANCE: This report examines differences between the genomes of four new Brucella strains and those of the classic Brucella spp. Our results show that the four new strains are outliers with respect to the previously known Brucella strains, yet they belong to the genus, forming two new clades. The analysis revealed important information about the evolution and survival mechanisms of Brucella species, helping reshape our knowledge of this important zoonotic pathogen. One discovery of special importance is that one of the strains, BO2, produces an O-antigen distinct from any seen in other Brucella isolates to date.
- Genome-Centric Analysis of a Thermophilic and Cellulolytic Bacterial Consortium Derived from Composting. Lemos, Leandro N.; Pereira, Roberta V.; Quaggio, Ronaldo B.; Martins, Layla F.; Moura, Livia M. S.; da Silva, Amanda R.; Antunes, Luciana P.; da Silva, Aline M.; Setubal, Joao C. (Frontiers, 2017-04-19). Microbial consortia selected from complex lignocellulolytic microbial communities are promising alternatives for deconstructing plant waste, since the synergistic action of different enzymes is required for full degradation of plant biomass in biorefining applications. Culture enrichment also facilitates the study of interactions among consortium members and can be a good source of novel microbial species. Here, we used a sample from a plant waste composting operation at the São Paulo Zoo (Brazil) as inoculum to obtain a thermophilic aerobic consortium enriched through multiple passages at 60 °C in carboxymethylcellulose as the sole carbon source. The microbial community composition of this consortium was investigated by shotgun metagenomics and genome-centric analysis. Six near-complete genomes (over 90% complete) were reconstructed. Similarity and phylogenetic analyses show that four of these six genomes are novel, with the following hypothesized identifications: a new Thermobacillus species; the first Bacillus thermozeamaize genome (for which currently only 16S sequences are available) or else the first representative of a new family in the Bacillales order; the first representative of a new genus in the Paenibacillaceae family; and the first representative of a new deep-branching family in the Clostridia class. The remaining two genomes, from known species, were identified as Geobacillus thermoglucosidasius and Caldibacillus debilis. COG and CAZy analyses of the metabolic potential of the recovered genomes show that they encode several glycoside hydrolases (GHs) as well as other genes related to lignocellulose breakdown.
The new Thermobacillus species stands out as the richest in GH diversity and abundance, possessing the greatest potential for biomass degradation among the six recovered genomes. We also investigated the presence and activity of the organisms corresponding to these genomes in the composting operation from which the consortium was built, using compost metagenome and metatranscriptome datasets generated in a previous study. We obtained strong evidence that five of the six recovered genomes are indeed present and active in that composting process. We have thus discovered three (perhaps four) new thermophilic bacterial species that add to the growing repertoire of known lignocellulose degraders, whose biotechnological potential can now be investigated in further studies.
- MARVEL, a Tool for Prediction of Bacteriophage Sequences in Metagenomic Bins. Amgarten, Deyvid; Braga, Lucas P. P.; da Silva, Aline M.; Setubal, Joao C. (Frontiers, 2018-08-07). Here we present MARVEL, a tool for predicting double-stranded DNA bacteriophage sequences in metagenomic bins. MARVEL uses a random forest machine learning approach. We trained the program on a dataset of 1,247 phage and 1,029 bacterial genomes and tested it on a dataset of 335 bacterial and 177 phage genomes. We show that three simple genomic features extracted from contig sequences were sufficient to achieve good performance in separating bacterial from phage sequences: gene density, strand shifts, and the fraction of significant hits to a viral protein database. We compared the performance of MARVEL to that of VirSorter and VirFinder, two popular programs for predicting viral sequences. Our results show that all three programs have comparable specificity, but MARVEL achieves much better recall (sensitivity). This means that MARVEL should be able to identify many more phage sequences in metagenomic bins than was previously possible. In a simple test with real data containing mostly bacterial sequences, MARVEL classified 58 of 209 bins as phage genomes; other evidence suggests that 57 of these 58 bins are novel phage sequences.
- Methods for Analysis of Prokaryotic Genome Architecture. Warren, Andrew S. (Virginia Tech, 2017-07-19). Research in comparative microbial genomics has largely been organized around the concept of reference genomes. Reference genomes provide a useful comparative touchstone for closely related organisms. However, they do not necessarily represent the biological diversity in a group of genomes. More than 96,000 bacterial genomes have currently been sequenced, and this number is rapidly increasing. Some closely related groups have large numbers of sequenced genomes, creating interesting comparative challenges: E. coli has more than 5,400 sequenced isolates and S. aureus almost 9,000. As this sampling through sequencing becomes both deeper and broader, reference-genome-based methods become less effective at characterizing groups of organisms. Functional motifs can help explain the organizing principles behind bacterial cellular systems, which are not yet well understood. There are currently relatively few bioinformatic tools for analyzing potential patterns at the level of genome organization that do not depend directly on sequence similarity. We present a framework for genomic data mining that looks for patterns which currently require designation by a human expert. We establish new computational methods for identifying patterns in prokaryotic genome construction through a mapping of genomic features that uses semantic similarity, independent of a particular corpus, to better approximate functional similarity. We also present an algorithm for creating whole-genome multiple sequence comparisons and a model for representing the similarities and differences among sequences as a graph of syntenic gene families. This effort touches on several different research fronts: graph representation of genomes and their alignments, synteny block analysis, whole-genome sequence alignment, pan-genome analysis, multiple sequence alignment, and genome rearrangement analysis.
Although our approach was originally developed from a pan-genome perspective for prokaryotes, the methods involved have the potential to speed up more expensive computations such as phylogenetic tree construction and SNP analysis. Novel elements include the contextualization of synteny analysis both between and within multi-contig genomes and an analytical framework for detecting genome-level evolutionary events such as insertions, inversions, translocations, and fusions.
- PATRIC: The VBI PathoSystems Resource Integration Center. Snyder, E. E.; Kampanya, N.; Lu, J.; Nordberg, E. K.; Karur, H. R.; Shukla, Maulik; Soneja, J.; Tian, Y.; Xue, T.; Yoo, H.; Zhang, F.; Dharmanolla, C.; Dongre, N. V.; Gillespie, J. J.; Hamelius, J.; Hance, M.; Huntington, K. I.; Jukneliene, D.; Koziski, J.; Mackasmiel, L.; Mane, S. P.; Nguyen, V.; Purkayastha, A.; Shallom, J.; Yu, G.; Guo, Y.; Gabbard, Joseph L.; Hix, D.; Azad, A. F.; Baker, S. C.; Boyle, Stephen M.; Khudyakov, Y.; Meng, Xiang-Jin; Rupprecht, C.; Vinje, J.; Crasta, Oswald R.; Czar, M. J.; Dickerman, Allan W.; Eckart, J. D.; Kenyon, R.; Will, R.; Setubal, Joao C.; Sobral, Bruno (2007-01). The PathoSystems Resource Integration Center (PATRIC) is one of eight Bioinformatics Resource Centers (BRCs) funded by the National Institute of Allergy and Infectious Diseases (NIAID) to create a data and analysis resource for selected NIAID priority pathogens, specifically proteobacteria of the genera Brucella, Rickettsia and Coxiella; corona-, calici- and lyssaviruses; and viruses associated with hepatitis A and E. The goal of the project is to provide a comprehensive bioinformatics resource for these pathogens, including consistently annotated genome, proteome and metabolic pathway data, to facilitate research into countermeasures, including drugs, vaccines and diagnostics. The project's curation strategy has three prongs: a 'breadth first' approach beginning with whole-genome and proteome curation using standardized protocols, a 'targeted' approach addressing the specific needs of researchers, and an integrative strategy to leverage high-throughput experimental data (e.g. microarrays, proteomics) and literature. The PATRIC infrastructure consists of a relational database, analytical pipelines and a website that supports browsing, querying, data visualization and downloading raw and curated data in standard formats.
At present, the site warehouses complete sequences for 17 bacterial and 332 viral genomes. The PATRIC website (https://patric.vbi.vt.edu) will continually grow with the addition of data, analysis and functionality over the course of the project.
- Patterns and Processes of Mycobacterium bovis Evolution Revealed by Phylogenomic Analyses. Patane, Jose S. L.; Martins, Joaquim, Jr.; Castelao, Ana Beatriz; Nishibe, Christiane; Montera, Luciana; Bigi, Fabiana; Zumarraga, Martin J.; Cataldi, Angel A.; Fonseca Junior, Antonio; Roxo, Eliana; Osorio, Ana Luiza A. R.; Jorge, Klaudia S.; Thacker, Tyler C.; Almeida, Nalvo F.; Araujo, Flabio R.; Setubal, Joao C. (2017-03). Mycobacterium bovis is an important animal pathogen worldwide that parasitizes wild vertebrates and domesticated livestock as well as humans. A comparison of the five complete M. bovis genomes from the United Kingdom, South Korea, Brazil, and the United States revealed four novel large-scale structural variations of at least 2,000 bp. A comparative phylogenomic study including 2,483 core genes of 38 taxa from eight countries showed conflicting phylogenetic signal among sites. By minimizing this effect, we obtained a tree that agrees better with sampling locality. Results supported a relatively basal position of African strains (all isolated from Homo sapiens), confirming that Africa was an important region for early diversification and that humans were among the earliest hosts. Selection analyses revealed that functional categories such as "Lipid transport and metabolism," "Cell cycle control, cell division, chromosome partitioning" and "Cell motility" were significant for the evolution of the group, in addition to previously described categories, showing the importance of genes associated with virulence and cholesterol metabolism in the evolution of M. bovis. PE/PPE genes, many of which are known to be associated with virulence, were major targets for large-scale polymorphisms, homologous recombination, and positive selection, revealing for the first time a plethora of evolutionary forces possibly contributing to differential adaptability in M. bovis. Under different priors, we estimate that US strains originated and started to diversify around 150 to 5,210 years ago.
By further analyzing the largest set of US genomes to date (76 in total), obtained from 14 host species, we found that strains did not cluster into clades by host (except in a few cases) and detected some faster-evolving strains, suggesting fast and ongoing reinfections across host species and, therefore, the possibility of new bovine tuberculosis outbreaks.
- Phylogenomics of Xanthomonas field strains infecting pepper and tomato reveals diversity in effector repertoires and identifies determinants of host specificity. Schwartz, Allison R.; Potnist, Neha; Milsina, Sujan; Wilson, Mark; Patane, Jose; Martins, Joaquim, Jr.; Minsavage, Gerald V.; Dahlbeck, Douglas; Akhunova, Alina; Almeida, Nalvo F.; Vallad, Gary E.; Barak, Jeri D.; White, Frank F.; Miller, Sally A.; Ritchie, David; Goss, Erica; Bart, Rebecca S.; Setubal, Joao C.; Jones, Jeffrey B.; Staskawicz, Brian J. (Frontiers, 2015-06-03). Bacterial spot disease of pepper and tomato is caused by four distinct Xanthomonas species and severely limits fruit yield in these crops. The genetic diversity and the type III effector repertoires of a large sampling of field strains for this disease have yet to be explored on a genomic scale, limiting our understanding of pathogen evolution in an agricultural setting. Genomes of 67 Xanthomonas euvesicatoria (Xe), Xanthomonas perforans (Xp), and Xanthomonas gardneri (Xg) strains isolated from diseased pepper and tomato fields in the southeastern and midwestern United States were sequenced to determine the genetic diversity in field strains. Type III effector repertoires were computationally predicted for each strain, and multiple methods of constructing phylogenies were employed to better understand the genetic relationships of strains in the collection. A division in the Xp population was detected based on core genome phylogeny, supporting a model whereby the host-range expansion of Xp field strains on pepper is due, in part, to a loss of the effector AvrBsT. Xp-host compatibility was further studied with the observation that a double deletion of AvrBsT and XopQ allows host range expansion to Nicotiana benthamiana. Extensive sampling of field strains and an improved understanding of effector content will aid in efforts to design disease resistance strategies targeted against highly conserved core effectors.
- Schistosoma mansoni Egg, Adult Male and Female Comparative Gene Expression Analysis and Identification of Novel Genes by RNA-Seq. Anderson, Leticia; Amaral, Murilo S.; Beckedorff, Felipe; Silva, Lucas F.; Dazzani, Bianca; Oliveira, Katia C.; Almeida, Giulliana T.; Gomes, Monete R.; Pires, David S.; Setubal, Joao C.; DeMarco, Ricardo; Verjovski-Almeida, Sergio (PLOS, 2015-12). Background: Schistosomiasis is one of the most prevalent parasitic diseases worldwide and is a public health problem. Schistosoma mansoni is the most widespread species responsible for schistosomiasis in the Americas, Middle East and Africa. Adult female worms (mated to males) release eggs in the hepatic portal vasculature and are the principal cause of morbidity. Separate transcriptomes of female and male adult worms were previously compared using microarrays and Serial Analysis of Gene Expression (SAGE), approaches that limit the possibility of finding novel genes. Moreover, the egg transcriptome was analyzed only once, with limited bacterially cloned cDNA libraries. Methodology/Principal findings: To compare the gene expression of S. mansoni eggs, females, and males, we performed RNA-Seq on these three parasite forms using 454/Roche technology and reconstructed the transcriptome using Trinity de novo assembly. The resulting contigs were mapped to the genome and cross-referenced with predicted Smp genes and public H3K4me3 ChIP-Seq data. For the first time, we obtained separate, unbiased gene expression profiles for S. mansoni eggs and female and male adult worms, identifying enriched biological processes and specific enriched functions for each of the three parasite forms. Transcripts with no match to predicted genes were analyzed for their protein-coding potential and the presence of an encoded conserved protein domain.
A set of 232 novel protein-coding genes with putative functions related to reproduction, metabolism, and cell biogenesis was detected, contributing to the understanding of parasite biology. Conclusions/Significance: Large-scale RNA-Seq analysis using de novo assembly combined with genome-wide information on histone marks in the vicinity of gene models constitutes a new approach to transcriptome analysis that has not yet been explored in schistosomes. Importantly, all data have been consolidated into a UCSC Genome Browser search-and-download tool (http://schistosoma.usp.br/). This database provides new ways to explore the schistosome genome and transcriptome and will facilitate molecular research on this important parasite.
- Sequence verification of synthetic DNA by assembly of sequencing reads. Wilson, Mandy L.; Cai, Yizhi; Hanlon, Regina; Taylor, Samantha; Chevreux, Bastien; Setubal, Joao C.; Tyler, Brett M.; Peccoud, Jean (2013-01). Gene synthesis attempts to assemble user-defined DNA sequences with base-level precision. Verifying the sequences of construction intermediates and of the final product of a gene synthesis project is a critical part of the workflow, yet one that has received the least attention. Sequence validation is equally important for other kinds of curated clone collections. Ensuring that the physical sequence of a clone matches its published sequence is a common quality control step performed at least once over the course of a research project. GenoREAD is a web-based application that breaks the sequence verification process into two steps: the assembly of sequencing reads and the alignment of the resulting contig with a reference sequence. GenoREAD can determine whether a clone matches its reference sequence. Its sophisticated reporting features help identify and troubleshoot problems that arise during the sequence verification process. GenoREAD has been experimentally validated on thousands of gene-sized constructs from an ORFeome project, and on longer sequences including whole plasmids and synthetic chromosomes. Comparing GenoREAD results with those from manual analysis of the sequencing data demonstrates that GenoREAD tends to be conservative in its diagnoses. GenoREAD is available at www.genoread.org.
- Tissue-Associated Bacterial Alterations in Rectal Carcinoma Patients Revealed by 16S rRNA Community Profiling. Thomas, Andrew M.; Jesus, Eliane C.; Lopes, Ademar; Aguiar, Samuel, Jr.; Begnami, Maria D.; Rocha, Rafael M.; Carpinetti, Paola Avelar; Camargo, Anamaria A.; Hoffmann, Christian; Freitas, Helano C.; Silva, Israel T.; Nunes, Diana N.; Setubal, Joao C.; Dias-Neto, Emmanuel (Frontiers, 2016-12-09). Sporadic and inflammatory forms of colorectal cancer (CRC) account for more than 80% of cases. Recent publications have shown mechanistic evidence for the involvement of gut bacteria in the development of both forms of CRC. Although colon and rectal cancer have routinely been studied together as CRC, increasing evidence shows them to be distinct diseases. Also, the common use of fecal samples to study microbial communities may reflect the disease state but possibly not the tumor microenvironment. We performed this study to evaluate differences between the bacterial communities found in tissue samples from 18 rectal-cancer subjects and those from 18 non-cancer controls. Samples were collected during exploratory colonoscopy (non-cancer group) or during surgery for tumor excision (rectal-cancer group). High-throughput 16S rRNA amplicon sequencing of the V4-V5 region was conducted on the Ion PGM platform; reads were filtered using QIIME and clustered using UPARSE. We observed significant increases in species richness and diversity in rectal cancer samples, as evidenced by the total number of OTUs and the Shannon and Simpson indexes. Enterotyping analysis divided our cohort into two groups, with the majority of rectal cancer samples clustering into one enterotype, characterized by a greater abundance of Bacteroides and Dorea. At the phylum level, rectal-cancer samples had increased abundance of candidate phylum OD1 (also known as Parcubacteria), whilst non-cancer samples had increased abundance of Planctomycetes.
At the genus level, rectal-cancer samples had higher abundances of Bacteroides, Phascolarctobacterium, Parabacteroides, Desulfovibrio, and Odoribacter, whereas non-cancer samples had higher abundances of Pseudomonas, Escherichia, Acinetobacter, Lactobacillus, and Bacillus. Two Bacteroides fragilis OTUs detected by 16S rRNA amplicon sequencing were more abundant among rectal-cancer patients; their presence was confirmed by immunohistochemistry and their enrichment verified by droplet digital PCR. Our findings point to increased bacterial richness and diversity in rectal cancer, along with several differences in microbial community composition. Our work is the first to present evidence for a possible role of bacteria such as B. fragilis and the phylum Parcubacteria in rectal cancer, emphasizing the need to study tissue-associated bacteria and specific regions of the gastrointestinal tract in order to better understand the possible links between the microbiota and rectal cancer.
- A versatile computational pipeline for bacterial genome annotation improvement and comparative analysis, with Brucella as a use case. Yu, G. X.; Snyder, E. E.; Boyle, Stephen M.; Crasta, Oswald R.; Czar, M. J.; Mane, S. P.; Purkayastha, A.; Sobral, Bruno; Setubal, Joao C. (2007-06). We present a bacterial genome computational analysis pipeline called GenVar. The pipeline, based on the program GeneWise, is designed to analyze an annotated genome and automatically identify missed gene calls and sequence variants such as genes with disrupted reading frames (split genes) and those with insertions and deletions (indels). For a given genome to be analyzed, GenVar relies on a database containing closely related genomes (such as other species or strains) as well as a few additional reference genomes. GenVar also helps identify gene disruptions probably caused by sequencing errors. We exemplify GenVar's capabilities by presenting results from the analysis of four Brucella genomes. Brucella is an important human pathogen and zoonotic agent. The analysis revealed hundreds of missed gene calls, new split genes and indels, several of which are species-specific and hence provide valuable clues to understanding the genomic basis of Brucella pathogenicity and host specificity.
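The MARVEL entry above names three contig-level features (gene density, strand shifts, and the fraction of significant hits to a viral protein database) that feed its random forest. As a rough illustration only, the Python sketch below computes features of that kind from predicted gene coordinates. The `Gene` record, the function name `marvel_like_features`, and the exact definitions (genes per kilobase, fraction of adjacent gene pairs that switch strand) are hypothetical assumptions made for this example, not MARVEL's actual code; the classifier itself and the viral database search are omitted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    """Hypothetical record for a predicted gene on a contig."""
    gene_id: str
    start: int   # 1-based start coordinate
    end: int     # 1-based end coordinate (inclusive)
    strand: str  # "+" or "-"


def marvel_like_features(genes, contig_length, viral_hit_ids):
    """Return (gene_density, strand_shift_rate, viral_hit_fraction).

    Assumed definitions, chosen for illustration only:
      gene_density       = predicted genes per kilobase of contig
      strand_shift_rate  = fraction of adjacent gene pairs on opposite strands
      viral_hit_fraction = fraction of genes whose IDs are in viral_hit_ids,
                           i.e. genes that had a significant hit to a viral
                           protein database in some earlier search
    """
    if contig_length <= 0:
        raise ValueError("contig_length must be positive")
    if not genes:
        return 0.0, 0.0, 0.0

    # Order genes along the contig so "adjacent" means neighbouring in position.
    ordered = sorted(genes, key=lambda g: g.start)
    density = len(ordered) / (contig_length / 1000)

    # Count strand switches between consecutive genes.
    pairs = list(zip(ordered, ordered[1:]))
    shifts = sum(1 for a, b in pairs if a.strand != b.strand)
    shift_rate = shifts / len(pairs) if pairs else 0.0

    hits = sum(1 for g in ordered if g.gene_id in viral_hit_ids)
    return density, shift_rate, hits / len(ordered)


# Toy 4 kb contig with four genes; g1 and g3 are assumed viral hits.
genes = [
    Gene("g1", 1, 900, "+"),
    Gene("g2", 950, 2000, "+"),
    Gene("g3", 2050, 3100, "-"),
    Gene("g4", 3200, 4000, "-"),
]
print(marvel_like_features(genes, 4000, {"g1", "g3"}))
# → (1.0, 0.3333333333333333, 0.5)
```

Each returned triple would be one feature vector; per the abstract, MARVEL classifies bins with a random forest trained on features of this kind, whereas this sketch stops at feature extraction and uses only the standard library.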