Browsing by Author "Setubal, João C."
- Analysis of Schistosoma mansoni genes shared with Deuterostomia and with possible roles in host interactions
  Venancio, Thiago M.; DeMarco, Ricardo; Almeida, Giulliana T.; Oliveira, Katia C.; Setubal, João C.; Verjovski-Almeida, Sergio (2007-11-08)
  Background: Schistosoma mansoni is a blood helminth parasite that causes schistosomiasis, a disease that affects 200 million people in the world. Many orthologs of known mammalian genes have been discovered in this parasite and evidence is accumulating that some of these genes encode proteins linked to signaling pathways in the parasite that appear to be involved with growth or development, suggesting a complex co-evolutionary process. Results: In this work we found 427 genes conserved in the Deuterostomia group that have orthologs in S. mansoni and no members in any nematodes and insects so far sequenced. Among these genes we have identified Insulin Induced Gene (INSIG), Interferon Regulatory Factor (IRF) and vasohibin orthologs, known to be involved in mammals in mevalonate metabolism, immune response and angiogenesis control, respectively. We have chosen these three genes for a more detailed characterization, which included extension of their cloned messages to obtain full-length sequences. Interestingly, SmINSIG showed a 10-fold higher expression in adult females as opposed to males, in accordance with its possible role in regulating egg production. SmIRF has a DNA binding domain, a tryptophan-rich N-terminal region and several predicted phosphorylation sites, usually important for IRF activity. Fourteen different alternatively spliced forms of the S. mansoni vasohibin (SmVASL) gene were detected that encode seven different protein isoforms including one with a complete C-terminal end, and other isoforms with shorter C-terminal portions. Using S. mansoni homologs, we have employed a parsimonious rationale to compute the total gene losses/gains in nematodes, arthropods and deuterostomes under either the Coelomata or the Ecdysozoa evolutionary hypotheses; our results show a lower losses/gains number under the latter hypothesis. Conclusion: The genes discussed, which are conserved between S. mansoni and deuterostomes, probably have an ancient origin and were lost in Ecdysozoa, being still present in Lophotrochozoa. Given their known functions in Deuterostomia, it is possible that some of them have been co-opted to perform functions related (directly or indirectly) to host adaptation or interaction with host signaling processes.
- Analysis of the Allergenic Potential of the Ubiquitous Airborne Fungus Alternaria Using Bioinformatics
  Babiceanu, Mihaela (Virginia Tech, 2011-06-15)
  Among the environmental airborne fungi one of the most common is Alternaria alternata. From a clinical perspective Alternaria has long been associated with IgE-mediated, histamine-dependent mold allergy, allergic rhinitis, chronic rhinosinusitis (CRS) and asthma. Recently it has been proven that an abnormal immunological response to Alternaria most likely contributes to the pathogenesis of upper respiratory airway disorders. In this body of work, we present for the first time results of several sets of experiments including 1) the analysis of A. alternata spore germination expressed sequence tags (ESTs), 2) the survey of global allergen homologues in fungal genomes, and 3) the first microarray experiment investigating airway epithelial cell responses to this fungus. In the first project, the analyses of the EST dataset offered a first look into the gene content of A. alternata and represents the beginning of future research of this ubiquitous fungus. Annotation and classification of ESTs revealed a number of genes that could be involved in the immunomodulation process of the human immune response toward fungi. We also discovered that the majority of known allergens are expressed during the spore germination phase of A. alternata. For investigating the allergenic potential of fungi we developed a whole genome approach by querying fungal genome sequences (A. alternata, A. brassicicola, and Aspergillus fumigatus) with a database of all known allergenic proteins from a taxonomically diverse group of organisms. Interestingly, we identified homologues of diverse types of allergens in these fungal genomes and also many homologues of allergens from other organisms including those from pollen, insects, and venoms. Finally, we investigated global gene expression changes of human airway cells in response to A. alternata and an ∆alt a 1 deletion mutant. We found that wild type Alternaria spores induced significant changes in gene expression patterns in human airway epithelial cells, especially known immune response genes. Furthermore, results of these analyses revealed that Alt a 1 is a major factor in inducing epithelial inflammatory responses.
- Ancestral Genome Reconstruction in Bacteria
  Yang, Kuan (Virginia Tech, 2012-06-06)
  The rapid accumulation of numerous sequenced genomes has provided a golden opportunity for ancestral state reconstruction studies, especially in the whole genome reconstruction area. However, most ancestral genome reconstruction methods developed so far only focus on gene or replicon sequences instead of whole genomes. They rely largely on either detailed modeling of evolutionary events or edit distance computation, both of which can be computationally prohibitive for large data sets. Hence, most of these methods can only be applied to a small number of features and species. In this dissertation, we describe the design, implementation, and evaluation of an ancestral genome reconstruction system (REGEN) for bacteria. It is the first bacterial genome reconstruction tool that focuses on ancestral state reconstruction at the genome scale instead of the gene scale. It not only reconstructs ancestral gene content and contiguous gene runs using either a maximum parsimony or a maximum likelihood criterion but also replicon structures of each ancestor. Based on the reconstructed genomes, it can infer all major events at both the gene scale, such as insertion, deletion, and translocation, and the replicon scale, such as replicon gain, loss, and merge. REGEN finishes by producing a visual representation of the entire evolutionary history of all genomes in the study. With a model-free reconstruction method at its core, the computational requirement for ancestral genome reconstruction is reduced sufficiently for the tool to be applied to large data sets with dozens of genomes and thousands of features. To achieve as accurate a reconstruction as possible, we also develop a homologous gene family prediction tool for preprocessing. Furthermore, we build our in-house Prokaryote Genome Evolution simulator (PEGsim) for evaluation purposes. The homologous gene family prediction refinement module can refine homologous gene family predictions generated by third party de novo prediction programs by combining phylogeny and local gene synteny. We show that such refinement can be accomplished for up to 80% of homologous gene family predictions with ambiguity (mixed families). The genome evolution simulator, PEGsim, is the first random-event-based, high-level bacterial genome evolution simulator with models for all common evolutionary events at the gene, replicon, and genome scales. The concepts of conserved gene runs and horizontal gene transfer (HGT) are also built in. We show the validation of PEGsim itself and the evaluation of the last reconstruction component with simulated data produced by it. REGEN, REconstruction of GENomes, is an ancestral genome reconstruction tool based on the concept of neighboring gene pairs (NGPs). Although it does not cover the reconstruction of actual nucleotide sequences, it is capable of reconstructing gene content, contiguous gene runs, and replicon structure of each ancestor using either a maximum parsimony or a maximum likelihood criterion. Based on the reconstructed genomes, it can infer all major events at both the gene scale, such as insertion, deletion, and translocation, and the replicon scale, such as replicon gain, loss, and merge. REGEN finishes by producing a visual representation of the entire evolutionary history of all genomes in the study.
- An anomalous type IV secretion system in Rickettsia is evolutionarily conserved
  Gillespie, Joseph J.; Ammerman, Nicole C.; Dreher-Lesnick, Sheila M.; Rahman, Sayeedur; Worley, Micah J.; Setubal, João C.; Sobral, Bruno; Azad, Abdu F. (Public Library of Science, 2009-03-12)
  Background: Bacterial type IV secretion systems (T4SSs) comprise a diverse transporter family functioning in conjugation, competence, and effector molecule (DNA and/or protein) translocation. Thirteen genome sequences from Rickettsia, obligate intracellular symbionts/pathogens of a wide range of eukaryotes, have revealed a reduced T4SS relative to the Agrobacterium tumefaciens archetype (vir). However, the Rickettsia T4SS has not been functionally characterized for its role in symbiosis/virulence, and none of its substrates are known. Results: Superimposition of T4SS structural/functional information over previously identified Rickettsia components implicates a functional Rickettsia T4SS. virB4, virB8 and virB9 are duplicated, yet only one copy of each has the conserved features of similar genes in other T4SSs. An extraordinarily duplicated VirB6 gene encodes five hydrophobic proteins conserved only in a short region known to be involved in DNA transfer in A. tumefaciens. virB1, virB2 and virB7 are newly identified, revealing a Rickettsia T4SS lacking only virB5 relative to the vir archetype. Phylogeny estimation suggests vertical inheritance of all components, despite gene rearrangements into an archipelago of five islets. Similarities of Rickettsia VirB7/VirB9 to ComB7/ComB9 proteins of ε-proteobacteria, as well as phylogenetic affinities to the Legionella lvh T4SS, imply the Rickettsiales ancestor acquired a vir-like locus from distantly related bacteria, perhaps while residing in a protozoan host. Modern modifications of these systems likely reflect diversification with various eukaryotic host cells. Conclusion: We present the rvh (Rickettsiales vir homolog) T4SS, an evolutionarily conserved transporter with an unknown role in rickettsial biology. This work lays the foundation for future laboratory characterization of this system, and also identifies the Legionella lvh T4SS as a suitable genetic model.
- Comparative genomics reveals diversity among xanthomonads infecting tomato and pepper
  Potnis, Neha; Krasileva, Ksenia V.; Chow, Virginia; Almeida, Nalvo F.; Patil, Prabhu B.; Ryan, Robert P.; Sharlach, Molly; Behlau, Franklin; Dow, J. Max; Momol, M. T.; White, Frank F.; Preston, James F.; Vinatzer, Boris A.; Koebnik, Ralf; Setubal, João C.; Norman, David J.; Staskawicz, Brian J.; Jones, Jeffrey B. (2011-03-11)
  Background: Bacterial spot of tomato and pepper is caused by four Xanthomonas species and is a major plant disease in warm humid climates. The four species are distinct from each other based on physiological and molecular characteristics. The genome sequence of strain 85-10, a member of one of the species, Xanthomonas euvesicatoria (Xcv) has been previously reported. To determine the relationship of the four species at the genome level and to investigate the molecular basis of their virulence and differing host ranges, draft genomic sequences of members of the other three species were determined and compared to strain 85-10. Results: We sequenced the genomes of X. vesicatoria (Xv) strain 1111 (ATCC 35937), X. perforans (Xp) strain 91-118 and X. gardneri (Xg) strain 101 (ATCC 19865). The genomes were compared with each other and with the previously sequenced Xcv strain 85-10. In addition, the molecular features were predicted that may be required for pathogenicity including the type III secretion apparatus, type III effectors, other secretion systems, quorum sensing systems, adhesins, extracellular polysaccharide, and lipopolysaccharide determinants. Several novel type III effectors from Xg strain 101 and Xv strain 1111 genomes were computationally identified and their translocation was validated using a reporter gene assay. A homolog to Ax21, the elicitor of XA21-mediated resistance in rice, and a functional Ax21 sulfation system were identified in Xcv. Genes encoding proteins with functions mediated by type II and type IV secretion systems have also been compared, including enzymes involved in cell wall deconstruction, as contributors to pathogenicity. Conclusions: Comparative genomic analyses revealed considerable diversity among bacterial spot pathogens, providing new insights into differences and similarities that may explain the diverse nature of these strains. Genes specific to pepper pathogens, such as the O-antigen of the lipopolysaccharide cluster, and genes unique to individual strains, such as novel type III effectors and bacteriocin genes, have been identified providing new clues for our understanding of pathogen virulence, aggressiveness, and host preference. These analyses will aid in efforts towards breeding for broad and durable resistance in economically important tomato and pepper cultivars.
- Genetic resources for advanced biofuel production described with the Gene Ontology
  Torto-Alalibo, Trudy; Purwantini, Endang; Lomax, Jane; Setubal, João C.; Mukhopadhyay, Biswarup; Tyler, Brett M. (Frontiers, 2014-10-10)
  Dramatic increases in research in the area of microbial biofuel production coupled with high-throughput data generation on bioenergy-related microbes have led to a deluge of information in the scientific literature and in databases. Consolidating this information and making it easily accessible requires a unified vocabulary. The Gene Ontology (GO) fulfills that requirement, as it is a well-developed structured vocabulary that describes the activities and locations of gene products in a consistent manner across all kingdoms of life. The Microbial ENergy processes Gene Ontology (http://www.mengo.biochem.vt.edu) project is extending the GO to include new terms to describe microbial processes of interest to bioenergy production. Our effort has added over 600 bioenergy related terms to the Gene Ontology. These terms will aid in the comprehensive annotation of gene products from diverse energy-related microbial genomes. An area of microbial energy research that has received a lot of attention is microbial production of advanced biofuels. These include alcohols such as butanol, isopropanol, isobutanol, and fuels derived from fatty acids, isoprenoids, and polyhydroxyalkanoates. These fuels are superior to first generation biofuels (ethanol and biodiesel esterified from vegetable oil or animal fat), can be generated from non-food feedstock sources, can be used as supplements or substitutes for gasoline, diesel and jet fuels, and can be stored and distributed using existing infrastructure. Here we review the roles of genes associated with synthesis of advanced biofuels, and at the same time introduce the use of the GO to describe the functions of these genes in a standardized way.
- Genetic resources for methane production from biomass described with the Gene Ontology
  Purwantini, E.; Torto-Alalibo, T.; Lomax, J.; Setubal, João C.; Tyler, B. M.; Mukhopadhyay, B. (Frontiers, 2014-12-03)
- The Genome Reverse Compiler: an explorative annotation tool
  Warren, Andrew S.; Setubal, João C. (2009-01-27)
  Background: As sequencing costs have decreased, whole genome sequencing has become a viable and integral part of biological laboratory research. However, the tools with which genes can be found and functionally characterized have not been readily adapted to be part of the everyday biological sciences toolkit. Most annotation pipelines remain as a service provided by large institutions or come as an unwieldy conglomerate of independent components, each requiring their own setup and maintenance. Results: To address this issue we have created the Genome Reverse Compiler, an easy-to-use, open-source, automated annotation tool. The GRC is independent of third party software installs and only requires a Linux operating system. This stands in contrast to most annotation packages, which typically require installation of relational databases, sequence similarity software, and a number of other programming language modules. We provide details on the methodology used by GRC and evaluate its performance on several groups of prokaryotes using GRC's built in comparison module. Conclusion: Traditionally, to perform whole genome annotation a user would either set up a pipeline or take advantage of an online service. With GRC the user need only provide the genome he or she wants to annotate and the function resource files to use. The result is high usability and a very minimal learning curve for the intended audience of life science researchers and bioinformaticians. We believe that the GRC fills a valuable niche in allowing users to perform explorative, whole-genome annotation.
- Graph-based genomic signatures
  Pati, Amrita (Virginia Tech, 2008-04-14)
  Genomes have both deterministic and random aspects, with the underlying DNA sequences exhibiting features at numerous scales, from codons to regions of conserved or divergent gene order. Genomic signatures work by capturing one or more such features efficiently into a compact mathematical structure. This work examines the unique manner in which oligonucleotides fit together to comprise a genome, within a graph-theoretic setting. A de Bruijn chain (DBC) is a marriage of a de Bruijn graph and a finite Markov chain. By representing a DNA sequence as a walk over a DBC and retaining specific information at nodes and edges, we are able to obtain the de Bruijn chain genomic signature (DBCGS), based on both graph structure and the stationary distribution of the DBC. We demonstrate that DBCGS is information-rich, efficient, sufficiently representative of the sequence from which it is derived, and superior to existing genomic signatures such as the dinucleotides odds ratio and word frequency based signatures. We develop a mathematical framework to elucidate the power of the DBCGS signature to distinguish between sequences hypothesized to be generated by DBCs of distinct parameters. We study the effect of order of the DBCGS signature on accuracy while presenting relationships with genome size and genome variety. We illustrate its practical value in distinguishing genomic sequences and predicting the origin of short DNA sequences of unknown origin, while highlighting its superior performance compared to existing genomic signatures including the dinucleotides odds ratio. Additionally, we describe details of the CMGS database, a centralized repository for raw and value-added data particular to C. elegans.
- Missing genes in the annotation of prokaryotic genomes
  Warren, Andrew S.; Archuleta, Jeremy; Feng, Wu-chun; Setubal, João C. (BioMed Central, 2010-03-15)
  Background: Protein-coding gene detection in prokaryotic genomes is considered a much simpler problem than in intron-containing eukaryotic genomes. However, there have been reports that prokaryotic gene finder programs have problems with small genes (either over-predicting or under-predicting). Therefore the question arises as to whether current genome annotations have systematically missing, small genes. Results: We have developed a high-performance computing methodology to investigate this problem. In this methodology we compare all ORFs larger than or equal to 33 aa from all fully-sequenced prokaryotic replicons. Based on that comparison, and using conservative criteria requiring a minimum taxonomic diversity between conserved ORFs in different genomes, we have discovered 1,153 candidate genes that are missing from current genome annotations. These missing genes are similar only to each other and do not have any strong similarity to gene sequences in public databases, with the implication that these ORFs belong to missing gene families. We also uncovered 38,895 intergenic ORFs, readily identified as putative genes by similarity to currently annotated genes (we call these absent annotations). The vast majority of the missing genes found are small (less than 100 aa). A comparison of select examples with GeneMark, EasyGene and Glimmer predictions yields evidence that some of these genes are escaping detection by these programs. Conclusions: Prokaryotic gene finders and prokaryotic genome annotations require improvement for accurate prediction of small genes. The number of missing gene families found is likely a lower bound on the actual number, due to the conservative criteria used to determine whether an ORF corresponds to a real gene.
- Next-generation phage display: integrating and comparing available molecular tools to enable cost-effective high-throughput analysis
  Dias-Neto, Emmanuel; Nunes, Diana N.; Giordano, Ricardo J.; Sun, Jessica; Botz, Gregory H.; Yang, Kuan; Setubal, João C.; Pasqualini, Renata; Arap, Wadih (Public Library of Science, 2009-12-17)
  Background: Combinatorial phage display has been used in the last 20 years in the identification of protein-ligands and protein-protein interactions, uncovering relevant molecular recognition events. Rate-limiting steps of combinatorial phage display library selection are (i) the counting of transducing units and (ii) the sequencing of the encoded displayed ligands. Here, we adapted emerging genomic technologies to minimize such challenges. Methodology/Principal Findings: We gained efficiency by applying in tandem real-time PCR for rapid quantification to enable bacteria-free phage display library screening, and added phage DNA next-generation sequencing for large-scale ligand analysis, reporting a fully integrated set of high-throughput quantitative and analytical tools. The approach is far less labor-intensive and allows rigorous quantification; for medical applications, including selections in patients, it also represents an advance for quantitative distribution analysis and ligand identification of hundreds of thousands of targeted particles from patient-derived biopsy or autopsy in a longer timeframe post library administration. Additional advantages over current methods include increased sensitivity, less variability, enhanced linearity, scalability, and accuracy at much lower cost. Sequences obtained by qPhage plus pyrosequencing were similar to a dataset produced from conventional Sanger-sequenced transducing-units (TU), with no biases due to GC content, codon usage, and amino acid or peptide frequency. These tools allow phage display selection and ligand analysis at a >1,000-fold faster rate, and reduce costs ~250-fold for generating 10^6 ligand sequences. Conclusions/Significance: Our analyses demonstrate that whereas this approach correlates with the traditional colony counting, it is also capable of a much larger sampling, allowing a faster, less expensive, more accurate and consistent analysis of phage enrichment. Overall, qPhage plus pyrosequencing is superior to TU-counting plus Sanger sequencing and is proposed as the method of choice over a broad range of phage display applications in vitro, in cells, and in vivo.
- Novel insights into the genomic basis of citrus canker based on the genome sequences of two strains of Xanthomonas fuscans subsp. aurantifolii
  Moreira, Leandro M.; Almeida, Nalvo F.; Potnis, Neha; Digiampietri, Luciano A.; Adi, Said S.; Bortolossi, Julio C.; da Silva, Ana C.; da Silva, Aline M.; de Moraes, Fabrício E.; de Oliveira, Julio C.; de Souza, Robson F.; Facincani, Agda P.; Ferraz, André L.; Ferro, Maria I.; Furlan, Luiz R.; Gimenez, Daniele F.; Jones, Jeffrey B.; Kitajima, Elliot W.; Laia, Marcelo L.; Leite, Rui P., Jr; Nishiyama, Milton Y.; Rodrigues Neto, Julio; Nociti, Letícia A.; Norman, David J.; Ostroski, Eric H.; Pereira, Haroldo A. Jr.; Staskawicz, Brian J.; Tezza, Renata I.; Ferro, Jesus A.; Vinatzer, Boris A.; Setubal, João C. (Biomed Central, 2010-04-13)
  Background: Citrus canker is a disease that has severe economic impact on the citrus industry worldwide. There are three types of canker, called A, B, and C. The three types have different phenotypes and affect different citrus species. The causative agent for type A is Xanthomonas citri subsp. citri, whose genome sequence was made available in 2002. Xanthomonas fuscans subsp. aurantifolii strain B causes canker B and Xanthomonas fuscans subsp. aurantifolii strain C causes canker C. Results: We have sequenced the genomes of strains B and C to draft status. We have compared their genomic content to X. citri subsp. citri and to other Xanthomonas genomes, with special emphasis on type III secreted effector repertoires. In addition to pthA, already known to be present in all three citrus canker strains, two additional effector genes, xopE3 and xopAI, are also present in all three strains and are both located on the same putative genomic island. These two effector genes, along with one other effector-like gene in the same region, are thus good candidates for being pathogenicity factors on citrus. Numerous gene content differences also exist between the three canker strains, which can be correlated with their different virulence and host range. Particular attention was placed on the analysis of genes involved in biofilm formation and quorum sensing, type IV secretion, flagellum synthesis and motility, lipopolysaccharide synthesis, and on the gene xacPNP, which codes for a natriuretic protein. Conclusion: We have uncovered numerous commonalities and differences in gene content between the genomes of the pathogenic agents causing citrus canker A, B, and C and other Xanthomonas genomes. Molecular genetics can now be employed to determine the role of these genes in plant-microbe interactions. The gained knowledge will be instrumental for improving citrus canker control.
- Origin and diversification of Xanthomonas citri subsp. citri pathotypes revealed by inclusive phylogenomic, dating, and biogeographic analyses
  Patané, José S. L.; Martins, Joaquim; Rangel, Luiz T.; Belasque, José; Digiampietri, Luciano A.; Facincani, Agda P.; Ferreira, Rafael M.; Jaciani, Fabrício J.; Zhang, Yunzeng; Varani, Alessandro M.; Almeida, Nalvo F.; Wang, Nian; Ferro, Jesus A.; Moreira, Leandro M.; Setubal, João C. (2019-09-09)
  Background: Xanthomonas citri subsp. citri pathotypes cause bacterial citrus canker, being responsible for severe agricultural losses worldwide. The A pathotype has a broad host spectrum, while A* and Aw are more restricted both in hosts and in geography. Two previous phylogenomic studies led to contrasting well-supported clades for sequenced genomes of these pathotypes. No extensive biogeographical or divergence dating analytic approaches have been so far applied to available genomes. Results: Based on a larger sampling of genomes than in previous studies (including six new genomes sequenced by our group, adding to a total of 95 genomes), phylogenomic analyses resulted in different resolutions, though overall indicating that A + Aw is the most likely true clade. Our results suggest the high degree of recombination at some branches and the fast diversification of lineages are probable causes for this phylogenetic blurring effect. One of the genomes analyzed, X. campestris pv. durantae, was shown to be an A* strain; this strain has been reported to infect a plant of the family Verbenaceae, though there are no reports of any X. citri subsp. citri pathotypes infecting any plant outside the Citrus genus. Host reconstruction indicated the pathotype ancestor likely had plant hosts in the family Fabaceae, implying an ancient jump to the current Rutaceae hosts. Extensive dating analyses indicated that the origin of X. citri subsp. citri occurred more recently than the main phylogenetic splits of Citrus plants, suggesting dispersion rather than host-directed vicariance as the main driver of geographic expansion. An analysis of 120 pathogenic-related genes revealed pathotype-associated patterns of presence/absence. Conclusions: Our results provide novel insights into the evolutionary history of X. citri subsp. citri as well as a sound phylogenetic foundation for future evolutionary and genomic studies of its pathotypes.
- Pathosystems Biology: Computational Prediction and Analysis of Host-Pathogen Protein Interaction Networks
  Dyer, Matthew D. (Virginia Tech, 2008-06-26)
  An important aspect of systems biology is the elucidation of the protein-protein interactions (PPIs) that control important biological processes within a cell and between organisms. In particular, at the cellular and molecular level, interactions between a pathogen and its host play a vital role in initiating infection and a successful pathogenesis. Despite recent successes in the advancement of the systems biology of model organisms to understand complex diseases, the analysis of infectious diseases at the systems-level has not received as much attention. Since pathogen related disease is responsible for millions of deaths and billions of dollars in damage to crops and livestock, understanding the mechanisms employed by pathogens to infect their hosts is critical in the development of new and effective therapeutic strategies. The research presented here is one of the first computational approaches to studying host-pathogen PPI networks. This dissertation has two main aims. First, we discuss analytical tools for studying host-pathogen networks to identify common pathways perturbed and manipulated by pathogens. We present the first global comparison of the host-pathogen PPI networks of 190 different pathogens and their interactions with human proteins. We also present the construction and analysis of three highly infectious human-bacterial PPI networks: Bacillus anthracis, Francisella tularensis, and Yersinia pestis. The second aim of the research presented here is the development of predictive models for identifying PPIs between host and pathogen proteins. We present two methods: (i) a domain-based approach that uses frequency of domain-pairs in intra-species PPIs, and (ii) a supervised machine learning method that is trained on known inter-species PPIs. The techniques developed in this dissertation, along with the informative datasets presented, will serve as a foundation for the field of computational pathosystems biology.
- The Plant Pathogen Pseudomonas syringae pv. tomato Is Genetically Monomorphic and under Strong Selection to Evade Tomato Immunity
  Cai, Rongman; Lewis, James; Yan, Shuangchun; Clarke, Christopher R.; Campanile, Francesco; Almeida, Nalvo F.; Studholme, David J.; Lindeberg, Magdalen; Schneider, David; Zaccardelli, Massimo; Setubal, João C.; Morales-Lizcano, Nadia P.; Bernal, Adriana; Coaker, Gitta; Baker, Christy; Bender, Carol L.; Leman, Scotland C.; Vinatzer, Boris A. (PLOS Pathogens, 2011-08-25)
  Recently, genome sequencing of many isolates of genetically monomorphic bacterial human pathogens has given new insights into pathogen microevolution and phylogeography. Here, we report a genome-based micro-evolutionary study of a bacterial plant pathogen, Pseudomonas syringae pv. tomato. Only 267 mutations were identified between five sequenced isolates in 3,543,009 nt of analyzed genome sequence, which suggests a recent evolutionary origin of this pathogen. Further analysis with genome-derived markers of 89 world-wide isolates showed that several genotypes exist in North America and in Europe indicating frequent pathogen movement between these world regions. Genome-derived markers and molecular analyses of key pathogen loci important for virulence and motility both suggest ongoing adaptation to the tomato host. A mutational hotspot was found in the type III-secreted effector gene hopM1. These mutations abolish the cell death triggering activity of the full-length protein indicating strong selection for loss of function of this effector, which was previously considered a virulence factor. Two non-synonymous mutations in the flagellin-encoding gene fliC allowed identifying a new microbe associated molecular pattern (MAMP) in a region distinct from the known MAMP flg22. Interestingly, the ancestral allele of this MAMP induces a stronger tomato immune response than the derived alleles. The ancestral allele has largely disappeared from today’s Pto populations suggesting that flagellin-triggered immunity limits pathogen fitness even in highly virulent pathogens. An additional non-synonymous mutation was identified in flg22 in South American isolates. Therefore, MAMPs are more variable than expected differing even between otherwise almost identical isolates of the same pathogen strain.
- PlantSimLab - a modeling and simulation web tool for plant biologistsHa, Sook; Dimitrova, Elena; Hoops, Stefan; Altarawy, Doaa; Ansariola, Mitra; Deb, Devdutta; Glazebrook, Jane; Hillmer, Rachel; Shahin, Hossameldin L.; Katagiri, Fumiaki; McDowell, John M.; Megraw, Molly; Setubal, João C.; Tyler, Brett M.; Laubenbacher, Reinhard C. (2019-10-21)Background At the molecular level, nonlinear networks of heterogeneous molecules control many biological processes, so that systems biology provides a valuable approach in this field, building on the integration of experimental biology with mathematical modeling. One of the biggest challenges to making this integration a reality is that many life scientists do not possess the mathematical expertise needed to build and manipulate mathematical models well enough to use them as tools for hypothesis generation. Available modeling software packages often assume some modeling expertise. There is a need for software tools that are easy to use and intuitive for experimentalists. Results This paper introduces PlantSimLab, a web-based application developed to allow plant biologists to construct dynamic mathematical models of molecular networks, interrogate them in a manner similar to what is done in the laboratory, and use them as a tool for biological hypothesis generation. It is designed to be used by experimentalists, without direct assistance from mathematical modelers. Conclusions Mathematical modeling techniques are a useful tool for analyzing complex biological systems, and there is a need for accessible, efficient analysis tools within the biological community. PlantSimLab enables users to build, validate, and use intuitive qualitative dynamic computer models, with a graphical user interface that does not require mathematical modeling expertise. It makes analysis of complex models accessible to a larger community, as it is platform-independent and does not require extensive mathematical expertise.
- Protein secretion systems in bacterial-host associations, and their description in the Gene OntologyTseng, Tsai-Tien; Tyler, Brett M.; Setubal, João C. (2009-02-19)Protein secretion plays a central role in modulating the interactions of bacteria with their environments. This is particularly the case when symbiotic bacteria (whether pathogenic, commensal or mutualistic) are interacting with larger host organisms. In the case of Gram-negative bacteria, secretion requires translocation across the outer as well as the inner membrane, and a diversity of molecular machines have been elaborated for this purpose. A number of secreted proteins are destined to enter the host cell (effectors and toxins), and thus several secretion systems include apparatus to translocate proteins across the plasma membrane of the host also. The Plant-Associated Microbe Gene Ontology (PAMGO) Consortium has been developing standardized terms for describing biological processes and cellular components that play important roles in the interactions of microbes with plant and animal hosts, including the processes of bacterial secretion. Here we survey bacterial secretion systems known to modulate interactions with host organisms and describe Gene Ontology terms useful for describing the components and functions of these systems, and for capturing the similarities among the diverse systems.
- REGEN: Ancestral Genome Reconstruction for BacteriaYang, Kuan; Heath, Lenwood S.; Setubal, João C. (MDPI, 2012-07-18)Ancestral genome reconstruction can be understood as a phylogenetic study with more details than a traditional phylogenetic tree reconstruction. We present a new computational system called REGEN for ancestral bacterial genome reconstruction at both the gene and replicon levels. REGEN reconstructs gene content, contiguous gene runs, and replicon structure for each ancestral genome. Along each branch of the phylogenetic tree, REGEN infers evolutionary events, including gene creation and deletion and replicon fission and fusion. The reconstruction can be performed by either a maximum parsimony or a maximum likelihood method. Gene content reconstruction is based on the concept of neighboring gene pairs. REGEN was designed to be used with any set of genomes that are sufficiently related, which will usually be the case for bacteria within the same taxonomic order. We evaluated REGEN using simulated genomes and genomes in the Rhizobiales order.
- Rickettsia Phylogenomics: Unwinding the Intricacies of Obligate Intracellular LifeGillespie, Joseph J.; Williams, Kelly; Shukla, Maulik; Snyder, Eric E.; Nordberg, Eric K.; Ceraul, Shane M.; Dharmanolla, Chitti; Rainey, Daphne; Soneja, Jeetendra; Shallom, Joshua M.; Vishnubhat, Nataraj Dongre; Wattam, Rebecca; Purkayastha, Anjan; Czar, Michael; Crasta, Oswald; Setubal, João C.; Azad, Abdu F.; Sobral, Bruno (Public Library of Science, 2008-04-16)Background: Completed genome sequences are rapidly increasing for Rickettsia, obligate intracellular α-proteobacteria responsible for various human diseases, including epidemic typhus and Rocky Mountain spotted fever. In light of phylogeny, the establishment of orthologous groups (OGs) of open reading frames (ORFs) will distinguish the core rickettsial genes and other group specific genes (class 1 OGs or C1OGs) from those distributed indiscriminately throughout the rickettsial tree (class 2 OG or C2OGs). Methodology/Principal Findings: We present 1823 representative (no gene duplications) and 259 non-representative (at least one gene duplication) rickettsial OGs. While the highly reductive (~1.2 MB) Rickettsia genomes range in predicted ORFs from 872 to 1512, a core of 752 OGs was identified, depicting the essential Rickettsia genes. Unsurprisingly, this core lacks many metabolic genes, reflecting the dependence on host resources for growth and survival. Additionally, we bolster our recent reclassification of Rickettsia by identifying OGs that define the AG (ancestral group), TG (typhus group), TRG (transitional group), and SFG (spotted fever group) rickettsiae. OGs for insect-associated species, tick-associated species and species that harbor plasmids were also predicted. Through superimposition of all OGs over robust phylogeny estimation, we discern between C1OGs and C2OGs, the latter depicting genes either decaying from the conserved C1OGs or acquired laterally. 
Finally, scrutiny of non-representative OGs revealed high levels of split genes versus gene duplications, with both phenomena confounding gene orthology assignment. Interestingly, non-representative OGs, as well as OGs comprised of several gene families typically involved in microbial pathogenicity and/or the acquisition of virulence factors, fall predominantly within C2OG distributions. Conclusion/Significance: Collectively, we determined the relative conservation and distribution of 14354 predicted ORFs from 10 rickettsial genomes across robust phylogeny estimation. The data, available at PATRIC (PathoSystems Resource Integration Center), provide novel information for unwinding the intricacies associated with Rickettsia pathogenesis, expanding the range of potential diagnostic, vaccine and therapeutic targets.
- Towards a Genome Reverse CompilerWarren, Andrew S. (Virginia Tech, 2007-11-05)The Genome Reverse Compiler (GRC) is an annotation tool for prokaryotic genomes. Its name and philosophy are based on analogy with a high-level programming language compiler. In this analogy, the genome is a program in a certain low-level language that humans cannot understand. Given the sequence of any prokaryotic genome, GRC produces its corresponding "high-level program"--its annotation. GRC works in a completely automatic manner, using standard input and output formats. The goal is to provide an open-source, easy-to-run, very efficient annotation program.