Methods for Analysis of Prokaryotic Genome Architecture

dc.contributor.author: Warren, Andrew S. [en]
dc.contributor.committeechair: Heath, Lenwood S. [en]
dc.contributor.committeechair: Setubal, Joao C. [en]
dc.contributor.committeemember: Murali, T. M. [en]
dc.contributor.committeemember: Dickerman, Allan W. [en]
dc.contributor.committeemember: Friedberg, Iddo [en]
dc.contributor.committeemember: Zhang, Liqing [en]
dc.contributor.department: Computer Science [en]
dc.date.accessioned: 2019-01-11T07:00:35Z [en]
dc.date.available: 2019-01-11T07:00:35Z [en]
dc.date.issued: 2017-07-19 [en]
dc.description.abstract: Research in comparative microbial genomics has largely been organized around the concept of reference genomes. Reference genomes provide a useful comparative touchstone for closely related organisms. However, they do not necessarily represent the biological diversity in a group of genomes. Currently there are more than 96,000 sequenced bacterial genomes, and this number is rapidly increasing. Some closely related groups have large numbers of sequenced genomes, creating interesting comparative challenges: E. coli has more than 5,400 isolates and S. aureus almost 9,000. As this sampling through sequencing becomes both deeper and broader, reference-genome-based methods become less effective at characterizing groups of organisms. Functional motifs can help explain the organizing principles behind cellular systems in bacteria, which have yet to be well understood. Currently there are relatively few bioinformatic tools for analyzing potential patterns at the level of genome organization that do not depend directly on sequence similarity. We present a framework for conducting genomic data mining to look for patterns that currently require human expert designation. We establish new computational methods for identifying patterns in prokaryotic genome construction through a mapping of genomic features, using semantic similarity, independent of a particular corpus, to better approximate functional similarity. We also present an algorithm for creating whole genome multiple sequence comparisons and a model for representing the similarities and differences among sequences as a graph of syntenic gene families. This effort touches on several different research fronts: graph representation of genomes and their alignments, synteny block analysis, whole genome sequence alignment, pan-genome analysis, multiple sequence alignment, and genome rearrangement analysis.
Though our approach was originally developed from a pan-genome perspective for prokaryotes, the methods involved have the potential to speed up more expensive computation such as phylogenetic tree construction and SNP analysis. Novel elements include the contextualization of synteny analysis both between and within multi-contig genomes and an analytical framework for detecting genome-level evolutionary events such as insertions, inversions, translocations, and fusions. [en]
dc.description.abstractgeneral: Research in comparative microbial genomics has largely been organized around the concept of reference genomes. Reference genomes provide a useful comparative touchstone for closely related organisms. However, they do not necessarily represent the biological diversity in a group of genomes well. As sampling through sequencing becomes both deeper and broader, reference-genome-based methods become less effective at characterizing groups of organisms. We present an algorithm for creating whole genome multiple sequence comparisons and a model, called a pan-synteny graph, for representing the similarities and differences among sequences as a graph of syntenic gene families. As the evolutionary distance between organisms increases, sequence similarity and homology detection tend to break down. However, similarities in the functional characteristics of certain genes and gene modules may persist or have converged over time. Detecting and defining patterns in these functional similarities, in relation to conserved gene order, is a largely unexplored problem. To create a model for representing the architectural similarity of functional modules using ontologies and semantic similarity, we present a corpus-independent semantic similarity method and describe a computational framework for using semantic similarity and pan-synteny graphs. [en]
dc.description.degree: Ph. D. [en]
dc.format.medium: ETD [en]
dc.identifier.other: vt_gsexam:11481 [en]
dc.identifier.uri: http://hdl.handle.net/10919/86660 [en]
dc.publisher: Virginia Tech [en]
dc.rights: In Copyright [en]
dc.rights.uri: http://rightsstatements.org/vocab/InC/1.0/ [en]
dc.subject: Computer science [en]
dc.subject: Bioinformatics [en]
dc.subject: Microbiology [en]
dc.title: Methods for Analysis of Prokaryotic Genome Architecture [en]
dc.type: Dissertation [en]
thesis.degree.discipline: Computer Science and Applications [en]
thesis.degree.grantor: Virginia Polytechnic Institute and State University [en]
thesis.degree.level: doctoral [en]
thesis.degree.name: Ph. D. [en]

Files

Original bundle
Name: Warren_AS_D_2017.pdf
Size: 2.65 MB
Format: Adobe Portable Document Format