
dc.contributor.author: Pati, Amrita [en_US]
dc.date.accessioned: 2014-03-14T20:11:12Z
dc.date.available: 2014-03-14T20:11:12Z
dc.date.issued: 2008-04-14 [en_US]
dc.identifier.other: etd-04282008-150624 [en_US]
dc.identifier.uri: http://hdl.handle.net/10919/27423
dc.description.abstract: Genomes have both deterministic and random aspects, with the underlying DNA sequences exhibiting features at numerous scales, from codons to regions of conserved or divergent gene order. Genomic signatures work by capturing one or more such features efficiently into a compact mathematical structure. This work examines the unique manner in which oligonucleotides fit together to comprise a genome, within a graph-theoretic setting. A de Bruijn chain (DBC) is a marriage of a de Bruijn graph and a finite Markov chain. By representing a DNA sequence as a walk over a DBC and retaining specific information at nodes and edges, we are able to obtain the de Bruijn chain genomic signature (DBCGS), based on both graph structure and the stationary distribution of the DBC. We demonstrate that DBCGS is information-rich, efficient, sufficiently representative of the sequence from which it is derived, and superior to existing genomic signatures such as the dinucleotides odds ratio and word frequency based signatures. We develop a mathematical framework to elucidate the power of the DBCGS signature to distinguish between sequences hypothesized to be generated by DBCs of distinct parameters. We study the effect of order of the DBCGS signature on accuracy while presenting relationships with genome size and genome variety. We illustrate its practical value in distinguishing genomic sequences and predicting the origin of short DNA sequences of unknown origin, while highlighting its superior performance compared to existing genomic signatures including the dinucleotides odds ratio. Additionally, we describe details of the CMGS database, a centralized repository for raw and value-added data particular to C. elegans. [en_US]
dc.publisher: Virginia Tech [en_US]
dc.relation.haspart: Pati_Dissertation.pdf [en_US]
dc.rights: I hereby certify that, if appropriate, I have obtained and attached hereto a written permission statement from the owner(s) of each third party copyrighted matter to be included in my thesis, dissertation, or project report, allowing distribution as specified below. I certify that the version I submitted is the same as that approved by my advisory committee. I hereby grant to Virginia Tech or its agents the non-exclusive license to archive and make accessible, under the conditions specified below, my thesis, dissertation, or project report in whole or in part in all forms of media, now or hereafter known. I retain all other ownership rights to the copyright of the thesis, dissertation or project report. I also retain the right to use in future works (such as articles or books) all or part of this thesis, dissertation, or project report. [en_US]
dc.subject: Markov chains [en_US]
dc.subject: de Bruijn graphs [en_US]
dc.subject: Genomic signatures [en_US]
dc.subject: DNA words [en_US]
dc.title: Graph-based genomic signatures [en_US]
dc.type: Dissertation [en_US]
dc.contributor.department: Computer Science [en_US]
dc.description.degree: Ph. D. [en_US]
thesis.degree.name: Ph. D. [en_US]
thesis.degree.level: doctoral [en_US]
thesis.degree.grantor: Virginia Polytechnic Institute and State University [en_US]
thesis.degree.discipline: Computer Science [en_US]
dc.contributor.committeechair: Heath, Lenwood S. [en_US]
dc.contributor.committeemember: Helm, Richard Frederick [en_US]
dc.contributor.committeemember: Ramakrishnan, Naren [en_US]
dc.contributor.committeemember: Shende, Anil M. [en_US]
dc.contributor.committeemember: Setubal, João C.
dc.identifier.sourceurl: http://scholar.lib.vt.edu/theses/available/etd-04282008-150624/ [en_US]
dc.date.sdate: 2008-04-28 [en_US]
dc.date.rdate: 2008-05-14
dc.date.adate: 2008-05-14 [en_US]

