dc.contributor.author: Porter, Jacob Stuart [en_US]
dc.date.accessioned: 2018-04-24T08:00:55Z
dc.date.available: 2018-04-24T08:00:55Z
dc.date.issued: 2018-04-23
dc.identifier.other: vt_gsexam:14990 [en_US]
dc.identifier.uri: http://hdl.handle.net/10919/82870
dc.description.abstract: Epigenetic traits are stable, heritable traits that do not result from the DNA sequence itself. Epigenetic modification of DNA cytosine plays a role in development and disease. The covalent bonding of a methyl group or a hydroxymethyl group to the 5-carbon of cytosine epigenetically modifies cytosine to 5-methylcytosine or 5-hydroxymethylcytosine. Bisulfite treatment of DNA converts unmethylated cytosine to uracil, which is read as thymine after PCR amplification, while 5-methylcytosine, 5-hydroxymethylcytosine, and the other bases remain unchanged. The resulting sequences can be mapped to a reference genome; however, mapping is challenging because of the complexity of sequencing technologies, low sequence complexity, and the biases and errors introduced by bisulfite treatment. Once a short read is mapped, the presence of 5-methylcytosine or 5-hydroxymethylcytosine can be determined by comparing the read to the aligned reference genome. Mapping performance for bisulfite-treated DNA reads can be as low as 40%. This research improves the quality of bisulfite short read mapping. First, reads generated with the bisulfite hairpin PCR protocol are used to study mapping failures and their solutions. A read may fail to map to the genome, map uniquely, or map to multiple locations, and sequence complexity correlates with these mapping categories. In some cases, the hairpin protocol allows the original, untreated read to be recovered; mapping these recovered reads with the regular read mapper Bowtie2 improved mapper performance by 10%. Next, a new bisulfite read mapper, BisPin, was created; it calls BFAST (BLAT-like Fast Accurate Search Tool) for mapping. BisPin resolves ambiguously mapped reads with a rescoring strategy, which yields a statistically significant improvement. BFAST-Gap was then developed for Ion Torrent reads, because Ion Torrent machines are less expensive than Illumina machines, Ion Torrent reads are longer, and few mappers exist for Ion Torrent data.
BFAST-Gap uses homopolymer run length in its contextual gap penalty functions, since homopolymer runs cause errors in Ion Torrent reads. In conjunction with BisPin, this software performed well on real and simulated bisulfite Ion Torrent data and on Illumina data. Finally, InfoTrim, a read trimmer that incorporates an entropy term, was developed and achieved competitive results. [en_US]
dc.format.medium: ETD [en_US]
dc.publisher: Virginia Tech [en_US]
dc.rights: This item is protected by copyright and/or related rights. Some uses of this item may be deemed fair and permitted by law even without permission from the rights holder(s), or the rights holder(s) may have licensed the work for use under certain conditions. For other uses you need to obtain permission from the rights holder(s). [en_US]
dc.subject: DNA read alignment [en_US]
dc.subject: hairpin whole genome bisulfite [en_US]
dc.subject: indels [en_US]
dc.subject: bisulfite Ion Torrent [en_US]
dc.subject: BisPin [en_US]
dc.subject: BFAST-Gap [en_US]
dc.title: Mapping Bisulfite-Treated Short DNA Reads [en_US]
dc.type: Dissertation [en_US]
dc.contributor.department: Computer Science [en_US]
dc.description.degree: Ph. D. [en_US]
thesis.degree.name: Ph. D. [en_US]
thesis.degree.level: doctoral [en_US]
thesis.degree.grantor: Virginia Polytechnic Institute and State University [en_US]
thesis.degree.discipline: Computer Science and Applications [en_US]
dc.contributor.committeechair: Zhang, Liqing [en_US]
dc.contributor.committeemember: Yiu, Siuming [en_US]
dc.contributor.committeemember: Xie, Hehuang David [en_US]
dc.contributor.committeemember: Watson, Layne T. [en_US]
dc.contributor.committeemember: Heath, Lenwood S. [en_US]
dc.contributor.committeemember: Wu, Xiaowei [en_US]
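The abstract describes bisulfite conversion (unmethylated C read as T, 5mC/5hmC left as C) and methylation calling by comparing a mapped read to the reference. The following is a minimal sketch of that idea, not the dissertation's code. It assumes a forward-strand read already placed on the reference without gaps at a known offset; the function names `bisulfite_convert` and `call_methylation` are hypothetical. A T at a reference C is reported as unmethylated here, although in real data it could also be a C/T variant.

```python
from typing import Dict, Set


def bisulfite_convert(seq: str, methylated: Set[int]) -> str:
    """Simulate bisulfite treatment plus PCR on the forward strand.

    Unmethylated C reads as T; a C at a position listed in `methylated`
    (5mC or 5hmC, which bisulfite alone cannot tell apart) stays C.
    """
    return "".join(
        "T" if base == "C" and i not in methylated else base
        for i, base in enumerate(seq)
    )


def call_methylation(ref: str, read: str, offset: int) -> Dict[int, str]:
    """Call each reference C covered by a read aligned without gaps at `offset`."""
    calls: Dict[int, str] = {}
    for j, base in enumerate(read):
        pos = offset + j
        if ref[pos] != "C":
            continue  # only reference cytosines carry methylation information here
        if base == "C":
            calls[pos] = "methylated"    # C survived conversion: 5mC or 5hmC
        elif base == "T":
            calls[pos] = "unmethylated"  # C was converted and read as T
        else:
            calls[pos] = "unknown"       # sequencing error or a genuine variant
    return calls


ref = "ACGTCCGA"
read = bisulfite_convert(ref, methylated={1})
print(read)                            # ACGTTTGA
print(call_methylation(ref, read, 0))  # {1: 'methylated', 4: 'unmethylated', 5: 'unmethylated'}
```

Real bisulfite mappers must also handle the reverse strand, gaps, and ambiguous placements, which this sketch leaves out.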
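The hairpin protocol links the two strands of one DNA molecule, so each converted strand can be checked against the other to recover the untreated sequence. A minimal sketch of that recovery, assuming the hairpin linker has already been removed, the bottom strand is given 5'->3', and the two halves cover exactly the same region (all assumptions, not the dissertation's procedure). The pairing table follows from bisulfite chemistry: a top-strand T opposite a partner C must have been an unmethylated C, and a top-strand G opposite a partner A must have been a G whose complementary C was converted. `recover_original` and `RECOVERY` are hypothetical names.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

# (converted top-strand base, reverse complement of converted bottom-strand base)
# -> original top-strand base, assuming an error-free hairpin read.
RECOVERY = {
    ("A", "A"): "A",
    ("T", "T"): "T",
    ("C", "C"): "C",  # top-strand C was methylated and stayed C
    ("T", "C"): "C",  # top-strand C was unmethylated and became T
    ("G", "G"): "G",  # partner C on the bottom strand was methylated
    ("G", "A"): "G",  # partner C on the bottom strand became T (A after complementing)
}


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def recover_original(top: str, bottom: str) -> str:
    """Rebuild the untreated top strand from both bisulfite-converted strands.

    `bottom` is the converted bottom strand written 5'->3'. Position pairs that
    no error-free conversion could produce are reported as 'N'.
    """
    partner = reverse_complement(bottom)
    if len(partner) != len(top):
        raise ValueError("strands must cover the same region")
    return "".join(RECOVERY.get(pair, "N") for pair in zip(top, partner))


print(recover_original("ACGTTTGA", "TTGGATGT"))  # ACGTCCGA
```

Emitting 'N' rather than guessing keeps inconsistent positions visible, which matches the abstract's caveat that recovery works only in some cases.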
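BFAST-Gap is described as using homopolymer run length in contextual gap penalty functions. The sketch below illustrates the general idea only: a global (Needleman-Wunsch) alignment score with linear gap costs that shrink inside longer homopolymer runs. The penalty shape, its parameters, the linear (non-affine) gap model, and the names `gap_cost` and `context_gap_score` are all hypothetical; BFAST-Gap's actual functions are not reproduced here.

```python
from typing import List


def homopolymer_runs(seq: str) -> List[int]:
    """Length of the homopolymer run that contains each position of seq."""
    runs = [0] * len(seq)
    start = 0
    while start < len(seq):
        end = start
        while end < len(seq) and seq[end] == seq[start]:
            end += 1
        for k in range(start, end):
            runs[k] = end - start
        start = end
    return runs


def gap_cost(run_length: int, base: float = 4.0, discount: float = 1.0,
             floor: float = 1.0) -> float:
    """Hypothetical penalty: indels get cheaper inside longer homopolymers."""
    return max(floor, base - discount * (run_length - 1))


def context_gap_score(read: str, ref: str, match: float = 1.0,
                      mismatch: float = -2.0) -> float:
    """Global alignment score with homopolymer-aware linear gap costs."""
    ref_runs, read_runs = homopolymer_runs(ref), homopolymer_runs(read)
    n, m = len(read), len(ref)
    # dp[i][j] = best score for aligning read[:i] against ref[:j]
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    for j in range(1, m + 1):
        dp[0][j] = dp[0][j - 1] - gap_cost(ref_runs[j - 1])
    for i in range(1, n + 1):
        dp[i][0] = dp[i - 1][0] - gap_cost(read_runs[i - 1])
        for j in range(1, m + 1):
            sub = match if read[i - 1] == ref[j - 1] else mismatch
            dp[i][j] = max(
                dp[i - 1][j - 1] + sub,
                dp[i][j - 1] - gap_cost(ref_runs[j - 1]),   # ref base missing from read
                dp[i - 1][j] - gap_cost(read_runs[i - 1]),  # extra base in read
            )
    return dp[n][m]


# One T missing from a TTTT run is cheap; a missing isolated T is not.
print(context_gap_score("ACGTTTAC", "ACGTTTTAC"))  # 7.0
print(context_gap_score("ACGACGTA", "ACGTACGTA"))  # 4.0
```

Both reads have eight matching bases and one deletion; only the homopolymer context of the deleted base differs, which is the error pattern the abstract attributes to Ion Torrent.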
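The abstract links sequence complexity to mapping outcome and notes that InfoTrim adds an entropy term to read trimming. One simple complexity measure is the Shannon entropy of a read's base composition, sketched below. This is an illustration, not InfoTrim's scoring function; `is_low_complexity` and its threshold are hypothetical. Note that a fully converted forward-strand read uses only A, G, and T, so its composition entropy cannot exceed log2(3), about 1.58 bits per base, which is one reason bisulfite reads are harder to place.

```python
import math
from collections import Counter


def shannon_entropy(seq: str) -> float:
    """Shannon entropy of the base composition, in bits per base (0 to 2 for ACGT)."""
    if not seq:
        return 0.0
    n = len(seq)
    return sum(-(c / n) * math.log2(c / n) for c in Counter(seq).values())


def is_low_complexity(seq: str, threshold: float = 1.0) -> bool:
    """Hypothetical filter: flag reads whose composition entropy falls below threshold."""
    return shannon_entropy(seq) < threshold


print(shannon_entropy("ACGTACGT"))  # 2.0
print(shannon_entropy("ATATATAT"))  # 1.0
print(shannon_entropy("AAAAAAAA"))  # 0.0
```

Composition entropy ignores base order, so a read such as ACGTACGT scores as high as a random read; k-mer based entropy is a common alternative when repeats matter.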

