Investigating bisulfite short-read mapping failure with hairpin bisulfite sequencing data

TR Number
Journal Title
Journal ISSN
Volume Title

Background: DNA methylation is an important epigenetic mark relevant to normal development and disease genesis. A common approach to characterizing genome-wide DNA methylation is using Next Generation Sequencing technology to sequence bisulfite treated DNA. The short sequence reads are mapped to the reference genome to determine the methylation statuses of Cs. However, despite intense effort, a much smaller proportion of the reads derived from bisulfite treated DNA (usually about 40-80%) can be mapped than regular short reads mapping (> 90%), and it is unclear what factors lead to this low mapping efficiency. Results: To address this issue, we used the hairpin bisulfite sequencing technology to determine sequences of both DNA double strands simultaneously. This enabled the recovery of the original non-bisulfite-converted sequences. We used Bismark for bisulfite read mapping and Bowtie2 for recovered read mapping. We found that recovering the reads improved unique mapping efficiency by 9-10% compared to the bisulfite reads. Such improvement in mapping efficiency is related to sequence entropy. Conclusions: The hairpin recovery technique improves mapping efficiency, and sequence entropy relates to mapping efficiency.