Uncovering missed indels by leveraging unmapped reads

dc.contributor.author: Hasan, Mohammad Shabbir [en]
dc.contributor.author: Wu, Xiaowei [en]
dc.contributor.author: Zhang, Liqing [en]
dc.contributor.department: Computer Science [en]
dc.contributor.department: Statistics [en]
dc.date.accessioned: 2019-11-19T18:42:58Z [en]
dc.date.available: 2019-11-19T18:42:58Z [en]
dc.date.issued: 2019-07-31 [en]
dc.description.abstract: In current practice, Next Generation Sequencing (NGS) applications start with mapping/aligning short reads to the reference genome, with the aim of identifying genetic variants. Although existing alignment tools have shown great accuracy in mapping short reads to the reference genome, a significant number of short reads still remain unmapped and are often excluded from downstream analyses thereby causing nonnegligible information loss in the subsequent variant calling procedure. This paper describes Genesis-indel, a computational pipeline that explores the unmapped reads to identify novel indels that are initially missed in the original procedure. Genesis-indel is applied to the unmapped reads of 30 breast cancer patients from TCGA. Results show that the unmapped reads are conserved between the two subtypes of breast cancer investigated in this study and might contribute to the divergence between the subtypes. Genesis-indel identifies 72,997 novel high-quality indels previously not found, among which 16,141 have not been annotated in the widely used mutation database. Statistical analysis of these indels shows significant enrichment of indels residing in oncogenes and tumour suppressor genes. Functional annotation further reveals that these indels are strongly correlated with pathways of cancer and can have high to moderate impact on protein functions. Additionally, some of the indels overlap with the genes that do not have any indel mutations called from the originally mapped reads but have been shown to contribute to the tumorigenesis in multiple carcinomas, further emphasizing the importance of rescuing indels hidden in the unmapped reads in cancer and disease studies. [en]
dc.description.notes: This work is partially supported by Virginia Tech's Open Access Subvention Fund. Authors acknowledge The Cancer Genome Atlas (TCGA) (http://cancergenome.nih.gov) as the primary source of data. Authors thank Gustavo Arango and Saima Tithi from ZhangLab at Virginia Tech for helpful discussions and feedback. [en]
dc.description.sponsorship: Virginia Tech's Open Access Subvention Fund [en]
dc.format.mimetype: application/pdf [en]
dc.identifier.doi: https://doi.org/10.1038/s41598-019-47405-z [en]
dc.identifier.eissn: 2045-2322 [en]
dc.identifier.other: 11093 [en]
dc.identifier.pmid: 31366961 [en]
dc.identifier.uri: http://hdl.handle.net/10919/95813 [en]
dc.identifier.volume: 9 [en]
dc.language.iso: en [en]
dc.publisher: Springer Nature [en]
dc.rights: Creative Commons Attribution 4.0 International [en]
dc.rights.uri: http://creativecommons.org/licenses/by/4.0/ [en]
dc.title: Uncovering missed indels by leveraging unmapped reads [en]
dc.title.serial: Scientific Reports [en]
dc.type: Article - Refereed [en]
dc.type.dcmitype: Text [en]
dc.type.dcmitype: StillImage [en]

Files

Original bundle
Name: s41598-019-47405-z.pdf
Size: 2.36 MB
Format: Adobe Portable Document Format