dc.contributor.author: Hasan, Mohammad Shabbir
dc.contributor.author: Wu, Xiaowei
dc.contributor.author: Watson, Layne T.
dc.contributor.author: Li, Zhiyi
dc.contributor.author: Zhang, Liqing
dc.date.accessioned: 2017-12-06T19:47:18Z
dc.date.available: 2017-12-06T19:47:18Z
dc.date.issued: 2017-10-26
dc.identifier.uri: http://hdl.handle.net/10919/81068
dc.description.abstract: Storing biologically equivalent indels as distinct entries in databases causes data redundancy and misleads downstream analyses. It is thus desirable to have a unified system for identifying and representing equivalent indels. A unified system is also desirable for comparing the indel calling results produced by different tools. This paper describes UPS-indel, a utility tool that creates a universal positioning system for indels so that equivalent indels can be uniquely determined by their coordinates in the new system, which can also be used to compare different indel calling results. Across all human chromosomes, UPS-indel identifies 15% of the indels in dbSNP, 29% in the COSMIC coding dataset, and 13% in the COSMIC noncoding dataset as redundant, proportions higher than previously reported. Comparing the performance of UPS-indel with the existing variant normalization tools vt normalize, BCFtools, and GATK LeftAlignAndTrimVariants shows that UPS-indel identifies 456,352 more redundant indels in dbSNP, 2,118 more in the COSMIC coding dataset, and 553 more in the COSMIC noncoding dataset, in addition to those reported jointly by these tools. Moreover, comparing UPS-indel to state-of-the-art approaches for indel call set comparison demonstrates its clear superiority in finding common indels among call sets. UPS-indel is theoretically proven to find all equivalent indels and is thus exhaustive. [en_US]
dc.format.mimetype: application/pdf [en_US]
dc.language.iso: en_US [en_US]
dc.publisher: Nature [en_US]
dc.title: UPS-indel: a Universal Positioning System for Indels [en_US]
dc.type: Article - Refereed [en_US]
dc.title.serial: Scientific Reports [en_US]
dc.identifier.doi: https://doi.org/10.1038/s41598-017-14400-1
dc.identifier.volume: 7 [en_US]
dc.type.dcmitype: Text [en_US]
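The idea of "biologically equivalent indels" in the abstract can be illustrated with a small sketch: two indels are equivalent when applying each to the reference yields the same sequence, which typically happens inside repeats. The brute-force grouping below is an assumption made for illustration only; it is not the UPS-indel algorithm or its coordinate system. The function names are hypothetical, coordinates are 0-based (unlike 1-based VCF positions), and the reference is a toy string rather than a genome.

```python
from collections import defaultdict


def apply_deletion(ref: str, start: int, length: int) -> str:
    """Return ref with `length` bases removed starting at 0-based `start`."""
    return ref[:start] + ref[start + length:]


def apply_insertion(ref: str, pos: int, seq: str) -> str:
    """Return ref with `seq` inserted before 0-based position `pos`."""
    return ref[:pos] + seq + ref[pos:]


def equivalent_deletions(ref: str, length: int) -> dict[str, list[int]]:
    """Group every deletion of `length` bases by the sequence it produces.

    Start positions that fall in the same group describe the same mutated
    sequence, so storing them as separate records would be redundant.
    Brute force over all start positions: for illustration only.
    """
    groups: dict[str, list[int]] = defaultdict(list)
    for start in range(len(ref) - length + 1):
        groups[apply_deletion(ref, start, length)].append(start)
    return dict(groups)


if __name__ == "__main__":
    ref = "GCAAAAT"
    # Deleting any one of the four A's gives the same sequence "GCAAAT".
    print(equivalent_deletions(ref, 1)["GCAAAT"])  # [2, 3, 4, 5]
    # Inserting an A anywhere in the A-run is likewise a single event.
    print(apply_insertion(ref, 2, "A") == apply_insertion(ref, 6, "A"))  # True
```

In this toy case the four deletion records at starts 2 to 5 collapse to one event; a left-aligning normalizer would report all of them at the left-most start, 2.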