dc.contributor.author: Li, Zhiyi [en_US]
dc.contributor.author: Wu, Xiaowei [en_US]
dc.contributor.author: He, Bin [en_US]
dc.contributor.author: Zhang, Liqing [en_US]
dc.identifier.citation: BMC Bioinformatics. 2014 Nov 19;15(1):359
dc.description.abstract: Background: With advances in next-generation sequencing (NGS) technologies, a large number of insertion and deletion (indel) variants have been identified in human populations. Despite much research into variant calling, a non-negligible proportion of the identified indel variants may be false positives caused by sequencing errors, artifacts of ambiguous alignments, and annotation errors. Results: In this paper, we examine indel redundancy in dbSNP, one of the central databases for indel variants, and develop a standalone computational pipeline, dubbed Vindel, to detect redundant indels. The pipeline first uses indel position information to form candidate redundant groups, then applies each indel to the reference genome to generate the corresponding indel variant substring. Finally, the indel variant substrings within each candidate redundant group are compared pairwise to identify redundant indels. We applied our pipeline to check for redundancy among the human indels in dbSNP. It identified approximately 8% redundancy in insertion-type indels, 12% in deletion-type indels, and 10% overall for insertions and deletions combined. These numbers are largely consistent across all human autosomes. We also investigated the indel size distribution and the distribution of distances between adjacent indels to better understand the mechanisms that generate indel variants. Conclusions: Vindel, a simple yet effective computational pipeline, can be used to check whether a set of indels is redundant with respect to those already in a database of interest such as NCBI’s dbSNP. Of the approximately 5.9 million indels we examined, nearly 0.6 million are redundant, revealing a serious limitation in the current indel annotation. Summary statistics indicate that the pipeline detects indel redundancy consistently across all 22 autosomes. 
Apart from the standalone Vindel pipeline, the indel redundancy check algorithm is also implemented in the web server
dc.format.extent: ? - ? (10) page(s) [en_US]
dc.publisher: Biomed Central Ltd [en_US]
dc.rights: Creative Commons Attribution 4.0 International (CC BY 4.0) [*]
dc.subject: Biochemical Research Methods [en_US]
dc.subject: Biotechnology & Applied Microbiology [en_US]
dc.subject: Mathematical & Computational Biology [en_US]
dc.subject: Biochemistry & Molecular Biology [en_US]
dc.subject: Indel redundancy [en_US]
dc.subject: Gap opening [en_US]
dc.subject: Gap extension [en_US]
dc.subject: END SHORT READS [en_US]
dc.title: Vindel: a simple pipeline for checking indel redundancy [en_US]
dc.type: Article - Refereed [en_US]
dc.description.version: Published (Publication status) [en_US]
dc.rights.holder: Zhiyi Li et al.; licensee BioMed Central Ltd.
dc.title.serial: BMC BIOINFORMATICS [en_US]
pubs.organisational-group: /Virginia Tech
pubs.organisational-group: /Virginia Tech/All T&R Faculty
pubs.organisational-group: /Virginia Tech/Engineering
pubs.organisational-group: /Virginia Tech/Engineering/COE T&R Faculty
pubs.organisational-group: /Virginia Tech/Engineering/Computer Science
pubs.organisational-group: /Virginia Tech/Science
pubs.organisational-group: /Virginia Tech/Science/COS T&R Faculty
pubs.organisational-group: /Virginia Tech/Science/Statistics
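
The three steps summarized in the abstract (group indels by position, apply each indel to the reference, compare the resulting variant substrings pairwise) can be illustrated with a minimal Python sketch. This is not the Vindel implementation: the record format (dicts with "id", "pos", "kind", "seq"), the 0-based coordinates, the grouping rule based on a hypothetical max_gap distance, and the flank size are all assumptions made for illustration.

```python
from itertools import combinations


def mutate(window, offset, kind, seq):
    """Apply one indel to a reference window and return the variant substring."""
    if kind == "ins":
        # Insert seq immediately before window[offset].
        return window[:offset] + seq + window[offset:]
    if kind == "del":
        # Remove seq starting at window[offset]; it must match the reference.
        if window[offset:offset + len(seq)] != seq:
            raise ValueError("deleted sequence does not match reference")
        return window[:offset] + window[offset + len(seq):]
    raise ValueError(f"unknown indel kind: {kind}")


def candidate_groups(indels, max_gap):
    """Group indels whose sorted positions are at most max_gap apart (assumed rule)."""
    groups, current = [], []
    for indel in sorted(indels, key=lambda i: i["pos"]):
        if current and indel["pos"] - current[-1]["pos"] > max_gap:
            groups.append(current)
            current = []
        current.append(indel)
    if current:
        groups.append(current)
    return groups


def redundant_pairs(ref, indels, max_gap=10, flank=5):
    """Return id pairs whose indels yield identical variant substrings."""
    pairs = []
    for group in candidate_groups(indels, max_gap):
        # Use one shared reference window per group so substrings are comparable.
        lo = max(0, group[0]["pos"] - flank)
        hi = min(len(ref), max(i["pos"] + len(i["seq"]) for i in group) + flank)
        window = ref[lo:hi]
        variants = {
            i["id"]: mutate(window, i["pos"] - lo, i["kind"], i["seq"])
            for i in group
        }
        for a, b in combinations(group, 2):
            if variants[a["id"]] == variants[b["id"]]:
                pairs.append((a["id"], b["id"]))
    return pairs


# Deleting any single A from the AAAA run gives the same sequence,
# so d1 and d2 describe the same variant; d3 does not.
example = [
    {"id": "d1", "pos": 3, "kind": "del", "seq": "A"},
    {"id": "d2", "pos": 6, "kind": "del", "seq": "A"},
    {"id": "d3", "pos": 7, "kind": "del", "seq": "T"},
]
print(redundant_pairs("GGCAAAATCC", example))  # [('d1', 'd2')]
```

Comparing substrings over a shared window, rather than comparing coordinates, is what lets two differently positioned records in a repeat region be recognized as the same variant.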

License: Creative Commons Attribution 4.0 International (CC BY 4.0)