Vindel: a simple pipeline for checking indel redundancy

dc.contributor.author: Li, Zhiyi (en)
dc.contributor.author: Wu, Xiaowei (en)
dc.contributor.author: He, Bin (en)
dc.contributor.author: Zhang, Liqing (en)
dc.contributor.department: Computer Science (en)
dc.contributor.department: Statistics (en)
dc.date.accessioned: 2017-02-03T00:02:39Z (en)
dc.date.available: 2017-02-03T00:02:39Z (en)
dc.date.issued: 2014-11-19 (en)
dc.description.abstract: Background: With the advance of next generation sequencing (NGS) technologies, a large number of insertion and deletion (indel) variants have been identified in human populations. Despite much research into variant calling, it has been found that a non-negligible proportion of the identified indel variants might be false positives due to sequencing errors, artifacts caused by ambiguous alignments, and annotation errors.
Results: In this paper, we examine indel redundancy in dbSNP, one of the central databases for indel variants, and develop a standalone computational pipeline, dubbed Vindel, to detect redundant indels. The pipeline first uses indel position information to form candidate redundant groups, then applies each indel to the reference genome to generate the corresponding indel variant substring. Finally, the indel variant substrings in the same candidate redundant group are compared in a pairwise fashion to identify redundant indels. We applied our pipeline to check for redundancy in the human indels in dbSNP. Our pipeline identified approximately 8% redundancy in insertion type indels, 12% in deletion type indels, and overall 10% for insertions and deletions combined. These numbers are largely consistent across all human autosomes. We also investigated the indel size distribution and the adjacent indel distance distribution for a better understanding of the mechanisms generating indel variants.
Conclusions: Vindel, a simple yet effective computational pipeline, can be used to check whether a set of indels are redundant with respect to those already in a database of interest, such as NCBI’s dbSNP. Of the approximately 5.9 million indels we examined, nearly 0.6 million are redundant, revealing a serious limitation in the current indel annotation. Statistical results show that the pipeline detects indel redundancy consistently across all 22 autosomes. Apart from the standalone Vindel pipeline, the indel redundancy check algorithm is also implemented in the web server http://bioinformatics.cs.vt.edu/zhanglab/indelRedundant.php. (en)
dc.description.version: Published version (en)
dc.format.extent: ? - ? (10) page(s) (en)
dc.format.mimetype: application/pdf (en)
dc.identifier.citation: BMC Bioinformatics. 2014 Nov 19;15(1):359 (en)
dc.identifier.doi: https://doi.org/10.1186/s12859-014-0359-1 (en)
dc.identifier.issn: 1471-2105 (en)
dc.identifier.uri: http://hdl.handle.net/10919/74912 (en)
dc.identifier.volume: 15 (en)
dc.language.iso: en (en)
dc.publisher: Biomed Central (en)
dc.relation.uri: http://gateway.webofknowledge.com/gateway/Gateway.cgi?GWVersion=2&SrcApp=PARTNER_APP&SrcAuth=LinksAMR&KeyUT=WOS:000347429700001&DestLinkType=FullRecord&DestApp=ALL_WOS&UsrCustomerID=930d57c9ac61a043676db62af60056c1 (en)
dc.rights: Creative Commons Attribution 4.0 International (en)
dc.rights.holder: Zhiyi Li et al.; licensee BioMed Central Ltd. (en)
dc.rights.uri: http://creativecommons.org/licenses/by/4.0/ (en)
dc.subject: Biochemical Research Methods (en)
dc.subject: Biotechnology & Applied Microbiology (en)
dc.subject: Mathematical & Computational Biology (en)
dc.subject: Biochemistry & Molecular Biology (en)
dc.subject: Indel redundancy (en)
dc.subject: Gap opening (en)
dc.subject: Gap extension (en)
dc.subject: END SHORT READS (en)
dc.subject: DELETIONS (en)
dc.subject: INTERFERENCE (en)
dc.subject: BREAKPOINTS (en)
dc.subject: INSERTIONS (en)
dc.subject: ALIGNMENT (en)
dc.subject: VARIANTS (en)
dc.title: Vindel: a simple pipeline for checking indel redundancy (en)
dc.title.serial: BMC Bioinformatics (en)
dc.type: Article - Refereed (en)
dc.type.dcmitype: Text (en)
pubs.organisational-group: /Virginia Tech (en)
pubs.organisational-group: /Virginia Tech/All T&R Faculty (en)
pubs.organisational-group: /Virginia Tech/Engineering (en)
pubs.organisational-group: /Virginia Tech/Engineering/COE T&R Faculty (en)
pubs.organisational-group: /Virginia Tech/Engineering/Computer Science (en)
pubs.organisational-group: /Virginia Tech/Science (en)
pubs.organisational-group: /Virginia Tech/Science/COS T&R Faculty (en)
pubs.organisational-group: /Virginia Tech/Science/Statistics (en)
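
The redundancy check summarized in the abstract (form candidate groups by position, apply each indel to the reference, compare the resulting substrings pairwise) can be sketched in Python. This is a minimal illustration under stated assumptions, not the Vindel implementation: the `Indel` record, the function names, the `max_gap` and `flank` parameters, the 0-based coordinates, and the rule of grouping only indels of the same type and length are all hypothetical choices made for the example.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Indel:
    name: str  # identifier (hypothetical; a real record would carry e.g. an rs number)
    pos: int   # 0-based: an insertion goes before ref[pos]; a deletion starts at ref[pos]
    kind: str  # "ins" or "del"
    seq: str   # inserted or deleted bases


def apply_indel(ref, indel, lo, hi):
    """Return ref[lo:hi] with the indel applied (the 'indel variant substring')."""
    if indel.kind == "ins":
        return ref[lo:indel.pos] + indel.seq + ref[indel.pos:hi]
    # Deletion: drop len(seq) bases; this sketch does not verify they match seq.
    return ref[lo:indel.pos] + ref[indel.pos + len(indel.seq):hi]


def candidate_groups(indels, max_gap):
    """Chain indels of the same kind and length whose positions lie within max_gap."""
    groups = []
    for indel in sorted(indels, key=lambda i: (i.kind, len(i.seq), i.pos)):
        last = groups[-1][-1] if groups else None
        if (last is not None and last.kind == indel.kind
                and len(last.seq) == len(indel.seq)
                and indel.pos - last.pos <= max_gap):
            groups[-1].append(indel)
        else:
            groups.append([indel])
    return groups


def redundant_pairs(ref, indels, max_gap=10, flank=5):
    """Report pairs whose variant substrings over a shared window are identical."""
    pairs = []
    for group in candidate_groups(indels, max_gap):
        # One window per group so every substring covers the same reference span.
        lo = max(0, group[0].pos - flank)
        hi = min(len(ref), max(i.pos + len(i.seq) for i in group) + flank)
        variants = {i.name: apply_indel(ref, i, lo, hi) for i in group}
        for a, b in combinations(group, 2):
            if variants[a.name] == variants[b.name]:
                pairs.append((a.name, b.name))
    return pairs


# Toy reference with a CA repeat: deleting "CA" at 3 or "AC" at 4 gives the same sequence.
ref = "TTGCACACAGG"
indels = [
    Indel("d1", 3, "del", "CA"),
    Indel("d2", 4, "del", "AC"),
    Indel("d3", 2, "del", "GC"),
    Indel("i1", 3, "ins", "CA"),
    Indel("i2", 4, "ins", "AC"),
]
print(redundant_pairs(ref, indels))  # [('d1', 'd2'), ('i1', 'i2')]
```

The toy example shows why position alone cannot decide redundancy: in a repeat region, differently placed and differently spelled indels can yield the same variant sequence, so the comparison is made on the mutated substrings rather than on the records themselves.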

Files

Original bundle
Name: Vindel: a simple pipeline for checking indel redundancy.pdf
Size: 1.21 MB
Format: Adobe Portable Document Format
Description: Publisher's Version

Name: s12859-014-0359-1-S1.docx
Size: 1012.31 KB
Format: Microsoft Word XML
Description:
License bundle
Name: VTUL_Distribution_License_2016_05_09.pdf
Size: 18.09 KB
Format: Adobe Portable Document Format
Description: