DBS: a fast and informative segmentation algorithm for DNA copy number analysis
MetadataShow full item record
Background Genome-wide DNA copy number changes are the hallmark events in the initiation and progression of cancers. Quantitative analysis of somatic copy number alterations (CNAs) has broad applications in cancer research. With the increasing capacity of high-throughput sequencing technologies, fast and efficient segmentation algorithms are required when characterizing high density CNAs data. Results A fast and informative segmentation algorithm, DBS (Deviation Binary Segmentation), is developed and discussed. The DBS method is based on the least absolute error principles and is inspired by the segmentation method rooted in the circular binary segmentation procedure. DBS uses point-by-point model calculation to ensure the accuracy of segmentation and combines a binary search algorithm with heuristics derived from the Central Limit Theorem. The DBS algorithm is very efficient requiring a computational complexity of O(n*log n), and is faster than its predecessors. Moreover, DBS measures the change-point amplitude of mean values of two adjacent segments at a breakpoint, where the significant degree of change-point amplitude is determined by the weighted average deviation at breakpoints. Accordingly, using the constructed binary tree of significant degree, DBS informs whether the results of segmentation are over- or under-segmented. Conclusion DBS is implemented in a platform-independent and open-source Java application (ToolSeg), including a graphical user interface and simulation data generation, as well as various segmentation methods in the native Java language.