Virginia Tech
    • Log in
    View Item 
    •   VTechWorks Home
    • College of Engineering (COE)
    • Department of Computer Science
    • Computer Science Technical Reports
    • View Item
    •   VTechWorks Home
    • College of Engineering (COE)
    • Department of Computer Science
    • Computer Science Technical Reports
    • View Item
    JavaScript is disabled for your browser. Some features of this site may not work without it.

    CAN-zip – Centroid Based Delta Compression of Next Generation Sequencing Data

    Thumbnail
    View/Open
    CANzip.pdf (191.9Kb)
    Downloads: 339
    TR number
    TR-15-05
    Date
    2015-11-09
    Author
    Steere, Edward
    An, Lin
    Zhang, Liqing
    Metadata
    Show full item record
    Abstract
    We present CANzip, a novel algorithm for compressing short read DNA sequencing data in FastQ format. CANzip is based on delta compression, a process in which only the differences of a specific data stream relative to a given reference stream are stored. However CANzip uniquely assumes no given reference stream. Instead it creates artificial references for different clusters of reads, by constructing an artificial representative sequence for each given cluster. Each cluster sequence is then recoded to indicate only how it differs relative to this artificially created reference sequence. Remodeling the data in this way greatly improves the compression ratio achieved when used in conjunction with commodity tools such as bzip2. Our results indicate that CANzip outperforms gzip on average and that it can outperform bzip2.
    URI
    http://hdl.handle.net/10919/63992
    Collections
    • Computer Science Technical Reports [1036]

    If you believe that any material in VTechWorks should be removed, please see our policy and procedure for Requesting that Material be Amended or Removed. All takedown requests will be promptly acknowledged and investigated.

    Virginia Tech | University Libraries | Contact Us
     

     

    VTechWorks

    AboutPoliciesHelp

    Browse

    All of VTechWorksCommunities & CollectionsBy Issue DateAuthorsTitlesSubjectsThis CollectionBy Issue DateAuthorsTitlesSubjects

    My Account

    Log inRegister

    Statistics

    View Usage Statistics

    If you believe that any material in VTechWorks should be removed, please see our policy and procedure for Requesting that Material be Amended or Removed. All takedown requests will be promptly acknowledged and investigated.

    Virginia Tech | University Libraries | Contact Us