Virginia Tech
    • Log in
    View Item 
    •   VTechWorks Home
    • ETDs: Virginia Tech Electronic Theses and Dissertations
    • Masters Theses
    • View Item
    •   VTechWorks Home
    • ETDs: Virginia Tech Electronic Theses and Dissertations
    • Masters Theses
    • View Item
    JavaScript is disabled for your browser. Some features of this site may not work without it.

    Deep Learning for Taxonomy Prediction

    Thumbnail
    View/Open
    Ramesh_S_T_2019.pdf (3.373Mb)
    Downloads: 1059
    Date
    2019-06-04
    Author
    Ramesh, Shreyas
    Metadata
    Show full item record
    Abstract
    The last decade has seen great advances in Next-Generation Sequencing technologies, and, as a result, there has been a rise in the number of genomes sequenced each year. In 2017, there were as many as 10,000 new organisms sequenced and added into the RefSeq Database. Taxonomy prediction is a science involving the hierarchical classification of DNA fragments up to the rank species. In this research, we introduce Predicting Linked Organisms, Plinko, for short. Plinko is a fully-functioning, state-of-the-art predictive system that accurately captures DNA - Taxonomy relationships where other state-of-the-art algorithms falter. Plinko leverages multi-view convolutional neural networks and the pre-defined taxonomy tree structure to improve multi-level taxonomy prediction. In the Plinko strategy, each network takes advantage of different word usage patterns corresponding to different levels of evolutionary divergence. Plinko has the advantages of relatively low storage, GPGPU parallel training and inference, making the solution portable, and scalable with anticipated genome database growth. To the best of our knowledge, Plinko is the first to use multi-view convolutional neural networks as the core algorithm in a compositional,alignment-free approach to taxonomy prediction.
    General Audience Abstract
    Taxonomy prediction is a science involving the hierarchical classification of DNA fragments up to the rank species. Given species diversity on Earth, taxonomy prediction gets challenging with (i) increasing number of species (labels) to classify and (ii) decreasing input (DNA) size. In this research, we introduce Predicting Linked Organisms, Plinko, for short. Plinko is a fully-functioning, state-of-the-art predictive system that accurately captures DNA - Taxonomy relationships where other state-of-the-art algorithms falter. Three major challenges in taxonomy prediction are (i) large dataset sizes (order of 109 sequences) (ii) large label spaces (order of 103 labels) and (iii) low resolution inputs (100 base pairs or less). Plinko leverages multi-view convolutional neural networks and the pre-defined taxonomy tree structure to improve multi-level taxonomy prediction for hard to classify sequences under the three conditions stated above. Plinko has the advantage of relatively low storage footprint, making the solution portable, and scalable with anticipated genome database growth. To the best of our knowledge, Plinko is the first to use multi-view convolutional neural networks as the core algorithm in a compositional, alignment-free approach to taxonomy prediction.
    URI
    http://hdl.handle.net/10919/89752
    Collections
    • Masters Theses [22189]

    If you believe that any material in VTechWorks should be removed, please see our policy and procedure for Requesting that Material be Amended or Removed. All takedown requests will be promptly acknowledged and investigated.

    Virginia Tech | University Libraries | Contact Us
     

     

    VTechWorks

    AboutPoliciesHelp

    Browse

    All of VTechWorksCommunities & CollectionsBy Issue DateAuthorsTitlesSubjectsThis CollectionBy Issue DateAuthorsTitlesSubjects

    My Account

    Log inRegister

    Statistics

    View Usage Statistics

    If you believe that any material in VTechWorks should be removed, please see our policy and procedure for Requesting that Material be Amended or Removed. All takedown requests will be promptly acknowledged and investigated.

    Virginia Tech | University Libraries | Contact Us