Deep Learning for Taxonomy Prediction
dc.contributor.author | Ramesh, Shreyas | en |
dc.contributor.committeechair | Marathe, Madhav V. | en |
dc.contributor.committeechair | Warren, Andrew S. | en |
dc.contributor.committeemember | Vullikanti, Anil Kumar S. | en |
dc.contributor.department | Computer Science | en |
dc.date.accessioned | 2019-06-05T08:00:18Z | en |
dc.date.available | 2019-06-05T08:00:18Z | en |
dc.date.issued | 2019-06-04 | en |
dc.description.abstract | The last decade has seen great advances in Next-Generation Sequencing technologies, and, as a result, there has been a rise in the number of genomes sequenced each year. In 2017, there were as many as 10,000 new organisms sequenced and added into the RefSeq Database. Taxonomy prediction is a science involving the hierarchical classification of DNA fragments up to the rank species. In this research, we introduce Predicting Linked Organisms, Plinko, for short. Plinko is a fully-functioning, state-of-the-art predictive system that accurately captures DNA - Taxonomy relationships where other state-of-the-art algorithms falter. Plinko leverages multi-view convolutional neural networks and the pre-defined taxonomy tree structure to improve multi-level taxonomy prediction. In the Plinko strategy, each network takes advantage of different word usage patterns corresponding to different levels of evolutionary divergence. Plinko has the advantages of relatively low storage, GPGPU parallel training and inference, making the solution portable, and scalable with anticipated genome database growth. To the best of our knowledge, Plinko is the first to use multi-view convolutional neural networks as the core algorithm in a compositional,alignment-free approach to taxonomy prediction. | en |
dc.description.abstractgeneral | Taxonomy prediction is a science involving the hierarchical classification of DNA fragments up to the rank species. Given species diversity on Earth, taxonomy prediction gets challenging with (i) increasing number of species (labels) to classify and (ii) decreasing input (DNA) size. In this research, we introduce Predicting Linked Organisms, Plinko, for short. Plinko is a fully-functioning, state-of-the-art predictive system that accurately captures DNA - Taxonomy relationships where other state-of-the-art algorithms falter. Three major challenges in taxonomy prediction are (i) large dataset sizes (order of 109 sequences) (ii) large label spaces (order of 103 labels) and (iii) low resolution inputs (100 base pairs or less). Plinko leverages multi-view convolutional neural networks and the pre-defined taxonomy tree structure to improve multi-level taxonomy prediction for hard to classify sequences under the three conditions stated above. Plinko has the advantage of relatively low storage footprint, making the solution portable, and scalable with anticipated genome database growth. To the best of our knowledge, Plinko is the first to use multi-view convolutional neural networks as the core algorithm in a compositional, alignment-free approach to taxonomy prediction. | en |
dc.description.degree | Master of Science | en |
dc.format.medium | ETD | en |
dc.identifier.other | vt_gsexam:20261 | en |
dc.identifier.uri | http://hdl.handle.net/10919/89752 | en |
dc.publisher | Virginia Tech | en |
dc.rights | In Copyright | en |
dc.rights.uri | http://rightsstatements.org/vocab/InC/1.0/ | en |
dc.subject | taxonomy prediction | en |
dc.subject | convolutional neural networks | en |
dc.subject | hierarchical prediction | en |
dc.subject | cnn | en |
dc.subject | taxonomic binning | en |
dc.title | Deep Learning for Taxonomy Prediction | en |
dc.type | Thesis | en |
thesis.degree.discipline | Computer Science and Applications | en |
thesis.degree.grantor | Virginia Polytechnic Institute and State University | en |
thesis.degree.level | masters | en |
thesis.degree.name | Master of Science | en |
Files
Original bundle
1 - 1 of 1