Show simple item record

dc.contributor.author | Robertson, Jeffrey Alan | en_US
dc.date.accessioned | 2018-08-02T08:00:32Z
dc.date.available | 2018-08-02T08:00:32Z
dc.date.issued | 2018-08-01 | en_US
dc.identifier.other | vt_gsexam:16720 | en_US
dc.identifier.uri | http://hdl.handle.net/10919/84470
dc.description.abstract | As improving technology makes it easier to select or engineer DNA sequences that produce dangerous proteins, it is important to be able to predict whether a novel DNA sequence is potentially dangerous by determining its taxonomic identity and functional characteristics. These tasks can be facilitated by the ever-increasing amounts of available biological data. Unfortunately, these growing databases can be difficult to take full advantage of because computational and storage costs increase correspondingly. Entropy scaling algorithms and data structures can expedite this type of analysis by scaling with the amount of entropy contained in the database rather than with the size of the database. Because sets of DNA and protein sequences are biologically meaningful rather than random, they exhibit some amount of structure. As biological databases grow, taking advantage of this structure can be extremely beneficial. The entropy scaling sequence similarity search algorithm introduced here demonstrates this by accelerating the biological sequence search tools BLAST and DIAMOND. Tests of the implementation of this algorithm show that while this approach can lead to improved query times, constructing the required entropy scaling indices is difficult and expensive. To improve performance and remove this bottleneck, I investigate several ideas for accelerating the construction of indices that support entropy scaling searches. The results of these tests identify key tradeoffs and demonstrate that there is potential in using these techniques for sequence similarity searches. | en_US
dc.format.medium | ETD | en_US
dc.publisher | Virginia Tech | en_US
dc.rights | This item is protected by copyright and/or related rights. Some uses of this item may be deemed fair and permitted by law even without permission from the rights holder(s), or the rights holder(s) may have licensed the work for use under certain conditions. For other uses you need to obtain permission from the rights holder(s). | en_US
dc.subject | Bioinformatics | en_US
dc.subject | Entropy Scaling | en_US
dc.subject | Sequence Search | en_US
dc.subject | BLAST | en_US
dc.title | Entropy Measurements and Ball Cover Construction for Biological Sequences | en_US
dc.type | Thesis | en_US
dc.contributor.department | Computer Science | en_US
dc.description.degree | Master of Science | en_US
thesis.degree.name | Master of Science | en_US
thesis.degree.level | masters | en_US
thesis.degree.grantor | Virginia Polytechnic Institute and State University | en_US
thesis.degree.discipline | Computer Science and Applications | en_US
dc.contributor.committeechair | Heath, Lenwood S. | en_US
dc.contributor.committeemember | Marathe, Madhav Vishnu | en_US
dc.contributor.committeemember | Eubank, Stephen G. | en_US
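
Illustrative note on the abstract: the record does not describe the thesis's implementation, so the following is only a minimal sketch of the general idea behind a ball cover and a two-stage (coarse, then fine) similarity search. Everything in it is an assumption made for illustration: the function names `hamming`, `build_ball_cover`, and `search` are hypothetical, Hamming distance on equal-length strings stands in for the alignment-based comparisons that BLAST and DIAMOND perform, and the greedy cover is just one simple way to build such an index.

```python
def hamming(a, b):
    """Number of mismatched positions between two equal-length sequences.

    Assumption: a toy metric used only for illustration.
    """
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def build_ball_cover(seqs, radius):
    """Greedy ball cover: map each chosen center to the sequences within
    `radius` of it. A sequence that fits no existing ball becomes a new center.
    """
    cover = {}
    for s in seqs:
        for center in cover:
            if hamming(s, center) <= radius:
                cover[center].append(s)
                break
        else:
            # No existing ball contains s, so s starts a new ball.
            cover[s] = [s]
    return cover


def search(cover, query, max_dist, radius):
    """Return every indexed sequence within `max_dist` of `query`.

    Coarse step: by the triangle inequality, a ball can contain a hit only if
    its center lies within max_dist + radius of the query.
    Fine step: compare the query against members of the surviving balls only.
    """
    hits = []
    for center, members in cover.items():
        if hamming(query, center) <= max_dist + radius:
            hits.extend(m for m in members if hamming(query, m) <= max_dist)
    return hits


if __name__ == "__main__":
    database = ["ACGTACGT", "ACGTACGA", "ACGTACCA",
                "TTTTTTTT", "TTTTTTTA", "GGGGCCCC"]
    index = build_ball_cover(database, radius=2)
    print(len(index), "balls")
    print(search(index, "ACGTACGG", max_dist=1, radius=2))
```

In this sketch the fine step touches only the balls that pass the coarse filter, so the work per query depends on how many balls the data needs rather than on the raw number of sequences. Building the cover is the expensive part, since each sequence may be compared against every existing center.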

