Biclustering and Visualization of High Dimensional Data using VIsual Statistical Data Analyzer

dc.contributor.authorBlake, Patrick Michaelen
dc.contributor.committeechairWang, Yue J.en
dc.contributor.committeememberXuan, Jianhuaen
dc.contributor.committeememberYu, Guoqiangen
dc.contributor.departmentElectrical Engineeringen
dc.date.accessioned2019-02-01T09:00:57Zen
dc.date.available2019-02-01T09:00:57Zen
dc.date.issued2019-01-31en
dc.description.abstractMany data sets have too many features for conventional pattern recognition techniques to work properly. This thesis investigates techniques that alleviate these difficulties. One such technique, biclustering, clusters data in both dimensions and is inherently resistant to the challenges posed by having too many features. However, the algorithms that implement biclustering have limitations in that the user must know at least the structure of the data and how many biclusters to expect. This is where the VIsual Statistical Data Analyzer, or VISDA, can help. It is a visualization tool that successively and progressively explores the structure of the data, identifying clusters along the way. This thesis proposes coupling VISDA with biclustering to overcome some of the challenges of data sets with too many features. Further, to increase the performance, usability, and maintainability as well as reduce costs, VISDA was translated from Matlab to a Python version called VISDApy. Both VISDApy and the overall process were demonstrated with real and synthetic data sets. The results of this work have the potential to improve analysts' understanding of the relationships within complex data sets and their ability to make informed decisions from such data.en
dc.description.abstractgeneralMany data sets have too many features for conventional pattern recognition techniques to work properly. This thesis investigates techniques that alleviate these difficulties. One such technique, biclustering, clusters data in both dimensions and is inherently resistant to the challenges posed by having too many features. However, the algorithms that implement biclustering have limitations in that the user must know at least the structure of the data and how many biclusters to expect. This is where the VIsual Statistical Data Analyzer, or VISDA, can help. It is a visualization tool that successively and progressively explores the structure of the data, identifying clusters along the way. This thesis proposes coupling VISDA with biclustering to overcome some of the challenges of data sets with too many features. Further, to increase the performance, usability, and maintainability as well as reduce costs, VISDA was translated from Matlab to a Python version called VISDApy. Both VISDApy and the overall process were demonstrated with real and synthetic data sets. The results of this work have the potential to improve analysts’ understanding of the relationships within complex data sets and their ability to make informed decisions from such data.en
dc.description.degreeMaster of Scienceen
dc.format.mediumETDen
dc.identifier.othervt_gsexam:18613en
dc.identifier.urihttp://hdl.handle.net/10919/87392en
dc.publisherVirginia Techen
dc.rightsIn Copyrighten
dc.rights.urihttp://rightsstatements.org/vocab/InC/1.0/en
dc.subjecthigh-dimensional dataen
dc.subjectbiclusteringen
dc.subjectVISDAen
dc.subjectVISDApyen
dc.titleBiclustering and Visualization of High Dimensional Data using VIsual Statistical Data Analyzeren
dc.typeThesisen
thesis.degree.disciplineElectrical Engineeringen
thesis.degree.grantorVirginia Polytechnic Institute and State Universityen
thesis.degree.levelmastersen
thesis.degree.nameMaster of Scienceen

Files

Original bundle
Now showing 1 - 1 of 1
Loading...
Thumbnail Image
Name:
Blake_PM_T_2019.pdf
Size:
2.59 MB
Format:
Adobe Portable Document Format

Collections