Biclustering and Visualization of High Dimensional Data using VIsual Statistical Data Analyzer

Blake, Patrick Michael

dc.contributor.author	Blake, Patrick Michael	en
dc.contributor.committeechair	Wang, Yue J.	en
dc.contributor.committeemember	Xuan, Jianhua	en
dc.contributor.committeemember	Yu, Guoqiang	en
dc.contributor.department	Electrical Engineering	en
dc.date.accessioned	2019-02-01T09:00:57Z	en
dc.date.available	2019-02-01T09:00:57Z	en
dc.date.issued	2019-01-31	en
dc.description.abstract	Many data sets have too many features for conventional pattern recognition techniques to work properly. This thesis investigates techniques that alleviate these difficulties. One such technique, biclustering, clusters data in both dimensions and is inherently resistant to the challenges posed by having too many features. However, biclustering algorithms are limited in that the user must already know at least the structure of the data and how many biclusters to expect. This is where the VIsual Statistical Data Analyzer, or VISDA, can help. It is a visualization tool that successively and progressively explores the structure of the data, identifying clusters along the way. This thesis proposes coupling VISDA with biclustering to overcome some of the challenges of data sets with too many features. Further, to improve performance, usability, and maintainability and to reduce costs, VISDA was translated from MATLAB into a Python version called VISDApy. Both VISDApy and the overall process were demonstrated with real and synthetic data sets. The results of this work have the potential to improve analysts' understanding of the relationships within complex data sets and their ability to make informed decisions from such data.	en
dc.description.degree	Master of Science	en
dc.format.medium	ETD	en
dc.identifier.other	vt_gsexam:18613	en
dc.identifier.uri	http://hdl.handle.net/10919/87392	en
dc.publisher	Virginia Tech	en
dc.rights	In Copyright	en
dc.rights.uri	http://rightsstatements.org/vocab/InC/1.0/	en
dc.subject	high-dimensional data	en
dc.subject	biclustering	en
dc.subject	VISDA	en
dc.subject	VISDApy	en
dc.title	Biclustering and Visualization of High Dimensional Data using VIsual Statistical Data Analyzer	en
dc.type	Thesis	en
thesis.degree.discipline	Electrical Engineering	en
thesis.degree.grantor	Virginia Polytechnic Institute and State University	en
thesis.degree.level	masters	en
thesis.degree.name	Master of Science	en

Files

Name: Blake_PM_T_2019.pdf
Size: 2.59 MB
Format: Adobe Portable Document Format

Collections

Masters Theses