Insight Driven Sampling for Interactive Data Intensive Computing

Masiane, Moeti Moeklesia

Insight Driven Sampling for Interactive Data Intensive Computing

dc.contributor.author	Masiane, Moeti Moeklesia	en
dc.contributor.committeechair	North, Christopher L.	en
dc.contributor.committeechair	Jacques, Eric Jean-Yves	en
dc.contributor.committeemember	Feng, Wu-chun	en
dc.contributor.committeemember	Su, Simon	en
dc.contributor.committeemember	Luther, Kurt	en
dc.contributor.department	Computer Science	en
dc.date.accessioned	2021-12-17T07:00:06Z	en
dc.date.available	2021-12-17T07:00:06Z	en
dc.date.issued	2020-06-24	en
dc.description.abstract	Data Visualization is used to help humans perceive high dimensional data, but it is unable to be applied in real time to data intensive computing applications. Attempts to process and apply traditional information visualization techniques to such applications result in slow or non-responsive applications. For such applications, sampling is often used to reduce big data to smaller data so that the benefits of data visualization can be brought to data intensive applications. Sampling allows data visualization to be used as an interface between humans and insights contained in the big data of data intensive computing. However, sampling introduces error. The objective of sampling is to reduce the amount of data being processed without introducing too much error into the results of the data intensive application. To determine an adequate level of sampling one can use statistical measures like standard error. However, such measures do not translate well for cases involving data visualization. Knowing the standard error of a sample can tell you very little about the visualization of that data. What is needed is a measure that allows system users to make an informed decision on the level of sampling needed to speed up a data intensive application. In this work we introduce an insight based measure for the impact of sampling on the results of visualized data. We develop a framework for the quantification of the level of insight, model the relationship between the level of insight and the amount of sampling, use this model to provide data intensive computing users with the ability to control the amount of sampling as a function of user provided insight requirements, and we develop a prototype that utilizes our framework. This work allows users to speed up data intensive applications with a clear understanding of how the speedup will impact the insights gained from the visualization of this data. Starting with a simple one dimensional data intensive application we apply our framework and work our way to a more complicated computational fluid dynamics case as a proof concept of the application of our framework and insight error feedback measure for those using sampling to speedup data intensive computing.	en
dc.description.abstractgeneral	Data Visualization is used to help humans perceive high dimensional data, but it is unable to be applied in real time to computing applications that generate or process vast amounts of data, also known as data intensive computing applications. Attempts to process and apply traditional information visualization techniques to such data result in slow or non-responsive data intensive applications. For such applications, sampling is often used to reduce big data to smaller data so that the benefits of data visualization can be brought to data intensive applications. Sampling allows data visualization to be used as an interface between humans and insights contained in the big data of data intensive computing. However, sampling introduces error. The objective of sampling is to reduce the amount of data being processed without introducing too much error into the results of the data intensive application. This error results from the possibility that a data sample could exclude valuable information that was included in the original data set. To determine an adequate level of sampling one can use statistical measures like standard error. However, such measures do not translate well for cases involving data visualization. Knowing the standard error of a sample can tell you very little about the visualization of that data. What is needed is a measure that allows one to make an informed decision of how much sampling to use in a data intensive application, as a result of knowing how sampling impacts how people gain insights from a visualization of the sampled data. In this work we introduce an insight based measure for the impact of sampling on the results of visualized data. We develop a framework for the quantification of the level of insight, model the relationship between the level of insight and the amount of sampling, use this model to provide data intensive computing users with an insight based feedback measure for each arbitrary sample size they choose for speeding up data intensive computing, and we develop a prototype that utilizes our framework. Our prototype applies our framework and insight based feedback measure to a computational fluid dynamics (CFD) case, but our work starts off with a simple one dimensional data application and works its way up to the more complicated CFD case. This work allows users to speed up data intensive applications with a clear understanding of how the speedup will impact the insights gained from the visualization of this data.	en
dc.description.degree	Doctor of Philosophy	en
dc.format.medium	ETD	en
dc.identifier.other	vt_gsexam:26819	en
dc.identifier.uri	http://hdl.handle.net/10919/107087	en
dc.publisher	Virginia Tech	en
dc.rights	In Copyright	en
dc.rights.uri	http://rightsstatements.org/vocab/InC/1.0/	en
dc.subject	Visualization	en
dc.subject	Insight	en
dc.subject	Sampling	en
dc.subject	Perception Modelling	en
dc.subject	Simulation	en
dc.subject	Computational fluid dynamics	en
dc.title	Insight Driven Sampling for Interactive Data Intensive Computing	en
dc.type	Dissertation	en
thesis.degree.discipline	Computer Science and Applications	en
thesis.degree.grantor	Virginia Polytechnic Institute and State University	en
thesis.degree.level	doctoral	en
thesis.degree.name	Doctor of Philosophy	en