Insight Driven Sampling for Interactive Data Intensive Computing

dc.contributor.author: Masiane, Moeti Moeklesia (en)
dc.contributor.committeechair: North, Christopher L. (en)
dc.contributor.committeechair: Jacques, Eric Jean-Yves (en)
dc.contributor.committeemember: Feng, Wu-chun (en)
dc.contributor.committeemember: Su, Simon (en)
dc.contributor.committeemember: Luther, Kurt (en)
dc.contributor.department: Computer Science (en)
dc.date.accessioned: 2021-12-17T07:00:06Z (en)
dc.date.available: 2021-12-17T07:00:06Z (en)
dc.date.issued: 2020-06-24 (en)
dc.description.abstract: Data visualization is used to help humans perceive high dimensional data, but it cannot be applied in real time to data intensive computing applications. Attempts to process and apply traditional information visualization techniques to such applications result in slow or non-responsive applications. For such applications, sampling is often used to reduce big data to smaller data so that the benefits of data visualization can be brought to data intensive applications. Sampling allows data visualization to be used as an interface between humans and insights contained in the big data of data intensive computing. However, sampling introduces error. The objective of sampling is to reduce the amount of data being processed without introducing too much error into the results of the data intensive application. To determine an adequate level of sampling one can use statistical measures like standard error. However, such measures do not translate well for cases involving data visualization. Knowing the standard error of a sample tells you very little about the visualization of that data. What is needed is a measure that allows system users to make an informed decision on the level of sampling needed to speed up a data intensive application. In this work we introduce an insight-based measure for the impact of sampling on the results of visualized data. We develop a framework for the quantification of the level of insight, model the relationship between the level of insight and the amount of sampling, use this model to provide data intensive computing users with the ability to control the amount of sampling as a function of user-provided insight requirements, and develop a prototype that utilizes our framework. This work allows users to speed up data intensive applications with a clear understanding of how the speedup will impact the insights gained from the visualization of this data.
Starting with a simple one-dimensional data intensive application, we apply our framework and work our way to a more complicated computational fluid dynamics case as a proof of concept of the application of our framework and insight error feedback measure for those using sampling to speed up data intensive computing. (en)
dc.description.abstractgeneral: Data visualization is used to help humans perceive high dimensional data, but it cannot be applied in real time to computing applications that generate or process vast amounts of data, also known as data intensive computing applications. Attempts to process and apply traditional information visualization techniques to such data result in slow or non-responsive data intensive applications. For such applications, sampling is often used to reduce big data to smaller data so that the benefits of data visualization can be brought to data intensive applications. Sampling allows data visualization to be used as an interface between humans and insights contained in the big data of data intensive computing. However, sampling introduces error. The objective of sampling is to reduce the amount of data being processed without introducing too much error into the results of the data intensive application. This error results from the possibility that a data sample could exclude valuable information that was included in the original data set. To determine an adequate level of sampling one can use statistical measures like standard error. However, such measures do not translate well for cases involving data visualization. Knowing the standard error of a sample tells you very little about the visualization of that data. What is needed is a measure that allows one to make an informed decision about how much sampling to use in a data intensive application, based on knowing how sampling affects the insights people gain from a visualization of the sampled data. In this work we introduce an insight-based measure for the impact of sampling on the results of visualized data.
We develop a framework for the quantification of the level of insight, model the relationship between the level of insight and the amount of sampling, use this model to provide data intensive computing users with an insight-based feedback measure for each arbitrary sample size they choose for speeding up data intensive computing, and develop a prototype that utilizes our framework. Our prototype applies our framework and insight-based feedback measure to a computational fluid dynamics (CFD) case, but our work starts with a simple one-dimensional data application and works its way up to the more complicated CFD case. This work allows users to speed up data intensive applications with a clear understanding of how the speedup will impact the insights gained from the visualization of this data. (en)
dc.description.degree: Doctor of Philosophy (en)
dc.format.medium: ETD (en)
dc.identifier.other: vt_gsexam:26819 (en)
dc.identifier.uri: http://hdl.handle.net/10919/107087 (en)
dc.publisher: Virginia Tech (en)
dc.rights: In Copyright (en)
dc.rights.uri: http://rightsstatements.org/vocab/InC/1.0/ (en)
dc.subject: Visualization (en)
dc.subject: Insight (en)
dc.subject: Sampling (en)
dc.subject: Perception Modelling (en)
dc.subject: Simulation (en)
dc.subject: Computational fluid dynamics (en)
dc.title: Insight Driven Sampling for Interactive Data Intensive Computing (en)
dc.type: Dissertation (en)
thesis.degree.discipline: Computer Science and Applications (en)
thesis.degree.grantor: Virginia Polytechnic Institute and State University (en)
thesis.degree.level: doctoral (en)
thesis.degree.name: Doctor of Philosophy (en)

Files

Original bundle
Name: Masiane_MM_D_2020.pdf
Size: 5.38 MB
Format: Adobe Portable Document Format
Name: Masiane_MM_D_2020_support_3.pdf
Size: 97.85 KB
Format: Adobe Portable Document Format
Description: Supporting documents
Name: Masiane_MM_D_2020_support_1.pdf
Size: 264.58 KB
Format: Adobe Portable Document Format
Description: Supporting documents