Insight Driven Sampling for Interactive Data Intensive Computing

Masiane, Moeti Moeklesia
2021-12-17
2021-12-17
2020-06-24
vt_gsexam:26819
http://hdl.handle.net/10919/107087

Data visualization helps humans perceive high-dimensional data, but it cannot be applied in real time to data-intensive computing applications. Attempts to apply traditional information visualization techniques to such applications result in slow or non-responsive applications. For such applications, sampling is often used to reduce big data to a smaller dataset so that the benefits of data visualization can be brought to data-intensive applications. Sampling allows data visualization to serve as an interface between humans and the insights contained in the big data of data-intensive computing. However, sampling introduces error. The objective of sampling is to reduce the amount of data being processed without introducing too much error into the results of the data-intensive application. To determine an adequate level of sampling, one can use statistical measures such as standard error. However, such measures do not translate well to cases involving data visualization: knowing the standard error of a sample tells you very little about the visualization of that data. What is needed is a measure that allows system users to make an informed decision on the level of sampling needed to speed up a data-intensive application. In this work we introduce an insight-based measure for the impact of sampling on the results of visualized data. We develop a framework for quantifying the level of insight, model the relationship between the level of insight and the amount of sampling, use this model to give data-intensive computing users the ability to control the amount of sampling as a function of user-provided insight requirements, and develop a prototype that uses our framework.
This work allows users to speed up data-intensive applications with a clear understanding of how the speedup will affect the insights gained from visualizing the data. Starting with a simple one-dimensional data-intensive application, we apply our framework and work our way up to a more complicated computational fluid dynamics case as a proof of concept for applying our framework and insight error feedback measure for those using sampling to speed up data-intensive computing.

ETD
In Copyright
Visualization
Insight
Sampling
Perception Modelling
Simulation
Computational fluid dynamics
Dissertation