Browsing by Author "North, Christopher L."
Now showing 1 - 20 of 189
- 2D Jupyter: Design and Evaluation of 2D Computational Notebooks
  Christman, Elizabeth (Virginia Tech, 2023-06-12)
  Computational notebooks are a popular tool for data analysis. However, the 1D linear structure used by many computational notebooks can lead to challenges and pain points in data analysis, including messiness, tedious navigation, inefficient use of screen space, and difficulty presenting non-linear narratives. To address these problems, we designed a prototype Jupyter Notebook extension called 2D Jupyter that enables a 2D organization of code cells in a multi-column layout, as well as freeform cell placement. We conducted a user study with this extension to evaluate the usability of 2D computational notebooks and to understand the advantages and disadvantages they offer over a 1D layout. The study provided evidence that the 2D layout enhances usability and efficiency in computational notebooks. Additionally, we gathered feedback on the design of the prototype that can inform future work. Overall, 2D Jupyter was positively received: users not only enjoyed using the extension but also expressed a desire to use 2D notebook environments in the future.
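One way to picture the multi-column layout described above is as cells that each carry a column index and a position within that column. The sketch below assumes exactly that; `Cell`, `column`, `row`, and `linearize` are hypothetical names rather than 2D Jupyter's actual data model, and freeform placement is ignored. Reading the grid column by column yields one possible linear order, for example when exporting to a conventional 1D notebook.

```python
from dataclasses import dataclass


@dataclass
class Cell:
    """A code cell in a 2D grid (hypothetical model, not 2D Jupyter's own)."""
    source: str
    column: int  # which column the cell sits in
    row: int     # vertical position within that column


def columns(cells):
    """Group cells by column index; each column is sorted top to bottom."""
    grouped = {}
    for cell in cells:
        grouped.setdefault(cell.column, []).append(cell)
    return {col: sorted(group, key=lambda c: c.row)
            for col, group in sorted(grouped.items())}


def linearize(cells):
    """Column-major order: read all of column 0, then column 1, and so on."""
    return [cell.source for group in columns(cells).values() for cell in group]


cells = [
    Cell("plot(df)", column=1, row=0),
    Cell("import pandas as pd", column=0, row=0),
    Cell("df = pd.read_csv('data.csv')", column=0, row=1),
    Cell("df.describe()", column=1, row=1),
]
print(linearize(cells))
```

Column-major order is only one choice; a row-major reading would interleave the columns instead, which may suit side-by-side comparisons better.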
- 5SGraph: A Modeling Tool for Digital Libraries
  Zhu, Qinwei (Virginia Tech, 2002-11-18)
  The strong demand for non-experts to be able to build digital libraries requires a simplified modeling process and rapid generation of digital libraries. To enable rapid generation, digital libraries should be modeled with descriptive languages. A visual modeling tool would help non-experts model a digital library without knowing the theoretical foundations and syntactic details of the descriptive language. In this thesis, we describe the design and implementation of 5SGraph, a domain-specific visual modeling tool for digital libraries. 5SGraph is based on a metamodel that describes digital libraries using the 5S theory. Its output is a digital library model that is an instance of the metamodel, expressed in the 5S description language (5SL). 5SGraph presents the metamodel in a structured toolbox and provides designers with a top-down visual building environment. The visual proximity of the metamodel and the instance model facilitates requirements gathering and simplifies the modeling process. Furthermore, 5SGraph maintains the semantic constraints specified by the 5S metamodel and enforces them over the instance model to ensure semantic consistency and correctness. 5SGraph also enables component reuse, reducing designers' time and effort. The results of a pilot usability test confirm the usefulness of 5SGraph.
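The abstract says 5SGraph enforces metamodel constraints over the instance model. A minimal sketch of that kind of check, assuming a toy metamodel that records only which child element types each element may contain and how many; every element name and cardinality here is hypothetical and not taken from the 5S metamodel or 5SL.

```python
# Hypothetical toy metamodel: element type -> {allowed child type: (min, max)}.
# max=None means unbounded. This is NOT the real 5S metamodel or 5SL syntax.
METAMODEL = {
    "DigitalLibrary": {"Collection": (1, None), "Service": (0, None)},
    "Collection": {"Document": (0, None)},
    "Service": {},
    "Document": {},
}


def validate(node, metamodel=METAMODEL, path=""):
    """Return a list of constraint violations for an instance tree.

    Each node is a dict: {"type": str, "children": [child nodes]}.
    """
    kind = node["type"]
    here = f"{path}/{kind}"
    if kind not in metamodel:
        return [f"{here}: unknown element type"]
    allowed = metamodel[kind]
    errors, counts = [], {}
    for child in node.get("children", []):
        ctype = child["type"]
        if ctype not in allowed:
            errors.append(f"{here}: {ctype} not allowed here")
        counts[ctype] = counts.get(ctype, 0) + 1
        errors.extend(validate(child, metamodel, here))  # check the subtree too
    # Cardinality constraints on the children that are allowed here.
    for ctype, (lo, hi) in allowed.items():
        n = counts.get(ctype, 0)
        if n < lo or (hi is not None and n > hi):
            bound = hi if hi is not None else "*"
            errors.append(f"{here}: expected {lo}..{bound} {ctype}, found {n}")
    return errors


library = {
    "type": "DigitalLibrary",
    "children": [{"type": "Collection", "children": [{"type": "Document"}]}],
}
broken = {"type": "DigitalLibrary", "children": [{"type": "Document"}]}
print(validate(library))  # []
print(validate(broken))
```

Reporting every violation, rather than stopping at the first, matches the idea of keeping a whole instance model consistent while it is being edited.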
- Advances in aircraft design: multiobjective optimization and a markup language
  Deshpande, Shubhangi Govind (Virginia Tech, 2014-01-23)
  Modern aerospace systems exhibit strong interdisciplinary coupling and require a multidisciplinary, collaborative approach. Analysis methods that were once considered feasible only for advanced, detailed design are now available and even practical at the conceptual design stage. This changing philosophy of conceptual design poses challenges beyond those encountered in low-fidelity aircraft design. This thesis takes steps toward bridging gaps in existing technologies and advancing the state of the art in aircraft design. The first part of the thesis proposes a new Pareto front approximation method for multiobjective optimization problems. The method is a hybrid that combines two derivative-free direct search techniques, and it is intended for black-box, simulation-based multiobjective optimization problems with possibly nonsmooth functions, where the analytical form of the objectives is unknown and/or evaluating the objective function(s) is very expensive (a common situation in multidisciplinary design optimization). A new adaptive weighting scheme is proposed to convert a multiobjective optimization problem into a single-objective one. Results show that the method achieves an arbitrarily close approximation to the Pareto front with a good collection of well-distributed nondominated points. The second part addresses the interdisciplinary data communication issues in a collaborative multidisciplinary aircraft design environment. Efficient transfer, sharing, and manipulation of design and analysis data in a collaborative environment demand a formal, structured representation of the data. XML, a W3C Recommendation, is one such standard and comes with a number of powerful capabilities that alleviate interoperability issues. A compact, generic, and comprehensive XML schema for an aircraft design markup language (ADML) is proposed to provide a common language for data communication and to improve efficiency and productivity in a multidisciplinary, collaborative environment. An important feature of the proposed schema is its very expressive and efficient low-level schemata. As a proof of concept, the schema is used to encode an entire Convair B-58. As model complexity and the number of disciplines increase, the effort saved by exchanging data models and analysis results in ADML also increases.
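The first part of this thesis approximates a Pareto front and scalarizes the objectives with an adaptive weighting scheme. The sketch below shows only two generic building blocks, assuming both objectives are minimized: a nondominated filter and a fixed weighted-sum scalarization swept over a few weights. It is not the thesis's adaptive scheme or its derivative-free direct search; candidate designs are simply enumerated, and the objective functions are made up for illustration.

```python
def dominates(a, b):
    """True if objective vector a Pareto-dominates b (minimization)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def nondominated(points):
    """Keep only the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points if q != p)]


def weighted_sum_best(points, w):
    """Best point under the fixed scalarization w*f1 + (1 - w)*f2."""
    return min(points, key=lambda p: w * p[0] + (1 - w) * p[1])


# Hypothetical toy problem: one design variable x in [0, 2], two competing objectives.
def objectives(x):
    return (x ** 2, (x - 2) ** 2)


candidates = [objectives(i / 10) for i in range(21)]  # enumerate designs on a grid
front = nondominated(candidates)
picks = sorted({weighted_sum_best(candidates, w / 4) for w in range(5)})
print(len(front), picks)
```

A fixed weighted sum can only reach points on convex portions of a Pareto front, and evenly spaced weights need not give evenly spaced points; limitations like these are a common motivation for adaptive weighting.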
- Algorithms for Modeling Mass Movements and their Adoption in Social Networks
  Jin, Fang (Virginia Tech, 2016-08-23)
  Online social networks have become a staging ground for many modern movements, with the Arab Spring being the most prominent example. To understand and predict these movements, social media can serve as a valuable social sensor that discloses underlying behaviors and patterns. Fully understanding how information about mass movements propagates in social networks requires considering and addressing several problems. Specifically, models of mass movements that incorporate multiple spaces, dynamic network structure, and misinformation propagation can be exceptionally useful for understanding information propagation in social media. This dissertation explores four research problems underlying efforts to identify and track the adoption of mass movements in social media. First, how do mass movements become mobilized on Twitter, especially in a specific geographic area? Second, can we detect protest activity in social networks by observing group anomalies in graphs? Third, how can we distinguish real movements from rumors or misinformation campaigns? And fourth, how can we infer the indicators of a specific type of protest, such as climate-related protests? A fundamental objective of this research has been to conduct a comprehensive study of how mass movement adoption functions in social networks. For example, adoption may cross multiple spaces, evolve with dynamic network structures, or consist of swift outbreaks or long-term, slowly evolving transmissions. In many cases, it may also be mixed with misinformation campaigns, either deliberate or in the form of rumors. Each of these issues requires new mathematical models and algorithmic approaches such as those explored here. This work aims to facilitate advances in information propagation, group anomaly detection, and misinformation distinction and, ultimately, to improve our understanding of mass movements and their adoption in social networks.
- Analyzing and Navigating Electronic Theses and Dissertations
  Ahuja, Aman (Virginia Tech, 2023-07-21)
  Electronic Theses and Dissertations (ETDs) contain scholarly information that can be of immense value to the scholarly community. Millions of ETDs are now publicly available online, often through one of many digital libraries. However, since most of these digital libraries are institutional repositories whose objective is content archiving, they often lack the end-user services needed to make this data useful to the scholarly community. To effectively use such data to address users' information needs, digital libraries should support end-user services such as document search and browsing, document recommendation, and easier navigation of long PDF documents. In recent years, advances in machine learning for text data have produced several techniques that can support such services. However, limited research has been conducted on integrating these techniques with digital libraries. This research aims to build tools and techniques for discovering and accessing the knowledge buried in ETDs and to support end-user services for digital libraries, such as document browsing and long-document navigation. First, we review several machine learning models that can be used to support such services. Next, to support a comprehensive evaluation of different models, as well as to train models tailored to ETD data, we introduce several new datasets from the ETD domain. To minimize the resources required to develop high-quality datasets for supervised training, a novel AI-aided annotation method is also discussed. Finally, we propose techniques and frameworks to support digital library services such as search, browsing, and recommendation. The key contributions of this research are as follows:
  - A system to help parse long scholarly documents such as ETDs, using object-detection methods trained to extract digital objects from long documents. The parsed documents can be used for downstream tasks such as long-document navigation and figure and/or table search.
  - Datasets to support supervised training of object detection models on scholarly documents of multiple types, such as born-digital and scanned. In addition to manually annotated datasets, a framework for AI-aided annotation (along with the resulting dataset) is also proposed.
  - A web-based system for extracting information from long PDF theses and dissertations into a structured format such as XML, aimed at making scholarly literature more accessible to users with disabilities.
  - A topic-modeling-based framework to support exploration tasks such as searching and/or browsing documents (and document portions, e.g., chapters) by topic, document recommendation, topic recommendation, and describing temporal topic trends.
- Andromeda in Education: Studies on Student Collaboration and Insight Generation with Interactive Dimensionality Reduction
  Taylor, Mia Rachel (Virginia Tech, 2022-10-04)
  Andromeda is an interactive visualization tool that projects high-dimensional data into a scatterplot-like visualization using Weighted Multidimensional Scaling (WMDS). The visualization can be explored through surface-level interaction (viewing data values), parametric interaction (altering the underlying parameterization), and observation-level interaction (directly interacting with projected points). This thesis presents analyses of Andromeda's collaborative utility in a middle school class and of the insights college-level students generate when using Andromeda. The first study discusses how a middle school class collaboratively used Andromeda to explore and compare their engineering designs; the students analyzed their designs, represented as high-dimensional data, as a class. This study shows promise for introducing collaborative data analysis to middle school students in conjunction with other technical concepts such as the engineering design process. In the second study, college-level participants were given versions of Andromeda with access to different interactions and were asked to generate insights about a dataset. Results from applying a novel visualization evaluation methodology to the students' natural-language insights indicate that students use different vocabulary supported by the interactions available to them, but not equally. The implications and limitations of these two studies are further discussed.
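In WMDS as described above, per-dimension weights (the target of parametric interaction) define the high-dimensional distances that the 2D layout tries to preserve. A minimal pure-Python sketch, assuming weighted Euclidean distance; for brevity it substitutes classical (Torgerson) MDS, computed by power iteration, for whatever stress-minimizing solver Andromeda actually uses, so treat it as an illustration of the idea rather than Andromeda's implementation. The data and weights are made up.

```python
import math


def weighted_distance(x, y, w):
    """Weighted Euclidean distance: sqrt(sum_k w_k * (x_k - y_k)**2)."""
    return math.sqrt(sum(wk * (a - b) ** 2 for a, b, wk in zip(x, y, w)))


def stress(points, layout, w):
    """Raw stress: summed squared gaps between weighted high-D and 2D distances."""
    n = len(points)
    return sum(
        (weighted_distance(points[i], points[j], w) - math.dist(layout[i], layout[j])) ** 2
        for i in range(n) for j in range(i + 1, n)
    )


def classical_mds(points, w, iters=500):
    """2D classical MDS of weighted distances (a stand-in for stress-based WMDS)."""
    n = len(points)
    d2 = [[weighted_distance(p, q, w) ** 2 for q in points] for p in points]
    row = [sum(r) / n for r in d2]
    grand = sum(row) / n
    # Double centering: B = -1/2 * J D2 J is the Gram matrix of the centered layout.
    b = [[-0.5 * (d2[i][j] - row[i] - row[j] + grand) for j in range(n)] for i in range(n)]
    layout = [[0.0, 0.0] for _ in range(n)]
    for axis in range(2):
        v = [1.0 / (i + 1) for i in range(n)]  # arbitrary non-constant start vector
        for _ in range(iters):  # power iteration toward the top eigenvector
            u = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in u))
            if norm == 0.0:
                break
            v = [x / norm for x in u]
        bv = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
        lam = sum(vi * bvi for vi, bvi in zip(v, bv))  # Rayleigh quotient = eigenvalue
        for i in range(n):
            layout[i][axis] = v[i] * math.sqrt(max(lam, 0.0))
        # Deflate so the next pass finds the next-largest eigenvector.
        b = [[b[i][j] - lam * v[i] * v[j] for j in range(n)] for i in range(n)]
    return layout


points = [(0, 0, 5), (3, 0, -2), (3, 4, 1), (0, 4, 7)]
w = (1, 1, 0)  # weight 0 hides the third dimension entirely
layout = classical_mds(points, w)
print(round(stress(points, layout, w), 6))
```

Changing `w` and re-running the projection is the computational core of parametric interaction; observation-level interaction works in the opposite direction, inferring new weights from points the user moves.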
- Anomalous Information Detection in Social Media
  Tao, Rongrong (Virginia Tech, 2021-03-10)
  This dissertation focuses on identifying various types of anomalous information patterns in social media and news outlets. We focus on three types of anomalous information: (1) media censorship in news outlets, which is information that should be published but is actually missing; (2) fake news in social media, which is unreliable information shown to the public; and (3) media propaganda in news outlets, which is trustworthy information that is over-populated. For the first problem, existing approaches to censorship detection mostly rely on monitoring posts in social media. However, media censorship in news outlets has received far less attention, mostly because it is difficult to detect systematically. The contributions of our work include (1) a hypothesis testing framework to identify and evaluate censored clusters of keywords, (2) a near-linear-time algorithm to identify the highest-scoring clusters as indicators of censorship, and (3) extensive experiments on six Latin American countries for performance evaluation. For the second problem, existing approaches to studying fake news in social media primarily focus on topic-level modeling or on prediction based on a set of aggregated features from a collection of posts. However, the credibility of different information components within the same topic can vary considerably. The contributions of our work in this space include (1) a new benchmark dataset for fake news research, (2) a cluster-based approach to improve instance-level prediction of information credibility, and (3) extensive experiments for performance evaluation. For the last problem, existing approaches to media propaganda detection primarily focus on investigating the patterns of information shared over social media or on evaluations by domain experts. However, these approaches cannot be generalized to a large-scale analysis of media propaganda in news outlets. The contributions of our work include (1) non-parametric scan statistics to identify clusters of over-populated keywords, (2) a near-linear-time algorithm to identify the highest-scoring clusters as indicators of propaganda, and (3) extensive experiments on two Latin American countries for performance evaluation.
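The propaganda work scores clusters of over-populated keywords with non-parametric scan statistics. A minimal sketch of one well-known statistic of that family, the Berk-Jones statistic, assuming each keyword already has an empirical p-value (small when its current frequency is unusually high relative to its history). It ignores the keyword-graph structure that the dissertation's near-linear-time algorithm handles: without that constraint, the best subset at a threshold α is simply every keyword with p ≤ α, so the search reduces to scanning candidate thresholds. The keywords, p-values, and `alpha_max` are hypothetical.

```python
import math


def kl_bernoulli(x, y):
    """KL divergence between Bernoulli(x) and Bernoulli(y), using 0*log(0) = 0."""
    total = 0.0
    if x > 0:
        total += x * math.log(x / y)
    if x < 1:
        total += (1 - x) * math.log((1 - x) / (1 - y))
    return total


def berk_jones(n_alpha, n, alpha):
    """Berk-Jones score N * KL(N_alpha / N, alpha); zero unless N_alpha / N > alpha."""
    if n == 0 or n_alpha / n <= alpha:
        return 0.0
    return n * kl_bernoulli(n_alpha / n, alpha)


def best_keyword_cluster(p_values, alpha_max=0.15):
    """Scan thresholds taken from the observed p-values (unconstrained subsets).

    Quadratic as written for clarity; sorting once and sweeping is O(n log n).
    """
    best_score, best_alpha, best_set = 0.0, None, set()
    for alpha in sorted({p for p in p_values.values() if p <= alpha_max}):
        subset = {k for k, p in p_values.items() if p <= alpha}
        score = berk_jones(len(subset), len(subset), alpha)
        if score > best_score:
            best_score, best_alpha, best_set = score, alpha, subset
    return best_score, best_alpha, best_set


# Hypothetical empirical p-values for four keywords on one day.
p_values = {"subsidy": 0.001, "reform": 0.002, "weather": 0.5, "football": 0.9}
score, alpha, cluster = best_keyword_cluster(p_values)
print(round(score, 3), alpha, sorted(cluster))
```

Capping thresholds at `alpha_max` keeps weakly significant keywords from being swept into a cluster just because there are many of them.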
- Applying Information Visualization Techniques to Visual Debugging
  Costigan, John A. (Virginia Tech, 2003-04-24)
  In software development, a design (no matter how perfect) is rarely implemented correctly the first time. Consequently, debugging one's own (or someone else's) software is inevitable, and tools that assist in this often-arduous task are important for reducing the cost of debugging as well as the cost of the software life cycle as a whole. Many tools exist with this aim, but all are lacking in a key area: information visualization. Applying information visualization techniques such as zooming, focus and context, or graphical representation of numeric data may enhance the visual debugging experience. Drawing data structures as graphs is potentially a step in the right direction, but more must be done to maximize the value of time spent debugging and to minimize the actual amount of time spent debugging. This thesis addresses some information visualization techniques that may be helpful in debugging (specifically visual debugging) and presents the results of a small pilot study intended to illustrate the potential value of such techniques.
- Augmented Reality Pedestrian Collision Warning: An Ecological Approach to Driver Interface Design and Evaluation
  Kim, Hyungil (Virginia Tech, 2017-10-17)
  Augmented reality (AR) has the potential to fundamentally change the way we interact with information. Direct perception of computer-generated graphics atop physical reality can afford hands-free access to contextual information on the fly. However, because users must interact with digital and physical information simultaneously, yesterday's approaches to interface design may not be sufficient to support this new form of interaction. Furthermore, the impacts of this novel technology on user experience and performance are not yet fully understood. Driving is one of many promising tasks that can benefit from AR: conformal graphics strategically placed in the real world can accurately guide drivers' attention to critical environmental elements. The ultimate purpose of this study is to reduce pedestrian accidents through the design of driver interfaces that take advantage of AR head-up displays (HUDs). To this end, this work aimed to (1) identify information requirements for pedestrian collision warning, (2) design AR driver interfaces, and (3) quantify the effects of AR interfaces on driver performance and experience. Considering the dynamic nature of human-environment interaction in AR-supported driving, we took an ecological approach to interface design and evaluation, accounting for not only the user but also the environment. The requirements analysis examined environmental constraints imposed on drivers' behavior, the interface design translated those behavior-shaping constraints into perceptual forms of interface elements, and the usability evaluations used naturalistic driving scenarios and tasks for better ecological validity. A novel AR driver interface for pedestrian collision warning, the virtual shadow, was proposed to take advantage of optical see-through HUDs. A series of usability evaluations in both a driving simulator and on an actual roadway showed that the virtual shadow interface outperformed current pedestrian collision warning interfaces in guiding driver attention, increasing situation awareness, and improving task performance. Thus, this work demonstrates the opportunity to incorporate an ecological approach into user interface design and evaluation for AR driving applications. This research provides both basic and practical contributions to human factors and AR by (1) providing empirical evidence that furthers knowledge about driver experience and performance in AR and (2) extending traditional usability engineering methods for automotive AR interface design and evaluation.
- Augmenting Dynamic Query Expansion in Microblog Texts
  Khandpur, Rupinder P. (Virginia Tech, 2018-08-17)
  Dynamic query expansion is a method of automatically identifying terms relevant to a target domain based on an incomplete query input. With the explosive growth of online media, such tools are essential for efficiently refining search results to track emerging themes in noisy, unstructured text streams. They are crucial for large-scale predictive analytics and decision-making systems that use open-source indicators to find meaningful information rapidly and accurately. Information overload and semantic mismatch are systemic problems in the information retrieval (IR) tasks undertaken by such systems. In this dissertation, we develop dynamic query expansion algorithms that can help improve the efficacy of such systems using only a small set of seed queries and that require no training or labeled samples. We primarily investigate four significant problems related to the retrieval and assessment of event-related information: (1) How can we adapt the query expansion process to support rank-based analysis when tracking a fixed set of entities? A scalable framework is essential to allow relative assessment of emerging themes such as airport threats. (2) What visual knowledge discovery framework can incorporate users' feedback into the search result refinement process? This is a crucial step toward efficiently integrating real-time 'situational awareness' when monitoring specific themes using open-source indicators. (3) How can we contextualize query expansions? We focus on capturing the semantic relatedness between a query and reference text so that the expansion can quickly adapt to different target domains. (4) How can we synchronously perform knowledge discovery and characterization (from unstructured to structured) during the retrieval process? Here we mainly aim to model high-order, relational aspects of event-related information from microblog texts.
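The dissertation expands a small set of seed queries without any training data. As a rough illustration of that general setting only, the sketch below runs a few rounds of co-occurrence-based expansion, a simple pseudo-relevance-feedback heuristic and not one of the dissertation's algorithms: documents that match the current query vote for their other terms, and the most frequent new terms join the query. The toy corpus, `rounds`, `top_k`, and the alphabetical tie-breaking are all assumptions.

```python
from collections import Counter


def expand_query(docs, seeds, rounds=2, top_k=1):
    """Grow a query from seed terms using term co-occurrence in matching documents.

    docs: list of token lists; seeds: iterable of seed terms.
    """
    query = set(seeds)
    for _ in range(rounds):
        matches = [doc for doc in docs if query & set(doc)]
        # Count each candidate term once per matching document.
        counts = Counter(t for doc in matches for t in set(doc) if t not in query)
        if not counts:
            break
        # Highest co-occurrence count first; alphabetical order breaks ties.
        ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
        query.update(term for term, _ in ranked[:top_k])
    return query


docs = [
    ["airport", "security", "threat"],
    ["airport", "threat", "evacuation"],
    ["evacuation", "terminal", "threat"],
    ["football", "score", "match"],
]
expanded = expand_query(docs, {"airport"})
print(sorted(expanded))
```

Raw co-occurrence counts favor generally frequent words; practical systems usually normalize against background frequency, which is one source of the semantic-mismatch problem the abstract mentions.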
- AVIST: A GPU-Centric Design for Visual Exploration of Large Multidimensional Datasets
  Mi, Peng; Sun, Maoyuan; Masiane, Moeti; Cao, Yong; North, Christopher L. (MDPI, 2016-10-07)
  This paper presents the Animated VISualization Tool (AVIST), an exploration-oriented data visualization tool that enables rapid exploration and filtering of large, time-series, multidimensional datasets. AVIST emphasizes interactive data exploration by revealing fine data details, which is achieved through animation and cross-filtering interactions. To support interactive exploration of big data, AVIST features a GPU (Graphics Processing Unit)-centric design. Two key aspects are emphasized in the GPU-centric design: (1) both data management and computation are implemented on the GPU to leverage its parallel computing capability and high memory bandwidth; and (2) a GPU-based directed acyclic graph is proposed to characterize the data transformations triggered by users' demands. Moreover, we implement AVIST based on the Model-View-Controller (MVC) architecture. The implementation considers two aspects: (1) user interaction is emphasized as the means of slicing big data into small data; and (2) data transformation is based on parallel computing. Two case studies demonstrate how AVIST can help analysts identify abnormal behaviors and infer new hypotheses by exploring big datasets. Finally, we summarize lessons learned about GPU-based solutions in interactive information visualization with big data.
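AVIST represents the data transformations triggered by user actions as a directed acyclic graph evaluated on the GPU. The CPU-only sketch below illustrates just the DAG bookkeeping, assuming each node caches its result and that changing a node (for example, a new filter range from a cross-filtering interaction) invalidates only that node and its descendants. The class, its methods, and the toy data are hypothetical, and nothing here runs on a GPU.

```python
_UNSET = object()  # marks an empty cache (a result may legitimately be None)


class Node:
    """One transformation in a DAG; caches its output until something upstream changes."""

    def __init__(self, name, func, *parents):
        self.name = name
        self.func = func
        self.parents = list(parents)
        self.children = []
        self.cache = _UNSET
        self.evaluations = 0  # how many times func actually ran
        for parent in self.parents:
            parent.children.append(self)

    def value(self):
        """Compute lazily, reusing the cached result when nothing upstream changed."""
        if self.cache is _UNSET:
            self.cache = self.func(*(p.value() for p in self.parents))
            self.evaluations += 1
        return self.cache

    def set_func(self, func):
        """Swap this node's transformation (e.g., a new filter) and invalidate downstream."""
        self.func = func
        self.invalidate()

    def invalidate(self):
        self.cache = _UNSET
        for child in self.children:
            child.invalidate()


# Hypothetical time series: one speed reading per time step.
rows = [{"t": t, "speed": s} for t, s in enumerate([5, 40, 12, 55, 7, 60])]
source = Node("source", lambda: rows)
in_window = Node("time window", lambda data: [r for r in data if r["t"] >= 2], source)
fast = Node("speed filter", lambda data: [r for r in data if r["speed"] > 10], in_window)
count = Node("count", len, fast)

first = count.value()
fast.set_func(lambda data: [r for r in data if r["speed"] > 50])  # user drags a slider
second = count.value()
print(first, second, in_window.evaluations)
```

Only the speed filter and the count are recomputed after the interaction; the time-window result is reused from cache, which is the kind of saving that makes interactive slicing of large data feasible.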
- BABES: Brushing+Linking, Attributes, and Blobs Extension to Storyboard
  Judge, Tejinder K.; Kopper, Regis; Ponce, Sean; Silva, Mara G.; North, Christopher L. (Department of Computer Science, Virginia Polytechnic Institute & State University, 2008)
  People today deal not only with data but with vast amounts of data that must be sorted and made sense of. Among them are intelligence analysts, who sort through enormous amounts of data that must be organized to uncover plots and subplots. We propose a tool called BABES (Brushing+Linking, Attributes, and Blobs Extension to Storyboard) that enables intelligence analysts to sort through data efficiently, uncover plots and subplots using its brushing-and-linking and attribute features, and work with multiple subplots at the same time using the concept of 'blobs'.
- Battery-Sensing Intrusion Protection System (B-SIPS)
  Buennemeyer, Timothy Keith (Virginia Tech, 2008-12-05)
  This dissertation investigates instantaneous battery current sensing as a means of detecting IEEE 802.15.1 Bluetooth and 802.11b (Wi-Fi) attacks and anomalous activity on small mobile wireless devices. This research explores alternative intrusion detection methods in an effort to better understand computer networking threats. It applies to Personal Digital Assistants (PDAs) and smart phones running sensing software in wireless network environments, which relay diagnostic battery readings and threshold breaches to indicate possible battery exhaustion attacks, intrusions, and virus or worm activity. The system relies on host-based software that collects smart battery data to sense the instantaneous current characteristics of anomalous network activity directed against small mobile devices. This effort sought to develop a methodology, design and build a net-centric system, and then further explore this non-traditional intrusion detection system (IDS) approach. This research implements the Battery-Sensing Intrusion Protection System (B-SIPS) client detection capabilities for small mobile devices, a server-based Correlation Intrusion Detection Engine (CIDE) for attack correlation with Snort's network-based IDS, device power profiling, graph views, security administrator alert notification, and a database for robust data storage. Additionally, the server-based CIDE provides the interface and filtering tools for a security administrator to further mine the database and conduct forensic analysis. A separate system was developed using a digital oscilloscope to observe Bluetooth, Wi-Fi, and blended attack traces and to create unique signatures. The research makes five significant contributions to the field of intrusion detection. First, B-SIPS provides an effective intrusion detection approach that can operate on small, mobile host devices in networking environments, sensing anomalous patterns in instantaneous battery current as an indicator of malicious activity using an innovative Dynamic Threshold Calculation (DTC) algorithm. Second, the Current Attack Signature Identification and Matching System (CASIMS) provides a means for high-resolution current measurements and supporting analytical tools. This system investigates Bluetooth, Wi-Fi, and blended exploits using an oscilloscope to gather high-fidelity data; instantaneous current changes were examined on mobile devices during representative attacks to determine unique attack traces and recognizable signatures. Third, two theoretical models supporting B-SIPS are presented to investigate static and dynamic smart battery polling. These analytical models are used to examine smart battery characteristics in support of the theoretical intrusion detection limits and capabilities of B-SIPS. Fourth, a new genre of attack, the Battery Polling Cycle Timing Attack, is introduced. Today's smart battery polling rates are designed to support Advanced Power Management needs, and every PDA and smart phone has a polling rate determined by the device and smart battery original equipment manufacturers. If attackers know the precise timing of the battery chipset's polling rate, they could attempt to craft intrusion packets that arrive within the limited time windows between the battery's polling intervals. Fifth, this research adds to the body of knowledge about non-traditional attack sensing and correlation by providing a component of an intrusion detection strategy. This work moves research toward a more robust multilayered network defense by creating a novel design and methodology for employing mobile computing devices as a first line of defense to improve overall network security, potentially extending to other communication media in need of defensive capabilities. Mobile computing and communications devices such as PDAs, smart phones, and ultra-small general-purpose computing devices are the typical targets for the results of this work. Additionally, field-deployed battery-operated sensors and sensor networks will also benefit from incorporating the security mechanisms developed and described here.
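B-SIPS flags anomalous battery current with a Dynamic Threshold Calculation (DTC) algorithm whose details are not given in this abstract. As a stand-in only, the sketch below uses a common adaptive-threshold heuristic: a reading is flagged when it exceeds the mean of the previous `window` accepted readings by more than `k` population standard deviations. The window size, `k`, and the sample readings (imagined as milliamps) are assumptions, not values from the dissertation.

```python
import statistics
from collections import deque


def flag_anomalies(readings, window=5, k=3.0):
    """Return indices of readings above a rolling mean + k * stdev threshold.

    Flagged readings are kept out of the baseline, so a sustained attack
    does not immediately raise its own threshold. (Stand-in, not the DTC algorithm.)
    """
    history = deque(maxlen=window)
    flagged = []
    for i, current in enumerate(readings):
        if len(history) == window:
            baseline = list(history)
            threshold = statistics.fmean(baseline) + k * statistics.pstdev(baseline)
            if current > threshold:
                flagged.append(i)
                continue
        history.append(current)
    return flagged


# Hypothetical current draw: idle, then a burst during a simulated attack.
readings = [200, 205, 198, 202, 201, 199, 420, 430, 203, 200]
print(flag_anomalies(readings))
```

The trade-off in excluding flagged readings from the baseline is that a legitimate, lasting change in workload would keep being flagged until the baseline is reset by some other means.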
- Bayesian Visual Analytics: Interactive Visualization for High Dimensional Data
  Han, Chao (Virginia Tech, 2012-12-07)
  In light of advancements in data collection techniques over the past two decades, data mining has become common practice for summarizing large, high-dimensional datasets in hopes of discovering noteworthy data structures. However, one concern is that most data mining approaches rely on strict criteria that may mask information that analysts would find useful. We propose a new approach called Bayesian Visual Analytics (BaVA), which merges Bayesian statistics with visual analytics to address this concern. The BaVA framework enables experts to interact with the data and the feature discovery tools by modeling the "sense-making" process using Bayesian sequential updating. In this work, we use the BaVA idea to enhance high-dimensional visualization techniques such as Probabilistic PCA (PPCA). However, for real-world datasets, important structures can be arbitrarily complex, and a single data projection such as PPCA may fail to provide useful insights. One way to visualize such a dataset is to characterize it by a mixture of local models. For example, Tipping and Bishop [Tipping and Bishop, 1999] developed an algorithm called Mixture Probabilistic PCA (MPPCA) that extends PCA to visualize data via a mixture of projectors. Based on MPPCA, we developed a new visualization algorithm called Covariance-Guided MPPCA, which groups clusters with similar covariance structures together to provide more meaningful and cleaner visualizations. Another way to visualize a very complex dataset is to use nonlinear projection methods such as the Generative Topographic Mapping (GTM) algorithm. We developed an interactive version of GTM to discover interesting local data structures. We demonstrate the performance of our approaches using both synthetic and real datasets and compare our algorithms with existing ones.
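BaVA models sense-making with Bayesian sequential updating: each round of feedback turns the current posterior into the prior for the next round. The sketch shows that mechanism in its simplest textbook form, a normal prior on an unknown mean with a known observation variance (conjugate normal-normal updating). It is a generic illustration and not BaVA's model, which the abstract describes as operating on projections such as PPCA; the numbers are arbitrary.

```python
from dataclasses import dataclass


@dataclass
class NormalBelief:
    """Gaussian belief N(mean, var) about an unknown mean."""
    mean: float
    var: float

    def update(self, observation, obs_var):
        """Conjugate update with one observation ~ N(true mean, obs_var); returns the posterior."""
        precision = 1.0 / self.var + 1.0 / obs_var  # precisions add
        mean = (self.mean / self.var + observation / obs_var) / precision
        return NormalBelief(mean, 1.0 / precision)


belief = NormalBelief(mean=0.0, var=1.0)
for x in [2.0, 4.0]:
    belief = belief.update(x, obs_var=1.0)  # the posterior becomes the next prior
print(belief)
```

For this conjugate model, updating one observation at a time gives the same posterior as a single batch update with all observations, which is what makes incremental, interaction-by-interaction updating convenient.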
- Be the Data: Embodied Visual Analytics
  Chen, Xin (Virginia Tech, 2016-08-22)
  With the rise of big data, it is becoming increasingly important to educate students about data analytics. In particular, students without a strong mathematical background usually have an unenthusiastic attitude toward high-dimensional data and find relevant complex analytical methods, such as dimension reduction, challenging to understand. In this thesis, we present an embodied approach to visual analytics designed to teach students to explore alternative 2D projections of high-dimensional data points using weighted multidimensional scaling. We propose a novel application, Be the Data, to explore the possibilities of using humans' embodied resources to learn from high-dimensional data. In our system, each student embodies a data point, and the positions of the students in a physical space represent a 2D projection of the high-dimensional data. Students physically move around a room relative to one another to interact with alternative projections and receive visual feedback. We conducted educational workshops with students inexperienced in the relevant data analytical methods. Our findings indicate that the students were able to learn about high-dimensional data and the data analysis process despite their limited knowledge of the complex analytical methods. We also applied the same techniques to social meetings to explain social gatherings and facilitate interactions.
- A Bidirectional Pipeline for Semantic Interaction in Visual Analytics
  Binford, Adam Quarles (Virginia Tech, 2016-09-21)
  Semantic interaction in visual data analytics allows users to indirectly adjust model parameters by directly manipulating the output of the models. This is accomplished with an underlying bidirectional pipeline that first uses statistical models to visualize the raw data. When a user interacts with the visualization, the interaction is automatically interpreted as updates to the model parameters, giving the user immediate feedback on each interaction. These interpreted interactions eliminate the need for a deep understanding of the underlying statistical models. However, developing such tools is necessarily complex because of their interactive nature. Furthermore, each tool defines its own unique pipeline to suit its needs, which makes it difficult to experiment with different types of data, models, interaction techniques, and visual encodings. To address this issue, we present a flexible multi-model bidirectional pipeline for prototyping visual analytics tools that rely on semantic interaction. The pipeline has plug-and-play functionality, enabling quick changes to the type of data being visualized, how models transform the data, and interaction methods. In doing so, the pipeline enforces a separation between the data pipeline and the visualization, preventing the two from becoming codependent. To show the flexibility of the pipeline, we demonstrate a new visual analytics tool and several distinct variations, each of which was quickly and easily implemented with slight changes to the pipeline or client.
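The pipeline described above runs forward (data to model to view) and backward (an interaction on the view becomes a model-parameter update). A minimal sketch of that shape, assuming a single hypothetical weighted-distance model whose inverse step responds to a "these two items belong together" interaction by shifting weight toward dimensions on which the two items agree. The class names, the update rule, and `rate` are illustrative assumptions, not the pipeline's actual API; any object exposing the same `forward` and `inverse` methods could be swapped in, which is the plug-and-play idea in miniature.

```python
class WeightedDistanceModel:
    """Forward: weighted distances between items. Inverse: adjust weights from feedback."""

    def __init__(self, n_dims):
        self.weights = [1.0 / n_dims] * n_dims  # start with equal weights

    def forward(self, data):
        w = self.weights
        return {
            (i, j): sum(wk * (a - b) ** 2 for a, b, wk in zip(data[i], data[j], w)) ** 0.5
            for i in range(len(data)) for j in range(i + 1, len(data))
        }

    def inverse(self, data, i, j, rate=0.5):
        """User dragged items i and j together: favor dimensions where they are similar."""
        gaps = [abs(a - b) for a, b in zip(data[i], data[j])]
        biggest = max(gaps) or 1.0  # avoid dividing by zero for identical items
        agreement = [1.0 - g / biggest for g in gaps]  # 1 = identical, 0 = most different
        raw = [(1 - rate) * w + rate * a for w, a in zip(self.weights, agreement)]
        total = sum(raw)
        self.weights = [r / total for r in raw]  # keep the weights summing to 1


class Pipeline:
    """Wraps any model with forward/inverse; the view is recomputed after each interaction."""

    def __init__(self, data, model):
        self.data, self.model = data, model

    def view(self):
        return self.model.forward(self.data)

    def interact_pull_together(self, i, j):
        self.model.inverse(self.data, i, j)  # backward: interaction -> parameters
        return self.view()                   # forward: parameters -> new view


data = [(0.0, 5.0), (0.1, 0.0), (9.0, 4.9)]
pipeline = Pipeline(data, WeightedDistanceModel(n_dims=2))
before = pipeline.view()[(0, 1)]
after = pipeline.interact_pull_together(0, 1)[(0, 1)]
print(round(before, 3), round(after, 3), pipeline.model.weights)
```

Keeping the update rule inside the model, rather than in the view code, mirrors the separation between data pipeline and visualization that the abstract emphasizes.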
- Bridging Cognitive Gaps Between User and Model in Interactive Dimension ReductionWang, Ming (Virginia Tech, 2020-05-05)High-dimensional data is prevalent in all domains but is challenging to explore. Analysis and exploration of high-dimensional data are important for people in numerous fields. To help people explore and understand high-dimensional data, Andromeda, an interactive visual analytics tool, has been developed. However, our analysis uncovered several cognitive gaps relating to the Andromeda system: users do not realize the necessity of explicitly highlighting all the relevant data points; users are not clear about the dimensional information in the Andromeda visualization; and the Andromeda model cannot capture user intentions when constructing and deconstructing clusters. In this study, we designed and implemented solutions to address these gaps. Specifically, for the gap in highlighting all the relevant data points, we introduced foreground and background views and distance lines. Our user study with a group of undergraduate students revealed that the foreground and background views and distance lines could significantly alleviate the highlighting issue. For the gap in understanding visualization dimensions, we implemented a dimension-assist feature. The results of a second user study with students from various backgrounds suggested that the dimension-assist feature could make it easier for users to find the extremum in one dimension and to describe correlations among multiple dimensions; however, the dimension-assist feature had only a small impact on characterizing the data distribution and on helping users understand the meanings of the weighted multidimensional scaling (WMDS) plot axes. Regarding the gap in constructing and deconstructing clusters, we implemented a solution utilizing random sampling.
A quantitative analysis of the random sampling strategy was performed, and the results demonstrated that the strategy improved Andromeda's capabilities in constructing and deconstructing clusters. We also applied random sampling to two-point manipulations, making the Andromeda system more flexible and adaptable to differing data exploration tasks. Limitations are discussed, and potential future research directions are identified.
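In WMDS-based tools such as Andromeda, moving points in the 2D plot is commonly interpreted by solving an inverse WMDS problem: find dimension weights whose weighted distances best match the user's layout. The following Python sketch of that idea makes several assumptions. It uses gradient descent on the stress with respect to the weights, and after each step it clips negative weights to zero and renormalizes them to sum to 1. This is a simple heuristic, not an exact simplex projection and not Andromeda's actual solver. The name `learn_weights` and its parameters are hypothetical. The random-sampling refinement described above is not modeled here.

```python
import math


def learn_weights(points, layout, iterations=2000, lr=0.05):
    """Hypothetical inverse-WMDS sketch: find non-negative dimension weights,
    summing to 1, whose weighted distances best match a user-adjusted 2D layout."""
    dims = len(points[0])
    weights = [1.0 / dims] * dims  # start from equal weights
    n = len(points)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    for _ in range(iterations):
        grad = [0.0] * dims
        for i, j in pairs:
            diffs_sq = [(a - b) ** 2 for a, b in zip(points[i], points[j])]
            d_high = math.sqrt(sum(w * d for w, d in zip(weights, diffs_sq))) or 1e-9
            d_low = math.dist(layout[i], layout[j])
            # d/dw_k of (d_high - d_low)^2 = (d_high - d_low) * diffs_sq[k] / d_high
            for k in range(dims):
                grad[k] += (d_high - d_low) * diffs_sq[k] / d_high
        # Gradient step, clip at zero, renormalize so the weights sum to 1.
        weights = [max(w - lr * g, 0.0) for w, g in zip(weights, grad)]
        total = sum(weights) or 1.0
        weights = [w / total for w in weights]
    return weights
```

For example, if a user arranges points so that their horizontal positions follow only the first dimension, the learned weights shift almost entirely onto that dimension.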
- Bridging cognitive gaps between user and model in interactive dimension reductionWang, Ming; Wenskovitch, John; House, Leanna L.; Polys, Nicholas F.; North, Christopher L. (2021-06)Interactive machine learning (ML) systems are difficult to design because of the "Two Black Boxes" problem that exists at the interface between human and machine. Many algorithms used in interactive ML systems are presented to users as black boxes, while human cognition represents a second black box that can be difficult for the algorithm to interpret. These black boxes create cognitive gaps between the user and the interactive ML model. In this paper, we identify several cognitive gaps that exist in a previously developed interactive visual analytics (VA) system, Andromeda, but are also representative of common problems in other VA systems. Our goal with this work is to open both black boxes and bridge these cognitive gaps by making usability improvements to the original Andromeda system. These include designing new visual features to help people better understand how Andromeda processes and interacts with data, as well as improving the underlying algorithm so that the system can better implement the intent of the user during the data exploration process. We evaluate our designs through both qualitative and quantitative analysis, and the results confirm that the improved Andromeda system outperforms the original version in a series of high-dimensional data analysis tasks. (C) 2021 The Author(s). Published by Elsevier B.V. on behalf of Zhejiang University and Zhejiang University Press Co. Ltd.
- Characterizing Human Driving Behavior Through an Analysis of Naturalistic Driving DataAli, Gibran (Virginia Tech, 2023-01-23)Reducing the number of motor vehicle crashes is one of the major challenges of our times. Current strategies to reduce crash rates can be divided into two groups: identifying risky driving behavior prior to crashes to proactively reduce risk, and automating some or all human driving tasks using intelligent vehicle systems such as Advanced Driver Assistance Systems (ADAS) and Automated Driving Systems (ADS). For successful implementation of either strategy, a deeper understanding of human driving behavior is essential. This dissertation characterizes human driving behavior through an analysis of a large naturalistic driving study and offers four major contributions to the field. First, it describes the creation of the Surface Accelerations Reference, a catalog of all longitudinal and lateral surface accelerations found in the Second Strategic Highway Research Program Naturalistic Driving Study (SHRP 2 NDS). SHRP 2 NDS is the largest naturalistic driving study in the world, with 34.5 million miles of data collected from over 3,500 participants driving in six separate locations across the United States. An algorithm was developed to detect each acceleration epoch and summarize its key parameters, such as the mean and maximum of the magnitude, roadway properties, and driver inputs. A statistical profile was then created for each participant describing their acceleration behavior in terms of rates, percentiles, and the magnitude of the strongest event within a distance threshold. The second major contribution is quantifying the effect of several factors that influence acceleration behavior. The rate of mild to harsh acceleration epochs was modeled using negative binomial generalized linear mixed-effects models.
Roadway speed category, driver age, driver gender, vehicle class, and location were used as fixed effects, and a unique participant identifier was used as the random effect. Subcategories of each fixed effect were compared using incident rate ratios. Roadway speed category was found to have the largest effect on acceleration behavior, followed by driver age, vehicle class, and location. This methodology accounts for the major influences while ensuring that the comparisons are meaningful and not driven by coincidences of data collection. The third major contribution is the extraction of acceleration-based long-term driving styles and the determination of their relationship to crash risk. Rates of acceleration epochs experienced on ≤ 30 mph roadways were used to cluster the participants into four groups. The clustering metrics were chosen to represent long-term driving style rather than short-term driving behavior influenced by transient traffic and environmental conditions. The driving style was also correlated with driving risk by comparing the crash rates, near-crash rates, and speeding behavior of the participants. Finally, the fourth major contribution is the creation of a set of interactive analytics tools that facilitate quick characterization of human driving during regular as well as safety-critical driving events. These tools enable users to answer a large, open-ended set of research questions that aid in the development of ADAS and ADS components. The tools facilitate the exploration of queries such as how often certain scenarios occur in naturalistic driving, what the distribution of key metrics is during a particular scenario, and what the relative composition of various crash datasets is. Novel visual analytics principles such as video on demand have been implemented to accelerate the sense-making loop for the user.
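The abstract mentions an algorithm that detects acceleration epochs and summarizes each one. The idea can be illustrated with a threshold-crossing sketch in Python. The abstract does not give the SHRP 2 detection criteria, so the following are all hypothetical illustration values: a single acceleration axis in g, a magnitude threshold of 0.3 g, a minimum run length of 3 samples, and the names `Epoch` and `detect_epochs`. A real implementation would also handle longitudinal and lateral axes, filtering, and the roadway and driver-input summaries.

```python
from dataclasses import dataclass


@dataclass
class Epoch:
    start: int             # index of the first sample in the epoch
    end: int               # index one past the last sample
    mean_magnitude: float  # mean |acceleration| over the epoch
    max_magnitude: float   # peak |acceleration| over the epoch


def detect_epochs(accel_g, threshold_g=0.3, min_samples=3):
    """Hypothetical sketch: find contiguous runs where |acceleration| >= threshold_g
    (assumed > 0) and summarize each run of at least min_samples samples."""
    samples = list(accel_g)
    epochs = []
    start = None
    # A trailing 0.0 sentinel closes any run still open at the end of the signal.
    for idx, value in enumerate(samples + [0.0]):
        if abs(value) >= threshold_g:
            if start is None:
                start = idx
        elif start is not None:
            run = [abs(v) for v in samples[start:idx]]
            if len(run) >= min_samples:
                epochs.append(Epoch(start, idx, sum(run) / len(run), max(run)))
            start = None
    return epochs
```

Counting the returned epochs per participant and dividing by distance driven would give the kind of per-participant rate that the statistical profiles and models above build on.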
- Clustered Layout Word Cloud for User Generated Online ReviewsWang, Ji (Virginia Tech, 2012-11-20)User-generated reviews, like those found on Yelp and Amazon, have become important reference material in casual decision making, such as dining, shopping, and entertainment. However, the sheer volume of reviews makes reading them time-consuming. A text visualization can speed up the review reading process. In this thesis, we present the clustered layout word cloud, a text visualization that speeds up decision making based on user-generated reviews. We used a natural language processing approach, called grammatical dependency parsing, to analyze user-generated review content and create a semantic graph. A force-directed graph layout was applied to the graph to create the clustered layout word cloud. We conducted a two-task user study to compare the clustered layout word cloud to two alternative review reading techniques: a random layout word cloud and normal block-text reviews. The results showed that the clustered layout word cloud offers faster task completion time and better user satisfaction than the other two techniques. [Permission email from J. Huang removed at his request. GMc March 11, 2014]
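A force-directed layout pulls related words together, which is what turns a semantic graph into clustered regions of a word cloud. The following Python sketch uses a Fruchterman-Reingold-style update. It has several assumptions: the word graph is already built (the dependency-parsing step is not shown, and the example word pairs are hypothetical), every pair of words repels with magnitude k²/d, linked words attract with magnitude d²/k, and each move is capped by a cooling temperature. The name `force_layout` and its parameters are hypothetical, and word sizing and overlap removal are omitted.

```python
import math
import random


def force_layout(nodes, edges, iterations=300, k=1.0, seed=1):
    """Fruchterman-Reingold-style sketch: all word pairs repel, linked words attract."""
    rng = random.Random(seed)
    pos = {n: [rng.random(), rng.random()] for n in nodes}
    temperature = 1.0
    for _ in range(iterations):
        disp = {n: [0.0, 0.0] for n in nodes}
        # Repulsion between every pair of words: magnitude k^2 / d.
        for i, a in enumerate(nodes):
            for b in nodes[i + 1:]:
                dx = pos[a][0] - pos[b][0]
                dy = pos[a][1] - pos[b][1]
                d = math.hypot(dx, dy) or 1e-6
                f = k * k / d
                disp[a][0] += dx / d * f
                disp[a][1] += dy / d * f
                disp[b][0] -= dx / d * f
                disp[b][1] -= dy / d * f
        # Attraction along semantic-graph edges: magnitude d^2 / k.
        for a, b in edges:
            dx = pos[a][0] - pos[b][0]
            dy = pos[a][1] - pos[b][1]
            d = math.hypot(dx, dy) or 1e-6
            f = d * d / k
            disp[a][0] -= dx / d * f
            disp[a][1] -= dy / d * f
            disp[b][0] += dx / d * f
            disp[b][1] += dy / d * f
        # Move each word by at most the current temperature, then cool down.
        for n in nodes:
            length = math.hypot(*disp[n]) or 1e-6
            step = min(length, temperature)
            pos[n][0] += disp[n][0] / length * step
            pos[n][1] += disp[n][1] / length * step
        temperature = max(temperature * 0.98, 0.01)
    return pos
```

With two groups of words linked only within each group, such as food-related and service-related terms, the layout tends to settle into two separate clusters. Comparing that outcome against random placement is the contrast the user study examined.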