Browsing by Author "Kavanaugh, Andrea L."
Now showing 1 - 20 of 49
- ACM Venue Recommender SystemKodur Kumar, Harinni (Virginia Tech, 2020-06-17)A frequent goal of a researcher is to publish his/her work in appropriate conferences and journals. With a large number of options for venues in the microdomains of every research discipline, the issue of selecting suitable locations for publishing cannot be underestimated. Further, the venues diversify themselves in the form of workshops, symposiums, and challenges. Several publishers such as IEEE and Springer have recognized the need to address this issue and have developed journal recommenders. In this thesis, our goal is to design and develop a similar recommendation system for the ACM dataset. We view this recommendation problem from a classification perspective. With the success of deep learning classifiers in recent times and their pervasiveness in several domains, we modeled several 1D Convolutional neural network classifiers for the different venues. When given some submission information like title, keywords, abstract, etc. about a paper, the recommender uses these developed classifier predictions to recommend suitable venues to the user. The dataset used for the project is the ACM Digital Library metadata that includes textual information for research papers and journals submitted at various conferences and journals over the past 60 years. We developed the recommender based on two approaches: 1) A binary CNN classifier per venue (single classifiers), and 2) Group CNN classifiers for venue groups (group classifiers). Our system has achieved a MAP of 0.55 and 0.51 for single and group classifiers. We also show that our system has a high recall rate.
- The Attitudes of African American Middle School Girls Toward Computer Science: Influences of Home, School, and Technology UseRobinson, Ashley Renee (Virginia Tech, 2015-05-13)The number of women in computing is significantly low compared to the number of men in the discipline, with African American women making up an even smaller segment of this population. Related literature accredits this phenomenon to multiple sources, including background, stereotypes, discrimination, self-confidence, and a lack of self-efficacy or belief in one's capabilities. However, a majority of the literature fails to represent African American females in research studies. This research used a mixed methods approach to understand the attitudes of African American middle school girls toward computer science and investigated the factors that influence these attitudes. Since women who do pursue computing degrees and continue with graduate education often publish in Human-Computer Interaction (HCI) in greater proportions than men, this research used an intervention to introduce African American middle school girls to computational thinking concepts using HCI topics. To expand the scope of the data collected, a separate group of girls were introduced to computational thinking concepts through Algorithms. Data were collected through both quantitative and qualitative sources, and analyzed using inferential statistics and content analysis. The results show that African American middle school girls generally have negative attitudes toward computer science. However, after participating in a computer science intervention, perceptions toward computer science become more positive. 
The results also reveal that four factors influence the attitudes of African American middle school girls toward computer science: participation in an intervention, the intervention content domain, the facilitation of performance accomplishments, and participant characteristics such as socioeconomic status, mother's education, school grades, and the use of smartphones and video game consoles at home.
- Between a Rock and a Cell Phone: Social Media Use during Mass Protests in Iran, Tunisia and EgyptKavanaugh, Andrea L.; Yang, Seungwon; Sheetz, Steven D.; Li, Lin Tzy; Fox, Edward A. (Department of Computer Science, Virginia Polytechnic Institute & State University, 2011-05-01)In this paper we examine the use of social media, and especially Twitter, in Iran, Tunisia and Egypt during the mass political demonstrations and protests in June 2009, December 2010 - January 2011, and February 2011, respectively. We compare this usage with methods and findings from other studies on the use of Twitter in emergency situations, such as natural and man-made disasters. We draw on our own experiences and participant-observations as an eyewitness in Iran (first author), and on Twitter data from Iran, Tunisia and Egypt. In these three cases, Twitter filled a unique technology and communication gap at least partially. We summarize suggested directions for future research with a view of placing this work in the larger context of social media use in conditions of crisis and social convergence.
- Civic and political involvement among young adults: Exploring political talk, political efficacy and political participation in a community contextHash, Andrae Stephen (Virginia Tech, 2014-12-18)This study expands research on uses and gratifications by exploring political information-seeking uses of the Internet and social networking sites (SNS) and their relationships with political efficacy and political participation. Approximately 300 young adults completed a survey covering information-seeking, information access, and information sharing uses for local civic and political purposes. The study hypothesizes that young adults' political talk, particularly in their online social networks, is associated with political efficacy. Variables that support the relationship between information-seeking and political efficacy are also explored. Random and convenience samples of young adults were combined in this study to explore the cognitive (perceived efficacy) and civic (actual behavior) behaviors of undergraduate students at Virginia Tech in order to examine the role of political talk in individuals' opinion networks measured by the outcome of political talk. Results show considerable support for hypotheses emphasizing the predicted relationships between Internet and SNS for political information-seeking uses, political efficacy, and political participation gratifications. Future research exploring the broad range of political communication uses and their association with political efficacy and political participation is warranted.
- Collecting, Analyzing and Visualizing Tweets using Open Source ToolsYang, Seungwon; Kavanaugh, Andrea L. (Department of Computer Science, Virginia Polytechnic Institute & State University, 2011)This tutorial will teach participants how to collect, analyze and visualize results from twitter data. We will demonstrate several different free, open-source web-based tools that participants can use to collect twitter data (e.g., Archivist, 140kit.com, TwapperKeeper), and show them a few different methods, tools or programs they can use to analyze the data in a given collection. Finally, we will show participants visualization tools and programs they can use to present the analyses, such as tag clouds, graphs and other data clustering techniques. As much as possible this will be a hands-on tutorial, so participants can learn by making their own twitter data collection, analysis and visualization as part of the tutorial.
- Communication of Emotion in Mediated and Technology-Mediated Contexts: Face-to-Face, Telephone, and Instant MessagingBurge, Jamika D. (Virginia Tech, 2007-04-23)This dissertation work considers communication between people. I look at coordinating dyads (couples in relationships) and people in working relationships to develop an understanding of how people engage in high-stakes, or emotional communication via various communicative media. The approach for this research is to observe and measure people's behavior during interaction and subsequent reporting of that behavior and associated internal experiences. Qualitative and quantitative methods are employed. Quantitative data are analyzed using a range of statistical analyses, including correlations matrices, ANOVAs, and multivariate statistics. Two controlled laboratory experiments were conducted for this research. These experiments involved couples in relationships. Couples were brought into the lab and argued with each other across one of three technological media: face-to-face, telephone, and instant messaging (IM). In one set of couples' experiments, the couples argued for twenty minutes; in the subsequent couples' experiment, couples were encouraged to take as much time as they needed for their arguments. One of the main results from the first experiment is that couples did, indeed, argue when brought into a laboratory setting. One of the important findings for the second experiment is that time did not affect couples' tendency to reach closure during their arguments. This research is a contribution in that it examines how people engage in highly emotional communication using various technological media. In a society with ever-increasing communication needs that require technology, it becomes necessary to study its communicative affordances. Understanding the context of highly emotional interactions between members of couples gives insight into how technology meets (or fails to meet) these communication needs.
- Communities of Tweeple: How Communities Engage with Microblogging When Co-located Vega, Edgardo Luis (Virginia Tech, 2011-04-22) Most of the research done on microblogging services, such as Twitter, has focused on how the individual communicates with their community at a micro and macro level; less research has been done on how the community affects the individual. We present in this thesis some ideas about this phenomenon. We do this by collecting data of Twitter users at a conference. We collected 21,150 tweets from approximately 400 users during a five-week period and additionally collected survey data from a small subset of the tweeters. By observing users of Twitter before, during, and after a specific event, we discovered a pattern in postings. Specifically, we found that tweets increased the week of the conference and that by the end of the conference the network was strong. These findings lead us to conclude that collocation of communities, like conferences, has a substantial effect on online microblogging behaviors.
- Contextinator: Recreating the context lost amid information fragmentation on the web Ahuja, Ankit (Virginia Tech, 2013-06-01) The web browser has emerged as a central workspace for information workers, where they make use of cloud-based applications to access their information. While this solution nicely supports access to their data from multiple devices, it presents a nightmare for organizing and coordinating data between tools for a single project. Information is typically scattered between various online tools, where storage and organization structures are replicated. Information workers are interrupted and have to switch between projects frequently. Once interrupted, resuming work on a project can be hard. To address this information fragmentation and the impact of work interruptions, I created Contextinator, a personal information manager for the web browser that lets information workers organize their work activity and information into projects. Contextinator assists in coordinating information for projects, thereby ameliorating information fragmentation for projects that live on the cloud. It assists information workers in context switching and resuming work after interruptions. In my thesis, I describe the problem of information fragmentation in the cloud. I discuss the different areas of related work of Personal Information Management, the design of Contextinator and how it is grounded in previous research. I briefly discuss how Contextinator is implemented. I then present the results from my field-evaluation of Contextinator. Finally, I conclude by discussing future work in this research.
- Contextualizing Remote Touch for Affect ConveyanceWang, Rongrong (Virginia Tech, 2012-09-27)Touch is an expressive and powerful modality in affect conveyance. A simple touch like a hug can elicit strong feelings of affection both in the touch initiator and recipient. Therefore delivering touch over a distance to a long-distance family member or significant other has been an appealing concept for both researchers and designers. However compared to the development of audio, video channels which allow the transmission of voice, facial expression and gesture, digitally mediated touch (Remote Touch) has not received much attention. We believe that this is partially due to the lack of understanding of the capabilities and communication possibilities that remote touch brings. This dissertation presents a review of relevant psychological and sociological literature of touch and proposes a model of immediacy of the touch channel for affect conveyance. We advance three hypotheses regarding the possibility of remote touch in immediate affect conveyance: presence, fidelity and context. We posit that remote touch with relatively low touch fidelity can convey meaningful immediate affect when it is accompanied by a contextualizing channel. To test the hypothesis, two sets of remote touch devices are designed and prototyped which allow users to send/receive a squeeze on the upper arm to/from others effectively. Three in-lab user studies are conducted to investigate the role of remote touch in affect conveyance. These studies showed clearly that remote touch, when contextualized, can influence the affective component in communication. Our results demonstrated that remote touch can afford a rich spectrum of meanings and affects. 
Three major categories of the usage are identified as positive affect touch which serves to convey affects such as affection, sympathy and sharing, comfort etc., playful touch which serves to lighten the conversations, and conversational touch which serves to regulate the dynamics in the discourse. Our interview results also provide insights of how people use this new channel in their communication.
- Creating an Interactive Learning Environment with Reusable HCI Knowledge Fabian, Alain (Virginia Tech, 2006-05-25) This thesis proposes creating an interactive learning environment for Human Computer Interaction (HCI) to facilitate access to, and learning of, important design knowledge. By encapsulating HCI knowledge into reusable claims stored in a knowledge repository, or claims library, this learning environment aims at allowing students to effectively explore design features to limit their reliance on intuition to mold their interfaces, help them address proper design concerns, and evaluate alternatives for their designs. This learning approach is based on active learning where students create their own knowledge by gathering information. However, building adequate development records from which students can gather HCI knowledge is critical to support this approach. This thesis explores using effective reusable design components to act as design records to create an interactive learning environment for students learning HCI design. An initial prototype for the learning environment introduces claims as an encapsulation mechanism for design features from which students can gather HCI knowledge. Pilot testing outlines the accessibility, applicability and reusability problems associated with this approach. To solve these issues, a taxonomic organization of an improved form of claims (reference claims) is introduced to share core design knowledge among students. A taxonomy is designed as a way to expose students to important design concerns as well as a method to categorize claims. Reference claims are introduced as improved claims inspired by reference tasks to expose students to design alternatives for design concerns. A detailed taxonomy and a set of reference claims for the domain of notification systems demonstrate how existing theories of design can be translated into reference claims to create an interactive learning environment. 
An experiment illustrates the applicability and reusability of reference claims for various designs within a particular domain. Finally, an evaluation assesses the benefits of this learning environment based on reference claims in terms of improving student designs and increasing the amount of HCI knowledge they reuse. Results show that by exposing students to valuable concerns and alternatives for the design of interactive systems, an interactive learning environment based on reference claims can improve students' understanding of the design scope and lead to an increased use of existing HCI knowledge in their designs.
- CTRnet Final ReportFox, Edward A.; Shoemaker, Donald J.; Sheetz, Steven D.; Kavanaugh, Andrea L.; Ramakrishnan, Naren (2013-08-26)The CTRnet project team has been developing a digital library including many webpage archives and tweet archives related to disasters, in collaboration with the Internet Archive. The goals of the CTRnet project are to provide such archived data sets for analysis, including by researchers who are seeking deep insights about those events, and to support a range of services and infrastructure regarding those tragic events for the various stakeholders and the general public, allowing them to study and learn.
- CTRnet: Project Proposal to NSF Fox, Edward A.; Shoemaker, Donald J.; Kavanaugh, Andrea L.; Ramakrishnan, Naren (2009) Crises and tragedies are, regrettably, part of life; a recent sample, showing the small number of collections preserved at the Internet Archive, is shown in Table 1. While always difficult, recovery from tragic events may be increasingly facilitated and supported by information and communication technology (ICT). Individuals, groups, and communities are using ICT in innovative ways to learn from these events and recover more quickly and more effectively. During and after a crisis, individuals and communities face a confusing plethora of data and information, and strive to make sense by way of that data [114]. They seek to carry out their usual activities, but want to be informed by new insights. They work to help others, or to receive help, but the context and technologies involved in communication today (e.g., Internet, WWW, online communities, mobile devices) make it exceedingly difficult to integrate content, community, and services. Accordingly, individuals and communities respond by attempting to meet their needs with the tools they have, e.g., creating a Facebook group to quickly inform members who is OK, and other groups to share pictures, comments, and additional contributions.
- The Deliberative Potential of Social Media: Face Threat and Face Support in Online Political ExpressionSmith, Anjelica Marie (Virginia Tech, 2016-08-01)Engaging in productive political discussion has long been a valued aspect of American democratic life. Due to ease of access and the potential for exposure to diverse views, the Internet and social media may support mediated political talk. Literature on the concept of face and politeness theory provides a framework for understanding interpersonal interactions, both online and offline. To understand if social media has the potential to host political discussion among millennials, a survey (N = 352) of undergraduate students examined social media use and political interaction experiences. Facebook was the most popular platform for exposure to others' political opinions and political self-expression. Facebook users with more diverse networks engaged in more political expression. Across numerous platforms, participants reported frequently being exposed to others' political opinions but infrequently sharing their own views. Negative and positive political interactions on Facebook and Twitter were explored for their threat to and support of negative face (need for autonomy) and positive face (need for validation). Findings indicate that engaging in negative interactions leads to more face threat while observing negative interactions solicits more face support. Engaging in positive interactions results in more face support and observing positive interactions leads to more face threat. Across interaction type and platform, participants who actively engaged in political interactions as opposed to merely observing them reported significantly more subsequent online political engagement. Future research on political interactions across various social media platforms and the application of interpersonal communication theory to the study of mediated political talk is warranted.
- The Effect of Technology on Social Interaction in Local Community OrganizationsSnook, Jason Spence (Virginia Tech, 2002-05-10)With each new innovation in technology since at least the Industrial Revolution, and probably before, optimists and pessimists have squared off in a cyclic debate over the impact of the day's newest technology. Self-proclaimed futurists for centuries have attempted to foretell the impact of technology on society with varied success. The goal of this research project is to study the effect of computer network technology on the social interactions of the local community organizations in Blacksburg, VA. Online surveys filled out by the leaders and members of these organizations measure different aspects of each organization and the use and usage of Internet technology within that organization. Correlations between the two may help us identify ways technology has affected the way we communicate with one another. Are community organizations communicating more or less? If so, how? Has face-to-face interaction been forsaken in lieu of technology such as email? The effects found in the survey results should give way to meaningful discourse on how technology can best be used to aid social interaction in local organizations.
- The Efficacy of Knowledge Sharing: Centralized Vs. Self-Organizing Online CommunitiesGodara, Jaideep (Virginia Tech, 2007-05-04)This study investigates the impact of an online community's control structure on the knowledge sharing process in that community. Using a framework comprised of legitimate peripheral participation theory and the weak-ties phenomenon, the study focuses on a comparative analysis of self-organizing online communities (e.g., weblog networks) and centralized online communities (e.g., discussion forums communities) with respect to the efficacy of knowledge sharing in these communities. The findings of this study indicate that self-organizing communities of practice have more weak-ties among their members compared to centralized communities. As per weak-ties theory of Granovetter (1973, 1983), these findings suggest that self-organizing communities facilitate greater dissemination of knowledge and flow of information among their members than centralized communities. The abundance of weak-ties in their community structure also makes self-organizing communities better environments for the discovery of new information compared to centralized community environments. This study did not find any evidence of community structure impact on peripheral participation and the interaction activity level among peripheral participants of a given online community. These observations may have stemmed from the limitations of research design, however, it is safe to say as of now that verdict on peripheral participation differences in different community structures is inconclusive at best.
- An Evaluation Method for Thinking in Technology EcologiesChu Yew Yee, Sharon L. (Virginia Tech, 2013-12-09)As technology progresses, we become surrounded with an ever increasing number of devices. Information can now be persistently represented beyond a single screen and a single session. In the educational context, we see a rapid adoption of the panoply of devices, but often without any careful thought. Devices in isolation are unlikely to enable effective learning. This research explores how devices function in technological display and device ecologies or ecosystems to support human thinking, learning and sensemaking. Based on the theories of Vygotsky's sign mediation triangle, we contribute a method that may allow one to evaluate how technology configurations support (or hinder) students' thinking. Our method proposes the concept of objectification as a way to identify the potential or opportunity for learning in technology ecologies. The significance of such an evaluation methodology is considerable, given the nascent field of sensemaking and the lack of consensus on evaluation in such contexts: our research advances a principled approach by which device ecologies can be examined for their potential to provide 'learning experiences', and enables one to articulate affordances for the design of technological spatial environments that can help to support higher thought. Our contribution thus is in terms of methodology, theory, evaluation and the design of technology ecologies.
- Event-related Collections Understanding and ServicesLi, Liuqing (Virginia Tech, 2020-03-18)Event-related collections, including both tweets and webpages, have valuable information, and are worth exploring in interdisciplinary research and education. Unfortunately, such data is noisy, so this variety of information has not been adequately exploited. Further, for better understanding, more knowledge hidden behind events needs to be unearthed. Regarding these collections, different societies may have different requirements in particular scenarios. Some may need relatively clean datasets for data exploration and data mining. Social researchers require preprocessing of information, so they can conduct analyses. General societies are interested in the overall descriptions of events. However, few systems, tools, or methods exist to support the flexible use of event-related collections. In this research, we propose a new, integrated system to process and analyze event-related collections at different levels (i.e., data, information, and knowledge). It also provides various services and covers the most important stages in a system pipeline, including collection development, curation, analysis, integration, and visualization. Firstly, we propose a query likelihood model with pre-query design and post-query expansion to rank a webpage corpus by query generation probability, and retrieve relevant webpages from event-related tweet collections. We further preserve webpage data into WARC files and enrich original tweets with webpages in JSON format. As an application of data management, we conduct an empirical study of the embedded URLs in tweets based on collection development and data curation techniques. 
Secondly, we develop TwiRole, an integrated model for 3-way user classification on Twitter, which detects brand-related, female-related, and male-related tweeters through multiple features with both machine learning (i.e., random forest classifier) and deep learning (i.e., an 18-layer ResNet) techniques. As guidance to user-centered social research at the information level, we combine TwiRole with a pre-trained recurrent neural network-based emotion detection model, and carry out tweeting pattern analyses on disaster-related collections. Finally, we propose a tweet-guided multi-document summarization (TMDS) model, which generates summaries of the event-related collections by using tweets associated with those events. The TMDS model also considers three aspects of named entities (i.e., importance, relatedness, and diversity) as well as topics, to score sentences in webpages, and then rank selected relevant sentences in proper order for summarization. The entire system is realized using many technologies, such as collection development, natural language processing, machine learning, and deep learning. For each part, comprehensive evaluations are carried out, that confirm the effectiveness and accuracy of our proposed approaches. Regarding broader impact, the outcomes proposed in our study can be easily adopted or extended for further event analyses and service development.
- Expressive Forms of Topic Modeling to Support Digital Humanities Gad, Samah Hossam Aldin (Virginia Tech, 2014-10-15) Unstructured textual data is rapidly growing and practitioners from diverse disciplines are experiencing a need to structure this massive amount of data. Topic modeling is one of the most used techniques for analyzing and understanding the latent structure of large text collections. Probabilistic graphical models are the main building block behind topic modeling and they are used to express assumptions about the latent structure of complex data. This dissertation addresses four problems related to drawing structure from high-dimensional data and improving the text mining process. Studying the ebb and flow of ideas during critical events, e.g. an epidemic, is very important to understanding the reporting or coverage around the event or the impact of the event on the society. This can be accomplished by capturing the dynamic evolution of topics underlying a text corpus. We propose an approach to this problem by identifying segment boundaries that detect significant shifts of topic coverage. In order to identify segment boundaries, we embed a temporal segmentation algorithm around a topic modeling algorithm to capture such significant shifts of coverage. A key advantage of our approach is that it integrates with existing topic modeling algorithms in a transparent manner; thus, more sophisticated algorithms can be readily plugged in as research in topic modeling evolves. We apply this algorithm to studying data from the iNeighbors system, and apply our algorithm to six neighborhoods (three economically advantaged and three economically disadvantaged) to evaluate differences in conversations for statistical significance. Our findings suggest that social technologies may afford opportunities for democratic engagement in contexts that are otherwise less likely to support opportunities for deliberation and participatory democracy. 
We also examine the progression in coverage of historical newspapers about the 1918 influenza epidemic by applying our algorithm on the Washington Times archives. The algorithm is successful in identifying important qualitative features of news coverage of the pandemic. Visually convincing results of data mining algorithms and models are crucial to analyzing and drawing conclusions from the algorithms. We develop ThemeDelta, a visual analytics system for extracting and visualizing temporal trends, clustering, and reorganization in time-indexed textual datasets. ThemeDelta is supported by a dynamic temporal segmentation algorithm that integrates with topic modeling algorithms to identify change points where significant shifts in topics occur. This algorithm detects not only the clustering and associations of keywords in a time period, but also their convergence into topics (groups of keywords) that may later diverge into new groups. The visual representation of ThemeDelta uses sinuous, variable-width lines to show this evolution on a timeline, utilizing color for categories, and line width for keyword strength. We demonstrate how interaction with ThemeDelta helps capture the rise and fall of topics by analyzing archives of historical newspapers, of U.S. presidential campaign speeches, and of social messages collected through iNeighbors. ThemeDelta is evaluated using a qualitative expert user study involving three researchers from rhetoric and history using the historical newspapers corpus. Time and location are key parameters in any event; neglecting them while discovering topics from a collection of documents results in missing valuable information. We propose a dynamic spatial topic model (DSTM), a true spatio-temporal model that enables disaggregating a corpus's coverage into location-based reporting, and understanding how such coverage varies over time. 
DSTM naturally generalizes traditional spatial and temporal topic models so that many existing formalisms can be viewed as special cases of DSTM. We demonstrate a successful application of DSTM to multiple newspapers from the Chronicling America repository. We demonstrate how our approach helps uncover key differences in the coverage of the flu as it spread through the nation, and provide possible explanations for such differences. Major events that can change the flow of people's lives are important to predict, especially when we have powerful models and sufficient data available at our fingertips. The problem of embedding the DSTM in a predictive setting is the last part of this dissertation. To predict events and their locations across time, we present a predictive dynamic spatial topic model that can predict future topics and their locations from unseen documents. We show the applicability of our proposed approach by applying it to streaming tweets from Latin America. The prediction approach was successful in identifying major events and their locations.
- Facilitating Design Knowledge Reuse Through RelationshipsWahid, Shaikh Shahtab (Virginia Tech, 2011-01-27)Design reuse is an approach in which the creation of new designs is based on the identification of previously employed solutions and the incorporation of those into new contexts. This notion has been extensively studied, especially by software engineers. This research seeks to support the reuse of design knowledge in the Human-Computer Interaction (HCI) community in creating new designs, as it is generally argued that reuse has the potential to reduce development time and costs. Efforts to reuse design elements in HCI, often in the form of design patterns, are slowly emerging. This work seeks to facilitate the reuse of design knowledge in the form of claims. To achieve this goal, the notion of claim relationships (descriptions of connections between claims that emerge in design) is introduced as a mechanism to facilitate reuse. Claim relationships can be used to connect a collection of reusable claims so that they can be searched, understood, tailored, and integrated into new designs. A method for how to use the relationships is presented to aid in the creation of scenarios. Through a series of studies, starting from the use of relationships to locate and reuse claims and extending to the use of card sets incorporating images and rationale for storyboards, the potential for relationships is demonstrated. These works inform the design and evaluation of a storyboarding tool called PIC-UP. PIC-UP is introduced as an example of how relationships can be utilized in the creation of storyboards made of reusable artifacts in the form of claims. Studies of PIC-UP position the tool as one that enables reuse through a storyboarding guide and through social navigation by collecting and sharing claims. It shows potential in aiding novices and non-designers and can serve as a communication tool.
- A Framework for Hadoop Based Digital Libraries of TweetsBock, Matthew (Virginia Tech, 2017-07-17)The Digital Library Research Laboratory (DLRL) has collected over 1.5 billion tweets for the Integrated Digital Event Archiving and Library (IDEAL) and Global Event Trend Archive Research (GETAR) projects. Researchers across varying disciplines have an interest in leveraging DLRL's collections of tweets for their own analyses. However, due to the steep learning curve involved with the required tools (Spark, Scala, HBase, etc.), simply converting the Twitter data into a workable format can be a cumbersome task in itself. This prompted the effort to build a framework that will help in developing code to analyze the Twitter data, run on arbitrary tweet collections, and enable developers to leverage projects designed with this general use in mind. The intent of this thesis work is to create an extensible framework of tools and data structures to represent Twitter data at a higher level and eliminate the need to work with raw text, so as to make the development of new analytics tools faster, easier, and more efficient. To represent this data, several data structures were designed to operate on top of the Hadoop and Spark libraries of tools. The first set of data structures is an abstract representation of a tweet at a basic level, as well as several concrete implementations which represent varying levels of detail to correspond with common sources of tweet data. The second major data structure is a collection structure designed to represent collections of tweet data structures and provide ways to filter, clean, and process the collections. All of these data structures went through an iterative design process based on the needs of the developers. The effectiveness of this effort was demonstrated in four distinct case studies. 
In the first case study, the framework was used to build a new tool that selects Twitter data from DLRL's archive of tweets, cleans those tweets, and performs sentiment analysis within the topics of a collection's topic model. The second case study applies the provided tools for the purpose of sociolinguistic studies. The third case study explores large datasets to accumulate all possible analyses on the datasets. The fourth case study builds metadata by expanding the shortened URLs contained in the tweets and storing them as metadata about the collections. The framework proved to be useful and cut development time for all four of the case studies.
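The two kinds of data structures described in the framework abstract above (a basic tweet representation with more detailed variants, and a collection structure that can filter, clean, and process tweets) can be illustrated with a minimal sketch. This is not the thesis code: the original framework runs on Hadoop and Spark, whereas this sketch is plain in-memory Python, and every name in it (`Tweet`, `DetailedTweet`, `TweetCollection`, `clean_text`) is hypothetical.

```python
import re
from dataclasses import dataclass, replace
from typing import Callable, Iterable, Iterator, Tuple

# Hypothetical cleaning rule: drop URLs, then collapse runs of whitespace.
_URL = re.compile(r"https?://\S+")


def clean_text(text: str) -> str:
    return " ".join(_URL.sub("", text).split())


@dataclass(frozen=True)
class Tweet:
    """Basic representation: only an identifier and the text."""
    tweet_id: str
    text: str


@dataclass(frozen=True)
class DetailedTweet(Tweet):
    """Richer variant for sources that also provide user and URL fields."""
    user: str = ""
    urls: Tuple[str, ...] = ()


class TweetCollection:
    """In-memory stand-in for a collection of tweets (assumption: no Spark)."""

    def __init__(self, tweets: Iterable[Tweet]) -> None:
        self._tweets = list(tweets)

    def filter(self, predicate: Callable[[Tweet], bool]) -> "TweetCollection":
        # Keep only the tweets for which the predicate returns True.
        return TweetCollection(t for t in self._tweets if predicate(t))

    def clean(self) -> "TweetCollection":
        # replace() copies each tweet with new text and keeps its concrete type.
        return TweetCollection(
            replace(t, text=clean_text(t.text)) for t in self._tweets
        )

    def __iter__(self) -> Iterator[Tweet]:
        return iter(self._tweets)

    def __len__(self) -> int:
        return len(self._tweets)
```

Because `filter` and `clean` each return a new collection, calls can be chained, for example `TweetCollection(tweets).clean().filter(lambda t: "flood" in t.text.lower())`.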