Browsing by Author "Powell, Edward"
- Front-End Kibana (FEK) CS5604 Fall 2019
  Powell, Edward; Liu, Han; Huang, Rong; Sun, Yanshen; Xu, Chao (Virginia Tech, 2020-01-13)
  During the last two decades, web search engines have been driven to new quality levels due to the continuous efforts made to optimize the effectiveness of information retrieval. More and more people are becoming satisfied during their information retrieval processes, and web search has gradually replaced older methods, where people obtained information from each other or from libraries. Information retrieval systems are in constant interaction with users and help users interpret and analyze data. Currently, we are building the front end of a search engine, where users can explore information related to Tobacco Settlement documents from the University of California, San Francisco, as well as the Electronic Theses and Dissertations (ETDs) of Virginia Tech (and possibly other sites). This submission introduces the current work of the front-end team to build a functional user interface, which is one of the key components of a larger project to build a state-of-the-art search engine for two large datasets. We also seek to understand how users search for data, and accordingly provide the users with more insight and utilities from the two datasets with the help of the visualization tool Kibana. Already, a search website, where users can explore the two datasets, Tobacco Settlement dataset and ETDs dataset, has been created. A series of functionalities of the searching page have been realized, for instance, the login system, searching, filter functions, a Q&A page, and a visualization page.
- Tweet Analysis and Classification: Diabetes and Heartbleed Internet Virus as Use Cases
  Karajeh, Ola; Arachie, Chidubem; Powell, Edward; Hussein, Eslam (Virginia Tech, 2019-12-24)
  The proliferation of data on social media has driven the need for researchers to develop algorithms to filter and process this data into meaningful information. In this project, we consider the task of classifying tweets relative to some topic or event and labeling them as informational or non-informational, using the features in the tweets. We focus on two collections from different domains: a diabetes dataset in the health domain and a Heartbleed dataset in the security domain. We show the performance of our method in classifying tweets in the different collections. We employ two approaches to generate features for our models: 1) a graph-based feature representation and 2) a vector space model, e.g., with TF-IDF weighting or a word embedding. The representations generated are fed into different machine learning algorithms (Logistic Regression, Naïve Bayes, and Decision Tree) to perform the classification task. We evaluate these approaches using metrics (accuracy, precision, recall, and F1-score) on a held-out test dataset. Our results show that we can generalize our approach with tweets across different domains.