CS4624: Multimedia, Hypertext, and Information Access
This collection contains the final projects of the students in the course Computer Science 4624: Multimedia, Hypertext, and Information Access, at Virginia Tech.
This course, taught by Professor Ed Fox, is part of the Human-Computer Interaction track, the Knowledge, Information, and Data track, and the Media/Creative Computing track. The curriculum introduces the architectures, concepts, data, hardware, methods, models, software, standards, structures, technologies, and issues involved with: networked multimedia (e.g., image, audio, video) information access and systems; hypertext and hypermedia; electronic publishing; and virtual reality. Coverage includes text processing, search, retrieval, browsing, time-based performance, synchronization, quality of service, video conferencing, and authoring.
Browsing CS4624: Multimedia, Hypertext, and Information Access by Content Type "Report"
Now showing 1 - 20 of 190
- ABC Drone Team. Bartal, Connor; Cooper, Jared (Virginia Tech, 2021-05-13). The ABC Sports Drone capstone team is an extension of the ABC Drone Project, a group spearheaded by client Charles Kerr in conjunction with the VT Club Ultimate team, Burn. The goal of the project as a whole is to provide high-quality footage and streaming of amateur sports to the masses. This capstone team is a subsection of the ABC Drone Project that has been tasked with creating software solutions and developing new techniques to help push this drone project to fruition. This report covers the progress of the capstone team in developing new routines for the drone, and the pivots that have been introduced as the team has received new data. The first goal tackled was identifying players on a field from an endzone-to-endzone view. This started with the analysis of contours, along with their position and attributes, to determine whether a contour was a player. Artifacts from off the field of play proved particularly troublesome, so a field bounding solution was created to eliminate as many artifacts outside the field of play as possible. Fairly good accuracy was achieved with this method (~75%), but the goal was set at 85%+ accuracy for identification. After experimenting with motion detection and object persistence, the best course of action seemed to be identification via a convolutional neural network. No datasets were available that matched the application of this network, so an original dataset needed to be created. An application was developed that allowed for fairly quick extraction of data from sample videos. This data was fed to the neural network, which consistently yields around 94% identification accuracy. Although the accuracy is high, it reduces frame rates to approximately 1 FPS. Market interviews with actual coaches revealed a larger interest in post-processing capability than live identification, so the client decided to pivot.
A system that allows for speed-editing of footage has been developed, and a proof-of-concept companion application will allow coaches to easily track stats and pre-edit film via a GUI. The speed-editing program takes in the footage and allows the coach to use a video game controller to create quick cuts to eliminate downtime, as well as to pan, tilt, and zoom on the footage to ensure the action is always framed. The edits are recorded in an edit decision list (EDL) file, which is then sent along with the video file to Amazon Web Services. AWS takes the EDL file and original video and returns a fully edited game film. With this method, a 90-minute game can be edited in 5 minutes or less. If coaches are recording stats during the game, the footage will also be annotated with important plays, which are recorded in a similar EDL for gameplay statistics. Players will then have access to a program that will allow them to click their name to see the timestamps of all of their highlights.
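The EDL workflow described above can be sketched in miniature. The helper below is hypothetical (the report does not detail the team's actual tooling or EDL dialect): it records controller-triggered cuts as in/out times in seconds and emits a simplified CMX3600-style event list of the kind a cloud renderer could consume.

```python
# Hypothetical sketch: record (in, out) cut pairs and write a simplified
# CMX3600-style EDL. Field layout is abbreviated for illustration.

def to_timecode(seconds, fps=30):
    """Convert seconds to HH:MM:SS:FF timecode."""
    frames = int(round(seconds * fps))
    ff = frames % fps
    s = frames // fps
    return f"{s // 3600:02d}:{(s % 3600) // 60:02d}:{s % 60:02d}:{ff:02d}"

def build_edl(cuts, title="GAME FILM", fps=30):
    """cuts: list of (source_in, source_out) pairs in seconds of game footage."""
    lines = [f"TITLE: {title}", "FCM: NON-DROP FRAME", ""]
    record_pos = 0.0  # where each kept clip lands on the output timeline
    for i, (src_in, src_out) in enumerate(cuts, start=1):
        duration = src_out - src_in
        lines.append(
            f"{i:03d}  AX       V     C        "
            f"{to_timecode(src_in, fps)} {to_timecode(src_out, fps)} "
            f"{to_timecode(record_pos, fps)} {to_timecode(record_pos + duration, fps)}"
        )
        record_pos += duration
    return "\n".join(lines)

edl = build_edl([(12.0, 47.5), (95.0, 130.0)])  # two plays kept, downtime dropped
print(edl)
```

A renderer that understands this list only needs the source and record timecode columns to splice the kept segments together, which is why shipping an EDL plus the raw video is far cheaper than uploading edited footage.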
- ADS Assessment Video. Beemsterboer, Christopher; Zebina, Tyler (Virginia Tech, 2019-05-19). In our Multimedia/Hypertext/Information Access capstone course, we worked with Adult Day Services to create a training video system to teach new instructors in their organization how to conduct recurring interviews with the adult clients. Adult Day Services is an organization at Virginia Tech that provides person-centered care to older adults who need assistance. Adult Day Services also aims to promote the physical, social, emotional, mental, and cognitive health of its participants, and they use a variety of assessments to measure overall well-being and participant progress. These assessments are conducted in the form of interviews, and the body language, tone, and speech of the interviewer are key to performing them successfully. The training video system we created covers five different types of assessments and is designed to efficiently train new instructors to conduct these interviews. We filmed an Adult Day Services instructor conducting interviews with five different participants, each completing the five assessments. We edited the footage and compiled all of the clips of each type of assessment together, including transitions and titles. We then created a menu system which allows a user to play all of the training videos at once, or to play just the training video for a specific type of assessment. We have also included sub-categories within each type of assessment so the user can choose to view a specific participant as opposed to all of them. We delivered this project in the form of a Blu-ray .iso file on a USB drive which contains the menu system and the associated videos. We have also included instructions on how to download the VLC media player, which is the optimal software for viewing the contents of the .iso file.
Finally, we have included our final presentation from our capstone course that goes over the final product as well as the lessons learned and our future plans.
- Adult Day Services Memory Masterclass Promotional Video. Kulik, Maddie; Castillo, Pablo; Zurita, Jose (Virginia Tech, 2019-05-01). The goal of the project was to create a promotional video for Virginia Tech's Adult Day Services center, specifically to advertise their Memory Masterclass program. Adult Day Services is a center located within the Human Development and Family Sciences Department at Virginia Tech. They are licensed by the Department of Social Services to offer personal care, health monitoring, meals, therapeutic activities, dementia care, and recovery assistance. They typically serve 18 participants each operating day, averaging about 75 years of age. According to ADS's mission statement, the center is dedicated to providing a center focused on the well-being and optimal functioning of its participants, a resource for caregiver support, an education opportunity for students, and a community among generations of children, college students, and adults. One of ADS's main service offerings is their Memory Masterclass course. This course is offered in 6-week sessions to participants over 55 years of age who want to maximize their brain health. The focus of the course is to educate and serve people who have been diagnosed with Mild Cognitive Impairment (MCI). MCI is not a symptom or precursor of Alzheimer's or dementia, but rather a condition that occurs as aging changes brain function. In the 6-week course, participants learn strategies for application to daily life that can strengthen brain reserve as they age, and get connected with others who have similar concerns about memory. Our main objective was to create a promotional video that Adult Day Services could use on their website to inform and attract people to take the class. This project was broken up into several different stages. The first stage was to meet with our clients, Adult Day Services professionals, to gain a better understanding of the project requirements.
Our clients described to us that they would like a video showcasing the active, healthy lifestyle of one of their Memory Masterclass participants. This would include footage of men and women doing outdoor activities, participating in class, and doing mentally stimulating activities. From meeting with our clients, we came to realize that they wanted a specific aesthetic for their video: a combination of active and "homey" footage. An important goal for our clients was to have the video ready to be presented at an AARP event in mid-March, so the first stage of this project had to be completed by that deadline. The second stage was scheduling time to physically shoot the videos. This involved renting camera and sound equipment, coordinating with our clients and course participants, deciding on filming locations, and collecting the raw footage. Once we had shot all of the raw footage, the third stage comprised condensing, cleaning, and enhancing the raw footage to create a preliminary draft of the video. The video was delivered to the client, we received feedback, and we began revising the video to meet client specifications. The client will be able to use this video for advertising on the ADS website, as well as at different events where their services are promoted. The fourth stage, which we are currently working on, is to revise the initial version of the video based on client feedback. This involved sitting down with our client and gaining specific insight into what details they liked and what they wanted to have modified. After we acquired feedback, we were able to reshoot footage that was not preferable and take more shots of outdoor activities. The client also recommended that we prepare another, shorter video, approximately 90 seconds long, that could be used as a shorter promotion. This shorter video will likely be a condensed version of highlights from the 4-minute video.
The final version of the video incorporated footage from both stages of filming and reflected the client's desired changes. This version of the video was also shown to an applicable user pool of Memory Masterclass students, who gave us further feedback.
- AgInsuranceLLMs. Shi, Michael; Rajesh, Saketh; Truong, An; Hilgenberg, Kyle (2024-05-09). Our project is to develop a conversational assistant to aid users in understanding and choosing appropriate agricultural insurance policies. The assistant leverages a Large Language Model (LLM) trained on datasets from the Rainfall Index Insurance Standards Handbook and USDA site information. It is designed to provide clear, easily understood explanations and guidance, helping users navigate their insurance options. The project encompasses the development of an accessible chat interface, backend integration with a Flask API, and the deployment of the assistant on Virginia Tech's Endeavour cluster. Through personalized recommendations and visualizations, the assistant empowers users to make well-informed decisions regarding their insurance needs. Our project report and presentation outline the project's objectives, design, implementation, and lessons learned, highlighting the potential impact of this interactive conversational assistant in simplifying the complex process of selecting agricultural insurance policies.
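A chat interface backed by a Flask API, as described above, typically reduces to one JSON endpoint that forwards a user question to the model and returns the reply. The sketch below is a minimal, hypothetical version of that shape: the route name `/chat`, the payload keys, and the placeholder `answer_policy_question` function are illustrative, not the team's actual API.

```python
# Minimal sketch of a Flask chat endpoint (hypothetical names throughout).
from flask import Flask, jsonify, request

app = Flask(__name__)

def answer_policy_question(question: str) -> str:
    # Placeholder for the LLM call; a real deployment would query the
    # model hosted on the Endeavour cluster here.
    return f"Echo: {question}"

@app.route("/chat", methods=["POST"])
def chat():
    # The front-end chat UI POSTs {"question": "..."} and renders "answer".
    payload = request.get_json(force=True)
    reply = answer_policy_question(payload.get("question", ""))
    return jsonify({"answer": reply})
```

Keeping the model call behind a single function like this makes it easy to swap the placeholder for the real LLM client without touching the route or the front end.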
- AI Aided Annotation. Bishop, Jonah B. M.; David, Isaac; Lubana, Ishaandeep (Virginia Tech, 2022-05-11). Human annotation of long documents is a very important task in NLP training and evaluation. The process generally starts with the human annotators reading over the document in its entirety. Once the annotators feel they have a sufficient grasp of the document, they can begin to annotate it. Specifically, annotators will look for questions that can be answered, and then write down the question and answer. In our client's case, the chosen long documents are electronic theses and dissertations (ETDs), which are often 100-150 pages minimum, making annotation a time-consuming and expensive process. The ETDs are annotated on a chapter-by-chapter basis, as content can vary significantly in each chapter. The annotations generated are then used to help evaluate downstream tasks such as summarization, topic modeling, and question answering. The system aids the annotators in the creation of a Knowledge Base that is rich with topics/keywords and question-answer pairs for each chapter in ETDs. The core of the system revolves around an algorithm known as Maximal Marginal Relevance (MMR). By utilizing the MMR algorithm with a tunable lambda value, keywords, and a couple of other elements, we can identify sentences based on their similarity or diversity relative to a collection of sentences. This algorithm greatly enhances the annotation process for ETDs by automating the identification of the most relevant sentences. Thus, annotators do not have to sift through the ETDs one sentence at a time; instead, they get a comprehensive summary as fast as the MMR algorithm can produce one. As a result, annotators can save many hours per ETD, resulting in more human-generated annotations in a shorter amount of time.
The final deliverables are the project itself, a final slideshow presenting our work throughout the semester, a final report, and a video demonstrating exactly how to use our platform. All of this is available here on VTechWorks with this report. Additionally, the project is hosted on GitHub, making it free and available for the public to fork and modify in any way they see fit.
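The MMR trade-off described above (lambda weighting relevance against redundancy) can be shown with a toy sketch. Everything here assumes precomputed similarity scores; the team's system works over real sentence representations, so the names and numbers are illustrative only.

```python
# Toy Maximal Marginal Relevance (MMR) selection over precomputed scores.
def mmr_select(candidates, query_sim, pairwise_sim, lam=0.5, k=2):
    """Pick k items, trading off relevance (lam) against redundancy (1 - lam)."""
    selected, remaining = [], list(range(len(candidates)))
    while remaining and len(selected) < k:
        def mmr_score(i):
            # Redundancy = similarity to the closest already-selected item.
            redundancy = max((pairwise_sim[i][j] for j in selected), default=0.0)
            return lam * query_sim[i] - (1 - lam) * redundancy
        best = max(remaining, key=mmr_score)
        selected.append(best)
        remaining.remove(best)
    return [candidates[i] for i in selected]

sentences = ["s1", "s2", "s3"]
query_sim = [0.9, 0.8, 0.1]                    # relevance to the chapter topic
pairwise_sim = [[1.0, 0.95, 0.0],              # s2 is a near-duplicate of s1
                [0.95, 1.0, 0.0],
                [0.0, 0.0, 1.0]]
print(mmr_select(sentences, query_sim, pairwise_sim))  # → ['s1', 's3']
```

Note that although `s2` is more relevant than `s3`, MMR at lambda = 0.5 skips it because it would add almost nothing new; raising lambda toward 1.0 recovers a pure relevance ranking.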
- AI-Assisted Annotation of Medical Images. Dewan, Suha; Zhou, Daodao; Huynh, Long; Guo, Zipeng (Virginia Tech, 2022-12-15). In digital image processing and computer vision, image segmentation is the process of partitioning a digital image into multiple image segments. More precisely, it is the process of assigning a label to every pixel in an image so that pixels with the same label share certain characteristics. Image segmentation is an important step in almost any medical image study. Segments are used in images from microscopes that show us different types of cells, and these cells contain hundreds of organelles and macromolecular assemblies. Cell segmentation is the task of splitting a microscopic image domain into many segments that represent individual instances of cells; however, labeling these manually requires enormous time from domain experts, hence the need for AI-assisted annotation of medical images. Our project aids the annotators by letting them submit images quickly and easily through our web application and by performing predictions on those images.
- Airbnb Scraping. Yu, Wang; Huang, Baokun; Liu, Han; Pham, Vinh; Nikolov, Alexander (Virginia Tech, 2020-05-13). Inside Airbnb is a project by Murray Cox, a digital storyteller, who visualized Airbnb data that was scraped by author and coder Tom Slee. The website offers scraped Airbnb data for select cities around the world; historical data is also available. We were tasked with creating visualizations with listing data over Virginia and Austria to see what impact Airbnb was having on the communities in each respective region. We chose Virginia and Austria because our team was familiar with both regions, with parts of our team being familiar with Virginia and other parts being familiar with Austria. The eventual goal is to expand past analysis of these two regions, for example to the rest of the United States. Since July 2019, Tom Slee has abandoned the script used to collect data. To collect data on Virginia and Austria, we needed to update the script to collect more recent data. We began inspecting the script and found it was not collecting as much data as it once was. This was almost certainly due to Airbnb's website layout changing over time, a common occurrence for websites. After finding out how the script worked, we eventually identified the various problems with it and updated it for the new Airbnb website design. In doing so, we were able to get even more data than we thought possible, such as calendar and review data. From there, we were able to begin our data collection process. While fixing the script, our team was making mock visualizations to be displayed on a website for easy viewability. Once data collection was complete, the data was transferred over to be used for these mock visualizations. We visualized many things, such as how many listings a single host had, how many listings were in a given county, etc. The main visualization created shows where all the Airbnb listings are on a map.
We also made maps to visualize availability, prices, and the number of reviews. Further, we created pie charts and histograms to represent Superhosts, instantly bookable listings, and price distributions. We expect that in the future the script and the data collected and visualized will be used both by future CS students working on subsequent iterations of the project and by Dr. Zach himself, our client.
- Analyzing Microblog Feeds to Trade Stocks. Watts, Joseph; Anderson, Nick; Asbill, Connor; Mehr, Joseph (Virginia Tech, 2017-05-10). The goal of this project is to leverage microblogging data about the stock market to predict price trends and execute trades based on these predictions. Predicting the price trends of stocks with microblogging data involves a complex opinion aggregation model. For this, we built upon previous research, specifically a paper called "CrowdIQ" by a team including Virginia Tech faculty. This paper details a method of aggregating an accurate opinion by modeling judge reliability and interdependence. Once the overall sentiment of the judges was deduced, we built trading strategies that take this information into account to execute trades. The first step of the project was a sentiment analysis of posts on a microblogging site named StockTwits. These messages can contain a label indicating a bullish or bearish sentiment, which helps indicate a specific position to take on a given stock. However, most users choose not to use these labels on their posts. A classification of these unlabeled posts is required to autonomously utilize StockTwits to drive the proposed trading strategies. With a working sentiment analysis model, we implemented the opinion aggregation model described by CrowdIQ. This can gauge an accurate market sentiment for a particular stock based on the collection of sentiments received from users on StockTwits. The next step was the creation of a trading simulation platform, including a complete virtual portfolio management system and an API for retrieving historical and current stock data. These tools allow us to run quick and repeatable tests of our trading strategies on historical data. We can easily compare the performance of strategies by running them on the same historical data. After we had a viable testing environment set up, we implemented trading strategies.
This required research and analysis of other attempts at similar uses of microblogging data for predicting stock returns. The testing environment was focused on a set of stocks consistent with those used in CrowdIQ. The implementation of the CrowdIQ strategy served as a baseline against which we compared our results. Development of new trading strategies is an open-ended task that involved a process of trial and error. It is possible for a strategy to find success in 2014 but not perform as well in other years, because market climates can be fickle. To assess the dependence of our strategy's success on the market climate, we also tested against data for the year 2015 and compared the performance. The final deliverable is a viable trading simulation environment coupled with various trading strategies and an analysis of their performance in the years 2014 and 2015. The analysis of each strategy's performance indicated that our sentiment-based strategies perform better than the index in bullish markets like that of 2014, but, when they encounter a bear market, they typically make poor trading decisions that result in a loss of value.
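The reliability-weighted aggregation idea behind the strategies above can be sketched in a few lines. This is a deliberately simplified stand-in for the CrowdIQ-style model: the paper also models judge interdependence, which is omitted here, and the sentiment/reliability values are invented.

```python
# Toy reliability-weighted opinion aggregation (simplified; the CrowdIQ
# model described in the report also accounts for judge interdependence).
def aggregate_sentiment(opinions):
    """opinions: list of (sentiment, reliability) pairs, with sentiment
    in {-1, +1} (bearish/bullish) and reliability in [0, 1].
    Returns a weighted score in [-1, 1]; positive means net bullish."""
    total_weight = sum(r for _, r in opinions)
    if total_weight == 0:
        return 0.0
    return sum(s * r for s, r in opinions) / total_weight

# Two reliable bulls outweigh one unreliable bear.
score = aggregate_sentiment([(+1, 0.9), (+1, 0.6), (-1, 0.3)])
print(score)  # positive → a strategy might open a long position
```

A trading strategy built on top of this would compare the score against entry/exit thresholds rather than acting on every nonzero value.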
- Anti-Poaching Drone Control. Lyman, Matthew; Hudson, Matthew; Bishop, Cory (Virginia Tech, 2022-05-11). Our project assists the SeaQL Lab of Virginia Tech's Department of Fisheries and Wildlife Conservation. Working with the Marine Management Organisation of the UK, the Lab's project entails developing an autonomous drone swarm that can fly predetermined routes around the Chagos Archipelago and send alerts about potential poaching boats, based on machine learning image analysis in the drones' attached computing modules. The main goal of this project is to save the sharks and the ecosystem of those waters while decreasing the time, money, and effort required for the local Coast Guard to perform regular monitoring. Instead, the drones will send detection alerts to a remote server monitored by a ranger if they spot a potential poaching boat. Our report details our contributions to the overall project. Our team took responsibility for several smaller tasks integral to the overall project. First, we familiarized ourselves with the Robot Operating System (ROS) to connect, calibrate, test, and record video using the cameras provided. ROS will control much of the drones' added functionality, such as running the poaching boat detection algorithm, sending flight commands to the drones, and streaming video over a cellular connection. Next, we aided the larger project team in repairing one off-the-shelf drone for potential flight testing. After unsuccessful troubleshooting, we moved on to help finish construction of the primary hexacopter. Finally, we wrote a script to start the 4G cellular connection automatically when a drone is powered on. The AntiPoachingDroneControlReport details this work amidst the larger project goals of the SeaQL Lab. The AntiPoachingDroneControlPresentation gives a brief summary of our project work and the lessons learned. This was presented to our CS4624: Multimedia, Hypertext, and Information Access class to summarize our project work and experiences.
- AppTrackWildlifeDiseases. Ji, Shangzheng; Lyu, Jiarui; Vu, Justin; Zhang, Tenghui (Virginia Tech, 2021-05-07). Our project is to design a smartphone application and a website to report mange and other wildlife diseases in real time. Our free smartphone app is designed for both professionals (e.g., hunters) and non-professionals. The app provides a mini questionnaire to collect the users' familiarity with mange, captures photos of the wildlife species and potential disease, and records the geolocation and date of each photo. All information collected is then saved to Firebase and used by the website. Our website summarizes the collected data and images and displays them on a map. We submit the PDF and the PowerPoint of our final presentation, which covers the project introduction, design, timeline, work completed, iOS application, website, testing, future work, lessons learned, acknowledgements, and references. We also submit the PDF and the zipped Overleaf project dump of our final report, which covers the Executive Summary/Abstract, Introduction, Requirements, Design, Implementation, Testing/Evaluation/Assessment, User's Manual, Developer's Manual, Lessons Learned, and Acknowledgements.
- Artificial Immune System (AIS) Based Intrusion Detection System (IDS) for Smart Grid Advanced Metering Infrastructure (AMI) Networks. Song, Kevin; Kim, Paul; Tyagi, Vedant; Rajasekaran, Shivani (Virginia Tech, 2018-05-09). The Smart Grid is a large system consisting of many components that contribute to the bidirectional exchange of power. It is "smart" because vast amounts of data are transferred between the meter components and the control systems which manage the data. The scale of the smart grid is too large to micromanage. That is why smart grids must use Artificial Intelligence (AI) to be resilient and self-healing against the cyber-attacks that occur on a daily basis. Unlike traditional cyber defense methods, Artificial Immune System (AIS) principles have an advantage because they can detect attacks from inside the network and stop them before they do damage. The goal of the report is to provide a proof of concept that an AIS can be implemented on smart grid AMI (Advanced Metering Infrastructure) networks and, furthermore, can detect intrusions and anomalies in the network data. The report describes a proof-of-concept implementation of an AIS for intrusion detection using a synthetic packet capture (pcap) dataset containing common Internet protocols used in smart grid AMI networks. The report also aims to provide the necessary background for understanding the implementation in the later sections. The background section defines what a smart grid is and how its Advanced Metering Infrastructure (AMI) works, describing all three networks the AMI consists of. The Wide Area Network (WAN) is one of the three networks, and we scoped our project down to the WAN. The report goes on to discuss the current cyber threats as well as defense solutions related to the smart grid network infrastructure today.
One of the most widely used defense mechanisms is the Intrusion Detection System (IDS), which has many important techniques that can be used in the AIS-based IDS implementation of this report. The most commonly used AIS algorithms are defined. Specifically, the Negative Selection Algorithm (NSA) is used for our implementation. The NSA components used in the implementation section are thoroughly explained, and the AIS-based IDS framework is defined. A list of AIS uses and benefits in enterprise networks is presented, as well as research on current NSA use in AIS implementations. The latter portion of the report consists of the design and implementation. Due to data constraints and various other limitations, the team was not able to complete the initial implementation successfully. Therefore, a second implementation design was created, leading to the main implementation, which meets the project's objective. The implementation employs a proof-of-concept approach using a C# console application which performs all steps of an AIS on user-created network data. In conclusion, the second implementation has the ability to detect intrusions in a synthetic dataset of "man-made" network data. This shows that the AIS algorithm works and supports the expectation that, if the implementation were scaled up and used on real-time WAN network data, it would run successfully and help prevent attacks. The report also documents the limitations and problems one can run into when attempting to implement a solution of this scale. The ending sections of the report consist of the Requirements, Assessment, Assumptions, Results, and Lessons Learned, followed by the Acknowledgements to the MITRE Corporation, which helped immensely throughout the development of the report.
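The core negative-selection idea behind the implementation above is small enough to sketch. This toy version matches detectors against plain integers by equality; the report's C# implementation works over features of packet-capture data, so all names and values here are illustrative.

```python
# Highly simplified Negative Selection Algorithm (NSA) sketch.
import random

def generate_detectors(self_set, n_detectors, universe, rng):
    """Randomly draw candidates; keep only those matching no 'self' item
    (censoring), so surviving detectors cover the non-self space."""
    detectors = set()
    while len(detectors) < n_detectors:
        candidate = rng.choice(universe)
        if candidate not in self_set:      # candidate matches self → discard
            detectors.add(candidate)
    return detectors

def detect(sample, detectors):
    """Any sample matched by a detector is flagged as anomalous (non-self)."""
    return sample in detectors

rng = random.Random(42)
universe = list(range(100))                # all possible traffic signatures
self_set = set(range(50))                  # signatures seen in normal traffic
detectors = generate_detectors(self_set, 10, universe, rng)
print(sorted(detectors))                   # every detector lies outside self
```

The practical appeal of NSA is exactly what this sketch shows: the training phase only needs examples of normal traffic, yet the resulting detectors can flag previously unseen anomalies.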
- Assistive Voice Assistant. Satnur, Abishek Ajai; Bruner, Charles (2024-05-09). This project is an extension of work done in previous years on the sharkPulse website. sharkPulse was created in response to the escalating exploitation of shark species and the difficulty of classifying shark sightings. Due to sharks' low population dynamics, exploitation has only exacerbated the issue and made sharks the most endangered group of marine animals. sharkPulse retrieves sightings from several sources such as Flickr, Instagram, and user submissions to generate shark population data. The website utilizes WordPress, HTML, and CSS for the front end and R-Shiny, PostgreSQL, and PHP to connect the website to the back-end database. The team was tasked with improving the general usability of the site by integrating dynamic, data-informed visualizations. The major clients of the project are Assistant Professor Francesco Ferretti from the Virginia Tech Department of Fish and Wildlife Conservation and Graduate Research Assistant Jeremy Jenrette. The team established regular contact through Slack, scheduled weekly online meetings with both clients, and acquired access to all major code repositories and relevant databases. The team was tasked with creating dynamic and data-informed visualizations, general UI/UX improvements, and stretch goals for improving miscellaneous pages throughout the site. The team developed PHP scripts to model a variety of statistics by dynamically querying the database. These scripts were then sourced directly through the site via the Elementor WordPress module. All original requirements from the clients have been met, as well as some stretch goals established later in the semester.
The team created a Leaflet global network map of affiliate links, which dynamically sourced the sharkPulse social network groups from an Excel spreadsheet and generated country border markers and links to each country's social network sites, as well as a Taxonomic Accuracy Table for the Shark Detector AI. The team also created and distributed a survey form to collect user feedback on the general usability of the site; the responses were compiled and sent to the client for future work.
- ATinstagram. Jeshong, Tashi; Joseph, Zubin; Barden, Mason; Halstead, Nicholas; Cho, Steve (Virginia Tech, 2022-05-09). For this project, we wanted to discover if and how hikers use the social media platform Instagram to talk about Leave No Trace (LNT) principles on the Appalachian Trail. Leave No Trace principles refer to a set of guidelines that hikers should follow in order to promote conservation on trails. The workflow to complete the project included: collecting relevant Instagram posts, performing sentiment analysis on these posts, and finally creating a series of graphs that show the different connections between posts. We started by utilizing Python, JSON objects, and Selenium to gather all of the Instagram posts with specific hashtags, such as "#AppalachianTrail", "#LeaveNoTrace", and "#LNT". Selenium is used for the automated requests that retrieve the many Instagram posts. Information about each post, such as its geographic location, caption, and hashtags, is extracted from JSON objects. The final two parts of the project involve performing sentiment analysis on the collected posts and then visualizing the data in a variety of ways. For the sentiment analysis, we analyzed the caption of every post and assigned it a score ranging from negative one to positive one, where negative one represents a highly negative sentiment and positive one a highly positive sentiment. From there, we utilized the K-Means clustering algorithm to group posts with similar hashtags. For the visualizations, we displayed which tags occur in the same post, connections between different hashtags, and the geolocations of the different posts. The deliverables of our project include the source code used to scrape the Instagram posts, perform sentiment analysis, and visualize the data, along with several folders showing the results of our data collection. These results include the scraped Instagram posts, the sentiment analysis results, and the visualizations we created.
These deliverables could help our client and others interested in research relating to Instagram, Leave No Trace principles, and the Appalachian Trail.
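Caption scoring on the [-1, 1] scale described above can be illustrated with a tiny lexicon-based scorer. This is a sketch only: the team presumably used an NLP sentiment library, and the word lists here are invented for demonstration.

```python
# Toy lexicon-based sentiment scorer producing values in [-1, 1].
POSITIVE = {"beautiful", "love", "great", "clean"}
NEGATIVE = {"trash", "litter", "crowded", "bad"}

def sentiment(caption: str) -> float:
    """Average of +1/-1 hits over the sentiment-bearing words; 0.0 if none."""
    words = caption.lower().split()
    hits = [(w in POSITIVE) - (w in NEGATIVE) for w in words]
    scored = [h for h in hits if h != 0]
    if not scored:
        return 0.0
    return sum(scored) / len(scored)

print(sentiment("love this beautiful trail"))   # → 1.0
print(sentiment("trash everywhere bad day"))    # → -1.0
print(sentiment("great but crowded"))           # → 0.0 (mixed)
```

Scores like these, attached to each scraped post, are what a clustering step (such as the K-Means grouping mentioned above) and the visualizations can then consume.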
- Authoritative Venues. Youssef, Ali; Marku, Bella; Spicer, Tanner; Forst, Kyle (Virginia Tech, 2021). This submission details the progress made on the Authoritative Venues project. The goal of the Authoritative Venues project was to use machine learning algorithms to create a web application that can accurately recommend fitting ACM-related venues for Computer Science researchers trying to publish their work. By providing a ranked output list of publication venues related to a paper's topic, we help researchers make more informed decisions about where to submit their work for publication. Additionally, we provide insight into the data collection, virtual machine setup, and website hosting process that allowed this project to be easily accessible by anyone. This project is particularly useful for CS researchers wanting to gain insight into which ACM-related publication venue would best fit their paper. The recommender is hosted at authvenue.cs.vt.edu. On this website, there are two input fields that researchers can use to provide the title and abstract of their paper. Once this information is submitted, researchers receive recommendations specifically catered to their work.
- Autism Support PortalQuayum, Sib; Galliher, Ryan; Nagies, Kenneth; Ritchie, Ayumi (Virginia Tech, 2018-05-08)The Autism Support Portal project involves the creation of a portal site that helps users find the information they need about autism. The primary goal of the project is to help users quickly find credible information for their specific needs. With the amount of information available online, it can be hard for those interested in autism to find information that is not only credible but also useful and updated to reflect current research. The site needs to be easy to use both for users and for the future administrators of the site. The site also needs to guide people towards reliable resources while potentially exposing users to new ones. To ensure that our project meets the needs of our potential users, the project was divided into phases involving data collection, research, design, and implementation. To gather data, we used resources such as the Virginia Tech Center for Autism Research and their connections to send out anonymous surveys to some of our potential users. We asked several questions about their interests in the site, what they needed from it, and what resources were useful to them. This data allowed us to implement a site as specific to user needs as possible while also giving us additional sources of credible information. In addition, Dr. Scarpa provided many other resources that addressed some of the users’ needs directly, allowing this project to focus on the implementation of our search engine and on guiding users towards effective answers, solutions, and resources. Upon entering the site, users have direct access to the search and are provided with search tips and external resources to help them. The site is set up entirely using WordPress.org.
WordPress was chosen as the content management system (CMS) for the site because it is easy to use and allows administrators to manage much of the site without extensive technical knowledge. The site needs to be easy to modify after its initial setup so that those who work on it at the Virginia Tech Center for Autism Research can make changes quickly. However, relying solely on WordPress and its plugins created a variety of new obstacles stemming from the differing behaviors of different plugins. To save time and money, several plugins had to be researched to find ones that not only met the needs of the site but were also affordable. Even with these obstacles, using WordPress allows for easier creation and maintenance, as well as easy modification of the site if additional features are wanted or needed. The design of the site allows users to find necessary information quickly through alphabetically sorted lists that expose the user to terms that may previously have been unknown. One of the problems with researching autism is asking the right questions. For example, a child with a special need such as autism needs an individualized education program (IEP), which requires a search specifically for IEPs; when a user explores education information, the user also needs to be shown such specifics. This example also shows why the site must be easily modifiable, as a change in law or terminology would require someone to update the resource on the site. Using the data and implementation techniques discussed, the resulting portal is composed of help and resource pages as well as a refined search that links questions to reliable answers. In addition, the site is designed so that anyone without prior technical experience can use it and adjust the sites that are searched, along with any other information within the site that needs to change.
- Automated Crisis Collection Builder - Final Project ReportBrian Hays; Alex Zhang; Mitchel Rifae; Trevor Kappauf; Parsa Nikpour (2023-11-30)In the contemporary digital landscape, access to timely and relevant information during crisis events is crucial for effective decision-making and response coordination. This project addresses the need for a specialized web application equipped with a sophisticated crawler system to streamline the collection of pertinent information related to a user-specified crisis event. The inherent challenge lies in the vast and dynamic nature of online content, where identifying and extracting valuable data from a multitude of sources can be overwhelming. This project aims to empower users by allowing them to input a list of newline-delimited URLs associated with the crisis at hand. The embedded crawler software then systematically traverses these URLs, extracting additional outgoing links for further exploration. Afterwards, the contents of each outgoing URL are run through a predict function, which evaluates the relevance of each URL on a scale from 0 to 1. This scoring mechanism serves as a critical filter, ensuring that the collected web pages are not only related to the specified crisis event but also possess a significant degree of pertinence. We allow the user to set these thresholds, which enhances the efficiency of information retrieval by prioritizing content most likely to be valuable to the user's needs. Throughout the crawling process, our system tracks a range of statistics, including individual website domains, the origin of each child URL, and the average score assigned to each domain. To provide users with a comprehensive and visually intuitive experience, our user interface leverages React and D3 to display these statistics effectively. Moreover, to enhance user engagement and customization, our platform allows users to create individual accounts.
This feature not only provides a personalized experience but also grants users access to a historical record of every crawl they have executed. Users can also export or delete any of their previous crawls as they prefer. In terms of deliverables, our project provides fully developed code encompassing both frontend and backend components. Complementing this, we furnish comprehensive user and developer manuals, facilitating seamless continuity for future students or developers who may build upon our work. Additionally, our final deliverables include a detailed report and a presentation, serving the dual purpose of showcasing our team's progress across the project stages and providing insights into the functionalities and outcomes achieved.
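The user-set relevance threshold and the per-domain statistics described above can be sketched as follows; the function name and data shapes are assumptions for illustration, not the project’s actual code:

```python
from collections import defaultdict
from urllib.parse import urlparse

def filter_and_summarize(scored_urls, threshold):
    """Keep (url, score) pairs whose predict score meets the user-set
    threshold, and compute the average score per domain, mirroring the
    statistics the UI displays."""
    kept = [(url, score) for url, score in scored_urls if score >= threshold]
    by_domain = defaultdict(list)
    for url, score in scored_urls:
        by_domain[urlparse(url).netloc].append(score)
    averages = {domain: sum(s) / len(s) for domain, s in by_domain.items()}
    return kept, averages
```

In the real system the scores would come from the predict function run over each fetched page; here they are taken as given.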
- Automated ExercisesCunningham, Clayton; Mokuvos, Jacob; Li, Mingchi; Zhou, Shuhao; Liu, Shengwei (Virginia Tech, 2019-05-12)The goal of the Automated Exercises project is to create an automated assessment framework that allows instructors to build a number of different exercises for the Formal Languages course by uploading a JFLAP file to an OpenDSA textbook. The project will impact both instructors and students. Instructors will use it to build different exercises for students, eliminating the time and effort needed to grade these exercises manually. Students will use these exercises to practice more on different topics, as time is available. The final product must generate exercises, auto-grade them, and store students' answers and grades in an OpenDSA database. To complete the project, we needed to use basic web technologies such as HTML and JavaScript. We implemented generating, completing, and grading exercises for all the required topics, including NFA/DFA and PDA; however, the Turing Machine editor needs more work. The platform should be easy to manage and configure because the clients, Dr. Mostafa Mohammed and other instructors and students, could make heavy use of this software, so we aimed to make it as easy as possible. The UI of the platform is mostly designed, and we added more buttons and features to make it useful. However, aesthetics are not our focus, since only the instructor deals with our designed platform, whereas students do exercises on the OpenDSA platform. For data input, the instructor uploads a JFLAP file to the site, which is converted to a JSON file for the auto-grading system. The auto-grading system then takes the student's answers, checks their correctness in each case, and shows the result underneath the graph in a table.
The result of the test is stored in an object inside the exercise grader and shown on the screen as an alert containing attempts, test results, highest scores, and time consumed. The attached presentation and report give details of the project, including our milestones, objectives, methods, and lessons learned.
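The auto-grading idea above (an automaton converted from JFLAP to JSON, then checked case by case) can be sketched as follows; the JSON layout and test-case format are assumptions for illustration, not the project’s actual schema:

```python
def accepts(dfa: dict, s: str) -> bool:
    """Simulate a DFA stored in a JSON-style dict:
    {"start": ..., "accept": [...], "transitions": {state: {symbol: state}}}."""
    state = dfa["start"]
    for ch in s:
        state = dfa["transitions"].get(state, {}).get(ch)
        if state is None:  # no transition defined: reject
            return False
    return state in dfa["accept"]

def grade(dfa: dict, cases: list) -> float:
    """Fraction of (string, expected_accept) cases the automaton gets right."""
    correct = sum(accepts(dfa, s) == expected for s, expected in cases)
    return correct / len(cases)
```

In the OpenDSA setting the grader would run the student’s drawn automaton against instructor-supplied cases and report the score in the results table.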
- AWS Document RetrievalKim, Daniel; Durah, Fadi; Pleimling, Xavier; Le, Brandon; Hwang, Matthew (Virginia Tech, 2020-05-05)In the course CS5604 Information Retrieval, the class built a functioning search engine/information retrieval system on the Computer Science Container Cluster. The objective of the original project was to create a system that allows users to request Electronic Theses and Dissertations (ETDs) and Tobacco Settlement Documents using various fields through their queries. The objective of our project is to migrate this system onto Amazon Web Services (AWS) so that the system can be stood up independently of Virginia Tech’s infrastructure. AWS was chosen for its robustness. The system itself needs to be able to store the documents in an accessible way. This was accomplished by setting up a pipeline that streams data directly to the search engine using AWS S3 buckets; each of the two document types was placed in its own S3 bucket. We set up an RDS instance for login verification. This database stores user information as users sign up with the front-end application and is referenced when the application validates a user’s login attempt. The instance is publicly accessible and can connect to developer environments outside of the AWS group given the right endpoint and admin credentials. We worked with our client to set up an ElasticSearch instance to ingest the documents, along with communicating with and managing the health of the instance. This instance is accessible to all team members with permissions, and we are able to manually ingest data using cURL commands from the command line. Once the login verification database and ElasticSearch search engine were properly implemented, we had to connect both components to the front-end application where users could create accounts and search for desired documents.
After both were connected and all features were working properly, we used Docker to create a container for the front-end application. To migrate the front-end to AWS, we used the Elastic Container Registry (ECR) to push our front-end container image to AWS and store it in a registry. Then we used an ECS cluster running AWS Fargate, a serverless compute engine for containers, to deploy the front-end to the network for all users to access. Additionally, we implemented data streaming using AWS Lambda so that new entries can be automatically ingested into our ElasticSearch instance. We note that the system is not in a fully demonstrable state due to conflicts with the expected data fields; however, the infrastructure around the various components is established and would just need proper data to read. Overall, our team learned many aspects of standing up and building the infrastructure of the project on AWS, along with utilizing many different Amazon services. The new system serves as a functioning proof of concept offering a feasible alternative to relying on Virginia Tech’s system.
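The Lambda-based streaming step above can be sketched by the payload-building half of such a handler: turning an S3 object-created event into an ElasticSearch bulk-API body. This is an illustrative sketch, not the team’s code; a real handler would also fetch each object’s contents from S3 and POST the payload to the ElasticSearch endpoint.

```python
import json

def build_bulk_payload(s3_event: dict, index: str) -> str:
    """Convert an S3 'ObjectCreated' event into an Elasticsearch bulk-API
    payload (newline-delimited JSON action/source pairs). For brevity this
    indexes only bucket and key; real document bodies would be fetched
    from S3 inside the Lambda."""
    lines = []
    for record in s3_event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]
        lines.append(json.dumps({"index": {"_index": index, "_id": key}}))
        lines.append(json.dumps({"bucket": bucket, "key": key}))
    return "\n".join(lines) + "\n"  # bulk API requires a trailing newline
```

Wiring this into a Lambda triggered by the bucket’s `ObjectCreated` notification gives the automatic ingestion described in the abstract.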
- AWS Tobacco Settlement RetrievalSitaula, Anamol; Mekap, Abhinandan; Kanuri, Aditya; Bossart, Douglas; Pokharel, Nishan; Ray, Rahul (Virginia Tech, 2020-05-14)The Tobacco Industry is one of the largest and most influential industries. It has spent hundreds of millions of dollars on advertising and marketing tactics to ensure dominance and control in the economy. This is especially evident in tobacco settlement cases, where the enormous power and influence of the Tobacco Industry has allowed it to develop key strategies and tactics for trials and settlement cases over the past century. Our client, Dr. Townsend, is researching the tactics and inner workings of the Tobacco Industry over the past few decades to expose the marketing and legal strategies as well as the key players who have been influential in the Industry. Dr. Townsend is utilizing the “Truth Tobacco Industry Documents”, a library of documents created and facilitated by the UCSF Library for research purposes. Our project is meant to further enable researchers specializing in business, public health, law, or computer science, who will benefit from easier access to tobacco settlement related documents with enhanced search capabilities, extending the work of the Fall 2019 CS5604 Information Retrieval teams. We studied the 14 million tobacco related documents from UCSF. We improved upon the indexing of the roughly 8000 depositions to support line-wise as well as page-wise indexing. We modified and updated existing Python scripts to output the results in the required JSON format and then pushed the documents into ElasticSearch. Furthermore, we created another tobacco index and added another 3 million tobacco files to it. All testing and evaluation work was done using Python scripts. We used the existing Kibana tool for the visual representation of the data.
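The line-wise and page-wise indexing of depositions described above can be sketched as follows; the record fields and ID scheme are illustrative assumptions, not the project’s actual JSON format:

```python
def deposition_to_docs(doc_id: str, pages: list) -> list:
    """Emit one JSON-ready record per deposition line, carrying both the
    page number and line number so either granularity can be searched
    once the records are pushed into ElasticSearch."""
    docs = []
    for page_no, page_text in enumerate(pages, start=1):
        for line_no, line in enumerate(page_text.splitlines(), start=1):
            docs.append({
                "id": f"{doc_id}-p{page_no}-l{line_no}",
                "page": page_no,
                "line": line_no,
                "text": line,
            })
    return docs
```

Records shaped like this can be grouped back into pages at query time, which is what makes both line-wise and page-wise retrieval possible from one index.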
- Background Check for R4 OpSec, LLCHyres, Thomas; Tea, Zachary; Yang, Ted; Gray, Philippe; Springsteen, Timothy; Bierly, Alex (Virginia Tech, 2017-04-28)The main project deliverable was a website for R4 OpSec (r4opsec.com). The purpose of this website is to display information about the company’s services and to accept résumés for new hires. The company is owned by Joe Romagnoli and is based in Chantilly, VA. It works in the field of background investigation checks for the federal, state, and local government, as well as the civilian sector. The background investigation process starts with a company or a government agency reaching out to independent companies that handle an investigation of a new hire. A background investigation usually includes verifying identity, past employment, credit history, and criminal history. The process can take anywhere from a week to a month, depending on how quickly the company is able to verify a person’s information from what the person provides (i.e., proof of past education, W2 forms, date of birth, etc.). The website has a home landing page that displays images and text: one section explains what services the company provides, another displays a simple about-us description, and finally a button brings the user to another page to upload a résumé. There is also an admin login page where employees at R4 OpSec can view past submissions. An admin can download a résumé, delete the submission information, search past submissions, or mark submissions as “pending”, “accepted”, or “rejected”. The admin is also able to create new admin accounts, edit their email address, or change their password from the same screen. The client needed the website to be fully functional in about 90 days. The client did not have a basic design in mind, though he did provide a basic website that we could reference when thinking of designs for this one.
In November, the client purchased a one-year subscription from GoDaddy.com to host his website. We raised concerns we thought the client should know about regarding shared web hosting, which we discuss in the report (Section 3.2.5). Lastly, the client wanted to make sure that this project would be expandable, so that in the future other groups or employees of R4 OpSec would be able to build upon what we delivered.