CS4624: Multimedia, Hypertext, and Information Access

This collection contains the final projects of the students in the course Computer Science 4624: Multimedia, Hypertext, and Information Access, at Virginia Tech. This course, taught by Professor Ed Fox, is part of the Human-Computer Interaction track, the Knowledge, Information, and Data track, and the Media/Creative Computing track. The curriculum introduces the architectures, concepts, data, hardware, methods, models, software, standards, structures, technologies, and issues involved with: networked multimedia (e.g., image, audio, video) information, access, and systems; hypertext and hypermedia; electronic publishing; and virtual reality. Coverage includes text processing, search, retrieval, browsing, time-based performance, synchronization, quality of service, video conferencing, and authoring.

Recent Submissions

Now showing 1 - 20 of 284
  • ScrapingGenAI
    James Do; Heewoon Bae; Julius Colby (2024-05-10)
    AI has been widely used for many years and has been a constant front-page news topic. The recent and rapid development of generative AI has inspired many conversations, from concerns to aspirations. Understanding how the topic develops and when people become more supportive of generative AI is critical for social scientists to pinpoint which developments inspire public discussions. The use of generative AI is relatively new, so the data and insights gathered could be used to determine whether use in a commercial setting (such as travel and hospitality) is viable and what the potential public feedback might look like. We developed two specialized web scrapers. The first targets specific keywords within Reddit subreddits to gauge public opinion, and the second extracts discussions from corporate earnings calls to capture the business perspective. The collected data were then processed and analyzed using Python libraries, with visualizations created in Matplotlib, Pandas, and Tkinter to depict trends through line charts, pie charts, and bar charts. We limited our analysis period to August 2022 through March 2024; this window is significant because ChatGPT was released in November 2022, allowing us to observe notable changes. These tools not only show changes in public interest and sentiment but also provide a graphical representation of shifts in the perception of AI technologies over time. The final product is designed for anyone interested in company transcripts and in comparing them to the public perspective. The product offers users access to detailed data representations, including numerical trends and visual summaries, to further understand the correlation between corporate and public sentiment. This comprehensive overview assists in understanding how public and corporate sentiments towards AI have shifted during a recent 20-month period. A significant hurdle was using the PRAW API for Reddit data scraping. Through review of documentation, tutorials, and additional support from a teaching assistant, we implemented the functionality needed to extract and process data from subreddits effectively. To make our findings more accessible and engaging, future work transforming this product into a fully functional website would be beneficial. This platform would make the insights more readily available to a wider audience, including the general public and industry stakeholders. Doing so could enhance the impact and usefulness of our project.
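    The report does not include the scraper code itself, but the following is a minimal sketch of the kind of keyword-targeted subreddit collection it describes, using the PRAW library. The credentials, subreddit name, keywords, and field selection here are illustrative assumptions, not the project's actual configuration.
    ```python
    import praw

    # Hypothetical credentials; the project's real configuration is not published.
    reddit = praw.Reddit(
        client_id="YOUR_CLIENT_ID",
        client_secret="YOUR_CLIENT_SECRET",
        user_agent="genai-sentiment-scraper",
    )

    KEYWORDS = {"generative ai", "chatgpt"}  # example keywords, not the project's exact list

    def collect_posts(subreddit_name, limit=500):
        """Collect submissions whose title or body mentions any keyword."""
        rows = []
        for post in reddit.subreddit(subreddit_name).new(limit=limit):
            text = f"{post.title} {post.selftext}".lower()
            if any(kw in text for kw in KEYWORDS):
                rows.append({
                    "id": post.id,
                    "created_utc": post.created_utc,
                    "title": post.title,
                    "score": post.score,
                    "num_comments": post.num_comments,
                })
        return rows

    posts = collect_posts("technology")  # subreddit name is an example only
    print(f"Collected {len(posts)} matching posts")
    ```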
  • Assistive Voice Assistant
    Satnur, Abishek Ajai; Bruner, Charles (2024-05-09)
  • SharkPulse App
    Hagood, Mia; Warner, Patrick; Tran, Anhtuan Vuong (2024-05-09)
    This project is an extension of work that has been done in previous years on the sharkPulse website. sharkPulse was created due to the escalating exploitation of shark species and the difficulty of classifying shark sightings. Due to sharks’ low population dynamics, exploitation has only exacerbated the issue and made sharks the most endangered group of marine animals. sharkPulse retrieves sightings from several sources such as Flickr, Instagram, and user submissions to generate shark population data. The website utilizes WordPress, HTML, and CSS for the front end and R-Shiny, PostgreSQL, and PHP to connect the website to the back-end database. The team was tasked with improving the general usability of the site by integrating dynamic data-informed visualizations. The major clients of the project are Assistant Professor Francesco Ferretti from the Virginia Tech Department of Fish and Wildlife Conservation and Graduate Research Assistant Jeremy Jenrette. The team established regular contact through Slack, scheduled weekly meetings online with both clients, and acquired access to all major code repositories and relevant databases. The team was tasked with creating dynamic and data-informed visualizations, general UI/UX improvements, and stretch goals for improving miscellaneous pages throughout the site. The team developed PHP scripts to model a variety of statistics by dynamically querying the database. These scripts were then sourced directly through the site via the Elementor WordPress module. All original requirements from the clients have been met as well as some stretch goals established later in the semester. The team created a Leaflet global network map of affiliate links, which dynamically sourced the sharkPulse social network groups from an Excel spreadsheet and generated country border markers and links to each country’s social network sites, as well as a Taxonomic Accuracy Table for the Shark Detector AI. The team also created and distributed a survey form to collect user feedback on the general usability of the site; the feedback was compiled and sent to the client for future work.
  • Case Studies Library
    O'Such, Joseph; Woody, Jonathan; Fields, Eliza; Jaldi, Hamza (2024-05-09)
    The purpose of this project is to create an online repository for the CS 3604 course at Virginia Tech. This course, Professionalism in Computing, has students complete case studies on various ethical issues. The issues range from historical Supreme Court cases to ongoing struggles. Each year nearly 300 such studies are conducted. A mechanism is needed to store these studies in a repository that is easy to navigate and search. Previous work attempted to use a preexisting digital library tool hosted on AWS to implement this repository. Over time, the CS 3604 copy became out of sync and out of date, leading to numerous issues. Initially, this group sought to overcome those issues and stay with the previous approach. After attempting to resolve those issues, the group met with a software engineer from the team supporting the original digital library platform. This resulted in a switch to a custom website, built from scratch, to host the CS 3604 repository. The new full-stack website used React.js, Express.js, Node.js, and MongoDB to accomplish this goal. Due to the late start, the group created a preliminary website architecture before breaking into tasks of frontend development, application development, backend work, and authentication. The new repository offers a user profile to each student in the capstone class that is accessed via a Microsoft login linked to their Virginia Tech account. Each user can upload a title, list of tags, and PDF document showcasing their case study. The rest of the site is publicly accessible and can be searched by title and tags. The search features are less sophisticated than those of the prior website. However, the new website has the advantages of user login, linking of case studies to users via login, and easier maintainability.
  • Language and Sentiment Analysis of Extremist Behavior in Online Game Communities
    McBride, Liam; Lanigan, Daniel; Neps, Renzo (2024-05-08)
    Language and Sentiment Analysis of Extremist Behavior in Online Game Communities was a Multimedia, Hypertext, and Information Access capstone project to assist the VT Gamer Lab in gathering more data to analyze links between online video game communities and extremist behavior. Specifically, military simulation games (referred to as milsims) were analyzed due to the inherently political and violent nature of the gameplay. The deliverables for the project were a community forum and YouTube scraper, cleaned data, visualizations, and sentiment analysis. We collected large datasets from both the community forums and YouTube, successfully cleaned the data, and performed analyses to create visualizations. Sentiment analysis was originally going to be conducted with the client but was delayed past the submission of our report, so we created our own analysis methods to produce visualizations. Two of these visualizations showed us that the potentially extremist language on both platforms is very similar, both in word choice and frequency. This suggests that the communities define the language more than the platform they exist on.
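    The word-choice and frequency comparison described above could be approximated with a simple token-counting pass over the two cleaned corpora. The sketch below is an assumption about one plausible approach, not the team's actual analysis code; the stopword list and placeholder documents are illustrative.
    ```python
    from collections import Counter
    import re

    STOPWORDS = {"the", "a", "and", "to", "of", "in", "is", "it", "for"}  # illustrative subset

    def word_frequencies(documents):
        """Count non-stopword tokens across a list of cleaned text documents."""
        counts = Counter()
        for doc in documents:
            tokens = re.findall(r"[a-z']+", doc.lower())
            counts.update(t for t in tokens if t not in STOPWORDS)
        return counts

    # Hypothetical placeholders for the cleaned forum and YouTube datasets.
    forum_texts = ["example forum post text"]
    youtube_texts = ["example youtube comment text"]

    forum_freq = word_frequencies(forum_texts)
    youtube_freq = word_frequencies(youtube_texts)

    # Compare the most common terms on each platform.
    print(forum_freq.most_common(20))
    print(youtube_freq.most_common(20))
    ```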
  • Knowledge Graph Building
    Hao, Qianxiang; Xing, Haoran (2024-05-09)
    Our team’s main objective was to expand the Virtuoso database by integrating a comprehensive dataset of 500,000 enriched Electronic Theses and Dissertations (ETDs). We built upon the preliminary framework of 200 XML records used for initial testing. This database expansion would enable the developers to deploy more robust testing and analysis of the current Knowledge Graph database. Additionally, our team focused on standardizing the data expansion process, ensuring that future developers have a consistent and reliable foundation for their work. The current Knowledge Graph was established with the Virtuoso graph database system. We primarily worked on four steps to expand the KG database: inserting Object IDs into each element in XML files, converting XML files to RDF triples, uploading RDF triples to the Virtuoso database, and URI resolution. We leveraged Python and its libraries (rdflib, SPARQLWrapper, requests, xmltodict, tkinter), along with Node.js, NPM, REST APIs, and Docker, to execute these steps. Initially, our team successfully tested the data expansion process on a local Virtuoso instance to ensure the functionality and correctness of the expanding procedure. We prepared to deploy the process on the Virtuoso database within the Endeavour cluster upon confirmation. Although we successfully expanded the database by 333 ETDs, we were unable to reach our target of 500,000 ETDs due to a shortage of XML data. This limitation led us to refocus our efforts on refining the data expansion process for better standardization and future scalability. We streamlined the data expansion process by integrating the Object ID insertion, data conversion, and data uploading processes into a single GUI application, creating a more straightforward and compact workflow. This visual interface would enhance usability for future developers and teams.
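    To make the XML-to-RDF-to-Virtuoso pipeline concrete, here is a minimal sketch using xmltodict, rdflib, and SPARQLWrapper. The namespace, the assumed <etd> root element, the field names, and the endpoint and graph URIs are all assumptions for illustration; the project's actual schema and Endeavour endpoint differ.
    ```python
    import xmltodict
    from rdflib import Graph, Literal, Namespace
    from SPARQLWrapper import SPARQLWrapper, POST

    ETD = Namespace("http://example.org/etd/")  # hypothetical namespace, not the project's

    def xml_to_triples(xml_path):
        """Convert one ETD XML record into an rdflib Graph of RDF triples."""
        with open(xml_path) as f:
            record = xmltodict.parse(f.read())["etd"]  # assumes an <etd> root element
        g = Graph()
        subject = ETD[record["object_id"]]            # assumes an object_id field
        g.add((subject, ETD.title, Literal(record["title"])))
        g.add((subject, ETD.author, Literal(record["author"])))
        return g

    def upload_to_virtuoso(graph,
                           endpoint="http://localhost:8890/sparql",
                           graph_uri="http://example.org/etd-kg"):
        """Insert the serialized triples into a named graph on a Virtuoso SPARQL endpoint."""
        triples = graph.serialize(format="nt")
        sparql = SPARQLWrapper(endpoint)
        sparql.setMethod(POST)
        sparql.setQuery(f"INSERT DATA {{ GRAPH <{graph_uri}> {{ {triples} }} }}")
        sparql.query()

    upload_to_virtuoso(xml_to_triples("etd_record.xml"))
    ```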
  • AgInsuranceLLMs
    Shi, Michael; Rajesh, Saketh; Truong, An; Hilgenberg, Kyle (2024-05-09)
    Our project develops a conversational assistant to aid users in understanding and choosing appropriate agricultural insurance policies. The assistant leverages a Large Language Model (LLM) trained on datasets from the Rainfall Index Insurance Standards Handbook and USDA site information. It is designed to provide clear, easily understood explanations and guidance, helping users navigate their insurance options. The project encompasses the development of an accessible chat interface, backend integration with a Flask API, and the deployment of the assistant on Virginia Tech's Endeavour cluster. Through personalized recommendations and visualizations, the assistant empowers users to make well-informed decisions regarding their insurance needs. Our project report and presentation outline the project's objectives, design, implementation, and lessons learned, highlighting the potential impact of this interactive conversational assistant in simplifying the complex process of selecting agricultural insurance policies.
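    A minimal sketch of the Flask API layer described above is shown below. The route name is an assumption, and the model call is a placeholder standing in for the project's actual LLM integration and Endeavour deployment.
    ```python
    from flask import Flask, jsonify, request

    app = Flask(__name__)

    def ask_policy_assistant(question: str) -> str:
        """Placeholder for the LLM call; the real assistant queries a model trained on
        the Rainfall Index Insurance Standards Handbook and USDA materials."""
        return f"(model answer for: {question})"

    @app.route("/chat", methods=["POST"])  # hypothetical route name
    def chat():
        question = request.get_json().get("message", "")
        answer = ask_policy_assistant(question)
        return jsonify({"answer": answer})

    if __name__ == "__main__":
        app.run(port=5000)
    ```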
  • Tweet Collections
    Kolakaleti, Sushen; D'Alessandro, Kevin; Narantsatsralt, Enk; Mruz, Ilya; Lam, Chris (Virginia Tech, 2024-05-09)
    For a series of Virginia Tech research projects related to Dr. Andrea Kavanaugh, more than six billion tweets from 2009 to 2024 were collected for research purposes. These tweets cover many topics, but primarily focus on trends and important events that occurred during the time period. These tweets were collected in three different formats: Social Feed Manager (SFM), yourTwapperKeeper (YTK), and Digital Methods Initiative Twitter Capture and Analysis Toolset (DMI-TCAT). The original focus of the project was to convert these tweets into a single format (JSON) to make tweet access easier and simplify the research process. The Fall 2021 team, consisting of Yash Bhargava, Daniel Burdisso, Pranav Dhakal, Anna Herms, and Kenneth Powell, was the first to take on this project and wrote the initial Python scripts used to convert the three tweet formats to JSON. They originally provided six different Python scripts, two for each of the three tweet formats: one for the individual schema and the other for the collection-level schema. However, large parts of these Python scripts were highly unoptimized and would take an unreasonably long time to run. Thus, the Spring 2022 team, consisting of Matt Gonley, Ryan Nicholas, Nicole Fitz, Griffin Knock, and Derek Bruce, took on the project, optimized a portion of the original Python scripts, and implemented a BERT-based machine learning model used to classify the tweets. They adjusted the scripts to better accommodate scale and were able to begin the tweet conversion process, getting through about 800 million of the roughly 6 billion tweets collected. Our team took over the project in Spring 2024 and began by writing additional automation scripts to simplify the process and reduce the amount of manual work required for the SFM conversion process. In addition to writing new scripts, our team updated some of the scripts written by the past team to better suit our needs. We exported 45 collections from the SFM machine and were able to convert 9,744,468 tweets from SFM. Regarding DMI-TCAT and YTK, the raw SQL files needed to be transferred to a new database in order to convert the remaining tweets. This process was begun for DMI-TCAT and YTK at the Digital Library Research Laboratory, located in room 2030 of Torgersen Hall, and will continue into Summer 2024. Regarding the machine learning aspects of the project, we implemented a new hate speech classifier, due to the prevalence of hate speech on the internet. We tested both a GloVe model and a BERT model with a Naive Bayes classifier before ultimately settling on the GloVe model, which was significantly faster while still providing enough accuracy to be useful.
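    The report does not give the classifier's implementation details, but one plausible form of the GloVe-plus-Naive-Bayes approach is sketched below: tweets are represented as averaged pre-trained GloVe vectors and fed to scikit-learn's GaussianNB. The GloVe file, labels, and example texts are assumptions for illustration, not the project's actual data or pipeline.
    ```python
    import numpy as np
    from sklearn.naive_bayes import GaussianNB

    def load_glove(path):
        """Load pre-trained GloVe vectors (word -> numpy array) from a .txt file."""
        vectors = {}
        with open(path, encoding="utf-8") as f:
            for line in f:
                parts = line.rstrip().split(" ")
                vectors[parts[0]] = np.asarray(parts[1:], dtype="float32")
        return vectors

    def embed(text, vectors, dim=100):
        """Represent a tweet as the average of its words' GloVe vectors."""
        words = [vectors[w] for w in text.lower().split() if w in vectors]
        return np.mean(words, axis=0) if words else np.zeros(dim)

    # Hypothetical labeled examples; the real training data came from the project's datasets.
    train_texts = ["example hateful tweet", "example benign tweet"]
    train_labels = [1, 0]

    glove = load_glove("glove.6B.100d.txt")  # standard pre-trained GloVe file, assumed available
    X = np.stack([embed(t, glove) for t in train_texts])

    clf = GaussianNB()
    clf.fit(X, train_labels)
    print(clf.predict([embed("another tweet to classify", glove)]))
    ```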
  • Chapter Classification and Summarization
    Jackson, Miles; Zhao, Yinhjie (2024-05-07)
    The US corpus of Electronic Theses and Dissertations (ETDs), partly captured in our research collection numbering over 500,000, is a valuable resource for education and research. Unfortunately, as the average length of these documents is around 100 pages, finding specific research information is not a simple task. Our project aims to tackle this issue by segmenting our sample of 500,000 ETDs and providing a web interface that summarizes individual chapters from the segmented sample. The first step of the project was to verify that the automatic segmentation process, performed in advance by our client, could be relied upon. This required each team member to analyze 50 segmented documents and verify their integrity by confirming that each chapter was correctly identified and separated into a PDF. During this process, we noted any peculiarities to identify recurring issues and improve the segmentation process. The rest of our time and effort went into creating an efficient web interface that would allow users to upload ETD chapters and view each chapter’s summary and classification results. We were able to complete a web interface that allows a user to upload an ETD chapter PDF from the sampled ETD database and view the summary of the PDF along with all of the metadata (author, title, publication date, etc.) of the associated ETD. Additionally, the group verified approximately 60 of the automatically segmented documents and detailed any errors or peculiarities thoroughly. Our group delivered the web interface as a GitHub repository and an Excel spreadsheet detailing the complete results of our segmentation verification process. The interface was designed to aid research on ETDs. Although this application won’t be available publicly, researchers may use it privately to assist with any ETD research projects they participate in. The web interface uses Streamlit, which is a Python framework for web development. This was the first time anyone in the group had used Streamlit, so we had to learn each feature that we used, which caused quite a few issues. However, quickly searching and accessing the metadata database, which was originally an Excel sheet with 500,000 entries, posed the biggest threat to the usability of our interface. Luckily, we were able to solve all issues with the help of API documentation, our client, Bipasha Banerjee, and our extremely helpful instructor, Professor Edward A. Fox. In terms of technical skills, we have learned how to operate a Streamlit web interface as well as how to use MySQL. However, we also learned a few life lessons. Firstly, do not use the first tool available when attempting to solve a problem. It is wise to take extra time to search for the best tool for a given situation instead of wasting time compensating for using the wrong tool. Secondly, life happens without regard and without warning, but the best move is to reanalyze the situation and push forward to complete the work that must be done.
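    The Streamlit-plus-MySQL combination described above could look roughly like the sketch below: a PDF uploader, a metadata lookup against a MySQL table, and a placeholder summary. The table and column names, connection parameters, and the summary step are assumptions; the project's actual interface and summarization model are not reproduced here.
    ```python
    import streamlit as st
    import mysql.connector
    from pypdf import PdfReader

    st.title("ETD Chapter Summarizer")  # hypothetical page title

    uploaded = st.file_uploader("Upload an ETD chapter PDF", type="pdf")
    if uploaded is not None:
        # Extract raw text from the uploaded chapter.
        text = "".join(page.extract_text() or "" for page in PdfReader(uploaded).pages)

        # Look up the parent ETD's metadata; connection details and schema are hypothetical.
        conn = mysql.connector.connect(host="localhost", user="etd",
                                       password="etd", database="etd_metadata")
        cur = conn.cursor(dictionary=True)
        etd_id = st.text_input("ETD ID")
        if etd_id:
            cur.execute("SELECT author, title, year FROM etds WHERE id = %s", (etd_id,))
            st.write(cur.fetchone())

        # Placeholder summary; the real interface calls the project's summarization model.
        st.subheader("Summary")
        st.write(text[:500] + "...")
    ```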
  • CTE Website
    Sheik, Smera; Alavala, Vaishnavi; Waheed, Aren; Strine, Corbin (Virginia Tech, 2024-05-08)
    The Computational Tissue Engineering program at Virginia Tech is an interdisciplinary program that allows graduate students to learn about the following fields: Tissue Engineering, Computational Science, and Molecular and Cell Biology. The vision of the CTE program is for students to feel better equipped in these disciplines and to act as trained professionals who can both develop and push the boundaries of these fields. The current CTE website was created a decade ago with a software system called Basecamp by our client Dr. Murali. Over the years, the CTE website became increasingly difficult to update due to newer releases and versions of Basecamp and PHP. Our goal for this project was to update the current CTE website to a modern framework that would allow for an easier-to-update interface. Our methodology to update the current CTE website started with choosing a web development system that would best fit the needs and requirements of the site. WordPress's capabilities and additional functionality led us to choose it as our web development system. Our methodology for updating Dr. Murali’s research website involved understanding the current layout of the site, researching the UI of other research websites, and recreating the research website in WordPress. The project deliverables involved evaluating the pros and cons of candidate web development systems in terms of easier maintenance and a refined layout, implementing a bare-bones CTE website, and implementing additional features, including building Dr. Murali’s research website. Throughout the project, group members worked together on the front-end and back-end aspects of the project, including researching plugins that best fit the needs of the website, building Figma wireframes, creating forms, migrating past CTE website pages to the new CTE website, and testing the functionality of both the CTE website and Dr. Murali’s research website. The final URL for the CTE website is https://wordpress.cs.vt.edu/cteigep/, and the final URL for Dr. Murali's personal website is https://wordpress.cs.vt.edu/tmmurali/research/.
  • Support BRANCH
    Akshath Majumder; Luke Marks; Nihar Satasia; Shreyas Sakhalkar (2024-05-06)
    Support BRANCH, formerly known as Mobile Parenting App, is the outcome of a project initially planned as a migration of the Treks mobile app to AWS. Financial challenges at Thrust Interactive led to the transfer of the Treks app's data to Virginia Tech. However, further challenges at Thrust Interactive rendered that project defunct, so Support BRANCH was created to pursue the same goals as Treks. Support BRANCH is a web application that helps parents manage behavioral disorders in children, including ASD, ADHD, ODD, and general behavior problems. Parents complete weekly educational adventures and earn badges upon completion. The technical setup includes a front-end TALL stack—Tailwind CSS, Alpine.js, Laravel, and Livewire—and a back-end LAMP stack composed of Linux, Apache, MySQL, and PHP. Laravel is used as the main PHP framework, with Laravel Blade for building web pages. The environment runs on Docker Compose.
  • NeuroVeTele
    Yang, Danny; Lopez, Miguel; Song, Puchuan (2024-05-06)
    The nervous system, a complex anatomical structure, is crucial for sensory perception in animals, including sight, hearing, smell, taste, touch, and pain. When issues arise with these senses in pets, veterinarians often use an examination called neurolocalization to pinpoint them to specific parts of a pet's nervous system. Understanding neurolocalizations is vital in veterinary medicine for diagnosing nervous system disorders in pets. To streamline this process, a new mobile and desktop application called NeuroVeTele was developed under the guidance of Dr. Richard Shinn, a neurology professor at Virginia Tech. This innovative tool assists veterinarians by using a weighted system to provide accurate neurolocalizations based on user inputs. NeuroVeTele features an easy-to-use front end for input selection and a back end that calculates the best neurolocalization based on these inputs. The front end and back end are connected through a Model-View-Controller (MVC) architecture to provide dynamic feedback to users. This application helps veterinarians transition to a paperless medium and aids in devising prognosis plans. The latest version includes a point system for behavior based on Dr. Shinn's research and expertise. The front-end has a basic user interface, and there is a beta version of a customizable point system for user adjustments.
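    The core weighted-scoring idea can be illustrated with a small sketch: sum the weights of the observed clinical signs for each candidate localization and rank the results. The sign names, regions, and weights below are purely illustrative placeholders, not Dr. Shinn's actual point system.
    ```python
    # Illustrative weights only: clinical signs (inner keys) vs. candidate localizations (outer keys).
    # The real application uses weights derived from Dr. Shinn's research, not these numbers.
    WEIGHTS = {
        "forebrain":  {"seizures": 3, "circling": 2, "head_tilt": 0},
        "cerebellum": {"seizures": 0, "circling": 1, "head_tilt": 1},
        "vestibular": {"seizures": 0, "circling": 2, "head_tilt": 3},
    }

    def rank_localizations(observed_signs):
        """Score each candidate localization by summing the weights of the observed signs."""
        scores = {
            region: sum(weights.get(sign, 0) for sign in observed_signs)
            for region, weights in WEIGHTS.items()
        }
        return sorted(scores.items(), key=lambda item: item[1], reverse=True)

    print(rank_localizations(["circling", "head_tilt"]))  # best match listed first
    ```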
  • Practice 10k Music App
    Bass, Rylen; Kim, Minwu; Lee, Johan; Ylagan, Jillian (Virginia Tech, 2024-05-07)
    Most musicians strive to practice their instrument every day, which warrants a comprehensive companion app. Such an application should allow users to log their practice sessions and keep them motivated, among other useful features. The Practice 10k app continues in development to bring these features to aspiring musicians. Thanks to the prior development team, users can currently use the app to create customizable profiles, log their practice sessions, and plan future practice sessions. Our team has continued development by adding a metronome feature, switching the backend service, and fixing database issues. Practice 10k will teach beginner, intermediate, and professional musicians new ways to practice and hold themselves accountable in their musical studies.
  • PromptLibrary
    Wang, Ziyan; Hoang, Brandon; Shin, Gabriel; Shin, Daniel (2024-05-06)
    The rapid advancement and integration of Large Language Models (LLMs) in academic research underscores the critical need for specialized, context-rich training datasets. The PromptLibrary project is designed to address this gap by establishing a comprehensive library of prompts tailored for academic libraries. This initiative aims to amass a wide array of instructional prompts, thereby forming an essential instruction dataset to enhance the utility and relevance of LLMs within scholarly domains. By encapsulating real-world, academic-specific inquiries and scenarios, this dataset is poised to significantly improve LLMs' learning capabilities and adaptability. The project features a web-based, searchable repository that allows for the submission and retrieval of high-quality prompts, ensuring a robust quality assurance mechanism for prompt validation. This repository not only serves as a critical resource for prompt tuning LLMs but also fosters a collaborative environment for librarians, educators, and researchers, thereby advancing the narrative and utility of LLMs in academic settings.
  • Episodic Future Thinking Chatbot
    Buxton, Tyler; Gomez, Aaron; Melini, Jared; Johnson, Patrick (Virginia Tech, 2024-05-09)
    In healthcare, addressing lifestyle diseases such as obesity and type 2 diabetes through innovative methods is crucial. This project introduces a user interface (UI) for an artificial intelligence (AI) chatbot designed to enhance episodic future thinking (EFT). EFT encourages patients to visualize future scenarios, aiding in decision-making processes that favor long-term health benefits over immediate pleasures. Utilizing AI, this chatbot aims to deliver accessible, high-quality care, helping users focus on significant health goals. Initially, the project intended to adapt an existing codebase for UI development. However, it became clear that creating a new UI from scratch would better meet our specific needs. This new interface includes four main pages: Login, Chatbot Interaction, Usability Assessment, and an About page. Each page is carefully crafted using React JavaScript to ensure dynamic user interactions, complemented by Cascading Style Sheets for aesthetic design. Python is used to facilitate connectivity with the client’s chatbot backend and to manage a database that stores user information. Users begin their journey on the website by logging in with credentials obtained after completing a demographic survey via Qualtrics, where they also consent to participate in the study. Subsequently, users interact with a GPT-4 powered chatbot, which guides them through personalized future-thinking scenarios based on their inputs. After the interaction, users assess the vividness of their imagined scenarios and the quality of their chat experience, rating their responses. This data, along with chat logs, is stored for analysis and further enhancements. Finally, participants provide detailed feedback through a structured numerical assessment, contributing to continuous improvement of the chatbot’s effectiveness. Through a meticulous design and iterative testing process, this project not only addresses critical health issues but also sets a new standard for integrating AI with user-centric healthcare solutions.
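    The abstract notes that Python connects the interface to the client's chatbot backend and manages a database of user information and chat logs. The sketch below is one plausible shape for that layer, with a placeholder relay function and a SQLite table standing in; the route name, schema, and storage choice are assumptions rather than the project's actual implementation.
    ```python
    import sqlite3
    from flask import Flask, jsonify, request

    app = Flask(__name__)
    db = sqlite3.connect("eft_study.db", check_same_thread=False)
    db.execute("""CREATE TABLE IF NOT EXISTS chat_logs
                  (user_id TEXT, message TEXT, reply TEXT, vividness INTEGER)""")

    def relay_to_chatbot(message: str) -> str:
        """Placeholder for the call to the client's GPT-4 chatbot backend."""
        return f"(chatbot reply to: {message})"

    @app.route("/chat", methods=["POST"])  # hypothetical route
    def chat():
        data = request.get_json()
        reply = relay_to_chatbot(data["message"])
        db.execute("INSERT INTO chat_logs VALUES (?, ?, ?, NULL)",
                   (data["user_id"], data["message"], reply))
        db.commit()
        return jsonify({"reply": reply})
    ```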
  • Parking Spaces Occupancy Prediction
    Across Virginia Tech’s campus, finding parking is consistently a source of frustration for students and faculty. During peak hours, locating free parking spots becomes a challenging task, leading to significant delays and increased traffic around campus. Leveraging modern data-driven technologies such as Smart City infrastructure and Intelligent Transportation, we can alleviate some of the school’s congestion and enhance the parking experience for Virginia Tech residents. The proposed solution is a web app that users can integrate into their daily commute. With the help of live data, the app will give real-time parking recommendations as well as various other helpful insights. It will analyze the live data at each of the garages to predict their occupancy at a given time of arrival. Machine learning will allow us to estimate the occupancy of each garage a given time into the future, depending on the distance to each garage, and provide a recommendation for which garage to target. The application will also allow for more effective collection of data for parking services and could eventually take into account more factors such as schedules and live traffic.
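    The occupancy-prediction-plus-recommendation idea could be prototyped as below with a regression model over time-of-arrival features. The feature set, model choice, training data, and garage names are illustrative assumptions; the real app would train on live counts from each garage.
    ```python
    import numpy as np
    from sklearn.ensemble import RandomForestRegressor

    # Hypothetical training rows: [hour_of_day, day_of_week, minutes_until_arrival] -> occupancy fraction.
    X_train = np.array([[9, 1, 15], [13, 3, 30], [17, 5, 10], [22, 6, 5]])
    y_train = np.array([0.85, 0.60, 0.95, 0.20])

    model = RandomForestRegressor(n_estimators=100, random_state=0)
    model.fit(X_train, y_train)

    def recommend_garage(arrival_features_by_garage):
        """Predict occupancy at arrival time for each garage and return the least full one."""
        predictions = {
            garage: float(model.predict([features])[0])
            for garage, features in arrival_features_by_garage.items()
        }
        return min(predictions, key=predictions.get), predictions

    best, preds = recommend_garage({
        "Perry Street": [9, 1, 12],   # garage names are examples only
        "North End": [9, 1, 20],
    })
    print(best, preds)
    ```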
  • Visualizing eTextbook Study Sessions
    Lily Chiang, Arjun Vellanki, Egor Lukiyanov, Tuan Chau, Kavya Polina (2023-11-30)
    OpenDSA is an online platform that allows professors to create eTextbooks for fundamental CS courses. Our project seeks to enhance OpenDSA by providing instructors with a user-friendly web interface and visualization tool. This tool allows them to understand student interactions during study sessions in three areas: Reading, Visualizations, and Exercises. The tool could lead to improvements in a student’s learning process. OpenDSA is heavily used at Virginia Tech and other universities for CS courses. It records student interactions with learning materials but doesn’t have an efficient way for instructors to understand these interactions. Our project tackles this issue by developing a web interface that visualizes student interactions. We expand upon past research by sorting interactions into Reading, Visualizations, and Exercises, displaying detailed study session data. These visualizations will give insight into whether students are active learners or credit-seekers.
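    One plausible way to sort interaction records into the three categories before visualizing them is sketched below with pandas. The event names and log schema are assumptions for illustration; OpenDSA's actual event types and the team's mapping may differ.
    ```python
    import pandas as pd

    # Hypothetical event log; the real data comes from OpenDSA's interaction records.
    events = pd.DataFrame({
        "student": ["a", "a", "b", "b"],
        "event_type": ["page_view", "av_step", "exercise_attempt", "page_view"],
        "timestamp": pd.to_datetime(["2023-09-01 10:00", "2023-09-01 10:05",
                                     "2023-09-01 11:00", "2023-09-01 11:20"]),
    })

    # Map low-level event types into the three study-session categories.
    CATEGORY = {"page_view": "Reading", "av_step": "Visualizations", "exercise_attempt": "Exercises"}
    events["category"] = events["event_type"].map(CATEGORY)

    # Counts of events per student per category are one input a dashboard could visualize.
    summary = events.groupby(["student", "category"]).size().unstack(fill_value=0)
    print(summary)
    ```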
  • Mass Shooting Digital Library
    In light of the escalating prevalence of mass shootings in the U.S., there is an urgent need for a structured digital repository to centralize, categorize, and offer detailed analyses of these events. This project aims to develop a comprehensive website functioning as a digital library. This library will house mass shooting objects, where each object represents a specific mass shooting event and elaborates on who, what, when, where, why, and how. The website's central features will include the ability to visualize and compare various mass shooting incidents, facilitating a broader understanding of trends, patterns, and anomalies. Users will be able to explore the data via geographic visualizations, timelines, and more, providing an immersive and informative experience. Underpinning the platform, our backend system will utilize Python, Flask, and MongoDB, ensuring robust data collection and management. This data includes information fields, URL sources associated with each event, and more. On the front end, technologies like NextJS, React, and JavaScript will drive the user interface, supported by essential libraries such as React Chrono and Leaflet.js for advanced visualization. Deployment will be executed via Firebase or AWS for the frontend and Heroku for the backend. Two primary user categories have been identified: general users, who can view the data, and administrators, who can modify the contents. To ensure the integrity of the data input, admin access will be safeguarded by authentication processes. In summary, this digital library emerges as a timely and crucial initiative in response to the rising tide of mass shootings in the U.S. This project aims to provide comprehensive insights into the tragic events that have marked the nation. Beyond its functional capabilities, the digital library strives to improve understanding, awareness, and ultimately, change in the narrative surrounding mass shootings.
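    A minimal sketch of the Flask-plus-MongoDB backend for serving and adding mass shooting objects is shown below. The database, collection, and field names are assumptions, and the real system's admin authentication is deliberately omitted from this sketch.
    ```python
    from flask import Flask, jsonify, request
    from pymongo import MongoClient

    app = Flask(__name__)
    # Database and collection names are hypothetical.
    events = MongoClient("mongodb://localhost:27017")["mass_shooting_library"]["events"]

    @app.route("/events", methods=["GET"])
    def list_events():
        """Return all mass shooting objects, optionally filtered by state."""
        query = {}
        if "state" in request.args:
            query["where.state"] = request.args["state"]  # assumed nested field layout
        return jsonify(list(events.find(query, {"_id": 0})))

    @app.route("/events", methods=["POST"])
    def add_event():
        """Admin-only in the real system; authentication is omitted from this sketch."""
        events.insert_one(request.get_json())
        return jsonify({"status": "created"}), 201
    ```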
  • Automated Students' short answers assessment
    Padath, Mathew; Wang, Wenmiao; Jiang, Westin; McGovern, Ryan; Wan, Yifei (2023-11-26)
    The objective of this innovative project was to create an automated web application for the assessment and scoring of computer science-related short answers. This solution directly addresses the often labor-intensive and time-consuming process of manually grading written responses, a challenge that educators across various academic disciplines frequently encounter. The developed web application stands out not just for its efficiency but also for its versatility, being applicable to a wide range of subjects beyond computer science, provided that appropriate teacher answer files are supplied. At the heart of the application lies a user-friendly interface created using ReactJS. This frontend allows educators to seamlessly upload 'teacher' and 'student' files in .tsv format. Following the upload, the application's backend, developed using Flask, takes over. It processes these submissions by comparing student responses against predefined model answers. The scoring mechanism of the application is particularly noteworthy. It employs an advanced semantic analysis approach, utilizing a pre-existing deep learning model, RoBERTa Large. This model is integral to the AutoGrader class, which is responsible for the semantic evaluation of the text. The grading logic embedded within the AutoGrader class is both innovative and sophisticated. It assesses student responses by breaking them down into phrases and then computing the semantic similarity between each phrase and the concepts outlined in the model answers. The process employs SentenceTransformer to generate text embeddings, allowing for a nuanced evaluation based on cosine similarity between vector representations. This method ensures a grading system that transcends simple keyword matching, delving into the semantic content and understanding of the student answers. The application boasts several key features that enhance user experience and provide educators with comprehensive insights into student performance. These include the ability to display scores and grades directly in the web application and to download detailed Grade Reports that include each question, the student's response, the grade awarded, and the model answer. Additionally, the application allows for the viewing of previous submissions and the downloading of historical documents such as past versions of the 'teacher' file, 'student' file, and grade reports. In terms of future development, the project team has outlined several ambitious goals. These include implementing a dataset-driven strategy for enhancing the training of deep learning models, thereby significantly advancing the current framework. Another focus will be on allowing a variety of file types to be uploaded for both teacher and student files, thereby increasing the accessibility and usability of the system. Lastly, there are plans to update the functionality and appearance of the web application, incorporating features such as scrolling, standardized formatting, and improved design elements to enhance the overall user experience. The project was developed with the invaluable guidance and support of Dr. Mohamed Farag, a research associate at the Center for Sustainable Mobility at Virginia Tech. Dr. Farag's expertise in computer science and his commitment to educational innovation have been instrumental in steering the project towards success. In conclusion, this project marks a significant advancement in the field of educational technology, particularly in the realm of academic grading. By leveraging the power of artificial intelligence and modern web technologies, it provides an efficient, reliable, and versatile tool for educators, streamlining the grading process and offering a scalable solution adaptable to various academic contexts. The future developments outlined promise to further enhance the capabilities of this already impressive tool, pointing towards a new era in academic assessment.
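    The phrase-level cosine-similarity grading idea can be illustrated with the sentence-transformers library, as in the sketch below. The substitute model, the naive sentence-based phrase split, the threshold, and the scoring rule are assumptions for illustration; the actual AutoGrader class uses a RoBERTa Large model and its own grading logic.
    ```python
    from sentence_transformers import SentenceTransformer, util

    # A general-purpose model stands in here; the project used a RoBERTa Large variant.
    model = SentenceTransformer("all-MiniLM-L6-v2")

    def grade_answer(student_answer, model_concepts, threshold=0.6):
        """Score a student answer as the fraction of model-answer concepts it covers semantically."""
        phrases = [p.strip() for p in student_answer.split(".") if p.strip()]  # naive phrase split
        phrase_emb = model.encode(phrases, convert_to_tensor=True)
        concept_emb = model.encode(model_concepts, convert_to_tensor=True)
        similarity = util.cos_sim(concept_emb, phrase_emb)        # concepts x phrases matrix
        covered = (similarity.max(dim=1).values >= threshold).sum().item()
        return covered / len(model_concepts)

    score = grade_answer(
        "A stack is last-in first-out. Push adds to the top.",
        ["stack follows LIFO order", "push inserts an element at the top"],
    )
    print(f"score: {score:.2f}")
    ```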
  • Automated Crisis Collection Builder - Final Project Report
    Brian Hays; Alex Zhang; Mitchel Rifae; Trevor Kappauf; Parsa Nikpour (2023-11-30)
    In the contemporary digital landscape, access to timely and relevant information during crisis events is crucial for effective decision-making and response coordination. This project addresses the need for a specialized web application equipped with a sophisticated crawler system to streamline the process of collecting pertinent information related to a user-specified crisis event. The inherent challenge lies in the vast and dynamic nature of online content, where identifying and extracting valuable data from a multitude of sources can be overwhelming. This project aims to empower users by allowing them to input a list of newline-delimited URLs associated with the crisis at hand. The embedded crawler software then systematically traverses these URLs, extracting additional outgoing links for further exploration. Afterwards, the contents of each outgoing URL are run through a predict function, which evaluates the relevance of each URL based on a scoring system ranging from 0 to 1. This scoring mechanism serves as a critical filter, ensuring that the collected web pages are not only related to the specified crisis event but also possess a significant degree of pertinence. We allow the user to set these thresholds, which enhances the efficiency of information retrieval by prioritizing content most likely to be valuable to the user's needs. Throughout the crawling process, our system tracks a range of statistics, including individual website domains, the origin of each child URL, and the average score assigned to each domain. To provide users with a comprehensive and visually intuitive experience, our user interface leverages React and D3 to display these statistics effectively. Moreover, to enhance user engagement and customization, our platform allows users to create individual accounts. This feature not only provides a personalized experience but also grants users access to a historical record of every crawl they have executed. Users are further empowered with the ability to effortlessly export or delete any of their previous crawls based on their preferences. In terms of deliverables, our project commits to providing fully developed code encompassing both frontend and backend components. Complementing this, we will furnish comprehensive user and developer manuals, facilitating seamless continuity for future students or developers who may build upon our work. Additionally, our final deliverables include a detailed report and a compelling presentation, serving the dual purpose of showcasing our team's progress across various project stages and providing insights into the functionalities and outcomes achieved.
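    The crawl-score-filter loop described above is sketched below with requests and BeautifulSoup. The keyword-count predict function is only a placeholder for the project's trained relevance model, and the example keywords, threshold, and seed URL are assumptions for illustration.
    ```python
    import requests
    from bs4 import BeautifulSoup
    from urllib.parse import urljoin

    def predict(text: str) -> float:
        """Placeholder relevance score in [0, 1]; the real system uses a trained model."""
        keywords = ["flood", "evacuation", "emergency"]  # example crisis terms
        return min(1.0, sum(text.lower().count(k) for k in keywords) / 10)

    def crawl(seed_urls, threshold=0.5):
        """Visit seed URLs, extract outgoing links, and keep pages scoring above the threshold."""
        kept = []
        for url in seed_urls:
            page = requests.get(url, timeout=10)
            soup = BeautifulSoup(page.text, "html.parser")
            for link in soup.find_all("a", href=True):
                child = urljoin(url, link["href"])
                try:
                    child_text = requests.get(child, timeout=10).text
                except requests.RequestException:
                    continue
                score = predict(BeautifulSoup(child_text, "html.parser").get_text())
                if score >= threshold:
                    kept.append((child, score))
        return kept

    print(crawl(["https://example.com/crisis-article"]))
    ```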