Automated Crisis Collection Builder - Final Project Report

Abstract

During a crisis event, timely access to relevant information is crucial for effective decision-making and response coordination. This project provides a web application with a built-in crawler that streamlines the collection of web pages related to a user-specified crisis event. The challenge lies in the vast and dynamic nature of online content, where identifying and extracting valuable data from many sources can be overwhelming. Users supply a list of newline-delimited URLs associated with the crisis. The crawler visits each of these URLs and extracts its outgoing links for further exploration. The content of each outgoing URL is then passed to a predict function, which scores its relevance to the crisis on a scale from 0 to 1. This score acts as a filter, ensuring that the collected web pages are not only related to the specified crisis event but also highly pertinent to it. The user can set the score thresholds used for filtering, which makes retrieval more efficient by prioritizing the content most likely to be valuable to the user's needs. Throughout the crawl, the system tracks a range of statistics, including individual website domains, the origin of each child URL, and the average score assigned to each domain. The user interface uses React and D3 to display these statistics visually. The platform also lets users create individual accounts, which gives them a personalized experience and access to a historical record of every crawl they have executed.
Users can also export or delete any of their previous crawls. Our deliverables include fully developed code for both the frontend and backend components, along with user and developer manuals so that future students or developers can build upon our work. The final deliverables also include a detailed report and a presentation, which document our team's progress across the project stages and describe the functionality and outcomes achieved.
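
The crawl loop described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the names `parse_seed_urls`, `outgoing_links`, and `crawl` are hypothetical, and `fetch` and `predict` are passed in as placeholders for the real page downloader and relevance model. It also assumes a single threshold, a single level of link-following, and that the per-domain average covers every scored child URL rather than only the kept ones.

```python
from collections import defaultdict
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkExtractor(HTMLParser):
    """Collects href values from anchor tags."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def parse_seed_urls(text):
    """Split newline-delimited user input into a list of non-empty URLs."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def outgoing_links(base_url, html):
    """Return absolute http(s) links found in the page at base_url."""
    parser = LinkExtractor()
    parser.feed(html)
    absolute = (urljoin(base_url, href) for href in parser.links)
    return [u for u in absolute if urlparse(u).scheme in ("http", "https")]


def crawl(seed_text, fetch, predict, threshold=0.5):
    """One level of crawling: seeds -> child links -> scored, filtered pages.

    fetch(url) -> str and predict(text) -> float in [0, 1] are supplied by
    the caller; both are stand-ins (assumptions) for the real components.
    """
    kept = []                          # (child_url, parent_url, score)
    domain_scores = defaultdict(list)  # domain -> scores of all scored children
    seen = set()
    for parent in parse_seed_urls(seed_text):
        for child in outgoing_links(parent, fetch(parent)):
            if child in seen:
                continue  # score each child URL only once
            seen.add(child)
            score = predict(fetch(child))
            domain_scores[urlparse(child).netloc].append(score)
            if score >= threshold:
                kept.append((child, parent, score))
    averages = {d: sum(s) / len(s) for d, s in domain_scores.items()}
    return kept, averages
```

Passing `fetch` and `predict` as arguments keeps the sketch testable without network access; a dictionary lookup and a simple keyword counter are enough to exercise it.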

Description

This submission contains the complete final project deliverables for our CS4624 capstone project. You will find our Final Report, Final Presentation, and Source Code in the accompanying files.

Keywords

Fullstack Application, Flask, React, JavaScript, Dockerfile, Capstone, Python, Final Report, Final Presentation, CS4624, Multimedia, Hypertext, Information Access

Citation