Browsing by Author "Hicks, Megan"
- Covid19 Data Webpage Design for Montgomery County, VA School System
  Hicks, Megan (2021-12-08)
  The product was inspired by the current state of the pandemic. This school year, children and families emerged from lockdown to return to the classroom. With in-person instruction resuming, it was important to track COVID-19 cases in schools to help keep children healthy. The information available on the Montgomery County School website was difficult to interpret and did not let users view data over longer periods of time or for specific schools within the district. The goal of the project was to provide visualizations of several views of COVID-19 data in the Montgomery County School District. These visualizations give overviews of data for the whole district: the current week's total number of cases, and a time series of case levels in schools (both district-wide and by geographic location) from the beginning of the school year to the present. The project also provides visualizations that let the user select specific schools and/or school levels to view. These visualizations were embedded in a webpage so that anyone interested in the data can access them.
- CS 5604 2020: Information Storage and Retrieval TWT - Tweet Collection Management Team
  Baadkar, Hitesh; Chimote, Pranav; Hicks, Megan; Juneja, Ikjot; Kusuma, Manisha; Mehta, Ujjval; Patil, Akash; Sharma, Irith (Virginia Tech, 2020-12-16)
  The Tweet Collection Management (TWT) Team aims to ingest 5 billion tweets, clean this data, analyze the metadata present, extract key information, classify tweets into categories, and finally index these tweets into Elasticsearch for browsing and querying. The main deliverable of this project is a running software application for searching tweets and for viewing Twitter collections from Digital Library Research Laboratory (DLRL) event archive projects. As a starting point, we focused on two development goals: (1) hashtag-based and (2) username-based search for tweets. For IR1, we completed extraction of two fields within our sample collection: hashtags and username. Sample code for TwiRole, a user-classification program, was investigated for use in our project. We were able to sample from multiple collections of tweets, spanning topics such as COVID-19 and hurricanes. Initial work used a sample collection provided via Google Drive; NFS-based persistent storage was later added to allow access to larger collections. In total, we have developed nine services to extract key information such as username, hashtags, geo-location, and keywords from tweets. We have also developed services for parsing and cleaning raw API data and for backing up data in an Apache Parquet filestore. All services are Dockerized and added to the GitLab Container Registry. The services are deployed in the CS cloud cluster to integrate them into the full search engine workflow. A service was also created to convert WARC files to JSON so that archive files can be read into the application. Unit testing of the services is complete, and end-to-end tests have been conducted to improve system robustness and avoid failures during deployment.
  The TWT team has indexed 3,200 tweets into the Elasticsearch index. Future work could involve parallelizing metadata extraction, an alternative feature-flag approach, advanced geo-location inference, and adoption of the DMI-TCAT format. Key deliverables include a body of data that supports searching, sorting, filtering, and visualizing raw tweet collections and their metadata analysis; a running software application for searching tweets and for viewing Twitter collections from DLRL event archive projects; and a user guide to assist those using the system.
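The TWT abstract describes services that parse and clean raw API data and extract the username and hashtags that drive the two search modes. The report summary does not show how those services are written, so the following is only a minimal sketch under stated assumptions: tweets arrive as newline-delimited JSON in the Twitter API v1.1 tweet format (`user.screen_name`, `entities.hashtags`, and `extended_tweet` for long tweets). The function names `extract_fields` and `parse_lines`, and the choice to lowercase hashtags, are hypothetical illustrations, not the team's code.

```python
import json
from typing import Iterable, Iterator


def extract_fields(tweet: dict) -> dict:
    """Pull the tweet id, username, and hashtags out of one v1.1-style tweet object.

    Assumption: field names follow the Twitter API v1.1 tweet format.
    """
    user = tweet.get("user") or {}
    # Long tweets keep their full entities under extended_tweet; fall back to top level.
    entities = (
        (tweet.get("extended_tweet") or {}).get("entities")
        or tweet.get("entities")
        or {}
    )
    # Hypothetical normalization: lowercase hashtags so #COVID19 and #covid19 match.
    hashtags = [h["text"].lower() for h in entities.get("hashtags", [])]
    return {
        "id": tweet.get("id_str"),
        "username": user.get("screen_name"),
        "hashtags": hashtags,
    }


def parse_lines(lines: Iterable[str]) -> Iterator[dict]:
    """Parse newline-delimited JSON tweets, skipping records that cannot be used."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # blank line
        try:
            tweet = json.loads(line)
        except json.JSONDecodeError:
            continue  # cleaning step: drop records that are not valid JSON
        # Skip non-object values and stream deletion notices, which carry no tweet text.
        if not isinstance(tweet, dict) or "delete" in tweet:
            continue
        yield extract_fields(tweet)


# Usage example with two small hand-written records and one malformed line.
raw = [
    '{"id_str": "1", "user": {"screen_name": "alice"}, '
    '"entities": {"hashtags": [{"text": "COVID19"}]}}',
    "not json",
    '{"id_str": "2", "user": {"screen_name": "bob"}, "entities": {"hashtags": []}}',
]
docs = list(parse_lines(raw))
```

Each resulting dictionary is a flat document keyed by `username` and `hashtags`, which is the kind of record that could then be indexed into Elasticsearch to support hashtag-based and username-based search; the indexing step itself is not shown here.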