Object Detection and Document Accessibility
Electronic Theses and Dissertations (ETDs) are the primary way that students and professors document and report their degree research. They let newcomers to a field see where earlier work left off and how to continue it. However, because most ETDs published online are PDFs, they are difficult to read effectively, especially for students with disabilities such as visual impairments. The goal of this project was to extend previous work on a Flask-based web application that transforms these long documents into HTML, which is more readable, user-friendly, and accessible than PDF. A second goal was to apply an algorithm to the bounding boxes returned by the object detection model so that each paragraph and each reference is placed in its own box, which correct XML generation on the website depends on.
To improve the user experience, we made several changes to the application's UI. Users can download a paper as PDF or XML. A sidebar on the left contains a dynamic table of contents that jumps to whichever section the user selects. A sidebar on the right displays the original PDF, so that errors in our conversion do not undermine the reader's understanding of the paper. We plan for future contributors to add a dark mode and a dyslexia-friendly font.
Many accessibility features can be delivered by improving the HTML/CSS/React UI, but the application also supports the use of a screen reader. Our project focuses on NVDA, a popular screen reader, so that users with visual impairments can listen to an ETD instead of reading it. We studied NVDA support thoroughly throughout the course of this project.
Finally, on the algorithms side, the focus was on post-processing the paragraph and reference bounding boxes returned by the object detection models so that each box contains exactly one paragraph or one reference. The models perform as well as their limited training allows, but they still make errors, for example returning a single box that spans more than one paragraph. This part of the project corrects those errors so that XML generation works reliably and the text is readable in the final application. The algorithms team developed a post-processing algorithm that works for around 90% of the paragraphs in the ETDs tested, but did not reach the references portion of the deliverable. That work is left for future collaborators.
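The post-processing step can be illustrated with a minimal sketch. It is not the team's actual algorithm, which is not described here. The sketch assumes that line-level boxes for the text inside one detected region are already available, for example from a PDF text-extraction or OCR step. It starts a new paragraph whenever the vertical gap before a line is much larger than the region's typical line gap, or when a line is indented relative to the region's left margin. The names `Box`, `union`, and `split_into_paragraphs` and the threshold values are hypothetical. The sketch also assumes a single-column region with y growing downward.

```python
from dataclasses import dataclass
from statistics import median


@dataclass(frozen=True)
class Box:
    """Hypothetical axis-aligned box in page coordinates (y grows downward)."""
    x0: float
    y0: float
    x1: float
    y1: float


def union(boxes):
    """Return the smallest box that encloses every box in `boxes`."""
    return Box(
        min(b.x0 for b in boxes),
        min(b.y0 for b in boxes),
        max(b.x1 for b in boxes),
        max(b.y1 for b in boxes),
    )


def split_into_paragraphs(lines, gap_factor=1.5, min_gap=2.0, indent_threshold=10.0):
    """Split the text lines of one detected box into one box per paragraph.

    Assumptions (hypothetical, to be tuned): `lines` are line-level Boxes from a
    single-column region; a paragraph break is signalled by a vertical gap larger
    than max(gap_factor * median gap, min_gap), or by a first-line indent larger
    than `indent_threshold` relative to the region's left margin.
    """
    if not lines:
        return []
    lines = sorted(lines, key=lambda b: b.y0)  # top-to-bottom reading order
    # Vertical whitespace between each pair of consecutive lines.
    gaps = [max(0.0, cur.y0 - prev.y1) for prev, cur in zip(lines, lines[1:])]
    # The median adapts the threshold to the document's font size and line spacing.
    threshold = max(gap_factor * median(gaps), min_gap) if gaps else min_gap
    left_margin = min(b.x0 for b in lines)

    paragraphs, current = [], [lines[0]]
    for gap, line in zip(gaps, lines[1:]):
        large_gap = gap > threshold
        indented = line.x0 - left_margin > indent_threshold
        if large_gap or indented:
            paragraphs.append(union(current))  # close the current paragraph
            current = [line]
        else:
            current.append(line)
    paragraphs.append(union(current))
    return paragraphs
```

Using the median gap rather than a fixed pixel value keeps the rule independent of page resolution and font size. References would likely need different cues, such as hanging indents or leading labels like "[12]". Separating them is the part of the deliverable that remains open.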