AI Aided Annotation

dc.contributor.author: Bishop, Jonah B. M. [en]
dc.contributor.author: David, Isaac [en]
dc.contributor.author: Lubana, Ishaandeep [en]
dc.date.accessioned: 2022-05-13T14:40:21Z [en]
dc.date.available: 2022-05-13T14:40:21Z [en]
dc.date.issued: 2022-05-11 [en]
dc.description.abstract: Human annotation of long documents is an important task for training and evaluation in NLP. The process generally starts with the annotators reading the document in its entirety. Once an annotator feels they have a sufficient grasp of the document, they can begin to annotate it. Specifically, annotators look for questions that the text can answer, and then write down each question and its answer. In our client’s case, the chosen long documents are electronic theses and dissertations (ETDs), which are often at least 100-150 pages long, making annotation a time-consuming and expensive process. The ETDs are annotated chapter by chapter, as content can vary significantly between chapters. The resulting annotations are then used to help evaluate downstream tasks such as summarization, topic modeling, and question answering. Our system aids the annotators in creating a Knowledge Base that is rich with topics/keywords and question-answer pairs for each chapter of an ETD. The core of the system is an algorithm known as Maximal Marginal Relevance (MMR). By using the MMR algorithm with an adjustable lambda value, keywords, and a couple of other elements, we can identify sentences based on their similarity or diversity relative to a collection of sentences. This algorithm greatly enhances the annotation process for ETDs by automating the identification of the most relevant sentences. Thus, annotators do not have to sift through an ETD one sentence at a time; instead, they can assemble a comprehensive summary as fast as the MMR algorithm can run. As a result, annotators can save many hours per ETD, yielding more human-generated annotations in less time. The final deliverables are the project, a final slideshow presenting our work throughout the semester, a final report, and a video demonstrating exactly how to use our platform. All of these are available here on VTechWorks. Additionally, the project is developed on GitHub, making it free and available to the public to fork and modify in any way they see fit. [en]
dc.description.notes: The deliverables available for download include a copy of the presentation in both .pdf and .pptx form, a copy of the report in both .pdf and .docx form, and an MP4 of the video walkthrough. Another version of the video is online at: https://www.youtube.com/watch?v=C5N5sKn4a2E. The GitHub repository, available for use and modification, is at: https://github.com/jonahbishop/AI-Aided-Annotation. [en]
dc.identifier.uri: http://hdl.handle.net/10919/110085 [en]
dc.language.iso: en_US [en]
dc.publisher: Virginia Tech [en]
dc.rights: Attribution 4.0 International [en]
dc.rights.uri: http://creativecommons.org/licenses/by/4.0/ [en]
dc.subject: Maximal Marginal Relevance [en]
dc.subject: Annotation [en]
dc.subject: Website [en]
dc.subject: AI Aided Annotation [en]
dc.subject: Chapter Annotation [en]
dc.title: AI Aided Annotation [en]
dc.type: Presentation [en]
dc.type: Report [en]
dc.type: Video [en]
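
The abstract describes selecting sentences with Maximal Marginal Relevance under an adjustable lambda value and a set of keywords. The following is a minimal sketch of that general technique, not the project's actual code (which is in the linked GitHub repository). It assumes a plain bag-of-words cosine similarity in place of whatever sentence representation the project uses, and the names `bag_of_words`, `cosine`, and `mmr_select` are hypothetical. The "couple of other elements" mentioned in the abstract are not specified there and are not modeled.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lowercased word counts (an assumed stand-in for a real sentence embedding)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def mmr_select(sentences, keywords, k=3, lam=0.7):
    """Return indices of up to k sentences chosen greedily by MMR.

    lam near 1 favours similarity to the keywords; lam near 0 favours
    diversity relative to the sentences already selected.
    """
    query = bag_of_words(" ".join(keywords))
    vectors = [bag_of_words(s) for s in sentences]
    selected = []
    candidates = list(range(len(sentences)))

    while candidates and len(selected) < k:
        def score(i):
            relevance = cosine(vectors[i], query)
            # Redundancy is the highest similarity to any sentence picked so far.
            redundancy = max(
                (cosine(vectors[i], vectors[j]) for j in selected), default=0.0
            )
            return lam * relevance - (1 - lam) * redundancy

        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)

    return selected
```

Calling `mmr_select(chapter_sentences, ["relevance", "sentences"], k=5, lam=0.5)` on a list of chapter sentences would return the indices of five sentences that balance keyword relevance against overlap with one another. Raising `lam` toward 1 makes the selection behave more like a plain relevance ranking.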

Files

Original bundle
Name: AIAidedAnnotationPresentation.pdf; Size: 1.91 MB; Format: Adobe Portable Document Format
Name: AIAidedAnnotationPresentation.pptx; Size: 3.16 MB; Format: Microsoft Powerpoint XML
Name: AIAidedAnnotationReport.docx; Size: 1.35 MB; Format: Microsoft Word XML
Name: AIAidedAnnotationReport.pdf; Size: 1.45 MB; Format: Adobe Portable Document Format
Name: AIAidedAnnotationVideo.mp4; Size: 30.43 MB; Format: MP4 Container format for video files

License bundle
Name: license.txt; Size: 1.5 KB; Format: Item-specific license agreed upon to submission