Browsing by Author "Yuan, Liling"
- Collection Management of Electronic Theses and Dissertations (CME) CS5604 Fall 2019
  Kaushal, Kulendra Kumar; Kulkarni, Rutwik; Sumant, Aarohi; Wang, Chaoran; Yuan, Chenhan; Yuan, Liling (Virginia Tech, 2019-12-23)
  The class "CS 5604: Information Storage and Retrieval" in the fall of 2019 was divided into six teams to enhance the usability of the corpus of electronic theses and dissertations (ETDs) maintained by Virginia Tech University Libraries. The ETD corpus consists of 14,055 doctoral dissertations and 19,246 master's theses from Virginia Tech University Libraries' VTechWorks system. Our study explored document collection and processing, application of Elasticsearch to the collection to facilitate searching, testing a custom front-end, Kibana, integration, implementation, text analytics, and machine learning. The results of our work should help future researchers study the processed natural language data using deep learning technologies and address the challenges of extracting information from ETDs. The Collection Management of Electronic Theses and Dissertations (CME) team was responsible for processing all PDF files from the ETD corpus and extracting well-formatted text files from them. We also used advanced deep learning and other tools like GROBID to process metadata, obtain text documents, and generate chapter-wise data. In this project, the CME team completed the following steps: comparing different parsers; segmenting documents; preprocessing the data; and specifying, extracting, and preparing metadata and auxiliary information for indexing. Finally, we developed a system that automates all of the above tasks. The system also validates the output metadata, thereby ensuring the correctness of the data that flows through the entire system developed by the class. This system, in turn, helps to ingest new documents into Elasticsearch.
- SleuthTalk: Addressing the Last-Mile Problem in Historical Person Identification with Privacy, Collaboration, and Structured Feedback
  Yuan, Liling (Virginia Tech, 2021-06-14)
  Identifying people in historical photographs is an important task in many fields, including history, journalism, genealogy, and collecting. A wide variety of methods, such as manual analysis, facial recognition, and crowdsourcing, have been used to identify unknown photos. However, because of the large number of candidates and the poor quality or lack of source evidence, accurate historical person identification remains challenging. Researchers especially struggle with the "last-mile problem" of historical person identification, where they must make a selection among a small number of highly similar candidates. Collaboration, including both human-AI collaboration and collaboration within human teams, has been shown to improve data accuracy, but there is a lack of research on how to design a collaborative workspace to support historical person identification. In this work, we present SleuthTalk, a web-based collaboration tool integrated into the public website Civil War Photo Sleuth, which addresses the last-mile problem in historical person identification by providing support for shortlisting potential candidates from face recognition results, private collaborative workspaces, and structured feedback interfaces. We evaluated this feature in a mixed-method study involving six participants, who spent one week each using SleuthTalk and a comparable social media platform to identify an unknown photo. The results of this study show how our design helps with identifying historical photos in a collaborative way and suggest directions for improvement in future work.