Browsing by Author "Hamilton, Leah"
Now showing 1 - 4 of 4
- Generating Text Summaries for the Facebook Data Breach with Prototyping on the 2017 Solar Eclipse
  Hamilton, Leah; Robb, Esther; Fitzpatrick, April; Goel, Akshay; Nandigam, Ramya (Virginia Tech, 2018-12-13)
  Summarization is often a time-consuming task for humans. Automated methods can summarize a larger volume of source material in less time, but producing a good summary with these methods remains challenging. This submission contains all work related to a semester-long project in CS 4984/5984 to generate the best possible summary of a collection of 10,829 web pages about the Facebook-Cambridge Analytica data breach, with some early prototyping done on 500 web pages about the 2017 Solar Eclipse. A final report, a final presentation, and several archives of code, input data, and results are included. The work implements basic natural language processing techniques such as word-frequency analysis, lemmatization, and part-of-speech tagging, building up to a complete human-readable summary by the end of the course. Extractive, abstractive, and combined methods were used to generate the final summaries, all of which are included and compared. The summary judged best in subjective evaluation was a purely extractive one, built by concatenating summaries of document categories. This method was coherent and thorough, but required manual tuning to select categories and retained some redundancy. All attempted methods are described, and the less successful summaries are also included. The report presents a framework for summarizing complex document collections with multiple relevant topics. The summary itself identifies the information most widely covered about the Facebook-Cambridge Analytica data breach and serves as a reasonable introduction to the topic.
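The extractive approach described above, scoring and selecting sentences rather than generating new text, can be illustrated with a minimal word-frequency sketch. This is not the project's actual pipeline (which worked over thousands of web pages and category-level summaries); it is a stdlib-only toy showing the core idea of ranking sentences by the frequency of the words they contain.

```python
from collections import Counter
import re

def extractive_summary(text, n_sentences=2):
    """Score each sentence by the corpus frequency of its words and
    return the top-scoring sentences in their original document order."""
    # Naive sentence and word splitting; a real pipeline would use a
    # proper tokenizer and lemmatizer, as the project did.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    words = re.findall(r"[a-z']+", text.lower())
    freq = Counter(words)
    scored = [(sum(freq[w] for w in re.findall(r"[a-z']+", s.lower())), i, s)
              for i, s in enumerate(sentences)]
    # Take the n highest-scoring sentences, then restore document order.
    top = sorted(sorted(scored, reverse=True)[:n_sentences], key=lambda t: t[1])
    return " ".join(s for _, _, s in top)
```

In practice, frequency-based scoring rewards sentences dense with common topic words, which is why purely extractive summaries tend to be coherent but somewhat redundant, matching the trade-off the report describes.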
- The Open Science of Deep Learning: Three Case Studies
  Miller, Chreston; Lahne, Jacob; Hamilton, Leah (2022-03)
  The open science movement, which prioritizes the open availability of research data and methods for public scrutiny and replication, includes practices like providing code implementing described algorithms in openly available publications. An area of research in which open-science principles may have particularly high impact is deep learning, where researchers have developed a plethora of algorithms to solve complex and challenging problems, but where others may have difficulty replicating results and applying these algorithms to other problems. In response, some researchers have begun to open up deep-learning research by making their code and resources (e.g., datasets and/or pre-trained models) available to the current and future research community. This presentation describes three case studies in deep learning where the openly available resources differed and investigates the impact on each project and its outcome. This provides a venue for discussion of successes, lessons learned, and recommendations for future researchers facing similar situations, especially as deep learning increasingly becomes an important tool across disciplines. In the first case study, we present a workflow for text summarization based on thousands of news articles. The outcome, generalizable to many situations, is a tool that can concisely report key facts and events from the articles. In the second case study, we describe the development of an Optical Character Recognition tool for archival research of physical typed notecards, in this case documenting an important, curated collection of thousands of items of clothing. In the last case study, we describe the workflow for applying common Natural Language Processing tools to a novel task: identifying descriptive language for whiskies from thousands of free-form text reviews. These case studies produced working solutions to their respective, challenging problems because the researchers involved embraced open science.
- The Open Science of Deep Learning: Three Case Studies
  Miller, Chreston; Hamilton, Leah; Lahne, Jacob (2023-02-15)
  Objective: An area of research in which open science may have particularly high impact is deep learning (DL), where researchers have developed many algorithms to solve challenging problems, but others may have difficulty replicating results and applying these algorithms. In response, some researchers have begun to open up DL research by making their resources (e.g., code, datasets, and/or pre-trained models) available to the research community. This article describes three case studies in DL where openly available resources were used; we investigate the impact on the projects and their outcomes, and make recommendations for what to focus on when making DL resources available. Methods: Each case study represents a single project using openly available DL resources. The process and progress of each case study were recorded, along with aspects such as the approaches taken, the documentation of the openly available resources, and the researchers' experience with those resources. The case studies cover multiple-document text summarization, optical character recognition (OCR) of thousands of text documents, and identifying unique language descriptors for sensory science. Results: Each case study was a success but had its own hurdles. Key takeaways are that well-structured and clear documentation, code examples and demos, and pre-trained models were at the core of the success of these case studies. Conclusions: Openly available DL resources were central to the success of our case studies. The authors encourage DL researchers to continue making their data, code, and pre-trained models openly available where appropriate.
- Sensory Descriptor Analysis of Whisky Lexicons through the Use of Deep Learning
  Miller, Chreston; Hamilton, Leah; Lahne, Jacob (MDPI, 2021-07-14)
  This paper is concerned with extracting relevant terms from a text corpus on whisk(e)y. "Relevant" terms are usually contextually defined in their domain of use. Arguably, every domain has a specialized vocabulary used for describing things. For example, the field of Sensory Science, a sub-field of Food Science, investigates human responses to food products and differentiates "descriptive" terms for flavors from "ordinary", non-descriptive language. Within the field, descriptors are generated through Descriptive Analysis, a method wherein a human panel of experts tastes multiple food products and defines descriptors. This process is both time-consuming and expensive. However, one could leverage existing data to identify and build a flavor language automatically. For example, there are thousands of professional and semi-professional reviews of whisk(e)y published on the internet, providing abundant descriptors interspersed with non-descriptive language. The aim, then, is to automatically identify descriptive terms in unstructured reviews for later use in product flavor characterization. We created two systems to perform this task. The first is an interactive visual tool that can be used to tag examples of descriptive terms across thousands of whisky reviews. This creates a training dataset that we use to perform transfer learning with GloVe word embeddings and a Long Short-Term Memory (LSTM) deep learning architecture. The result is a model that can accurately identify descriptors within a corpus of whisky review texts, with a train/test accuracy of 99% and precision, recall, and F1-scores of 0.99. We tested for overfitting by checking the training and validation loss for divergence. Our results show that the language structure of descriptive terms can be learned programmatically.
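The tagging step described above, labeling which tokens in a review are flavor descriptors, can be sketched in a few lines. This is a hedged, stdlib-only illustration: the descriptor set below is a hypothetical seed lexicon, whereas the paper's training labels were produced by human tagging in an interactive visual tool, and the generalization beyond any fixed lexicon is what the GloVe + LSTM model learns.

```python
import re

# Hypothetical seed lexicon for illustration only; the paper's labels came
# from human tagging, not from a lookup table like this.
DESCRIPTORS = {"smoky", "peaty", "sweet", "vanilla", "oaky", "floral"}

def label_tokens(review):
    """Produce (token, label) pairs: 1 for a descriptive term, 0 otherwise.
    A sequence model (e.g. an LSTM over GloVe embeddings) trained on labels
    like these can learn the *contexts* in which descriptors occur and so
    identify descriptive terms absent from any seed lexicon."""
    tokens = re.findall(r"[A-Za-z']+", review.lower())
    return [(tok, 1 if tok in DESCRIPTORS else 0) for tok in tokens]
```

Framing the task as per-token sequence labeling, rather than dictionary lookup, is what lets the trained model recognize novel descriptors from their surrounding language structure, which is the paper's central result.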