The Open Science of Deep Learning: Three Case Studies
Abstract
The open science movement prioritizes making research data and methods openly available for public scrutiny and replication. Its practices include releasing code that implements the algorithms described in openly available publications. Open-science principles may have particularly high impact in deep learning, where researchers have developed a plethora of algorithms to solve complex and challenging problems, but where others may have difficulty replicating results and applying these algorithms to other problems. In response, some researchers have begun to open up deep-learning research by making their code and resources (e.g., datasets and/or pre-trained models) available to the current and future research community. This presentation describes three case studies in deep learning in which the openly available resources differed, and investigates how those differences affected each project and its outcome. It provides a venue for discussing successes, lessons learned, and recommendations for future researchers facing similar situations, especially as deep learning becomes an increasingly important tool across disciplines. In the first case study, we present a workflow for text summarization based on thousands of news articles. The outcome, generalizable to many situations, is a tool that can concisely report key facts and events from the articles. In the second case study, we describe the development of an Optical Character Recognition tool for archival research on physical typed notecards, in this case notecards documenting an important, curated collection of thousands of items of clothing. In the last case study, we describe the workflow for applying common Natural Language Processing tools to a novel task: identifying descriptive language for whiskies from thousands of free-form text reviews. Each case study produced a working solution to its challenging problem because the researchers involved embraced the concept of open science.