Evaluating Automated Summarization with Analyst Memories
Abstract
Automatic summarization remains a challenging area of natural language processing, particularly with respect to developing robust evaluation metrics. In this work we attempted to develop a task-specific summarization evaluation method by examining what intelligence analysts remember from documents and summaries. We ran a feasibility study to test whether analysts' memories of full texts, elicited one day later, can be compared with the content of automatic summaries as a way of measuring summary quality. We find that these memories are comparable to summaries, but that methodological adjustments are likely necessary before this comparison can serve as an evaluation across varied summaries. We also compared analysts' memories of full texts with their memories of summary texts to assess the impact of summarization on memory. Different information was indeed retained depending on which version analysts saw: more details were recalled from full texts, while summary texts were more often incorporated into broad statements about multiple documents. We conclude that there is merit in examining memory as a form of summary evaluation, both as a way of thinking about how to summarize and as a way of incorporating summaries into analyst workflows.