Are Repositories Impeding Big Data Reuse?

Date
2016-06-14
Publisher
Virginia Tech
Abstract

In this intentionally provocative presentation, we question the scalability of popular digital repositories and their suitability for big data reuse. Are the layers of APIs that these repositories have painted over file system primitives necessary? How essential is it for a repository to insist on being the sole manager of its content, and to arrange files in ways that prevent access other than through its own APIs? We explore these questions from the perspective of big data reuse and describe controlled reuse experiments against Fedora 4 that evaluate the cost of these practices.

Keywords
Institutional repository, Data management, Big data, Scalability, Throughput