Are Repositories Impeding Big Data Reuse?

Date

2016-06-14

Publisher

Virginia Tech

Abstract

In this intentionally provocative presentation, we question whether popular digital repositories scale and whether they are suitable for big data reuse. Are the layers of APIs that these repositories have painted over file system primitives necessary? How essential is it for a repository to insist on being the sole manager of its content, and to arrange files in ways that prevent access other than through its own APIs? We explore these questions from the perspective of big data reuse and describe controlled reuse experiments against Fedora 4 that evaluate the cost of these practices.

Keywords

Institutional repository, Data management, Big data, Scalability, Throughput
