Towards Use And Reuse Driven Big Data Management

dc.contributor.author: Xie, Zhiwu [en]
dc.contributor.author: Chen, Yinlin [en]
dc.contributor.author: Griffin, Julie [en]
dc.contributor.author: Walters, Tyler [en]
dc.contributor.author: Tarazaga, Pablo Alberto [en]
dc.contributor.author: Kasarda, Mary [en]
dc.contributor.department: University Libraries [en]
dc.contributor.department: Mechanical Engineering [en]
dc.date.accessioned: 2015-06-03T09:50:57Z [en]
dc.date.available: 2015-06-03T09:50:57Z [en]
dc.date.issued: 2015-06-03 [en]
dc.description.abstract: We propose a use and reuse driven big data management approach that fuses the data repository and data processing capabilities in a co-located, public cloud. It answers to the urgent data management needs from the growing number of researchers who don’t fit in the big science/small science dichotomy. This approach will allow researchers to more easily use, manage, and collaborate around big data sets, as well as give librarians the opportunity to work alongside the researchers to preserve and curate data while it is still fresh and being actively used. This also provides the technological foundation to foster a sharing culture more aligned with the open source software development paradigm than the lone-wolf, gift-exchanging small science sharing or the top-down, highly structured big science sharing. To materialize this vision, we provide a system architecture consisting of a scalable digital repository system coupled with the co-located cloud storage and cloud computing, as well as a job scheduler and a deployment management system. Motivated by Virginia Tech’s Goodwin Hall instrumentation project, we implemented and evaluated a prototype. The results show not only sufficient capacities for this particular case, but also near perfect linear storage and data processing scalabilities under moderately high workload. [en]
dc.description.version: pre-print [en]
dc.identifier.doi: https://doi.org/10.1145/2756406.2756924 [en]
dc.identifier.uri: http://hdl.handle.net/10919/51621 [en]
dc.language.iso: en_US [en]
dc.relation.hasversion: http://dl.acm.org/citation.cfm?id=2756924 [en]
dc.rights: Creative Commons Attribution-ShareAlike 3.0 United States [en]
dc.rights.uri: http://creativecommons.org/licenses/by-sa/3.0/us/ [en]
dc.subject: Big data [en]
dc.subject: Digital library [en]
dc.subject: Cloud computing [en]
dc.subject: Digital repository [en]
dc.subject: Smart infrastructure [en]
dc.subject: Sensor data [en]
dc.title: Towards Use And Reuse Driven Big Data Management [en]
dc.type: Conference proceeding [en]
dc.type: Presentation [en]

Files

Original bundle
Name: jcdl117-xie-postprint.pdf
Size: 605.79 KB
Format: Adobe Portable Document Format
Description: postprint
Name: jcdl117-xie-slides.pdf
Size: 40.43 MB
Format: Adobe Portable Document Format
Description: presentation slides
License bundle
Name: license.txt
Size: 1.5 KB
Format: Item-specific license agreed upon to submission