Big Data Processing in the Cloud: a Hydra/Sufia Experience

dc.contributor.author: Brittle, Collin
dc.contributor.author: Xie, Zhiwu
dc.date.accessioned: 2016-06-25T21:38:35Z
dc.date.available: 2016-06-25T21:38:35Z
dc.date.issued: 2014-06-10
dc.description.abstract: (Presentation video available at https://connectpro.helsinki.fi/p1txjdy74ts/) This presentation addresses the challenge of processing big data in a cloud-based data repository. Using the Hydra Project's Hydra and Sufia Ruby gems, and working with the Hydra community, we created a dedicated repository for the project and set up background jobs. Our approach is to create the metadata with these jobs, which are distributed across multiple computing cores. This allows us to scale our infrastructure out on an as-needed basis, and it decouples automatic metadata creation from the response times seen by the user. Although the metadata is not available immediately after ingestion, the object itself is. By distributing the jobs, we can compute complex properties without impacting the repository server. Hydra and Sufia gave us a head start by providing a simple self-deposit repository, complete with background job support via Redis and Resque.
dc.identifier.uri: http://hdl.handle.net/10919/71460
dc.language.iso: en_US
dc.relation.haspart: http://urn.fi/URN:NBN:fi-fe2014070432268
dc.relation.haspart: https://connectpro.helsinki.fi/p1txjdy74ts/
dc.relation.ispartof: Open Repositories 2014
dc.rights: Creative Commons Attribution 3.0 United States
dc.rights.uri: http://creativecommons.org/licenses/by/3.0/us/
dc.subject: Digital library
dc.subject: Big data
dc.subject: Institutional repository
dc.subject: Fedora
dc.subject: Hydra
dc.title: Big Data Processing in the Cloud: a Hydra/Sufia Experience
dc.type: Article

Files

Original bundle
Name: brittle-or2014.pdf
Size: 128.26 KB
Format: Adobe Portable Document Format
Description: Final submission

Name: OR2014-slides.pdf
Size: 10.15 MB
Format: Adobe Portable Document Format
Description: Slides for presentation at OR 2014

License bundle
Name: license.txt
Size: 1.5 KB
Format: Item-specific license agreed upon to submission