    • Are Repositories Impeding Big Data Reuse? 

      Xie, Zhiwu; Galad, Andrej; Chen, Yinlin; Fox, Edward A. (Virginia Tech, 2016-06-14)
      In this intentionally provocative presentation, we question the scalability of popular digital repositories and whether they are suitable for big data reuse. Are the layers of API these repositories have painted over file ...
    • Big Data Processing in the Cloud: a Hydra/Sufia Experience 

      Brittle, Collin; Xie, Zhiwu (2014-06-10)
      Presentation video available at https://connectpro.helsinki.fi/p1txjdy74ts/. This presentation addresses the challenge of processing big data in a cloud-based data repository. Using the Hydra Project’s Hydra and Sufia ...
    • Evaluating Cost of Cloud Execution in a Data Repository 

      Xie, Zhiwu; Chen, Yinlin; Griffin, Julie; Walters, Tyler (ACM, 2016-06)
      In this paper, we utilize a set of controlled experiments to benchmark the cost associated with the cloud execution of typical repository functions such as ingestion, fixity checking, and heavy data processing. We focus ...
    • Facilitate Cross-Repository Big Data Discovery and Reuse 

      Xie, Zhiwu (Virginia Tech, 2013-03-13)
      Researchers have accumulated large amounts of observational, experimental, and simulation data. Much effort has been made to collect, curate, preserve, and provide open access to them, but putting the data online is only ...
    • On-Demand Big Data Analysis in Digital Repositories 

      Xie, Zhiwu; Chen, Yinlin; Jiang, Tingting; Griffin, Julie; Walters, Tyler; Tarazaga, Pablo Alberto; Kasarda, Mary (Springer International Publishing, 2015-12-18)
      We describe a use and reuse driven digital repository integrated with lightweight data analysis capabilities provided by the Docker framework. Using building sensor data collected from the Virginia Tech Goodwin Hall Living ...
    • Towards Use And Reuse Driven Big Data Management 

      Xie, Zhiwu; Chen, Yinlin; Griffin, Julie; Walters, Tyler; Tarazaga, Pablo Alberto; Kasarda, Mary (2015-06-03)
      We propose a use and reuse driven big data management approach that fuses the data repository and data processing capabilities in a co-located, public cloud. It answers the urgent data management needs from the growing ...