dc.contributor.author: Burnett, Austin
dc.contributor.author: Neuman, Shawn
dc.contributor.author: Ardura, Anthony
dc.contributor.author: Lacy, Rex
dc.date.accessioned: 2014-05-09T21:37:45Z
dc.date.available: 2014-05-09T21:37:45Z
dc.date.issued: 2014-05-09
dc.identifier.uri: http://hdl.handle.net/10919/47942
dc.description: Connecting the IDEAL database to a spreadsheet interface. The source code developed is in the zip file provided. Our clients were Mohamed Magdy, a Ph.D. student at Virginia Polytechnic Institute and State University, and the Integrated Digital Event Archiving and Library (IDEAL) team, supported through NSF IIS - 1319578. [en_US]
dc.description.abstract: The IDEAL proposal encompasses a vast technology infrastructure intended for use by people of varying backgrounds. The analysts and researchers who are familiar with the data presented through the IDEAL project may not be familiar with the means of accessing it from its different sources. The purpose of this project is to give non-technical personnel the ability to access data in an easy-to-use and intuitive way. The data this project focuses on are tweets, photos, and webpages found in web archive (WARC) files. These WARC files range from a few gigabytes to several hundred gigabytes in size, making a manual search for specific information nearly impossible. Instead, we use a Cloudera VM as a prototype of the cluster used in IDEAL and demonstrate how to load WARC files for Hadoop processing. This allows parallel big data processing with several software tools, supporting database and full-text searching, text extraction, and various machine learning applications. Our project goal, to present relevant data in an attractive, useful, and intuitive way, was achieved through the creation of a web-based, spreadsheet-like service. While the report covers its use in greater detail, the overarching plan was to provide the user with an easy-to-use spreadsheet that takes input from the user and returns the relevant data in spreadsheet cells. Client requests for special jobs such as ‘all images’ or ‘word count’ led to additional features. To summarize, this project provides a web service that gives IDEAL researchers the means to retrieve relevant information from WARC files in an intuitive and effective manner. The project called for several technologies and frameworks, which are elaborated on in the report, and it paves the way for further development in support of the IDEAL project mission. [en_US]
dc.description.sponsorship: Mohamed Magdy, Ph.D. student, mmagdy@vt.edu [en_US]
dc.description.sponsorship: NSF IIS - 1319578: Integrated Digital Event Archiving and Library (IDEAL) [en_US]
dc.language.iso: en_US [en_US]
dc.subject: IDEAL [en_US]
dc.subject: Spreadsheet [en_US]
dc.subject: CS4624 [en_US]
dc.subject: Hadoop [en_US]
dc.title: CS4624 IDEAL Spreadsheet [en_US]
dc.type: Presentation [en_US]
dc.type: Software [en_US]
dc.type: Technical report [en_US]

