Virginia Tech

    CS4624 IDEAL Spreadsheet

    Final presentation in PDF format (213.5Kb)
    Downloads: 191
    Final presentation in pptx format (178.5Kb)
    Downloads: 54
    Midterm presentation in PDF format (241.1Kb)
    Downloads: 62
    Midterm presentation in .pptx format (208.2Kb)
    Downloads: 45
    Final Report in PDF Format (688.0Kb)
    Downloads: 334
    Final Report in .docx Format (357.8Kb)
    Downloads: 114
    source.zip (118.6Mb)
    Downloads: 114
    Date
    2014-05-09
    Author
    Burnett, Austin
    Neuman, Shawn
    Ardura, Anthony
    Lacy, Rex
    Abstract
    The IDEAL proposal encompasses a vast technology infrastructure intended for use by people of varying backgrounds. The analysts and researchers who will be familiar with the data presented through the IDEAL project may not be familiar with the means of accessing it from the differing resources. The purpose of this project is to give non-technical personnel an easy-to-use, intuitive way to access that data. The data this project focuses on are tweets, photos, and webpages stored in web archive (WARC) files. These WARC files range from a few to several hundred gigabytes in size, making a manual search for specific information nearly impossible. Instead, we use a Cloudera VM as a prototype of the cluster used in IDEAL and demonstrate how to load WARC files for Hadoop processing. This allows parallel big data processing with several software tools, supporting database and full-text searching, text extraction, and various machine learning applications. We achieved our goal of presenting relevant data in an attractive, useful, and intuitive way by creating a web-based, spreadsheet-like service. While the full report describes its use in greater detail, the overarching plan was to provide an easy-to-use spreadsheet that takes input from the user and returns the relevant data in spreadsheet cells. Client requests for special jobs such as ‘all images’ or ‘word count’ led to additional features. To summarize, this project provides a web service that gives IDEAL researchers the means to retrieve relevant information from WARC files in an intuitive and effective manner. The project called for several technologies and frameworks, which the report elaborates on, and it paves the way for future development in the IDEAL project mission.
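To make the 'word count' job mentioned in the abstract concrete, the following is a minimal single-process sketch in Python. It is not the authors' implementation, which ran on Hadoop in a Cloudera VM. It assumes an uncompressed WARC stream in which each record consists of a `WARC/` version line, header lines, a blank line, and a block of `Content-Length` bytes. The function names (`iter_warc_records`, `word_count`, `make_record`) and the sample data are hypothetical, and the payload of each `response` record is treated as plain text with no stripping of HTTP headers or HTML.

```python
import io
import re
from collections import Counter


def iter_warc_records(stream):
    """Yield (headers, payload) for each record in an uncompressed WARC stream."""
    while True:
        line = stream.readline()
        if not line:
            return  # end of stream
        if not line.strip():
            continue  # skip the blank separator lines between records
        if not line.startswith(b"WARC/"):
            raise ValueError(f"expected a WARC version line, got {line!r}")
        headers = {}
        while True:
            line = stream.readline()
            if not line.strip():
                break  # a blank line ends the header block
            name, _, value = line.decode("utf-8").partition(":")
            headers[name.strip()] = value.strip()
        # The record block is exactly Content-Length bytes long.
        payload = stream.read(int(headers["Content-Length"]))
        yield headers, payload


def word_count(stream):
    """Count words in the payloads of 'response' records (assumed plain text)."""
    counts = Counter()
    for headers, payload in iter_warc_records(stream):
        if headers.get("WARC-Type") != "response":
            continue
        text = payload.decode("utf-8", errors="replace")
        counts.update(re.findall(r"[a-z']+", text.lower()))
    return counts


def make_record(warc_type, body):
    """Build a tiny illustrative record (hypothetical data, not from IDEAL)."""
    body_bytes = body.encode("utf-8")
    head = (
        "WARC/1.0\r\n"
        f"WARC-Type: {warc_type}\r\n"
        f"Content-Length: {len(body_bytes)}\r\n"
        "\r\n"
    ).encode("utf-8")
    return head + body_bytes + b"\r\n\r\n"


sample = make_record("warcinfo", "software: example") + make_record(
    "response", "flood flood warning"
)
counts = word_count(io.BytesIO(sample))
print(counts.most_common(2))
```

Real WARC files are commonly gzip-compressed and, at the sizes described above, would need the parallel processing the project relies on; the sketch only illustrates the record structure and the shape of a word count over it.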
    URI
    http://hdl.handle.net/10919/47942
    Collections
    • CS4624: Multimedia, Hypertext, and Information Access [165]

    If you believe that any material in VTechWorks should be removed, please see our policy and procedure for Requesting that Material be Amended or Removed. All takedown requests will be promptly acknowledged and investigated.

    Virginia Tech | University Libraries | Contact Us
     

     
