Virginia Tech

    Global Event Crawler and Seed Generator for GETAR

    GEDcode.zip (1.126Mb)
    Downloads: 49
    GEDpresentation.pptx (8.417Mb)
    Downloads: 80
    GEDpresentation.pdf (674.5Kb)
    Downloads: 140
    GEDreport.docx (2.534Mb)
    Downloads: 935
    GEDreport.pdf (3.682Mb)
    Downloads: 1304
    Date
    2017-04-28
    Author
    Manchester, Emma
    Srinivasan, Ravi
    Crenshaw, Sean
    Masterson, Alec
    Grinnan, Harrison
    Abstract
    Global Event and Trend Archive Research (GETAR) is a research project at Virginia Tech, studying the years from 1997 to 2020, which seeks to investigate and catalog events as they happen in support of future research. It will devise interactive and integrated digital library and archive systems coupled with linked and expert-curated web page and tweet collections. This historical record enables research on trends as history develops and captures valuable primary sources that would otherwise not be archived. An important capability of GETAR is the ability to predict which sources and stories will be most important in the future, so that those stories can be prioritized for archiving. That is where our project contributes. In support of GETAR, this project will build a tool that scrapes the news to identify important global events. It will generate seeds that contain relevant information such as a link, the topic, person, organization, and source. The seeds can then be used by others working on GETAR to collect webpages and tweets using tools like the Event Focused Crawler and Twitter Search. To achieve this goal, the Global Event Detector (GED) will crawl Reddit to identify potentially important news stories. These stories will be grouped, and the top groupings will be shown on a website as well as on a display in Torgersen Hall. This project will serve future research for the GETAR project, as well as those seeking real-time updates on currently trending events. The final deliverables discussed in this report include the code that scrapes Reddit and processes the data, and the webpage that visualizes the data.
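    The grouping and seed-generation steps described in the abstract can be illustrated with a minimal sketch. This is not the GED code from GEDcode.zip: the `Seed` fields, the `group_stories` and `to_seed` helpers, the keyword-overlap rule, and the sample stories are all assumptions chosen for illustration, and the input is a hard-coded list rather than a live Reddit crawl.

```python
from dataclasses import dataclass, field
import re


# Hypothetical seed record. The field names mirror the items the abstract
# lists (link, topic, person, organization, source) but are assumptions.
@dataclass
class Seed:
    link: str
    topic: str
    source: str
    people: list[str] = field(default_factory=list)
    organizations: list[str] = field(default_factory=list)


STOPWORDS = {"the", "a", "an", "of", "in", "on", "to", "and", "for", "as", "at"}


def keywords(title: str) -> set[str]:
    """Lowercase a title and keep its non-stopword tokens."""
    return {w for w in re.findall(r"[a-z0-9]+", title.lower()) if w not in STOPWORDS}


def group_stories(stories: list[dict], min_overlap: int = 2) -> list[list[dict]]:
    """Greedy grouping (an assumed stand-in for the real method): a story
    joins the first group sharing at least `min_overlap` title keywords."""
    groups: list[tuple[set[str], list[dict]]] = []
    for story in stories:
        kw = keywords(story["title"])
        for group_kw, members in groups:
            if len(kw & group_kw) >= min_overlap:
                members.append(story)
                group_kw.update(kw)  # widen the group's vocabulary
                break
        else:
            groups.append((set(kw), [story]))
    # Largest groups first, as a rough proxy for "top groupings".
    return sorted((members for _, members in groups), key=len, reverse=True)


def to_seed(group: list[dict]) -> Seed:
    """Build a seed from the highest-scoring story in a group."""
    top = max(group, key=lambda s: s["score"])
    return Seed(link=top["url"], topic=top["title"], source=top["source"])


# Made-up sample posts; a real run would take these from the Reddit crawl.
stories = [
    {"title": "Earthquake strikes central Italy", "url": "https://example.com/a",
     "source": "example.com", "score": 120},
    {"title": "Central Italy earthquake leaves towns damaged", "url": "https://example.org/b",
     "source": "example.org", "score": 300},
    {"title": "New telescope images released", "url": "https://example.net/c",
     "source": "example.net", "score": 50},
]

groups = group_stories(stories)
seeds = [to_seed(g) for g in groups]
```

    Here the two earthquake posts share enough title keywords to form one group, and its seed takes the link of the higher-scoring post. Filling the `people` and `organizations` fields would need a named-entity step, which this sketch leaves out.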
    URI
    http://hdl.handle.net/10919/77620
    Collections
    • CS4624: Multimedia, Hypertext, and Information Access [229]

    If you believe that any material in VTechWorks should be removed, please see our policy and procedure for Requesting that Material be Amended or Removed. All takedown requests will be promptly acknowledged and investigated.

    Virginia Tech | University Libraries | Contact Us
     

     
