CS6604 Spring 2017 Global Events Team Project

dc.contributor.author: Li, Liuqing [en]
dc.contributor.author: Harb, Islam [en]
dc.contributor.author: Galad, Andrej [en]
dc.date.accessioned: 2017-05-27T15:02:14Z [en]
dc.date.available: 2017-05-27T15:02:14Z [en]
dc.date.issued: 2017-05-03 [en]
dc.description.abstract: This submission describes the work the Global Events team completed in Spring 2017. It includes the final report and presentation, as well as key supporting materials (source code). Building on the reports and modules produced by previous teams, the Global Events team established a pipeline for processing Web ARChive (WARC) files in support of the IDEAL and GETAR projects, both funded by NSF. With the Internet Archive's help, the team enhanced the Event Focused Crawler to retrieve more relevant webpages (i.e., pages about school shooting events) in WARC format. ArchiveSpark, an Apache Spark framework that facilitates access to Web archives, was deployed on a stand-alone server, and multiple techniques, including parsing, Stanford NER, regular expressions, and statistical methods, were used to process and analyze the data and to describe those events. For data visualization, an integrated user interface built with Gradle was designed and implemented to present trend results, and it can be used easily by both CS and non-CS researchers and students. In addition, new manuals were written to help users and developers become familiar with ArchiveSpark, Spark, and Scala. [en]
dc.description.notes: [en]
    GlobalEvents_Presentation.pptx -- Global Events team final presentation in PPTX format
    GlobalEvents_Presentation.pdf -- Global Events team final presentation in PDF format
    GlobalEvents_Report.docx -- Global Events team final report in DOCX format
    GlobalEvents_Report.pdf -- Global Events team final report in PDF format
    GlobalEvents_Code.zip -- Global Events team software code package (including data collection, processing, and visualization)
dc.description.sponsorship: NSF:IIS-1319578 [en]
dc.description.sponsorship: NSF:IIS-1619028 [en]
dc.identifier.uri: http://hdl.handle.net/10919/77867 [en]
dc.language.iso: en_US [en]
dc.publisher: Virginia Tech [en]
dc.rights: In Copyright [en]
dc.rights.uri: http://rightsstatements.org/vocab/InC/1.0/ [en]
dc.subject: Web Archives [en]
dc.subject: Global Events [en]
dc.subject: Trends [en]
dc.subject: Event Focused Crawler [en]
dc.subject: ArchiveSpark [en]
dc.subject: Stanford NER [en]
dc.subject: Gradle [en]
dc.subject: WARC Files [en]
dc.title: CS6604 Spring 2017 Global Events Team Project [en]
dc.type: Presentation [en]
dc.type: Report [en]
dc.type: Software [en]

Files

Original bundle
Name:
GlobalEvents_Code.zip
Size:
2.59 MB
Format:
Name:
GlobalEvents_Presentation.pdf
Size:
1.1 MB
Format:
Adobe Portable Document Format
Name:
GlobalEvents_Presentation.pptx
Size:
1.22 MB
Format:
Microsoft Powerpoint XML
Name:
GlobalEvents_Report.docx
Size:
7.61 MB
Format:
Microsoft Word XML
Name:
GlobalEvents_Report.pdf
Size:
5.79 MB
Format:
Adobe Portable Document Format
License bundle
Name:
license.txt
Size:
1.5 KB
Format:
Item-specific license agreed upon to submission
Description: