CS6604 Spring 2017 Global Events Team Project
dc.contributor.author | Li, Liuqing | en |
dc.contributor.author | Harb, Islam | en |
dc.contributor.author | Galad, Andrej | en |
dc.date.accessioned | 2017-05-27T15:02:14Z | en |
dc.date.available | 2017-05-27T15:02:14Z | en |
dc.date.issued | 2017-05-03 | en |
dc.description.abstract | This submission describes the work the Global Events team completed in Spring 2017. It includes the final report and presentation, as well as key supporting materials (source code). Building on previous reports and on modules created by former teams, the Global Events team established a pipeline for processing Web ARChives in support of the IDEAL and GETAR projects, both funded by NSF. With the Internet Archive’s help, the team enhanced the Event Focused Crawler to retrieve more relevant webpages (i.e., about school shooting events) in WARC format. ArchiveSpark, an Apache Spark framework that facilitates access to Web archives, was deployed on a stand-alone server, and multiple techniques, including parsing, Stanford NER, regular expressions, and statistical methods, were used to process and analyze the data and to describe those events. For data visualization, an integrated user interface using Gradle was designed and implemented to present trend results; it can easily be used by both CS and non-CS researchers and students. Moreover, new, well-written manuals make it easier for users and developers to become familiar with ArchiveSpark, Spark, and Scala. | en |
dc.description.notes | GlobalEvents_Presentation.pptx -- Global Events team final presentation in PPTX format; GlobalEvents_Presentation.pdf -- Global Events team final presentation in PDF format; GlobalEvents_Report.docx -- Global Events team final report in DOCX format; GlobalEvents_Report.pdf -- Global Events team final report in PDF format; GlobalEvents_Code.zip -- Global Events team software code package (including data collection, processing, and visualization) | en |
dc.description.sponsorship | NSF:IIS-1319578 | en |
dc.description.sponsorship | NSF:IIS-1619028 | en |
dc.identifier.uri | http://hdl.handle.net/10919/77867 | en |
dc.language.iso | en_US | en |
dc.publisher | Virginia Tech | en |
dc.rights | In Copyright | en |
dc.rights.uri | http://rightsstatements.org/vocab/InC/1.0/ | en |
dc.subject | Web Archives | en |
dc.subject | Global Events | en |
dc.subject | Trends | en |
dc.subject | Event Focused Crawler | en |
dc.subject | ArchiveSpark | en |
dc.subject | Stanford NER | en |
dc.subject | Gradle | en |
dc.subject | WARC Files | en |
dc.title | CS6604 Spring 2017 Global Events Team Project | en |
dc.type | Presentation | en |
dc.type | Report | en |
dc.type | Software | en |
Files
Original bundle
- Name: GlobalEvents_Presentation.pdf
- Size: 1.1 MB
- Format: Adobe Portable Document Format
License bundle
- Name: license.txt
- Size: 1.5 KB
- Format: Item-specific license agreed to upon submission