IDEAL Pages

Abstract

The main goal of this project is to provide a convenient Web-enabled interface to a large collection of event-related webpages, supporting two main services: browsing and searching. We first studied the events and, based on the dataset available to us, determined which fields were required to build the events index. We then configured a SolrCloud cluster with a collection based on these fields, defined in the schema.xml file. Next, we built a Hadoop MapReduce job that works with SolrCloud to index documents crawled from the Web about 60 events. We then built a PHP server application that interfaces with the Solr server and its indexed documents. Finally, we designed a convenient user interface that allows users to browse the documents by event category and event name, as well as to search the document collection for particular keywords.
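The browse and search services described above map naturally onto Solr query parameters. The following is a minimal Python sketch, not the project's PHP/Solarium code. It assumes a hypothetical collection named `ideal` at `localhost:8983` and hypothetical schema fields `category` and `event_name`. Keyword search goes in the main query (`q`), while browsing by category or event name is expressed as filter queries (`fq`).

```python
from urllib.parse import urlencode

# Hypothetical endpoint: host, port, and collection name are assumptions,
# not values taken from the IDEAL project.
SOLR_SELECT = "http://localhost:8983/solr/ideal/select"


def phrase(value):
    """Quote a value as a Solr phrase, escaping backslashes and double quotes."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def build_query(keywords=None, category=None, event_name=None, rows=10):
    """Return a Solr /select URL for keyword search, browsing, or both.

    `category` and `event_name` are hypothetical field names standing in for
    whatever fields the project defined in schema.xml.
    """
    # With no keywords, match all documents so browsing alone still works.
    params = [("q", keywords or "*:*"), ("rows", str(rows)), ("wt", "json")]
    # Browsing narrows results with filter queries, one fq per constraint.
    if category:
        params.append(("fq", "category:" + phrase(category)))
    if event_name:
        params.append(("fq", "event_name:" + phrase(event_name)))
    return SOLR_SELECT + "?" + urlencode(params)


if __name__ == "__main__":
    print(build_query(keywords="flood", category="Natural disaster"))
```

Putting the browse constraints in `fq` rather than `q` keeps them out of relevance scoring and lets Solr cache each filter independently, which suits repeated browsing of the same categories.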

Description
The submitted files include the full technical report, the midterm presentation, the final presentation, and the complete source code for both the document indexing and the Web interface. We would like to acknowledge the NSF for funding the project under grant IIS-1319578: Integrated Digital Event Archiving and Library (IDEAL). For a working URL of the project results, please contact Mohamed Magdy (mmagdy@vt.edu) or Edward Fox (fox@vt.edu), or visit http://www.eventsarchive.org/
Keywords
Events Collection, Web Interface, Indexing large collections, SolrCloud, Hadoop, Solarium client
Citation