English Wikipedia on Hadoop Cluster
Developing and testing big data software requires a large dataset. The full English Wikipedia dataset would serve well for testing and benchmarking. Loading this dataset onto a system, such as an ...
Clustering and Topic Analysis in CS 5604 Information Retrieval Fall 2016
(Virginia Tech, 2016-12-08)
The IDEAL (Integrated Digital Event Archiving and Library) and Global Event and Trend Archive Research (GETAR) projects aim to build a robust Information Retrieval (IR) system by retrieving tweets and webpages from social ...
Topic Analysis project in CS5604, Spring 2016: Extracting Topics from Tweets and Webpages for IDEAL
The IDEAL (Integrated Digital Event Archiving and Library) project aims to ingest tweets and web-based content from social media and the web and index it for retrieval. One of the required milestones for a graduate-level ...