English Wikipedia on Hadoop Cluster
Developing and testing big data software requires a large dataset. The full English Wikipedia dataset serves well for testing and benchmarking purposes. Loading this dataset onto a system, such as an ...
Topic Analysis project in CS5604, Spring 2016: Extracting Topics from Tweets and Webpages for IDEAL
The IDEAL (Integrated Digital Event Archiving and Library) project aims to ingest tweets and web-based content from social media and the web and index them for retrieval. One of the required milestones for a graduate-level ...