English Wikipedia on Hadoop Cluster
Developing and testing big data software requires a large dataset. The full English Wikipedia dataset would serve well for testing and benchmarking purposes. Loading this dataset onto a system, such as an ...
The BTD Importer is used to import Bound Thesis Dissertations into the Electronic Thesis Dissertation Database. The process involves taking a hard-copy thesis and scanning it into PDF form. Once in PDF form, the importer ...
Database Creation and Information Extraction from ETDs for CRA-E
This project supported the educational activities of the Computing Research Association (CRA-E). Its main goal was to collect data associated with electronic theses and dissertations (ETDs) to allow ...