CS6604: Digital Libraries
Permanent URI for this collection
Browse
Browsing CS6604: Digital Libraries by Subject "Arabic"
Now showing 1 - 1 of 1
Results Per Page
Sort Options
- Qatar content classificationHandosa, Mohamed (2014-05-09)This reports on a term project for the CS660 Digital libraries course (Spring 2014). The project has been conducted under the supervision of Prof. Edward Fox and Mr. Tarek Kanan. The goal is to develop an Arabic newspaper article classifier. We have built a collection of 700 Arabic newspaper articles and 1700 Arabic full-newspaper PDF files. A stemmer, named “P-Stemmer”, is proposed. Evaluation have shown that P-Stemmer outperforms Larkey’s widely used light stemmer. Several classification techniques were tested on Arabic data including SVM, Naïve Bayes and Random Forest. We built and tested 21 multiclass classifiers, 15 binary classifiers, and 5 compound classifiers using the voting technique. Finally, we uploaded the classified instances to Apache Solr for searching and indexing purposes.