Otrouha: Automatic Classification of Arabic ETDs

dc.contributor.author: Alotaibi, Fatimah [en]
dc.contributor.author: Abdelrahman, Eman [en]
dc.date.accessioned: 2020-01-24T16:02:52Z [en]
dc.date.available: 2020-01-24T16:02:52Z [en]
dc.date.issued: 2020-01-23 [en]
dc.description.abstract: ETDs are becoming a new genre of documents that is highly precious and worth preserving. This has resulted in a sustained need to build an effective tool to facilitate retrieval from ETD collections. While Arabic ETDs have gained increasing attention, many challenges have ensued due to the lack of resources and the complexity of information retrieval in the Arabic language. Therefore, this project focuses on making Arabic ETDs more accessible by facilitating browsing and searching. The aim is to build an automated classifier that categorizes an Arabic ETD based on its abstract. Our raw dataset was obtained by crawling the AskZad digital library website. We then applied pre-processing techniques to the dataset to make it suitable for our classification process. We developed automatic classification methods using various classifiers: Support Vector Machines and SVC, Random Forest, and Decision Trees. We then used an ensemble classifier of the two classifiers that generated the highest accuracy. Finally, we applied commonly used evaluation techniques, including 10-fold cross-validation. The results show better performance for binary classification, with an average accuracy of 68% per category, whereas multiclass classification performed poorly, with an average accuracy of 24%. [en]
dc.description.notes: ArabicETDs_Code.zip: This is the Python code that includes Data Scraping from AskZad Digital Library, Preprocessing the raw data, and the classification process; ArabicETDs-Data.zip: This is the data scraped from AskZad Digital Library (Original version of abstracts of ETDs and Preprocessed "lemmatized and filtered" abstracts); ArabicETDs-presentation.pdf: Final Presentation of Otrouha project in PDF format; ArabicETDs-presentation.pptx: Final Presentation of Otrouha project in pptx format; ArabicETDs-Report.zip: Final report of Otrouha project; ArabicETDs-Report.pdf: Final report of Otrouha project in PDF format; ArabicETDs-AdditionalWork.docx: Additional work that was done and isn't included in the report, in an editable format; ArabicETDs-AdditionalWork.pdf: Additional work that was done and isn't included in the report, in PDF format [en]
dc.description.sponsorship: IMLS LG-37-19-0078-19 [en]
dc.identifier.uri: http://hdl.handle.net/10919/96571 [en]
dc.language.iso: en_US [en]
dc.publisher: Virginia Tech [en]
dc.rights: In Copyright [en]
dc.rights.uri: http://rightsstatements.org/vocab/InC/1.0/ [en]
dc.subject: Arabic ETDs [en]
dc.subject: Arabic Text Classification [en]
dc.subject: Machine learning [en]
dc.subject: NLP [en]
dc.title: Otrouha: Automatic Classification of Arabic ETDs [en]
dc.type: Other [en]

Files

Original bundle
Name: ArabicETDs_Code.zip; Size: 25.46 KB
Name: ArabicETDs-Data.zip; Size: 2.07 MB
Name: ArabicETDs-presentation.pdf; Size: 1.12 MB; Format: Adobe Portable Document Format
Name: ArabicETDs-presentation.pptx; Size: 1.31 MB; Format: Microsoft Powerpoint XML
Name: ArabicETDs-Report.zip; Size: 1.32 MB

License bundle
Name: license.txt; Size: 1.5 KB; Format: Item-specific license agreed upon to submission