Otrouha: Automatic Classification of Arabic ETDs
dc.contributor.author | Alotaibi, Fatimah | en |
dc.contributor.author | Abdelrahman, Eman | en |
dc.date.accessioned | 2020-01-24T16:02:52Z | en |
dc.date.available | 2020-01-24T16:02:52Z | en |
dc.date.issued | 2020-01-23 | en |
dc.description.abstract | Electronic theses and dissertations (ETDs) are becoming a new genre of documents that is highly valuable and worth preserving. This has created a sustained need for effective tools that facilitate retrieval from ETD collections. While Arabic ETDs have gained increasing attention, many challenges remain due to the lack of resources and the complexity of information retrieval in the Arabic language. This project therefore focuses on making Arabic ETDs more accessible by facilitating browsing and searching. The aim is to build an automated classifier that categorizes an Arabic ETD based on its abstract. Our raw dataset was obtained by crawling the AskZad digital library website. We then applied pre-processing techniques to the dataset to make it suitable for our classification process. We developed automatic classification methods using various classifiers: Support Vector Machines and SVC, Random Forest, and Decision Trees. We then used an ensemble classifier built from the two classifiers that achieved the highest accuracy. Finally, we applied commonly used evaluation techniques, including 10-fold cross-validation. The results show better performance for binary classification, with an average accuracy of 68% per category, whereas multiclass classification performed poorly, with an average accuracy of 24%. | en |
dc.description.notes | ArabicETDs_Code.zip: Python code covering data scraping from the AskZad Digital Library, preprocessing of the raw data, and the classification process; ArabicETDs-Data.zip: data scraped from the AskZad Digital Library (original abstracts of ETDs and preprocessed, "lemmatized and filtered", abstracts); ArabicETDs-presentation.pdf: final presentation of the Otrouha project in PDF format; ArabicETDs-presentation.pptx: final presentation of the Otrouha project in pptx format; ArabicETDs-Report.zip: final report of the Otrouha project; ArabicETDs-Report.pdf: final report of the Otrouha project in PDF format; ArabicETDs-AdditionalWork.docx: additional work that was done and isn't included in the report, in an editable format; ArabicETDs-AdditionalWork.pdf: additional work that was done and isn't included in the report, in PDF format | en |
dc.description.sponsorship | IMLS LG-37-19-0078-19 | en |
dc.identifier.uri | http://hdl.handle.net/10919/96571 | en |
dc.language.iso | en_US | en |
dc.publisher | Virginia Tech | en |
dc.rights | In Copyright | en |
dc.rights.uri | http://rightsstatements.org/vocab/InC/1.0/ | en |
dc.subject | Arabic ETDs | en |
dc.subject | Arabic Text Classification | en |
dc.subject | Machine learning | en |
dc.subject | NLP | en |
dc.title | Otrouha: Automatic Classification of Arabic ETDs | en |
dc.type | Other | en |
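The evaluation setup named in the abstract (per-category binary classification scored with 10-fold cross-validation) can be illustrated with a minimal, standard-library-only sketch. It is not the project's code: the toy abstracts, the category names, and the helper names (`k_fold_indices`, `to_binary`, `majority_baseline`, `cross_validated_accuracy`) are hypothetical, and a trivial majority-class baseline stands in for the SVM, Random Forest, and Decision Tree classifiers.

```python
import random
from collections import Counter


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds of near-equal size.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test_idx = folds[i]
        train_idx = [j for n, fold in enumerate(folds) if n != i for j in fold]
        yield train_idx, test_idx


def to_binary(labels, category):
    """One-vs-rest relabeling: 1 if the item belongs to `category`, else 0."""
    return [1 if label == category else 0 for label in labels]


def majority_baseline(train_texts, train_labels):
    """Hypothetical stand-in classifier: always predicts the most common training label."""
    most_common = Counter(train_labels).most_common(1)[0][0]
    return lambda _text: most_common


def cross_validated_accuracy(texts, labels, fit, k=10):
    """Mean accuracy over k folds; fit(train_texts, train_labels) returns a predict function."""
    scores = []
    for train_idx, test_idx in k_fold_indices(len(texts), k):
        predict = fit([texts[i] for i in train_idx], [labels[i] for i in train_idx])
        correct = sum(predict(texts[i]) == labels[i] for i in test_idx)
        scores.append(correct / len(test_idx))
    return sum(scores) / len(scores)


# Toy data (assumed, not from the AskZad dataset): 30 placeholder abstracts, 3 categories.
texts = [f"abstract {i}" for i in range(30)]
labels = ["law"] * 15 + ["medicine"] * 10 + ["education"] * 5

# Multiclass task: one label per abstract.
multiclass_acc = cross_validated_accuracy(texts, labels, majority_baseline)

# Binary task for a single category: "education" versus everything else.
education_acc = cross_validated_accuracy(
    texts, to_binary(labels, "education"), majority_baseline
)
print(f"multiclass: {multiclass_acc:.2f}, education vs rest: {education_acc:.2f}")
```

Any real classifier can be dropped in by passing a different `fit` function with the same signature; reporting a baseline like this next to the trained models makes per-category accuracy figures easier to interpret.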
Files
Original bundle
1 - 5 of 8
- Name: ArabicETDs-presentation.pdf
- Size: 1.12 MB
- Format: Adobe Portable Document Format
License bundle
1 - 1 of 1
- Name: license.txt
- Size: 1.5 KB
- Format: Item-specific license agreed upon to submission
- Description: