Arabic News Article Summarization

Date

2015-05-14

Abstract

This project takes Arabic news articles in PDF form and runs them through a new program that indexes, categorizes, and summarizes them. Each article is summarized by filling out a template of predetermined attributes. The attribute values are obtained in three ways: a named entity recognizer (NER) identifies organizations and people, an LDA algorithm generates topics, and authors and dates are extracted directly from the articles. We use LucidWorks Fusion (a Solr-based system) to index the data and to provide an interface through which users can search and browse the articles along with their summaries; Solr handles the underlying information retrieval. The finished program should enable end users to sift through news articles quickly.
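As an illustration of the template-filling step described above, here is a minimal Python sketch. It is not the project's code: the field names, the `fill_template` and `to_index_doc` helpers, the `PER`/`ORG` label scheme, and the ISO date pattern are all assumptions made for this example. The NER output and the LDA topics are assumed to have been computed elsewhere and are passed in as plain lists.

```python
import re
from dataclasses import dataclass, field, asdict
from typing import List, Optional

# Assumed date format (YYYY-MM-DD); real articles may use other formats.
ISO_DATE = re.compile(r"\b\d{4}-\d{2}-\d{2}\b")


@dataclass
class ArticleSummary:
    """Hypothetical summary template; attribute names are illustrative only."""
    article_id: str
    date: Optional[str] = None
    authors: List[str] = field(default_factory=list)
    people: List[str] = field(default_factory=list)
    organizations: List[str] = field(default_factory=list)
    topics: List[str] = field(default_factory=list)


def extract_date(text: str) -> Optional[str]:
    """Return the first ISO-style date found in the text, or None."""
    match = ISO_DATE.search(text)
    return match.group(0) if match else None


def fill_template(article_id, text, entities, topics, authors=()):
    """Fill the template from precomputed pipeline outputs.

    entities: (surface form, label) pairs from an NER step, where the
              label is assumed to be "PER" or "ORG"; other labels are ignored.
    topics:   topic terms from a topic model such as LDA.
    """
    summary = ArticleSummary(
        article_id=article_id,
        date=extract_date(text),
        authors=list(authors),
        topics=list(topics),
    )
    buckets = {"PER": summary.people, "ORG": summary.organizations}
    for name, label in entities:
        bucket = buckets.get(label)
        # Keep first-seen order and skip duplicate mentions.
        if bucket is not None and name not in bucket:
            bucket.append(name)
    return summary


def to_index_doc(summary: ArticleSummary) -> dict:
    """Flatten the template into a dict that could be sent to a search index."""
    return asdict(summary)
```

The dictionary returned by `to_index_doc` has one key per template attribute, which is the kind of flat document a Solr-based indexer typically accepts; the actual schema used with Fusion is not specified here.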

Keywords

Arabic, Fusion, Solr, Classification, Newspaper articles, Named Entity Recognizer, LDA
