Arabic News Article Summarization

Date
2015-05-14
Abstract

This project takes Arabic news articles in PDF form and runs them through a new program that indexes, categorizes, and summarizes them. Each article is summarized by filling out a template of predetermined attributes. The attribute values are obtained in three ways: a named entity recognizer (NER) identifies organizations and people, an LDA algorithm generates topics, and the author and date are extracted directly from the article. We use Fusion LucidWorks (a Solr-based system) to index the data and to provide an interface for searching and browsing the articles with their summaries; Solr handles the information retrieval. The final program should enable end users to sift through news articles quickly.
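The template-filling step described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the class `ArticleSummary`, the functions `fill_template` and `to_index_document`, the field names, and the entity labels "PERSON" and "ORG" are all assumptions made for the example. The NER, LDA, and PDF extraction steps are taken as already done, and their outputs are passed in as plain Python values.

```python
import json
from dataclasses import dataclass, field, asdict
from typing import List, Optional, Tuple


@dataclass
class ArticleSummary:
    """Hypothetical summary template; field names are illustrative only."""
    article_id: str
    author: Optional[str] = None
    date: Optional[str] = None  # ISO 8601 string, e.g. "2015-05-14"
    people: List[str] = field(default_factory=list)
    organizations: List[str] = field(default_factory=list)
    topics: List[str] = field(default_factory=list)


def fill_template(
    article_id: str,
    entities: List[Tuple[str, str]],
    topics: List[str],
    author: Optional[str] = None,
    date: Optional[str] = None,
) -> ArticleSummary:
    """Fill the template from values that were already extracted.

    `entities` is a list of (text, label) pairs such as an NER step might
    emit; the labels "PERSON" and "ORG" are assumed for this sketch.
    """
    summary = ArticleSummary(article_id=article_id, author=author, date=date)
    for text, label in entities:
        # Keep the first occurrence of each entity and skip repeats.
        if label == "PERSON" and text not in summary.people:
            summary.people.append(text)
        elif label == "ORG" and text not in summary.organizations:
            summary.organizations.append(text)
    summary.topics = list(topics)
    return summary


def to_index_document(summary: ArticleSummary) -> str:
    """Serialize the filled template as a JSON document for indexing.

    ensure_ascii=False keeps Arabic text readable instead of escaping it.
    """
    return json.dumps(asdict(summary), ensure_ascii=False)
```

A filled template could then be produced with, for example, `fill_template("article-1", [("name", "PERSON")], ["topic"])` and passed to `to_index_document` before being sent to the search index. The placeholder values here are invented for illustration.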

Keywords
Arabic, Fusion, Solr, Classification, Newspaper articles, Named Entity Recognizer, LDA