Named Entity Recognition for IDEAL

dc.contributor.author | Du, Qianzhou | en
dc.contributor.author | Zhang, Xuan | en
dc.date.accessioned | 2015-05-13T01:34:58Z | en
dc.date.available | 2015-05-13T01:34:58Z | en
dc.date.issued | 2015-05-10 | en
dc.description | This project explored how to apply Named Entity Recognition to large Twitter and web page datasets to extract useful entities such as people, organizations, locations, and dates. In addition, this NER utility has been scaled to the MapReduce framework on the Hadoop cluster. A schema and software allow this to be integrated with IDEAL. | en
dc.description.abstract | The term “Named Entity”, first introduced by Grishman and Sundheim, is widely used in Natural Language Processing (NLP). They were focusing on the information extraction task, that is, extracting structured information about company activities and defense-related activities from unstructured text, such as newspaper articles. The essential part of this task is recognizing information elements such as location, person, organization, time, date, money, and percent expressions. Researchers called the sub-task of identifying these entities in unstructured text “Named Entity Recognition” (NER). NER technology has since matured, and there are good tools for this task, such as the Stanford Named Entity Recognizer (SNER), Illinois Named Entity Tagger (INET), Alias-i LingPipe (LIPI), and OpenCalais (OCWS). Each has its own advantages and is designed for particular kinds of data. In this term project, our final goal is to build an NER module for the IDEAL project based on a particular NER tool, such as SNER, and to apply NER to the Twitter and web page data sets. This project report presents our work towards this goal, including literature review, requirements, algorithm, development plan, system architecture, implementation, user manual, and development manual. Further, results are given for multiple collections, along with discussion and plans for the future. | en
dc.description.sponsorship | NSF grant IIS - 1319578, III: Small: Integrated Digital Event Archiving and Library (IDEAL) | en
dc.identifier.uri | http://hdl.handle.net/10919/52254 | en
dc.language.iso | en_US | en
dc.rights | Creative Commons Attribution-ShareAlike 3.0 United States | en
dc.rights.uri | http://creativecommons.org/licenses/by-sa/3.0/us/ | en
dc.subject | Named Entity Recognition | en
dc.subject | Information Extraction | en
dc.subject | Information Retrieval | en
dc.subject | MapReduce | en
dc.subject | Hadoop | en
dc.title | Named Entity Recognition for IDEAL | en
dc.type | Presentation | en
dc.type | Software | en
dc.type | Technical report | en

Files

Original bundle
Showing 1 - 5 of 6
Name | Size | Format | Description
avro-schema.zip | 1.41 KB | Unknown data format |
CS5604-NER.zip | 295.87 MB | Unknown data format | Source code project (in Java)
ReportNER.docx | 678.77 KB | Microsoft Word XML | Project report (docx version)
ReportNER.pdf | 1.09 MB | Adobe Portable Document Format | Project report (pdf version)
PresentationNER.pptx | 245.61 KB | Microsoft Powerpoint XML | Presentation slides (pptx version)
License bundle
Showing 1 - 1 of 1
Name | Size | Format | Description
license.txt | 1.5 KB | Item-specific license agreed upon to submission |