
dc.contributor.author: Gruss, Richard
dc.contributor.author: Morgado, Daniel
dc.contributor.author: Craun, Nate
dc.contributor.author: Shea-Blymyer, Colin
dc.date.accessioned: 2014-12-13T22:25:37Z
dc.date.available: 2014-12-13T22:25:37Z
dc.date.issued: 2014-12
dc.identifier.uri: http://hdl.handle.net/10919/51133
dc.description: For OutbreakSum, there are Word and PDF versions of the final report, as well as PowerPoint and PDF versions of the final presentation. A ZIP file with an extensive code base also is included. [en_US]
dc.description.abstract: The goal of the fall 2014 Disease Outbreak Project (OutbreakSum) was to develop software for automatically analyzing and summarizing large collections of texts pertaining to disease outbreaks. Although our code was tested on collections about specific diseases (a small one about encephalitis and a large one about Ebola), most of our tools would work on texts about any infectious disease, where the key information relates to locations, dates, number of cases, symptoms, prognosis, and government and healthcare organization interventions. In the course of the project, we developed a code base that performs several key Natural Language Processing (NLP) functions. Tools that could be useful for other Natural Language Generation (NLG) projects include: 1. A framework for developing MapReduce programs in Python that allows for local running and debugging; 2. Tools for document collection cleanup procedures such as small-file removal, duplicate-file removal (based on content hashes), sentence and paragraph tokenization, nonrelevant file removal, and encoding translation; 3. Utilities to simplify and speed up Named Entity Recognition with Stanford NER by using the Java API directly; 4. Utilities to leverage the full extent of the Stanford CoreNLP library, which includes tools for parsing and coreference resolution; 5. Utilities to simplify using the OpenNLP Java library for text processing. By configuring and running a single Java class, you can use OpenNLP to perform part-of-speech tagging and named entity recognition on your entire collection in minutes. We’ve classified the tools available in OutbreakSum into four major modules: 1. Collection Processing; 2. Local Language Processing; 3. MapReduce with Apache Hadoop; 4. Summarization. [en_US]
dc.description.sponsorship: NSF DUE-1141209 and IIS-1319578 [en_US]
dc.language.iso: en_US [en_US]
dc.subject: disease outbreak [en_US]
dc.subject: text classification [en_US]
dc.subject: information retrieval [en_US]
dc.subject: machine learning [en_US]
dc.subject: summarization [en_US]
dc.subject: natural language processing [en_US]
dc.subject: natural language generation [en_US]
dc.subject: text extraction [en_US]
dc.title: OutbreakSum: Automatic Summarization of Texts Relating to Disease Outbreaks [en_US]
dc.type: Article [en_US]
dc.type: Presentation [en_US]
dc.type: Software [en_US]
dc.type: Technical report [en_US]

