OutbreakSum: Automatic Summarization of Texts Relating to Disease Outbreaks

dc.contributor.author: Gruss, Richard [en]
dc.contributor.author: Morgado, Daniel [en]
dc.contributor.author: Craun, Nate [en]
dc.contributor.author: Shea-Blymyer, Colin [en]
dc.date.accessioned: 2014-12-13T22:25:37Z [en]
dc.date.available: 2014-12-13T22:25:37Z [en]
dc.date.issued: 2014-12 [en]
dc.description: For OutbreakSum, there are Word and PDF versions of the final report, as well as PowerPoint and PDF versions of the final presentation. A ZIP file with an extensive code base also is included. [en]
dc.description.abstract: The goal of the fall 2014 Disease Outbreak Project (OutbreakSum) was to develop software for automatically analyzing and summarizing large collections of texts pertaining to disease outbreaks. Although our code was tested on collections about specific diseases (a small one about encephalitis and a large one about Ebola), most of our tools would work on texts about any infectious disease, where the key information relates to locations, dates, number of cases, symptoms, prognosis, and interventions by government and healthcare organizations. In the course of the project, we developed a code base that performs several key Natural Language Processing (NLP) functions. Tools that could be useful for other Natural Language Generation (NLG) projects include: 1. A framework for developing MapReduce programs in Python that allows for local running and debugging; 2. Tools for document collection cleanup procedures such as small-file removal, duplicate-file removal (based on content hashes), sentence and paragraph tokenization, non-relevant file removal, and encoding translation; 3. Utilities to simplify and speed up Named Entity Recognition with Stanford NER by using the Java API directly; 4. Utilities to leverage the full extent of the Stanford CoreNLP library, which includes tools for parsing and coreference resolution; 5. Utilities to simplify using the OpenNLP Java library for text processing. By configuring and running a single Java class, you can use OpenNLP to perform part-of-speech tagging and named entity recognition on your entire collection in minutes. We've classified the tools available in OutbreakSum into four major modules: 1. Collection Processing; 2. Local Language Processing; 3. MapReduce with Apache Hadoop; 4. Summarization. [en]
dc.description.sponsorship: NSF DUE-1141209 and IIS-1319578 [en]
dc.identifier.uri: http://hdl.handle.net/10919/51133 [en]
dc.language.iso: en_US [en]
dc.rights: In Copyright [en]
dc.rights.uri: http://rightsstatements.org/vocab/InC/1.0/ [en]
dc.subject: disease outbreak [en]
dc.subject: text classification [en]
dc.subject: information retrieval [en]
dc.subject: Machine learning [en]
dc.subject: summarization [en]
dc.subject: natural language processing [en]
dc.subject: natural language generation [en]
dc.subject: text extraction [en]
dc.title: OutbreakSum: Automatic Summarization of Texts Relating to Disease Outbreaks [en]
dc.type: Article [en]
dc.type: Presentation [en]
dc.type: Software [en]
dc.type: Technical report [en]
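
The abstract mentions collection cleanup by small-file removal and duplicate-file removal based on content hashes. The following is a minimal Python sketch of that general idea, not the code shipped in outbreaksum_code.zip. The function names `file_digest` and `clean_collection`, the 200-byte size threshold, and the choice of SHA-256 are all assumptions made for illustration.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file's bytes, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def clean_collection(root: Path, min_bytes: int = 200) -> tuple[list[Path], list[Path]]:
    """Split the files under root into (kept, dropped).

    A file is dropped when it is smaller than min_bytes (hypothetical
    threshold) or when an earlier file, in sorted path order, has
    byte-identical content.
    """
    seen: set[str] = set()
    kept: list[Path] = []
    dropped: list[Path] = []
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        # Small-file removal: skip files too short to carry useful text.
        if path.stat().st_size < min_bytes:
            dropped.append(path)
            continue
        # Duplicate removal: keep only the first file with a given hash.
        digest = file_digest(path)
        if digest in seen:
            dropped.append(path)
        else:
            seen.add(digest)
            kept.append(path)
    return kept, dropped
```

The sketch only reports which files would be kept or dropped and deletes nothing, so a collection can be inspected before any cleanup is committed. Hashing raw bytes catches exact copies only; near-duplicates that differ in whitespace or encoding would need normalization first.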

Files

Original bundle

Name: outbreaksum_code.zip
Size: 265.05 MB
Format:
Description: OutbreakSum Code

Name: OutbreakSum-FinalReport.docx
Size: 1.2 MB
Format: Microsoft Word XML
Description: OutbreakSum Final Report, Word version

Name: OutbreakSum-FinalReport.pdf
Size: 1.63 MB
Format: Adobe Portable Document Format
Description: OutbreakSum Final Report, PDF Version

Name: OutbreakSum-FinalPresentation.pptx
Size: 353.21 KB
Format: Microsoft Powerpoint XML
Description: OutbreakSum Final Presentation, Powerpoint Version

Name: OutbreakSum-FinalPresentation.pdf
Size: 440.14 KB
Format: Adobe Portable Document Format
Description: OutbreakSum Final Presentation, PDF Version

License bundle

Name: license.txt
Size: 1.5 KB
Format: Item-specific license agreed upon to submission
Description:

Version History

Version    Date                   Summary
1*         2014-12-13 22:25:37
* Selected version