Teaching Natural Language Processing through Big Data Text Summarization with Problem-Based Learning

dc.contributor.author: Li, Liuqing [en]
dc.contributor.author: Geissinger, Jack H. [en]
dc.contributor.author: Ingram, William A. [en]
dc.contributor.author: Fox, Edward A. [en]
dc.date.accessioned: 2020-10-12T15:26:33Z [en]
dc.date.available: 2020-10-12T15:26:33Z [en]
dc.date.issued: 2020 [en]
dc.description.abstract: Natural language processing (NLP) covers a large number of topics and tasks related to data and information management, leading to a complex and challenging teaching process. Meanwhile, problem-based learning is a teaching technique specifically designed to motivate students to learn efficiently, work collaboratively, and communicate effectively. With this aim, we developed a problem-based learning course for both undergraduate and graduate students to teach NLP. We provided student teams with big data sets, basic guidelines, cloud computing resources, and other aids to help different teams in summarizing two types of big collections: Web pages related to events, and electronic theses and dissertations (ETDs). Student teams then deployed different libraries, tools, methods, and algorithms to solve the task of big data text summarization. Summarization is an ideal problem to address learning NLP since it involves all levels of linguistics, as well as many of the tools and techniques used by NLP practitioners. The evaluation results showed that all teams generated coherent and readable summaries. Many summaries were of high quality and accurately described their corresponding events or ETD chapters, and the teams produced them along with NLP pipelines in a single semester. Further, both undergraduate and graduate students gave statistically significant positive feedback, relative to other courses in the Department of Computer Science. Accordingly, we encourage educators in the data and information management field to use our approach or similar methods in their teaching and hope that other researchers will also use our data sets and synergistic solutions to approach the new and challenging tasks we addressed. [en]
dc.description.sponsorship: Thanks go to the US National Science Foundation for its support of the Coordinated, Behaviorally Aware Recovery for Transportation and Power Disruptions (CBAR-tpd), Global Event and Trend Archive Research (GETAR), and Integrated Digital Event Archiving and Library (IDEAL) projects, through grants CMMI-1638207, IIS-1319578, and IIS-1619028, as well as IIS-1619371 to partner Internet Archive. Thanks go to the Institute for Museum and Library Services (IMLS), for the support of Opening Books and the National Corpus of Graduate Research, through LG-37-19-0078-19. We also thank the student teams in CS4984/CS5984 for their work. [en]
dc.format.mimetype: application/pdf [en]
dc.identifier.doi: https://doi.org/10.2478/dim-2020-0003 [en]
dc.identifier.issue: 1 [en]
dc.identifier.uri: http://hdl.handle.net/10919/100454 [en]
dc.identifier.volume: 4 [en]
dc.language.iso: en [en]
dc.publisher: Sciendo [en]
dc.rights: Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International [en]
dc.rights.uri: http://creativecommons.org/licenses/by-nc-nd/4.0/ [en]
dc.subject: information system education [en]
dc.subject: computer science education [en]
dc.subject: problem-based learning [en]
dc.subject: natural language processing [en]
dc.subject: NLP [en]
dc.subject: big data text analytics [en]
dc.subject: Machine learning [en]
dc.subject: deep learning [en]
dc.title: Teaching Natural Language Processing through Big Data Text Summarization with Problem-Based Learning [en]
dc.title.serial: Data and Information Management [en]
dc.type: Article - Refereed [en]
dc.type.dcmitype: Text [en]

Files

Original bundle
Name: out.pdf
Size: 2.11 MB
Format: Adobe Portable Document Format
License bundle
Name: license.txt
Size: 1.5 KB
Format: Item-specific license agreed upon to submission