Text Data Mining Studio ETDs

dc.contributor.author: Rielage, Robert [en]
dc.contributor.author: Brinkley, Hannah [en]
dc.contributor.author: Dy, Christian [en]
dc.contributor.author: Vaanjilnorov, Jason [en]
dc.date.accessioned: 2020-05-13T23:17:40Z [en]
dc.date.available: 2020-05-13T23:17:40Z [en]
dc.date.issued: 2020-05-11 [en]
dc.description.abstract: The goal of the Text Data Mining Studio ETD project was to develop a software system that allows less technically minded researchers to easily access a vast amount of data for text data analytics. Specifically, the problem that our team addressed was the lack of a centralized tool for analyzing a large number of text files, which are valuable data for machine learning and other forms of analytics on long-form text documents. Our team created a centralized tool using Elasticsearch, JupyterHub, and Jupyter Notebooks, in a Docker Compose architecture, to provide this service to researchers. We envision this tool being used by researchers whose work involves the ingestion and analysis of large bodies of text. This work could involve producing deep-learning-based systems for textual analysis and search, for automatic abstract production, or perhaps for trend tracking across documents widely separated in time. The tool is intended to be a flexible foundation for these and other research tasks. During the process of producing this tool, our team learned a great deal about Docker systems, compartmentalization, and JupyterHub, as well as about working in a widely distributed team using virtual meetings. Unfortunately, the tool is not yet complete: the Elasticsearch systems are not linked to the JupyterHub frontend. We are hopeful that, given the information outlined in the report, completing the tool will be possible with future work. [en]
dc.description.notes: The following documents are included in this item. Project Report: This report includes an explanation of the requirements given to us by our client, the design of our project, specific implementation information, a User’s Manual, a Developer’s Manual, the lessons we learned as a team in the process of working on this project, and a short explanation of recommended extensibility. Project Presentation: A short presentation on the project, containing a demonstration and an overview of much of the content in the report. Project Demonstration: A short demo of the project, originally used as a prepackaged part of the presentation. Project Files: Dockerfile (Docker execution file); Docker-compose.yml (Docker configuration file); load_metadata.ipynb (base Python code for the JupyterHub system). [en]
dc.description.sponsorship: IMLS LG-37-19-0078-19 [en]
dc.identifier.uri: http://hdl.handle.net/10919/98252 [en]
dc.language.iso: en_US [en]
dc.publisher: Virginia Tech [en]
dc.rights: Creative Commons CC0 1.0 Universal Public Domain Dedication [en]
dc.rights.uri: http://creativecommons.org/publicdomain/zero/1.0/ [en]
dc.subject: ETD [en]
dc.subject: OCR [en]
dc.subject: Elasticsearch [en]
dc.subject: Jupyter [en]
dc.subject: JupyterHub [en]
dc.subject: Jupyter Notebooks [en]
dc.subject: Docker [en]
dc.subject: Docker Compose [en]
dc.title: Text Data Mining Studio ETDs [en]
dc.type: Presentation [en]
dc.type: Report [en]
dc.type: Video [en]

Files

Original bundle
Name:
TextDataMiningStudioETDAnalysisToolFiles.zip
Size:
4.79 KB
Format:
Name:
TextDataMiningStudioETDAnalysisToolDemonstration.mp4
Size:
5.32 MB
Format:
MP4 Container format for video files
Name:
TextDataMiningStudioETDAnalysisToolPresentation.pdf
Size:
755.45 KB
Format:
Adobe Portable Document Format
Name:
TextDataMiningStudioETDAnalysisToolPresentation.pptx
Size:
694.45 KB
Format:
Microsoft Powerpoint XML
Name:
TextDataMiningStudioETDAnalysisToolReport.docx
Size:
519.95 KB
Format:
Microsoft Word XML
License bundle
Name:
license.txt
Size:
1.5 KB
Format:
Item-specific license agreed upon to submission