Elasticsearch (ELS) CS5604 Fall 2019

dc.contributor.author: Li, Yuan
dc.contributor.author: Chekuri, Satvik
dc.contributor.author: Hu, Tianrui
dc.contributor.author: Kumar, Soumya Arvind
dc.contributor.author: Gill, Nicholas
dc.date.accessioned: 2020-01-07T20:33:47Z
dc.date.available: 2020-01-07T20:33:47Z
dc.date.issued: 2019-12-12
dc.description.abstract: We are building an information retrieval system that will work as a search engine to support searching, ranking, browsing, and recommendations for two large collections of data. The first collection is part of Virginia Tech's collection of Electronic Theses and Dissertations (ETDs). The Virginia Tech Library has a large collection of ETDs, and an effort is currently under way to digitize the pre-1997 theses and dissertations and load them into VTechWorks. Our data set contains over 30K ETDs. The second collection consists of tobacco settlement documents; there are 14 million documents in this data set. We are using a Ceph container to store and retrieve information. To achieve its goals, the project has six teams: Collection Management ETDs, Collection Management Tobacco Settlement Documents, Elasticsearch, Front-end and Kibana, Integration and Implementation, and Text Analytics and Machine Learning. This report addresses the work performed by the Elasticsearch team. The Elasticsearch team helps to enable searching and browsing, which are supported based on facets associated with information extracted from documents, analysis, classification, clustering, summarization, and other processing. The report describes the goals, an overview, and the process of implementation with Elasticsearch. The Elasticsearch team works closely with the Front-end and Kibana team and the Text Analytics and Machine Learning team. The data ingested into Elasticsearch is provided to the Front-end and Kibana team for further visualization. The report therefore also describes the connections established with the other teams, as a high-level overview of the course project. User manuals are provided for reference by the other teams.
dc.description.notes: ELSFinalReport.pdf - PDF file of the final report; ELSFinalReport.zip - LaTeX source of the final report; ELSPresentation.pdf - PDF file of the final presentation; ELSPresentation.pptx - Editable file of the final presentation; ELSSourceCode.zip - Package of all the Python scripts, HTTP queries, and shell scripts associated with this project
dc.description.sponsorship: IMLS LG-37-19-0078-19
dc.identifier.uri: http://hdl.handle.net/10919/96310
dc.language.iso: en_US
dc.publisher: Virginia Tech
dc.rights: Creative Commons Attribution-NonCommercial-NoDerivs 3.0 United States
dc.rights.uri: http://creativecommons.org/licenses/by-nc-nd/3.0/us/
dc.subject: Information Retrieval
dc.subject: Elasticsearch
dc.title: Elasticsearch (ELS) CS5604 Fall 2019
dc.type: Presentation
dc.type: Report
dc.type: Software

Files

Original bundle
Name: ELSSourceCode.zip; Size: 70.83 KB
Name: ELSPresentation.pptx; Size: 3.75 MB; Format: Microsoft Powerpoint XML
Name: ELSPresentation.pdf; Size: 2.14 MB; Format: Adobe Portable Document Format
Name: ELSFinalReport.zip; Size: 8.27 MB
Name: ELSFinalReport.pdf; Size: 5.67 MB; Format: Adobe Portable Document Format
License bundle
Name: license.txt; Size: 1.5 KB; Format: Item-specific license agreed upon to submission