Integration and Implementation (INT) CS5604 Fall 2019

dc.contributor.author: Agarwal, Rahul [en]
dc.contributor.author: Albahar, Hadeel [en]
dc.contributor.author: Roth, Eric [en]
dc.contributor.author: Sen, Malabika [en]
dc.contributor.author: Yu, Lixing [en]
dc.date.accessioned: 2020-01-18T19:03:07Z [en]
dc.date.available: 2020-01-18T19:03:07Z [en]
dc.date.issued: 2019-12-11 [en]
dc.description.abstract: The first major goal of this project is to build a state-of-the-art information storage, retrieval, and analysis system that utilizes the latest technology and industry methods. This system is leveraged to accomplish the second major goal, supporting modern search and browse capabilities for two major content collections: (1) 200,000 ETDs (electronic theses and dissertations), and (2) 14 million settlement documents from the lawsuit wherein 39 U.S. states sued the major tobacco companies. The backbone of the information system is a Docker container cluster running with Rancher and Kubernetes. Information retrieval and visualization are accomplished with containers for Elasticsearch and Kibana, respectively. In addition to traditional searching and browsing, the system supports full-text and metadata searching. Search results include facets as a modern means of browsing among related documents. The system exercises text analysis and machine learning to reveal new properties of collection data. These new properties assist in the generation of available facets. Recommendations are also presented with search results, based on associations among documents and on logged user activity. The information system is co-designed by 6 teams of Virginia Tech graduate students, all members of the same computer science class, CS 5604. Although the project is an academic exercise, it is the practice of the teams to work and interact as though they are groups within a company developing a product. These are the teams on this project: Collection Management ETDs (CME), Collection Management Tobacco Settlement Documents (CMT), Elasticsearch (ELS), Front-end and Kibana (FEK), Integration and Implementation (INT), and Text Analysis and Machine Learning (TML). This submission focuses on the work of the Integration (INT) team, which creates and administers Docker containers for each team in addition to administering the cluster infrastructure.
Each container is a customized application environment that is specific to the needs of the corresponding team. For example, the ELS team's container environment includes Elasticsearch with its associated internal database. INT also administers the integration of the Ceph data storage system into the CS Department Cloud and provides support for interactions between containers and Ceph. During the formative stages of development, INT also has a role in guiding team evaluations of prospective container components. Beyond the project's formative stages, INT has the responsibility of deploying containers in a development environment according to specifications mutually agreed upon with each team. The development process is fluid. INT services team requests for new containers and updates to existing containers in a continuous integration process until the first system testing environment is completed. During the development stage, INT also collaborates with the CME and CMT teams on the data pipeline subsystems for the ingestion and processing of new collection documents. With the testing environment established, the focus of the INT team shifts toward gathering system performance data and making any necessary systemic adjustments based on the analysis of testing results. Finally, INT provides a production distribution that includes all embedded Docker containers and sub-embedded Git source code repositories. INT archives this distribution on Docker Hub and deploys it on the Virginia Tech CS Cloud. [en]
dc.description.notes: This submission includes the INT final report in PDF, along with its Overleaf source, and the INT final class presentation in PDF and PPTX. [en]
dc.description.sponsorship: IMLS LG-37-19-0078-19 [en]
dc.identifier.uri: http://hdl.handle.net/10919/96488 [en]
dc.language.iso: en_US [en]
dc.publisher: Virginia Tech [en]
dc.rights: Creative Commons Attribution-ShareAlike 3.0 United States [en]
dc.rights.uri: http://creativecommons.org/licenses/by-sa/3.0/us/ [en]
dc.subject: CICD [en]
dc.subject: containerization [en]
dc.subject: IR on a Kubernetes Cluster [en]
dc.subject: Containerized IR [en]
dc.subject: IR Rancher Administration [en]
dc.subject: Tobacco Settlement Documents IR [en]
dc.subject: ETDs IR [en]
dc.subject: Docker [en]
dc.subject: IR on containers [en]
dc.subject: DevOps in IR [en]
dc.subject: CI/CD in IR [en]
dc.subject: CS Cloud [en]
dc.subject: Kafka [en]
dc.subject: Virginia Tech CS Cloud [en]
dc.subject: Rancher [en]
dc.subject: CI/CD [en]
dc.title: Integration and Implementation (INT) CS5604 Fall 2019 [en]
dc.type: Presentation [en]
dc.type: Report [en]

Files

Original bundle
Name:
INTpresentation.pptx
Size:
1.82 MB
Format:
Microsoft Powerpoint XML
Name:
INTpresentation.pdf
Size:
975.11 KB
Format:
Adobe Portable Document Format
Name:
INTreport.pdf
Size:
2.55 MB
Format:
Adobe Portable Document Format
Name:
INTreportOverleaf.zip
Size:
3.84 MB
License bundle
Name:
license.txt
Size:
1.5 KB
Format:
Item-specific license agreed upon to submission