Show simple item record

dc.contributor.author | Cao, Yusheng | en
dc.contributor.author | Mazloom, Reza | en
dc.contributor.author | Ogunleye, Makanjuola | en
dc.date.accessioned | 2020-12-18T03:23:12Z | en
dc.date.available | 2020-12-18T03:23:12Z | en
dc.date.issued | 2020-12-16 | en
dc.identifier.uri | http://hdl.handle.net/10919/101526 | en
dc.description.abstract | With the demand for and abundance of information increasing over the last two decades, generations of computer scientists have been trying to improve the whole process of information searching, retrieval, and storage. As information sources have diversified, users' requirements for the data have also changed drastically, in terms of both usability and performance. Given the growth of the source material and of these requirements, correctly sorting, filtering, and storing data has given rise to many new challenges in the field. With the help of the four other teams on this project, we are developing an information retrieval, analysis, and storage system to retrieve data from Virginia Tech's Electronic Thesis and Dissertation (ETD), Twitter, and Web Page archives. We seek to provide users with an appropriate data research and management tool for accessing specific data. The system will also give certain users the authority to manage and add more data to the system. This project's deliverable will be combined with those of the four other teams to produce a system usable by Virginia Tech's library system to manage, maintain, and analyze these archives. This report introduces the system components and the design decisions behind how the system has been planned and implemented. Our team has developed a front-end web interface that can search, retrieve, and manage three important content collection types: ETDs, tweets, and web pages. The interface incorporates a simple hierarchical user permission system, providing different levels of access to its users. To facilitate the workflow with other teams, we have containerized this system and made it available on the Virginia Tech cloud server. The system also makes use of a dynamic workflow system using a KnowledgeGraph and Apache Airflow, providing high levels of functional extensibility.
This allows curators and researchers to use containerized services for crawling, pre-processing, parsing, and indexing the custom corpora and collections available to them in the system. | en
dc.description.sponsorship | NSF: CMMI-1638207 | en
dc.description.sponsorship | IMLS: LG-37-19-0078-19 | en
dc.description.sponsorship | NSF: OAC-1835660 | en
dc.language.iso | en | en
dc.publisher | Virginia Tech | en
dc.subject | Front-end | en
dc.subject | Information Storage and Retrieval | en
dc.subject | CS5604 | en
dc.subject | Tweets | en
dc.subject | Airflow | en
dc.subject | Web pages | en
dc.subject | ETD | en
dc.subject | Elasticsearch | en
dc.subject | Electronic Theses and Dissertations | en
dc.subject | KnowledgeGraph | en
dc.title | CS5604 (Information Retrieval) Fall 2020 Front-end (FE) Team Project | en
dc.type | Presentation | en
dc.type | Report | en
dc.type | Video | en
dc.description.notes | CS5604F2020FEreport.zip: Project documentation in LaTeX; CS5604F2020FEreport.pdf: Project documentation; CS5604F2020FEpresentation.pdf: Presentation slides; CS5604F2020FEpresentation.pptx: Presentation slides; CS5604F2020FEpresentation-demo.mp4: Video of the prototype demonstration | en

