CS5604 (Information Retrieval) Fall 2020 Front-end (FE) Team Project

dc.contributor.author: Cao, Yusheng
dc.contributor.author: Mazloom, Reza
dc.contributor.author: Ogunleye, Makanjuola
dc.date.accessioned: 2020-12-18T03:23:12Z
dc.date.available: 2020-12-18T03:23:12Z
dc.date.issued: 2020-12-16
dc.description.abstract: As the demand for and abundance of information have grown over the last two decades, generations of computer scientists have worked to improve the whole process of information searching, retrieval, and storage. As information sources have diversified, users' requirements for the data have also changed drastically, in terms of both usability and performance. With the growth of the source material and of these requirements, correctly sorting, filtering, and storing information has given rise to many new challenges in the field. With the help of the four other teams on this project, we are developing an information retrieval, analysis, and storage system to retrieve data from Virginia Tech's Electronic Thesis and Dissertation (ETD), Twitter, and Web Page archives. We seek to provide users with an appropriate data research and management tool for accessing specific data. The system will also give certain users the authority to manage and add more data to the system. This project's deliverable will be combined with those of the four other teams to produce a system usable by Virginia Tech's library system to manage, maintain, and analyze these archives. This report introduces the system components and the design decisions behind how the system has been planned and implemented. Our team has developed a front-end web interface that can search, retrieve, and manage three important content collection types: ETDs, tweets, and web pages. The interface incorporates a simple hierarchical user permission system, providing different levels of access to its users. To facilitate the workflow with other teams, we have containerized this system and made it available on the Virginia Tech cloud server. The system also uses a dynamic workflow system based on a KnowledgeGraph and Apache Airflow, providing a high level of functional extensibility. This allows curators and researchers to use containerized services for crawling, pre-processing, parsing, and indexing the custom corpora and collections that are available to them in the system.
dc.description.notes: CS5604F2020FEreport.zip: Project documentation in LaTeX; CS5604F2020FEreport.pdf: Project documentation; CS5604F2020FEpresentation.pdf: Presentation slides; CS5604F2020FEpresentation.pptx: Presentation slides; CS5604F2020FEpresentation-demo.mp4: Video of the prototype demonstration
dc.description.sponsorship: NSF: CMMI-1638207
dc.description.sponsorship: IMLS: LG-37-19-0078-19
dc.description.sponsorship: NSF: OAC-1835660
dc.identifier.uri: http://hdl.handle.net/10919/101526
dc.language.iso: en
dc.publisher: Virginia Tech
dc.subject: Front-end
dc.subject: Information Storage and Retrieval
dc.subject: CS5604
dc.subject: Tweets
dc.subject: Airflow
dc.subject: Web pages
dc.subject: ETD
dc.subject: Elasticsearch
dc.subject: Electronic Theses and Dissertations
dc.subject: KnowledgeGraph
dc.title: CS5604 (Information Retrieval) Fall 2020 Front-end (FE) Team Project
dc.type: Presentation
dc.type: Report
dc.type: Video

Files

Original bundle
Name:
CS5604F2020FEpresentation-demo.mp4
Size:
244.5 MB
Format:
MP4 Container format for video files
Name:
CS5604F2020FEreport.zip
Size:
26.23 MB
Format:
Name:
CS5604F2020FEreport.pdf
Size:
23.63 MB
Format:
Adobe Portable Document Format
Name:
CS5604F2020FEpresentation.pdf
Size:
174.72 KB
Format:
Adobe Portable Document Format
Name:
CS5604F2020FEpresentation.pptx
Size:
1016.87 KB
Format:
Microsoft Powerpoint XML
License bundle
Name:
license.txt
Size:
1.5 KB
Format:
Item-specific license agreed upon to submission