This collection contains the final projects of students in various offerings of the course Computer Science 5604: Information Retrieval, taught by Professor Ed Fox. The course covers analyzing, indexing, representing, storing, searching, retrieving, processing, and presenting information and documents using fully automatic systems. The information may be in the form of text, hypertext, multimedia, or hypermedia. The systems are based on various models (e.g., Boolean logic, fuzzy logic, and probability theory) and are implemented using inverted files, relational thesauri, special hardware, and other approaches. The course also covers evaluating the systems' efficiency and effectiveness.

Recent Submissions

  • Integration and Implementation (INT) CS 5604 F2020 

    Hicks, Alexander; Thazhath, Mohit; Gupta, Suraj; Long, Xingyu; Poland, Cherie; Hsieh, Hsinhan; Mahajan, Yash (Virginia Tech, 2020-12-18)
    The first major goal of this project is to build a state-of-the-art information storage, retrieval, and analysis system that utilizes the latest technology and industry methods. This system is leveraged to accomplish ...
  • CS 5604: Information Storage and Retrieval - Webpages (WP) Team 

    Barry-Straume, Jostein; Vives, Cristian; Fan, Wentao; Tan, Peng; Zhang, Shuaicheng; Hu, Yang; Wilson, Tishauna (Virginia Tech, 2020-12-18)
    The first major goal of this project is to build a state-of-the-art information retrieval engine for searching webpages and for opening up access to existing and new webpage collections resulting from Digital Library ...
  • CS5604 (Information Retrieval) Fall 2020 Front-end (FE) Team Project 

    Cao, Yusheng; Mazloom, Reza; Ogunleye, Makanjuola (Virginia Tech, 2020-12-16)
    With the demand and abundance of information increasing over the last two decades, generations of computer scientists are trying to improve the whole process of information searching, retrieval, and storage. With the ...
  • CS 5604 2020: Information Storage and Retrieval TWT - Tweet Collection Management Team 

    Baadkar, Hitesh; Chimote, Pranav; Hicks, Megan; Juneja, Ikjot; Kusuma, Manisha; Mehta, Ujjval; Patil, Akash; Sharma, Irith (Virginia Tech, 2020-12-16)
    The Tweet Collection Management (TWT) Team aims to ingest 5 billion tweets, clean this data, analyze the metadata present, extract key information, classify tweets into categories, and finally, index these tweets into ...
  • CS5604 Fall 2020: Electronic Thesis and Dissertation (ETD) Team 

    Fan, Jiahui; Hardy, Nicolas; Furman, Samuel; Manzoor, Javaid; Nguyen, Alexander; Raghuraman, Aarathi (Virginia Tech, 2020-12-16)
    The Fall 2020 CS 5604 (Information Storage and Retrieval) class, led by Dr. Edward Fox, is building an information retrieval and analysis system that supports electronic theses and dissertations, tweets, and webpages. We ...
  • Integration and Implementation (INT) CS5604 Fall 2019 

    Agarwal, Rahul; Albahar, Hadeel; Roth, Eric; Sen, Malabika; Yu, Lixing (Virginia Tech, 2019-12-11)
    The first major goal of this project is to build a state-of-the-art information storage, retrieval, and analysis system that utilizes the latest technology and industry methods. This system is leveraged to accomplish the ...
  • Collection Management of Electronic Theses and Dissertations (CME) CS5604 Fall 2019 

    Kaushal, Kulendra Kumar; Kulkarni, Rutwik; Sumant, Aarohi; Wang, Chaoran; Yuan, Chenhan; Yuan, Liling (Virginia Tech, 2019-12-23)
    The class "CS 5604: Information Storage and Retrieval" in the fall of 2019 is divided into six teams to enhance the usability of the corpus of electronic theses and dissertations maintained by Virginia Tech University ...
  • Collection Management Tobacco Settlement Documents (CMT) CS5604 Fall 2019 

    Muhundan, Sushmethaa; Bendelac, Alon; Zhao, Yan; Svetovidov, Andrei; Biswas, Debasmita; Marin Thomas, Ashin (Virginia Tech, 2019-12-11)
    Consumption of tobacco causes health issues, both mental and physical. Despite this widely known fact, tobacco companies had sustained their huge presence in the market over the past century owing to a variety of successful ...
  • Front-End Kibana (FEK) CS5604 Fall 2019 

    Powell, Edward; Liu, Han; Huang, Rong; Sun, Yanshen; Xu, Chao (Virginia Tech, 2020-01-13)
    During the last two decades, web search engines have been driven to new quality levels due to the continuous efforts made to optimize the effectiveness of information retrieval. More and more people are becoming satisfied ...
  • Elasticsearch (ELS) CS5604 Fall 2019 

    Li, Yuan; Chekuri, Satvik; Hu, Tianrui; Kumar, Soumya Arvind; Gill, Nicholas (Virginia Tech, 2019-12-12)
    We are building an Information and Retrieval System that will work as a search engine to support searching, ranking, browsing, and recommendations for two large collections of data. The first collection is part of Virginia ...
  • Text Analytics and Machine Learning (TML) CS5604 Fall 2019 

    Mansur, Rifat Sabbir; Mandke, Prathamesh; Gong, Jiaying; Bharadwaj, Sandhya M.; Juvekar, Adheesh Sunil; Chougule, Sharvari (Virginia Tech, 2019-12-29)
    In order to use the burgeoning amount of data for knowledge discovery, it is becoming increasingly important to build efficient and intelligent information retrieval systems. The challenge in information retrieval lies ...
  • Collection Management Tweets Project Fall 2017 

    Khaghani, Farnaz; Zeng, Junkai; Bhuiyan, Momen; Tabassum, Anika; Bandyopadhyay, Payel (Virginia Tech, 2018-01-17)
    The report included in this submission documents the work by the Collection Management Tweets (CMT) team, which is a part of the bigger effort in CS5604 on building a state-of-the-art information retrieval and analysis ...
  • CS5604 Information Storage and Retrieval Fall 2017 Solr Report 

    Kumar, Abhinav; Bangad, Anand; Robertson, Jeff; Garg, Mohit; Ramesh, Shreyas; Mi, Siyu; Wang, Xinyue; Wang, Yu (Virginia Tech, 2018-01-15)
    The Digital Library Research Laboratory (DLRL) has collected over 1.5 billion tweets and millions of webpages for the Integrated Digital Event Archiving and Library (IDEAL) and Global Event Trend Archive Research (GETAR) ...
  • CS5604 Fall 2017 Clustering and Topic Analysis 

    Baghudana, Ashish; Ahuja, Aman; Bellam, Pavan; Chintha, Rammohan; Sambaturu, Pratyush; Malpani, Ashish; Shetty, Shruti; Yang, Mo (Virginia Tech, 2018-01-13)
    One of the key objectives of the CS-5604 course titled Information Storage and Retrieval is to build a pipeline for a state-of-the-art retrieval system for the Integrated Digital Event Archiving and Library (IDEAL) and ...
  • CS5604 Fall 2017 Classification Team Submission 

    Azizi, Ahmadreza; Mulchandani, Deepika; Naik, Amit; Ngo, Khai; Patil, Suraj; Vezvaee, Arian; Yang, Robin (Virginia Tech, 2018-01-03)
    This project submission includes the work of the 'Classification' team of the CS5604 'Information Storage and Retrieval' course of Fall 2017 towards the GETAR project. Classification of the GETAR data would allow users to ...
  • Collection Management Webpages 

    Eagan, Mackenzie; Liang, Xiao; Michael, Louis; Patil, Supritha (Virginia Polytechnic Institute and State University, 2017-12-25)
    The Collection Management Webpages team is responsible for collecting, processing, and storing webpages from different sources. Our team worked on familiarizing ourselves with the necessary tools and data required to produce ...
  • CS5604: Information and Storage Retrieval Fall 2017 - FE (Front-End Team)

    Chon, Jieun; Wang, Haitao; Bian, Yali; Niu, Shuo (Virginia Tech, 2017-12-24)
    Social media and Web data are becoming important sources of information for researchers to monitor and study global events. GETAR, led by Dr. Edward Fox, is a project aiming to collect, organize, browse, visualize, ...
  • Collection Management Webpages - Fall 2016 CS5604 

    Dao, Tung; Wakeley, Christopher; Weigang, Liu (Virginia Tech, 2017-03-23)
    The Collection Management Webpages (CMW) team is responsible for collecting, processing and storing webpages from different sources including tweets from multiple collections and contributors, such as those related to ...
  • CS5604: Information and Storage Retrieval Fall 2016 - CMT (Collection Management Tweets)

    Wagner, Mitchell J.; Abidi, Faiz; Fan, Shuangfei (Virginia Tech, 2016-12-08)
    As the Collection Management Tweets team in the Fall 2016 CS5604 class, we were responsible for processing >1.2 billion tweets, including data transfer, noise reduction, tweet augmentation, and storage via several technologies. ...
  • CS5604 Fall 2016 Classification Team Final Report 

    Williamson, Eric R.; Chakravarty, Saurabh (Virginia Tech, 2016-12-08)
    Content is generated on the Web at an exponential rate. The type of content varies from text on a traditional webpage to text on social media portals (e.g., social network sites and microblogs). One such example of social ...
