This collection contains the final projects of students in various offerings of the course Computer Science 5604: Information Retrieval, taught by Professor Ed Fox. The course covers analyzing, indexing, representing, storing, searching, retrieving, processing, and presenting information and documents using fully automatic systems. The information may be in the form of text, hypertext, multimedia, or hypermedia. The systems are based on various models (e.g., Boolean logic, fuzzy logic, and probability theory) and are implemented using inverted files, relational thesauri, special hardware, and other approaches. The course also covers evaluation of the systems' efficiency and effectiveness.

Recent Submissions

  • Collection Management Tweets Project Fall 2017 

    Khaghani, Farnaz; Zeng, Junkai; Bhuiyan, Momen; Tabassum, Anika; Bandyopadhyay, Payel (Virginia Tech, 2018-01-17)
    The report included in this submission documents the work by the Collection Management Tweets (CMT) team, which is a part of the bigger effort in CS5604 on building a state-of-the-art information retrieval and analysis ...
  • CS5604 Information Storage and Retrieval Fall 2017 Solr Report 

    Kumar, Abhinav; Bangad, Anand; Robertson, Jeff; Garg, Mohit; Ramesh, Shreyas; Mi, Siyu; Wang, Xinyue; Wang, Yu (Virginia Tech, 2018-01-15)
    The Digital Library Research Laboratory (DLRL) has collected over 1.5 billion tweets and millions of webpages for the Integrated Digital Event Archiving and Library (IDEAL) and Global Event Trend Archive Research (GETAR) ...
  • CS5604 Fall 2017 Clustering and Topic Analysis 

    Baghudana, Ashish; Ahuja, Aman; Bellam, Pavan; Chintha, Rammohan; Sambaturu, Pratyush; Malpani, Ashish; Shetty, Shruti; Yang, Mo (Virginia Tech, 2018-01-13)
    One of the key objectives of the CS-5604 course titled Information Storage and Retrieval is to build a pipeline for a state-of-the-art retrieval system for the Integrated Digital Event Archiving and Library (IDEAL) and ...
  • CS5604 Fall 2017 Classification Team Submission 

    Azizi, Ahmadreza; Mulchandani, Deepika; Naik, Amit; Ngo, Khai; Patil, Suraj; Vezvaee, Arian; Yang, Robin (Virginia Tech, 2018-01-03)
    This project submission includes the work of the 'Classification' team of the CS5604 'Information Storage and Retrieval' course of Fall 2017 towards the GETAR project. Classification of the GETAR data would allow users to ...
  • Collection Management Webpages 

    Eagan, Mackenzie; Liang, Xiao; Michael, Louis; Patil, Supritha (Virginia Polytechnic Institute and State University, 2017-12-25)
    The Collection Management Webpages team is responsible for collecting, processing, and storing webpages from different sources. Our team worked on familiarizing ourselves with the necessary tools and data required to produce ...
  • CS5604: Information and Storage Retrieval Fall 2017 - FE (Front-End Team)

    Chon, Jieun; Wang, Haitao; Bian, Yali; Niu, Shuo (Virginia Tech, 2017-12-24)
    Social media and Web data are becoming important sources of information for researchers to monitor and study global events. GETAR, led by Dr. Edward Fox, is a project aiming to collect, organize, browse, visualize, ...
  • Collection Management Webpages - Fall 2016 CS5604 

    Dao, Tung; Wakeley, Christopher; Weigang, Liu (Virginia Tech, 2017-03-23)
    The Collection Management Webpages (CMW) team is responsible for collecting, processing and storing webpages from different sources including tweets from multiple collections and contributors, such as those related to ...
  • CS5604: Information and Storage Retrieval Fall 2016 - CMT (Collection Management Tweets)

    Wagner, Mitchell J.; Abidi, Faiz; Fan, Shuangfei (Virginia Tech, 2016-12-08)
    As the Collection Management Tweets team in the Fall 2016 CS5604 class, we were responsible for processing >1.2 billion tweets, including data transfer, noise reduction, tweet augmentation, and storage via several technologies. ...
  • CS5604 Fall 2016 Classification Team Final Report 

    Williamson, Eric R.; Chakravarty, Saurabh (Virginia Tech, 2016-12-08)
    Content is generated on the Web at an exponential rate. The type of content varies from text on a traditional webpage to text on social media portals (e.g., social network sites and microblogs). One such example of social ...
  • Clustering and Topic Analysis in CS 5604 Information Retrieval Fall 2016 

    Bartolome, Abigail; Islam, MD; Vundekode, Soumya (Virginia Tech, 2016-12-08)
    The IDEAL (Integrated Digital Event Archiving and Library) and Global Event and Trend Archive Research (GETAR) projects aim to build a robust Information Retrieval (IR) system by retrieving tweets and webpages from social ...
  • CS 5604 INFORMATION STORAGE AND RETRIEVAL Front-End Team Fall 2016 Final Report 

    Kohler, Rachel; Tasooji, Reza; Sullivan, Patrick (Virginia Tech, 2016-12-08)
    Information Retrieval systems are a common tool for building research and disseminating knowledge. For this to be possible, these systems must be able to effectively show varying amounts of relevant information to the ...
  • CS5604 Fall 2016 Solr Team Project Report 

    Li, Liuqing; Pillai, Anusha; Wang, Ye; Tian, Ke (Virginia Tech, 2016-12-07)
    This submission describes the work the SOLR team completed in Fall 2016. It includes the final report and presentation, as well as key relevant materials (indexing scripts & Java code). Based on the work in Spring 2016, ...
  • Collaborative Filtering for IDEAL 

    Li, Tianyi; Nakate, Pranav; Song, Ziqian (2016-05-04)
    The students of CS5604 (Information Retrieval and Storage) have been building an Information Retrieval System based on tweet and webpage collections of the Digital Library Research Laboratory (DLRL). The students have ...
  • CS5604: Clustering and Social Networks for IDEAL 

    Vishwasrao, Saket; Thorve, Swapna; Tang, Lijie (2016-05-03)
    The Integrated Digital Event Archiving and Library (IDEAL) project of Virginia Tech provides services for searching, browsing, analysis, and visualization of over 1 billion tweets and over 65 million webpages. The project ...
  • CS5604 Front-End User Interface Team 

    Masiane, Moeti; Warren, Lawrence (2016-05-03)
    This project is part of a wider research project whose focus is developing an information retrieval and analysis system in support of the IDEAL (Integrated Digital Event Archiving and Library) project. The search engine ...
  • Topic Analysis project in CS5604, Spring 2016: Extracting Topics from Tweets and Webpages for IDEAL 

    Mehta, Sneha; Vinayagam, Radha Krishnan (2016-05-04)
    The IDEAL (Integrated Digital Event Archiving and Library) project aims to ingest tweets and web-based content from social media and the web and index it for retrieval. One of the required milestones for a graduate-level ...
  • Collection Management for IDEAL 

    Ma, Yufeng; Nan, Dong (2016-05-04)
    The collection management portion of the information retrieval system has three major tasks. The first task is to perform incremental update of the new data flow from the tweet MySQL database to HDFS and then to HBase. ...
  • Classification Project in CS5604, Spring 2016 

    Bock, Matthew; Cantrell, Michael; Shahin, Hossameldin (2016-05-04)
    In the grand scheme of a large Information Retrieval project, the work of our team was that of performing text classification on both tweet collections and their associated webpages. In order to accomplish this task, we ...
  • Solr Project with IDEAL, in CS5604 (Information Storage and Retrieval) 

    Xia, Long; Jiang, Tingting; Galad, Andrej; Maharshi, Shivam (2016-05-04)
    This submission describes the work of the Solr team as part of the IDEAL project with the main goal of designing and developing a distributed search infrastructure. It includes the project reports, final presentations, as ...
  • LDA Team Project in CS5604, Spring 2015: Extracting Topics from Tweets and Webpages for IDEAL 

    Pumma, Sarunya; Liu, Xiaoyang (2015-05-10)
    IDEAL, or Integrated Digital Event Archiving and Library, is a project of Virginia Tech to implement a state-of-the-art event-based information retrieval system. A practice project of CS 5604 Information Retrieval is a part ...
