This collection contains the final projects of the students in the course Computer Science 5604: Information Retrieval, taught in Fall 2012 at Virginia Tech by Professor Ed Fox. The course covers analyzing, indexing, representing, storing, searching, retrieving, processing, and presenting information and documents using fully automatic systems. The information may be in the form of text, hypertext, multimedia, or hypermedia. The systems are based on various models, e.g., Boolean logic, fuzzy logic, and probability theory, and they are implemented using inverted files, relational thesauri, special hardware, and other approaches. The course also covers evaluation of the systems' efficiency and effectiveness.

Recent Submissions

  • Collaborative Filtering for IDEAL 

    Li, Tianyi; Nakate, Pranav; Song, Ziqian (2016-05-04)
    The students of CS5604 (Information Retrieval and Storage) have been building an Information Retrieval System based on tweet and webpage collections of the Digital Library Research Laboratory (DLRL). The students have ...
  • CS5604: Clustering and Social Networks for IDEAL 

    Vishwasrao, Saket; Thorve, Swapna; Tang, Lijie (2016-05-03)
    The Integrated Digital Event Archiving and Library (IDEAL) project of Virginia Tech provides services for searching, browsing, analysis, and visualization of over 1 billion tweets and over 65 million webpages. The project ...
  • CS5604 Front-End User Interface Team 

    Masiane, Moeti; Warren, Lawrence (2016-05-03)
    This project is part of a wider research project whose focus is developing an information retrieval and analysis system in support of the IDEAL (Integrated Digital Event Archiving and Library) project. The search engine ...
  • Topic Analysis project in CS5604, Spring 2016: Extracting Topics from Tweets and Webpages for IDEAL 

    Mehta, Sneha; Vinayagam, Radha Krishnan (2016-05-04)
    The IDEAL (Integrated Digital Event Archiving and Library) project aims to ingest tweets and web-based content from social media and the web and index it for retrieval. One of the required milestones for a graduate-level ...
  • Collection Management for IDEAL 

    Ma, Yufeng; Nan, Dong (2016-05-04)
    The collection management portion of the information retrieval system has three major tasks. The first task is to perform incremental update of the new data flow from the tweet MySQL database to HDFS and then to HBase. ...
  • Classification Project in CS5604, Spring 2016 

    Bock, Matthew; Cantrell, Michael; Shahin, Hossameldin (2016-05-04)
    In the grand scheme of a large Information Retrieval project, the work of our team was that of performing text classification on both tweet collections and their associated webpages. In order to accomplish this task, we ...
  • LDA Team Project in CS5604, Spring 2015: Extracting Topics from Tweets and Webpages for IDEAL 

    Pumma, Sarunya; Liu, Xiaoyang (2015-05-10)
    IDEAL, or Integrated Digital Event Archiving and Library, is a project of Virginia Tech to implement a state-of-the-art event-based information retrieval system. A practice project of CS 5604 Information Retrieval is a part ...
  • Hadoop Project for IDEAL in CS5604 

    Cadena, Jose; Chen, Mengsu; Wen, Chengyuan (Virginia Tech, 2015-05-11)
    The Integrated Digital Event Archive and Library (IDEAL) system addresses the need for combining the best of digital library and archive technologies in support of stakeholders who are remembering and/or studying important ...
  • Document Clustering for IDEAL 

    Thumma, Sujit Reddy; Kalidas, Rubasri; Torkey, Hanaa (2015-05-13)
    Document clustering is an unsupervised classification of text documents into groups (clusters). The documents with similar properties are grouped together into one cluster. Documents which have dissimilar patterns are ...
  • Reducing Noise for IDEAL 

    Wang, Xiangwen; Chandrasekar, Prashant (2015-05-12)
    The corpora for which we are building an information retrieval system consist of tweets and web pages (extracted from URL links that might be included in the tweets) that have been selected based on rudimentary string ...
  • Solr Team Project Report 

    Gruss, Richard; Choudhury, Ananya; Komawar, Nikhil (2015-05-13)
    The Integrated Digital Event Archive and Library (IDEAL) is a Digital Library project that aims to collect, index, archive and provide access to digital contents related to important events, including disasters, man-made ...
  • Social Network Project for IDEAL in CS5604 

    Harb, Islam; Jin, Yilong; Cedeno, Vanessa; Mallampati, Sai Ravi Kiran; Bulusu, Bhaskara Srinivasa Bharadwaj (2015-05-11)
    The IDEAL (Integrated Digital Event Archiving and Library) project involves VT faculty, staff, and students, along with collaborators around the world, in archiving important events and integrating the digital library, ...
  • Named Entity Recognition for IDEAL 

    Du, Qianzhou; Zhang, Xuan (2015-05-10)
    The term “Named Entity”, which was first introduced by Grishman and Sundheim, is widely used in Natural Language Processing (NLP). The researchers were focusing on the information extraction task, that is, extracting ...
  • Classification Team Project for IDEAL in CS5604, Spring 2015 

    Cui, Xuewen; Tao, Rongrong; Zhang, Ruide (2015-05-10)
    Given the tweets from the instructor and cleaned webpages from the Reducing Noise team, the planned tasks for our group were to find the best: (1) way to extract information that will be used for document representation; ...
  • Classification of Arabic Documents 

    Elbery, Ahmed (2012-12-19)
    The Arabic language is a very rich language with complex morphology, so its structure is very different from, and more difficult than, that of other languages. It is therefore important to build an Arabic Text Classifier (ATC) to deal with this complex ...
  • CINETGraphCrawl - Constructing graphs from blogs 

    Kaw, Rushi; Subbiah, Rajesh; Makkapati, Hemanth (2012-12-11)
    Internet forums, weblogs, social networks, and photo and video sharing websites are some forms of social media that are at the forefront of enabling communication among individuals. The rich information captured in ...
  • Large Scale Network Visualization with Gephi 

    Alam, Maksudul; Arifuzzaman, S M; Bhuiyan, Md Hasanuzzaman (2012-12-11)
    The notion of graphs or networks is sufficiently pervasive since it can be used to model various types of data sources. Social, biological, and other networks capture the underlying structural and relational properties. ...
  • ProjOpenDSA - OpenDSA Log Support 

    Wei, Shiyi; Suwardiman, Victoria; Swaminathan, Anand (2012-12-11)
    The OpenDSA project is an online eTextbook project that includes not only literature but other dynamic content to be used in Data Structures and Algorithms courses. OpenDSA contains exercises of various types to go along ...
  • Focused Crawling 

    Farag, Mohamed Magdy Gharib; Khan, Mohammed Saquib Akmal; Mishra, Gaurav; Ganesh, Prasad Krishnamurthi (2012-12-11)
    Finding information on the WWW is a difficult and challenging task because of the extremely large volume of the WWW. Search engines can be used to facilitate this task, but it is still difficult to cover all the webpages on the ...
  • Analyzing and Visualizing Disaster Phases from Social Media Streams 

    Lin, Xiao; Chen, Lianghze; Wood, Andrew (2012-12-11)
    Working under the direction of CTRNet, we developed a procedure for classifying Twitter data related to natural/man-made disasters into one of the Four Phases of Emergency Management (response, recovery, mitigation, and ...
