    • Collection Management Webpages - Fall 2016 CS5604 

      Dao, Tung; Wakeley, Christopher; Weigang, Liu (Virginia Tech, 2017-03-23)
      The Collection Management Webpages (CMW) team is responsible for collecting, processing and storing webpages from different sources including tweets from multiple collections and contributors, such as those related to ...
    • CS 5604 INFORMATION STORAGE AND RETRIEVAL Front-End Team Fall 2016 Final Report 

      Kohler, Rachel; Tasooji, Reza; Sullivan, Patrick (Virginia Tech, 2016-12-08)
      Information Retrieval systems are a common tool for building research and disseminating knowledge. For this to be possible, these systems must be able to effectively show varying amounts of relevant information to the ...
    • CS5604 Fall 2016 Classification Team Final Report 

      Williamson, Eric R.; Chakravarty, Saurabh (Virginia Tech, 2016-12-08)
      Content is generated on the Web at an exponential rate. The type of content varies from text on a traditional webpage to text on social media portals (e.g., social network sites and microblogs). One such example of social ...
    • CS5604 Fall 2016 Solr Team Project Report 

      Li, Liuqing; Pillai, Anusha; Wang, Ye; Tian, Ke (Virginia Tech, 2016-12-07)
      This submission describes the work the SOLR team completed in Fall 2016. It includes the final report and presentation, as well as key relevant materials (indexing scripts & Java code). Based on the work in Spring 2016, ...
    • CS5604 Fall 2017 Classification Team Submission 

      Azizi, Ahmadreza; Mulchandani, Deepika; Naik, Amit; Ngo, Khai; Patil, Suraj; Vezvaee, Arian; Yang, Robin (Virginia Tech, 2018-01-03)
      This project submission includes the work of the 'Classification' team of the CS5604 'Information Storage and Retrieval' course of Fall 2017 towards the GETAR project. Classification of the GETAR data would allow users to ...
    • CS5604 Fall 2017 Clustering and Topic Analysis 

      Baghudana, Ashish; Ahuja, Aman; Bellam, Pavan; Chintha, Rammohan; Sambaturu, Pratyush; Malpani, Ashish; Shetty, Shruti; Yang, Mo (Virginia Tech, 2018-01-13)
      One of the key objectives of the CS-5604 course titled Information Storage and Retrieval is to build a pipeline for a state-of-the-art retrieval system for the Integrated Digital Event Archiving and Library (IDEAL) and ...
    • CS5604 Front-End User Interface Team 

      Masiane, Moeti; Warren, Lawrence (2016-05-03)
      This project is part of a wider research project whose focus is developing an information retrieval and analysis system in support of the IDEAL (Integrated Digital Event Archiving and Library) project. The search engine ...
    • CS5604 Information Storage and Retrieval Fall 2017 Solr Report 

      Kumar, Abhinav; Bangad, Anand; Robertson, Jeff; Garg, Mohit; Ramesh, Shreyas; Mi, Siyu; Wang, Xinyue; Wang, Yu (Virginia Tech, 2018-01-15)
      The Digital Library Research Laboratory (DLRL) has collected over 1.5 billion tweets and millions of webpages for the Integrated Digital Event Archiving and Library (IDEAL) and Global Event Trend Archive Research (GETAR) ...
    • CS5604: Clustering and Social Networks for IDEAL 

      Vishwasrao, Saket; Thorve, Swapna; Tang, Lijie (2016-05-03)
      The Integrated Digital Event Archiving and Library (IDEAL) project of Virginia Tech provides services for searching, browsing, analysis, and visualization of over 1 billion tweets and over 65 million webpages. The project ...
    • CS5604: Information and Storage Retrieval Fall 2016 - CMT (Collection Management Tweets)

      Wagner, Mitchell J.; Abidi, Faiz; Fan, Shuangfei (Virginia Tech, 2016-12-08)
      As the Collection Management Tweets team in the Fall 2016 CS5604 class, we were responsible for processing >1.2 billion tweets, including data transfer, noise reduction, tweet augmentation, and storage via several technologies. ...
    • CS5604: Information and Storage Retrieval Fall 2017 - FE (Front-End Team)

      Chon, Jieun; Wang, Haitao; Bian, Yali; Niu, Shuo (Virginia Tech, 2017-12-24)
      Social media and Web data are becoming important sources of information for researchers to monitor and study global events. GETAR, led by Dr. Edward Fox, is a project aiming to collect, organize, browse, visualize, ...
    • Document Clustering for IDEAL 

      Thumma, Sujit Reddy; Kalidas, Rubasri; Torkey, Hanaa (2015-05-13)
      Document clustering is an unsupervised classification of text documents into groups (clusters). The documents with similar properties are grouped together into one cluster. Documents which have dissimilar patterns are ...
    • Focused Crawling 

      Farag, Mohamed Magdy Gharib; Khan, Mohammed Saquib Akmal; Mishra, Gaurav; Ganesh, Prasad Krishnamurthi (2012-12-11)
      Finding information on the WWW is a difficult and challenging task because of its extremely large volume. Search engines can be used to facilitate this task, but it is still difficult to cover all the webpages on the ...
    • Hadoop Project for IDEAL in CS5604 

      Cadena, Jose; Chen, Mengsu; Wen, Chengyuan (Virginia Tech, 2015-05-11)
      The Integrated Digital Event Archive and Library (IDEAL) system addresses the need for combining the best of digital library and archive technologies in support of stakeholders who are remembering and/or studying important ...
    • Large Scale Network Visualization with Gephi 

      Alam, Maksudul; Arifuzzaman, S M; Bhuiyan, Md Hasanuzzaman (2012-12-11)
      The notion of graphs or networks is sufficiently pervasive since it can be used to model various types of data sources. Social, biological, and other networks capture the underlying structural and relational properties. ...
    • LDA Team Project in CS5604, Spring 2015: Extracting Topics from Tweets and Webpages for IDEAL 

      Pumma, Sarunya; Liu, Xiaoyang (2015-05-10)
      IDEAL, or Integrated Digital Event Archiving and Library, is a project of Virginia Tech to implement a state-of-the-art event-based information retrieval system. A practice project of CS 5604 Information Retrieval is a part ...
    • Leveraging eXist-db for Efficient TEI Document Management 

      Schutt, Kyle; Morgan, Kyle (2012-12-10)
      Professor David Radcliffe has created Lord Byron and his Times (LBT), a large digital archive of works surrounding Lord Byron and his contemporaries. The original website was unusably slow due to the expensive XSLT ...
    • Named Entity Recognition for IDEAL 

      Du, Qianzhou; Zhang, Xuan (2015-05-10)
      The term “Named Entity”, which was first introduced by Grishman and Sundheim, is widely used in Natural Language Processing (NLP). The researchers were focusing on the information extraction task, that is, extracting ...
    • ProjOpenDSA - OpenDSA Log Support 

      Wei, Shiyi; Suwardiman, Victoria; Swaminathan, Anand (2012-12-11)
      The OpenDSA project is an online eTextbook project that includes not only literature but other dynamic content to be used in Data Structures and Algorithms courses. OpenDSA contains exercises of various types to go along ...
    • Reducing Noise for IDEAL 

      Wang, Xiangwen; Chandrasekar, Prashant (2015-05-12)
      The corpora for which we are building an information retrieval system consist of tweets and web pages (extracted from URL links that might be included in the tweets) that have been selected based on rudimentary string ...