Virginia Tech

    CS5604 Fall 2016 Solr Team Project Report

    View/Open
    SOLR_Report.pdf (9.546Mb)
    Downloads: 1604
    SOLR_Report.docx (12.14Mb)
    Downloads: 568
    SOLR_Presentation.pdf (2.750Mb)
    Downloads: 593
    SOLR_Presentation.pptx (2.510Mb)
    Downloads: 100
    SOLR_Code.zip (9.096Kb)
    Downloads: 24
    Date
    2016-12-07
    Author
    Li, Liuqing
    Pillai, Anusha
    Wang, Ye
    Tian, Ke
    Abstract
    This submission describes the work the Solr team completed in Fall 2016. It includes the final report and presentation, along with key supporting materials (indexing scripts and Java code). Building on the work done in Spring 2016, the team improved the general search infrastructure supporting the IDEAL and GETAR projects, both funded by NSF. Its main responsibility was to configure Basic Indexing and Incremental Indexing (Near Real Time, or NRT, Indexing) for the tweet and web page collections in DLRL's Hadoop cluster. The goal of Basic Indexing was to index the large collection, which contains more than 1.2 billion tweets. The idea behind NRT Indexing was to monitor real-time changes in HBase and update the Solr results accordingly. The team also worked on Custom Ranking, motivated by the need to design and implement a new scoring function that re-ranks the results retrieved by Solr. Using text similarity, the team built a basic document recommender that retrieves documents similar to a given one. Finally, new, well-written manuals are intended to make it easier for users and developers to become familiar with Solr and its related tools. Throughout the semester, the team collaborated closely with the Collection Management Tweets (CMT), Collection Management Webpages (CMW), Classification (CLA), Clustering and Topic Analysis (CTA), and Front-End (FE) teams to gather requirements, input data, and suggestions for data visualization.
    URI
    http://hdl.handle.net/10919/73710
    Collections
    • CS5604: Information Retrieval [51]

    If you believe that any material in VTechWorks should be removed, please see our policy and procedure for Requesting that Material be Amended or Removed. All takedown requests will be promptly acknowledged and investigated.

    Virginia Tech | University Libraries | Contact Us
     

     
