CS5604 Fall 2016 Solr Team Project Report

dc.contributor.author: Li, Liuqing
dc.contributor.author: Pillai, Anusha
dc.contributor.author: Wang, Ye
dc.contributor.author: Tian, Ke
dc.date.accessioned: 2016-12-17T19:50:16Z
dc.date.available: 2016-12-17T19:50:16Z
dc.date.issued: 2016-12-07
dc.description.abstract: This submission describes the work the SOLR team completed in Fall 2016. It includes the final report and presentation, as well as key supporting materials (indexing scripts and Java code). Building on the work done in Spring 2016, the SOLR team improved the general search infrastructure supporting the IDEAL and GETAR projects, both funded by NSF. The team's main responsibility was to configure Basic Indexing and Incremental Indexing (Near Real Time, or NRT, Indexing) for the tweet and web page collections in DLRL's Hadoop cluster. The goal of Basic Indexing was to index the large collection, which contains more than 1.2 billion tweets. The idea of NRT Indexing was to monitor real-time changes in HBase and update the Solr results accordingly. The main motivation behind Custom Ranking was to design and implement a new scoring function to re-rank the results retrieved by Solr. Based on text similarity, a basic document recommender was also created to retrieve documents similar to a given one. Finally, new, well-written manuals were intended to make it easier for users and developers to become familiar with Solr and its related tools. Throughout the semester we collaborated closely with the Collection Management Tweets (CMT), Collection Management Webpages (CMW), Classification (CLA), Clustering and Topic Analysis (CTA), and Front-End (FE) teams to obtain requirements, input data, and suggestions for data visualization.
dc.description.notes:
  SOLR_Presentation.pptx -- SOLR team final presentation in PPTX format
  SOLR_Presentation.pdf -- SOLR team final presentation in PDF format
  SOLR_Report.docx -- SOLR team final report in DOCX format
  SOLR_Report.pdf -- SOLR team final report in PDF format
  SOLR_Code.zip -- SOLR team software code package (including indexing scripts and custom ranking source code)
dc.description.sponsorship: NSF:IIS-1319578
dc.description.sponsorship: NSF:IIS-1619028
dc.identifier.uri: http://hdl.handle.net/10919/73710
dc.language.iso: en_US
dc.publisher: Virginia Tech
dc.rights: In Copyright
dc.rights.uri: http://rightsstatements.org/vocab/InC/1.0/
dc.subject: Solr
dc.subject: Cloudera
dc.subject: Hadoop Cluster
dc.subject: IDEAL
dc.subject: GETAR
dc.subject: Custom Ranking
dc.subject: Incremental Indexing
dc.subject: Recommendation
dc.title: CS5604 Fall 2016 Solr Team Project Report
dc.type: Presentation
dc.type: Report
dc.type: Software

Files

Original bundle
Name:
SOLR_Report.pdf
Size:
9.55 MB
Format:
Adobe Portable Document Format
Name:
SOLR_Report.docx
Size:
12.14 MB
Format:
Microsoft Word XML
Name:
SOLR_Presentation.pdf
Size:
2.75 MB
Format:
Adobe Portable Document Format
Name:
SOLR_Presentation.pptx
Size:
2.51 MB
Format:
Microsoft Powerpoint XML
Name:
SOLR_Code.zip
Size:
9.1 KB
Format:

License bundle
Name:
license.txt
Size:
1.5 KB
Format:
Item-specific license agreed upon to submission