Solr Project with IDEAL, in CS5604 (Information Storage and Retrieval)

dc.contributor.author: Xia, Long
dc.contributor.author: Jiang, Tingting
dc.contributor.author: Galad, Andrej
dc.contributor.author: Maharshi, Shivam
dc.date.accessioned: 2016-05-07T13:24:33Z
dc.date.available: 2016-05-07T13:24:33Z
dc.date.issued: 2016-05-04
dc.description: This submission describes the work of the Solr team as part of the IDEAL project, with the main goal of designing and developing a distributed search infrastructure. It includes the project reports, final presentations, and the solutions (configuration files and Java code) developed.
dc.description.abstract: This submission describes the work of the Solr team as part of the IDEAL project, with the main goal of designing and developing a distributed search infrastructure. It includes the project reports, final presentations, and the solutions (configuration files and Java code) developed. Our team was mainly responsible for configuring Near Real Time (NRT) Indexing and implementing Custom Ranking for the tweet and web page collections. NRT Indexing performs incremental updates from an HBase table into the Solr index instead of re-indexing the entire table, which saves time and compute resources. Custom Ranking aims to improve the system's precision and recall by transforming user queries using the metadata provided by the other teams. The implementation combines three techniques: Query Expansion, Pseudo Relevance Feedback, and Query Boosting. Throughout the semester we worked closely with several other teams, both to gather requirements and to obtain the input data.
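The abstract names Pseudo Relevance Feedback, Query Expansion, and Query Boosting without showing how they fit together. Below is a minimal Java sketch of that combination, not the team's implementation: it assumes the top-ranked documents from a first search are already available as lists of lowercase tokens, counts how often each non-query term occurs in them, and appends the k most frequent terms to the query with a lower boost using the standard Lucene `term^boost` query syntax. The class `PrfQueryExpansion`, the method `expand`, and the sample data are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: pseudo relevance feedback used for query expansion,
// with expansion terms down-weighted via Lucene/Solr "term^boost" syntax.
public class PrfQueryExpansion {

    // Treats topDocs (tokens of the top-ranked hits of an initial search) as
    // relevant, picks the k most frequent terms not already in the query, and
    // appends them with the given boost.
    public static String expand(List<String> queryTerms,
                                List<List<String>> topDocs,
                                int k,
                                double expansionBoost) {
        Set<String> original = new HashSet<>(queryTerms);
        Map<String, Integer> counts = new HashMap<>();
        for (List<String> doc : topDocs) {
            for (String term : doc) {
                if (!original.contains(term)) {
                    counts.merge(term, 1, Integer::sum);
                }
            }
        }

        List<Map.Entry<String, Integer>> ranked = new ArrayList<>(counts.entrySet());
        // Highest frequency first; ties broken alphabetically for determinism.
        ranked.sort((a, b) -> {
            int byCount = Integer.compare(b.getValue(), a.getValue());
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });

        // Original terms keep the default boost; expansion terms get expansionBoost.
        StringBuilder q = new StringBuilder(String.join(" ", queryTerms));
        for (int i = 0; i < Math.min(k, ranked.size()); i++) {
            q.append(' ').append(ranked.get(i).getKey()).append('^').append(expansionBoost);
        }
        return q.toString().trim();
    }

    public static void main(String[] args) {
        // Hypothetical tokens from the top three hits of a first-pass search.
        List<List<String>> topDocs = List.of(
                List.of("hurricane", "sandy", "storm", "flood"),
                List.of("sandy", "storm", "damage"),
                List.of("storm", "flood", "nyc"));
        System.out.println(expand(List.of("hurricane", "sandy"), topDocs, 2, 0.5));
    }
}
```

With the sample data, `storm` (3 occurrences) and `flood` (2) are selected, so the program prints `hurricane sandy storm^0.5 flood^0.5`. Giving expansion terms a boost below 1 keeps the original query terms dominant in scoring; a fuller version would more likely weight candidates by tf-idf and obtain tokens through Solr's analysis chain.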
dc.description.sponsorship: NSF grant IIS-1319578, III: Small: Integrated Digital Event Archiving and Library (IDEAL)
dc.identifier.uri: http://hdl.handle.net/10919/70928
dc.language.iso: en_US
dc.rights: In Copyright
dc.rights.uri: http://rightsstatements.org/vocab/InC/1.0/
dc.subject: IDEAL
dc.subject: Solr
dc.subject: Lucene
dc.subject: Custom Ranking
dc.subject: Query Expansion
dc.subject: Near Real Time Indexing
dc.subject: Batch-Indexing
dc.subject: Morphline
dc.subject: Lily Indexer
dc.subject: Cloudera Search
dc.subject: Pseudo relevance feedback
dc.title: Solr Project with IDEAL, in CS5604 (Information Storage and Retrieval)
dc.type: Presentation
dc.type: Software
dc.type: Technical report

Files

Original bundle
Name: Solr_code.zip
Size: 7.56 MB
Description: Solr team software code package, including the Solr schema (Schema.xml), Morphline configuration (Morphlines.conf), batch indexing script (batch_indexing.sh), Lily indexer script (add-indexer.sh), and Java code.

Name: Solr_Final_Report.docx
Size: 7.83 MB
Format: Microsoft Word XML
Description: Solr team final report in Word version

Name: Solr_Final_Report.pdf
Size: 7.38 MB
Format: Adobe Portable Document Format
Description: Solr team final report in PDF version

Name: Solr_Presentation.pptx
Size: 2.99 MB
Format: Microsoft PowerPoint XML
Description: Solr team final presentation in PowerPoint version

Name: Solr_Presentation.pdf
Size: 2.12 MB
Format: Adobe Portable Document Format
Description: Solr team final presentation in PDF version
License bundle
Name: license.txt
Size: 1.5 KB
Format: Item-specific license agreed upon to submission