Solr Project with IDEAL, in CS5604 (Information Storage and Retrieval)
dc.contributor.author | Xia, Long | en |
dc.contributor.author | Jiang, Tingting | en |
dc.contributor.author | Galad, Andrej | en |
dc.contributor.author | Maharshi, Shivam | en |
dc.date.accessioned | 2016-05-07T13:24:33Z | en |
dc.date.available | 2016-05-07T13:24:33Z | en |
dc.date.issued | 2016-05-04 | en |
dc.description | This submission describes the work of the Solr team as part of the IDEAL project, with the main goal of designing and developing a distributed search infrastructure. It includes the project reports, final presentations, and the solutions developed (configuration files and Java code). | en |
dc.description.abstract | This submission describes the work of the Solr team as part of the IDEAL project, with the main goal of designing and developing a distributed search infrastructure. It includes the project reports, final presentations, and the solutions developed (configuration files and Java code). The main responsibility of our team was to configure Near Real Time (NRT) Indexing and implement Custom Ranking for tweet and web page collections. The idea behind NRT Indexing is to perform incremental updates from an HBase table into the Solr index, thereby reducing indexing time and compute resource usage. The main motivation behind the Custom Ranking solution is to improve system precision and recall by transforming user queries using the metadata provided by the other teams. The implementation leverages three techniques: Query Expansion, Pseudo Relevance Feedback, and Query Boosting. Throughout the semester we collaborated closely with several other teams, both to gather requirements and to obtain the input data. | en |
dc.description.sponsorship | NSF grant IIS - 1319578, III: Small: Integrated Digital Event Archiving and Library (IDEAL) | en |
dc.identifier.uri | http://hdl.handle.net/10919/70928 | en |
dc.language.iso | en_US | en |
dc.rights | In Copyright | en |
dc.rights.uri | http://rightsstatements.org/vocab/InC/1.0/ | en |
dc.subject | IDEAL | en |
dc.subject | Solr | en |
dc.subject | Lucene | en |
dc.subject | Custom Ranking | en |
dc.subject | Query Expansion | en |
dc.subject | Near Real Time Indexing | en |
dc.subject | Batch-Indexing | en |
dc.subject | Morphline | en |
dc.subject | Lily Indexer | en |
dc.subject | Cloudera Search | en |
dc.subject | Pseudo relevance feedback | en |
dc.title | Solr Project with IDEAL, in CS5604 (Information Storage and Retrieval) | en |
dc.type | Presentation | en |
dc.type | Software | en |
dc.type | Technical report | en |
Files
Original bundle
1 - 5 of 5
- Name:
- Solr_code.zip
- Size:
- 7.56 MB
- Format:
- Description:
- Solr team Software Code package including Solr schema (Schema.xml), Morphline configuration (Morphlines.conf), batch indexing script (batch_indexing.sh), Lily indexer script (add-indexer.sh), and Java code.
- Name:
- Solr_Final_Report.docx
- Size:
- 7.83 MB
- Format:
- Microsoft Word XML
- Description:
- Solr team final report in Word version
- Name:
- Solr_Final_Report.pdf
- Size:
- 7.38 MB
- Format:
- Adobe Portable Document Format
- Description:
- Solr team final report in PDF version
- Name:
- Solr_Presentation.pptx
- Size:
- 2.99 MB
- Format:
- Microsoft Powerpoint XML
- Description:
- Solr team final presentation in PowerPoint version
- Name:
- Solr_Presentation.pdf
- Size:
- 2.12 MB
- Format:
- Adobe Portable Document Format
- Description:
- Solr team final presentation in PDF version
License bundle
1 - 1 of 1
- Name:
- license.txt
- Size:
- 1.5 KB
- Format:
- Item-specific license agreed upon to submission
- Description: