Show simple item record

dc.contributor.author: Venkatachalam, Logambigai [en_US]
dc.date.accessioned: 2014-03-14T20:35:31Z
dc.date.available: 2014-03-14T20:35:31Z
dc.date.issued: 2008-04-21 [en_US]
dc.identifier.other: etd-05072008-225923 [en_US]
dc.identifier.uri: http://hdl.handle.net/10919/32326
dc.description.abstract: Information Retrieval (IR) plays a key role in serving large communities of users who need relevant answers to their search queries. IR encompasses various search models to address different requirements and has introduced a variety of supporting tools to improve effectiveness and efficiency. "Search" is the key focus of IR. The classic search methodology takes an input query, processes it, and returns the result as a ranked list of documents. However, this approach is not the most effective way to find document associations (relationships between concepts or queries), whether direct or indirect. The Stepping Stones and Pathways (SSP) retrieval methodology supports retrieval of ranked chains of documents that support valid relationships between any two given concepts. SSP has many potential practical and research applications that need a tool to find connections between two concepts. The early SSP "proof-of-concept" implementation could handle only 6000 documents, whereas commercial search applications have to deal with millions of documents. Addressing this scalability limitation is therefore essential if the current SSP implementation is to handle large datasets. Research on various commercial search applications and their scalability indicates that the Lucene search toolkit is widely used because of its scalability, performance, and extensibility. Many web-based and desktop applications have used this toolkit with great success, including Wikipedia search, job search sites, digital libraries, e-commerce sites, and the Eclipse Integrated Development Environment (IDE). The goal of this research is to re-implement SSP in a scalable way, so that it can work with larger datasets and can also be deployed commercially.
This work explains the approach adopted for the re-implementation, focusing on scalable indexing and searching components, new ways to process citations (references), a new approach to query expansion, document clustering, and document similarity calculation. Experiments testing factors such as runtime and storage showed that the system can be scaled up to handle millions of documents. [en_US]
dc.publisher: Virginia Tech [en_US]
dc.relation.haspart: SSPScalabilityThesis.pdf [en_US]
dc.rights: I hereby certify that, if appropriate, I have obtained and attached hereto a written permission statement from the owner(s) of each third party copyrighted matter to be included in my thesis, dissertation, or project report, allowing distribution as specified below. I certify that the version I submitted is the same as that approved by my advisory committee. I hereby grant to Virginia Tech or its agents the non-exclusive license to archive and make accessible, under the conditions specified below, my thesis, dissertation, or project report in whole or in part in all forms of media, now or hereafter known. I retain all other ownership rights to the copyright of the thesis, dissertation or project report. I also retain the right to use in future works (such as articles or books) all or part of this thesis, dissertation, or project report. [en_US]
dc.subject: Scalability [en_US]
dc.subject: Lucene [en_US]
dc.subject: CiteSeer [en_US]
dc.subject: Connection finding search framework [en_US]
dc.title: Scalability of Stepping Stones and Pathways [en_US]
dc.type: Thesis [en_US]
dc.contributor.department: Computer Science [en_US]
dc.description.degree: Master of Science [en_US]
thesis.degree.name: Master of Science [en_US]
thesis.degree.level: masters [en_US]
thesis.degree.grantor: Virginia Polytechnic Institute and State University [en_US]
thesis.degree.discipline: Computer Science [en_US]
dc.contributor.committeechair: Fox, Edward Alan [en_US]
dc.contributor.committeemember: Fan, Weiguo Patrick [en_US]
dc.contributor.committeemember: Kholief, Mohamed [en_US]
dc.identifier.sourceurl: http://scholar.lib.vt.edu/theses/available/etd-05072008-225923/ [en_US]
dc.date.sdate: 2008-05-07 [en_US]
dc.date.rdate: 2008-05-30
dc.date.adate: 2008-05-30 [en_US]

