Scalability of Stepping Stones and Pathways

dc.contributor.author: Venkatachalam, Logambigai [en]
dc.contributor.committeechair: Fox, Edward A. [en]
dc.contributor.committeemember: Fan, Weiguo Patrick [en]
dc.contributor.committeemember: Kholief, Mohamed [en]
dc.contributor.department: Computer Science [en]
dc.date.accessioned: 2014-03-14T20:35:31Z [en]
dc.date.adate: 2008-05-30 [en]
dc.date.available: 2014-03-14T20:35:31Z [en]
dc.date.issued: 2008-04-21 [en]
dc.date.rdate: 2008-05-30 [en]
dc.date.sdate: 2008-05-07 [en]
dc.description.abstract: Information Retrieval (IR) plays a key role in serving large communities of users who need relevant answers to their search queries. IR encompasses various search models to address different requirements and has introduced a variety of supporting tools to improve effectiveness and efficiency. "Search" is the key focus of IR. The classic search methodology takes an input query, processes it, and returns the result as a ranked list of documents. However, this approach is not the most effective way to find document associations (relationships between concepts or queries), whether direct or indirect. The Stepping Stones and Pathways (SSP) retrieval methodology supports retrieval of ranked chains of documents that support valid relationships between any two given concepts. SSP has many potential practical and research applications that need a tool for finding connections between two concepts. The early SSP "proof-of-concept" implementation could handle only 6000 documents, whereas commercial search applications must deal with millions of documents. Overcoming this scalability limitation of the current SSP implementation is therefore essential for handling large datasets. Research on various commercial search applications and their scalability indicates that the Lucene search toolkit is widely used because of its scalability, performance, and extensibility. Many web-based and desktop applications have used this toolkit with great success, including Wikipedia search, job search sites, digital libraries, e-commerce sites, and the Eclipse Integrated Development Environment (IDE). The goal of this research is to re-implement SSP in a scalable way, so that it can work with larger datasets and be deployed commercially.
This work explains the approach adopted for the re-implementation, focusing on scalable indexing and searching components, new ways to process citations (references), a new approach to query expansion, document clustering, and document similarity calculation. Experiments measuring factors such as runtime and storage showed that the system can scale to millions of documents. [en]
dc.description.degree: Master of Science [en]
dc.identifier.other: etd-05072008-225923 [en]
dc.identifier.sourceurl: http://scholar.lib.vt.edu/theses/available/etd-05072008-225923/ [en]
dc.identifier.uri: http://hdl.handle.net/10919/32326 [en]
dc.publisher: Virginia Tech [en]
dc.relation.haspart: SSPScalabilityThesis.pdf [en]
dc.rights: In Copyright [en]
dc.rights.uri: http://rightsstatements.org/vocab/InC/1.0/ [en]
dc.subject: Scalability [en]
dc.subject: Lucene [en]
dc.subject: CiteSeer [en]
dc.subject: Connection finding search framework [en]
dc.title: Scalability of Stepping Stones and Pathways [en]
dc.type: Thesis [en]
thesis.degree.discipline: Computer Science [en]
thesis.degree.grantor: Virginia Polytechnic Institute and State University [en]
thesis.degree.level: masters [en]
thesis.degree.name: Master of Science [en]

Files

Original bundle
Name: SSPScalabilityThesis.pdf
Size: 1.34 MB
Format: Adobe Portable Document Format
