Parallel Inverted Indices for Large-Scale, Dynamic Digital Libraries

dc.contributor.author: Sornil, Ohm [en]
dc.contributor.committeechair: Fox, Edward A. [en]
dc.contributor.committeemember: Edwards, Stephen H. [en]
dc.contributor.committeemember: Ramakrishnan, Naren [en]
dc.contributor.committeemember: Koelling, C. Patrick [en]
dc.contributor.committeemember: Varadarajan, Srinidhi [en]
dc.contributor.department: Computer Science [en]
dc.date.accessioned: 2014-03-14T20:07:11Z [en]
dc.date.adate: 2001-02-09 [en]
dc.date.available: 2014-03-14T20:07:11Z [en]
dc.date.issued: 2001-01-25 [en]
dc.date.rdate: 2002-02-09 [en]
dc.date.sdate: 2001-02-06 [en]
dc.description.abstract: The dramatic increase in the amount of content available in digital forms gives rise to large-scale digital libraries, targeted to support millions of users and terabytes of data. Retrieving information from a system of this scale in an efficient manner is a challenging task due to the size of the collection as well as the index. This research deals with the design and implementation of an inverted index that supports searching for information in a large-scale digital library, implemented atop a massively parallel storage system. Inverted index partitioning is studied in a simulation environment, aiming at a terabyte of text. As a result, a high performance partitioning scheme is proposed. It combines the best qualities of the term and document partitioning approaches in a new Hybrid Partitioning Scheme. Simulation experiments show that this organization provides good performance over a wide range of conditions. Further, the issues of creation and incremental updates of the index are considered. A disk-based inversion algorithm and an extensible inverted index architecture are described, and experimental results with actual collections are presented. Finally, distributed algorithms to create a parallel inverted index partitioned according to the hybrid scheme are proposed, and performance is measured on a portion of the equipment that normally makes up the 100 node Virginia Tech PetaPlex™ system. NOTE: (02/2007) An updated copy of this ETD was added after there were patron reports of problems with the file. [en]
dc.description.degree: Ph. D. [en]
dc.identifier.other: etd-02062001-114915 [en]
dc.identifier.sourceurl: http://scholar.lib.vt.edu/theses/available/etd-02062001-114915/ [en]
dc.identifier.uri: http://hdl.handle.net/10919/26131 [en]
dc.publisher: Virginia Tech [en]
dc.relation.haspart: dissertation_printTo7.pdf [en]
dc.relation.haspart: dissertation.pdf [en]
dc.rights: In Copyright [en]
dc.rights.uri: http://rightsstatements.org/vocab/InC/1.0/ [en]
dc.subject: Simulation [en]
dc.subject: incremental update [en]
dc.subject: information retrieval [en]
dc.subject: parallel inverted index [en]
dc.subject: hybrid partitioning [en]
dc.subject: Performance [en]
dc.subject: digital library [en]
dc.subject: terabyte text collection [en]
dc.title: Parallel Inverted Indices for Large-Scale, Dynamic Digital Libraries [en]
dc.type: Dissertation [en]
thesis.degree.discipline: Computer Science [en]
thesis.degree.grantor: Virginia Polytechnic Institute and State University [en]
thesis.degree.level: doctoral [en]
thesis.degree.name: Ph. D. [en]

Files

Original bundle
Name: dissertation_printTo7.pdf
Size: 1.08 MB
Format: Adobe Portable Document Format

Name: dissertation.pdf
Size: 1.02 MB
Format: Adobe Portable Document Format