Effective, Efficient Retrieval in a Network of Digital Information Objects

dc.contributor.author: France, Robert Karl [en]
dc.contributor.committeechair: Fox, Edward A. [en]
dc.contributor.committeemember: Hitchingham, Eileen [en]
dc.contributor.committeemember: Ramakrishnan, Naren [en]
dc.contributor.committeemember: Heath, Lenwood S. [en]
dc.contributor.committeemember: Kafura, Dennis G. [en]
dc.contributor.department: Computer Science [en]
dc.date.accessioned: 2014-03-14T20:19:15Z [en]
dc.date.adate: 2001-11-27 [en]
dc.date.available: 2014-03-14T20:19:15Z [en]
dc.date.issued: 2001-11-26 [en]
dc.date.rdate: 2002-11-27 [en]
dc.date.sdate: 2001-11-27 [en]
dc.description.abstract: Although different authors mean different things by the term "digital libraries," one common thread is that they include or are built around collections of digital objects. Digital libraries also provide services to large communities, one of which is almost always search. Digital library collections, however, have several characteristic features that make search difficult. They are typically very large. They typically involve many different kinds of objects, including but not limited to books, e-published documents, images, and hypertexts, and often including items as esoteric as subtitled videos, simulations, and entire scientific databases. Even within a category, these objects may have widely different formats and internal structure. Furthermore, they are typically in complex relationships with each other and with such non-library objects as persons, institutions, and events. Relationships are a common feature of traditional libraries in the form of "See / See also" pointers, hierarchical relationships among categories, and relations between bibliographic and non-bibliographic objects such as having an author or being on a subject. Binary relations (typically in the form of directed links) are a common representational tool in computer science for structures from trees and graphs to semantic networks. And in recent years the World-Wide Web has made the construct of linked information objects commonplace for millions. Despite this, relationships have rarely been given "first-class" treatment in digital library collections or software. MARIAN is a digital library system designed and built to store, search over, and retrieve large numbers of diverse objects in a network of relationships. It is designed to run efficiently over large collections of digital library objects. It addresses the problem of object diversity through a system of classes unified by common abilities including searching and presentation.
Divergent internal structure is exposed and interpreted using a simple and powerful graphical representation, and varied format through a unified system of presentation. Most importantly, MARIAN collections are designed to specifically include relations in the form of an extensible collection of different sorts of links. This thesis presents MARIAN and argues that it is both effective and efficient. MARIAN is effective in that it provides new and useful functionality to digital library end-users, and in that it makes constructing, modifying, and combining collections easy for library builders and maintainers. MARIAN is efficient since it works from an abstract presentation of search over networked collections to define, on the one hand, the common operations required to implement a broad class of search engines and, on the other, performance standards for those operations. Although some operations involve a high minimum cost under the most general assumptions, lower costs can be achieved when additional constraints are present. In particular, it is argued that the statistics of digital library collections can be exploited to obtain significant savings. MARIAN is designed to do exactly that, and evidence from early versions suggests that it succeeds. In conclusion, MARIAN presents a powerful and flexible platform for retrieval on large, diverse collections of networked information, significantly extending the representation and search capabilities of digital libraries. [en]
dc.description.degree: Ph. D. [en]
dc.identifier.other: etd-11272001-124212 [en]
dc.identifier.sourceurl: http://scholar.lib.vt.edu/theses/available/etd-11272001-124212/ [en]
dc.identifier.uri: http://hdl.handle.net/10919/29754 [en]
dc.publisher: Virginia Tech [en]
dc.relation.haspart: RKFetd.pdf [en]
dc.rights: In Copyright [en]
dc.rights.uri: http://rightsstatements.org/vocab/InC/1.0/ [en]
dc.subject: weighted sets [en]
dc.subject: class manager [en]
dc.subject: information retrieval [en]
dc.subject: information network [en]
dc.subject: indexing [en]
dc.subject: digital library [en]
dc.subject: MARIAN [en]
dc.subject: stopping rules [en]
dc.subject: searching [en]
dc.subject: NDLTD [en]
dc.title: Effective, Efficient Retrieval in a Network of Digital Information Objects [en]
dc.type: Dissertation [en]
thesis.degree.discipline: Computer Science [en]
thesis.degree.grantor: Virginia Polytechnic Institute and State University [en]
thesis.degree.level: doctoral [en]
thesis.degree.name: Ph. D. [en]

Files

Original bundle
Name: RKFetd.pdf
Size: 2.58 MB
Format: Adobe Portable Document Format