Iterative Computing over a Unified Relationship Matrix for Information Integration

dc.contributor.author: Xi, Wensi [en]
dc.contributor.committeechair: Fan, Weiguo Patrick [en]
dc.contributor.committeemember: Ramakrishnan, Naren [en]
dc.contributor.committeemember: Ponte, Jay M. [en]
dc.contributor.committeemember: Lu, Chang-Tien [en]
dc.contributor.committeemember: Sandu, Adrian [en]
dc.contributor.department: Computer Science [en]
dc.date.accessioned: 2014-03-14T20:13:36Z [en]
dc.date.adate: 2006-09-06 [en]
dc.date.available: 2014-03-14T20:13:36Z [en]
dc.date.issued: 2006-06-20 [en]
dc.date.rdate: 2009-09-06 [en]
dc.date.sdate: 2006-06-29 [en]
dc.description.abstract: In this dissertation I use a Unified Relationship Matrix (URM) to represent a set of heterogeneous data objects and their inter-relationships. I argue that integrated, iterative computations over the URM can help overcome the data sparseness problem (common in many information application scenarios) and detect latent relationships among heterogeneous data objects (such as the latent term associations discovered by LSI). This kind of computation can therefore improve the quality of information applications that must combine information from heterogeneous data sources. To support the argument, I develop a unified link analysis algorithm, Link Fusion, and a unified similarity-calculating algorithm, SimFusion. Both algorithms attempt to better integrate information from heterogeneous sources by iteratively computing over the URM to calculate a specific property of data objects, such as the importance of a data object (Link Fusion) or the similarity between a pair of data objects (SimFusion). I then conduct two sets of experiments on real-world datasets to investigate whether the proposed algorithms better integrate information from multiple sources, comparing their performance with that of traditional link analysis and similarity-calculating algorithms. Experimental results show that the proposed algorithms significantly outperform the traditional ones. I further investigate pruning techniques for improving the efficiency and scalability of the algorithms. Experimental results show that pruning can effectively improve their efficiency. [en]
dc.description.degree: Ph. D. [en]
dc.identifier.other: etd-06292006-202356 [en]
dc.identifier.sourceurl: http://scholar.lib.vt.edu/theses/available/etd-06292006-202356/ [en]
dc.identifier.uri: http://hdl.handle.net/10919/28158 [en]
dc.publisher: Virginia Tech [en]
dc.relation.haspart: DissertationWensiXi.pdf [en]
dc.rights: In Copyright [en]
dc.rights.uri: http://rightsstatements.org/vocab/InC/1.0/ [en]
dc.subject: Unified Relationship Matrix [en]
dc.subject: Information Retrieval [en]
dc.subject: Information Integration [en]
dc.subject: SimFusion [en]
dc.subject: Search [en]
dc.subject: Link Fusion [en]
dc.title: Iterative Computing over a Unified Relationship Matrix for Information Integration [en]
dc.type: Dissertation [en]
thesis.degree.discipline: Computer Science [en]
thesis.degree.grantor: Virginia Polytechnic Institute and State University [en]
thesis.degree.level: doctoral [en]
thesis.degree.name: Ph. D. [en]
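
The abstract above describes iterative computation over a Unified Relationship Matrix but does not state the update rules, so the following is only a minimal sketch of the general idea, not the dissertation's Link Fusion algorithm. It assumes two hypothetical object types (queries and pages), made-up relationship data and trust weights, and a PageRank-style damping term borrowed for convergence. The function names (`row_normalize`, `build_urm`, `iterate_importance`) are illustrative and do not come from the source.

```python
def row_normalize(block):
    """Scale each row to sum to 1; all-zero rows stay zero."""
    out = []
    for row in block:
        total = sum(row)
        out.append([v / total if total else 0.0 for v in row])
    return out


def build_urm(blocks, weights):
    """Assemble one unified matrix from a grid of relationship blocks.

    blocks[i][j] holds links from objects of type i to objects of type j.
    weights[i][j] is an assumed trust weight for that relationship type.
    """
    urm = []
    for i, block_row in enumerate(blocks):
        normalized = [row_normalize(b) for b in block_row]
        for r in range(len(normalized[0])):
            row = []
            for j, b in enumerate(normalized):
                row.extend(weights[i][j] * v for v in b[r])
            urm.append(row)
    return urm


def iterate_importance(urm, damping=0.85, tol=1e-10, max_iter=1000):
    """Power iteration: each object passes its score along its outgoing links.

    The damping term is an assumption taken from PageRank, not from the source.
    """
    n = len(urm)
    scores = [1.0 / n] * n
    for _ in range(max_iter):
        new = [(1.0 - damping) / n] * n
        for i, row in enumerate(urm):
            for j, v in enumerate(row):
                new[j] += damping * scores[i] * v
        total = sum(new)
        new = [s / total for s in new]
        if sum(abs(a - b) for a, b in zip(new, scores)) < tol:
            return new
        scores = new
    return scores


# Hypothetical data: 2 queries and 3 pages.
query_query = [[0, 0], [0, 0]]                   # no query-to-query links
query_page = [[1, 1, 0], [0, 1, 1]]              # query-to-page clicks
page_query = [[1, 0], [1, 1], [0, 1]]            # the same clicks, reversed
page_page = [[0, 1, 0], [0, 0, 1], [1, 0, 0]]    # page-to-page hyperlinks

blocks = [[query_query, query_page], [page_query, page_page]]
weights = [[0.0, 1.0], [0.5, 0.5]]               # each row of weights sums to 1

urm = build_urm(blocks, weights)
scores = iterate_importance(urm)
print("query scores:", scores[:2])
print("page scores:", scores[2:])
```

Because both object types live in one matrix, a page's score in this sketch depends on the queries that lead to it as well as on the pages that link to it, which is the kind of cross-source integration the abstract argues for.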

Files

Original bundle
Name: DissertationWensiXi.pdf
Size: 1.06 MB
Format: Adobe Portable Document Format