dc.contributor.author: Xi, Wensi (en_US)

In this dissertation I use a Unified Relationship Matrix (URM) to represent a set of heterogeneous data objects and their inter-relationships. I argue that integrated and iterative computations over the Unified Relationship Matrix can help overcome the data sparseness problem, which is common in many information applications, and detect latent relationships among heterogeneous data objects (such as the latent term associations discovered by Latent Semantic Indexing, LSI). Thus, this kind of computation can be used to improve the quality of various information applications that require combining information from heterogeneous data sources.

To support the argument, I further develop a unified link analysis algorithm, the Link Fusion algorithm, and a unified similarity-calculating algorithm, the SimFusion algorithm. Both algorithms attempt to better integrate information from heterogeneous sources by iteratively computing over the Unified Relationship Matrix to calculate a specific property of data objects, such as the importance of a data object (in the Link Fusion algorithm) or the similarity between a pair of data objects (in the SimFusion algorithm).

Then, I develop two sets of experiments on real-world datasets to investigate whether the algorithms proposed in this dissertation can better integrate information from multiple sources. The performance of the algorithms is compared to that of traditional link analysis and similarity-calculating algorithms. Experimental results show that the proposed algorithms significantly outperform the traditional link analysis and similarity-calculating algorithms.

I further investigate various pruning techniques aimed at improving the efficiency of the proposed algorithms, and I examine the algorithms' scalability. Experimental results show that pruning can effectively improve the efficiency of the algorithms.
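The idea in the abstract of iterating over one matrix that stacks intra-type and inter-type relationships can be illustrated with a small Python sketch. It is not taken from the dissertation. The block layout, the row normalization, both update rules (a power iteration for importance and an `L S L^T` style update for similarity), the iteration counts, the toy data, and every function name are assumptions chosen for illustration. The actual Link Fusion and SimFusion formulations may weight or combine the blocks differently.

```python
def row_normalize(matrix):
    """Scale each row to sum to 1; all-zero rows stay all zero."""
    result = []
    for row in matrix:
        total = sum(row)
        result.append([v / total if total else 0.0 for v in row])
    return result


def transpose(matrix):
    return [list(col) for col in zip(*matrix)]


def matmul(a, b):
    """Plain dense matrix product for small illustrative matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def build_urm(blocks):
    """Assemble one matrix from a grid of blocks (assumed layout):
    blocks[i][j] holds relationships from objects of type i to type j."""
    urm = []
    for block_row in blocks:
        for r in range(len(block_row[0])):
            row = []
            for block in block_row:
                row.extend(block[r])
            urm.append(row)
    return row_normalize(urm)


def importance_scores(urm, iterations=50):
    """Power iteration: each object passes its score along its outgoing
    relationships, so scores are shared across all object types."""
    n = len(urm)
    urm_t = transpose(urm)
    w = [1.0 / n] * n
    for _ in range(iterations):
        w = [sum(urm_t[i][k] * w[k] for k in range(n)) for i in range(n)]
        total = sum(w)
        if total:
            w = [v / total for v in w]
    return w


def similarity_scores(urm, iterations=5):
    """Iterative similarity: two objects are similar when the objects
    they relate to are similar. Self-similarity is pinned to 1."""
    n = len(urm)
    urm_t = transpose(urm)
    s = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(iterations):
        s = matmul(matmul(urm, s), urm_t)
        for i in range(n):
            s[i][i] = 1.0
    return s


# Toy data (hypothetical): two objects of type A and two of type B.
a_to_a = [[0, 1], [1, 0]]
a_to_b = [[1, 0], [1, 1]]
b_to_a = [[1, 1], [0, 1]]
b_to_b = [[0, 0], [0, 0]]

urm = build_urm([[a_to_a, a_to_b], [b_to_a, b_to_b]])
scores = importance_scores(urm)
sims = similarity_scores(urm)
```

In this sketch the two type-B objects have no direct relationship to each other, yet `sims` can still assign them a nonzero similarity because both relate to type-A objects. That is the kind of latent relationship the abstract refers to, shown here only on toy data.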

dc.publisher: Virginia Tech (en_US)
dc.rights: I hereby certify that, if appropriate, I have obtained and attached hereto a written permission statement from the owner(s) of each third party copyrighted matter to be included in my thesis, dissertation, or project report, allowing distribution as specified below. I certify that the version I submitted is the same as that approved by my advisory committee. I hereby grant to Virginia Tech or its agents the non-exclusive license to archive and make accessible, under the conditions specified below, my thesis, dissertation, or project report in whole or in part in all forms of media, now or hereafter known. I retain all other ownership rights to the copyright of the thesis, dissertation or project report. I also retain the right to use in future works (such as articles or books) all or part of this thesis, dissertation, or project report. (en_US)
dc.subject: Unified Relationship Matrix (en_US)
dc.subject: Information Retrieval (en_US)
dc.subject: Information Integration (en_US)
dc.subject: Link Fusion (en_US)
dc.title: Iterative Computing over a Unified Relationship Matrix for Information Integration (en_US)
dc.contributor.department: Computer Science (en_US)
dc.description.degree: Ph. D. (en_US)
dc.contributor.committeechair: Fan, Weiguo Patrick (en_US)
dc.contributor.committeemember: Ramakrishnan, Naren (en_US)
dc.contributor.committeemember: Ponte, Jay M. (en_US)
dc.contributor.committeemember: Lu, Chang-Tien (en_US)
dc.contributor.committeemember: Sandu, Adrian (en_US)
