Issues of Real Time Information Retrieval in Large, Dynamic and Heterogeneous Search Spaces
Abstract
The growing size and prevalence of real-time information have become important characteristics of databases found on the internet. As the information changes, the relevance ranking of search results changes as well. Current information retrieval methods, which are based on offline indexing, are inefficient in such dynamic search spaces and cannot quickly provide the most current results. Given the explosive growth of the internet, stove-piped approaches that deal with dynamism simply by employing large computational resources are ultimately not scalable. A new processing methodology that incorporates intelligent resource allocation strategies is required, and modeling the dynamism of the search space in real time is essential for effective resource allocation. To support multi-grained dynamic resource allocation, we propose a partial processing approach that uses anytime algorithms to process documents in multiple steps. Each successive step produces a more accurate approximation of the documents' final similarity values. Resource allocation algorithms use these partial results to select documents for processing and to decide on the number of processing steps and the computation time allocated to each step. We validate the processing paradigm by demonstrating its viability with image documents. We design an anytime image algorithm that combines wavelet transforms and machine learning techniques to map low-level visual features to higher-level concepts. Experimental validation is done by implementing the image algorithm within an established multiagent information retrieval framework called I-FGM. We also formulate a multiagent resource allocation framework for the design and performance analysis of resource allocation with partial processing. A key aspect of the framework is modeling changes in the search space as external and internal dynamism using a grid-based search space model.
The search space model divides the documents, or candidates, into groups based on their partial values and the portions processed. Changes in the search space can therefore be represented in the model as flows of agents and candidates between the grids. Using comparative experimental studies and detailed statistical analysis, we validate the search space model and demonstrate the effectiveness of the resource allocation framework.
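As a rough illustration of the partial processing idea described in the abstract, the following Python sketch processes candidates one step at a time, refines a partial similarity value at each step, and places each candidate in a grid cell by partial value and portion processed. It is a minimal sketch under stated assumptions, not the I-FGM implementation: the `Candidate` class, the per-chunk `evidence` scores, the running-mean similarity estimate, the greedy allocation rule, and the 4x4 grid are all hypothetical choices made for the example.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """A document processed in several anytime steps (hypothetical model)."""
    doc_id: str
    evidence: list[float]       # assumed per-chunk relevance scores in [0, 1]
    steps_done: int = 0
    partial_value: float = 0.0  # current approximation of the final similarity

    @property
    def portion_processed(self) -> float:
        return self.steps_done / len(self.evidence)

    def process_step(self) -> None:
        """Run one more step; the estimate is the mean of the chunks seen so far."""
        if self.steps_done < len(self.evidence):
            self.steps_done += 1
            seen = self.evidence[: self.steps_done]
            self.partial_value = sum(seen) / len(seen)


def grid_cell(c: Candidate, bins: int = 4) -> tuple[int, int]:
    """Map a candidate to a (partial value, portion processed) grid cell."""
    row = min(int(c.partial_value * bins), bins - 1)
    col = min(int(c.portion_processed * bins), bins - 1)
    return row, col


def allocate(candidates: list[Candidate], budget: int) -> None:
    """Greedy allocation (an assumption): spend each step on the most promising
    unfinished candidate, preferring less-processed ones when values tie."""
    for _ in range(budget):
        unfinished = [c for c in candidates if c.portion_processed < 1.0]
        if not unfinished:
            break

        def priority(c: Candidate) -> tuple[float, int]:
            # Untouched candidates are treated optimistically so each gets sampled.
            value = 1.0 if c.steps_done == 0 else c.partial_value
            return value, -c.steps_done

        max(unfinished, key=priority).process_step()


if __name__ == "__main__":
    docs = [
        Candidate("a", [1.0, 1.0, 1.0, 1.0]),
        Candidate("b", [0.0, 0.0, 0.0, 0.0]),
        Candidate("c", [1.0, 0.0, 1.0, 1.0]),
    ]
    allocate(docs, budget=6)
    for d in docs:
        print(d.doc_id, d.steps_done, d.partial_value, grid_cell(d))
```

In this toy run, the low-scoring candidate receives only a single step before the budget shifts to the more promising ones, and each processing step moves a candidate to a different grid cell, which is the kind of flow between grids that the search space model is meant to capture.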