Multiset Model Selection and Averaging, and Interactive Storytelling
MetadataShow full item record
The Multiset Sampler [Leman et al., 2009] has previously been deployed and developed for efficient sampling from complex stochastic processes. We extend the sampler and the surrounding theory to model selection problems. In such problems efficient exploration of the model space becomes a challenge since independent and ad-hoc proposals might not be able to jointly propose multiple parameter sets which correctly explain a new pro- posed model. In order to overcome this we propose a multiset on the model space to en- able efficient exploration of multiple model modes with almost no tuning. The Multiset Model Selection (MSMS) framework is based on independent priors for the parameters and model indicators on variables. We show that posterior model probabilities can be easily obtained from multiset averaged posterior model probabilities in MSMS. We also obtain typical Bayesian model averaged estimates for the parameters from MSMS. We apply our algorithm to linear regression where it allows easy moves between parame- ter modes of different models, and in probit regression where it allows jumps between widely varying model specific covariance structures in the latent space of a hierarchical model. The Storytelling algorithm [Kumar et al., 2006] constructs stories by discovering and con- necting latent connections between documents in a network. Such automated algorithms often do not agree with userâ s mental map of the data. Hence systems that incorporate feedback through visual interaction from the user are of immediate importance. We pro- pose a visual analytic framework in which such interactions are naturally incorporated in to the existing Storytelling algorithm through a redefinition of the latent topic space used in the similarity measure of the network. The document network can be explored us- ing the newly learned normalized topic weights for each document. Hence our algorithm augments the limitations of human sensemaking capabilities in large document networks by providing a collaborative framework between the underlying model and the user. Our formulation of the problem is a supervised topic modeling problem where the supervi- sion is based on relationships imposed by the user as a set of inequalities derived from tolerances on edge costs from inverse shortest path problem. We show a probabilistic modeling of the relationships based on auxiliary variables and propose a Gibbs sampling based strategy. We provide detailed results from a simulated data and the Atlantic Storm data set.
- Doctoral Dissertations