Linkage Based Dirichlet Processes

Virginia Tech

We live in the era of Big Data, with significantly richer computational resources than were available two decades ago. The combination of abundant computation and large volumes of data has increased researchers' interest in developing feasible Markov chain Monte Carlo (MCMC) algorithms for large parameter spaces. Dirichlet Process Mixture Models (DPMMs) have become a Bayesian mainstay for modeling heterogeneous structure, namely clusters, especially when the number of clusters is not known in advance, and they can be fit with established MCMC methods. Unlike many ad hoc clustering methods, models built on Dirichlet Processes (DPs) provide a flexible, probabilistic approach for automatically estimating both the cluster structure and the number of clusters. Although DPs are nonparametric, they depend on both a base measure and a concentration parameter, either of which can heavily impact inference.
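To illustrate how strongly the concentration parameter shapes the clustering a DP induces, the following minimal sketch simulates the Chinese restaurant process, the partition distribution implied by a DP, using only the Python standard library. The function name `crp_num_clusters` is hypothetical, and the sketch shows generic DP behavior only; it is not the linkage-based method proposed in this work.

```python
import random


def crp_num_clusters(n, alpha, seed=None):
    """Simulate a Chinese restaurant process with concentration `alpha`
    for n customers and return the number of occupied tables (clusters)."""
    rng = random.Random(seed)
    counts = []  # counts[k] = number of customers seated at table k
    for i in range(n):
        # With i customers already seated, the next one opens a new table
        # with probability alpha / (alpha + i); for i = 0 this is 1.
        if rng.random() < alpha / (alpha + i):
            counts.append(1)
        else:
            # Otherwise join existing table k with probability counts[k] / i.
            r = rng.random() * i
            cum = 0
            for k, c in enumerate(counts):
                cum += c
                if r < cum:
                    counts[k] += 1
                    break
    return len(counts)


if __name__ == "__main__":
    # Average cluster count over repeated simulations for several alphas.
    for alpha in (0.5, 1.0, 5.0):
        sims = [crp_num_clusters(100, alpha, seed=s) for s in range(500)]
        print(alpha, sum(sims) / len(sims))
```

Averaged over many runs, larger values of `alpha` yield more clusters for the same number of observations, which is why an arbitrary choice of this parameter can change the inferred cluster structure.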

Determining the concentration parameter is critical because it controls the a priori expected number of clusters, yet typical approaches for specifying it are rather cavalier. In this work, we propose a new method for automatically and adaptively determining this parameter, which directly calibrates distances between clusters through an explicit link function within the DP. Furthermore, we extend our method to mixture models with Nested Dirichlet Processes (NDPs), which cluster multilevel data and depend on the specification of a vector of concentration parameters. We detail how to incorporate our method into MCMC algorithms and illustrate our findings through a series of comparative simulation studies and applications.
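For reference, the standard result linking the concentration parameter to the prior cluster expectation is that, under a DP with concentration alpha, the expected number of clusters among n observations is E[K_n] = sum_{i=1}^{n} alpha / (alpha + i - 1). The minimal sketch below computes this quantity and, for contrast, implements one simple fixed-specification heuristic: choosing alpha so that E[K_n] matches a prior guess. Both function names are hypothetical, and this heuristic is not the linkage-based method proposed in this work.

```python
import math


def expected_clusters(n, alpha):
    """Prior expected number of clusters among n observations under a
    DP with concentration alpha: sum_{i=1}^{n} alpha / (alpha + i - 1)."""
    return sum(alpha / (alpha + i - 1) for i in range(1, n + 1))


def alpha_for_expected_clusters(n, target, lo=1e-8, hi=1e8, iters=200):
    """Find alpha with expected_clusters(n, alpha) == target by bisection.

    E[K_n] increases monotonically in alpha, from 1 (alpha -> 0) to
    n (alpha -> infinity), so a solution exists for 1 < target < n.
    Bisection uses the geometric midpoint because alpha spans many
    orders of magnitude.
    """
    if not 1 < target < n:
        raise ValueError("target must lie strictly between 1 and n")
    for _ in range(iters):
        mid = math.sqrt(lo * hi)
        if expected_clusters(n, mid) < target:
            lo = mid  # too few clusters expected: increase alpha
        else:
            hi = mid  # too many clusters expected: decrease alpha
    return math.sqrt(lo * hi)


if __name__ == "__main__":
    # Alpha under which 100 observations are expected to form 5 clusters.
    print(alpha_for_expected_clusters(100, 5.0))
```

A heuristic like this fixes alpha before seeing the data, which illustrates the kind of ad hoc specification that motivates determining the parameter adaptively.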

Keywords: concentration parameter, Dirichlet processes, nested Dirichlet processes