Capsule Networks: Framework and Application to Disentanglement for Generative Models

TR Number



Journal Title

Journal ISSN

Volume Title


Virginia Tech


Generative models are one of the most prominent components of unsupervised learning models that have a plethora of applications in various domains such as image-to-image translation, video prediction, and generating synthetic data where accessing real data is expensive, unethical, or compromising privacy. One of the main challenges in designing a generative model is creating a disentangled representation of generative factors which gives control over various characteristics of the generated data. Since the architecture of variational autoencoders is centered around latent variables and their objective function directly governs the generative factors, they are the perfect choice for creating a more disentangled representation. However, these architectures generate samples that are blurry and of lower quality compared to other state-of-the-art generative models such as generative adversarial networks. Thus, we attempt to increase the disentanglement of latent variables in variational autoencoders without compromising the generated image quality.

In this thesis, a novel generative model based on capsule networks and a variational autoencoder is proposed. Motivated by the concept of capsule neural networks and their vectorized output, these structures are employed to create a disentangled representation of latent features in variational autoencoders. In particular, the proposed structure, called CapsuleVAE, utilizes a capsule encoder whose vector outputs can translate to latent variables in a meaningful way. It is shown that CapsuleVAE generates results that are sharper and more diverse based on FID score and a metric inspired by the inception score. Furthermore, two different methods for training CapsuleVAE are proposed, and the generated results are investigated. In the first method, an objective function with regularization is proposed, and the optimal regularization hyperparameter is derived. In the second method, called sequential optimization, a novel training technique for training CapsuleVAE is proposed and the results are compared to the first method. Moreover, a novel metric for measuring disentanglement in latent variables is introduced. Based on this metric, it is shown that the proposed CapsuleVAE creates more disentangled representations. In summary, our proposed generative model enhances the disentanglement of latent variables which contributes to the model's generalizing well to new tasks and more control over the generated data. Our model also increases the generated image quality which addresses a common disadvantage in variational autoencoders.



Deep Learning, Generative models, Capsule Networks, Disentanglement