The Cauchy-Net Mixture Model for Clustering with Anomalous Data
dc.contributor.author | Slifko, Matthew D. | en |
dc.contributor.committeechair | Leman, Scotland C. | en |
dc.contributor.committeemember | Bieri, David Stephan | en |
dc.contributor.committeemember | Smith, Eric P. | en |
dc.contributor.committeemember | Ranganathan, Shyam | en |
dc.contributor.department | Statistics | en |
dc.date.accessioned | 2019-09-12T13:55:32Z | en |
dc.date.available | 2019-09-12T13:55:32Z | en |
dc.date.issued | 2019-09-11 | en |
dc.description.abstract | We live in the data explosion era. The unprecedented amount of data offers a potential wealth of knowledge but also brings about concerns regarding ethical collection and usage. Mistakes stemming from anomalous data have the potential for severe, real-world consequences, such as when building prediction models for housing prices. To combat anomalies, we develop the Cauchy-Net Mixture Model (CNMM). The CNMM is a flexible Bayesian nonparametric tool that employs a mixture of a Dirichlet Process Mixture Model (DPMM) and a Cauchy-distributed component, which we call the Cauchy-Net (CN). Each portion of the model offers benefits: the DPMM removes the need to fix the number of components in advance, and the CN uses its heavy tails to capture observations that do not belong to the well-defined components. By isolating the anomalous observations in a single component, we simultaneously flag the observations in the net as warranting further inspection and prevent them from interfering with the formation of the remaining components. The result is a framework that allows for simultaneously clustering observations and making predictions in the face of anomalous data. We demonstrate the usefulness of the CNMM in a variety of experimental situations and apply the model to predicting housing prices in Fairfax County, Virginia. | en |
dc.description.abstractgeneral | We live in the data explosion era. The unprecedented amount of data offers a potential wealth of knowledge but also brings about concerns regarding ethical collection and usage. Mistakes stemming from anomalous data have the potential for severe, real-world consequences, such as when building prediction models for housing prices. To combat anomalies, we develop the Cauchy-Net Mixture Model (CNMM). The CNMM is a flexible tool for identifying and isolating anomalies while simultaneously discovering cluster structure and making predictions among the non-anomalous observations. The result is a framework that allows for simultaneously clustering and predicting in the face of anomalous data. We demonstrate the usefulness of the CNMM in a variety of experimental situations and apply the model to predicting housing prices in Fairfax County, Virginia. | en |
dc.description.degree | Doctor of Philosophy | en |
dc.format.medium | ETD | en |
dc.identifier.other | vt_gsexam:22011 | en |
dc.identifier.uri | http://hdl.handle.net/10919/93576 | en |
dc.publisher | Virginia Tech | en |
dc.rights | In Copyright | en |
dc.rights.uri | http://rightsstatements.org/vocab/InC/1.0/ | en |
dc.subject | Bayesian Nonparametrics | en |
dc.subject | Dirichlet Process Mixture Model | en |
dc.subject | Clustering | en |
dc.subject | Anomaly Detection | en |
dc.title | The Cauchy-Net Mixture Model for Clustering with Anomalous Data | en |
dc.type | Dissertation | en |
thesis.degree.discipline | Statistics | en |
thesis.degree.grantor | Virginia Polytechnic Institute and State University | en |
thesis.degree.level | doctoral | en |
thesis.degree.name | Doctor of Philosophy | en |