The Cauchy-Net Mixture Model for Clustering with Anomalous Data

dc.contributor.author: Slifko, Matthew D. [en]
dc.contributor.committeechair: Leman, Scotland C. [en]
dc.contributor.committeemember: Bieri, David Stephan [en]
dc.contributor.committeemember: Smith, Eric P. [en]
dc.contributor.committeemember: Ranganathan, Shyam [en]
dc.contributor.department: Statistics [en]
dc.date.accessioned: 2019-09-12T13:55:32Z [en]
dc.date.available: 2019-09-12T13:55:32Z [en]
dc.date.issued: 2019-09-11 [en]
dc.description.abstract: We live in the data explosion era. The unprecedented amount of data offers a potential wealth of knowledge but also brings about concerns regarding ethical collection and usage. Mistakes stemming from anomalous data have the potential for severe, real-world consequences, such as when building prediction models for housing prices. To combat anomalies, we develop the Cauchy-Net Mixture Model (CNMM). The CNMM is a flexible Bayesian nonparametric tool that employs a mixture of a Dirichlet Process Mixture Model (DPMM) and a Cauchy-distributed component, which we call the Cauchy-Net (CN). Each portion of the model offers benefits: the DPMM removes the need to fix the number of components in advance, and the CN captures observations that do not belong to the well-defined components by leveraging its heavy tails. By isolating the anomalous observations in a single component, we simultaneously flag the observations in the net as warranting further inspection and prevent them from interfering with the formation of the remaining components. The result is a framework that allows for simultaneously clustering observations and making predictions in the face of anomalous data. We demonstrate the usefulness of the CNMM in a variety of experimental situations and apply the model to predicting housing prices in Fairfax County, Virginia. [en]
dc.description.abstractgeneral: We live in the data explosion era. The unprecedented amount of data offers a potential wealth of knowledge but also brings about concerns regarding ethical collection and usage. Mistakes stemming from anomalous data have the potential for severe, real-world consequences, such as when building prediction models for housing prices. To combat anomalies, we develop the Cauchy-Net Mixture Model (CNMM). The CNMM is a flexible tool for identifying and isolating anomalies while simultaneously discovering cluster structure and making predictions among the nonanomalous observations. The result is a framework that allows for simultaneously clustering and predicting in the face of anomalous data. We demonstrate the usefulness of the CNMM in a variety of experimental situations and apply the model to predicting housing prices in Fairfax County, Virginia. [en]
dc.description.degree: Doctor of Philosophy [en]
dc.format.medium: ETD [en]
dc.identifier.other: vt_gsexam:22011 [en]
dc.identifier.uri: http://hdl.handle.net/10919/93576 [en]
dc.publisher: Virginia Tech [en]
dc.rights: In Copyright [en]
dc.rights.uri: http://rightsstatements.org/vocab/InC/1.0/ [en]
dc.subject: Bayesian Nonparametrics [en]
dc.subject: Dirichlet Process Mixture Model [en]
dc.subject: Clustering [en]
dc.subject: Anomaly Detection [en]
dc.title: The Cauchy-Net Mixture Model for Clustering with Anomalous Data [en]
dc.type: Dissertation [en]
thesis.degree.discipline: Statistics [en]
thesis.degree.grantor: Virginia Polytechnic Institute and State University [en]
thesis.degree.level: doctoral [en]
thesis.degree.name: Doctor of Philosophy [en]

Files

Original bundle
Name: Slifko_MD_D_2019.pdf
Size: 1.95 MB
Format: Adobe Portable Document Format