Efficient Algorithms for Mining Data Streams

dc.contributor.authorBoedihardjo, Arnold Prigunaen
dc.contributor.committeechairLu, Chang-Tienen
dc.contributor.committeememberRamakrishnan, Narenen
dc.contributor.committeememberChen, Ing-Rayen
dc.contributor.committeememberFan, Weiguo Patricken
dc.contributor.committeememberLiang, Yaoen
dc.contributor.departmentComputer Scienceen
dc.date.accessioned2014-03-14T20:15:19Zen
dc.date.adate2010-09-06en
dc.date.available2014-03-14T20:15:19Zen
dc.date.issued2010-08-10en
dc.date.rdate2012-05-08en
dc.date.sdate2010-08-16en
dc.description.abstractData streams are ordered sets of values that are fast, continuous, mutable, and potentially unbounded. Examples of data streams include the pervasive time series which span domains such as finance, medicine, and transportation. Mining data streams require approaches that are efficient, adaptive, and scalable. For several stream mining tasks, knowledge of the data's probability density function (PDF) is essential to deriving usable results. Providing an accurate model for the PDF benefits a variety of stream mining applications and its successful development can have far-reaching impact to the general discipline of stream analysis. Therefore, this research focuses on the construction of efficient and effective approaches for estimating the PDF of data streams. In this work, kernel density estimators (KDEs) are developed that satisfy the stringent computational stipulations of data streams, model unknown and dynamic distributions, and enhance the estimation quality of complex structures. Contributions of this work include: (1) theoretical development of the local region based KDE; (2) construction of a local region based estimation algorithm; (3) design of a generalized local region approach that can be applied to any global bandwidth KDE to enhance estimation accuracy; and (4) application extension of the local region based KDE to multi-scale outlier detection. Theoretical development includes the formulation of the local region concept to effectively approximate the computationally intensive adaptive KDE. This work also analyzes key theoretical properties of the local region based approach which include (amongst others) its expected performance, an alternative local region construction criterion, and its robustness under evolving distributions. Algorithmic design includes the development of a specific estimation technique that reduces the time/space complexities of the adaptive KDE. In order to accelerate mining tasks such as outlier detection, an integrated set of optimizations are proposed for estimating multiple density queries. Additionally, the local region concept is extended to an efficient algorithmic framework which can be applied to any global bandwidth KDEs. The combined solution can significantly improve estimation accuracy while retaining overall linear time/space costs. As an application extension, an outlier detection framework is designed which can effectively detect outliers within multiple data scale representations.en
dc.description.degreePh. D.en
dc.identifier.otheretd-08162010-212505en
dc.identifier.sourceurlhttp://scholar.lib.vt.edu/theses/available/etd-08162010-212505/en
dc.identifier.urihttp://hdl.handle.net/10919/28686en
dc.publisherVirginia Techen
dc.relation.haspartBoedihardjo_AP_D_2010.pdfen
dc.rightsIn Copyrighten
dc.rights.urihttp://rightsstatements.org/vocab/InC/1.0/en
dc.subjectData Miningen
dc.subjectMachine learningen
dc.subjectKernel Density Estimationen
dc.subjectOutlier Detectionen
dc.subjectData Streamen
dc.titleEfficient Algorithms for Mining Data Streamsen
dc.typeDissertationen
thesis.degree.disciplineComputer Scienceen
thesis.degree.grantorVirginia Polytechnic Institute and State Universityen
thesis.degree.leveldoctoralen
thesis.degree.namePh. D.en

Files

Original bundle
Now showing 1 - 1 of 1
Loading...
Thumbnail Image
Name:
Boedihardjo_AP_D_2010.pdf
Size:
2.83 MB
Format:
Adobe Portable Document Format