dc.contributor.author: Gandra, Harshitha (en)
dc.date.accessioned: 2022-01-26T09:00:22Z (en)
dc.date.available: 2022-01-26T09:00:22Z (en)
dc.date.issued: 2022-01-25 (en)
dc.identifier.other: vt_gsexam:33779 (en)
dc.identifier.uri: http://hdl.handle.net/10919/107925 (en)
dc.description.abstract: Time series anomaly detection can be a very useful tool for inspecting and maintaining the health and quality of an infrastructure system. In tackling this problem, the main concern is the imbalanced nature of the dataset. To mitigate this problem, this thesis proposes two unsupervised anomaly detection frameworks. The first is an architecture that leverages the matrix profile, a data structure containing the Euclidean distances between the subsequences of two time series, obtained through a similarity join. The architecture couples a data fusion technique with matrix profile analysis under the constraint of varied sampling rates across time series. To this end, we propose a framework in which a time series being evaluated for anomalies is quantitatively compared with a benchmark (anomaly-free) time series using the proposed asynchronous time series comparison, which was inspired by the matrix profile approach to anomaly detection on time series. To evaluate its efficacy, the framework was tested on a case study comprising a Class I railroad dataset. The data collection system integrated into this railway system collects data through different data acquisition channels, which represent different transducers. The framework was applied to all channels, and the best-performing channels were identified. The average Recall and Precision achieved in the single-channel evaluation were 93.5% and 55%, respectively, with an error threshold of 0.04 miles (211 feet). A limitation of this framework was that it produced some false positive predictions. To overcome this problem, a second framework is proposed that extracts signature patterns in a time series, also known as motifs, which can be leveraged to identify anomalous patterns.
The second framework is a motif-based framework that operates under the same constraint of varied sampling rates. Here, a feature extraction method and a clustering method were used in the training process of a One-Class Support Vector Machine (OCSVM) coupled with a Kernel Density Estimation (KDE) technique. The average Recall and Precision achieved on the same case study through this framework were 74% and 57%, respectively. In comparison to the first framework, the second does not perform as well. Future efforts will focus on improving this classification-based anomaly detection method. (en)
dc.format.medium: ETD (en)
dc.language.iso: en (en)
dc.publisher: Virginia Tech (en)
dc.rights: In Copyright (en)
dc.rights.uri: http://rightsstatements.org/vocab/InC/1.0/ (en)
dc.subject: Anomaly Detection (en)
dc.subject: Asynchronous (en)
dc.subject: Unsupervised (en)
dc.subject: Matrix Profile (en)
dc.subject: OCSVM (en)
dc.title: Anomaly Detection for Smart Infrastructure: An Unsupervised Approach for Time Series Comparison (en)
dc.type: Thesis (en)
dc.contributor.department: Electrical and Computer Engineering (en)
dc.description.degree: Master of Science (en)
thesis.degree.name: Master of Science (en)
thesis.degree.level: masters (en)
thesis.degree.grantor: Virginia Polytechnic Institute and State University (en)
thesis.degree.discipline: Computer Engineering (en)
dc.contributor.committeechair: Chantem, Thidapat (en)
dc.contributor.committeechair: Jazizadeh Karimi, Farrokh (en)
dc.contributor.committeemember: Jia, Ruoxi (en)
dc.description.abstractgeneral: Time series anomaly detection refers to the identification of any outliers or deviations present in time series data. This technique can help mitigate unplanned events by facilitating early maintenance. The first proposed method compares an anomaly-free dataset with the time series of interest. The differences between the two time series are noted, and the point with the highest difference is considered an anomaly. The performance of this model was evaluated on a railroad dataset, and the cumulative average Recall (the share of true anomalies that are detected) and average Precision (the share of predictions that are correct) were 93.5% and 55%, respectively, with an acceptable error range of 0.04 miles (211 feet). The second proposed method extracts all segments in the anomaly-free dataset and groups them according to their similarity. Here, an OCSVM is trained on these individual groups. OCSVM is a machine learning algorithm that learns to classify data as either anomalous or normal. It is then coupled with KDE, which creates a distribution across all the predicted anomalies and identifies an anomaly as a location with a high density of predictions. The performance of this model was evaluated on a railroad dataset, and the cumulative average Recall and cumulative average Precision were 74% and 57%, respectively, with an acceptable error range of 0.04 miles (211 feet). (en)
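
The benchmark comparison described in the abstracts can be illustrated with a small, generic sketch. This is not the thesis's implementation: it is a brute-force similarity join (AB-join) between a test series and an anomaly-free benchmark, it assumes both series share one sampling rate (the thesis addresses varied sampling rates, which this sketch ignores), and the function names, window length, and synthetic data are all hypothetical choices made for illustration.

```python
import math


def znorm(window):
    """Z-normalize a window; a constant window maps to all zeros."""
    n = len(window)
    mean = sum(window) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in window) / n)
    if std == 0.0:
        return [0.0] * n
    return [(x - mean) / std for x in window]


def ab_join_profile(test, benchmark, m):
    """For each length-m window of `test`, return the smallest z-normalized
    Euclidean distance to any length-m window of `benchmark`."""
    bench_windows = [
        znorm(benchmark[j:j + m]) for j in range(len(benchmark) - m + 1)
    ]
    profile = []
    for i in range(len(test) - m + 1):
        w = znorm(test[i:i + m])
        best = min(
            math.sqrt(sum((a - b) ** 2 for a, b in zip(w, bw)))
            for bw in bench_windows
        )
        profile.append(best)
    return profile


def most_anomalous_index(profile):
    """Start index of the test window least similar to the benchmark."""
    return max(range(len(profile)), key=profile.__getitem__)


# Synthetic data (assumption): a clean sine wave serves as the benchmark,
# and the test series is a copy with a disturbance injected at samples 60..67.
benchmark = [math.sin(2 * math.pi * t / 16) for t in range(128)]
test = list(benchmark)
for t in range(60, 68):
    test[t] += 2.0 if t % 2 == 0 else -2.0

profile = ab_join_profile(test, benchmark, m=16)
idx = most_anomalous_index(profile)
print("most anomalous window starts at sample", idx)
```

Windows of the test series that match the benchmark score near zero, while windows overlapping the injected disturbance score higher, so the largest profile value points at the candidate anomaly. The brute-force join costs time proportional to the product of the two series lengths and the window length; dedicated matrix profile libraries compute the same quantity far more efficiently.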

