Statistical Evaluation of Deep Learning for Event Detection in Time Series: Quantifying Uncertainty, Efficiency, and Adaptation with Applications to Seismic Data


Date

2026-01-14

Publisher

Virginia Tech

Abstract

Rapid developments in deep learning have led to its widespread adoption in domains that rely on time series, largely because of the strong performance and flexibility of deep models. Yet evaluation practices have not kept pace. Deep learning models are often assessed with a few performance metrics computed on benchmark datasets, a practice that leaves open important questions: how predictive performance varies with data availability, how uncertainty should be communicated in both predictions and aggregate metrics, and how shifting data distributions affect model reliability.

Presented as three studies, this dissertation develops principled statistical approaches to deep learning model evaluation that address these challenges in the context of time-series-based scientific problems. The first study introduces an evaluation framework for seismic deep learning models in which I assess learning efficiency while mitigating data leakage and quantify benchmark uncertainty by attributing variation to both training stochasticity and data sampling through an expansive design of experiments. The second study compares meta-learning techniques across data regimes and analyzes how consistently they perform under data shift. As part of this study, I contribute SeisTask, a semi-synthetic benchmark dataset with controlled, physically meaningful sources of shift to support future work on adaptive learning approaches. The third study provides an empirical comparison of meta-learning and hierarchical Bayesian modeling and highlights their theoretical connection, comparing the two methods in terms of interpretability, performance under shift, and predictive uncertainty.

Taken together, these studies offer statistically grounded evaluations of deep learning models for event detection in time series and show how uncertainty, data requirements, and distributional shift influence model behavior in physical science applications.

Keywords

Benchmark Variability, Data Leakage, Domain Adaptation, Experimental Design, Model Calibration
