Data-Driven Methods for Modeling and Predicting Multivariate Time Series using Surrogates

dc.contributor.author: Chakraborty, Prithwish [en]
dc.contributor.committeechair: Ramakrishnan, Naren [en]
dc.contributor.committeemember: Marathe, Madhav Vishnu [en]
dc.contributor.committeemember: Brownstein, John S. [en]
dc.contributor.committeemember: Lu, Chang-Tien [en]
dc.contributor.committeemember: Tandon, Ravi [en]
dc.contributor.department: Computer Science [en]
dc.date.accessioned: 2017-12-28T07:00:26Z [en]
dc.date.available: 2017-12-28T07:00:26Z [en]
dc.date.issued: 2016-07-05 [en]
dc.description.abstract: Modeling and predicting multivariate time series data has been of prime interest to researchers for many decades. Traditionally, time series prediction models have focused on finding attributes that have consistent correlations with the target variable(s). However, diverse surrogate signals, such as news data and Twitter chatter, are increasingly available and can provide real-time information, albeit with inconsistent correlations. Intelligent use of such sources can lead to early and real-time warning systems such as Google Flu Trends. Furthermore, the target variables of interest, such as public health surveillance, can be noisy. Thus, models built for such data sources should be flexible as well as adaptable to changing correlation patterns. In this thesis we explore various methods of using surrogates to generate more reliable and timely forecasts for noisy target signals. We primarily investigate three key components of the forecasting problem, viz. (i) short-term forecasting, where surrogates can be employed in a now-casting framework, (ii) long-term forecasting, where surrogates act as forcing parameters to model system dynamics, and (iii) robust drift models that detect and exploit 'changepoints' in the surrogate-target relationship to produce robust models. We explore various 'physical' and 'social' surrogate sources to study these sub-problems, primarily to generate real-time forecasts for endemic diseases. On the modeling side, we employed matrix factorization and generalized linear models to detect short-term trends and explored various Bayesian sequential analysis methods to model long-term effects. Our research indicates that, in general, a combination of surrogates can lead to more robust models. Interestingly, our findings indicate that under specific scenarios, particular surrogates can decrease overall forecasting accuracy, thus providing an argument for the use of 'Good data' over 'Big data'. [en]
dc.description.degree: Ph. D. [en]
dc.format.medium: ETD [en]
dc.identifier.other: vt_gsexam:8179 [en]
dc.identifier.uri: http://hdl.handle.net/10919/81432 [en]
dc.publisher: Virginia Tech [en]
dc.rights: In Copyright [en]
dc.rights.uri: http://rightsstatements.org/vocab/InC/1.0/ [en]
dc.subject: Multivariate Time Series [en]
dc.subject: Surrogates [en]
dc.subject: Generalized Linear Models [en]
dc.subject: Bayesian Sequential Analysis [en]
dc.subject: Computational Epidemiology [en]
dc.title: Data-Driven Methods for Modeling and Predicting Multivariate Time Series using Surrogates [en]
dc.type: Dissertation [en]
thesis.degree.discipline: Computer Science and Applications [en]
thesis.degree.grantor: Virginia Polytechnic Institute and State University [en]
thesis.degree.level: doctoral [en]
thesis.degree.name: Ph. D. [en]

Files

Original bundle
Name: Chakraborty_P_D_2016.pdf
Size: 6.31 MB
Format: Adobe Portable Document Format