Show simple item record

dc.contributor.author: Chakraborty, Prithwish [en_US]
dc.date.accessioned: 2017-12-28T07:00:26Z
dc.date.available: 2017-12-28T07:00:26Z
dc.date.issued: 2016-07-05 [en_US]
dc.identifier.other: vt_gsexam:8179 [en_US]
dc.identifier.uri: http://hdl.handle.net/10919/81432
dc.description.abstract: Modeling and predicting multivariate time series data has been of prime interest to researchers for many decades. Traditionally, time series prediction models have focused on finding attributes that have consistent correlations with the target variable(s). However, diverse surrogate signals, such as news data and Twitter chatter, are increasingly available and can provide real-time information, albeit with inconsistent correlations. Intelligent use of such sources can lead to early and real-time warning systems such as Google Flu Trends. Furthermore, the target variables of interest, such as public health surveillance data, can be noisy. Thus, models built for such data sources should be flexible as well as adaptable to changing correlation patterns. In this thesis we explore various methods of using surrogates to generate more reliable and timely forecasts for noisy target signals. We primarily investigate three key components of the forecasting problem, viz. (i) short-term forecasting, where surrogates can be employed in a now-casting framework, (ii) long-term forecasting, where surrogates act as forcing parameters to model system dynamics, and (iii) robust drift models that detect and exploit 'changepoints' in the surrogate-target relationship to produce robust models. We explore various 'physical' and 'social' surrogate sources to study these sub-problems, primarily to generate real-time forecasts for endemic diseases. On the modeling side, we employ matrix factorization and generalized linear models to detect short-term trends and explore various Bayesian sequential analysis methods to model long-term effects. Our research indicates that, in general, a combination of surrogates can lead to more robust models. Interestingly, our findings indicate that under specific scenarios, particular surrogates can decrease overall forecasting accuracy - thus providing an argument for the use of 'Good data' over 'Big data'. [en_US]
dc.format.medium: ETD [en_US]
dc.publisher: Virginia Tech [en_US]
dc.rights: This Item is protected by copyright and/or related rights. Some uses of this Item may be deemed fair and permitted by law even without permission from the rights holder(s), or the rights holder(s) may have licensed the work for use under certain conditions. For other uses you need to obtain permission from the rights holder(s). [en_US]
dc.subject: Multivariate Time Series [en_US]
dc.subject: Surrogates [en_US]
dc.subject: Generalized Linear Models [en_US]
dc.subject: Bayesian Sequential Analysis [en_US]
dc.subject: Computational Epidemiology [en_US]
dc.title: Data-Driven Methods for Modeling and Predicting Multivariate Time Series using Surrogates [en_US]
dc.type: Dissertation [en_US]
dc.contributor.department: Computer Science [en_US]
dc.description.degree: Ph. D. [en_US]
thesis.degree.name: Ph. D. [en_US]
thesis.degree.level: doctoral [en_US]
thesis.degree.grantor: Virginia Polytechnic Institute and State University [en_US]
thesis.degree.discipline: Computer Science and Applications [en_US]
dc.contributor.committeechair: Ramakrishnan, Narendran [en_US]
dc.contributor.committeemember: Marathe, Madhav Vishnu [en_US]
dc.contributor.committeemember: Brownstein, John S. [en_US]
dc.contributor.committeemember: Lu, Chang Tien [en_US]
dc.contributor.committeemember: Tandon, Ravi [en_US]


