Show simple item record

dc.contributor.authorChen, Liangzheen_US
dc.date.accessioned2018-06-20T08:02:14Z
dc.date.available2018-06-20T08:02:14Z
dc.date.issued2018-06-19
dc.identifier.othervt_gsexam:14751en_US
dc.identifier.urihttp://hdl.handle.net/10919/83573
dc.description.abstractTemporal data is ubiquitous nowadays and can be easily found in many applications. Consider the extensively studied social media website Twitter. All the information can be associated with time stamps, and thus form different types of data sequences: a sequence of feature values of users who retweet a message, a sequence of tweets from a certain user, or a sequence of the evolving friendship networks. Mining these data sequences is an important task, which reveals patterns in the sequences, and it is a very challenging task as it usually requires different techniques for different sequences. The problem becomes even more complicated when the sequences are correlated. In this dissertation, we study the following two types of data sequences, and we show how to carefully exploit within-sequence and across-sequence correlations to develop more effective and scalable algorithms. 1. Multi-dimensional value sequences: We study sequences of multi-dimensional values, where each value is associated with a time stamp. Such value sequences arise in many domains such as epidemiology (medical records), social media (keyword trends), etc. Our goals are: for individual sequences, to find a segmentation of the sequence to capture where the pattern changes; for multiple correlated sequences, to use the correlations between sequences to further improve our segmentation; and to automatically find explanations of the segmentation results. 2. Social media post sequences: Driven by applications from popular social media websites such as Twitter and Weibo, we study the modeling of social media post sequences. Our goal is to understand how the posts (like tweets) are generated and how we can gain understanding of the users behind these posts. For individual social media post sequences, we study a prediction problem to find the users' latent state changes over the sequence. For dependent post sequences, we analyze the social influence among users, and how it affects users in generating posts and links. Our models and algorithms lead to useful discoveries, and they solve real problems in Epidemiology, Social Media and Critical Infrastructure Systems. Further, most of the algorithms and frameworks we propose can be extended to solve sequence mining problems in other domains as well.en_US
dc.format.mediumETDen_US
dc.publisherVirginia Techen_US
dc.rightsThis item is protected by copyright and/or related rights. Some uses of this item may be deemed fair and permitted by law even without permission from the rights holder(s), or the rights holder(s) may have licensed the work for use under certain conditions. For other uses you need to obtain permission from the rights holder(s).en_US
dc.subjectSequence Miningen_US
dc.subjectSegmentationen_US
dc.subjectTopic Modelingen_US
dc.titleSegmenting, Summarizing and Predicting Data Sequencesen_US
dc.typeDissertationen_US
dc.contributor.departmentComputer Scienceen_US
dc.description.degreePh. D.en_US
thesis.degree.namePh. D.en_US
thesis.degree.leveldoctoralen_US
thesis.degree.grantorVirginia Polytechnic Institute and State Universityen_US
thesis.degree.disciplineComputer Science and Applicationsen_US
dc.contributor.committeechairPrakash, Bodicherla Adityaen_US
dc.contributor.committeememberLiu, Yanen_US
dc.contributor.committeememberRamakrishnan, Narendranen_US
dc.contributor.committeememberLu, Chang Tienen_US
dc.contributor.committeememberFox, Edward A.en_US


Files in this item

Thumbnail

This item appears in the following Collection(s)

Show simple item record