Automated Vocabulary Building for Characterizing and Forecasting Elections using Social Media Analytics

dc.contributor.author: Mahendiran, Aravindan (en)
dc.contributor.committeechair: Ramakrishnan, Naren (en)
dc.contributor.committeemember: Ribbens, Calvin J. (en)
dc.contributor.committeemember: Prakash, B. Aditya (en)
dc.contributor.department: Computer Science (en)
dc.date.accessioned: 2014-02-13T09:00:12Z (en)
dc.date.available: 2014-02-13T09:00:12Z (en)
dc.date.issued: 2014-02-12 (en)
dc.description.abstract: Twitter has become a popular data source over the past decade and has garnered significant attention as a surrogate data source for many important forecasting problems. Strong correlations have been observed between Twitter indicators and real-world trends spanning elections, stock markets, book sales, and flu outbreaks. A key ingredient of all methods that use Twitter for forecasting is agreement on a domain-specific vocabulary for tracking the pertinent tweets, which is typically provided by subject matter experts (SMEs). The language used on Twitter differs drastically from other forms of online discourse, such as news articles and blogs. It evolves constantly as users adopt popular hashtags to express their opinions. Thus, the vocabulary used by forecasting algorithms needs to be dynamic and should capture emerging trends over time. This thesis proposes a novel unsupervised learning algorithm that builds a dynamic vocabulary using Probabilistic Soft Logic (PSL), a framework for probabilistic reasoning over relational domains. Using eight presidential elections from Latin America, we show how our query expansion methodology improves the performance of traditional election forecasting algorithms. Through this approach we demonstrate that we can achieve close to a two-fold increase in the number of tweets retrieved for predictions and a 36.90% reduction in prediction error. (en)
dc.description.degree: Master of Science (en)
dc.format.medium: ETD (en)
dc.identifier.other: vt_gsexam:2260 (en)
dc.identifier.uri: http://hdl.handle.net/10919/25430 (en)
dc.publisher: Virginia Tech (en)
dc.rights: In Copyright (en)
dc.rights.uri: http://rightsstatements.org/vocab/InC/1.0/ (en)
dc.subject: Election Forecasting (en)
dc.subject: Twitter (en)
dc.subject: Query Expansion (en)
dc.subject: Social Group Modeling (en)
dc.subject: Probabilistic Soft Logic (en)
dc.title: Automated Vocabulary Building for Characterizing and Forecasting Elections using Social Media Analytics (en)
dc.type: Thesis (en)
thesis.degree.discipline: Computer Science and Applications (en)
thesis.degree.grantor: Virginia Polytechnic Institute and State University (en)
thesis.degree.level: masters (en)
thesis.degree.name: Master of Science (en)

Files

Original bundle
Name: Mahendiran_A_T_2014.pdf
Size: 2.52 MB
Format: Adobe Portable Document Format