
dc.contributor.author: Roble, Benjamin [en]
dc.contributor.author: Cheng, Justin [en]
dc.contributor.author: Sbitani, Marwan [en]
dc.date.accessioned: 2014-05-09T19:19:09Z [en]
dc.date.available: 2014-05-09T19:19:09Z [en]
dc.date.issued: 2014-05-09 [en]
dc.identifier.uri: http://hdl.handle.net/10919/47937 [en]
dc.description: This collection contains the source code, programs, documentation, and example data used in the project. Please review the "Final Report and Technical Manual" for a comprehensive overview of the project. The open source library Mallet was used and is referenced here: McCallum, Andrew Kachites. "MALLET: A Machine Learning for Language Toolkit." http://mallet.cs.umass.edu. 2002. [en]
dc.description.abstract: The goal of this project was to associate existing data in the Virtual Town Square database from the New River Valley area with topical metadata. We took a database of approximately 360,000 tweets and 15,000 RSS news stories collected in the last two years and associated each RSS story and tweet with topics. The open-source natural language processing library Mallet was used to perform topical modeling on the data using Latent Dirichlet Allocation; the results were then used to create a Solr instance of searchable tweets and news stories. Topical modeling was not done around specific events; instead, the entire tweet data (and entire RSS data) was used as the corpus. The tweet data was analyzed separately from the RSS stories, so the generated topics are specific to each dataset. This report details the methodology used in our work in the Methodology section and contains a detailed Developer’s Guide and User’s Guide so that others may continue our work. The client was satisfied with the outcome of this project: even though tweets have generally been considered too short to be run through a topical modeling process, we generated topics for each tweet that appear to be relevant and accurate. [en]
dc.description.sponsorship: Virginia Tech Center for Human-Computer Interaction Associate Director: Dr. Kavanaugh, kavan@vt.edu; Virginia Tech PhD Student: Ji Wang (InfoVis Lab), wji@cs.vt.edu; Virginia Tech PhD Student: Mohamed Magdy, mmagdy@vt.edu; Virginia Tech Professor: Dr. Edward Fox, fox@vt.edu [en]
dc.language.iso: en_US [en]
dc.rights: Creative Commons Attribution 3.0 United States [en]
dc.rights.uri: http://creativecommons.org/licenses/by/3.0/us/ [en]
dc.subject: nlp [en]
dc.subject: natural language processing [en]
dc.subject: lda [en]
dc.subject: latent dirichlet allocation [en]
dc.subject: mallet [en]
dc.subject: open source [en]
dc.subject: tweets [en]
dc.subject: rss [en]
dc.subject: nrv [en]
dc.subject: new river valley [en]
dc.subject: blacksburg [en]
dc.subject: IDEAL [en]
dc.title: NRV Tweets and RSS feeds [en]
dc.type: Dataset [en]
dc.type: Presentation [en]
dc.type: Software [en]
dc.type: Technical report [en]



License: Creative Commons Attribution 3.0 United States