NRV Tweets and RSS feeds

dc.contributor.author: Roble, Benjamin [en]
dc.contributor.author: Cheng, Justin [en]
dc.contributor.author: Sbitani, Marwan [en]
dc.date.accessioned: 2014-05-09T19:19:09Z [en]
dc.date.available: 2014-05-09T19:19:09Z [en]
dc.date.issued: 2014-05-09 [en]
dc.description: This collection contains the source code, programs, documentation, and example data used in the project. Please review the "Final Report and Technical Manual" for a comprehensive overview of the project. The open source library Mallet was used and is referenced here: McCallum, Andrew Kachites. "MALLET: A Machine Learning for Language Toolkit." http://mallet.cs.umass.edu. 2002. [en]
dc.description.abstract: The goal of this project was to associate existing data in the Virtual Town Square database from the New River Valley area with topical metadata. We took a database of approximately 360,000 tweets and 15,000 RSS news stories collected over the preceding two years and associated each RSS story and tweet with topics. The open-source natural language processing library Mallet was used to perform topical modeling on the data using Latent Dirichlet Allocation, and the resulting topics were then used to create a Solr instance of searchable tweets and news stories. Topical modeling was not done around specific events; instead, the entire tweet dataset was used as one corpus and the entire RSS dataset as another. Because the tweet data was analyzed separately from the RSS stories, the generated topics are specific to each dataset. This report details our approach in the Methodology section and contains a detailed Developer’s Guide and User’s Guide so that others may continue our work. Although tweets have generally been considered too short to be run through a topical modeling process, we generated topics for each tweet that appear to be relevant and accurate, and the client was satisfied with the outcome of the project. [en]
dc.description.sponsorship: Virginia Tech Center for Human-Computer Interaction Associate Director: Dr. Kavanaugh, kavan@vt.edu; Virginia Tech PhD Student: Ji Wang (InfoVis Lab), wji@cs.vt.edu; Virginia Tech PhD Student: Mohamed Magdy, mmagdy@vt.edu; Virginia Tech Professor: Dr. Edward Fox, fox@vt.edu [en]
dc.identifier.uri: http://hdl.handle.net/10919/47937 [en]
dc.language.iso: en_US [en]
dc.rights: Creative Commons Attribution 3.0 United States [en]
dc.rights.uri: http://creativecommons.org/licenses/by/3.0/us/ [en]
dc.subject: nlp [en]
dc.subject: natural language processing [en]
dc.subject: lda [en]
dc.subject: latent dirichlet allocation [en]
dc.subject: mallet [en]
dc.subject: open source [en]
dc.subject: tweets [en]
dc.subject: rss [en]
dc.subject: nrv [en]
dc.subject: new river valley [en]
dc.subject: blacksburg [en]
dc.subject: IDEAL [en]
dc.title: NRV Tweets and RSS feeds [en]
dc.type: Dataset [en]
dc.type: Presentation [en]
dc.type: Software [en]
dc.type: Technical report [en]
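
The abstract describes associating each tweet and RSS story with topics and then making the results searchable. The following is a minimal sketch of that association step only, not the project's code: it does not call Mallet or Solr, and it assumes the per-document topic proportions from an LDA run are already available as a list of floats. The function names, the record field names (`topic_ids`, `topic_keywords`), the 0.1 weight cutoff, and the sample tweet and topic keywords are all hypothetical and chosen purely for illustration.

```python
from typing import Dict, List, Tuple


def top_topics(proportions: List[float], k: int = 2,
               min_weight: float = 0.1) -> List[Tuple[int, float]]:
    """Return up to k (topic_id, weight) pairs, highest weight first.

    Topics below min_weight are dropped so that a document is not tagged
    with topics it barely touches. The cutoff value is an assumption.
    """
    ranked = sorted(enumerate(proportions), key=lambda pair: pair[1], reverse=True)
    return [(topic, weight) for topic, weight in ranked[:k] if weight >= min_weight]


def to_search_record(doc_id: str, text: str, proportions: List[float],
                     topic_keywords: Dict[int, str], k: int = 2) -> dict:
    """Build a flat record that a search index could ingest.

    The field names here are hypothetical, not the schema shipped in
    solr_data.tar.gz.
    """
    chosen = top_topics(proportions, k=k)
    return {
        "id": doc_id,
        "text": text,
        "topic_ids": [topic for topic, _ in chosen],
        "topic_keywords": [topic_keywords[topic] for topic, _ in chosen],
    }


if __name__ == "__main__":
    # Made-up topic keywords and proportions for a single made-up tweet.
    keywords = {0: "football game", 1: "traffic road", 2: "weather snow"}
    record = to_search_record("tweet-1", "Road closed on Main St",
                              [0.05, 0.7, 0.25], keywords)
    print(record)
```

Because the tweet and RSS corpora were modeled separately, a sketch like this would be run once per dataset, each with its own keyword table.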

Files

Original bundle
Now showing 1 - 5 of 10
Name: JSONLoader.tar.gz
Size: 1.24 MB
Format: Unknown data format
Description: JSONLoader Java class and libraries

Name: mallet.tar.gz
Size: 47.58 MB
Format: Unknown data format
Description: Mallet source code and script

Name: nrvtweets_data.tar.gz
Size: 2.35 KB
Format: Unknown data format
Description: Tweets and RSS data (both raw and processed)

Name: solr_data.tar.gz
Size: 14.91 KB
Format: Unknown data format
Description: Solr data and schema

Name: CS 4624 NRV Tweets Midterm.pdf
Size: 679.43 KB
Format: Adobe Portable Document Format
Description: Midterm Presentation PDF

License bundle
Name: license.txt
Size: 1.5 KB
Format: Item-specific license agreed upon to submission
Description: