
Topic Analysis project in CS5604, Spring 2016: Extracting Topics from Tweets and Webpages for IDEAL

dc.contributor.author: Mehta, Sneha [en]
dc.contributor.author: Vinayagam, Radha Krishnan [en]
dc.date.accessioned: 2016-05-07T22:33:58Z [en]
dc.date.available: 2016-05-07T22:33:58Z [en]
dc.date.issued: 2016-05-04 [en]
dc.description: This submission includes the project report, final presentation, LDA code, test datasets, and their results. The compressed folder "TopicAnalysis-code.zip" contains the LDA Scala source code (lda_v1.scala) for processing tweets and a JAR file for web page analysis. The compressed folder "TopicAnalysis-TestData&Results.zip" contains cleaned tweet collections and web pages from the Obamacare collection. The same folder also includes the topic results for each collection and a PDF file for interpreting the collection IDs. [en]
dc.description.abstract: The IDEAL (Integrated Digital Event Archiving and Library) project aims to ingest tweets and web-based content from social media and the web and index them for retrieval. One of the required milestones for CS5604, a graduate-level course on Information Storage and Retrieval, is to implement a state-of-the-art information retrieval and analysis system in support of the IDEAL project. The overall objective of this project is to build a robust information retrieval system on top of Solr, a general-purpose open-source search engine. To enable the search and retrieval process, we use various approaches, including Latent Dirichlet Allocation, Named-Entity Recognition, Clustering, Classification, Social Network Analysis, and a front-end interface for search. The project has been divided into segments, and our team has been assigned Topic Analysis. A topic in this context is a set of words that can be used to represent a document. The output of our team will be a well-defined set of topics that describe each document in our collections. The topics will facilitate facet-based search in the front-end search interface. This submission includes the project report, final presentation, LDA code, test datasets, and results. In the project report, we introduce the relevant background, the design and implementation, and the requirements to make our part functional. The developer’s manual describes our approach in detail. Walk-through tutorials for related software packages are included in the user’s manual. Finally, we provide exhaustive results and detailed evaluation methodologies for topic quality. [en]
dc.description.sponsorship: NSF IIS - 1319578: III: Small: Integrated Digital Event Archiving and Library (IDEAL) [en]
dc.identifier.uri: http://hdl.handle.net/10919/70933 [en]
dc.language.iso: en_US [en]
dc.rights: Creative Commons CC0 1.0 Universal Public Domain Dedication [en]
dc.rights.uri: http://creativecommons.org/publicdomain/zero/1.0/ [en]
dc.subject: Topic Analysis [en]
dc.subject: LDA [en]
dc.subject: Information Retrieval [en]
dc.subject: Tweets [en]
dc.subject: Webpages [en]
dc.title: Topic Analysis project in CS5604, Spring 2016: Extracting Topics from Tweets and Webpages for IDEAL [en]
dc.type: Dataset [en]
dc.type: Presentation [en]
dc.type: Software [en]
dc.type: Technical report [en]

Files

Original bundle
Now showing 1 - 5 of 6
Name: FinalPresentation-LDA.pptx
Size: 336.31 KB
Format: Microsoft Powerpoint
Description: Topic Analysis team final presentation .pptx

Name: FinalPresentation-LDA.pdf
Size: 262.39 KB
Format: Adobe Portable Document Format
Description: Topic Analysis team final presentation .pdf

Name: FinalReport-TopicAnalysis.docx
Size: 1.58 MB
Format: Microsoft Word
Description: Topic Analysis team final report .docx

Name: FinalReport-TopicAnalysis.pdf
Size: 1.85 MB
Format: Adobe Portable Document Format
Description: Topic Analysis team final report .pdf

Name: TopicAnalysis-code.zip
Size: 623.33 KB
Format:
Description: Topic Analysis-LDA final code

License bundle
Now showing 1 - 1 of 1
Name: license.txt
Size: 1.5 KB
Format: Item-specific license agreed upon to submission
Description: