dc.contributor.author: Pumma, Sarunya
dc.contributor.author: Liu, Xiaoyang
dc.date.accessioned: 2015-05-15T04:06:51Z
dc.date.available: 2015-05-15T04:06:51Z
dc.date.issued: 2015-05-10
dc.identifier.uri: http://hdl.handle.net/10919/52343
dc.description.abstract: IDEAL (Integrated Digital Event Archiving and Library) is a Virginia Tech project to implement a state-of-the-art event-based information retrieval system. The practice project of CS 5604 Information Retrieval is part of the IDEAL project. The main objective of this project is to build a robust search engine on top of Solr, a general-purpose open-source search engine, and Hadoop, a big data processing platform. The search engine returns documents, which are tweets and webpages, that are relevant to a user's query. To enhance the performance of the search engine, the documents in the archive have been indexed using various approaches, including LDA (Latent Dirichlet Allocation), NER (Named-Entity Recognition), Clustering, Classification, and Social Network Analysis. As CS 5604 is a problem-based learning class, each team is responsible for developing and implementing a solution for one technique. This report presents the implementation of the LDA component. LDA helps extract collections of topics from the documents. A topic in this context is a set of words that can be used to represent a document. Details of how LDA worked with both small and large collections are described. Once the implementation of the LDA component is integrated with the other processing components and Solr, we are confident that the performance of the information retrieval system of the IDEAL project will be enhanced. [en_US]
dc.description.sponsorship: NSF grant IIS - 1319578, III: Small: Integrated Digital Event Archiving and Library (IDEAL) [en_US]
dc.language.iso: en_US [en_US]
dc.rights: Attribution-ShareAlike 3.0 United States [*]
dc.rights.uri: http://creativecommons.org/licenses/by-sa/3.0/us/ [*]
dc.subject: LDA [en_US]
dc.subject: IDEAL Project [en_US]
dc.subject: Topic Extraction [en_US]
dc.subject: Tweets [en_US]
dc.subject: Webpages [en_US]
dc.title: LDA Team Project in CS5604, Spring 2015: Extracting Topics from Tweets and Webpages for IDEAL [en_US]
dc.type: Presentation [en_US]
dc.type: Software [en_US]
dc.type: Technical report [en_US]


License: Attribution-ShareAlike 3.0 United States