Social Network Project for IDEAL in CS5604

Date
2015-05-11
Author
Harb, Islam
Jin, Yilong
Cedeno, Vanessa
Mallampati, Sai Ravi Kiran
Bulusu, Bhaskara Srinivasa Bharadwaj
Abstract
The IDEAL (Integrated Digital Event Archiving and Library) project involves VT faculty, staff, and students, along with collaborators around the world, in archiving important events and in integrating digital library and archiving approaches to support research and development related to those events. An objective of the CS5604 (Information Retrieval) course in Spring 2015 was to build a state-of-the-art information retrieval system in support of the IDEAL project. Students were divided into eight groups, each of which became expert in a specific theme of high importance to the development of the tool. The identified themes were Classifying Types, Extraction and Feature Selection, Clustering, Hadoop, LDA, NER, Reducing Noise, Social Networks and Importance, and Solr and Lucene.
Our goal as a class was to return documents relevant to an arbitrary user query from a collection of tweets and the web pages they reference. The goal of the Social Network and Importance group was to develop a query-independent importance methodology for these tweets and web pages, based on social-network considerations.
This report proposes a method for assigning importance to the tweets and web pages using non-content features. We define two groups of features for the ranking: Twitter-specific features and account-authority features. To determine the best set of features, we also analyze the individual effect of each feature on the output importance. In the end, an “importance” value is associated with each document to aid searching and browsing with Solr.
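The abstract does not list the individual features or say how they are combined, so the following is only a minimal sketch of the general idea of a query-independent importance score built from non-content features. The field names (`retweets`, `favorites`, `followers`), the log scaling, and the weights are all illustrative assumptions, not the report's actual method.

```python
import math
from dataclasses import dataclass


@dataclass
class Tweet:
    doc_id: str
    retweets: int   # assumed Twitter-specific feature
    favorites: int  # assumed Twitter-specific feature
    followers: int  # assumed account-authority feature


def log_scale(value, max_value):
    """Map a non-negative count to [0, 1], damping large values with log1p."""
    if max_value <= 0:
        return 0.0
    return math.log1p(value) / math.log1p(max_value)


def importance_scores(tweets, weights=(0.4, 0.2, 0.4)):
    """Return {doc_id: importance} as a weighted sum of scaled features.

    The weights are hypothetical; they sum to 1 so scores stay in [0, 1].
    """
    max_rt = max((t.retweets for t in tweets), default=0)
    max_fav = max((t.favorites for t in tweets), default=0)
    max_fol = max((t.followers for t in tweets), default=0)
    w_rt, w_fav, w_fol = weights
    return {
        t.doc_id: (
            w_rt * log_scale(t.retweets, max_rt)
            + w_fav * log_scale(t.favorites, max_fav)
            + w_fol * log_scale(t.followers, max_fol)
        )
        for t in tweets
    }


if __name__ == "__main__":
    sample = [
        Tweet("t1", retweets=120, favorites=40, followers=50000),
        Tweet("t2", retweets=0, favorites=0, followers=0),
        Tweet("t3", retweets=5, favorites=2, followers=300),
    ]
    for doc_id, score in sorted(importance_scores(sample).items()):
        print(doc_id, round(score, 3))
```

Because the score depends only on the document and not on the query, it can be computed once at indexing time and stored as a numeric field on each document, where a search engine such as Solr could use it as a ranking boost.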