Social Network Project for IDEAL in CS5604

Abstract

The IDEAL (Integrated Digital Event Archiving and Library) project involves VT faculty, staff, and students, along with collaborators around the world, in archiving important events and in integrating digital library and archiving approaches to support research and development related to those events. One objective of the Spring 2015 CS5604 (Information Retrieval) course was to build a state-of-the-art information retrieval system in support of the IDEAL project. Students were divided into eight groups, each of which became expert in a specific theme of high importance to the development of the tool. The identified themes were Classifying Types; Extraction and Feature Selection; Clustering; Hadoop; LDA; NER; Reducing Noise; Social Networks and Importance; and Solr and Lucene.

Our goal as a class was to return documents relevant to an arbitrary user query from a collection of tweets and the web pages they reference. The goal of the Social Network and Importance group was to develop a query-independent importance methodology for these tweets and web pages, based on social-network considerations.

This report proposes a method for assigning importance to tweets and web pages using non-content features. We define two categories of features for the ranking: Twitter-specific features and account authority features. To determine the best set of features, we also analyze the individual effect of each feature on the output importance. Finally, an “importance” value is associated with each document to aid searching and browsing using Solr.
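As a rough illustration of what a query-independent importance value built from non-content features could look like, the sketch below combines a few features into a single score between 0 and 1. It is not the method of the report: the feature names (retweets, favorites, followers), the weights, and the log scaling are all assumptions chosen for the example, with the first two standing in for Twitter-specific features and the last for an account authority feature.

```python
import math
from dataclasses import dataclass


@dataclass
class Tweet:
    doc_id: str
    retweets: int   # hypothetical Twitter-specific feature
    favorites: int  # hypothetical Twitter-specific feature
    followers: int  # hypothetical account authority feature


def log_scale(value, max_value):
    """Map a non-negative count to [0, 1], damping very large counts."""
    if max_value <= 0:
        return 0.0
    return math.log1p(value) / math.log1p(max_value)


def importance_scores(tweets, weights=None):
    """Return {doc_id: importance}; no query is involved in the computation."""
    # Assumed weights, for illustration only.
    weights = weights or {"retweets": 0.4, "favorites": 0.2, "followers": 0.4}
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("at least one feature weight must be positive")
    # Per-feature maximum over the collection, used for normalization.
    maxima = {
        name: max((getattr(t, name) for t in tweets), default=0)
        for name in weights
    }
    scores = {}
    for t in tweets:
        weighted = sum(
            w * log_scale(getattr(t, name), maxima[name])
            for name, w in weights.items()
        )
        scores[t.doc_id] = weighted / total
    return scores


sample = [
    Tweet("a", retweets=100, favorites=50, followers=1000),
    Tweet("b", retweets=0, favorites=0, followers=0),
    Tweet("c", retweets=10, favorites=5, followers=100),
]
scores = importance_scores(sample)
```

Passing a weights dictionary with a single feature, for example `{"followers": 1.0}`, isolates that feature's contribution, which is one simple way to examine the individual effect of each feature on the output importance.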

Keywords
Tweets, Webpages, Ranking, Importance Value, Social Network