Browsing by Author "Tang, Lijie"
- CS5604: Clustering and Social Networks for IDEAL
Vishwasrao, Saket; Thorve, Swapna; Tang, Lijie (2016-05-03)
The Integrated Digital Event Archiving and Library (IDEAL) project of Virginia Tech provides services for searching, browsing, analyzing, and visualizing over 1 billion tweets and over 65 million webpages. The project followed a problem-based learning approach aimed at building a state-of-the-art information retrieval system in support of IDEAL. With the primary objective of building a robust search engine on top of Solr, the project is divided into segments such as classification, clustering, and topic modeling, each intended to improve search results. Our team focuses on two tasks: clustering and social networks. For now, the two tasks are treated independently. The clustering task aims to group documents so that documents within a cluster are as similar as possible. The documents are tweets and webpages, and we present results for different collections. The k-means algorithm is used to cluster the documents, with two feature-extraction methods: TF-IDF scores and word2vec. Clusters are evaluated in two ways. The first is the Within Set Sum of Squared Errors (WSSE). The second analyzes the output of the topic analysis team to extract cluster labels and compute probability scores for each document. The latter strategy is a novel approach to evaluation. It can be used to address cluster labeling, the likelihood that a document belongs to a cluster, and the hierarchical distribution of topics and clusters. The social networking task will extract information from Twitter data by building graphs and applying graph theory concepts.
Future improvements to this work include using dimensionality reduction techniques and probabilistic clustering algorithms, and improving cluster labeling and evaluation. The clusters we generated could also serve as input to Classification, Topic Analysis, and Collaborative Filtering for more accurate results.
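The clustering pipeline described in the first abstract combines TF-IDF features, k-means, and WSSE evaluation. The dependency-free Python sketch below illustrates how those pieces fit together. It is not the team's implementation, and the abstract does not say which clustering library was used. The toy documents, the function names (`tfidf_matrix`, `kmeans`, `wsse`), whitespace tokenization, the smoothed IDF formula, and the first-k-points initialization are all hypothetical choices made for illustration.

```python
import math
from collections import Counter


def tfidf_matrix(docs):
    """Build L2-normalized dense TF-IDF vectors (hypothetical whitespace tokenizer)."""
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({t for tokens in tokenized for t in tokens})
    n = len(tokenized)
    # Document frequency: how many documents contain each term.
    df = Counter(t for tokens in tokenized for t in set(tokens))
    # Smoothed IDF (assumed variant): ln((1 + n) / (1 + df)) + 1, always > 0.
    idf = {t: math.log((1 + n) / (1 + df[t])) + 1 for t in vocab}
    rows = []
    for tokens in tokenized:
        tf = Counter(tokens)
        row = [tf[t] / len(tokens) * idf[t] for t in vocab]
        norm = math.sqrt(sum(x * x for x in row)) or 1.0
        rows.append([x / norm for x in row])  # unit length per document
    return vocab, rows


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, max_iter=100):
    """Lloyd's k-means. Init with the first k points (a simplification;
    random or k-means++ initialization is more common in practice)."""
    centroids = [list(p) for p in points[:k]]
    assignment = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_assignment = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_assignment == assignment:
            break  # converged: no point changed cluster
        assignment = new_assignment
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment, centroids


def wsse(points, assignment, centroids):
    """Within Set Sum of Squared Errors: squared distance to assigned centroid, summed."""
    return sum(squared_distance(p, centroids[a]) for p, a in zip(points, assignment))


# Hypothetical toy "tweets" on two topics.
docs = [
    "flood rescue water rain",
    "election vote ballot count",
    "rain flood water damage",
    "vote election results count",
]
vocab, X = tfidf_matrix(docs)
labels, centers = kmeans(X, k=2)
print(labels)  # [0, 1, 0, 1]
print(round(wsse(X, labels, centers), 4))
```

A lower WSSE means tighter clusters. However, WSSE always falls as k grows and reaches zero when every document is its own cluster, so on its own it cannot choose k. That limitation is one reason a label-based evaluation like the second strategy in the abstract is useful.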
- Identifying Product Defects from User Complaints: A Probabilistic Defect Model
Zhang, Xuan; Qiao, Zhilei; Tang, Lijie; Fan, Weiguo Patrick; Fox, Edward A.; Wang, Gang Alan (Department of Computer Science, Virginia Polytechnic Institute & State University, 2016-03-02)
The recent surge in social media use has produced a massive amount of unstructured textual complaints about products and services. However, discovering and quantifying potential product defects in large volumes of unstructured text is a nontrivial task. In this paper, we develop a probabilistic defect model (PDM) that simultaneously identifies the most critical product issues and the corresponding product attributes. We use domain-oriented key attributes of a product (e.g., product model, year of production, defective components, and symptoms) to identify and acquire integral information about each defect. We conduct comprehensive quantitative and qualitative evaluations to ensure the quality of the discovered information. Experimental results demonstrate that our proposed model outperforms an existing unsupervised method (k-means clustering) and finds more valuable information. Our research has significant implications for managers, manufacturers, and policy makers.
- Splash and Spray Assessment Tool Development Program: Final Report
Flintsch, Gerardo W.; Tang, Lijie; Katicha, Samer W.; de León Izeppi, Edgar; Viner, Helen; Dunford, Alan; Nesnas, Kamal; Coyle, Fiona; Sanders, Peter; Gibbons, Ronald B.; Williams, Brian M.; Hargreaves, David; Parry, Tony; McGhee, Kevin K.; Larson, Roger M.; Smith, Kelly L. (Virginia Tech. Virginia Tech Transportation Institute, 2014-10-07)
The effects of vehicle splash and spray are well known to motorists who have driven in wet weather. Research suggests that splash and spray contribute to a small but measurable portion of road traffic accidents and are a source of considerable nuisance to motorists. Splash and spray from highway pavements can also carry a number of pollutants and contaminants. When deposited, these contaminants can be poisonous to plant life and can accelerate the corrosion of roadway appurtenances. Splash and spray are individually definable processes, each the product of a number of different factors. Many parties have gone to great lengths to reduce the splash and spray created by motor vehicles, especially heavy vehicles, by retrofitting devices that alter the vehicles' aerodynamics. Another possible solution is to change the characteristics of the highway pavement. Previous research shows that pavement geometry, drainage, texture, and porosity all contribute to splash and spray generation, but the exact mechanisms are largely unknown. Highway engineers could use a model that predicts the splash and spray propensity of pavements to support decisions in highway maintenance and design. The project objective was to develop a simple and practical assessment tool that characterizes two things: the propensity of highway sections to generate splash and spray during rainfall, and the impact of splash and spray on road users. This report summarizes the development of the splash and spray model and its implementation in an easy-to-use, practical tool.