Using Dependency Parses to Augment Feature Construction for Text Mining

Guo, Sheng

dc.contributor.author	Guo, Sheng	en
dc.contributor.committeechair	Ramakrishnan, Naren	en
dc.contributor.committeemember	Fox, Edward A.	en
dc.contributor.committeemember	Helm, Richard F.	en
dc.contributor.committeemember	Murali, T. M.	en
dc.contributor.committeemember	Zaki, Mohammed J.	en
dc.contributor.department	Computer Science	en
dc.date.accessioned	2014-03-14T20:13:09Z	en
dc.date.adate	2012-06-18	en
dc.date.available	2014-03-14T20:13:09Z	en
dc.date.issued	2012-05-02	en
dc.date.rdate	2012-06-18	en
dc.date.sdate	2012-06-15	en
dc.description.abstract	With the prevalence of large volumes of data stored in the cloud, including unstructured information in the form of text, there is now an increased emphasis on text mining. A broad range of techniques is now used for text mining, including algorithms adapted from machine learning, natural language processing (NLP), computational linguistics, and data mining. Applications are equally diverse, including classification, clustering, segmentation, relationship discovery, and practically any task that uncovers latent information in written natural language. Classical mining algorithms have focused on shallow representations such as bag-of-words and similar feature-based models. With the advent of modern high-performance computing, deep sentence-level linguistic analysis of large-scale text corpora has become practical. In this dissertation, we evaluate the utility of dependency parses as textual features for different text mining applications. Dependency parsing is one form of syntactic parsing, based on the dependency grammar implicit in sentences. While dependency parsing has traditionally been used for text understanding, here we investigate its use in supplying features for text mining applications. We specifically focus on three methods to construct textual features from dependency parses. First, we treat dependency parses as general features, akin to a traditional bag-of-words model. Second, we use the dependency parse as the basis for a feature graph representation. Finally, we use dependency parses in a supervised collocation mining method for feature selection. To evaluate these three methods, we study several applications: (i) movie spoiler detection, (ii) text segmentation, (iii) query expansion, and (iv) recommender systems.	en
dc.description.degree	Ph. D.	en
dc.identifier.other	etd-06152012-070746	en
dc.identifier.sourceurl	http://scholar.lib.vt.edu/theses/available/etd-06152012-070746/	en
dc.identifier.uri	http://hdl.handle.net/10919/28046	en
dc.publisher	Virginia Tech	en
dc.relation.haspart	Guo_Sheng_D_2012.pdf	en
dc.rights	In Copyright	en
dc.rights.uri	http://rightsstatements.org/vocab/InC/1.0/	en
dc.subject	dependency parsing	en
dc.subject	text mining	en
dc.subject	linguistic cues	en
dc.title	Using Dependency Parses to Augment Feature Construction for Text Mining	en
dc.type	Dissertation	en
thesis.degree.discipline	Computer Science	en
thesis.degree.grantor	Virginia Polytechnic Institute and State University	en
thesis.degree.level	doctoral	en
thesis.degree.name	Ph. D.	en
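
The first method in the abstract treats a dependency parse as a feature set analogous to bag-of-words. The Python sketch below illustrates that idea under stated assumptions: the parse is supplied as hand-written (head, relation, dependent) triples with Stanford-style labels such as nsubj and dobj rather than produced by a real parser, and the helper names bag_of_words, bag_of_dependencies, and combined_features are hypothetical, not taken from the dissertation.

```python
from collections import Counter
from typing import List, Tuple

# Hypothetical representation: one dependency arc as (head, relation, dependent).
# Triples here are written by hand for illustration; no parser is invoked.
Triple = Tuple[str, str, str]


def bag_of_words(tokens: List[str]) -> Counter:
    """Classic bag-of-words: count lowercased tokens, ignoring order and syntax."""
    return Counter(t.lower() for t in tokens)


def bag_of_dependencies(triples: List[Triple]) -> Counter:
    """Count each lowercased arc as one feature string, e.g. 'nsubj(killed,butler)',
    so dependency arcs play the role that words play in bag-of-words."""
    return Counter(f"{rel}({head.lower()},{dep.lower()})" for head, rel, dep in triples)


def combined_features(tokens: List[str], triples: List[Triple]) -> Counter:
    """Augment word counts with dependency-arc counts in a single feature vector."""
    features = bag_of_words(tokens)
    features.update(bag_of_dependencies(triples))  # Counter.update adds counts
    return features


if __name__ == "__main__":
    a_tokens = "The butler killed the victim".split()
    a_triples = [("killed", "nsubj", "butler"), ("killed", "dobj", "victim"),
                 ("butler", "det", "The"), ("victim", "det", "the")]
    b_tokens = "The victim killed the butler".split()
    b_triples = [("killed", "nsubj", "victim"), ("killed", "dobj", "butler"),
                 ("victim", "det", "The"), ("butler", "det", "the")]
    print(bag_of_words(a_tokens) == bag_of_words(b_tokens))                 # True
    print(bag_of_dependencies(a_triples) == bag_of_dependencies(b_triples))  # False
```

The two example sentences share an identical bag of words but differ in their dependency features, which is one illustration of the extra structure a dependency parse can contribute. How the dissertation itself encodes, filters, or weights such features is not described in this record.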

Files

Original bundle

Name: Guo_Sheng_D_2012.pdf
Size: 1.54 MB
Format: Adobe Portable Document Format

Collections

Doctoral Dissertations