Show simple item record

dc.contributor.author: Guo, Sheng [en_US]
dc.date.accessioned: 2014-03-14T20:13:09Z
dc.date.available: 2014-03-14T20:13:09Z
dc.date.issued: 2012-05-02 [en_US]
dc.identifier.other: etd-06152012-070746 [en_US]
dc.identifier.uri: http://hdl.handle.net/10919/28046
dc.description.abstract: With the prevalence of large data stored in the cloud, including unstructured information in the form of text, there is now an increased emphasis on text mining. A broad range of techniques are now used for text mining, including algorithms adapted from machine learning, NLP, computational linguistics, and data mining. Applications are also multi-fold, including classification, clustering, segmentation, relationship discovery, and practically any task that discovers latent information from written natural language. Classical mining algorithms have traditionally focused on shallow representations such as bag-of-words and similar feature-based models. With the advent of modern high performance computing, deep sentence level linguistic analysis of large scale text corpora has become practical. In this dissertation, we evaluate the utility of dependency parses as textual features for different text mining applications. Dependency parsing is one form of syntactic parsing, based on the dependency grammar implicit in sentences. While dependency parsing has traditionally been used for text understanding, we investigate here its application to supply features for text mining applications. We specifically focus on three methods to construct textual features from dependency parses. First, we consider a dependency parse as a general feature akin to a traditional bag-of-words model. Second, we consider the dependency parse as the basis to build a feature graph representation. Finally, we use dependency parses in a supervised collocation mining method for feature selection. To investigate these three methods, several applications are studied, including: (i) movie spoiler detection, (ii) text segmentation, (iii) query expansion, and (iv) recommender systems. [en_US]
dc.publisher: Virginia Tech [en_US]
dc.relation.haspart: Guo_Sheng_D_2012.pdf [en_US]
dc.rights: I hereby certify that, if appropriate, I have obtained and attached hereto a written permission statement from the owner(s) of each third party copyrighted matter to be included in my thesis, dissertation, or project report, allowing distribution as specified below. I certify that the version I submitted is the same as that approved by my advisory committee. I hereby grant to Virginia Tech or its agents the non-exclusive license to archive and make accessible, under the conditions specified below, my thesis, dissertation, or project report in whole or in part in all forms of media, now or hereafter known. I retain all other ownership rights to the copyright of the thesis, dissertation or project report. I also retain the right to use in future works (such as articles or books) all or part of this thesis, dissertation, or project report. [en_US]
dc.subject: dependency parsing [en_US]
dc.subject: text mining [en_US]
dc.subject: linguistic cues [en_US]
dc.title: Using Dependency Parses to Augment Feature Construction for Text Mining [en_US]
dc.type: Dissertation [en_US]
dc.contributor.department: Computer Science [en_US]
dc.description.degree: Ph. D. [en_US]
thesis.degree.name: Ph. D. [en_US]
thesis.degree.level: doctoral [en_US]
thesis.degree.grantor: Virginia Polytechnic Institute and State University [en_US]
thesis.degree.discipline: Computer Science [en_US]
dc.contributor.committeechair: Ramakrishnan, Narendran [en_US]
dc.contributor.committeemember: Fox, Edward Alan [en_US]
dc.contributor.committeemember: Helm, Richard Frederick [en_US]
dc.contributor.committeemember: Murali, T. M. [en_US]
dc.contributor.committeemember: Zaki, Mohammed J. [en_US]
dc.identifier.sourceurl: http://scholar.lib.vt.edu/theses/available/etd-06152012-070746/ [en_US]
dc.date.sdate: 2012-06-15 [en_US]
dc.date.rdate: 2012-06-18
dc.date.adate: 2012-06-18 [en_US]


