
Leverage Fusion of Sentiment Features and Bert-based Approach to Improve Hate Speech Detection

dc.contributor.author | Cheng, Kai Hsiang | en
dc.contributor.committeechair | Lu, Chang Tien | en
dc.contributor.committeemember | Chen, Ing Ray | en
dc.contributor.committeemember | Cho, Jin-Hee | en
dc.contributor.department | Computer Science | en
dc.date.accessioned | 2022-06-24T08:01:22Z | en
dc.date.available | 2022-06-24T08:01:22Z | en
dc.date.issued | 2022-06-23 | en
dc.description.abstract | Social media has become an important place for people to conveniently share and exchange ideas and opinions. However, not all content on social media has a positive impact. Hate speech is one kind of harmful content, in which people use abusive language to attack, or promote hatred toward, a specific group or individual. With online hate speech on the rise, researchers have explored ways to recognize it automatically, and among the approaches studied, BERT-based methods are promising and dominated SemEval-2019 Task 6, a hate speech detection competition. This work proposes a method that fuses sentiment features with a BERT-based approach: the classic BERT architecture for hate speech detection is modified to incorporate additional sentiment features provided by an extractor pre-trained on Sentiment140. The proposed model is compared with the top three models in SemEval-2019 Task 6 Subtask A and achieves an F1 score of 83.1%, outperforming the models in the competition. In addition, to examine whether the sentiment features benefit hate speech detection, the features are fused with each of three kinds of deep learning architectures. The results show that the models with sentiment features perform better than those without them. | en
dc.description.abstractgeneral | Social media has become an important place for people to conveniently share and exchange ideas and opinions. However, not all content on social media has a positive impact. Hate speech is one kind of harmful content, in which people use abusive language to attack, or promote hatred toward, a specific group or individual. With online hate speech on the rise, researchers have explored ways to recognize it automatically, and among the approaches studied, BERT is a promising one. BERT is a deep learning model for natural language processing (NLP) built on the Transformer architecture, which Google introduced in 2017. BERT has been applied to many NLP tasks, such as text classification, semantic similarity between sentence pairs, question answering over a given paragraph, and text summarization, with impressive results. In this study, BERT is therefore adopted to learn the meaning of the given text and to automatically distinguish hate speech among large numbers of tweets. To help BERT better capture hate speech, this work modifies BERT to take an additional source of sentiment-related features when learning the patterns of hate speech, on the premise that the expressed emotion is typically negative when people produce abusive speech. For evaluation, the model is compared against those in SemEval-2019 Task 6, a well-known hate speech detection competition, and the results show that the proposed model achieves an F1 score of 83.1%, outperforming the models in the competition. In addition, to examine whether the sentiment features benefit hate speech detection, the features are fused with each of three different kinds of deep learning architectures, and the results show that the models with sentiment features perform better than those without them. | en
dc.description.degree | Master of Science | en
dc.format.medium | ETD | en
dc.identifier.other | vt_gsexam:35115 | en
dc.identifier.uri | http://hdl.handle.net/10919/110929 | en
dc.language.iso | en | en
dc.publisher | Virginia Tech | en
dc.rights | In Copyright | en
dc.rights.uri | http://rightsstatements.org/vocab/InC/1.0/ | en
dc.subject | hate speech detection | en
dc.subject | sentiment features | en
dc.subject | BERT | en
dc.title | Leverage Fusion of Sentiment Features and Bert-based Approach to Improve Hate Speech Detection | en
dc.type | Thesis | en
thesis.degree.discipline | Computer Science and Applications | en
thesis.degree.grantor | Virginia Polytechnic Institute and State University | en
thesis.degree.level | masters | en
thesis.degree.name | Master of Science | en

Files

Original bundle
Name: Cheng_K_T_2022.pdf
Size: 3.26 MB
Format: Adobe Portable Document Format
