CHUNAV: Analyzing Hindi Hate Speech and Targeted Groups in Indian Election Discourse

Jafri, Farhan; Rauniyar, Kritesh; Thapa, Surendrabikram; Siddiqui, Mohammad; Khushi, Matloob; Naseem, Usman

dc.contributor.author	Jafri, Farhan	en
dc.contributor.author	Rauniyar, Kritesh	en
dc.contributor.author	Thapa, Surendrabikram	en
dc.contributor.author	Siddiqui, Mohammad	en
dc.contributor.author	Khushi, Matloob	en
dc.contributor.author	Naseem, Usman	en
dc.date.accessioned	2024-06-04T18:49:21Z	en
dc.date.available	2024-06-04T18:49:21Z	en
dc.date.issued	2024	en
dc.date.updated	2024-06-01T08:00:06Z	en
dc.description.abstract	In the ever-evolving landscape of online discourse and political dialogue, the rise of hate speech poses a significant challenge to maintaining a respectful and inclusive digital environment. The context becomes particularly complex when considering the Hindi language—a low-resource language with limited available data. To address this pressing concern, we introduce the CHUNAV dataset—a collection of 11,457 Hindi tweets gathered during assembly elections in various states. CHUNAV is purpose-built for hate speech categorization and the identification of target groups. The dataset is a valuable resource for exploring hate speech within the distinctive socio-political context of Indian elections. The tweets within CHUNAV have been meticulously categorized into "Hate" and "Non-Hate" labels, and further subdivided to pinpoint the specific targets of hate speech, including "Individual", "Organization", and "Community" labels (as shown in Figure 1). Furthermore, this paper presents multiple benchmark models for hate speech detection, along with an innovative ensemble and oversampling-based method. The paper also delves into the results of topic modeling, all aimed at effectively addressing hate speech and target identification in the Hindi language. This contribution seeks to advance the field of hate speech analysis and foster a safer and more inclusive online space within the distinctive realm of Indian Assembly Elections. The dataset is available at https://github.com/Farhan-jafri/Chunav	en
dc.description.version	Accepted version	en
dc.format.mimetype	application/pdf	en
dc.identifier.doi	https://doi.org/10.1145/3665245	en
dc.identifier.uri	https://hdl.handle.net/10919/119266	en
dc.language.iso	en	en
dc.publisher	ACM	en
dc.rights	In Copyright	en
dc.rights.holder	The author(s)	en
dc.rights.uri	http://rightsstatements.org/vocab/InC/1.0/	en
dc.title	CHUNAV: Analyzing Hindi Hate Speech and Targeted Groups in Indian Election Discourse	en
dc.title.serial	ACM Transactions on Asian and Low-Resource Language Information Processing	en
dc.type	Article - Refereed	en
dc.type.dcmitype	Text	en

Files

Original bundle

Name: 3665245.pdf
Size: 2.32 MB
Format: Adobe Portable Document Format
Description: Accepted version

Collections

Journal Articles, Association for Computing Machinery (ACM)
Scholarly Works, Computer Science