Real Memes In-The-Wild: Explainable Classification of Hateful vs. Non-Hateful Memes
Abstract
The virality of hateful or violent memes in recent years has encouraged deep learning research on hateful meme classification. These models, however, are typically trained to classify memes based on synthetically generated data. Synthetically generated meme data, such as the widely used Hateful Memes Challenge dataset from Meta AI, was created by interchanging random texts with random images. Such artificially generated memes often exclude neologisms, insider expressions, slang, and other linguistic nuances, which are prevalent in real memes circulating online. As a result, current state-of-the-art classifiers are limited in their ability to accurately identify hateful memes in the wild. Furthermore, prior research tends to focus on the prediction task rather than on explaining the characteristics that make memes hateful. To address these challenges, we introduce "RealMemes," a manually curated dataset comprising 3,142 in-the-wild memes collected from various social platforms, including Instagram and Reddit, as well as WhatsApp and Telegram groups. We also propose an interpretable multimodal classification system designed not only to distinguish between hateful and non-hateful memes, but also to elucidate the specific textual and visual elements that contribute to a meme's classification.