Rare Category Analysis for Complex Data: A Review

Despite the sheer volume of data being collected, it is often the rare categories that matter most in high-impact domains, ranging from financial fraud detection in online transaction networks to emerging trend detection in social networks, and from spam image detection in social media to rare disease diagnosis in medical decision support systems. This survey provides a concise review of state-of-the-art techniques for complex rare category analysis, where the majority classes have a smooth distribution while the minority classes exhibit the compactness property in the feature space or a subspace. More specifically, we start with the introduction, problem definition, and unique challenges of complex rare category analysis. We then present a comprehensive review of recent advances designed for this problem setting: rare category exploration without any label information, rare category exposition that characterizes rare examples with a compact representation, representation of rare patterns in a salient embedding space, and interpretation of the prediction results that provides relevant clues to end users. Finally, we discuss the potential problems and shed light on the future directions of complex rare category analysis.