A Framework for Automated Discovery and Analysis of Suspicious Trade Records
Abstract
Illegal logging and the illegal timber trade present a persistent threat to global biodiversity and national security because of their ties to illicit financial flows, and they also cause revenue loss. Detecting such transactions is a non-trivial challenge. Global commerce in timber and associated products is vast, and the supply chain entities involved are complex and geographically dispersed. International shipment records, specifically bills of lading, are a key source of data for detecting, investigating, and acting upon such transactions. The overall problem can be described as building a framework that performs automated discovery and supports action on detected transactions. The volume, velocity, and complexity of international shipping data call for a data-driven, machine-learning-based approach. Such an automated framework can greatly benefit our target end users, specifically enforcement agencies.
This overall problem comprises multiple connected sub-problems, each with associated research questions. We incorporate crucial domain knowledge, in terms of both data and modeling, by drawing on the expertise of collaborating domain specialists from ecological conservation agencies. The collaborators provide formal and informal input across all stages, from requirement specification to design. Prior literature has treated similar problems, such as fraud detection, as anomaly detection tasks, and we follow that paradigm for the core problem of discovering suspicious transactions. The first sub-problem is to build a system that finds suspicious transactions in shipment data covering the imports and exports of multiple countries, each with its own country-specific schema. We present a novel anomaly detection approach for multivariate categorical data that is designed around the constraints of the data. We combine it with a data pipeline that incorporates domain knowledge. The second sub-problem focuses on U.S.-specific imports, where the data differ from those of the first sub-problem because they contain heterogeneous attributes. This problem is important because the U.S. is a top consumer and there is scope for actionable enforcement. For this, we present a contrastive-learning-based anomaly detection model for heterogeneous tabular data. Its performance and scalability make it suitable for real-world trade data. The first two sub-problems address the task of detecting suspicious trades through anomaly detection. However, anomaly-detection-based systems face a practical challenge of relevance, or scenario-specific precision. The third sub-problem addresses this with a human-in-the-loop approach, augmented by visual analytics, that re-ranks anomalies by relevance while explaining the causes of anomalies and soliciting feedback.
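To make the anomaly detection framing concrete, the following is a minimal sketch that scores multivariate categorical records by the rarity of their attribute values and ranks them from most to least anomalous. It is a simple attribute-value-frequency baseline and does not reproduce the dissertation's models. It assumes that records are Python dicts of categorical fields. The helper names (`fit`, `anomaly_score`) and the field names and values (`hs_code`, `origin`, `consignee`, `PortA`, ...) are hypothetical.

```python
import math
from collections import Counter, defaultdict

def fit(records):
    """Count (attribute, value) occurrences and distinct values per attribute."""
    counts = Counter()
    values = defaultdict(set)
    for rec in records:
        for attr, val in rec.items():
            counts[(attr, val)] += 1
            values[attr].add(val)
    cardinality = {attr: len(vals) for attr, vals in values.items()}
    return counts, cardinality, len(records)

def anomaly_score(record, model, alpha=1.0):
    """Sum over attributes of -log P(value) with add-alpha smoothing.

    Higher scores mean the record uses rarer attribute values.
    """
    counts, cardinality, n = model
    score = 0.0
    for attr, val in record.items():
        # the +1 reserves probability mass for values never seen in training
        k = cardinality.get(attr, 0) + 1
        p = (counts[(attr, val)] + alpha) / (n + alpha * k)
        score -= math.log(p)
    return score

# Hypothetical shipment records (field names and values are illustrative only).
records = [
    {"hs_code": "4407", "origin": "PortA", "consignee": "C1"},
    {"hs_code": "4407", "origin": "PortA", "consignee": "C1"},
    {"hs_code": "4407", "origin": "PortA", "consignee": "C2"},
    {"hs_code": "4403", "origin": "PortB", "consignee": "C3"},
]
model = fit(records)
# Most anomalous records first, as a reviewer would see them.
ranked = sorted(records, key=lambda r: anomaly_score(r, model), reverse=True)
for rec in ranked:
    print(round(anomaly_score(rec, model), 3), rec)
```

This baseline treats attributes independently, so it cannot flag a record whose individual values are common but whose combination is unusual. Capturing such interactions requires a model of the joint structure of the attributes.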
The last sub-problem concerns explainability and actionability for suspicious records through algorithmic recourse. Algorithmic recourse aims to provide meaningful alternatives to flagged anomalous records, such that these counterfactual examples are not judged anomalous by the underlying anomaly detection system. This can help enforcement agencies advise verified trading entities on modifying their trading patterns to avoid false detection, thus streamlining the process. We present a novel formulation and metrics for this previously unexplored problem of algorithmic recourse in anomaly detection. We also present a deep-learning-based approach to explaining anomalies and generating counterfactuals.
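The recourse idea can be illustrated with a brute-force sketch. Given a flagged record, an anomaly scoring function, and a threshold, the search looks for counterfactuals that change as few attributes as possible and bring the score below the threshold. This is not the deep-learning-based approach described above, and exhaustive search is only practical for a handful of attributes. The function `find_recourse`, the scorer `toy_score`, the `RARITY` penalty table, the threshold, and the allowed alternative values are all hypothetical. Attributes absent from `candidates` (here `hs_code`) are treated as immutable.

```python
from itertools import combinations, product

def find_recourse(record, score_fn, candidates, threshold, max_changes=2):
    """Return counterfactuals that change as few attributes as possible
    and whose anomaly score falls below `threshold`.

    `candidates` maps each mutable attribute to its allowed values;
    attributes not listed are treated as immutable.
    """
    mutable = [a for a in record if a in candidates]
    for k in range(1, max_changes + 1):
        found = []
        for subset in combinations(mutable, k):
            # alternative values only, so every attribute in `subset` changes
            options = [[v for v in candidates[a] if v != record[a]] for a in subset]
            for values in product(*options):
                cf = dict(record)
                cf.update(zip(subset, values))
                if score_fn(cf) < threshold:
                    found.append(cf)
        if found:
            # least anomalous counterfactuals first
            return sorted(found, key=score_fn)
    return []

# Hypothetical per-value penalties standing in for a trained detector.
RARITY = {("hs_code", "4403"): 1.5, ("origin", "PortB"): 2.0, ("consignee", "C3"): 1.0}

def toy_score(rec):
    return sum(RARITY.get((a, v), 0.0) for a, v in rec.items())

flagged = {"hs_code": "4403", "origin": "PortB", "consignee": "C3"}
candidates = {"origin": ["PortA", "PortB"], "consignee": ["C1", "C2", "C3"]}
options = find_recourse(flagged, toy_score, candidates, threshold=2.0)
for cf in options:
    print(cf, toy_score(cf))
```

In this toy setting, no single change suffices, so the search returns the two-attribute alternatives. Restricting `candidates` is a simple way to encode which changes a trading entity could realistically make.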
Thus, the research contributions presented in this dissertation address the requirements of the framework. They also apply more generally to similar scenarios beyond its scope.