Semi-supervised learning has attracted the attention of AI researchers in recent years, especially since the advent of deep learning methods and their success in several real-world applications. Most deep learning models require large amounts of labelled data, which is expensive to obtain. Fraud detection is an important problem for several industries, and large amounts of data are often available. However, obtaining labelled data is cumbersome, so semi-supervised learning is well positioned to help build robust and accurate supervised models. In this work, we consider different kinds of fraud detection paradigms and show that a self-training based semi-supervised learning approach can produce significant improvements over a model that has been trained on a limited set of labelled data. We propose a novel self-training approach based on a guided sharpening technique that uses a pair of autoencoders to provide useful cues for incorporating unlabelled data in the training process. We conduct thorough experiments and analysis on three different real-world databases to showcase the effectiveness of the approach. On the Elliptic Bitcoin fraud dataset, we show that utilizing unlabelled data improves the F1 score of the model trained on limited labelled data by around 10%.
Kumar, A., Ghosh, S., & Verma, J. (2022). Guided Self-Training based Semi-Supervised Learning for Fraud Detection. In Proceedings of the 3rd ACM International Conference on AI in Finance, ICAIF 2022 (pp. 148–155). Association for Computing Machinery, Inc. https://doi.org/10.1145/3533271.3561783