Embeddings of Categorical Variables for Sequential Data in Fraud Context

Yoan Russac; Olivier Caelen; Liyun He-Guelton

Conference Proceedings

Advances in Intelligent Systems and Computing (2018), Vol. 723, pp. 542–552

DOI: 10.1007/978-3-319-74690-6_53

Abstract

In this paper we propose a new generic method for working with categorical variables in sequential data. Our main contributions are: (1) the use of unsupervised methods to extract sequential information, and (2) the generation of embeddings for categorical variables that include this sequential information, using the well-known Word2Vec neural network. Compared with the commonly used one-hot encoding, the embeddings not only reduced memory usage but also improved the ability of machine learning algorithms to learn from the data. We implemented these processes on a real-world credit card fraud dataset containing more than 400 million transactions over a one-year time window. We demonstrated that we were able to reduce memory usage by 50% and improve performance by 3 percentage points while using only a small subset of features.
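
The abstract gives no implementation details, so the following is only a minimal sketch of the general idea under stated assumptions: transactions are grouped per card and ordered by time, so that each card's sequence of categorical values plays the role of a "sentence", and a tiny skip-gram model with negative sampling (one Word2Vec variant, chosen here for illustration; the paper's actual settings may differ) learns a dense vector per category. The toy transactions and the functions `build_sequences`, `skipgram_pairs`, `train_sgns`, and `one_hot` are hypothetical.

```python
import math
import random
from collections import defaultdict

# Hypothetical toy data: (card_id, timestamp, merchant_category).
# These values are illustrative assumptions, not the paper's dataset.
TRANSACTIONS = [
    ("card_a", 2, "fuel"), ("card_a", 1, "grocery"), ("card_a", 3, "grocery"),
    ("card_b", 1, "online_games"), ("card_b", 2, "electronics"), ("card_b", 3, "online_games"),
    ("card_c", 1, "fuel"), ("card_c", 2, "grocery"), ("card_c", 3, "fuel"),
]


def build_sequences(transactions):
    """Group by card and sort by timestamp: one 'sentence' of categories per card."""
    by_card = defaultdict(list)
    for card, ts, category in transactions:
        by_card[card].append((ts, category))
    return [[cat for _, cat in sorted(events)] for events in by_card.values()]


def skipgram_pairs(sequences, window=1):
    """Collect (center, context) pairs from a sliding window, as in skip-gram."""
    pairs = []
    for seq in sequences:
        for i, center in enumerate(seq):
            lo, hi = max(0, i - window), min(len(seq), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    pairs.append((center, seq[j]))
    return pairs


def sigmoid(x):
    x = max(-30.0, min(30.0, x))  # clamp to avoid overflow in math.exp
    return 1.0 / (1.0 + math.exp(-x))


def train_sgns(sequences, dim=2, window=1, negatives=2, epochs=200, lr=0.05, seed=0):
    """Tiny skip-gram with negative sampling; returns {category: input vector}."""
    rng = random.Random(seed)
    vocab = sorted({c for seq in sequences for c in seq})
    w_in = {c: [rng.uniform(-0.5, 0.5) / dim for _ in range(dim)] for c in vocab}
    w_out = {c: [0.0] * dim for c in vocab}
    pairs = skipgram_pairs(sequences, window)
    for _ in range(epochs):
        rng.shuffle(pairs)
        for center, context in pairs:
            # One positive target plus uniformly sampled negatives (a simplification;
            # Word2Vec itself samples negatives from a smoothed unigram distribution).
            targets = [(context, 1.0)] + [(rng.choice(vocab), 0.0) for _ in range(negatives)]
            v = w_in[center]
            grad_v = [0.0] * dim
            for target, label in targets:
                u = w_out[target]
                g = lr * (label - sigmoid(sum(a * b for a, b in zip(v, u))))
                for k in range(dim):
                    grad_v[k] += g * u[k]  # uses u before its own update
                    u[k] += g * v[k]
            for k in range(dim):
                v[k] += grad_v[k]
    return {c: w_in[c] for c in vocab}


def one_hot(category, vocab):
    """Baseline encoding: width equals the number of distinct categories."""
    return [1.0 if c == category else 0.0 for c in vocab]


if __name__ == "__main__":
    sequences = build_sequences(TRANSACTIONS)
    embeddings = train_sgns(sequences, dim=2)
    vocab = sorted(embeddings)
    for category in vocab:
        print(category, "one-hot:", one_hot(category, vocab),
              "embedding:", [round(x, 3) for x in embeddings[category]])
```

The design point is only the shape of the representation: each category maps to a dense vector of fixed width `dim`, whereas a one-hot vector grows with the number of distinct categories. For real workloads an established implementation such as gensim's `Word2Vec` would be the natural choice over this hand-written loop.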

Russac, Y., Caelen, O., & He-Guelton, L. (2018). Embeddings of Categorical Variables for Sequential Data in Fraud Context. In Advances in Intelligent Systems and Computing (Vol. 723, pp. 542–552). Springer Verlag. https://doi.org/10.1007/978-3-319-74690-6_53