Embeddings of Categorical Variables for Sequential Data in Fraud Context

Abstract

In this paper we propose a new generic method for handling categorical variables in sequential data. Our main contributions are: (1) the use of unsupervised methods to extract sequential information, and (2) the generation of embeddings that include this sequential information for categorical variables, using the well-known Word2Vec neural network. Compared with the commonly used one-hot encoding, the embeddings not only reduced memory usage but also improved the learning capacity of the machine learning algorithms. We implemented these processes on a real-world credit card fraud dataset comprising more than 400 million transactions over a one-year time window. We demonstrated a 50% reduction in memory usage and a performance improvement of 3 percentage points while using only a small subset of features.
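As a rough illustration of the idea, and not the authors' actual pipeline, the sketch below shows one way sequential information could be extracted from transactions before Word2Vec training. It assumes that each card's transactions, ordered by time, form a "sentence" whose "words" are the values of one categorical variable. The field names (`card_id`, timestamp, merchant category), the toy data, and the helpers `build_sequences` and `skipgram_pairs` are all hypothetical. The abstract does not specify how sequences are built.

```python
from collections import defaultdict

# Hypothetical toy transactions: (card_id, timestamp, merchant_category).
# These fields and values are illustrative assumptions, not data from the paper.
transactions = [
    ("card_a", 3, "fuel"),
    ("card_a", 1, "grocery"),
    ("card_b", 1, "travel"),
    ("card_a", 2, "restaurant"),
    ("card_b", 2, "hotel"),
]


def build_sequences(rows):
    """Group category values by card and order them by timestamp."""
    by_card = defaultdict(list)
    for card_id, timestamp, category in rows:
        by_card[card_id].append((timestamp, category))
    # Sorting the (timestamp, category) tuples orders each card's events in time.
    return {
        card: [category for _, category in sorted(events)]
        for card, events in by_card.items()
    }


def skipgram_pairs(sequence, window=1):
    """Return (center, context) pairs within `window` positions of each other.

    Pairs of this kind are the training examples of the skip-gram
    variant of Word2Vec.
    """
    pairs = []
    for i, center in enumerate(sequence):
        start = max(0, i - window)
        stop = min(len(sequence), i + window + 1)
        for j in range(start, stop):
            if j != i:
                pairs.append((center, sequence[j]))
    return pairs


sequences = build_sequences(transactions)
pairs = [
    pair
    for sequence in sequences.values()
    for pair in skipgram_pairs(sequence, window=1)
]
```

In practice the per-card sequences would more likely be passed directly to an existing Word2Vec implementation, which generates such pairs internally. The resulting fixed-size dense vector then replaces a one-hot vector whose width grows with the number of distinct categories.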

Cite

Russac, Y., Caelen, O., & He-Guelton, L. (2018). Embeddings of Categorical Variables for Sequential Data in Fraud Context. In Advances in Intelligent Systems and Computing (Vol. 723, pp. 542–552). Springer Verlag. https://doi.org/10.1007/978-3-319-74690-6_53
