Deep embeddings for brand detection in product titles

Abstract

In this paper, we compare various techniques for learning expressive product title embeddings, starting from TF-IDF and ending with deep neural architectures. The problem is to recognize brands from noisy retail product names coming from different sources, such as receipts and supply documents. In this work we consider product titles written in English and Russian. To determine the state of the art on the openly accessible “Universe-HTT barcode reference” dataset, traditional machine learning models, such as SVMs, were compared to neural networks with classical softmax activation and cross-entropy loss. Furthermore, a scalable variant of the problem was studied, in which new brands are recognized without retraining the model. The approach is based on k-Nearest Neighbors, where the search space can be represented by either TF-IDF vectors or deep embeddings. For the latter we considered two solutions: (1) pretrained FastText embeddings followed by an LSTM with attention and (2) a character-level Convolutional Neural Network. Our research shows that deep embeddings significantly outperform TF-IDF vectors: the classification error was reduced from 13.2% for the TF-IDF approach to 8.9% for LSTM embeddings and 7.3% for character-level CNN embeddings, respectively.
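The retrieval setup described in the abstract can be illustrated with a minimal sketch (not the authors' code): brand lookup via k-Nearest Neighbors over TF-IDF title vectors, the baseline that the deep embeddings are compared against. The titles and brands below are toy examples; the paper evaluates on the Universe-HTT barcode reference dataset. With deep embeddings, only the vectorization step would change.

```python
# Minimal sketch (assumed setup, not the paper's implementation):
# brand recognition as k-NN search over TF-IDF vectors of product titles.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.neighbors import KNeighborsClassifier

# Toy training data standing in for noisy receipt/supply-document titles.
titles = [
    "coca-cola zero 0.5l bottle",
    "coca cola classic can 330ml",
    "pepsi max 1.5l pet",
    "pepsi cola 0.33l can",
]
brands = ["Coca-Cola", "Coca-Cola", "Pepsi", "Pepsi"]

# Character n-grams are robust to the abbreviations and typos common in
# receipt text; a learned embedding model would replace this step.
vectorizer = TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4))
X = vectorizer.fit_transform(titles)

# New brands can be recognized by adding their vectors to the index,
# without retraining any model parameters.
knn = KNeighborsClassifier(n_neighbors=1, metric="cosine")
knn.fit(X, brands)

query = vectorizer.transform(["coca cola zero can 330 ml"])
print(knn.predict(query)[0])
```

The same nearest-neighbor index works unchanged whether the vectors come from TF-IDF, LSTM-with-attention embeddings, or a character-level CNN, which is what makes the comparison in the paper an apples-to-apples one.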

Citation (APA)

Kulagin, A., Gavrilin, Y., & Kholodov, Y. (2019). Deep embeddings for brand detection in product titles. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 11832 LNCS, pp. 155–165). Springer. https://doi.org/10.1007/978-3-030-37334-4_14
