Fuzzy matching on big-data: An illustration with scanner and crowd-sourced nutritional datasets

Abstract

Food retailers' scanner data provide unprecedented detail on local consumption, provided that product identifiers allow linkage with features of interest, such as nutritional information. In this paper, we enrich a large retailer dataset with nutritional information extracted from crowd-sourced and administrative nutritional datasets. To compensate for imperfect matching through the barcode, we develop a methodology to efficiently match short textual descriptions. After a preprocessing step that normalizes short labels, we resort to fuzzy matching based on several tokenizers (including n-grams) by querying a customized Elasticsearch index, and we validate the candidates it returns as matches with a Levenshtein edit distance and an embedding-based similarity measure created from a siamese neural network model. The pipeline is composed of several steps that successively relax constraints to find relevant matching candidates.
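To make the normalize, retrieve, and validate sequence concrete, here is a minimal Python sketch. It is not the authors' implementation: the customized Elasticsearch index is replaced by an in-memory character n-gram Jaccard filter, the siamese embedding similarity is left out entirely, and the function names and thresholds (`min_jaccard`, `max_relative_distance`) are hypothetical values chosen only for illustration.

```python
import re
import unicodedata
from typing import Iterable, Optional, Set


def normalize(label: str) -> str:
    """Lowercase, strip accents and punctuation, collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", label)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    cleaned = re.sub(r"[^a-z0-9 ]+", " ", no_accents.lower())
    return " ".join(cleaned.split())


def char_ngrams(text: str, n: int = 3) -> Set[str]:
    """Character n-grams of a label padded with one space on each side."""
    padded = f" {text} "
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Share of n-grams common to both sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def levenshtein(a: str, b: str) -> int:
    """Edit distance computed row by row with dynamic programming."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def match(query: str, candidates: Iterable[str],
          min_jaccard: float = 0.3,             # hypothetical threshold
          max_relative_distance: float = 0.3    # hypothetical threshold
          ) -> Optional[str]:
    """Return the closest validated candidate, or None if none qualifies."""
    q = normalize(query)
    q_grams = char_ngrams(q)
    best = None
    for candidate in candidates:
        c = normalize(candidate)
        # Retrieval stand-in: cheap n-gram overlap filter (not Elasticsearch).
        if jaccard(q_grams, char_ngrams(c)) < min_jaccard:
            continue
        # Validation: edit distance relative to the longer label.
        relative = levenshtein(q, c) / max(len(q), len(c), 1)
        if relative <= max_relative_distance and (best is None or relative < best[1]):
            best = (candidate, relative)
    return best[0] if best else None
```

Calling `match("YAOURT NATURE 4X125G", ["Yaourt nature 4x125 g", "Jus d'orange 1L"])` keeps the first candidate, since the two normalized labels differ by a single inserted space. Raising either threshold mimics, in a very reduced form, the idea of successively relaxing constraints when a stricter pass finds no match.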

Cite

Galiana, L., & Suarez Castillo, M. (2022). Fuzzy matching on big-data: An illustration with scanner and crowd-sourced nutritional datasets. In ACM International Conference Proceeding Series (pp. 331–337). Association for Computing Machinery. https://doi.org/10.1145/3524458.3547244
