Fuzzy matching on big-data: An illustration with scanner and crowd-sourced nutritional datasets

Lino Galiana; Milena Suarez Castillo

Conference Proceedings, Open Access

ACM International Conference Proceeding Series (2022) 331-337

DOI: 10.1145/3524458.3547244

Abstract

Food retailers' scanner data offer unprecedented detail on local consumption, provided that product identifiers can be linked to features of interest, such as nutritional information. In this paper, we enrich a large retailer dataset with nutritional information extracted from crowd-sourced and administrative nutritional datasets. To compensate for imperfect matching through the barcode, we develop a methodology to efficiently match short textual descriptions. After a preprocessing step that normalizes short labels, we resort to fuzzy matching based on several tokenizers (including n-grams) by querying a customized Elasticsearch index, and we validate the returned candidates as matches with a Levenshtein edit distance and an embedding-based similarity measure derived from a siamese neural network model. The pipeline is composed of several steps that successively relax constraints to find relevant matching candidates.
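The general idea of the abstract (normalize short labels, retrieve candidates through n-gram overlap, then validate them with an edit distance) can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the Elasticsearch index and the siamese-network similarity are left out, a simple Jaccard overlap on character trigrams stands in for the retrieval step, and the function names, thresholds, and product labels are hypothetical choices made only for illustration.

```python
import re
import unicodedata


def normalize(label: str) -> str:
    """Lowercase, strip accents and punctuation, collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", label)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    cleaned = re.sub(r"[^a-z0-9 ]+", " ", no_accents.lower())
    return " ".join(cleaned.split())


def char_ngrams(text: str, n: int = 3) -> set:
    """Return the set of character n-grams of a string."""
    if len(text) < n:
        return {text} if text else set()
    return {text[i:i + n] for i in range(len(text) - n + 1)}


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,         # deletion
                               current[j - 1] + 1,      # insertion
                               previous[j - 1] + cost)) # substitution
        previous = current
    return previous[-1]


def best_match(query, candidates, min_overlap=0.3, max_relative_distance=0.3):
    """Keep candidates sharing enough trigrams with the query, then return
    the one with the smallest relative edit distance, or None.
    Both thresholds are illustrative assumptions, not values from the paper."""
    q = normalize(query)
    q_grams = char_ngrams(q)
    best, best_dist = None, None
    for cand in candidates:
        c = normalize(cand)
        c_grams = char_ngrams(c)
        union = q_grams | c_grams
        jaccard = len(q_grams & c_grams) / len(union) if union else 0.0
        if jaccard < min_overlap:
            continue  # retrieval step: too little n-gram overlap
        dist = levenshtein(q, c) / max(len(q), len(c), 1)
        if dist <= max_relative_distance and (best_dist is None or dist < best_dist):
            best, best_dist = cand, dist  # validation step: close enough
    return best
```

For instance, `best_match("YAOURT NATURE 4X125G", ["Yaourt nature 4 x 125 g", "Jus d'orange 1L"])` selects the first candidate, since the two normalized labels differ only by three inserted spaces. Relaxing `min_overlap` or `max_relative_distance` in successive passes would loosely mimic the step-by-step relaxation of constraints mentioned in the abstract.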

Cite

Galiana, L., & Suarez Castillo, M. (2022). Fuzzy matching on big-data: An illustration with scanner and crowd-sourced nutritional datasets. In ACM International Conference Proceeding Series (pp. 331–337). Association for Computing Machinery. https://doi.org/10.1145/3524458.3547244
