An active learning framework for duplicate detection in SaaS platforms

Abstract

With the rapid growth of user data in SaaS (Software-as-a-Service) platforms built on microservices, detecting duplicated entities has become essential for ensuring the integrity and consistency of data in many companies and businesses (primarily multinational corporations). Because databases today are so large, duplicate detection algorithms need to be not only accurate but also practical, meaning that they can return detection results as fast as possible for a given request. Among existing algorithms for the duplicate detection problem, Siamese neural networks trained with the triplet loss have become one of the robust ways to measure the similarity of two entities (texts, paragraphs, or documents) and thereby identify all possible duplicated items. In this paper, we first propose a practical framework for building a duplicate detection system in a SaaS platform. Second, we present a new active learning scheme for training and updating duplicate detection algorithms. In this scheme, we not only allow the crowd to provide more annotated data for enhancing the chosen learning model but also use Siamese neural networks together with the triplet loss to construct an efficient model for the problem. Finally, we design a user interface for our proposed duplicate detection system, which can easily be applied in practice at different companies.
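To make the two ingredients named in the abstract more concrete, here is a minimal sketch in plain Python. It is not the authors' implementation. The triplet loss is written in its common textbook form on already-computed embedding vectors, using plain Euclidean distance (an assumption; the abstract does not state the distance or the margin). The helper `most_uncertain_pairs` is a hypothetical illustration of one possible active learning selection rule, picking the candidate pairs whose distance lies closest to a decision threshold so that annotators can label them; the abstract does not say which selection rule the paper uses.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Common form of the triplet loss on embeddings.

    The loss is zero once the negative is at least `margin` farther from
    the anchor than the positive is. The margin value is an assumption.
    """
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


def most_uncertain_pairs(pairs, threshold, k):
    """Hypothetical selection rule for annotation (not from the paper).

    `pairs` is a list of (id_a, id_b, distance) tuples. Returns the k
    pairs whose distance is closest to the duplicate/non-duplicate
    threshold, i.e. the ones the model is least sure about.
    """
    return sorted(pairs, key=lambda p: abs(p[2] - threshold))[:k]
```

In a Siamese setup the same encoder would produce `anchor`, `positive`, and `negative`, and the labels collected for the selected pairs would feed the next round of training.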

Cite

Nguyen, Q. H., Nguyen, D., Dao, M. S., Dang-Nguyen, D. T., Gurrin, C., & Nguyen, B. T. (2020). An active learning framework for duplicate detection in SaaS platforms. In ICMR 2020 - Proceedings of the 2020 International Conference on Multimedia Retrieval (pp. 412–415). Association for Computing Machinery. https://doi.org/10.1145/3372278.3391933
