An efficient approach for automated token formation for record de-duplication with special reference to real-time data-warehouse environment

ISSN: 2249-8958

Abstract

Record de-duplication is an important part of the data-cleaning process in a data warehouse. De-duplication is the identification of multiple duplicate entries that refer to a single entity. Considerable research has been carried out on various aspects of record de-duplication, such as blocking and indexing techniques, the choice of blocking predicate, blocking quality, and optimization of the comparison space. De-duplication in a real-time environment requires special attention. This research addresses automatic token formation for real-time data de-duplication, so that no human intervention is required in the de-duplication process. The proposed Optimized Automated Token Formation (OATF) is a two-step approach: in the first step, candidate tokens are generated, and in the second step, the optimal candidates are selected so as to ensure maximum true-positive coverage. Experiments show that OATF outperforms manual token formation by 29% and 14% on the Cora and Restaurant data sets, respectively. It also shows 40% better results than the existing FDY-SNI algorithm on the Cora data set. A framework for real-time de-duplication is also proposed, in which disjoint sorted indexes are used to accomplish real-time data updates. Like other existing methods, it works well for real-time de-duplication without any parameter setting by human experts.
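
The two-step idea described in the abstract (generate candidate tokens, then keep the candidates that cover the most true duplicate pairs) can be illustrated with a minimal Python sketch. This is not the authors' OATF algorithm: the abstract does not specify how candidates are generated or selected, so the sketch assumes a greedy coverage heuristic. The candidate token functions, the helper names `covered_pairs` and `select_tokens`, and the toy records are all hypothetical and are not drawn from the Cora or Restaurant data sets.

```python
from itertools import combinations

# Step 1 (assumed): hypothetical candidate token functions.
# Each maps a record (a dict) to a blocking key.
CANDIDATES = {
    "name3": lambda r: r["name"][:3].lower(),
    "city3": lambda r: r["city"][:3].lower(),
    "name1_city1": lambda r: (r["name"][:1] + r["city"][:1]).lower(),
}


def covered_pairs(records, token_fn):
    """Return the set of record-id pairs that share the same token."""
    blocks = {}
    for rid, rec in records.items():
        blocks.setdefault(token_fn(rec), []).append(rid)
    pairs = set()
    for ids in blocks.values():
        pairs.update(combinations(sorted(ids), 2))
    return pairs


def select_tokens(records, true_pairs, candidates):
    """Step 2 (assumed): greedily pick the candidate that covers the most
    still-uncovered true duplicate pairs, until nothing more is gained."""
    coverage = {
        name: covered_pairs(records, fn) & true_pairs
        for name, fn in candidates.items()
    }
    chosen, remaining = [], set(true_pairs)
    while remaining:
        best = max(coverage, key=lambda n: len(coverage[n] & remaining))
        gain = coverage[best] & remaining
        if not gain:
            break  # no candidate covers any remaining true pair
        chosen.append(best)
        remaining -= gain
    return chosen, remaining


# Toy, made-up records with two labelled duplicate pairs.
records = {
    1: {"name": "Arnold Grill", "city": "Los Angeles"},
    2: {"name": "Arnold's Grill House", "city": "LA"},
    3: {"name": "Cafe Bizou", "city": "Sherman Oaks"},
    4: {"name": "Bizou Cafe", "city": "Sherman Oaks"},
    5: {"name": "Spago", "city": "Los Angeles"},
}
true_pairs = {(1, 2), (3, 4)}

chosen, remaining = select_tokens(records, true_pairs, CANDIDATES)
print(chosen, remaining)
```

In this toy run, no single candidate covers both labelled pairs, so the greedy step combines a name-based token with a city-based token. A real-time variant would additionally keep each chosen token's keys in a sorted index so that new records can be inserted and compared against neighbours incrementally.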

Cite

Wangikar, V. C., Deshmukh, S. N., & Bhirud, S. G. (2019). An efficient approach for automated token formation for record de-duplication with special reference to real-time data-warehouse environment. International Journal of Engineering and Advanced Technology, 8(4), 151–159.
