A Comparative Study for String Metrics and the Feasibility of Joining them as Combined Text Similarity Measures

  • Abdul-Jabbar S
  • et al.

Abstract

This paper introduces an optimized Damerau-Levenshtein and Dice-coefficient measure using enumeration operations (ODADNEN), which provides fast string similarity assessment while maintaining the accuracy of the results. Searching for specific words within a large text is a difficult task that takes considerable time and effort, and string similarity measures play a critical role in many such search problems. In this paper, several experiments were conducted on handling spelling mistakes, and an enhanced algorithm for string similarity assessment was proposed. The algorithm combines a set of well-known algorithms with some improvements (e.g., under certain conditions, the Dice coefficient was modified to operate on numbers instead of characters). These algorithms were adopted after a number of experimental tests were conducted to check their suitability. The ODADNEN algorithm was tested on real data, and its performance was compared with that of the original similarity measures. The results indicated that the most convincing measure is the proposed hybrid measure, which combines the Damerau-Levenshtein distance with a Dice distance computed on the n-grams of each word; it also requires less processing time than the standard algorithms. Furthermore, it assesses the similarity between two words efficiently without restricting word length.

Index Terms: Word classification, Word clustering, String distance, String matching operation, String similarity metric.
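As an illustration only, the following sketch shows the two building blocks named in the abstract and one simple way to combine them. It uses the optimal string alignment (OSA) variant of Damerau-Levenshtein, which is the common "restricted" form, and the Sørensen-Dice coefficient over character bigrams. Each bigram is packed into an integer as one possible reading of operating on "numbers instead of characters". The function names, the bigram packing, and the equal 0.5 weighting are assumptions made for this sketch, not the paper's method. The abstract does not specify how ODADNEN combines the two measures or what its enumeration operations are.

```python
def osa_distance(s: str, t: str) -> int:
    """Optimal string alignment (restricted Damerau-Levenshtein) distance."""
    m, n = len(s), len(t)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i  # delete all of s[:i]
    for j in range(n + 1):
        d[0][j] = j  # insert all of t[:j]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if s[i - 1] == t[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
            # Adjacent transposition, e.g. "ab" -> "ba", costs one edit.
            if i > 1 and j > 1 and s[i - 1] == t[j - 2] and s[i - 2] == t[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[m][n]


def bigram_codes(word: str) -> set[int]:
    """Character bigrams packed into integers (hypothetical encoding)."""
    # Unicode code points fit in 21 bits, so (a << 21) | b is unique per bigram.
    return {(ord(a) << 21) | ord(b) for a, b in zip(word, word[1:])}


def dice_similarity(s: str, t: str) -> float:
    """Sorensen-Dice coefficient on the sets of bigram codes."""
    x, y = bigram_codes(s), bigram_codes(t)
    if not x and not y:
        # Words shorter than two characters have no bigrams.
        return 1.0 if s == t else 0.0
    return 2 * len(x & y) / (len(x) + len(y))


def hybrid_similarity(s: str, t: str, weight: float = 0.5) -> float:
    """Weighted mean of normalized OSA similarity and Dice (assumed rule)."""
    longest = max(len(s), len(t))
    edit_sim = 1.0 if longest == 0 else 1 - osa_distance(s, t) / longest
    return weight * edit_sim + (1 - weight) * dice_similarity(s, t)


if __name__ == "__main__":
    print(osa_distance("kitten", "sitting"))      # 3
    print(osa_distance("abc", "acb"))             # 1 (one transposition)
    print(dice_similarity("night", "nacht"))      # 0.25 (only "ht" is shared)
    print(hybrid_similarity("abc", "acb"))        # about 0.333
```

The last example shows why combining the measures can help. A transposition such as "abc" and "acb" costs a single edit, so the edit-based similarity stays high at about 0.667. However, the swap destroys every shared bigram, so Dice drops to 0. Any real weighting or thresholding should follow the paper's experiments rather than this assumed 0.5 split.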

Citation (APA):

Abdul-Jabbar, S., & George, L. (2017). A Comparative Study for String Metrics and the Feasibility of Joining them as Combined Text Similarity Measures. ARO-The Scientific Journal of Koya University, 5(2), 6–18. https://doi.org/10.14500/aro.10180
