Comparison of String Similarity Algorithm in post-processing OCR

Al Birr Karim Susanto; Nuraziz Muliadi; Bagus Nugroho; Muljono Muljono

Journal Article (Open Access)


Journal of Applied Intelligent System (2023) 8(1) 25-32

DOI: 10.33633/jais.v8i1.7079


Abstract

A common problem in Optical Character Recognition (OCR) is that the input image contains a lot of noise that partially covers letters in a word. This can cause misspellings during word recognition or detection in the image. After the OCR process, post-processing is therefore needed to correct the recognized words, and this correction is done with a string similarity algorithm. To determine which algorithm performs best, we compared Levenshtein distance, Hamming distance, Jaro-Winkler, and the Sørensen-Dice coefficient. In our tests, the most effective algorithm was the Sørensen-Dice coefficient, which scored 0.88 for precision, recall, and F1 score.
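To illustrate the kind of correction step the abstract describes, here is a minimal Python sketch of the Sørensen-Dice coefficient applied to a noisy OCR token. The abstract does not say how the authors tokenized strings or chose candidates, so the use of character bigrams, the lowercasing, and the `correct_word` helper with its candidate vocabulary are all assumptions made for illustration, not the paper's implementation.

```python
from collections import Counter


def bigrams(word: str) -> Counter:
    """Return the multiset of adjacent character pairs in a lowercased word."""
    w = word.lower()
    return Counter(w[i:i + 2] for i in range(len(w) - 1))


def dice_coefficient(a: str, b: str) -> float:
    """Sørensen-Dice coefficient over character bigrams: 2|X ∩ Y| / (|X| + |Y|)."""
    x, y = bigrams(a), bigrams(b)
    total = sum(x.values()) + sum(y.values())
    if total == 0:
        # Both strings are shorter than two characters, so there are no
        # bigrams to compare; fall back to plain equality (an assumption).
        return 1.0 if a.lower() == b.lower() else 0.0
    # Counter intersection keeps the minimum count of each shared bigram.
    overlap = sum((x & y).values())
    return 2.0 * overlap / total


def correct_word(ocr_word: str, vocabulary: list[str]) -> str:
    """Hypothetical helper: pick the vocabulary entry most similar to the OCR output."""
    return max(vocabulary, key=lambda candidate: dice_coefficient(ocr_word, candidate))
```

For example, `correct_word("rec0gnition", ["recognition", "reception", "cognition"])` selects `"recognition"`, because the noisy token still shares 8 of its 10 bigrams with that candidate. Unlike Hamming distance, this measure does not require the two strings to have equal length.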

Cite

Susanto, A. B. K., Muliadi, N., Nugroho, B., & Muljono, M. (2023). Comparison of String Similarity Algorithm in post-processing OCR. Journal of Applied Intelligent System, 8(1), 25–32. https://doi.org/10.33633/jais.v8i1.7079
