A common problem in Optical Character Recognition (OCR) is that the input image contains heavy noise that partially covers the letters of a word. This noise can cause words to be misrecognized, producing misspellings in the detected text. The OCR output therefore needs a post-processing step that corrects these words using a string similarity algorithm. Which algorithm works best for this task? We compared four: Levenshtein distance, Hamming distance, Jaro-Winkler, and the Sørensen-Dice coefficient. In our tests, the Sørensen-Dice coefficient was the most effective, scoring 0.88 on each of precision, recall, and F1 score.
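The correction step can be sketched in Python. This is a minimal sketch, not the authors' implementation: it uses standard textbook definitions of the four measures, computes the Sørensen-Dice coefficient over character bigrams, applies the usual Winkler prefix boost (p = 0.1, prefix capped at four characters), and picks the closest word from a dictionary. The `correct` helper, its `method` names, and the example dictionary are hypothetical and not taken from the paper.

```python
from collections import Counter
from typing import Iterable


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,               # delete ca
                            curr[j - 1] + 1,           # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = curr
    return prev[-1]


def hamming(a: str, b: str) -> int:
    """Number of differing positions; only defined for equal-length strings."""
    if len(a) != len(b):
        raise ValueError("Hamming distance requires equal-length strings")
    return sum(ca != cb for ca, cb in zip(a, b))


def jaro_winkler(a: str, b: str, p: float = 0.1) -> float:
    """Jaro similarity plus the standard Winkler boost for a shared prefix (max 4 chars)."""
    if a == b:
        return 1.0
    if not a or not b:
        return 0.0
    window = max(max(len(a), len(b)) // 2 - 1, 0)
    a_flags, b_flags = [False] * len(a), [False] * len(b)
    matches = 0
    for i, ca in enumerate(a):
        # a character matches only if it appears within the window in b
        for j in range(max(0, i - window), min(len(b), i + window + 1)):
            if not b_flags[j] and b[j] == ca:
                a_flags[i] = b_flags[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # transpositions: matched characters that appear in a different order, halved
    a_seq = [c for c, f in zip(a, a_flags) if f]
    b_seq = [c for c, f in zip(b, b_flags) if f]
    t = sum(x != y for x, y in zip(a_seq, b_seq)) / 2
    jaro = (matches / len(a) + matches / len(b) + (matches - t) / matches) / 3
    prefix = 0
    for ca, cb in zip(a[:4], b[:4]):
        if ca != cb:
            break
        prefix += 1
    return jaro + prefix * p * (1 - jaro)


def dice(a: str, b: str) -> float:
    """Sørensen-Dice coefficient over character bigrams: 2|X ∩ Y| / (|X| + |Y|)."""
    x = Counter(a[i:i + 2] for i in range(len(a) - 1))
    y = Counter(b[i:i + 2] for i in range(len(b) - 1))
    total = sum(x.values()) + sum(y.values())
    if total == 0:  # both strings shorter than two characters
        return 1.0 if a == b else 0.0
    return 2 * sum((x & y).values()) / total


def correct(word: str, dictionary: Iterable[str], method: str = "dice") -> str:
    """Hypothetical helper: return the dictionary word most similar to `word`."""
    candidates = list(dictionary)
    if method == "hamming":
        # Hamming can only compare words of the same length
        candidates = [w for w in candidates if len(w) == len(word)]
    if not candidates:
        return word
    scorers = {  # distances are negated so that a higher score is always better
        "levenshtein": lambda w: -levenshtein(word, w),
        "hamming": lambda w: -hamming(word, w),
        "jaro_winkler": lambda w: jaro_winkler(word, w),
        "dice": lambda w: dice(word, w),
    }
    return max(candidates, key=scorers[method])
```

For example, `correct("rec0gn1tion", ["recognition", "precision", "character", "algorithm"])` returns `"recognition"`: each string has 10 bigrams and 6 are shared, so the Dice coefficient is 2·6 / (10 + 10) = 0.6. Because Hamming distance only compares equal-length strings, the sketch drops dictionary words whose length differs from the OCR output, so that method cannot repair inserted or deleted characters.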
Susanto, A. B. K., Muliadi, N., Nugroho, B., & Muljono, M. (2023). Comparison of String Similarity Algorithm in post-processing OCR. Journal of Applied Intelligent System, 8(1), 25–32. https://doi.org/10.33633/jais.v8i1.7079