Analysis of unstructured data may be inefficient in the presence of spelling errors. Existing approaches use string similarity methods, supported by a dictionary, to search for valid words within a text. However, these methods are not rich enough to encode the phonetic information that could assist the search. In this paper, we present a novel approach for efficiently performing phonetic similarity search over large data sources, which uses a data structure called PhoneticMap to encode language-specific phonetic information. We validate our approach through an experiment that automatically corrects words with spelling errors, over a data set using a Portuguese variant of a well-known repository. © 2014 Springer International Publishing Switzerland.
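The abstract does not describe how PhoneticMap works internally, so the following is only a minimal sketch of the general idea of dictionary lookup by phonetic key, not the authors' data structure. It assumes a simplified, English-oriented Soundex-style code as a stand-in for language-specific phonetic rules; the names `phonetic_key`, `build_index`, and `candidates` are hypothetical.

```python
from collections import defaultdict

# Assumption: a simplified Soundex-style consonant grouping (English-oriented).
# This is a stand-in for language-specific rules, not the paper's PhoneticMap.
_CODES = {}
for group, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                     ("l", "4"), ("mn", "5"), ("r", "6")):
    for ch in group:
        _CODES[ch] = digit


def phonetic_key(word: str) -> str:
    """Return a 4-character key: first letter plus up to three digit codes."""
    letters = [c for c in word.lower() if c.isalpha()]
    if not letters:
        return ""
    key = letters[0].upper()
    prev = _CODES.get(letters[0], "")
    for ch in letters[1:]:
        code = _CODES.get(ch, "")  # vowels and unmapped letters have no code
        if code and code != prev:  # skip uncoded letters, collapse repeats
            key += code
        prev = code
    return (key + "000")[:4]       # pad or truncate to a fixed length


def build_index(dictionary):
    """Group dictionary words by phonetic key for constant-time lookup."""
    index = defaultdict(set)
    for word in dictionary:
        index[phonetic_key(word)].add(word)
    return index


def candidates(index, misspelled):
    """Return dictionary words sharing the misspelled word's phonetic key."""
    return sorted(index.get(phonetic_key(misspelled), ()))


index = build_index(["search", "similarity", "phonetic"])
print(candidates(index, "serch"))      # words keyed like "serch"
print(candidates(index, "similarty"))  # words keyed like "similarty"
```

Because each lookup is a single hash access on the key, the cost does not grow with the dictionary size. The trade-off in this sketch is that words differing in their first letter (for example "fonetic" and "phonetic") receive different keys, which is the kind of gap that richer, language-specific phonetic encoding is meant to address.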
CITATION STYLE
Tissot, H., Peschl, G., & Del Fabro, M. D. (2014). Fast phonetic similarity search over large repositories. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 8645 LNCS, pp. 74–81). Springer Verlag. https://doi.org/10.1007/978-3-319-10085-2_6