Fast phonetic similarity search over large repositories

Hegler Tissot; Gabriel Peschl; Marcos Didonet Del Fabro

Conference Proceedings

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2014) 8645 LNCS(PART 2) 74-81

DOI: 10.1007/978-3-319-10085-2_6

Abstract

Analysis of unstructured data may be inefficient in the presence of spelling errors. Existing approaches use string similarity methods to search for valid words within a text, with a supporting dictionary. However, they are not rich enough to encode phonetic information to assist the search. In this paper, we present a novel approach for efficiently performing phonetic similarity search over large data sources. The approach uses a data structure called PhoneticMap to encode language-specific phonetic information. We validate our approach through an experiment that automatically corrects words with spelling errors, over a data set built from a Portuguese variant of a well-known repository. © 2014 Springer International Publishing Switzerland.
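To make the general idea concrete, the following is a minimal sketch of dictionary lookup by phonetic key. It is not the PhoneticMap structure from the paper, whose details are not given in the abstract. The rewrite rules in `_RULES` and the names `phonetic_key`, `build_index`, and `suggest` are all hypothetical, chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical, heavily simplified sound rules (an assumption for this
# sketch, NOT the rules used in the paper). Each pair rewrites a spelling
# to a rough sound class so that similar-sounding words share one key.
_RULES = [("ph", "f"), ("ss", "s"), ("ç", "s"), ("ch", "x"),
          ("z", "s"), ("k", "c"), ("y", "i")]


def phonetic_key(word: str) -> str:
    """Map a word to a coarse phonetic key."""
    w = word.lower()
    for src, dst in _RULES:
        w = w.replace(src, dst)
    # Collapse runs of the same letter ("caasa" -> "casa").
    out = []
    for ch in w:
        if not out or out[-1] != ch:
            out.append(ch)
    return "".join(out)


def build_index(dictionary):
    """Group valid dictionary words by their phonetic key."""
    index = defaultdict(list)
    for word in dictionary:
        index[phonetic_key(word)].append(word)
    return index


def suggest(index, misspelled):
    """Return dictionary words that share the misspelled word's key."""
    return list(index.get(phonetic_key(misspelled), []))


if __name__ == "__main__":
    index = build_index(["casa", "farmácia", "passo"])
    for typo in ("kasa", "pharmácia", "paso", "livro"):
        print(typo, "->", suggest(index, typo))
```

Because candidates are retrieved with a single hash lookup on the key, the cost of a query does not grow with the number of dictionary entries, which is the kind of property a search over large repositories needs. A real system would use language-specific rules and would likely rank candidates by a similarity score rather than require an exact key match.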

Cite

Tissot, H., Peschl, G., & Del Fabro, M. D. (2014). Fast phonetic similarity search over large repositories. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 8645 LNCS, pp. 74–81). Springer Verlag. https://doi.org/10.1007/978-3-319-10085-2_6
