Probabilistic automaton model for fuzzy English-text retrieval

Manabu Ohta; Atsuhiro Takasu; Jun Adachi

Journal Article

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2000) 1923 35-44

DOI: 10.1007/3-540-45268-0_4

Abstract

Optical character reader (OCR) misrecognition is a serious problem when searching OCR-scanned documents in databases such as digital libraries. This paper proposes fuzzy retrieval methods for English text that contains recognition errors, avoiding the cost of correcting those errors manually. The proposed methods generate multiple search terms for each input query term based on probabilistic automata that reflect both error-occurrence probabilities and character-connection probabilities. Experimental results of test-set retrieval indicate that, with 20 expanded search terms, one of the proposed methods improves the recall rate from 95.56% to 97.88% at the cost of a decrease in the precision rate from 100.00% to 95.52%. © Springer-Verlag Berlin Heidelberg 2000.
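The abstract describes generating multiple search terms for each query term from probabilistic automata that combine error-occurrence and character-connection probabilities. The Python sketch below illustrates that general idea under simplifying assumptions. It is not the authors' automaton: it models only one-to-one character substitutions, and the tables `CONFUSION` and `BIGRAM`, the constant `FLOOR`, the parameters `beam` and `weight`, and the function `expand_term` are hypothetical, with made-up probabilities. A beam search scores candidate spellings and keeps the top `k`, set to 20 by default to match the number of expanded search terms mentioned in the abstract.

```python
import heapq
import math

# Hypothetical OCR confusion table: P(observed char | true char).
# Characters absent from the table are assumed to be recognized correctly.
CONFUSION = {
    "l": {"l": 0.90, "1": 0.06, "i": 0.04},
    "o": {"o": 0.95, "0": 0.05},
    "e": {"e": 0.93, "c": 0.07},
}

# Hypothetical character-connection (bigram) probabilities P(next | prev).
# Unseen character pairs fall back to a small floor value.
BIGRAM = {("h", "e"): 0.5, ("e", "l"): 0.3, ("l", "l"): 0.2, ("l", "o"): 0.3}
FLOOR = 0.01


def expand_term(term: str, k: int = 20, beam: int = 50, weight: float = 0.5):
    """Return up to k (candidate, log_score) pairs for a query term, best first.

    log_score = sum of log error probabilities
                + weight * sum of log character-connection probabilities.
    """
    beams = [(0.0, "")]  # each entry: (log_score, candidate prefix)
    for ch in term:
        options = CONFUSION.get(ch, {ch: 1.0})
        expanded = []
        for score, prefix in beams:
            for obs, p_err in options.items():
                s = score + math.log(p_err)  # error-occurrence term
                if prefix:
                    p_conn = BIGRAM.get((prefix[-1], obs), FLOOR)
                    s += weight * math.log(p_conn)  # connection term
                expanded.append((s, prefix + obs))
        # Keep only the highest-scoring partial candidates.
        beams = heapq.nlargest(beam, expanded)
    return [(cand, score) for score, cand in heapq.nlargest(k, beams)]


for cand, score in expand_term("hello", k=5):
    print(f"{cand}\t{score:.3f}")
```

With these toy tables the error-free spelling "hello" ranks first, because every substitution lowers both the error-occurrence factor and the connection factor. One plausible way to use the output, assumed here rather than taken from the paper, is to OR the expanded candidates together as search terms against the OCR text, trading some precision for recall as the abstract reports.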

Citation (APA)

Ohta, M., Takasu, A., & Adachi, J. (2000). Probabilistic automaton model for fuzzy English-text retrieval. Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 1923, 35–44. https://doi.org/10.1007/3-540-45268-0_4