Statistical analysis of bibliographic strings for constructing an integrated document space

Atsuhiro Takasu

Conference Proceedings

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2002) 2458 75-90

DOI: 10.1007/3-540-45747-x_6

Citations: 1

Readers: 3

Abstract

It is important to utilize retrospective documents when constructing a large digital library. This paper proposes a method for analyzing recognized bibliographic strings using an extended hidden Markov model. The proposed method enables analysis of erroneous bibliographic strings and integrates many documents accumulated as printed articles in a citation index. The proposed method has the advantage of providing a robust bibliographic matching function using the statistical description of the syntax of bibliographic strings, a language model and an Optical Character Recognition (OCR) error model. The method also has the advantage of reducing the cost of preparing training data for parameter estimation, using records in the bibliographic database.

Citation (APA)

Takasu, A. (2002). Statistical analysis of bibliographic strings for constructing an integrated document space. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 2458, pp. 75–90). Springer Verlag. https://doi.org/10.1007/3-540-45747-x_6
