We describe a simple procedure for automatically creating word-level alignments between printed documents and their electronic full-text versions. The procedure is unsupervised, uses only standard, off-the-shelf components, and reaches an F-score of 85.01 in the basic setup and up to 86.63 with pre- and post-processing. Potential areas of application include manual database curation (including document triage) and biomedical expression OCR.
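To make the task and its evaluation concrete, the following is a minimal sketch of word-level alignment between OCR output and a full-text token sequence, scored with an F-score against gold alignment pairs. It is not the authors' procedure: the use of Python's `difflib.SequenceMatcher`, the helper names `align_words` and `f_score`, and the toy sentences are all assumptions made for illustration. The score is computed on a 0 to 1 scale, whereas the figures quoted above are on a 0 to 100 scale.

```python
from difflib import SequenceMatcher


def align_words(ocr_tokens, text_tokens):
    """Return the set of (ocr_index, text_index) pairs for exactly matching tokens.

    Hypothetical helper: exact-match alignment only, with no handling of
    OCR errors, hyphenation, or reordered layout elements.
    """
    matcher = SequenceMatcher(a=ocr_tokens, b=text_tokens, autojunk=False)
    pairs = set()
    for block in matcher.get_matching_blocks():
        # Each block marks a run of `size` identical tokens in both sequences.
        for offset in range(block.size):
            pairs.add((block.a + offset, block.b + offset))
    return pairs


def f_score(predicted, gold):
    """Harmonic mean of precision and recall over alignment pairs (0 to 1)."""
    true_positives = len(predicted & gold)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


# Toy data (invented): the OCR output misreads "describe" as "descrlbe".
ocr = "We descrlbe a simple procedure".split()
text = "We describe a simple procedure".split()

predicted = align_words(ocr, text)
gold = {(i, i) for i in range(len(text))}  # every word aligns to itself
score = f_score(predicted, gold)
print(sorted(predicted), round(score, 4))
```

In this toy case the misread token is left unaligned, so precision stays perfect while recall drops. Pre- and post-processing steps of the kind mentioned above would aim to recover such near-matches.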
CITATION STYLE
Müller, M. C., Ghosh, S., Wittig, U., & Rey, M. (2021). Word-Level Alignment of Paper Documents with their Electronic Full-Text Counterparts. In Proceedings of the 20th Workshop on Biomedical Language Processing, BioNLP 2021 (pp. 168–179). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2021.bionlp-1.19