Facing uncertainty in digitisation

Lukas Gander; Ulrich Reffle; Christoph Ringlstetter; Sven Schlarb; Klaus Schulz; Raphael Unterweger

Journal Article

Studies in Fuzziness and Soft Computing, vol. 273 (2012), pp. 195–207

DOI: 10.1007/978-3-642-24672-2_10

Abstract

In actual practice, digitisation and text recognition (OCR) refer to a processing chain that starts with the scanning of original assets (newspaper, book, manuscript, etc.) and the creation of digital images of the asset's pages, which form the basis for producing digital text documents. The core process is Optical Character Recognition (OCR), which is preceded by image enhancement steps such as deskewing and denoising, and is followed by post-processing steps such as linguistic correction of OCR errors or enrichment of the OCR results, for example adding layout information and identifying semantic units of a page (e.g. the page number). In this paper, the focus lies on the post-processing steps. Two selected research areas of the European project IMPACT (IMProving ACcess to Text) are outlined. Firstly, we present a technology for OCR and information retrieval on historical document collections and discuss the potential use of fuzzy logic. Secondly, we present the Functional Extension Parser, a software tool that implements a fuzzy rule-based system for detecting and reconstructing some of the main features of a digitised book based on the OCR results of the digitised images. © 2012 Springer-Verlag Berlin Heidelberg.

Citation (APA)

Gander, L., Reffle, U., Ringlstetter, C., Schlarb, S., Schulz, K., & Unterweger, R. (2012). Facing uncertainty in digitisation. Studies in Fuzziness and Soft Computing, 273, 195–207. https://doi.org/10.1007/978-3-642-24672-2_10
