Toward a Dataset-Agnostic Word Segmentation Method

Abstract

Word segmentation in documents is a critical step toward word and character recognition, as well as word spotting. Despite recent advances in word segmentation and object detection, detecting instances of words in a cluttered handwritten document remains a non-trivial task that requires a large number of labeled documents for training. We present a flexible and general framework for word segmentation in handwritten documents, which incorporates techniques from the recent object detection literature as well as document analysis tools. Our method uses the information that is relevant for word segmentation and ignores other highly variable information contained in handwritten text, thus allowing for efficient transfer learning between datasets and alleviating the need for labeled training data. Our approach efficiently detects words in a variety of scanned document images, including both historical and modern-day handwritten documents, and achieves excellent results on existing benchmarks. In addition, we demonstrate the usefulness of our approach by achieving state-of-the-art results on segmentation-free word spotting tasks.

Cite

Axler, G., & Wolf, L. (2018). Toward a Dataset-Agnostic Word Segmentation Method. In Proceedings - International Conference on Image Processing, ICIP (pp. 2635–2639). IEEE Computer Society. https://doi.org/10.1109/ICIP.2018.8451124
