We present the curve transformer (CurT), a novel method of direct baseline detection that models document text line detection as set prediction of cubic Bézier curves. This simplifies the layout analysis pipeline by removing the need for the laboriously hand-crafted postprocessing algorithms that current state-of-the-art methods require. CurT combines several appealing features: direct prediction, which enables processing of material that is ill-suited to the prevailing methods that adapt semantic segmentation backbones; a conceptually simple Transformer-based encoder-decoder architecture that can be extended to tasks beyond baseline detection; and increased computational efficiency compared to older approaches. In addition, we demonstrate that CurT achieves metrics that are competitive with methods based on semantic segmentation. Training and inference code is available under the Apache 2.0 license at https://github.com/mittagessen/curt.
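To make the curve representation concrete, the following is a minimal sketch of how a baseline described by four cubic Bézier control points could be turned into a polyline. It is an illustration only and is not taken from the CurT code base: the function names `cubic_bezier_point` and `sample_baseline`, the sampling density, and the choice of coordinate units are all assumptions made for this example.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def cubic_bezier_point(ctrl: Sequence[Point], t: float) -> Point:
    """Evaluate a cubic Bézier curve at parameter t in [0, 1].

    `ctrl` holds the four control points P0..P3. The curve is
    B(t) = (1-t)^3 P0 + 3(1-t)^2 t P1 + 3(1-t) t^2 P2 + t^3 P3.
    """
    if len(ctrl) != 4:
        raise ValueError("a cubic Bézier curve needs exactly 4 control points")
    u = 1.0 - t
    # Bernstein basis weights of degree 3; they always sum to 1.
    weights = (u ** 3, 3 * u * u * t, 3 * u * t * t, t ** 3)
    x = sum(w * p[0] for w, p in zip(weights, ctrl))
    y = sum(w * p[1] for w, p in zip(weights, ctrl))
    return (x, y)


def sample_baseline(ctrl: Sequence[Point], n: int = 10) -> List[Point]:
    """Sample n evenly spaced parameter values to get a polyline (hypothetical helper)."""
    if n < 2:
        raise ValueError("need at least 2 samples to form a polyline")
    return [cubic_bezier_point(ctrl, i / (n - 1)) for i in range(n)]
```

A curve always starts at its first control point and ends at its last, so a predicted set of such four-point curves maps directly to line start and end positions, with the two inner control points bending the baseline in between.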
Citation
Kiessling, B. (2022). CurT: End-to-End Text Line Detection in Historical Documents with Transformers. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 13639 LNCS, pp. 34–48). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-3-031-21648-0_3