Word and sentence extraction using irregular pyramid

15Citations
Citations of this article
6Readers
Mendeley users who have this article in their library.

This article is free to access.

Abstract

This paper presents the result of our continued work on a further enhancement to our previous proposed algorithm. Moving beyond the extraction of word groups and based on the same irregular pyramid structure the new proposed algorithm groups the extracted words into sentences. The uniqueness of the algorithm is in its ability to process text of a wide variation in terms of size, font, orientation and layout on the same document image. No assumption is made on any specified document type. The algorithm is based on the irregular pyramid structure with the application of four fundamental concepts. The first is the inclusion of background information. The second is the concept of closeness where text information within a group is close to each other, in terms of spatial distance, as compared to other text areas. The third is the "majority win" strategy that is more suitable under the greatly varying environment than a constant threshold value. The final concept is the uniformity and continuity among words belonging to the same sentence.

Cite

CITATION STYLE

APA

Loo, P. K., & Tan, C. L. (2002). Word and sentence extraction using irregular pyramid. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 2423, pp. 307–318). Springer Verlag. https://doi.org/10.1007/3-540-45869-7_36

Register to see more suggestions

Mendeley helps you to discover research relevant for your work.

Already have an account?

Save time finding and organizing research with Mendeley

Sign up for free