Separating foreground text from a noisy or textured background is an important preprocessing step for many document image processing problems. In this work, we focus on removing decorated backgrounds and extracting textual components from French university diplomas. As far as we know, this is the first attempt to address this problem on French university diploma images. We therefore make our dataset public to support further research on French university diplomas. Although this problem is similar to classical document binarization, we have observed experimentally that both classical and recent state-of-the-art binarization techniques fail on our dataset because of its different complexity. In French diplomas, the text is superimposed on a decorated background, and there is only a small intensity difference between the character borders and the overlapping background. We therefore propose an approach based on Fuzzy C-Means clustering for separating textual and non-textual components. After the pixels are clustered, a local window-based thresholding approach and the Sauvola binarization technique are used to correctly classify text pixels. Experimental results show convincing accuracy and robustness of our method.
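The two stages named in the abstract, Fuzzy C-Means clustering followed by Sauvola thresholding, can be illustrated with a rough sketch. This is not the authors' implementation. The function names, the two-cluster setting, the window size, and the constants `k` and `r` are all assumptions chosen for illustration. The rule that combines the cluster label with the Sauvola threshold is a simplification of the local window-based step described above.

```python
import numpy as np


def fuzzy_c_means(values, n_clusters=2, m=2.0, n_iter=50):
    """Fuzzy C-Means on 1-D intensities (illustrative helper, not from the paper).

    Returns (centers, memberships); memberships has shape (n_clusters, n)
    and comes from the last membership update in the loop.
    """
    x = np.asarray(values, dtype=float).ravel()
    # Deterministic initialisation: spread centers over the intensity range.
    centers = np.linspace(x.min(), x.max(), n_clusters)
    for _ in range(n_iter):
        # Distance of every pixel to every center (epsilon avoids division by zero).
        dist = np.abs(x[None, :] - centers[:, None]) + 1e-9
        # Standard FCM membership update with fuzzifier m.
        inv = dist ** (-2.0 / (m - 1.0))
        u = inv / inv.sum(axis=0, keepdims=True)
        # Center update: mean of the pixels weighted by membership**m.
        w = u ** m
        centers = (w @ x) / w.sum(axis=1)
    return centers, u


def sauvola_threshold(image, window=15, k=0.2, r=128.0):
    """Per-pixel Sauvola threshold T = mean * (1 + k * (std / r - 1)).

    `window` must be odd so the output has the same shape as the input.
    """
    img = np.asarray(image, dtype=float)
    pad = window // 2
    padded = np.pad(img, pad, mode="reflect")
    windows = np.lib.stride_tricks.sliding_window_view(padded, (window, window))
    mean = windows.mean(axis=(-2, -1))
    std = windows.std(axis=(-2, -1))
    return mean * (1.0 + k * (std / r - 1.0))


def extract_text_mask(image, n_clusters=2, window=15):
    """Assumed combination rule: a pixel is text if it belongs to the darkest
    fuzzy cluster AND lies below the local Sauvola threshold."""
    img = np.asarray(image, dtype=float)
    centers, u = fuzzy_c_means(img, n_clusters=n_clusters)
    labels = u.argmax(axis=0).reshape(img.shape)
    candidates = labels == int(np.argmin(centers))
    return candidates & (img < sauvola_threshold(img, window=window))


# Synthetic example: a dark stroke on a smoothly textured light background.
yy, xx = np.mgrid[0:32, 0:32]
demo_image = 200.0 + 10.0 * np.sin(xx / 3.0) * np.cos(yy / 2.0)
demo_image[10:14, 5:28] = 40.0
demo_mask = extract_text_mask(demo_image)
```

On real diploma scans the intensity gap between text borders and the decorated background is small, so the number of clusters and the Sauvola parameters would need tuning; the synthetic image here is deliberately easy.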
Citation
Mondal, T., Coustaty, M., Gomez-Krämer, P., & Ogier, J. M. (2020). Background removal of French university diplomas. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 12116 LNCS, pp. 182–196). Springer. https://doi.org/10.1007/978-3-030-57058-3_14