A rule-based method for table detection in website images

Abstract

Table detection is an essential part of document analysis because tables are among the most efficient means of systematically summarizing information. Numerous studies have therefore addressed table detection, not only in documents but also on websites. Although the number of websites has grown explosively in recent years, most of these studies struggle to detect tables that are embedded as images rather than marked up with tags, owing to the variability in their size, content, color, and shape. In this paper, we propose an efficient yet robust method for detecting tables in image formats that can be applied to both documents and websites. Instead of employing recently developed deep learning methods, which require extensive training to cover this diversity, we apply a rule-based detection method that exploits a key feature shared by many tables, namely the grid layout of the text they contain. The proposed method consists of two stages: a feature extraction stage and a grid pattern recognition stage. In the first stage, we extract the features of the contents in the tables and then remove the features of non-text objects and of text not included in tables. In the second stage, we build tree structures from the features and apply a novel algorithm for determining the grid pattern. When we applied our method to a website dataset, the experimental results showed a precision, recall, and F1-measure of 84.5%, 72%, and 0.778, respectively, which are improvements of 3.6%, 24.16%, and 0.276 over a previous method, while also achieving the fastest processing time. In addition, the proposed rule-based method allows the structure of the contents in the table to be easily restored.
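As a quick consistency check on the reported figures, the F1-measure can be recomputed from the stated precision and recall. The sketch below assumes the standard definition of F1 as the harmonic mean of precision and recall; the function and variable names are illustrative and do not come from the paper.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall, both given as fractions in [0, 1]."""
    if precision + recall == 0:
        return 0.0  # avoid division by zero when both inputs are zero
    return 2 * precision * recall / (precision + recall)


# Values reported in the abstract for the website dataset.
reported_precision = 0.845
reported_recall = 0.72

f1 = f1_score(reported_precision, reported_recall)
print(f"F1 = {f1:.3f}")  # prints "F1 = 0.778"
```

The recomputed value agrees with the F1-measure of 0.778 stated in the abstract.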

Cite

Kim, J., & Hwang, H. (2020). A rule-based method for table detection in website images. IEEE Access, 8, 81022–81033. https://doi.org/10.1109/ACCESS.2020.2990901
