Automatic Table Recognition and Extraction from Heterogeneous Documents

  • Babatunde F
  • Ojokoh B
  • Oluwadare S

Abstract

This paper examines the automatic recognition and extraction of tables from a large collection of heterogeneous documents. The documents are first pre-processed and converted to HTML, after which an algorithm recognises the table portions of each document. A Hidden Markov Model (HMM) is then applied to the HTML code to extract the tables. The model was trained and tested on 526 self-generated tables: 321 for training and 205 for testing. The Viterbi algorithm was implemented for the testing phase. The system was evaluated in terms of accuracy, precision, recall and F-measure. The overall evaluation results show 88.8% accuracy, 96.8% precision, 91.7% recall and 88.8% F-measure, indicating that the method is effective for table extraction.
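
As an illustration of how Viterbi decoding can label a stream of HTML tokens, the following is a minimal sketch, not the authors' implementation. The two states (OTHER, TABLE), the token symbols (p, text, tr, td) and every probability below are hypothetical values chosen for the example; the abstract does not give the model's actual states, observation symbols or parameters.

```python
import math

# Hypothetical two-state model. The states, symbols and probabilities
# are illustrative assumptions, not values taken from the paper.
STATES = ["OTHER", "TABLE"]
START = {"OTHER": 0.8, "TABLE": 0.2}
TRANS = {
    "OTHER": {"OTHER": 0.8, "TABLE": 0.2},
    "TABLE": {"OTHER": 0.2, "TABLE": 0.8},
}
EMIT = {
    "OTHER": {"p": 0.5, "text": 0.4, "tr": 0.05, "td": 0.05},
    "TABLE": {"p": 0.05, "text": 0.15, "tr": 0.3, "td": 0.5},
}


def viterbi(observations):
    """Return the most probable state sequence for a list of known symbols."""
    if not observations:
        return []
    # best[t][s]: log-probability of the best path that ends in state s at step t
    best = [{s: math.log(START[s]) + math.log(EMIT[s][observations[0]])
             for s in STATES}]
    back = []  # back[t][s]: predecessor of s on that best path
    for symbol in observations[1:]:
        scores, pointers = {}, {}
        for s in STATES:
            prev = max(STATES,
                       key=lambda r: best[-1][r] + math.log(TRANS[r][s]))
            scores[s] = (best[-1][prev] + math.log(TRANS[prev][s])
                         + math.log(EMIT[s][symbol]))
            pointers[s] = prev
        best.append(scores)
        back.append(pointers)
    # Start from the best final state and follow the back-pointers.
    state = max(STATES, key=lambda s: best[-1][s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


tokens = ["p", "p", "td", "td", "td", "p", "p"]
labels = viterbi(tokens)
```

Working in log-probabilities avoids numeric underflow on long token sequences. In this sketch a symbol missing from the emission table raises a KeyError; a practical system would need smoothing or an explicit unknown-symbol entry.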

Cite

Babatunde, F. F., Ojokoh, B. A., & Oluwadare, S. A. (2015). Automatic Table Recognition and Extraction from Heterogeneous Documents. Journal of Computer and Communications, 03(12), 100–110. https://doi.org/10.4236/jcc.2015.312009