Combining visual and textual features for information extraction from online flyers

Abstract

Information in visually rich formats such as PDF and HTML is often conveyed by a combination of textual and visual features. In particular, genres such as marketing flyers and infographics often augment textual information with visual cues such as color, size, and positioning. As a result, traditional text-based approaches to information extraction (IE) can underperform. In this study, we present a supervised machine learning approach to IE from online commercial real estate flyers. We evaluated the performance of SVM classifiers on the task of identifying 12 types of named entities using a combination of textual and visual features. Results show that adding visual features such as color, size, and positioning significantly increased classifier performance.
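
To make the idea of combining the two feature families concrete, here is a minimal sketch of how one text span from a flyer might be turned into a single feature vector. It is an illustration only, not the authors' feature set: the `TextSpan` fields, the specific textual cues, and the normalization constants are all assumptions introduced here. A vector like this could then be passed to an SVM or any other supervised classifier.

```python
from dataclasses import dataclass


@dataclass
class TextSpan:
    """A piece of flyer text plus hypothetical rendering attributes."""
    text: str
    font_size: float   # in points
    rgb: tuple         # (r, g, b), each in 0-255
    x: float           # left offset as a fraction of page width
    y: float           # top offset as a fraction of page height


def textual_features(span):
    """Surface features computed from the text alone (illustrative choices)."""
    tokens = span.text.split()
    digits = sum(ch.isdigit() for ch in span.text)
    return [
        float(len(tokens)),                # number of whitespace-separated tokens
        digits / max(len(span.text), 1),   # fraction of characters that are digits
        float(span.text.isupper()),        # 1.0 if all cased characters are uppercase
        float("$" in span.text),           # 1.0 if a dollar sign is present
    ]


def visual_features(span, max_font_size=72.0):
    """Color, size, and position, each scaled to roughly [0, 1]."""
    r, g, b = span.rgb
    return [
        min(span.font_size / max_font_size, 1.0),  # assumed cap of 72 pt
        r / 255.0,
        g / 255.0,
        b / 255.0,
        span.x,
        span.y,
    ]


def combined_features(span):
    """Concatenate textual and visual features into one vector."""
    return textual_features(span) + visual_features(span)


# Hypothetical example span: large red text near the top-left of a flyer.
span = TextSpan("FOR LEASE $12.50", 36.0, (255, 0, 0), 0.1, 0.05)
vector = combined_features(span)
```

Simple concatenation keeps the two feature families independent, so a text-only baseline can be compared against the combined representation by dropping the visual part of the vector.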

Cite

Apostolova, E., & Tomuro, N. (2014). Combining visual and textual features for information extraction from online flyers. In EMNLP 2014 - 2014 Conference on Empirical Methods in Natural Language Processing, Proceedings of the Conference (pp. 1924–1929). Association for Computational Linguistics (ACL). https://doi.org/10.3115/v1/d14-1206
