Learning Distributional Token Representations from Visual Features

Abstract

In this study, we compare token representations constructed from visual features (i.e., pixels) with standard lookup-based embeddings. Our goal is to gain insight into the challenges of encoding a text representation from low-level features, e.g., from characters or pixels. We focus on Chinese, which, as a logographic language, has properties that make a representation via visual features both challenging and interesting. To train and evaluate different models for the token representation, we chose the task of character-based neural machine translation (NMT) from Chinese to English. We found that a token representation computed only from visual features achieves results competitive with lookup embeddings. However, we also show that the models exhibit different strengths and weaknesses on a part-of-speech tagging task and a semantic similarity task. In summary, we show that it is possible to learn a text representation from pixels alone. We hope that this is a useful stepping stone for future studies that exclusively rely on visual input, or that aim at exploiting visual features of written language.
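The paper contrasts two ways of producing a token representation for the NMT encoder: a standard lookup embedding indexed by character id, and a representation computed from the pixels of the rendered character. The sketch below illustrates this contrast in PyTorch; the CNN architecture, glyph image size, vocabulary size, and embedding dimension are illustrative assumptions for the example, not the authors' exact setup.

```python
# Illustrative sketch (not the paper's exact architecture): a lookup embedding
# versus a token representation computed from a rendered character bitmap.
import torch
import torch.nn as nn

EMB_DIM = 256        # assumed embedding dimension
VOCAB_SIZE = 8000    # hypothetical character vocabulary size
IMG_SIZE = 32        # hypothetical side length of a rendered glyph

# (a) Standard lookup embedding: one trainable vector per character id.
lookup = nn.Embedding(VOCAB_SIZE, EMB_DIM)

# (b) Visual embedding: a small CNN that maps a glyph bitmap to a vector.
class VisualEmbedding(nn.Module):
    def __init__(self, emb_dim: int = EMB_DIM):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),                                   # 32 -> 16
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),                                   # 16 -> 8
        )
        self.proj = nn.Linear(64 * 8 * 8, emb_dim)

    def forward(self, glyphs: torch.Tensor) -> torch.Tensor:
        # glyphs: (batch, 1, IMG_SIZE, IMG_SIZE) grayscale renderings
        h = self.conv(glyphs)
        return self.proj(h.flatten(start_dim=1))

visual = VisualEmbedding()

# Both paths yield a (batch, EMB_DIM) token representation that can feed
# the same downstream NMT encoder.
char_ids = torch.randint(0, VOCAB_SIZE, (4,))
glyph_imgs = torch.rand(4, 1, IMG_SIZE, IMG_SIZE)
print(lookup(char_ids).shape, visual(glyph_imgs).shape)
```

Because both variants emit vectors of the same dimensionality, the rest of the translation model can remain unchanged when swapping one for the other.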

Citation (APA)
Broscheit, S., Gemulla, R., & Keuper, M. (2018). Learning Distributional Token Representations from Visual Features. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (pp. 187–194). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/w18-3025
