Learning Multimodal Word Representations by Explicitly Embedding Syntactic and Phonetic Information

Abstract

Word embedding (i.e., word representation) transforms words into computable mathematical expressions, usually vectors, according to their semantics. Compared with human semantic representation, purely text-based models are severely deficient because they lack the perceptual information grounded in the physical world. This observation has motivated the development of multimodal word representation models. Multimodal models have been shown to outperform text-based models at learning semantic word representations, yet almost all previous multimodal models focus only on introducing perceptual information, even though syntactic information can also effectively improve performance on downstream tasks. This article therefore proposes a multimodal word representation model that uses two gate mechanisms to explicitly embed syntactic and phonetic information into the multimodal representation and is trained with supervised learning. Taking Chinese and English as examples, we evaluate the model on several downstream tasks; the results show that our approach outperforms existing models. We have released the source code of the model to encourage reproducible research.
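To make the gating idea in the abstract concrete, the sketch below shows one plausible way to fuse a textual word embedding with syntactic and phonetic feature vectors through two sigmoid gates in PyTorch. The module name, the dimensions, and the exact combination formula are illustrative assumptions, not the authors' implementation; their released source code should be consulted for the actual details.

# Minimal sketch (not the authors' code): gated fusion of a textual word
# embedding with syntactic and phonetic feature vectors. All dimensions,
# layer names, and the exact gating formula are illustrative assumptions.
import torch
import torch.nn as nn


class GatedMultimodalFusion(nn.Module):
    def __init__(self, text_dim=300, syn_dim=50, phon_dim=50, out_dim=300):
        super().__init__()
        # Project each modality into a common output space.
        self.text_proj = nn.Linear(text_dim, out_dim)
        self.syn_proj = nn.Linear(syn_dim, out_dim)
        self.phon_proj = nn.Linear(phon_dim, out_dim)
        # One sigmoid gate per auxiliary modality, conditioned on the
        # textual embedding and the projected modality vector.
        self.syn_gate = nn.Linear(out_dim * 2, out_dim)
        self.phon_gate = nn.Linear(out_dim * 2, out_dim)

    def forward(self, text_vec, syn_vec, phon_vec):
        t = self.text_proj(text_vec)
        s = torch.tanh(self.syn_proj(syn_vec))
        p = torch.tanh(self.phon_proj(phon_vec))
        # Gate values in (0, 1) decide how much syntactic / phonetic
        # information flows into the multimodal representation.
        g_s = torch.sigmoid(self.syn_gate(torch.cat([t, s], dim=-1)))
        g_p = torch.sigmoid(self.phon_gate(torch.cat([t, p], dim=-1)))
        return t + g_s * s + g_p * p


# Toy usage with random vectors standing in for real embeddings.
fusion = GatedMultimodalFusion()
word = torch.randn(4, 300)       # batch of textual word embeddings
syntax = torch.randn(4, 50)      # e.g. POS / dependency features
phonetics = torch.randn(4, 50)   # e.g. pinyin or phoneme features
multimodal = fusion(word, syntax, phonetics)
print(multimodal.shape)          # torch.Size([4, 300])

In this sketch the gates let the textual embedding modulate how much of each auxiliary signal is mixed in; supervised training of the full model, as described in the abstract, would tune the gate parameters end to end.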

Citation (APA)

Zhu, W., Liu, S., Liu, C., Yin, X., & Xv, X. (2020). Learning Multimodal Word Representations by Explicitly Embedding Syntactic and Phonetic Information. IEEE Access, 8, 223306–223315. https://doi.org/10.1109/ACCESS.2020.3042183
