Word embedding (i.e., word representation) maps words to computable mathematical representations (usually vectors) that capture their semantics. Compared with human semantic representation, purely text-based models are deficient because they lack the perceptual information grounded in the physical world. This observation has motivated the development of multimodal word representation models. Multimodal models have been shown to outperform text-based models at learning semantic word representations, yet almost all previous multimodal models focus only on introducing perceptual information. However, syntactic information can also effectively improve the performance of multimodal models on downstream tasks. Therefore, this article proposes an effective multimodal word representation model that uses two gate mechanisms to explicitly embed syntactic and phonetic information into multimodal representations, and trains the model with supervised learning. We take Chinese and English as examples and evaluate the model on several downstream tasks. The results show that our approach outperforms existing models. We have made the source code of the model available to encourage reproducible research.
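To make the idea of a gate mechanism concrete, the following is a minimal sketch of how a gate might fuse a textual word vector with an auxiliary (syntactic or phonetic) feature vector. This is an illustrative assumption, not the authors' implementation: the class name GatedFusion, the dimensions, and the PyTorch framing are all hypothetical.

```python
# Hypothetical sketch of gated fusion: a sigmoid gate decides, per dimension,
# how much of the auxiliary (syntactic or phonetic) signal to mix into the
# textual word vector. Names and shapes are illustrative assumptions only.
import torch
import torch.nn as nn


class GatedFusion(nn.Module):
    def __init__(self, text_dim: int, aux_dim: int):
        super().__init__()
        self.proj = nn.Linear(aux_dim, text_dim)              # map auxiliary features into text space
        self.gate = nn.Linear(text_dim + aux_dim, text_dim)   # per-dimension mixing weights

    def forward(self, text_vec: torch.Tensor, aux_vec: torch.Tensor) -> torch.Tensor:
        g = torch.sigmoid(self.gate(torch.cat([text_vec, aux_vec], dim=-1)))
        return g * text_vec + (1.0 - g) * self.proj(aux_vec)


if __name__ == "__main__":
    # Random vectors stand in for a word's textual and phonetic features.
    fusion = GatedFusion(text_dim=300, aux_dim=50)
    word_text = torch.randn(4, 300)   # batch of textual word embeddings
    word_phon = torch.randn(4, 50)    # batch of phonetic feature vectors
    fused = fusion(word_text, word_phon)
    print(fused.shape)                # torch.Size([4, 300])
```

In this sketch the sigmoid gate produces per-dimension mixing weights, so the fused representation can lean on the textual signal for some dimensions and on the auxiliary signal for others; a second gate of the same form could handle the other modality.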
CITATION STYLE
Zhu, W., Liu, S., Liu, C., Yin, X., & Xv, X. (2020). Learning Multimodal Word Representations by Explicitly Embedding Syntactic and Phonetic Information. IEEE Access, 8, 223306–223315. https://doi.org/10.1109/ACCESS.2020.3042183