Abstract
In this paper, we propose a novel method for multimodal word embedding that exploits a generalized framework of multi-view spectral graph embedding to take into account the visual appearances or scenes denoted by words in a corpus. We evaluated our method on word similarity tasks and a concept-to-image search task, and found that it provides word representations that reflect visual information, while somewhat trading off performance on the word similarity tasks. Moreover, we demonstrate that our method captures multimodal linguistic regularities, which enable relational similarities between words and images to be recovered by vector arithmetic.
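To make the two ideas in the abstract concrete, here is a minimal sketch of spectral graph embedding over a toy graph that mixes word and image nodes, followed by a multimodal analogy query by vector arithmetic. It illustrates the general technique family (Laplacian eigenmaps on a word/image graph), not the authors' exact multi-view formulation; the graph, node names, and edge weights are all hypothetical.

```python
import numpy as np
from scipy.linalg import eigh

# Toy graph over word nodes and image nodes (all names/weights hypothetical).
nodes = ["red", "blue", "car", "img:red_car", "img:blue_car"]
idx = {n: i for i, n in enumerate(nodes)}

# Symmetric edge weights: word-word co-occurrence and word-image tags.
edges = [("red", "car", 2.0), ("blue", "car", 2.0),
         ("red", "img:red_car", 1.0), ("car", "img:red_car", 1.0),
         ("blue", "img:blue_car", 1.0), ("car", "img:blue_car", 1.0)]
W = np.zeros((len(nodes), len(nodes)))
for u, v, w in edges:
    W[idx[u], idx[v]] = W[idx[v], idx[u]] = w

D = np.diag(W.sum(axis=1))
L = D - W  # unnormalized graph Laplacian

# Solve the generalized eigenproblem L v = lambda D v; discard the trivial
# constant eigenvector and keep the next k as a shared embedding space
# for both words and images.
k = 2
vals, vecs = eigh(L, D)
emb = {n: vecs[i, 1:1 + k] for i, n in enumerate(nodes)}

# Multimodal analogy by vector arithmetic: in this symmetric toy graph,
# img:red_car - red + blue should land near img:blue_car.
query = emb["img:red_car"] - emb["red"] + emb["blue"]

def cos(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12)

best = max((n for n in nodes if n.startswith("img:")),
           key=lambda n: cos(emb[n], query))
print(best)
```

This only demonstrates the mechanics; the paper's contribution lies in how the multiple views (textual and visual) are combined into the graph before the spectral step.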
Citation
Fukui, K., Oshikiri, T., & Shimodaira, H. (2017). Spectral graph-based method of multimodal word embedding. In Proceedings of TextGraphs-11: The 11th Workshop on Graph-Based Methods for Natural Language Processing (pp. 39–44). Association for Computational Linguistics. https://doi.org/10.18653/v1/w17-2405