Multimodal Aspect Extraction with Region-Aware Alignment Network


Abstract

Fueled by the rise of social media, documents on these platforms (e.g., Twitter, Weibo) are increasingly multimodal in nature, pairing images with text. To automatically analyze the opinion information in such multimodal data, it is crucial to perform aspect term extraction (ATE) on them. However, research on multimodal ATE remains scarce. In this study, we take a step further than previous work by proposing a Region-aware Alignment Network (RAN) that aligns text with the object regions appearing in an image for the multimodal ATE task. Experiments on the Twitter dataset demonstrate the effectiveness of the proposed model, and further analysis shows that it performs particularly well when extracting aspect terms with polarized sentiment.
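
To illustrate the general idea described in the abstract, the sketch below shows one plausible way to align text tokens with detected image regions and tag aspect terms. It is a minimal illustration, not the authors' RAN implementation: the dimensions, module names, cross-attention fusion, and BIO tagging head are all assumptions made for the example.

# A minimal sketch (not the authors' implementation) of region-aware
# text-image alignment for aspect term extraction. Assumptions: text tokens
# are already embedded, image regions come from an off-the-shelf detector,
# and aspect terms are tagged with a BIO scheme.
import torch
import torch.nn as nn


class RegionAwareAlignment(nn.Module):
    def __init__(self, text_dim=768, region_dim=2048, hidden_dim=256, num_tags=3):
        super().__init__()
        # Project text tokens and visual regions into a shared space.
        self.text_proj = nn.Linear(text_dim, hidden_dim)
        self.region_proj = nn.Linear(region_dim, hidden_dim)
        # Each token attends over the detected object regions.
        self.cross_attn = nn.MultiheadAttention(hidden_dim, num_heads=4, batch_first=True)
        # Fuse each token with its region-aligned view, then predict a BIO tag per token.
        self.classifier = nn.Linear(hidden_dim * 2, num_tags)

    def forward(self, token_emb, region_feats):
        # token_emb:    (batch, seq_len, text_dim)
        # region_feats: (batch, num_regions, region_dim)
        text = self.text_proj(token_emb)
        regions = self.region_proj(region_feats)
        aligned, _ = self.cross_attn(query=text, key=regions, value=regions)
        fused = torch.cat([text, aligned], dim=-1)
        return self.classifier(fused)  # (batch, seq_len, num_tags)


if __name__ == "__main__":
    model = RegionAwareAlignment()
    tokens = torch.randn(2, 16, 768)    # e.g., BERT token embeddings
    regions = torch.randn(2, 36, 2048)  # e.g., Faster R-CNN region features
    print(model(tokens, regions).shape)  # torch.Size([2, 16, 3])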

CITATION STYLE

APA

Wu, H., Cheng, S., Wang, J., Li, S., & Chi, L. (2020). Multimodal Aspect Extraction with Region-Aware Alignment Network. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 12430 LNAI, pp. 145–156). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-3-030-60450-9_12
