Research on feature extraction and multimodal fusion of video caption based on deep learning

Abstract

Video captioning describes the objects, their attributes, and their relationships in a video in natural language, and has long been a challenging research topic in computer vision and multimedia. In this paper, deep learning methods are used to extract video frame features, motion information, and video sequence features. Two multimodal fusion methods are studied, feature cascading and weighted model averaging, and the evaluation of the generated captions is also examined. The experimental results show that the weighted-average fusion model scores higher on every evaluation metric than the feature-cascade method. The feature extraction and multimodal fusion methods in this paper are of practical value for video captioning applications.
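The two fusion strategies named in the abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: the feature dimensions, vocabulary size, and fusion weights are assumptions chosen for the example. Feature cascading concatenates the per-modality features into one vector before decoding, while weighted model averaging combines the output distributions of modality-specific models.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical per-modality features for one video; the names and
# dimensions are illustrative assumptions, not taken from the paper.
frame_feat = rng.random(2048)   # appearance feature of sampled frames
motion_feat = rng.random(1024)  # motion feature (e.g. from a 3-D CNN)
seq_feat = rng.random(512)      # temporal feature from a sequence encoder

# Feature cascade (early fusion): concatenate the modality features into
# one vector that a single caption decoder would consume.
cascaded = np.concatenate([frame_feat, motion_feat, seq_feat])
print(cascaded.shape)  # (3584,)

# Model weighted-average fusion (late fusion): each modality-specific
# model emits a probability distribution over the vocabulary for the next
# word; the distributions are combined with scalar weights summing to 1.
vocab_size = 1000

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

p_frame = softmax(rng.standard_normal(vocab_size))
p_motion = softmax(rng.standard_normal(vocab_size))
p_seq = softmax(rng.standard_normal(vocab_size))

weights = np.array([0.5, 0.3, 0.2])  # assumed weights for illustration
fused = weights[0] * p_frame + weights[1] * p_motion + weights[2] * p_seq
# fused is still a valid probability distribution (sums to 1)
```

The abstract reports that the late-fusion (weighted average) variant outperformed cascading on every metric; the weights here are placeholders rather than the tuned values.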

Citation (APA)

Chen, H., Li, H., & Wu, X. (2020). Research on feature extraction and multimodal fusion of video caption based on deep learning. In ACM International Conference Proceeding Series (pp. 73–76). Association for Computing Machinery. https://doi.org/10.1145/3380625.3380669
