Mixing Deep Visual and Textual Features for Image Regression

Yuying Wu; Youshan Zhang

Conference Proceedings

Advances in Intelligent Systems and Computing (2021) 1250 AISC 747-760

DOI: 10.1007/978-3-030-55180-3_57

Abstract

Deep learning has been widely applied to regression problems. However, little work has addressed both visual and textual features in a single unified framework. In this paper, we are the first to consider the deep feature, the shallow convolutional neural network (CNN) feature, and the textual feature in one unified deep neural network. Specifically, we propose a mixing deep visual and textual features model (MVTs) that combines all three features in one architecture, which enables the model to predict house prices. To train our model, we also collected a large-scale dataset from Los Angeles, California, USA, which contains both visual images and textual attributes of 1000 houses. Extensive experiments show that our model achieves higher performance than the state of the art.
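The abstract does not say how the three feature types are combined, so the following is only a minimal sketch of one common approach, late fusion by concatenation followed by a small regression head. It is not the authors' MVTs architecture. All function names, feature dimensions, layer sizes, and the example attribute values are assumptions made for illustration.

```python
import random

def linear(x, weights, bias):
    """Dense layer: one output per row of `weights`."""
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, bias)]

def relu(x):
    return [max(0.0, v) for v in x]

def init_layer(n_in, n_out, rng):
    """Small random weights and zero biases (illustrative initialisation)."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out

def fuse_and_predict(deep_feat, shallow_feat, text_feat, params):
    """Concatenate the three feature vectors and regress a single value.

    Concatenation is an assumed fusion strategy, not one confirmed by
    the source.
    """
    fused = deep_feat + shallow_feat + text_feat  # list concatenation
    (w1, b1), (w2, b2) = params
    hidden = relu(linear(fused, w1, b1))
    return linear(hidden, w2, b2)[0]

rng = random.Random(0)

# Hypothetical stand-ins for the three feature types named in the abstract.
deep_feat = [rng.random() for _ in range(8)]     # e.g. from a pretrained deep network
shallow_feat = [rng.random() for _ in range(4)]  # e.g. from a small CNN
text_feat = [3.0, 2.0, 0.15]                     # made-up numeric house attributes

# Untrained parameters: 15 fused inputs -> 6 hidden units -> 1 output.
params = (init_layer(15, 6, rng), init_layer(6, 1, rng))
price = fuse_and_predict(deep_feat, shallow_feat, text_feat, params)
```

The weights here are random and untrained, so `price` is not a meaningful prediction. The sketch only shows how image-derived and attribute-derived vectors can feed one regression output.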

Cite

Wu, Y., & Zhang, Y. (2021). Mixing Deep Visual and Textual Features for Image Regression. In Advances in Intelligent Systems and Computing (Vol. 1250 AISC, pp. 747–760). Springer. https://doi.org/10.1007/978-3-030-55180-3_57
