Mixing Deep Visual and Textual Features for Image Regression

Abstract

Deep learning has been widely applied to regression problems; however, little work has addressed both visual and textual features within a single unified framework. In this paper, we are the first to consider deep features, shallow convolutional neural network (CNN) features, and textual features in one unified deep neural network. Specifically, we propose a model for mixing deep visual and textual features (MVTs) that combines all three feature types in a single architecture, enabling the model to predict house prices. To train our model, we also collected a large-scale dataset from Los Angeles, California, USA, containing both visual images and textual attributes for 1,000 houses. Extensive experiments show that our model outperforms the state of the art.
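The abstract describes fusing three feature types (deep CNN features, shallow CNN features, and textual attributes) before a regression head. The sketch below illustrates the general feature-concatenation idea only; the dimensions, attribute list, and linear head are illustrative assumptions, not the paper's actual MVTs architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical feature dimensions -- the paper does not specify these.
DEEP_DIM, SHALLOW_DIM, TEXT_DIM = 512, 64, 8

def mix_features(deep_feat, shallow_feat, text_feat):
    """Concatenate the three feature vectors into one joint representation."""
    return np.concatenate([deep_feat, shallow_feat, text_feat], axis=-1)

def predict_price(mixed, weights, bias):
    """Toy linear regression head on top of the mixed features."""
    return float(mixed @ weights + bias)

# Toy example for a single house.
deep = rng.standard_normal(DEEP_DIM)        # e.g. from a deep CNN backbone
shallow = rng.standard_normal(SHALLOW_DIM)  # e.g. from an early conv layer
# Hypothetical textual attributes: bedrooms, baths, sqft, year built, ...
text = np.array([3, 2, 1500, 1995, 2, 1, 0, 1], dtype=float)

mixed = mix_features(deep, shallow, text)
weights = rng.standard_normal(mixed.shape[0]) * 0.01
price = predict_price(mixed, weights, bias=300_000.0)
```

In practice the concatenated vector would feed fully connected layers trained end to end, rather than a fixed linear head as shown here.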

Citation (APA)

Wu, Y., & Zhang, Y. (2021). Mixing Deep Visual and Textual Features for Image Regression. In Advances in Intelligent Systems and Computing (Vol. 1250 AISC, pp. 747–760). Springer. https://doi.org/10.1007/978-3-030-55180-3_57
