Deep learning has been widely applied to regression problems. However, little work has addressed both visual and textual features within a single unified framework. In this paper, we are the first to consider deep features, shallow convolutional neural network (CNN) features, and textual features in one unified deep neural network. Specifically, we propose a model that mixes deep visual and textual features (MVTs), combining all three feature types in one architecture to predict house prices. To train our model, we also collected a large-scale dataset from Los Angeles, California, USA, which contains both images and textual attributes of 1000 houses. Extensive experiments show that our model outperforms state-of-the-art methods.
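The abstract does not say how the three feature types are fused. As a rough illustration only, the sketch below assumes the simplest option: concatenating the per-house feature vectors and fitting a linear regression head on top. The function names, feature dimensions, and synthetic data are all hypothetical and are not taken from the paper; in MVTs the features would come from trained networks and the house attributes rather than from a random generator.

```python
import numpy as np

def fuse_features(deep_feat, shallow_feat, text_feat):
    """Concatenate the three per-house feature blocks along the last axis."""
    return np.concatenate([deep_feat, shallow_feat, text_feat], axis=-1)

def add_bias(x):
    """Append a constant column so the linear head can learn an offset."""
    return np.hstack([x, np.ones((x.shape[0], 1))])

def fit_linear_head(fused, prices, ridge=1e-3):
    """Closed-form ridge regression; the penalty also covers the bias term."""
    x = add_bias(fused)
    gram = x.T @ x + ridge * np.eye(x.shape[1])
    return np.linalg.solve(gram, x.T @ prices)

def predict(fused, weights):
    """Apply the fitted linear head to fused features."""
    return add_bias(fused) @ weights

# Synthetic stand-ins for the three feature types (dimensions are assumptions).
rng = np.random.default_rng(0)
n_houses = 1000
deep = rng.normal(size=(n_houses, 16))     # e.g. deep CNN image features
shallow = rng.normal(size=(n_houses, 8))   # e.g. shallow CNN image features
text = rng.normal(size=(n_houses, 4))      # e.g. numeric textual attributes

fused = fuse_features(deep, shallow, text)

# Synthetic prices generated from a hidden linear rule plus small noise.
true_w = rng.normal(size=fused.shape[1])
prices = fused @ true_w + 0.01 * rng.normal(size=n_houses)

weights = fit_linear_head(fused, prices)
preds = predict(fused, weights)
mse = float(np.mean((preds - prices) ** 2))
```

Concatenation is chosen here only because it is the most common baseline for multimodal fusion; a learned fusion layer or a nonlinear regression head would replace `fit_linear_head` in a deep model.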
Wu, Y., & Zhang, Y. (2021). Mixing Deep Visual and Textual Features for Image Regression. In Advances in Intelligent Systems and Computing (Vol. 1250 AISC, pp. 747–760). Springer. https://doi.org/10.1007/978-3-030-55180-3_57