Abstract
In this paper we focus on cross-modal (visual and textual) e-commerce search within the fashion domain. In particular, we investigate two tasks: 1) given a query image, we retrieve textual descriptions that correspond to the visual attributes in the query; and 2) given a textual query that may express an interest in specific visual product characteristics, we retrieve relevant images that exhibit the required visual attributes. To this end, we introduce a new dataset that consists of 53,689 images coupled with textual descriptions in natural language. The images contain fashion garments that display a great variety of visual attributes, such as different shapes, colors, and textures. Unlike previous datasets, the text provides a rough and noisy description of the item in the image. We extensively analyze this dataset in the context of cross-modal e-commerce search. We investigate two state-of-the-art latent variable models to bridge between textual and visual data: bilingual latent Dirichlet allocation and canonical correlation analysis. We use state-of-the-art visual and textual features and report promising results.
Zoghbi, S., Heyman, G., Gomez, J. C., & Moens, M.-F. (2016). Fashion Meets Computer Vision and NLP at e-Commerce Search. International Journal of Computer and Electrical Engineering, 8(1), 31–43. https://doi.org/10.17706/ijcee.2016.8.1.31-43