Existing image search engines, whose ranking functions are trained on labeled images or surrounding text, perform poorly on queries containing new or low-frequency keywords. In this paper, we propose zero-shot transfer learning (ZSTL), which transfers networks from existing classifiers to new zero-shot classifiers at little cost and thereby improves image search for new or low-frequency words. ZSTL can also enhance content-based queries, i.e., ranking images not only by their visual appearance but also by their content. ZSTL was motivated by a resemblance we observed between photographic composition and the description of objects in natural language: both highlight an object by stressing its particularity, which suggests a correspondence between the visual and textual spaces. We present several ways to transfer visual features into textual ones; applying deep learning and Word2Vec models trained on Wikipedia yielded impressive results. Our experiments provide evidence for the resemblance between composition and description and demonstrate the feasibility and effectiveness of transferring zero-shot classifiers. With these transferred zero-shot classifiers, image-ranking queries with low-frequency or new words can be handled. The proposed image search engine adopts cosine-distance ranking as its ranking algorithm. Experiments on image search show the superior performance of ZSTL.
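The abstract states that the proposed search engine ranks images by cosine distance between embeddings. As a minimal sketch of that ranking step (the array names and the 3-D toy embeddings below are illustrative assumptions, not values from the paper), images whose visual features have been mapped into the textual embedding space can be ordered against a query embedding like this:

```python
import numpy as np

def cosine_rank(query_vec, image_vecs):
    """Rank images by cosine similarity to a query embedding.

    query_vec: (d,) array, e.g. a Word2Vec embedding of the query word.
    image_vecs: (n, d) array of visual features mapped into textual space.
    Returns image indices sorted from most to least similar.
    """
    q = query_vec / np.linalg.norm(query_vec)
    v = image_vecs / np.linalg.norm(image_vecs, axis=1, keepdims=True)
    sims = v @ q  # cosine similarity of each image to the query
    return np.argsort(-sims)

# Hypothetical 3-D embeddings for illustration only.
query = np.array([1.0, 0.0, 0.0])
images = np.array([[0.9, 0.1, 0.0],
                   [0.0, 1.0, 0.0],
                   [0.7, 0.7, 0.0]])
order = cosine_rank(query, images)  # most similar image first
```

Maximizing cosine similarity is equivalent to minimizing cosine distance (1 minus similarity), so sorting by negated similarity yields the ranking the abstract describes.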
Yang, G., & Xu, J. (2019). Zero-shot transfer learning based on visual and textual resemblance. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 11955 LNCS, pp. 353–362). Springer. https://doi.org/10.1007/978-3-030-36718-3_30