Collective deep quantization for efficient cross-modal retrieval

Abstract

Cross-modal similarity retrieval addresses the design of retrieval systems that support querying across content modalities, e.g., using an image to retrieve texts. This paper presents a compact coding solution for efficient cross-modal retrieval, focusing on the quantization approach, which has already shown superior performance over hashing solutions in single-modal similarity retrieval. We propose a collective deep quantization (CDQ) approach, the first attempt to introduce quantization into an end-to-end deep architecture for cross-modal retrieval. The major contribution lies in jointly learning deep representations and quantizers for both modalities using carefully crafted hybrid networks and well-specified loss functions. In addition, our approach simultaneously learns a common quantizer codebook for both modalities, through which cross-modal correlation can be substantially enhanced. CDQ enables efficient and effective cross-modal retrieval using inner-product distance computed over the common codebook with fast distance-table lookup. Extensive experiments show that CDQ yields state-of-the-art cross-modal retrieval results on standard benchmarks.
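
For intuition, the sketch below (in Python/NumPy, not the authors' code) illustrates how inner-product retrieval with a shared additive codebook and a fast distance-table lookup can work. The codebook sizes, embedding dimension, database size, and all variable names are illustrative assumptions, not values from the paper.

    import numpy as np

    M, K, D = 4, 256, 128       # assumed: M codebooks, K codewords each, D-dim embeddings
    rng = np.random.default_rng(0)

    # Common codebook shared by both modalities: one set of codewords per codebook.
    codebooks = rng.normal(size=(M, K, D)).astype(np.float32)

    # Database items (e.g., text embeddings) stored only as M codeword indices each.
    n_db = 10_000
    codes = rng.integers(0, K, size=(n_db, M))

    def lookup_scores(query, codebooks, codes):
        """Inner-product scores between a query embedding and all quantized items.

        Each database item is approximated by the additive reconstruction
        sum_m codebooks[m, codes[i, m]]; the query (e.g., an image embedding)
        is scored against all items via one M x K table of query-codeword
        inner products, without reconstructing any item explicitly.
        """
        table = np.einsum('d,mkd->mk', query, codebooks)   # table[m, k] = <query, c_mk>
        return table[np.arange(M), codes].sum(axis=1)      # one score per database item

    query = rng.normal(size=D).astype(np.float32)
    scores = lookup_scores(query, codebooks, codes)
    top10 = np.argsort(-scores)[:10]   # indices of the 10 highest-scoring items

The key efficiency point is that each query needs only M*K inner products to build the table; scoring every database item then costs M table lookups and additions per item, independent of the embedding dimension.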

Citation (APA)

Cao, Y., Long, M., Wang, J., & Liu, S. (2017). Collective deep quantization for efficient cross-modal retrieval. In 31st AAAI Conference on Artificial Intelligence, AAAI 2017 (pp. 3974–3980). AAAI Press. https://doi.org/10.1609/aaai.v31i1.11218
