From generic to specific deep representations for visual recognition

  • Azizpour H
  • Razavian A
  • Sullivan J
 et al. 
  • 142 Mendeley users have this article in their library.
  • 78 citations of this article.


Evidence is mounting that ConvNets are the best representation learning method for recognition. In the common scenario, a ConvNet is trained on a large labeled dataset and the feed-forward unit activations at a certain layer of the network are used as a generic representation of an input image. Recent studies have shown this form of representation to be astoundingly effective for a wide range of recognition tasks. This paper thoroughly investigates the transferability of such representations w.r.t. several factors. These include parameters for training the network, such as its architecture, and parameters of feature extraction. We further show that different visual recognition tasks can be categorically ordered based on their distance from the source task. We then show interesting results indicating a clear correlation between the performance of tasks and their distance from the source task, conditioned on the proposed factors. Furthermore, by optimizing these factors, we achieve state-of-the-art performance on 16 visual recognition tasks.
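
The "common scenario" described in the abstract can be illustrated with a small sketch. It is not the authors' code: the network here is a toy fully connected stack with fixed random weights standing in for a ConvNet trained on a source task, and the nearest-centroid classifier is chosen only for brevity. All function names (`make_frozen_network`, `extract_features`, `fit_centroids`, `predict`) are hypothetical. The choice of `layer_index` is one example of a feature-extraction parameter of the kind the abstract mentions.

```python
import math
import random


def make_frozen_network(layer_sizes, seed=0):
    """Build fixed random weights (assumption: a stand-in for a network
    already trained on a large labeled source dataset)."""
    rng = random.Random(seed)
    layers = []
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        scale = 1.0 / math.sqrt(n_in)
        weights = [[rng.gauss(0.0, scale) for _ in range(n_in)]
                   for _ in range(n_out)]
        layers.append(weights)
    return layers


def extract_features(layers, x, layer_index):
    """Run a feed-forward pass up to and including layer_index and return
    that layer's ReLU activations, L2-normalised."""
    activation = list(x)
    for weights in layers[: layer_index + 1]:
        activation = [max(0.0, sum(w * a for w, a in zip(row, activation)))
                      for row in weights]
    norm = math.sqrt(sum(a * a for a in activation)) or 1.0
    return [a / norm for a in activation]


def fit_centroids(features, labels):
    """Fit a nearest-centroid classifier on the extracted features
    (assumption: any simple target-task classifier could be used here)."""
    sums, counts = {}, {}
    for feature, label in zip(features, labels):
        acc = sums.setdefault(label, [0.0] * len(feature))
        for i, value in enumerate(feature):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc]
            for label, acc in sums.items()}


def predict(centroids, feature):
    """Return the label whose centroid is closest to the feature vector."""
    def sq_dist(centroid):
        return sum((a - b) ** 2 for a, b in zip(feature, centroid))
    return min(centroids, key=lambda label: sq_dist(centroids[label]))


if __name__ == "__main__":
    network = make_frozen_network([8, 16, 12, 4])
    image = [0.5] * 8  # placeholder input, not real image data
    feature = extract_features(network, image, layer_index=1)
    print(len(feature))  # dimensionality of the chosen layer
```

The network weights are never updated; only the small classifier on top is fitted to the target task. Changing `layer_index` changes which layer serves as the representation.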

Author-supplied keywords

  • Computer vision
  • Image recognition
  • Positron emission tomography
  • Standards
  • Sun
  • Training
  • Visualization
