Joint multi-label multi-instance learning for image classification
- ISSN: 10636919
- ISBN: 9781424422425
- DOI: 10.1109/CVPR.2008.4587384
Abstract
In the real world, an image is usually associated with multiple labels, which are characterized by different regions in the image. Thus image classification is naturally posed as both a multi-label learning problem and a multi-instance learning problem. Unlike existing research, which has considered these two problems separately, we propose an integrated multi-label multi-instance learning (MLMIL) approach based on hidden conditional random fields (HCRFs), which simultaneously captures both the connections between semantic labels and regions and the correlations among the labels in a single formulation. We apply this MLMIL framework to image classification and report superior performance compared to key existing approaches on the MSR Cambridge (MSRC) and Corel data sets.
∗Zheng-Jun Zha† Xian-Sheng Hua‡ Tao Mei‡ Jingdong Wang‡ Guo-Jun Qi† Zengfu Wang†
†Department of Automation, University of Science and Technology of China
‡Internet Media Group, Microsoft Research Asia
1. Introduction
With the proliferation of digital photography, image understanding has become increasingly important. Semantic image understanding is typically formulated as a multi-class or multi-label learning problem. In the multi-class setting [18], each image is categorized into one and only one of a set of predefined categories; in other words, exactly one label is assigned to each image. In the multi-label setting [1] [13] [16] [9], which is more challenging but much closer to real-world applications, each image is assigned one or more labels from a predefined label set, such as “sky,” “mountain,” and “water,” as illustrated in Figure 1. This paper addresses multi-label learning (MLL) for image classification.
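To make the difference between the two output spaces concrete, here is a minimal sketch under simple assumptions: a classifier emits one real-valued score per label, multi-class prediction keeps only the single highest-scoring label, and multi-label prediction keeps every label whose score exceeds a threshold. The label set, the scores, and the 0.5 threshold are hypothetical illustrations, not values from this paper.

```python
from typing import List, Set

# Hypothetical label set (illustrative only).
LABELS = ["sky", "mountain", "water", "building", "car"]


def predict_multiclass(scores: List[float]) -> str:
    """Multi-class setting: exactly one label, the highest-scoring one."""
    best = max(range(len(scores)), key=lambda k: scores[k])
    return LABELS[best]


def predict_multilabel(scores: List[float], threshold: float = 0.5) -> Set[str]:
    """Multi-label setting: every label whose score exceeds the threshold.

    The fixed threshold is an assumed decision rule; the result may
    contain zero, one, or several labels.
    """
    return {label for label, s in zip(LABELS, scores) if s > threshold}


if __name__ == "__main__":
    scores = [0.9, 0.7, 0.8, 0.1, 0.05]        # hypothetical scores for one image
    print(predict_multiclass(scores))          # sky
    print(sorted(predict_multilabel(scores)))  # ['mountain', 'sky', 'water']
```

The same score vector yields one label under the multi-class rule and three under the multi-label rule, mirroring the “sky,” “mountain,” and “water” example.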
Multi-label classification can be solved by transforming it into a set of independent two-class (binary) classification problems [1], while more sophisticated solutions also leverage the correlations among labels, either after modeling each individual label [9] or by modeling the labels and their correlations simultaneously [13] [16]. However, all these approaches regard an image as a single indivisible entity and neglect the fact that each individual label of an image is usually more closely related to one or more regions than to the entire image. In other words, the multiple semantic meanings (labels) of an image arise from different components (regions) within it. As illustrated in Figure 1, the three labels “sky,” “mountain,” and “water” are characterized by three different regions, respectively, rather than by the entire image.
∗This work was performed when Zheng-Jun Zha and Guo-Jun Qi were visiting Microsoft Research Asia as research interns.
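The decomposition into independent binary problems mentioned above (often called binary relevance) can be sketched as follows. This is a minimal illustration under stated assumptions: each image is represented by one hypothetical global feature vector, treating the whole image as one entity, and each label gets its own perceptron, chosen only because it fits in a few lines; it is not the classifier used in [1] or in this paper. Because no per-label model sees the others, label correlations are ignored.

```python
from typing import Dict, List, Sequence, Set

Vector = Sequence[float]  # hypothetical global (whole-image) feature vector


def score(w: List[float], x: Vector) -> float:
    # zip stops at len(x), so the trailing bias w[-1] is added separately.
    return sum(wi * xi for wi, xi in zip(w, x)) + w[-1]


def train_perceptron(xs: List[Vector], ys: List[int], epochs: int = 20) -> List[float]:
    """Train one binary perceptron on +1/-1 targets; the bias is stored last."""
    dim = len(xs[0])
    w = [0.0] * (dim + 1)
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            if y * score(w, x) <= 0:  # misclassified or on the boundary
                for i in range(dim):
                    w[i] += y * x[i]
                w[-1] += y
    return w


def train_binary_relevance(xs: List[Vector], label_sets: List[Set[str]],
                           labels: Sequence[str]) -> Dict[str, List[float]]:
    """One independent binary problem per label: positive iff the label is present."""
    return {label: train_perceptron(xs, [1 if label in s else -1 for s in label_sets])
            for label in labels}


def predict(models: Dict[str, List[float]], x: Vector) -> Set[str]:
    """Each model decides its own label; no model sees the others' outputs."""
    return {label for label, w in models.items() if score(w, x) > 0}


if __name__ == "__main__":
    xs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
    label_sets = [{"sky"}, {"water"}, {"sky", "water"}, set()]
    models = train_binary_relevance(xs, label_sets, ["sky", "water"])
    print(sorted(predict(models, [0.9, 0.2])))  # ['sky']
```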
Modeling the relations between labels and regions (instead of the entire image) reduces noise in the corresponding feature space and hence yields more accurate semantic models. To this end, many researchers formulate image classification as a multi-instance learning (MIL) task. In MIL, an image is viewed as a bag containing a number of instances, which correspond to the regions in the image [3] [20] [7] [19]. If any of these instances is related to a label, the image is associated with that label. However, these methods mainly focus on the single-label scenario, and multi-label problems must be handled label by label, independently. That is, label correlations are not taken into account in these MIL-based classification methods, even though prior work has shown that exploiting label correlations significantly improves image classification performance [13] [16].
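The bag-level rule described above (an image carries a label if any of its regions does) is the standard multi-instance assumption. A minimal sketch, assuming some per-region scoring function already exists: the bag score is the maximum of the instance scores, so a single strongly matching region suffices. The linear scorer `region_score`, the `(weights, bias)` model format, and the region features are hypothetical. Each label is still decided separately, which is the limitation pointed out above.

```python
from typing import Dict, List, Sequence, Set, Tuple

Region = Sequence[float]                    # hypothetical region feature vector
LabelModel = Tuple[Sequence[float], float]  # hypothetical (weights, bias) per label


def region_score(model: LabelModel, region: Region) -> float:
    """Hypothetical linear instance-level scorer for one label."""
    weights, bias = model
    return sum(w * r for w, r in zip(weights, region)) + bias


def bag_score(model: LabelModel, bag: List[Region]) -> float:
    """MIL assumption: a bag is as positive as its most positive instance."""
    return max(region_score(model, region) for region in bag)


def bag_is_positive(model: LabelModel, bag: List[Region]) -> bool:
    """Equivalent Boolean form: positive iff at least one region is positive."""
    return any(region_score(model, region) > 0 for region in bag)


def label_bag(models: Dict[str, LabelModel], bag: List[Region]) -> Set[str]:
    """Label-by-label MIL: each label is decided on its own, ignoring the others."""
    return {label for label, model in models.items() if bag_score(model, bag) > 0}


if __name__ == "__main__":
    models = {"sky": ([1.0, 0.0], -0.5), "water": ([0.0, 1.0], -0.5)}
    image = [[0.9, 0.1], [0.1, 0.8]]  # two hypothetical regions
    print(sorted(label_bag(models, image)))  # ['sky', 'water']
```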
To address the above issues of existing MLL and MIL approaches, in this paper we formulate image classification as a joint multi-label multi-instance learning (MLMIL) problem. Unlike existing research, which has not considered the multi-label and multi-instance problems simultaneously, we model them in an integrated framework that captures both the connections between semantic labels and regions and the correlations among the labels in a single formulation. Moreover, the proposed framework can also capture other dependencies among the regions, such as spatial relations. Figure 1 compares MLL, MIL, and MLMIL in terms of the relations each of them models.
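To illustrate what modeling both relation types in a single formulation can buy, the toy sketch below scores a whole label configuration at once: each active label receives a label-region term (its best-matching region, as in the MIL rule) and each pair of co-active labels receives a label-correlation term; the best configuration is then found by brute-force enumeration. This is a hypothetical scoring function chosen for brevity, not the HCRF model proposed in this paper, and the `(weights, bias)` models, `pair_bias` values, and regions are illustrative assumptions. Exhaustive search over all 2^K configurations is only feasible for a handful of labels.

```python
from itertools import combinations, product
from typing import Dict, List, Sequence, Set, Tuple

Region = Sequence[float]                    # hypothetical region feature vector
LabelModel = Tuple[Sequence[float], float]  # hypothetical (weights, bias) per label


def unary(model: LabelModel, bag: List[Region]) -> float:
    """Label-region term: score of the best-matching region for this label."""
    weights, bias = model
    return max(sum(w * r for w, r in zip(weights, region)) + bias for region in bag)


def joint_score(config: Sequence[int], labels: Sequence[str],
                models: Dict[str, LabelModel],
                pair_bias: Dict[frozenset, float],
                bag: List[Region]) -> float:
    """Score one full label configuration (1 = label on, 0 = off)."""
    active = [label for label, on in zip(labels, config) if on]
    total = sum(unary(models[label], bag) for label in active)
    # Label-label term: reward (or penalize) labels that co-occur.
    for a, b in combinations(active, 2):
        total += pair_bias.get(frozenset((a, b)), 0.0)
    return total


def best_configuration(labels: Sequence[str], models: Dict[str, LabelModel],
                       pair_bias: Dict[frozenset, float],
                       bag: List[Region]) -> Set[str]:
    """Brute-force inference over all 2**K label configurations."""
    best = max(product((0, 1), repeat=len(labels)),
               key=lambda c: joint_score(c, labels, models, pair_bias, bag))
    return {label for label, on in zip(labels, best) if on}


if __name__ == "__main__":
    labels = ["sky", "water"]
    models = {"sky": ([1.0, 0.0], -0.5), "water": ([0.0, 1.0], -0.5)}
    bag = [[0.9, 0.1], [0.1, 0.4]]  # two hypothetical regions
    print(sorted(best_configuration(labels, models, {}, bag)))  # ['sky']
    coupled = {frozenset(("sky", "water")): 0.3}
    print(sorted(best_configuration(labels, models, coupled, bag)))  # ['sky', 'water']
```

With no correlation term, “water” stays off because its best region scores slightly below zero; a positive “sky”–“water” correlation is enough to switch it on, which a label-by-label decision cannot do.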
An initial attempt to address this problem was made in [22]. However, as will be detailed, in that work this problem is
978-1-4244-2243-2/08/$25.00 ©2008 IEEE