Learning to tag
- ISSN: 08963207
- ISBN: 9781605584874
- DOI: 10.1145/1526709.1526758
Abstract
Social tagging provides valuable and crucial information for large-scale web image retrieval. It is ontology-free and easy to obtain; however, irrelevant tags frequently appear, and users typically do not tag all semantic objects in the image, a problem also called semantic loss. To reduce noise and compensate for the semantic loss, tag recommendation has been proposed in the literature. However, current recommendation simply ranks the related tags based on the single modality of tag co-occurrence on the whole dataset, which ignores other modalities, such as visual correlation. This paper proposes a multi-modality recommendation based on both tag and visual correlation, and formulates tag recommendation as a learning problem. Each modality is used to generate a ranking feature, and the Rankboost algorithm is applied to learn an optimal combination of these ranking features from different modalities. Experiments on Flickr data demonstrate the effectiveness of this learning-based multi-modality recommendation strategy.
Learning to tag
Lei Wu∗
MOE-MS KeyLab of MCC
University of Science and
Technology of China
leiwu@live.com
Linjun Yang
Microsoft Research Asia
49 Zhichun Road, Beijing
100190, China
linjuny@microsoft.com
Nenghai Yu
MOE-MS KeyLab of MCC
University of Science and
Technology of China
ynh@ustc.edu.cn
Xian-Sheng Hua
Microsoft Research Asia
49 Zhichun Road, Beijing
100190, China
xshua@microsoft.com
Categories and Subject Descriptors
H.3.1 [Information Storage and Retrieval]: Content
Analysis and Indexing-indexing methods; H.2.8 [Database
Applications]: Image databases
General Terms
Algorithms, Theory, Experimentation
Keywords
Tag recommendation; Learning to tag; multi-modality
Rankboost; social tagging
∗This work was performed when Lei Wu was visiting Microsoft
Research Asia as a research intern.
Copyright is held by the International World Wide Web
Conference Committee (IW3C2). Distribution of these papers is
limited to classroom use, and personal use by others.
WWW 2009, April 20–24, 2009, Madrid, Spain.
ACM 978-1-60558-487-4/09/04.
1. INTRODUCTION
With the advance of Web 2.0 technology, multimedia content
creation and distribution are much easier than ever before
[6]. Along with the proliferation of images on the World Wide
Web, effective image search approaches for obtaining targeted
images have become an urgent demand. Currently, the
performance of Web image search mainly depends on the quality
of the image annotations or keywords (tags). Some methods
automatically generate metadata by analyzing the image
content or the surrounding text on the webpages, while others
generate this textual metadata by manual tagging. Most
recently, social tagging has become a popular means of
annotating Web images.
Although the automatic creation of metadata costs little
human effort, the results of these statistical-model-based
automatic methods are generally unsatisfying [14][1],
especially on web images, which are quite noisy. To improve
the performance of automatic annotation, some approaches
combine image content analysis with the surrounding text on
the image’s webpages, e.g., [11][20][16][19]. These methods
obtain some improvement over purely content-based methods,
but they are still unacceptable for practical use.
Manual metadata generation is more accurate and practical
than automatic annotation. It is mainly based on the idea of
ontology-based labeling, which first defines an ontology and
then lets users label web resources using the semantic
markups in the ontology. There is also some work that reduces
the manual labeling effort through semi-automatic annotation
[5]. Although ontology-based annotation is successful in some
applications, e.g., bioinformatics and knowledge management,
it has several limitations. Firstly, building a semantic
ontology that covers sufficient descriptions of multimedia
content is expensive, time consuming, and often requires
domain knowledge [15]. Secondly, ontology-based annotation
usually requires users to be familiar with the ontology,
which is too complicated for anyone without specialized
training and knowledge.
Recently, social tagging has emerged as a promising approach
to manual metadata generation, in which the users of a social
network label web resources with their own keywords and share
them with others. This labeling operation is named “tagging”.
Unlike ontology-based annotation, social tagging has no
pre-defined ontology or taxonomy. Thus the task is more
convenient for ordinary users.
Social tagging has attracted a huge number of web users and
effectively helps organize web resources. This strategy has
been adopted by well-known websites (e.g., Delicious and
Flickr). This organic system of organization is also called
“folksonomy”.
WWW 2009 MADRID! Track: Rich Media / Session: Tagging and Clustering
[Figure caption] ... million images from Flickr.com. There
are 1,300 million tags in total. Around 1% of the tags appear
more than 20,000 times and carry little information. Around
5.82% of the tags appear more than 5,000 times in the
collection and are considered popular tags. 33.21% of the
tags appear more than 50 and fewer than 5,000 times and are
defined as specific tags. 60% of the tags appear fewer than
50 times.
Although social tagging is easy to perform, it also has
drawbacks. Firstly, it suffers from polysemy and synonymy
problems. Because users can tag images with their own words,
different users may tag similar images with different words.
So when querying “sea”, one may not find images tagged
“ocean”, which represents the same concept. Moreover, it is
difficult for users to enter every tag with an equivalent
meaning. For this reason, many images may not be effectively
retrieved. Secondly, ambiguity is also a problem. Users may
use a general tag to represent different things. For example,
when an image is tagged “apple”, it may refer to the fruit,
or it may refer to the corporation or its products. In
general, it is also quite difficult for web users to notice
the ambiguity while tagging if they do not think of, or do
not even know, the other meanings of the word. With these
ambiguous tags, many irrelevant images may be retrieved.
To tackle the above problems, some researchers have proposed
query expansion and suggestion [9][23], which extend the
query with related words to make the intention clearer.
However, this does not completely eliminate the synonymy and
tag ambiguity problems. The information in the query is
limited, and query expansion frequently cannot compensate for
the semantic loss in the tagging process, in which users may
ignore some semantic objects in the images. Recently, Xirong
et al. [10] proposed a neighbor voting algorithm for image
retrieval, which tries to predict the relevance of
user-contributed tags. However, measuring the similarity
between individual images is itself an open and complex
problem. In this paper, we propose to tackle the semantic
loss problem during the tagging process by combining visual
correlation at the concept level with tag co-occurrence
information. Semantically or visually related tags are
recommended to users to improve the tagging quality. The
recommendation system reminds users of alternative tags and
can also help clarify the true semantics of the images. For
example, when the user tags an image with the word “sea”, the
recommendation system lists richer and more precise tags
based on the input tag, such as “ocean”, “water”, and “wave”.
These recommendations help users clarify the image content
and remind them of related semantics that might otherwise be
ignored.
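The “sea” example above can be illustrated with a minimal sketch of recommendation from tag co-occurrence alone, the single modality that the abstract attributes to existing approaches. The Jaccard normalization, the function names `build_cooccurrence` and `recommend`, and the toy corpus are assumptions made for illustration; they are not taken from the paper.

```python
from collections import Counter
from itertools import combinations


def build_cooccurrence(tagged_images):
    """Count how often each tag and each unordered tag pair appears."""
    tag_count = Counter()
    pair_count = Counter()
    for tags in tagged_images:
        unique = sorted(set(tags))          # ignore repeated tags on one image
        tag_count.update(unique)
        pair_count.update(combinations(unique, 2))
    return tag_count, pair_count


def recommend(seed, tag_count, pair_count, top_k=3):
    """Rank candidate tags by Jaccard-normalized co-occurrence with `seed`.

    The Jaccard score is an illustrative choice, not the paper's measure.
    """
    scores = {}
    for candidate in tag_count:
        if candidate == seed:
            continue
        both = pair_count.get(tuple(sorted((seed, candidate))), 0)
        if both == 0:
            continue                        # never seen together: not related
        union = tag_count[seed] + tag_count[candidate] - both
        scores[candidate] = both / union
    # Highest score first; ties are broken alphabetically for determinism.
    return sorted(scores, key=lambda t: (-scores[t], t))[:top_k]


# Made-up toy corpus: one tag list per image.
TOY_IMAGES = [
    ["sea", "ocean", "water"],
    ["sea", "ocean", "wave"],
    ["sea", "water", "beach"],
    ["ocean", "water", "wave"],
    ["sea", "wave"],
    ["apple", "fruit"],
]

TAG_COUNT, PAIR_COUNT = build_cooccurrence(TOY_IMAGES)
print(recommend("sea", TAG_COUNT, PAIR_COUNT))  # ['ocean', 'water', 'wave']
```

On this toy corpus the seed tag “sea” yields “ocean”, “water”, and “wave”, while “apple” is never suggested because it never co-occurs with the seed. A score of this kind uses no visual information, which is the limitation the multi-modality approach is meant to address.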
The quality of tag recommendation is important to social
tagging and to the resulting performance of image search.
Firstly, high-quality tag recommendation motivates users to
contribute more useful tags to an image [13]. The average
number of tags per image on Flickr is relatively small [2].
One reason users do not provide many tags is that they
generally cannot think of many words in a short moment [17],
and few people are willing to spend much time thinking about
alternative or more precise tags. With the help of
high-quality tag recommendation, users can provide many
useful tags in a short time. Spelling errors can also be
effectively avoided. Thus the average number of correct tags
per image is expected to increase. Secondly, tag
recommendation reminds users of richer and more specific
tags. The distribution of tags on Flickr follows a power law
distribution (1). Most users only use the popular keywords,
which make up only 5.82% of the whole tag collection. These
tags are popular because they are common vocabulary and
easily come to mind. Another 33.21% of the tags, which appear
50-5,000 times, are also informative but are generally
ignored by most users, because these words are more
professional terms or are used only for specific objects or
situations. Tag recommendation helps remind the user to use
both popular and specific tags for social tagging. This
reminder also helps create more precise tags. Thirdly, tag
recommendation can suppress the noise in a social tagging
system. The tag distribution shows that around 60% of the
tags in the tag corpus are misspelled or meaningless words.
With the help of tag recommendation, users can tag an image
by choosing rather than typing, which effectively avoids
these spelling errors.
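The frequency classes used in this discussion can be made concrete with a short sketch. The thresholds of 5,000 and 50 occurrences come from the text; the class name “rare”, the function names, and the treatment of counts that fall exactly on a threshold (assigned to the lower class) are assumptions, since the text does not specify them.

```python
from collections import Counter

POPULAR_MIN = 5_000   # "more than 5,000" occurrences, per the text
SPECIFIC_MIN = 50     # "more than 50 and less than 5,000", per the text


def frequency_class(count):
    """Map a tag's occurrence count to a frequency class.

    Counts exactly equal to a threshold go to the lower class (assumption).
    """
    if count > POPULAR_MIN:
        return "popular"
    if count > SPECIFIC_MIN:
        return "specific"
    return "rare"


def class_shares(tag_counts):
    """Return the fraction of distinct tags that falls in each class."""
    if not tag_counts:
        raise ValueError("tag_counts must not be empty")
    totals = Counter(frequency_class(c) for c in tag_counts.values())
    n = len(tag_counts)
    return {name: totals[name] / n for name in ("popular", "specific", "rare")}


# Made-up counts, for illustration only.
TOY_COUNTS = {"sky": 6_000, "macro": 100, "sunsett": 3, "xqz": 1}
print(class_shares(TOY_COUNTS))
```

Applied to a real tag corpus, `class_shares` would report the kind of percentages quoted above (5.82% popular, 33.21% specific, about 60% below 50 occurrences); the toy counts here are invented and do not reproduce those figures.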
Existing tag recommendation approaches rank the related tags
based on tag co-occurrence information. These methods ignore
much information, such as the visual correlation between tags
and the image content. A better choice is to use correlations
from multiple modalities, such as tag co-occurrence, the
correlation between tag-related images, and the content of
the target image. However, it is not easy to combine these
multi-modality correlations, since the modalities should be
weighted differently for different samples. The basic idea of
this paper is to learn an optimal combination of the
multi-modality correlations to generate a ranking function
for tag recommendation. Given the image and one or more
initial tags, the algorithm ranks the remaining tags based on
the tag correlation from each modality. Each such ranking is
taken as a weak ranker. Rankboost [7] is then adopted to
combine the weak rankers into a better ranking function.
Users can click tags on the ranking list to annotate the
image. After each click, the algorithm updates the ranking
function as well as the tag recommendation. Since the
recommendation is based on the multi-modality correlations
and depends on the ever-increasing tags in the database, it
seems as if the users are using a selected ontology for
tagging. The proposed method actually regularizes the
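The idea of combining per-modality weak rankers by boosting can be sketched as follows. This is a simplified RankBoost-style loop under stated assumptions, not the paper's implementation: each modality's score in [0, 1] is used directly as a weak ranker, the step size uses the closed form commonly given for weak rankers with outputs in [0, 1], and the two modalities, their scores, and the preference pairs are made-up toy values.

```python
import math


def rankboost_weights(features, pairs, rounds=4):
    """Learn one weight per modality from preference pairs.

    features: dict tag -> list of scores in [0, 1], one per modality.
    pairs: list of (lower, higher) tags; `higher` should outrank `lower`.
    """
    n_modalities = len(next(iter(features.values())))
    dist = {pair: 1.0 / len(pairs) for pair in pairs}   # weight on each pair
    weights = [0.0] * n_modalities

    def margin(k):
        # Weighted agreement of modality k with the desired ordering.
        return sum(d * (features[hi][k] - features[lo][k])
                   for (lo, hi), d in dist.items())

    for _ in range(rounds):
        best = max(range(n_modalities), key=lambda k: abs(margin(k)))
        r = max(min(margin(best), 1 - 1e-9), -1 + 1e-9)
        if abs(r) < 1e-12:
            break                                       # no modality helps
        alpha = 0.5 * math.log((1 + r) / (1 - r))
        weights[best] += alpha
        # Emphasize pairs that the chosen modality still orders badly.
        dist = {(lo, hi): d * math.exp(alpha * (features[lo][best]
                                                - features[hi][best]))
                for (lo, hi), d in dist.items()}
        total = sum(dist.values())
        dist = {pair: d / total for pair, d in dist.items()}
    return weights


def rank_tags(features, weights):
    """Sort tags by the weighted sum of their modality scores."""
    def score(tag):
        return sum(w * f for w, f in zip(weights, features[tag]))
    return sorted(features, key=lambda t: (-score(t), t))


# Toy scores for the seed tag "sea": [tag co-occurrence, visual correlation].
TOY_FEATURES = {
    "ocean": [1.0, 0.4],
    "wave": [0.4, 1.0],
    "apple": [0.5, 0.5],
}
# "ocean" and "wave" should both outrank the irrelevant tag "apple".
TOY_PAIRS = [("apple", "ocean"), ("apple", "wave")]

WEIGHTS = rankboost_weights(TOY_FEATURES, TOY_PAIRS)
print(rank_tags(TOY_FEATURES, WEIGHTS))
```

In this toy setting each modality alone misplaces one relevant tag below “apple”, while the learned combination gives both modalities a positive weight and ranks “apple” last. The interactive update after each user click described above is not covered by this sketch.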


