Content-Based Clustering for Tag Cloud Visualization
- ISBN: 9780769536897
- DOI: 10.1109/ASONAM.2009.19
Arkaitz Zubiaga, Alberto P. García-Plaza, Víctor Fresno, Raquel Martínez
Dpto. Lenguajes y Sistemas Informáticos
Universidad Nacional de Educación a Distancia
Madrid, Spain
{azubiaga, alpgarcia, vfresno, raquel}@lsi.uned.es
Abstract—Social tagging systems are becoming an interesting way to retrieve web information from previously annotated data. These sites present a tag cloud made up of the most popular tags, which takes into account neither tag grouping nor the content of the tagged documents. We present a methodology to obtain and visualize a cloud of related tags based on self-organizing maps, where the relations among tags are established from the textual content of the tagged documents. Each map unit can be represented by the most relevant terms of the tags it contains, which makes it possible to study and analyze the groups as well as to visualize and navigate through the relevant terms and tags.
Keywords: social-tagging; clustering; information access; visualization
I. INTRODUCTION
In social bookmarking sites, users can post content and tag content already posted by others with their preferred tags, so one can expect that the more users describe an item, the more representative its tag set becomes. In this context, several methods and approaches have been proposed to improve search and navigation strategies and results, tag cloud visualization, and recall and precision in feed subscription services. All of them consider tag co-occurrence to organize related tags into clusters or groups, and some of them also use extra information from users, additional resources, or the Semantic Web. To the best of our knowledge, no previous work uses the textual content of the annotated web documents to extract the relations among tags.
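To make the contrast with co-occurrence concrete, the following sketch represents each tag by pooling the text of every document annotated with it, so that two tags can be related through shared vocabulary even if they never co-occur on the same item. It is a minimal illustration under stated assumptions: the `(text, tags)` input format, the whitespace tokenizer, raw term frequencies, and cosine similarity are our own choices rather than the weighting scheme of this work, and `tag_vectors` and `cosine` are hypothetical helpers.

```python
import math
from collections import Counter, defaultdict


def tag_vectors(bookmarks):
    """Build one term-frequency vector per tag by pooling the text of every
    document annotated with that tag.

    Assumption: bookmarks is an iterable of (document_text, tags) pairs.
    """
    vectors = defaultdict(Counter)
    for text, tags in bookmarks:
        terms = Counter(text.lower().split())  # naive whitespace tokenizer
        for tag in set(tags):
            vectors[tag].update(terms)
    return dict(vectors)


def cosine(u, v):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(w * v.get(t, 0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Hypothetical toy data: (document text, tags assigned to it).
bookmarks = [
    ("python list comprehension tutorial", ["python", "programming"]),
    ("learn python generators and iterators", ["python"]),
    ("sourdough bread recipe with starter", ["cooking"]),
    ("bread flour hydration guide", ["cooking", "baking"]),
]
vecs = tag_vectors(bookmarks)
print(round(cosine(vecs["python"], vecs["programming"]), 3))  # prints 0.754
print(cosine(vecs["python"], vecs["cooking"]))  # prints 0.0 (no shared terms)
```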
In this paper, we present a methodology that identifies interrelated tags from the textual content of tagged web documents by means of Self-Organizing Maps (SOM). It allows social tagging sites to suggest tags based on their neighborhood in the map and to improve feed subscription services for sets of related tags; it also supports the extraction of the most relevant terms for each group of tags by means of language modeling techniques. The resulting SOM therefore becomes a richer tag cloud that provides an alternative way to visualize and navigate through tags and terms.
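A minimal, pure-Python sketch of this pipeline: train a small SOM on tag vectors, map each tag to its best-matching unit, and label each unit with its most relevant terms. Several details are assumptions rather than the authors' configuration: the linear decay of the learning rate and of the Gaussian neighborhood radius, L2-normalized term-frequency vectors, and, in place of the unspecified language modeling technique, a score that ranks terms by their contribution to the KL divergence between the unit's unigram model and the collection model. The toy `tag_counts` data and all function names are hypothetical.

```python
import math
import random
from collections import Counter


def best_matching_unit(units, x):
    """Index of the unit whose weight vector is closest to x (Euclidean)."""
    return min(range(len(units)),
               key=lambda i: sum((w - xi) ** 2 for w, xi in zip(units[i], x)))


def train_som(vectors, rows, cols, epochs=200, lr0=0.5, seed=0):
    """Train a rows x cols SOM on dense, equal-length vectors.

    Assumption: learning rate and Gaussian neighborhood radius decay linearly.
    Unit i sits at grid cell divmod(i, cols).
    """
    rng = random.Random(seed)
    dim = len(vectors[0])
    units = [[rng.random() for _ in range(dim)] for _ in range(rows * cols)]
    coords = [divmod(i, cols) for i in range(rows * cols)]
    radius0 = max(rows, cols) / 2
    total, step = epochs * len(vectors), 0
    for _ in range(epochs):
        for x in rng.sample(vectors, len(vectors)):  # shuffled pass over data
            frac = step / total
            lr = lr0 * (1 - frac)
            radius = max(radius0 * (1 - frac), 0.5)
            br, bc = coords[best_matching_unit(units, x)]
            for i, (r, c) in enumerate(coords):
                # Gaussian neighborhood on the grid around the winning unit.
                h = math.exp(-((r - br) ** 2 + (c - bc) ** 2) / (2 * radius ** 2))
                units[i] = [w + lr * h * (xi - w) for w, xi in zip(units[i], x)]
            step += 1
    return units


def top_terms(tag_counts, members, k=3):
    """Rank a unit's terms by p(t|unit) * log(p(t|unit) / p(t|collection)).

    This KL-divergence contribution is a stand-in for the paper's
    language modeling step, not the authors' exact method.
    """
    unit, coll = Counter(), Counter()
    for tag in members:
        unit.update(tag_counts[tag])
    for counts in tag_counts.values():
        coll.update(counts)
    n_unit, n_coll = sum(unit.values()), sum(coll.values())
    score = {t: (c / n_unit) * math.log((c / n_unit) / (coll[t] / n_coll))
             for t, c in unit.items()}
    return sorted(score, key=lambda t: (-score[t], t))[:k]


# Hypothetical term counts pooled from the documents tagged with each tag.
tag_counts = {
    "python": Counter({"python": 2, "code": 2, "list": 1}),
    "programming": Counter({"code": 2, "python": 1}),
    "cooking": Counter({"bread": 2, "recipe": 2}),
    "baking": Counter({"bread": 2, "flour": 1}),
}
vocab = sorted({t for c in tag_counts.values() for t in c})


def to_vector(counts):
    """Dense, L2-normalized term-frequency vector over the shared vocabulary."""
    norm = math.sqrt(sum(v * v for v in counts.values()))
    return [counts[t] / norm for t in vocab]


ROWS, COLS = 2, 2
tags = list(tag_counts)
vectors = [to_vector(tag_counts[t]) for t in tags]
units = train_som(vectors, ROWS, COLS)
mapping = {}  # unit index -> tags whose best-matching unit it is
for tag, x in zip(tags, vectors):
    mapping.setdefault(best_matching_unit(units, x), []).append(tag)
for unit_index, members in sorted(mapping.items()):
    print(divmod(unit_index, COLS), members, top_terms(tag_counts, members))
```

Tags that land in the same or adjacent cells of `mapping` are the candidates a neighborhood-based tag suggestion would draw on; the exact placement depends on the random initialization (`seed`).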
This paper is organized as follows. Section II reviews different methods and approaches to identify interrelated tags. Section III introduces the dataset compiled for this work and describes how our methodology works, together with the algorithms and techniques it uses. Section IV presents and analyzes the experimental results of tag clustering. Finally, Section V presents the main conclusions of this approach and proposes future work.
II. RELATED WORK
Several methods and approaches have been proposed to identify interrelated tags by considering tag co-occurrence to organize related tags into clusters ([7], [2]). In [7], the author derives a subsumption-based model from tag co-occurrence to find groups of related tags from Flickr. In [2], the authors build an undirected graph representing the tag space, where vertices correspond to tags and edges between them represent their co-occurrence frequency. They obtain clusters of related tags, but since some clusters are too large, they apply a spectral clustering algorithm to refine them. In [10], the authors use the co-occurrence of tags, resources, and users in a probabilistic model to generate groups of semantically related tags. [5] uses a tripartite model involving users (actors), tags (concepts), and resources (instances of concepts), and builds graphs relating tags to both users and resources. Other works try to identify semantic relations using ontologies [1] and the Semantic Web. In [8], the authors derive meaningful groups of tags corresponding to concepts in ontologies by means of co-occurrence analysis and clustering techniques; the relations among tags in each cluster are then discovered by combining the Semantic Web with resources such as Wikipedia or Google. Building on this approach, [1] relies only on online ontologies to obtain the semantic enrichment of folksonomy tags.
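For comparison with the content-based approach, the co-occurrence graph described for [2] can be sketched as follows: tags are vertices, and each edge weight counts the items annotated with both tags. This only illustrates the graph construction; it assumes each item is given as a plain list of tags, uses raw counts without any normalization, and leaves out the clustering and spectral refinement steps. `cooccurrence_graph` is a hypothetical name.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_graph(tagged_items):
    """Undirected weighted graph as an edge Counter: the key (a, b), with
    a < b, maps to the number of items annotated with both a and b."""
    edges = Counter()
    for tags in tagged_items:
        # Sorting gives each unordered tag pair a single canonical key.
        for a, b in combinations(sorted(set(tags)), 2):
            edges[(a, b)] += 1
    return edges


# Hypothetical items, each described by the set of tags users assigned to it.
items = [
    ["python", "programming"],
    ["python", "programming", "tutorial"],
    ["cooking", "baking"],
    ["python", "tutorial"],
]
graph = cooccurrence_graph(items)
print(graph[("programming", "python")])  # prints 2
```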
In addition to these works, most tagging systems provide features that show groups or clusters of tags and the relations among them. These features apparently rely on co-occurrence information and clustering techniques, but the systems do not provide detailed information about the methodologies they use.
III. OUR METHODOLOGY
The present work introduces a methodology to organize and visualize the tag cloud, making it easier to analyze the relations between tags and their content. Our methodology involves several steps: a) compilation of a dataset and selection of relevant tags; b) tag representation based on the content of the tagged documents; c) clustering with SOMs to organize
2009 Advances in Social Network Analysis and Mining
978-0-7695-3689-7/09 $25.00 © 2009 IEEE
DOI 10.1109/ASONAM.2009.19
316