Usage patterns of collaborative tagging systems

Journal of Information Science, 32 (2) 2006, pp. 198–208 © CILIP, DOI: 10.1177/0165551506062337

Scott A. Golder and Bernardo A. Huberman
Information Dynamics Lab, HP Labs, Palo Alto, CA, USA

Received 31 August 2005
Revised 13 October 2005

Abstract. Collaborative tagging describes the process by which many users add metadata in the form of keywords to shared content. Recently, collaborative tagging has grown in popularity on the web, on sites that allow users to tag bookmarks, photographs and other content. In this paper we analyze the structure of collaborative tagging systems as well as their dynamic aspects. Specifically, we discovered regularities in user activity, tag frequencies, kinds of tags used, bursts of popularity in bookmarking and a remarkable stability in the relative proportions of tags within a given URL. We also present a dynamic model of collaborative tagging that predicts these stable patterns and relates them to imitation and shared knowledge.

Keywords: collaborative tagging folksonomy Del.icio.us bookmarks web sharing

1. Introduction

Marking content with descriptive terms, also called keywords or tags, is a common way of organizing content for future navigation, filtering or search. Though organizing electronic content this way is not new, a collaborative form of this process, which has been given the name 'tagging' by its proponents, is gaining popularity on the web.

Document repositories or digital libraries often allow documents in their collections to be organized by assigned keywords. However, traditionally such categorizing or indexing is either performed by an authority, such as a librarian, or else derived from the material provided by the authors of the documents [1]. In contrast, collaborative tagging is the practice of allowing anyone, especially consumers, to freely attach keywords or tags to content.
Collaborative tagging is most useful when there is nobody in the 'librarian' role or there is simply too much content for a single authority to classify; both of these traits are true of the web, where collaborative tagging has grown popular.

This kind of collaborative tagging offers an interesting alternative to current efforts at semantic web ontologies [2], which have been a focus of research by a number of groups (e.g. [3]).

A number of now prominent web sites feature collaborative tagging. Typically, such sites allow users to publicly tag and share content, so that they can not only categorize information for themselves but also browse the information categorized by others. There are therefore at once both personal and public aspects to collaborative tagging systems. On some sites, collaborative tagging is also known as 'folksonomy', short for 'folk taxonomy'; however, there is some debate whether this term is accurate [4], and so we avoid using it here.

Del.icio.us, the site on which we performed our analysis, allows for the collaborative tagging of shared website bookmarks [5]. Yahoo's MyWeb [6] does this as well, and CiteULike [7] and Connotea [8] do the same for references to academic publications. Some services allow users to tag, but only content they own; for example, Technorati [9] allows one to tag one's weblog posts. Though such sites do not, strictly speaking, support collaborative tagging, we mention this one to illustrate the growth of tagging in a variety of media.

In this paper we analyze the structure of collaborative tagging systems as well as their dynamic aspects. Specifically, through the study of the collaborative tagging system Del.icio.us, we are able to discover regularities in user activity, tag frequencies, kinds of tags used and bursts of popularity in bookmarking, as well as a remarkable stability in the relative proportions of tags within a given URL. We also present a dynamic model of collaborative tagging that predicts these stable patterns and relates them to imitation and shared knowledge. We conclude with a discussion of potential uses of the data that users of these systems collaboratively generate.

Correspondence to: Scott A. Golder, HP Labs, 1501 Page Mill Rd ms 1139, Palo Alto, CA 94304, USA. E-mail: scott.golder@hp.com

2. Tagging and taxonomy

Proponents of collaborative tagging, typically in the weblogging community, often contrast tagging-based systems with taxonomies. While the latter are hierarchical and exclusive, the former are non-hierarchical and inclusive.

Familiar taxonomies include the Linnaean system of classifying living things, the Dewey Decimal classification for libraries, and computer file systems for organizing electronic files. In such systems, each animal, book, file and so on is in one unambiguous category, which is in turn within a yet more general one. For example, lions and tigers fall in the genus Panthera, and domestic cats in the genus Felis, but Panthera and Felis are both part of the family Felidae, of which lions, tigers and domestic cats are all part. Similarly, books on Africa's geography are in the Dewey Decimal system category 916 and books on South America's in 918, but both are subsumed by the 900 category, covering all topics in geography.

In contrast, tagging is neither exclusive nor hierarchical and therefore can in some circumstances have an advantage over hierarchical taxonomies.
For example, consider a hypothetical researcher who downloads an article about cat species native to Africa. If the researcher wanted to organize all her downloaded articles in a hierarchy of folders, there are several hypothetical options, of which we consider four:

(1) c:\articles\cats          all articles on cats
(2) c:\articles\africa        all articles on Africa
(3) c:\articles\africa\cats   all articles on African cats
(4) c:\articles\cats\africa   all articles on cats from Africa

Each choice reflects a decision about the relative importance of each characteristic. Folder names and levels are in themselves informative, in that, like tags, they describe the information held within them [10]. Folders like (1) and (2) make central the fact that the folders are about 'cats' and 'africa' respectively, but elide all information about the other category. Folders (3) and (4) organize the files by both categories, but establish the first as primary or more salient, and the second as secondary or more specific. However, looking in (3) for a file in (4) will be fruitless, and so checking multiple locations becomes necessary.

Despite these limitations, there are several good reasons to impose such a hierarchy. Though there can be too many folders in a hierarchy, especially one created haphazardly, an efficiently organized file hierarchy neatly and unambiguously bounds a folder's contents. Unlike a keyword-based search, wherein the seeker cannot be sure that a query has returned all relevant items, a folder hierarchy assures the seeker that all the files it contains are in one stable place.

In contrast to such a hierarchical file system, a non-exclusive, flat tagging system could identify such an article as being about a great variety of things simultaneously, including africa and cats, as well as animals more generally, and cheetahs, more specifically.
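The contrast between foldering and flat tagging can be made concrete with a small sketch. It is illustrative only: the file names, the `folders` and `tags` dictionaries and the three helper functions are assumptions introduced here, not part of Del.icio.us or of any system described in this paper. It shows that a file filed under folder (4) is invisible from folder (3), while the same article remains reachable through any of its tags, alone or in combination.

```python
# Hypothetical data and helpers, introduced only for illustration.

# Folder model: each article is filed under exactly one path.
folders = {
    r"c:\articles\africa\cats": [],
    r"c:\articles\cats\africa": ["african_cats.pdf"],
}


def find_in_folder(path, filename):
    """Return True only if the file was filed under exactly this path."""
    return filename in folders.get(path, [])


# Tag model: each article carries a flat, non-exclusive set of tags.
tags = {
    "african_cats.pdf": {"africa", "cats", "animals", "cheetahs"},
    "nile_geography.pdf": {"africa"},
    "house_cats.pdf": {"cats"},
}


def with_all_tags(*wanted):
    """Items carrying every requested tag (the intersection of the tag sets)."""
    return sorted(item for item, item_tags in tags.items()
                  if set(wanted) <= item_tags)


def with_any_tag(*wanted):
    """Items carrying at least one requested tag (the union of the tag sets)."""
    return sorted(item for item, item_tags in tags.items()
                  if set(wanted) & item_tags)
```

Under these assumptions, `find_in_folder(r"c:\articles\africa\cats", "african_cats.pdf")` is false because the article was filed under the other ordering of the same two categories, whereas `with_all_tags("africa", "cats")` returns it regardless of the order in which the tags are given.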
As in a Venn diagram, the set of all the items marked cats and those marked africa would intersect in precisely one way, namely, those documents that are tagged as being about African cats. Even this is not perfect, however. For example, a document tagged only cheetah would not be found in the intersection of africa and cats, though it arguably ought to be; as in the foldering example above, a seeker may still need to search multiple locations.

Looking at it another way, tagging is like filtering: out of all the possible documents (or other items) that are tagged, a filter (i.e. a tag) returns only those items tagged with that tag. Depending on the implementation and query, a tagging system can, instead of providing the intersection of tags (thus, filtering), provide the union of tags; that is, all the items tagged with any of the given tags, rather than all of them. From a user perspective, navigating a tag system is similar to conducting keyword-based searches; regardless of the implementation, users are providing salient, descriptive terms in order to retrieve a set of applicable items.

2.1. Semantic and cognitive aspects of classification

Both tagging systems and taxonomies are beset by many problems that exist as a result of the necessarily imperfect, yet natural and evolving process of creating