Understanding weblog communities through digital traces: a framework, a tool and an example
- ISSN: 03029743
- DOI: 10.1007/11915034_51
Abstract
Often research on online communities could be compared to archaeology [16]: researchers look at patterns in digital traces that members leave to characterise the community they belong to. Relatively easy access to these traces and a growing number of methods and tools to collect and analyse them make such analysis increasingly attractive. However, a researcher is faced with the difficult task of choosing which digital artefacts and which relations between them should be taken into account, and how the findings should be interpreted to say something meaningful about the community based on the traces of its members. In this paper we present a framework that allows categorising digital traces of an online community along five dimensions (people, documents, terms, links and time) and then describe a tool that supports the analysis of community traces by combining several of them, illustrating the types of analysis possible using a dataset from a weblog community.
Anjo Anjewierden1 and Lilia Efimova2
1 Human-Computer Studies Laboratory, University of Amsterdam, Kruislaan 419, 1098 VA Amsterdam,
The Netherlands, anjo@science.uva.nl
2 Telematica Instituut, PO Box 589, 7500 AN Enschede, The Netherlands,
Lilia.Efimova@telin.nl
1 Introduction
Although research on online communities has a long-standing history, the technological
infrastructure and social structures behind them evolve over time. In this respect, communities
supported by weblogs are a relatively recent phenomenon.
A weblog is “a frequently updated web-site consisting of dated entries arranged in reverse
chronological order” [22]. Weblogs are often perceived as a form of individualistic expression,
providing a “personal protected space” where a weblog author can communicate with others
while retaining control [11]. On the one hand, a randomly selected weblog shows limited
interactivity and seldom links to other weblogs [13]. On the other hand, there is growing evidence
of social structures evolving around weblogs and of their influence on the norms and practices of
blogging. This evidence ranges from the voices of bloggers themselves speaking about the social
effects of blogging, to studies of specific weblog communities with distinct cultures (e.g. [23]),
to mathematical analysis of links between weblogs indicating that community formation in the
blogosphere is not a random process, but an indication of shared interests binding bloggers
together [17].
Often research on online communities could be compared to archaeology [16]: researchers
look at patterns in digital traces that members leave to characterise the community they belong to.
In the case of weblog communities relatively easy access to these traces and a growing number
of methods and tools to collect and analyse them make such analysis increasingly attractive
(e.g. papers from the annual workshops on the Weblogging ecosystem at the WWW conference
in 2004-06).
Many of the existing tools apply text or temporal analysis to large volumes of weblog data,
often focusing on short bursts in time or popular topics (e.g. [1], [15], [21]). Others apply
methods of social network analysis to identify and characterise networks between bloggers based on
links between weblogs (e.g. [12], [19]). In our work we focus on combining both approaches in order
to go beyond currently available views on weblog data, aiming to develop tools that take into
account existing community structures [14] and support the understanding of specific
conversational clouds [18] and the “cloudmakers” behind them [20].
[...] members leave online, we find it important to articulate explicitly how studying the results points to
more general questions about weblog communities: which digital artefacts and which relations
between them are taken into account, and how the findings should be interpreted to say something
meaningful about the community based on the traces of its members.
In this paper we present a framework that allows categorisation of digital traces of an online
community along five dimensions (people, documents, terms, links and time) and then describe
a tool that supports the analysis of community traces by combining several of these dimensions,
illustrating the types of analysis possible using a dataset from a weblog community.
2 Framework
In this section we present a simple framework that can assist in the analysis of online
communities. The formulation of the framework is motivated by the perceived need to provide community
researchers with a conceptual tool to focus on particular aspects of the community.
There is a strong relation between the framework we propose and ongoing research into the
study of online communities. The field of social network analysis (SNA) can be characterised
as the study of relations between persons and their links, sometimes taking time into account.
The field of text mining from communities, sometimes called semantic social network analysis
[4], mainly looks at the relation between terms and documents, largely disregarding the notion
of the individual. Finally, the area of identifying trends in communities (e.g. [10], [8]) looks at
documents, terms and time. The research that is closest to what we are trying to achieve is work
on iQuest by Gloor and Zhao [9]. Their tool supports studying communities by making it possible
to look at the community as a whole (topics discussed) and the contribution of members (who
says what and when).
When thinking about online communities there are, therefore, at least five dimensions that
play an important role and are possibly of interest for investigation:
Document: A self-contained publication by a member in the community. Examples of documents
are a web page, email or weblog post.
Term: A meaningful term used by one or more members of the community. These terms occur
in documents.
Person: A member of the community.
Link: A reference from one document to another document, and implicitly between the persons
who authored the documents.
Time: The date, and possibly time, of publication of a document.
The framework thus focuses on communities that leave digital traces in the form of documents,
and derives the other dimensions from the metadata (person, time) and content (terms, links).
Given a dataset represented along these dimensions, the researcher can navigate through it by
specifying one or more initial dimensions and fixing a particular dimension (e.g. focusing on a
particular term, person, or time period). Navigating along multiple dimensions makes it possible
for the researcher to obtain both an overall view (what are the most frequent terms used in the
community) and more detailed views (term usage of a particular member over time). The more
dimensions are involved, the more detailed, and possibly the more interesting, the results of
the analysis become.
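
To make the idea of navigating along dimensions concrete, the following is a minimal sketch in
Python. It is an illustration under stated assumptions, not the implementation used in tOKo. The
names Document, select, term_frequencies and person_links are hypothetical, and the three sample
posts are invented toy data. Each document carries its metadata (person, time) and content (terms,
links), and fixing a dimension amounts to filtering on it.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date
from typing import Iterable, List, Optional, Tuple


@dataclass(frozen=True)
class Document:
    """A self-contained publication, e.g. a weblog post (hypothetical model)."""
    doc_id: str
    person: str                   # metadata: the author
    published: date               # metadata: the time dimension
    terms: Tuple[str, ...]        # content: meaningful terms in the text
    links: Tuple[str, ...] = ()   # content: doc_ids this document refers to


def select(docs: Iterable[Document],
           person: Optional[str] = None,
           term: Optional[str] = None,
           start: Optional[date] = None,
           end: Optional[date] = None) -> List[Document]:
    """Fix zero or more dimensions and return the documents that match."""
    result = []
    for d in docs:
        if person is not None and d.person != person:
            continue
        if term is not None and term not in d.terms:
            continue
        if start is not None and d.published < start:
            continue
        if end is not None and d.published > end:
            continue
        result.append(d)
    return result


def term_frequencies(docs: Iterable[Document]) -> Counter:
    """Overall view: how often each term occurs in the given documents."""
    return Counter(t for d in docs for t in d.terms)


def person_links(docs: Iterable[Document]) -> Counter:
    """Derive the implicit person-to-person links from document links."""
    docs = list(docs)
    author = {d.doc_id: d.person for d in docs}
    return Counter((d.person, author[target])
                   for d in docs
                   for target in d.links
                   if target in author)


# Invented toy data, for illustration only.
POSTS = [
    Document("a1", "anna", date(2005, 3, 1), ("knowledge", "blogging")),
    Document("b1", "ben", date(2005, 3, 4), ("blogging", "community"), links=("a1",)),
    Document("a2", "anna", date(2005, 4, 2), ("community", "blogging"), links=("b1",)),
]
```

Calling term_frequencies(POSTS) gives the overall view of term usage in this toy community, while
term_frequencies(select(POSTS, person="anna", start=date(2005, 4, 1))) narrows it to one member
in one period. Combining filters in this way corresponds to involving more dimensions in the
analysis.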
3 Tool
The framework has been implemented on top of a tool called tOKo [7]. tOKo is an open source
tool for text analysis, with support for ontology development and, given the extensions described
in this paper, exploring communities.