Mining patterns of events in students’ teamwork data
Judy Kay, Nicolas Maisonneuve, Kalina Yacef
School of Information Technologies
University of Sydney
{judy, nicolas, kalina}@it.usyd.edu.au
Osmar Zaïane
Department of Computing Science
University of Alberta
zaiane@cs.ualberta.ca
Abstract. It is difficult, but very important, to learn to work effectively as part of a team. The electronic
traces of a team's collaboration are a potentially invaluable source of information about its successes, or
problems, in the way it learns. This paper describes the mining of student group interaction data to identify
significant sequences of activity. Our goal is to build tools that can flag interaction sequences indicative of
problems, so that we can use them to help student teams recognise problems early. We also want tools that can
identify patterns that are markers of success, so that these might indicate improvements during the learning
process. Our first challenge is to preprocess the large quantities of available raw data into an alphabet suitable
for data mining. Then we need data mining algorithms that properly account for the temporal nature of the data
and the character of group interaction. We envisage a two-way process, in which theories of effective group
behaviour drive the data mining and, in the opposite direction, the data mining provides results that are
meaningful to groups wishing to improve their effectiveness. We report the results of our work in the context of
a semester-long software development project course.
Keywords: Educational Data Mining, frequent pattern mining, collaborative learning.
INTRODUCTION
Group work has an important role in many aspects of life, both at work and elsewhere. This makes it
important for people to learn to be effective team members. When teams make substantial use of electronic
media to support their operation, they may produce extensive electronic traces of the group's operation and of
the interaction between its members. It is extremely appealing to exploit such data, extracting salient features
so that groups can be advised on how well they are doing and how they might improve their performance. In a
formal educational context, these traces of student interaction are a potentially important source of information
for teachers who want to help students improve their group work skills and to evaluate the value of various
learning interventions. The mining of such data is now an important research direction, as shown by the recent
workshops on Educational Data Mining held at the ITS, AIED and AAAI conferences (Beck, 2005; Beck, 2004;
Choquet et al., 2005). Previous studies have analysed how learning groups collaborate in a shared workspace
(Barros and Verdejo, 1999) or through a structured conversational interface (Soller, 2004). Our work is
fundamentally aimed at assessing group work processes and providing feedback to students about how they
could improve them.
Software development is commonly a group activity. Since software teams already use computers for their
core task, it is natural to provide them with closely linked tools for version management, group communication
and planning, and task allocation and scheduling against deadlines. Our work has been conducted in the context
of a semester-long software development project course in which teams of five to seven students create a
substantial software artifact as well as reports and presentations.
Our student teams collaborate using several media. Most importantly, they hold face-to-face meetings,
usually at least twice a week, which play a critical role in the group's co-ordination processes. Since these
meetings may not involve electronic support, we have no direct access to data about what happens during them.
Teams probably also make considerable use of communication media to which we have no access, such as email,
instant messaging, telephone conversations and SMS.
However, we do have access to a rich and interesting source of user activity traces, because the groups
are required to use TRAC (http://www.edgewall.com/trac/), an open source tool designed for use in software
development projects.
This makes TRAC an exemplar of the electronic tools that learners use as a normal part of their learning
activity, even though such tools are not designed for learning. Therefore the work presented here does not
analyse data from a learning system, although the aim of this analysis is to support students’ learning.
Indeed, it would be a significant gain if we could create effective tools that mine the data from such systems,
both to provide insights into how effectively students are learning and to show how we might complement such
systems with artifacts that help students learn better.
This paper describes our preliminary work on mining data from TRAC, with the aim of identifying patterns
that distinguish successful groups from less successful ones. The next section describes the TRAC system and
the context of its use in our course. We then explain the data mining framework we constructed and present
our initial results, before concluding with some directions for future work.
CONTEXT OF EXPERIMENT AND USE OF THE TRAC SYSTEM
Students collaborate by sharing tasks via the TRAC system. Tasks are managed by a “Ticket” system,
source code is managed by the version control system “SVN” (Subversion), and students communicate by
collaboratively writing web pages in a “Wiki”.
The Wiki allows collaborative editing of a set of web pages, all linked from the main page. Any member of
the team can edit a page, for example, to offer to meet and help work on the problem it describes. The value of
the Wiki is that all group members can see it. So, for example, if Daniel posts a comment at the top of a page and
then gets help from Peter, it is important that Peter notes this on the page, so that other group members can see
that someone is following up on the problem. We believe that the Wiki is a very important source of information
about group interaction and communication. For example, if a group is functioning well, we would expect some
pages on the Wiki to have contributions from many team members.
Based upon our own experience, as well as the literature on group work and co-ordination (Kay et al., 2006),
we believe that the Wiki should provide valuable data about meaningful patterns of group interaction, both
successful and not. We describe some of these patterns as background to the design of the data cleansing and
data mining we report later. They also matter for the other direction we want to explore, where we use theory
to define classes of patterns to look for.
We would also expect that a group which is functioning well, with good leadership, will have some pages
for which one person is primarily responsible, so that this person makes most of the contributions on that page.
For leadership functions, this person may be the leader. In this case, we would also expect other team members
to contribute to the page: the leader's co-ordination role suggests that the leader might post proposed action
plans, minutes and the like, and other team members should then add comments and corrections as work
progresses, and acknowledge that they agree with what is posted.
If the group does an effective job of allocating responsibilities, there will also be pages created mainly by
other individual members. For example, if the group needs to learn about content-management systems, one
person may be allocated the task of researching them, and that person would then be the main contributor to the
corresponding page. However, if the group is functioning well, this information will be read and used by at
least some other team members. Moreover, a well-functioning group maintains communication overtly, so we
would expect some other team members to post comments, suggestions, additional information, corrections or
even a simple acknowledgement that they had read the material and agreed with it or thought it was very good.
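The page patterns described in the last two paragraphs (one main contributor, plus comments or acknowledgements from other members) could be detected with a simple per-page statistic. The sketch below is our own illustration, not the method used in this study: it assumes each Wiki edit has already been reduced to an (author, lines added) pair, and the function name `classify_page`, its labels and the 0.5 share threshold are hypothetical choices.

```python
from collections import Counter
from typing import Iterable, Tuple


def classify_page(edits: Iterable[Tuple[str, int]], main_share: float = 0.5) -> str:
    """Label one Wiki page from its (author, lines_added) edits.

    Hypothetical labels:
      'owned+responded': one author wrote more than `main_share` of the lines
                         and at least one other member also added text
      'owned-only':      one dominant author and no other contributor
      'shared':          no single author exceeds `main_share`
      'empty':           no lines were added at all
    """
    lines_by_author: Counter = Counter()
    for author, lines_added in edits:
        lines_by_author[author] += lines_added
    total = sum(lines_by_author.values())
    if total == 0:
        return "empty"
    top_author, top_lines = lines_by_author.most_common(1)[0]
    # Other members who actually added text to the page.
    others = [a for a, n in lines_by_author.items() if a != top_author and n > 0]
    if top_lines / total > main_share:
        return "owned+responded" if others else "owned-only"
    return "shared"


# Toy, invented edits: Daniel writes the page, Peter and Amy reply briefly.
page = [("daniel", 40), ("daniel", 25), ("peter", 3), ("amy", 2)]
print(classify_page(page))  # owned+responded
```

Under this sketch, a well-functioning group would show a mix of "owned+responded" pages, while many "owned-only" pages might hint that material is written but not acknowledged by others.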
It is possible to extract details of TRAC interactions. These indicate exactly who posted, removed or
altered which lines, and exactly when. The team members can see this same information from a tab in the
interface, and we can extract the details of Wiki interaction at this level. Of course, this is not as rich a set of
discourse data as one might gain from a tagged conversational interface such as the one used by Soller (Soller,
2004). However, it has the real merit of being more natural: teams can operate exactly as they would under best
practice. Text mining could also be exploited to identify what each Wiki page is about, but we are not currently
doing this.
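Line-level information of this kind (who added or removed which lines, and when) can be recovered by diffing consecutive revisions of a page. The following sketch uses Python's standard `difflib.ndiff`; the `Revision` and `EditEvent` records and the `edit_events` function are hypothetical stand-ins for whatever TRAC's page history actually exports, and we do not claim this mirrors TRAC's own diff logic.

```python
import difflib
from dataclasses import dataclass
from datetime import datetime
from typing import List


@dataclass
class Revision:
    """Hypothetical record for one saved version of a single Wiki page."""
    page: str
    author: str
    time: datetime
    text: str


@dataclass
class EditEvent:
    page: str
    author: str
    time: datetime
    added: int    # lines in this revision that were not in the previous one
    removed: int  # lines in the previous revision that are gone from this one


def edit_events(revisions: List[Revision]) -> List[EditEvent]:
    """Turn one page's revision history into per-edit line counts.

    An altered line shows up as one removal plus one addition.
    """
    events = []
    previous: List[str] = []
    for rev in sorted(revisions, key=lambda r: r.time):
        current = rev.text.splitlines()
        added = removed = 0
        # ndiff prefixes lines with '+ ', '- ', '  ' (unchanged) or '? ' (hints).
        for line in difflib.ndiff(previous, current):
            if line.startswith("+ "):
                added += 1
            elif line.startswith("- "):
                removed += 1
        events.append(EditEvent(rev.page, rev.author, rev.time, added, removed))
        previous = current
    return events


# Toy, invented history of one page.
history = [
    Revision("MeetingMinutes", "daniel", datetime(2006, 3, 1, 10), "Agenda\nItem 1"),
    Revision("MeetingMinutes", "peter", datetime(2006, 3, 1, 12),
             "Agenda\nItem 1 (done)\nItem 2"),
]
for e in edit_events(history):
    print(e.author, e.time.isoformat(), e.added, e.removed)
```

Counting whole lines keeps the events coarse but robust; a finer measure (characters or words changed) would be a straightforward variation.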
Essentially, our Wiki data tells us who placed, altered or removed how much text on each page of the Wiki.
We regard each page as a set of related conversations and analyse it for patterns of interaction that reflect the
health of the group. Moreover, since we have the final grades for the semester, we can look for patterns whose
frequency differs between teams that performed well and those that did not: patterns more common among
strong teams should be success indicators and, similarly, patterns more common among weak groups should be
indicators of problem groups.
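One hypothetical way to carry out this comparison is sketched below. Each edit is mapped to a letter of a small alphabet (whether the editor created the page, and whether the edit is large or small, using an arbitrary 10-line threshold), each group's time-ordered edits become a string, and for every contiguous pattern of length n we compute the fraction of strong and of weak groups whose string contains it. The alphabet, the threshold and the function names `encode`, `ngrams` and `contrast` are our illustrative assumptions, not the alphabet used in this study, and contiguous n-grams are a simplification of full sequential pattern mining.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple

# (page, author, lines_changed), assumed already sorted by time for one group.
Event = Tuple[str, str, int]


def encode(events: List[Event], big: int = 10) -> str:
    """Map each edit to one letter of a hypothetical four-symbol alphabet.

    'A' / 'a': large / small edit by the member who created the page
    'B' / 'b': large / small edit by any other member
    """
    creator: Dict[str, str] = {}
    symbols = []
    for page, author, lines in events:
        creator.setdefault(page, author)  # first editor seen counts as the creator
        letter = "A" if creator[page] == author else "B"
        symbols.append(letter if lines >= big else letter.lower())
    return "".join(symbols)


def ngrams(seq: str, n: int) -> Set[str]:
    """Distinct contiguous patterns of length n in one group's sequence."""
    return {seq[i:i + n] for i in range(len(seq) - n + 1)}


def contrast(strong: List[str], weak: List[str], n: int = 2) -> Dict[str, Tuple[float, float]]:
    """For every pattern, the fraction of strong and of weak groups containing it."""
    s_count = Counter(p for seq in strong for p in ngrams(seq, n))
    w_count = Counter(p for seq in weak for p in ngrams(seq, n))
    patterns = sorted(set(s_count) | set(w_count))
    return {
        p: (s_count[p] / max(len(strong), 1), w_count[p] / max(len(weak), 1))
        for p in patterns
    }


# Toy, invented data purely to show the mechanics.
strong_groups = [encode([("Plan", "amy", 30), ("Plan", "ben", 2),
                         ("CMS", "ben", 25), ("CMS", "amy", 3)])]
weak_groups = [encode([("Plan", "cat", 30), ("Plan", "cat", 20),
                       ("CMS", "dan", 25)])]
for pattern, (s, w) in contrast(strong_groups, weak_groups).items():
    print(pattern, s, w)
```

Each pattern is counted at most once per group, so the support values reflect how many groups exhibit a pattern rather than how often one prolific group repeats it.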
TRAC also has a ticket system (sometimes called an issue-tracking tool). The core idea is that a ticket is
created for each task that the team has to do. For example, if the team needs to research content-management
systems, one person should create a ticket for this task. Often, it is the leader who does this. A