Using classification methods to label tasks in process mining
- ISSN: 15320618
- DOI: 10.1002/smr.463
Abstract
We investigate a method designed to improve the accuracy of process mining in scenarios where the identification of task labels for log events is uncertain. Such situations are prevalent in business processes where events consist of communications between people, such as email messages. We examine how the accuracy of an independent task identifier, such as a classification or clustering engine, can be improved by examining the currently mined process model. First, a classification scheme based on identifying the keywords in each message is presented to provide an initial labeling. We then demonstrate how these labels can be refined by considering the likelihood that the event represents a particular task as obtained via an analysis of the current representation of the process model. This process is then repeated a number of times until the model is sufficiently refined. Results show that both keyword classification and the current process model analysis can be significantly effective on their own, and when combined have the potential to correct virtually all errors when noise is low (less than 20%), and can reduce the error rate by about 85% when noise is in the 30-40% range. Copyright 2010 Crown in the right of Canada.
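The refinement loop described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the keyword classifier has already produced a probability for each candidate task on each event, and it stands in for the "current representation of the process model" with a simple smoothed first-order transition table re-estimated from the current labels. The names `transition_probs`, `refine`, `alpha` and `max_iters` are hypothetical.

```python
from collections import Counter, defaultdict


def transition_probs(traces, tasks, alpha=1.0):
    """Estimate smoothed P(next task | previous task) from the current labels.

    Assumption: a first-order transition table is used here as a simple
    stand-in for the mined process model.
    """
    counts = defaultdict(Counter)
    for trace in traces:
        for prev, nxt in zip(trace, trace[1:]):
            counts[prev][nxt] += 1
    probs = {}
    for prev in tasks:
        total = sum(counts[prev].values()) + alpha * len(tasks)
        probs[prev] = {t: (counts[prev][t] + alpha) / total for t in tasks}
    return probs


def refine(keyword_probs, tasks, max_iters=10):
    """Iteratively relabel events using keyword evidence and the current model.

    keyword_probs: list of traces; each trace is a list of dicts mapping
    task -> probability from the keyword classifier (assumed given).
    """
    # Initial labeling: most probable task according to keywords alone.
    labels = [[max(tasks, key=lambda t: ev[t]) for ev in trace]
              for trace in keyword_probs]
    for _ in range(max_iters):
        trans = transition_probs(labels, tasks)
        new_labels = []
        for trace, kw in zip(labels, keyword_probs):
            new_trace = []
            for i, ev in enumerate(kw):
                def score(t):
                    # Keyword evidence times the likelihood of task t
                    # given the current labels of its neighbours.
                    s = ev[t]
                    if i > 0:
                        s *= trans[trace[i - 1]][t]
                    if i + 1 < len(trace):
                        s *= trans[t][trace[i + 1]]
                    return s
                new_trace.append(max(tasks, key=score))
            new_labels.append(new_trace)
        if new_labels == labels:  # no label changed: stop refining
            break
        labels = new_labels
    return labels
```

In this sketch every event is rescored against its neighbours' labels from the previous pass, so a message whose keywords weakly favor the wrong task can be corrected when the rest of the log consistently supports another ordering.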
NRC Publications Archive (NPArC)
Publisher's version: Journal of Software Maintenance and Evolution: Research and Practice, 22(6-7), pp. 497-517, 2010-09-01
Buffett, Scott; Geng, Liqiang
Scott Buffett and Liqiang Geng
Institute for Information Technology - e-Business, National Research Council,
Fredericton, New Brunswick, Canada, E3B 9W4
{scott.buffett, liqiang.geng}@nrc.gc.ca
Keywords: workflow, process mining, task labeling, Bayesian classification
1 Introduction
In recent years, research in business process management has seen a considerable effort in the field of process
mining. Process mining involves automatically (or semi-automatically) inspecting a log of machine-level


