Discovery of process models from data and domain knowledge: A rough-granular approach

Abstract

The rapid expansion of the Internet has resulted not only in the ever-growing amount of data stored there, but also in the burgeoning complexity of the concepts and phenomena pertaining to those data. This issue has been vividly compared [14] by the renowned statistician Prof. Friedman of Stanford University to the advance in human mobility from the era of travel on foot to the era of jet travel. These essential changes in data have brought new challenges to the development of data mining methods, especially since the treatment of these data increasingly involves complex processes that elude classic modeling paradigms. "Hot" datasets such as biomedical, financial, or net-user behavior data are just a few examples. Mining such temporal or stream data is on the agenda of many research centers and companies worldwide (see, e.g., [31, 1]). In the data mining community, there is rapidly growing interest in developing methods for the discovery of the structures of temporal processes from data. Work on discovering models for processes from data has recently been undertaken by many renowned centers worldwide (e.g., [34, 19, 36, 9], www.isle.org/~langley/, soc.web.cse.unsw.edu.au/bibliography/discovery/index.html).

We discuss a research direction for the discovery of process models from data and domain knowledge within the wisdom technology (wistech) program outlined recently in [15, 16]. Wisdom commonly means judging rightly on the basis of available knowledge and interactions. This common notion can be refined: by wisdom, we understand an adaptive ability to make judgments correctly (in particular, correct decisions) to a satisfactory degree, having in mind real-life constraints. The intuitive nature of wisdom understood in this way can be expressed metaphorically by the so-called wisdom equation shown in (1):

wisdom = adaptive judgment + knowledge + interaction. (1)

Wisdom can be treated as a certain type of knowledge, one that is especially important at the highest level of the hierarchy of meta-reasoning in intelligent agents. Wistech is a collection of techniques aimed at the further advancement of technologies to acquire, represent, store, process, discover, communicate, and learn wisdom in designing and implementing intelligent systems. These techniques include approximate reasoning by agents or teams of agents about vague concepts concerning real-life, dynamically changing, usually distributed systems in which these agents operate. Such systems consist of other autonomous agents operating in highly unpredictable environments and interacting with each other. Wistech can be treated as the successor of database technology, information management, and knowledge engineering technologies. Wistech is the combination of the technologies represented in equation (1) and offers an intuitive starting point for a variety of approaches to designing and implementing computational models for wistech in intelligent systems. Knowledge technology in wistech is based on techniques for reasoning about knowledge, information, and data, techniques that make it possible to employ current knowledge in problem solving; this includes, e.g., extracting relevant fragments of knowledge from knowledge networks for making decisions or reasoning by analogy. Judgment technology in wistech covers the representation of agent perception and adaptive judgment strategies based on the results of perceiving real-life scenes in environments and on their representations in the agent's mind.
The role of judgment is crucial, e.g., in adaptive planning relative to the Maslow hierarchy of agents' needs or goals. Judgment also includes techniques used for perception, learning, analysis of perceived facts, and the adaptive refinement of approximations of vague complex concepts (from different levels of concept hierarchies in real-life problem solving) applied in modeling interactions in dynamically changing environments (in which cooperating, communicating, and competing agents exist) under uncertain and insufficient knowledge or resources. Interaction technology includes techniques for performing and monitoring actions by agents and environments. Techniques for planning and controlling actions are derived from a combination of judgment technology and interaction technology.

There are many ways to build foundations for wistech computational models. One of them is based on rough-granular computing (RGC), an approach to the constructive definition of computations over objects called granules, aimed at searching for solutions to problems specified using vague concepts. Granules are obtained in a process called granulation. Granulation can be viewed as a human way of achieving data compression, and it plays a key role in implementing the divide-and-conquer strategy in human problem solving [38]. The approach combines rough set methods with other soft computing methods and with methods based on granular computing (GC). RGC is used for developing one of the possible wistech foundations based on approximate reasoning about vague concepts.

As a starting point for the presentation of methods for the discovery of process models from data, we use the proposal of Zdzislaw Pawlak, who in 1992 [27] proposed using data tables (information systems) as specifications of concurrent systems. Since then, several methods for the synthesis of concurrent systems from data have been developed (see, e.g., [32]). Recently, it has become apparent that rough set methods and information granulation open a promising perspective on the development of approximate reasoning methods in multi-agent systems. At the same time, it has been shown that there are significant limitations to the prevalent methods for mining the emerging very large datasets that involve complex vague concepts, phenomena, or processes (see, e.g., [10, 30, 35]). One of the essential weaknesses of those methods is their inability to effectively induce approximations of complex concepts, whose realization calls for the discovery of highly elaborate data patterns. Intuitively speaking, these complex target concepts are too distant from the available low-level sensor measurements. This results in a search space of huge dimension for the relevant patterns, which renders existing discovery methods and technologies virtually ineffective. In recent years, an increasingly popular view has emerged (see, e.g., [12, 18]) that one of the main challenges in data mining is to develop methods integrating pattern and concept discovery with domain knowledge. In this lecture, the dynamics of complex processes is specified by means of vague concepts, expressed in natural language, and of relations between those concepts. Approximation of such concepts requires hierarchical modeling and the approximation of concepts at successive levels of the hierarchy provided along with domain knowledge.
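To make Pawlak's idea of information systems and the rough-set approximation of vague concepts more concrete, the following Python sketch (ours, not from the paper) builds indiscernibility classes over a toy data table and computes the standard lower and upper approximations of a target concept; the objects, attribute names, and values are purely hypothetical.

from collections import defaultdict

# Hypothetical information system: objects described by condition attributes.
table = {
    "x1": {"temp": "high", "pulse": "fast"},
    "x2": {"temp": "high", "pulse": "fast"},
    "x3": {"temp": "low",  "pulse": "slow"},
    "x4": {"temp": "low",  "pulse": "fast"},
}
concept = {"x1", "x4"}      # sample extension of a vague target concept X
attrs = ("temp", "pulse")   # attribute subset B defining indiscernibility

def indiscernibility_classes(table, attrs):
    # Group objects whose attribute values on attrs coincide.
    classes = defaultdict(set)
    for obj, row in table.items():
        classes[tuple(row[a] for a in attrs)].add(obj)
    return list(classes.values())

def approximations(table, attrs, concept):
    # B-lower approximation: union of classes fully contained in the concept.
    # B-upper approximation: union of classes that overlap the concept.
    lower, upper = set(), set()
    for cls in indiscernibility_classes(table, attrs):
        if cls <= concept:
            lower |= cls
        if cls & concept:
            upper |= cls
    return lower, upper

lower, upper = approximations(table, attrs, concept)
print("lower:", lower)              # {'x4'}: certainly in the concept
print("upper:", upper)              # {'x1', 'x2', 'x4'}: possibly in the concept
print("boundary:", upper - lower)   # {'x1', 'x2'}: the region of vagueness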
Because of the complexity of the concepts and processes at the top levels of the hierarchy, one cannot assume that the fully automatic construction of their models, or the discovery of the data patterns required to approximate their components, is straightforward. We propose to discover process models and their components through interaction with domain experts. This interaction makes it possible to steer the discovery process and thereby keeps it computationally feasible. Thus, the proposed approach transforms a data mining system into an experimental laboratory in which the software system, aided by human experts, attempts to discover: (i) process models from data bounded by domain constraints, and (ii) patterns relevant to the user, e.g., those required in the approximation of vague components of those processes. This research direction has been pursued by our team, in particular toward the construction of classifiers for complex concepts (see, e.g., [2-4, 6-8, 11, 20-23]) aided by the integration of domain knowledge. Advances in recent years indicate that the research conducted so far can be extended to the discovery of models for processes from temporal or spatio-temporal data involving complex objects. We discuss rough-granular modeling (see, e.g., [29]) as the basis for the discovery of processes from data. We also outline some perspectives on applying the presented approach in areas such as prediction from temporal financial data, gene expression networks, web mining, identification of behavioral patterns, planning, learning interaction (e.g., cooperation protocols or coalition formation), autonomous prediction and control by UAVs, summarization of situations, and discovery of a language for communication. The novelty of the proposed approach to the discovery of process models from data and domain knowledge lies in combining, on the one hand, a number of novel granular computing methods for wistech developed using rough set methods and other known approaches to the approximation of vague, complex concepts (see, e.g., [2-8, 17, 20-23, 25, 26, 28, 29, 37, 38]) with, on the other hand, the discovery of process structures from data through interactive collaboration with domain expert(s) (see, e.g., [2-8, 17, 20-23, 29]). © Springer-Verlag Berlin Heidelberg 2007.
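As a purely illustrative complement, the sketch below shows one way the hierarchical, expert-guided approximation described above might be organized in code (our hypothetical example, not the paper's implementation): low-level concepts are approximated directly from sensor readings, a domain expert supplies the concept hierarchy, and a higher-level vague concept is approximated from the outputs of its sub-concepts. All names, thresholds, and the simple conjunction standing in for a learned (e.g., rough-set) approximation are assumptions made for the example.

def approx_safe_distance(reading):
    # Low-level concept approximated directly from raw measurements
    # (hypothetical rule of thumb: keep at least 1.5 s of headway).
    return reading["distance_m"] > 1.5 * reading["speed_ms"]

def approx_stable_speed(reading):
    return abs(reading["accel_ms2"]) < 0.5

# Expert-provided hierarchy: which sub-concepts a higher-level concept depends on.
HIERARCHY = {"safe_situation": ("safe_distance", "stable_speed")}
SUB_CONCEPTS = {"safe_distance": approx_safe_distance,
                "stable_speed": approx_stable_speed}

def approx_higher_level(concept, reading):
    # Approximate a higher-level vague concept from its sub-concept approximations.
    sub_values = {name: SUB_CONCEPTS[name](reading) for name in HIERARCHY[concept]}
    # A plain conjunction stands in here for a classifier learned from
    # expert-labelled examples at this level of the hierarchy.
    return all(sub_values.values()), sub_values

reading = {"distance_m": 40.0, "speed_ms": 20.0, "accel_ms2": 0.2}
print(approx_higher_level("safe_situation", reading))
# (True, {'safe_distance': True, 'stable_speed': True})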

Cite

APA

Skowron, A. (2007). Discovery of process models from data and domain knowledge: A rough-granular approach. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 4815 LNCS, pp. 192–197). Springer Verlag. https://doi.org/10.1007/978-3-540-77046-6_24
