Checking Conformance of High-Level Business Process Models to Event Logs

Antonina K. Begicheva National Research University Higher School of Economics (HSE) 33 Kirpichnaya Str., Moscow, Russia Email: be-ton@yandex.ru

Irina A. Lomazova National Research University Higher School of Economics (HSE) 33 Kirpichnaya Str., Moscow, Russia Email: ilomazova@hse.ru

I. Introduction

Process mining [1] is a new technology, which provids variety of methods to discover, monitor and improve real processes by extracting knowledge from event logs. The two most prominent process mining tasks are: (i) process discovery: constructing a process model from example behavior recorded in an event log, and (ii) conformance checking: diagnosing and quantifying discrepancies between observed behavior and modeled behavior. There are many software products which allow us to use methods of Process Mining. ProM [4] is an open-source tool supporting many techniques of Process Mining, which are represented as plug-ins. Due to a flexibility of this environment it can be used both for reserch and applications.

This paper studies conformance checking [1], [3], [6], [7]. Conformance checking uses both an event log and a model, and compares observed behavior written in the log with the behavior produced by the model. The general goal is to find discrepancies between them to improve a model. Conformance checking techniques can also be used for measuring the performance of process discovery algorithms (that restores a model on the basis of a known log) and to repair models that have not got a well alignment with the real behavior of the process.

There are four model’s evaluation criteria: fitness, precision, generalization and simplicity. Fitness measures ”the proportion of behavior in the event log possible according to the model”. Among the four quality criteria, fitness is the most related to the conformance. There are several methods of conformance checking. We consider methods based on replay approach [3]. Replaying a log on a model can help to measure fitness.

An obvious approach to measure fitness is to count the fraction of cases that can be ”parsed completely” (i.e. the proportion of cases corresponding to firing sequences leading from [start] to [end]). Fitness can range from 0 to 1. It is supposed that fitness is equal to 1, if the log perfectly fits the model. When measuring fitness by replaying, we could stop replaying a trace when we face a problem and mark this trace as unsuitable. We get more information about conformance if we continue replaying the trace on the model, and record a count of all missing tokens and all tokens that are pending at the end.

Let us denote the number of produced tokens by p, the

number of consumed tokens by c, the number of missing tokens by m and the number of remaining tokens by r. Initially, when all places are empty, p = c = 0. Then the environment produces a token for the place [start]. Therefore, the p counter is incremented: p ^ p + 1. A log is replayed by consecutive firings of transitions, corresponding to activities of the process. Each transition consumes and produces several tokens and we increase the corresponding variables. If we need an extra token in a place to continue replaying (when the next transition is not enabled), then the m counter must be incremented and the place, that lacks a token, is marked as a place where a token was missed. If by the time of consuming a token from the place [end] there were tokens pending in some other places, the r counter must be increased by the number of the remained tokens, and the places with tokens must be tagged.

The fitness of a trace a of a workflow process model N is defined as follows:

fitness (a, N ) = 1 (1 - m) + 2 (1 - p)

When working with business processes we typically use detailed logs, which present the full report about sequentially executed activities. Since in most information systems logs are generated automatically, keeping detailed records is not a problem. However, large and detailed models are not good to deal with. Such models are not clear and readable for experts. Experts prefer to work with more abstract (high-level) models. More abstract models are easier to construct, understand and analyze. Process models developed by people are, as a rule, not very large and abstract from technical details. So, checking conformance of an abstract model and a low-level event log, generated by an information system, is an important and challenging problem. However, as far as we know, this problem has not been studied in the literature.

In this paper we consider an abstract model in which each separate activity represents a subprocess built from a set of smaller activities. A history of a detailed process behavior is recorded in low-level logs. Process models are represented by workflow nets — a special subclass of Petri nets [2]. We present a method for checking conformance of an abstract model and a low-level event log.

The paper is organized as follows. Section II contains some basic definitions and notions, including Petri nets, event log, perfect fits and refinement. In Section III we give a motivating example of handling a request for a compensation within airline in terms of Petri nets. In Section IV we present a

This work is licensed under the Creative Commons Attribution License.

method for checking conformance between an abstract model and a low-level log. We also give a justification of this method by proving its correctness in the case of perfect fitness. An implementation of our algorithm is described in Section V. Section VI contains some conclusions.

II. Preliminaries

We start with recalling some basic notions from the set theory. Let S be a set. By S* we denote the set of all finite sequences (words) over S.

S = Si U S2 U... U Sn is a partition of S iff Vi, j G [1, n] : S, C S and S, n Sj = 0.

A multiset m over a set S is a mapping m : S — Nat, where Nat is the set of natural numbers (including zero), i.e. a multiset may contain several copies of the same element.

For two multisets m, m' we write m C m' iff Vs G S : m(s) < m'(s) (the inclusion relation). The sum of two multisets m and m' is defined as usual: Vs G S : (m + m')(s) = m(s) + m'(s), the difference is a partial function: Vs G S such that m(s) > m(s') : (m — m')(s) = m(s) — m'(s). By M(S) we denote the set of all finite multisets over S. Non-negative integer vectors are often used to encode finite multisets.

Definition 1 (Petri net). Let P and T be disjoint finite sets of places and transitions and F: (P x T) U (T x P) — Nat. Then N = (P, T, F) is a Petri net. Let A be a finite set of activities. A labeled Petri net is a Petri net with a labeling function A : T — A U {e} which maps every transition to an activity (a transition label) from A, or a special label e, corresponding to an invisible action.

A marking in a Petri net is a function m: P — Nat, mapping each place to some natural number (possibly zero). Thus a marking may be considered as a multiset over the set of places. Pictorially, P-elements are represented by circles, T-elements by boxes, and the flow relation F by directed arcs. Places may carry tokens represented by filled circles. A current marking m is designated by putting m(p) tokens into each place p G P.

For a transition t g T an arc (x, t) is called an input arc, and an arc (t, x) — an output arc; the preset *t and the postset t* are defined as the multisets over P such that *t(p) = F(p, t) and t* (p) = F(t,p) for each p g P.

A transition t G T is enabled in a marking m iff Vp G P m(p) > F(p, t). An enabled transition t may fire yielding a new marking m' =def m — *t +1*, i. e. m'(p) = m(p) —

F(p, t)+F(t,p) for each p G P (denoted m — m', m —— m', or just m — m').

A Workflow-net is a (labeled) Petri net with two special places: i and f. These places are used to mark the beginning and the ending of a workflow process.

Definition 2 (Workflow net). A (labeled) Petri net N =

(P, T, F, A) is called a workflow net (WF-net) iff

1) There is one source place i G P and one sink place

*i*Не можете найти то, что вам нужно? Попробуйте сервис подбора литературы.

f G P s. t. *i = f* = 0;

2) Every node from P U T is on a path from i to f.

i f

WF-net N Extended WF-net t+(N)

Fig. 1. Extending a WF net with initial and final transitions

3) The initial marking in N contains the only token in its source place.

By abuse of notation we denote by i both the source place and the initial marking in a WF-net. Similarly, we use f to denote the final marking in a WF-net N, defined as a marking containing the only token in the sink place f.

Let N = (P, T, F, A) be a WF-net. The extended WF net (EWF-net) N' = (P', T', F', A') is defined as follows: P' = P,T' = T U{ti,tf}, and F' = F U{(ti,i), (f,f>}, where ti,tf are new (not occurring in P, T) nodes. The new transitions t,, tf are labeled with invisible activity e in N', all other transitions in N' have the same labels as in N. In the remainder we will denote such an extended WF net of N as t+(N). The initial marking in an extended WF net contains no tokens. Thus an extended WF net may start a new case at any moment (cf. Fig.1).

Event logs keep a history of process executions.

Definition 3 (Event log). Let A be a finite set of activities. A trace a is a finite sequence of activities, i.e., a G A*. An event log L is a finite multiset of traces, i.e., L G M(A*).

In this paper we study conformance checking. Given a model and an event log we would like to compare the process model behavior and the behavior recorded in the event log. Several metrics for conformance checking were defined in the literature [1]. Among the most important metrics is fitness. Informally speaking, fitness measures the proportion of behavior in the event log possible according to the model.

Definition 4 (Perfect fit). Let N be a WF-net with transition labels from A, an initial marking i, and a final marking f. Let a be a trace over A. We say that a trace a = ai,..., ak perfectly fits N iff there exists a sequence of firings i = m0 — ... — mfc+i = f in N, s.t. the sequence of activities A(ti), A(t2),..., A(tk) after deleting all invisible activities e coincides with a. A log L perfectly fits N iff every trace from L perfectly fits N.

Petri nets can be extended with hierarchy and it is done e.g. in Colored Petri nets (CPN) [8]. Hierarchy allows to develop more compact models with a compositional network structure. In the case of two-level hierarchy there are two models of one process: a high-level (abstract) model and a low-level (refined) model. The high-level model is a model with abstract transitions. An abstract transition refers to a Petri net subprocess model refining the activity represented by this transition. The low-level model can be obtained from an abstract model by substituting subprocess models for abstract transitions.

examine

thoroughly

Fig. 2. An abstract model for handling compensation requests

Fig. 3. A refined model for handling compensation requests, which refines the model in Fig. 2

Definition 5 (Substitution). Let Ni = (Pi, T\, Fi, Ai) be a WF-net, t g T be a transition in Ni. Let also N2 = (P2,T2,F2,A2) be an EWF-net with the initial and final transitions ti,tf correspondingly. We say that a WF-net N3 =

(P3,T3,F3,A) is obtained by a substitution [t — N2] of N2 for t in Ni iff P3 = Pi U P2, T3 = Ti U T2 \ {t}, F3 = Fi U F2 \ {(p,t) | p G *t} \ {(t,p) | p G t*} U {(p,ti) | p G *t} U {(tf ,p) | p G t*},

Definition 6 (Refinement). Let Na,Nr be two WF-nets with sets of activities Aa,Ar correspondingly. Let Aa = ai, a2, . . . , an, and Ar = a1 U A;; U ... U An be a partition of Ar into n subsets, and N1, N2 ,... Nn be EWF-nets with sets of activities Ai,..., An correspondingly. We say that Nr is a refinement of Na via substitutions [a1 — Nr1,a2 — N2,...an — Nn] iff Nr can be obtained from Na by simultaneous substitutions of N* for all t s.t. A(t) = ai.

III. Motivating Example

Let us consider a toy model from [1], which describes handling a request for a compensation within airline. Here customers may request compensations for various reasons. An abstract model of this process (expressed in terms of a Petri net) is presented in Fig.2. Fig.3 presents a refined model of the same process. To avoid congestion of activities’ names in the low-level model in Fig.3 only places inherited from the abstract model are labeled in the picture.

Let us explain the correspondence between the two

models. Let N0, N 1,N2,N3,N4,N5,N6,N7 be EWF-

nets with sets of activities A0 = {t0,t1,t2,t3}, A1 =

{t4,t5,t6,t7,t8,t9}, A2 = {t10,t11,t12}, A3 =

{t13,t14,t15},A4 = {t 16,117,118, }, A5 =

{t 19, t20, t21}, A6 = {t22, t23, t24}, A7 = {t25,t26,t27}

correspondingly. The refined model in Fig. 3 is the refinement

of the abstract model in Fig. 2 via the substitutions

*i*Не можете найти то, что вам нужно? Попробуйте сервис подбора литературы.

[register_request — N-1, examine_thoroughly — N2

, examine_casually — N3, checfc_ticfcet —

N3, decide — N4, reinitiate_request —

N5,pay_compensation — N6, reject_request — N7]. A sample of an event log obtained for the refined model L is shown in Fig.4. Note that for the log L (Fig.4) and the refined model (Fig.3) we have fitness = 1, since the log L is generated by the model.

So, we have an abstract model (as a more simple to understand and analyse) and we want to check conformance of this model to a low-level log. It is obvious that we cannot do it straightforward, since the model is defined in terms of abstract activities, and the log contains low-level activities.

IV. Checking Conformance between an Abstract Model and a Refined Event Log

To check conformance between an abstract model and a low-level log, we first transform the given log into a log over

L = { < t0, t1, t3, t4, t5, t6, t13, t14, t7, t8, t15, t9, t16, t17, t18, t25, t26, t27 >,

< t0, t1, t3, t4, t13, t14, t5, t7, t15, t6, t8, t9, t16, t17, t18, t24, t23, t22 >,

< t0,t1,t3,t13,t10,t11,t14,t15,t12,t16,t17,t18,t24,t23,t22 >,

< t0, t2, t3, t13, t4, t14, t15, t5, t6, t7, t8, t9, t16, t17, t18, t25, t26, t27 >,

< t0, t2, t3, t4, t13, t14, t15, t5, t7, t6, t8, t9, t16, t17, t18, t24, t23, t22 >,

< t0,t1,t3,t10,t11,t13,t12,t14,t15,t16,t17,t18,t19,t20,t21,t10,t11,t13,t14,t15,t12,t16,t17,t18,t25,t26,t27 >,

< t0,t2,t3,t13,t10,t14,t15,t11,t12,t16,t17,t18,t24,t23,t22 >,

< t0,t2,t3,t13,t10,t11,t12,t14,t15,t16,t17,t18,t19,t20,t21,t10,t13,t14,t11,t15,t12,t16,t17,t18,t24,t23,t22 >,

< t0,t2,t3,t13,t14,t10,t11,t15,t12,t16,t17,t18,t25,t26,t27 >}.

Fig. 4. An event log for the refined model in Fig. 3

abstract activities. For this purpose, each low-level activity in the log is replaced by a name of the subprocess (an abstract activity) it belongs to. Hence we get a log with ’’stuttering” abstract activities. This transformation is implemented by the method toHighLevel(), schematically presented in Algorithm 1.

Data: lowlevellog — a list of low-level activities,

hlaction — a set of high-level activities, where for each high-level activity is stored information about its partition into subsets of low-level activities.

Result: highlevellog — a high-level event log. i <— 0;

highlevellog <— 0;

currentLowAction <— lowlevellog[i] while i < lowlevellog.size do

// search of high-level activity,

// subsets of which contains this // low-level activity currentHighAction <— search(hlaction, currentLowAction); if currentHighAction = 0 then

// check of condition that // low-level activity is included // to partition of the current // high-level action while i < lowlevellog.size and currentHighAction.contains(currentLowAction)

do

end

i + 1;

highlevellog.add(currentHighAction);

*i*Не можете найти то, что вам нужно? Попробуйте сервис подбора литературы.

end

end

return highlevellog;

Algorithm 1: Method toHighLevel(), transforming a low-level log into a log over abstract activities

After converting the refined log into notations of the abstract model we get a new log, which is a multiset of sequences of abstract activities. But this still can not be used for the conformance checking because of stuttering actions. Moreover, when we have two concurrent subprocesses, represented by two concurrent abstract activities in an abstract model, stuttering sequences may interleave. To overcome this

transition ai after adding loop

Fig. 5. Extending a transition by adding loop.

problem we transform an abstract model into a model allowing stuttering of each abstract activity. For this purpose we add loops to transitions in the abstract model.

Algorithm 2 schematically describes the method addLoops() for transforming an abstract model by adding loops to abstract transitions (cf . Fig.5).

Now we describe the general algorithm of conformance checking between and abstract model and a low-level log in more details.

Main Algorithm (Converting a refined log to an abstract one and transforming the high-level model for working with result of this conversion).

Let Na = (P, T, F, А) be a Petri-net corresponding to an abstract model of a process over a set of activities Aa. Let also Lr be a low-level event log (a finite multiset of traces) over a set Ar of low-level activities.

Let Aa, Ar be the set of activities in the abstract model and a set of activities in a refined model correspondingly. Denote by and denote trace from it by а^' є Lr where j is a number of trace.

1) Convert Lr to a high-level event log (denote it La) using data about the partition of Ar. Replace each ak є а^ (where k is a number of activities in the trace) to the corresponding activity from Aa for all j and k.

2) Use a rule introduced by us about repetitive activities in La. Replace every sequences of identical activities

Data: Petri net as petrinet, trace from event log as trace Result: modified Petri net

trace <— sort(trace);

// index of current activity index <— 0;

currentActivity <— trace[index]; while index < trace.size - 1 do indexOfNext <— index + 1; nextActivity <— trace [indexOfNext]; if currentActivity==nextActivity then

// check count of transition // with this name in Petri net if countInNet(petrinet, currentActivity) == 1 then

// add loop to Petri net // for current activity addLoop(petrinet, currentActivity);

end

indexOfNext <— indexOfNext + 1; while indexOfNextActivity < trace.size and currentActivity=nextActivity do | indexOfNext <— indexOfNext + 1; end

index <— indexOfNext — 1; currentActivity <— trace [index] ;

end

return petrinet;

end

Algorithm 2: Method addLoops() for model’s transformation by adding loops

in aj (for all j) to one instance of this activity, i.e. [K...ak } — ak ].

3) If the result obtained in the previous steps (La) does not contain traces with repetitive activities (unlike the previous step, they are non-consecutive like {ai,..., aa, ak+1,..., aa,..., an}), then stop, otherwise proceed to the next step.

*i*Не можете найти то, что вам нужно? Попробуйте сервис подбора литературы.

4) Working with each trace individually find all aa, which have repeats in the same trace (see the previous step) and transitions in Na, which correspond to these actions.

5) Add a loop to Na for all transitions from the previous step (denote the current transition by t):

a) Choose one place among pi G P and pi G t* (denote it by p').

b) Choose one place among pi G P and pi G *t (denote it by p'').

c) Add to Na a new transition (denote it by t').

d) Add to Na a new arcs: {(p',t'); (t',p'')}

e) If a number of repeats for aa is even, we have to add one repeat for last instance of it, i. e. aa — aa.

6) Apply any known algorithm for conformance checking of La to Na.

We illustrate the algorithm by applying it to the example that was presented above in Fig. 2 and 4.

First, we convert the event log Lr to a high-level log by applying Algorithm 1. The log obtained as the result of this

is denoted by La and is shown in Fig. 6. Then we apply Algorithm 2 to the abstract model Na and obtain the new model Na, shown in Fig. 7. The model N'a is a stuttering model over abstract activities. And finally we check conformance between the model Na' and the log La by replaying traces from La in Na. It turns out, that all traces from La can be replayed in Na, i.e. the log La perfectly fits N'a. This is not by chance. The following theorem states, that the proposed conformance checking method is stable under perfect fitness. Theorem 1. Let Nr be a refinement of Na and Lr be an event log over the set of activities Ar, i.e. Lr g M(A*). If Lr perfectly fits Nr, then the main algorithm return 1, which is interpreted as Lr perfectly fits Na.

We omit the proof of the theorem, since it is rather technical and straightforward.

V. Implementation

The proposed method for checking conformance of high-level business model to low-level event log is implemented as a plug-in for ProM.

our tool consists of six main classes:

1) TransformerForConformanceChecking class is responsible for interaction with framework and GUI.

2) HighLevelTransition class represents a high-level transition. Each object of this type have a name of appropriate abstract activity and an array of low-level activities, corresponding to this object.

3) Activity class represents an activity and implements forming of activity with data from event log.

4) ConvertorForLowLevelLog class is responsible for implementation of Algorithm 1, i.e. it transforms a low-level event log to an abstract event log.

5) ConvertorForModel class is responsible for implementation of Algorithm 2, i.e. it transforms an abstract model by adding the requisite loops.

VI. Conclusion

Abstract models are much more clear and more understandable than low-level models. But in practice we have only low-level logs, which cannot be used for direct conformance checking. Hence checking conformance of a high-level business model to a low-level event log is an important task to facilitate the expert’s work. In this paper we have presented a method for solving this problem. Also we had developed a ProM plug-in which implements the proposed algorithm.

We have proved, that our method recognizes perfect fitness between an abstract model and a low-level log correctly. This can be considered as a justification of the proposed approach. However, this is not enough. It is very important to check the method on logs with deviations. In the further research we plan experiments with different logs (logs with noise and different kinds of deviations), as well as real application logs, and we shall work on improving the algorithms through the use of found heuristics.

Acknowledgment

This study was carried out within the National Research University Higher School of Economics’ Academic Fund.

L = { < register_request, examine_thoroughly, check_ticket, examine_thoroughly, check_ticket2, examine_thoroughly, decide, reject_request >,

< register_request, examine_thoroughly, check_ticket, examine_thoroughly, check_ticket2, examine_thoroughly, decide,pay_compensation >,

< register_request, check_ticket, examine_casually, check_ticket2, examine_casually2, decide,pay_compensation >,

< register_request, check_ticket, examine_thoroughly, check_ticket2, examine_thoroughly2, decide, reject_request >,

< register_request, examine_thoroughly, check_ticket, examine_thoroughly2, decide, pay _compensation >,

*i*Не можете найти то, что вам нужно? Попробуйте сервис подбора литературы.

< register_request, examine_casually, check_ticket, examine_casually2, check_ticket2, decide, reinitiate_request, examine_casually, check_ticket, examine_casually2, decide, reject_request >,

< register_request, check_ticket, examine_casually, check_ticket2, examine_casually2, decide,pay_compensation >,

< register_request, check_ticket, examine_casually, check_ticket2, decide, reinitiate_request,

examine_casually, check_ticket, examine_casually, check_ticket2, examine_casually, decide, pay _compensation >,

< register_request, check_ticket, examine_casually, check_ticket2, examine_casually2, decide, reject_request >}.

Fig. 6. The abstract event log obtained by applying Algorithm 1 to the initial event log in Fig. 4

examine

thoroughly

renitiate

request

Fig. 7. The abstract model after adding loops (by applying Algorithm 2 to the model in Fig. 2)

References

[1] W.M.P. van der Aalst. Process Mining: Discovery, Conformance and Enhancement of Business Processes. Berlin: Springer-Verlag, 2011.

[2] W.M.P. van der Aalst, K.M. van Hee. Workflow Management: Models, Methods and Systems. Cambridge, MA: MlT Press, 2002.

[3] A. Rozinat, W.M.P. van der Aalst. Conformance Testing: Measuring the Alignment Between Event Logs and Process Models. BETA Working Paper Series, Eindhoven University of Technology, 2005, vol.144, pp 203-210.

[4] B.F. van Dongen, A.K. Alves de Medeiros, H.M.W. Verbeek, A.J.M.M. Weijters, W.M.P. van der Aalst. The ProM framework: A New Era in

Process Mining Tool Support. Lecture Notes in Computer Science, Berlin: Springer Verlag, 2005, vol. 3536, pp. 444-454.

[5] H. M. W. Verbeek, J. C. A. M. Buijs, B. F. van Dongen, W. M. P. van der Aalst. Prom 6: The process mining toolkit. Proc. ofBPM Demonstration, 2010. vol. 615. pp. 3439.

[6] A. Rozinat. Process mining: conformance and extension. Technische Universiteit, Eindhoven, 2010.

[7] A. Adriansyah, B. F. van Dongen, W. M. P. van der Aalst. Conformance Checking Using Cost-Based Fitness Analysis. IEEE 15th International Enterprise Distributed Object Computing Conference, 2011, pp. 55-64.

[8] K. Jensen, and L. M. Kristensen. Coloured Petri Nets: Modelling and Validation of Concurrent Systems. Berlin: Springer-Verlag, 2009.