Interpretable Process Mining: shifting control to end users

Marcello La Rosa
BPM Discipline, Information Systems School
Queensland University of Technology

Process mining in a nutshell
• Inputs: live data, historical data
• Process mining algorithms
• Outputs: process model; conformance report (differences, root-causes…); process performance; A ⇒ B
• Result: “actionable” process knowledge
[Figure: discovered process map with activities Created, Waiting for Triage, Waiting for Support, Waiting for Customer, Waiting for Internal, Waiting for Approval, Escalation, Resolved and Closed; node and arc frequencies omitted]

Process mining methods
• Automated process discovery
• Performance analysis
• Conformance analysis
• Variants analysis
[Figure: discovered process map and A ⇒ B icon illustrating the methods]

Claims handling problem @ Suncorp
[Figure: claims performance chart with regions labelled OK, Good and Bad relative to an expected performance line]

Variants analysis
Given two logs L1 and L2, explain the differences between the two logs
[Figure: two logs compared, one of simple and quick claims and one of simple and slow claims]
S. Suriadi et al.: Understanding Process Behaviours in a Large Insurance Company in Australia: A Case Study. CAiSE 2013

Fantastic, but what’s the catch?

Variants analysis: possible approaches
L1 - Short stay
448 cases
7329 events
L2 - Long stay
363 cases
7496 events
• Manual visual inspection: time-consuming and error-prone, or
• Automated sequence classification…
At an Australian hospital…

Sequence classification
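$t_1 = \langle e_{11}[d_{111}{:}v_{111}, \ldots, d_{11m}{:}v_{11m}],\ e_{12}[d_{121}{:}v_{121}, \ldots, d_{12m}{:}v_{12m}],\ \ldots,\ e_{1p}[d_{1p1}{:}v_{1p1}, \ldots, d_{1pm}{:}v_{1pm}] \rangle$
…
$t_q = \langle e_{q1}[d_{q11}{:}v_{q11}, \ldots, d_{q1m}{:}v_{q1m}],\ e_{q2}[d_{q21}{:}v_{q21}, \ldots, d_{q2m}{:}v_{q2m}],\ \ldots,\ e_{qp}[d_{qp1}{:}v_{qp1}, \ldots, d_{qpm}{:}v_{qpm}] \rangle$
Find a function $F: \mathit{Trace} \to \mathit{Boolean}$ such that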
• F is an accurate approximation of the given labeling
• F is explainable, e.g. set of simple rules

L1 - Short stay
448 cases
7329 events
L2 - Long stay
363 cases
7496 events
106-130 statements
IF |“Nursing Progress Notes”| > 7.5
THEN L1
IF |“Nursing Progress Notes”| ≤ 7.5
AND |“Nursing Assessment”| > 1.5
THEN L2
…
H. Nguyen, M. Dumas, M. La Rosa, F. Maggi, S. Suriadi: Mining Business Process Deviance: A Quest for Accuracy.
CoopIS, 2014
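
The two rules shown above can be read as a tiny rule-based classifier over activity counts. The following is a minimal sketch under that reading: the function name `classify_stay` and the activity "Discharge" are hypothetical, and the remaining rules (elided on the slide) are not reconstructed, so the function returns None when neither rule fires.

```python
from collections import Counter

def classify_stay(trace):
    """Return "L1" or "L2" if one of the two slide rules fires, else None."""
    counts = Counter(trace)  # |"X"| on the slide = number of occurrences of X
    notes = counts["Nursing Progress Notes"]
    assessments = counts["Nursing Assessment"]
    if notes > 7.5:
        return "L1"
    if notes <= 7.5 and assessments > 1.5:
        return "L2"
    return None  # the other rules are elided on the slide

# Hypothetical traces, for illustration only
trace_a = ["Nursing Progress Notes"] * 8 + ["Discharge"]
trace_b = ["Nursing Progress Notes"] * 3 + ["Nursing Assessment"] * 2
print(classify_stay(trace_a), classify_stay(trace_b))
```

Each rule is easy to read in isolation; the interpretability problem raised on the slide comes from having 106-130 of them.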

Process mining methods
• Automated process discovery
• Performance analysis
• Conformance analysis
• Log delta analysis
[Figure: discovered process map and A ⇒ B icon illustrating the methods]

Conformance analysis
1. Compliance auditing
• detect deviations with respect to a normative model (unfitting behavior)
2. Model maintenance
• unfitting behavior
• additional model behavior
3. Automated process model discovery
• Iterative model improvement

Given an event log L and a process model M, explain the
differences between L and M in terms of process behavior
Log Model

State of the art: Trace alignment
[Figure: trace alignment of a log trace against a run of the process model]
W. van der Aalst, A. Adriansyah, B. van Dongen: Replaying history on process models for conformance checking and performance analysis.
Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 2(2), 2012
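
To make the idea of an alignment concrete, here is a simplified sketch. It aligns a trace against an explicit, finite list of model runs using dynamic programming, with unit cost for a move in the log only or in the model only and zero cost for a synchronous move. This is an assumption made for illustration: alignment techniques in the literature search the state space of the model rather than enumerating runs, and the trace and runs below are hypothetical.

```python
SKIP = ">>"  # marks the side that does not move

def align(trace, run):
    """Return (cost, moves) of a cheapest alignment of trace against run."""
    n, m = len(trace), len(run)
    # cost[i][j] = minimal cost of aligning trace[:i] with run[:j]
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i
    for j in range(1, m + 1):
        cost[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            best = min(cost[i - 1][j], cost[i][j - 1]) + 1
            if trace[i - 1] == run[j - 1]:
                best = min(best, cost[i - 1][j - 1])
            cost[i][j] = best
    # Backtrack to recover the sequence of moves
    moves, i, j = [], n, m
    while i > 0 or j > 0:
        if (i > 0 and j > 0 and trace[i - 1] == run[j - 1]
                and cost[i][j] == cost[i - 1][j - 1]):
            moves.append((trace[i - 1], run[j - 1]))  # synchronous move
            i, j = i - 1, j - 1
        elif i > 0 and cost[i][j] == cost[i - 1][j] + 1:
            moves.append((trace[i - 1], SKIP))        # move in the log only
            i -= 1
        else:
            moves.append((SKIP, run[j - 1]))          # move in the model only
            j -= 1
    moves.reverse()
    return cost[n][m], moves

def best_alignment(trace, runs):
    """Pick the cheapest alignment over a finite list of model runs."""
    return min((align(trace, run) for run in runs), key=lambda res: res[0])

cost, moves = best_alignment("ABBC", ["ABC", "ABD"])  # hypothetical data
print(cost, moves)
```

The output is one list of moves per trace, which is why the diagnosis stays at the level of individual traces.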

Trace alignment: typical output
[Figure: list of aligned traces over activities A to K, with each activity colour-coded according to the legend below]
Legend
• Activity occurs both in the model and the log
• Activity occurs in the log only, but occurs in the model in another path
• Activity occurs in the model only, but occurs in the log in another trace
• Activity occurs in the model only and is not observed anywhere in the log

Trace alignment: shortcomings
Designed to identify the number and exact location of
the differences
Doesn’t provide a “high-level” diagnosis that easily
allows analysts to pinpoint differences:
• Unable to identify differences across traces
• Unable to fully characterize extra model behavior not
present in the log

No unified foundation…
Variants analysis
• Model delta analysis, sequence classification
≠
Conformance analysis
• Trace alignment, token replay, negative events…

Solution requirements
Identify all differences between the process behaviors:
• of two logs (variants analysis)
• of a model and a log (conformance analysis)
Describe each difference via a natural language statement
Fully automated, scalable

An example (conformance analysis)
Desired conformance output:
• task C is optional in the log
• the cycle including IGDF is not observed in the log
Log
ABCDEH
ACBDEH
ABCDFH
ACBDFH
ABDEH
ABDFH
Model
ABDEH
ABDFH

Prime Event Structure (PES) as a unifying foundation
Model of concurrency based on events (occurrences
of process activities) and three relations:
• Causality
• Conflict
• Concurrency
causal
conflict
concurrent
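
As a rough illustration of the structure described above, a PES can be stored as labelled events plus explicit causality and conflict relations, with concurrency derived as "neither causal nor in conflict". The class below is a minimal sketch, not the representation used in the cited work; the class name, field names and the four-event example are hypothetical, and both relations are assumed to be given already closed (causality transitively, conflict by inheritance).

```python
from dataclasses import dataclass

@dataclass
class PES:
    labels: dict     # event id -> activity label
    causality: set   # pairs (e, f): e causally precedes f
    conflict: set    # frozenset({e, f}): e and f never occur in the same run

    def relation(self, e, f):
        """Classify the relation between two distinct events."""
        if (e, f) in self.causality:
            return "causal"
        if (f, e) in self.causality:
            return "inverse causal"
        if frozenset((e, f)) in self.conflict:
            return "conflict"
        return "concurrent"  # neither ordered nor mutually exclusive

# Hypothetical example: A precedes everything, B and C are concurrent,
# D is an alternative to both B and C.
pes = PES(
    labels={"e0": "A", "e1": "B", "e2": "C", "e3": "D"},
    causality={("e0", "e1"), ("e0", "e2"), ("e0", "e3")},
    conflict={frozenset(("e1", "e3")), frozenset(("e2", "e3"))},
)
print(pes.relation("e1", "e2"), pes.relation("e1", "e3"))
```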

From log to PES
Log
Trace      Ref   N
A B C E    t1    3
A C B E    t2    2
A B E      t3    2
A D E      t4    3
Runs
p1 (from t1, t2): e0:A, e1:B, e2:C, e3:E
p2 (from t3): f0:A, f1:B, f2:E
p3 (from t4): g0:A, g1:D, g2:E
PES
{e0,f0,g0}:A
{e1,f1}:B
{f2}:E {e3}:E {g2}:E
{e2}:C {g1}:D
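
The grouping of traces into runs (t1, t2 → p1; t3 → p2; t4 → p3) can be reproduced with a small sketch. It assumes a simple concurrency oracle, where two activities are treated as concurrent if they directly follow each other in both orders somewhere in the log; this oracle and all function names are assumptions made for illustration, not necessarily what the approach uses. Each trace is turned into a partial order by dropping the ordering between concurrent activities, and traces with the same partial order form one run.

```python
from collections import defaultdict

def concurrent_pairs(traces):
    """Activities observed directly following each other in both orders."""
    follows = {(t[i], t[i + 1]) for t in traces for i in range(len(t) - 1)}
    return {frozenset((a, b)) for (a, b) in follows
            if a != b and (b, a) in follows}

def to_run(trace, conc):
    """Turn a trace into a partial order: (events, ordered event pairs)."""
    seen = defaultdict(int)
    events = []
    for label in trace:
        events.append((label, seen[label]))  # k-th occurrence of the label
        seen[label] += 1
    order = frozenset(
        (events[i], events[j])
        for i in range(len(events))
        for j in range(i + 1, len(events))
        if frozenset((events[i][0], events[j][0])) not in conc
    )
    return frozenset(events), order

def group_into_runs(log):
    """Group trace references whose traces yield the same partial order."""
    conc = concurrent_pairs(list(log.values()))
    runs = defaultdict(list)
    for ref, trace in log.items():
        runs[to_run(trace, conc)].append(ref)
    return list(runs.values())

log = {"t1": "ABCE", "t2": "ACBE", "t3": "ABE", "t4": "ADE"}
groups = group_into_runs(log)
print(groups)  # [['t1', 't2'], ['t3'], ['t4']]
```

Merging the resulting runs on their common prefixes (here, the shared initial A) is the remaining step towards the PES and is not shown.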

From model to PES
BPMN model
Petri net
Branching process

From model to PES
Branching process
Complete prefix unfolding
[Figure labels: cutoff events and their corresponding events]

From model to PES
Complete prefix unfolding
PES

Loop relations
[Figure: loop relations illustrated on activities A, B, C and D]

Comparing PESs
Log
Trace      Ref   N
A B C E    t1    3
A C B E    t2    2
A B E      t3    2
A D E      t4    3
Log PES
e0:A
e1:B e2:C e3:D
e4:E e5:E e6:E
Model
[Figure: process model with activities A, B, C, D and E]
Model PES
f0:A
f1:B f2:C f3:D
f4:E f5:E

Comparing PESs
Log PES
e0:A
e1:B e2:C e3:D
e4:E e5:E e6:E
Model PES
f0:A
f1:B f2:C f3:D
f4:E f5:E
PSP states (m: matched event pairs; lh, rh: events hidden on the log side and on the model side)
• start: lh = {}, rh = {}, m = {}
• match A: lh = {}, rh = {}, m = {(e0,f0)A}
• match B: lh = {}, rh = {}, m = {(e0,f0)A,(e1,f1)B}
• match C: lh = {}, rh = {}, m = {(e0,f0)A,(e1,f1)B,(e2,f2)C}
• rhide C (alternative to match C): lh = {}, rh = {f2:C}, m = {(e0,f0)A,(e1,f1)B}
• match E (after match C): lh = {}, rh = {}, m = {(e0,f0)A,(e1,f1)B,(e2,f2)C,(e5,f4)E}

Resulting difference statement: in the log, C is optional after {A,B}, whereas in the model it is not
(PSP edges shown: match C, match D)

Mismatch patterns (conformance analysis)
Unfitting behavior patterns:
• Relation mismatch patterns
1. Causality-Concurrency
2. Conflict
• Event mismatch patterns
3. Task skipping
4. Task substitution
5. Unmatched repetition
6. Task relocation
7. Task insertion / absence
Additional model behavior patterns:
8. Unobserved acyclic interval
9. Unobserved cyclic interval
L. Garcia-Banuelos, N.R. van Beest, M. Dumas, M. La Rosa, W. Mertens, Complete and Interpretable Conformance Checking of Business
Processes, IEEE Transactions on Software Engineering, 2017
3. Task skipping
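
Each mismatch pattern is reported as a natural-language statement. One simple way to picture this is a template per pattern, filled with the tasks involved. The sketch below is illustrative only: the pattern keys and slot names are hypothetical, and the assignment of each template to a pattern is an assumption based on the wording of the example statements in this deck.

```python
# Templates modelled on example statements from the deck (assumed mapping)
TEMPLATES = {
    "task_skipping":
        "In the log, {task} is optional after {context}, "
        "whereas in the model it is not",
    "task_substitution":
        "In the log, after {context}, {expected} is substituted by {observed}",
    "causality_concurrency":
        "In the model, after {context}, {first} occurs before {second}, "
        "while in the log they are concurrent",
    "unobserved_cyclic_interval":
        "In the log, the cycle involving {tasks} does not occur",
}

def verbalize(pattern, **slots):
    """Fill the template of the given pattern with the supplied slot values."""
    return TEMPLATES[pattern].format(**slots)

statement = verbalize("task_skipping", task="C", context="{A,B}")
print(statement)
```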

Additional model behavior: precision vs generalization
Log
⟨A⟩
⟨A,A⟩
⟨A,A,A⟩
In the log, the cycle involving [A] does not occur

Additional model behavior: precision vs generalization
Log
⟨A⟩
⟨A,A⟩
⟨A,A,A⟩
No difference found!

Unobserved cyclic interval: PES and PES prefix unfolding
[Figure: log PES and model PES over activities A, B, C and D]

Approach recap
Event log → merge → PES_L
Input model → unfold → PES_M
PES_L, PES_M → compare → Partially Synchronized Product (PSP)
PSP → extract differences → Difference statements

Coming back to our example (variants analysis)
L1 - Short stay
448 cases
7329 events
L2 - Long stay
363 cases
7496 events
106-130 statements
IF |“Nursing Progress Notes”| > 7.5
THEN L1
IF |“Nursing Progress Notes”| ≤ 7.5
AND |“Nursing Assessment”| > 1.5
THEN L2
…
Our approach (PSP-based)
48 statements
In L1, “Nursing Primary Assessment” is repeated after “Medical Assign”
and “Triage Request”, while in L2 it is not
…
N.R. van Beest, L. Garcia-Banuelos, M. Dumas, M. La Rosa, Log Delta Analysis: Interpretable Differencing of Business Process Event Logs.
BPM 2015: 386-405

Evaluation (conformance analysis)
1. Qualitative evaluation on real life process:
• Traffic fines management process in Italy
(2000-2013; 150,370 traces; 231 distinct traces)
2. Quantitative evaluation on two large process model collections:
• IBM Business Integration Unit (BIT): 735 models
• SAP R/3: 604 models
3. User evaluation (academics vs practitioners)

Qualitative evaluation: traffic fines model
[Figure: traffic fines process model, from Start to End, with activities Create Fine, Payment, Send Fine, Insert Fine Notification, Add Penalty, Appeal to Judge, Send for Credit Collection, Insert Date Appeal to Prefecture, Send Appeal to Prefecture, Receive Result Appeal from Prefecture, Notify Result Appeal to Offender, and a silent transition Tau10]
Created from the process specification

Qualitative evaluation: trace alignment output
406 misalignments to inspect (out of 412 alignments)!

Qualitative evaluation: output of our approach
15 statements, e.g.
1. In the log, “Send for credit collection” occurs after
“Payment”
2. In the model, after “Insert fine notification”, “Add penalty”
occurs before “Appeal to judge”, while in the log they are
concurrent
3. In the log, after “Add penalty”, “Receive results appeal from
prefecture” is substituted by “Appeal to judge”
4. In the log, the cycle involving “Insert date appeal to
prefecture, Send appeal to prefecture, Receive result appeal
from prefecture, Notify result appeal to offender” does not
occur after “Insert fine notification”
5. …
Callouts on the statements above:
• Cannot be detected by trace alignment, as diagnostics are provided at the level of individual traces
• Cannot be entirely detected by trace alignment, as this difference concerns additional model behavior

Quantitative evaluation
• For each model in the SAP R/3 and IBM BIT collections, we
generated an event log artificially
• Injected different levels of noise (0-20%) to simulate differences
• Total logs: 712
Results:
• Generally slower, but reasonable execution times: < 10 sec
• Extreme cases (8,000+ events, 15-20% noise): < 2 min
• Consistently more compact diagnosis than trace alignment
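
The slide does not say how noise was injected. As one possible reading, the sketch below perturbs each trace with probability equal to the noise level by deleting an event, inserting a random activity, or swapping two adjacent events. The function name, the three operations and the seed handling are all assumptions made for illustration.

```python
import random

def inject_noise(log, level, seed=0):
    """Return a copy of the log where each trace is perturbed with probability `level`."""
    rng = random.Random(seed)  # fixed seed for reproducibility
    alphabet = sorted({activity for trace in log for activity in trace})
    noisy = []
    for trace in log:
        events = list(trace)
        if events and rng.random() < level:
            op = rng.choice(["delete", "insert", "swap"])
            if op == "delete":
                del events[rng.randrange(len(events))]
            elif op == "insert":
                events.insert(rng.randrange(len(events) + 1), rng.choice(alphabet))
            elif len(events) > 1:
                pos = rng.randrange(len(events) - 1)
                events[pos], events[pos + 1] = events[pos + 1], events[pos]
        noisy.append(events)
    return noisy

clean = [list("ABCDE") for _ in range(10)]  # hypothetical generated log
noisy = inject_noise(clean, level=0.2)
print(sum(1 for a, b in zip(clean, noisy) if a != b), "traces perturbed")
```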

User evaluation
Online survey:
• Simple Petri net model with 31 nodes, created from a real-life
claims handling process
• small size to avoid understandability bias
• anonymized to avoid domain bias
• Accompanied by a log with 53 traces
Output of trace alignment (misalignments)
vs
Output of our approach (list of statements)

User evaluation
Respondents stated their experience (years, models created and analyzed) and
expertise in Petri nets (familiarity, competence and confidence)
Respondents compared both approaches using the Technology Acceptance Model:
1. What is the easiest approach for checking the conformance of an event log to
a process model?
2. What is the easiest approach for identifying the differences between a process
model and an event log?
3. What is the most useful approach for checking the conformance of an event
log to a process model?
4. What is the most useful approach for identifying the differences between a
process model and an event log?
5. Which approach would you likely use for checking the conformance of an
event log to a process model?
6. Which approach would you likely use for identifying the differences between a
process model and an event log?

User evaluation: population
Academics (38 responses)
• Expertise: more familiar, confident and competent in working with Petri nets
• Experience: analysed and created more models in the past 12 months
Professionals (33 responses)
• Less expert and experienced with Petri nets
• Mostly rely on professional training (higher than academics)

User evaluation: hypotheses
H1: respondents would have a preference for verbalization
H2: respondents with less experience, familiarity, confidence and
competence in the use of Petri nets would have a stronger
preference for verbalization

User evaluation: results
H1: preference for verbalization
• Tested for the full sample and for the two cohorts separately
• For the full sample there is no general preference for our approach: the
median was zero (“neutral”)
• Professionals did show a preference for verbalization (especially along
ease of use) while academics preferred alignment
• H1 is supported for the professionals cohort only
H2: little knowledge of Petri nets -> stronger preference
• Respondents with more experience with and expertise in Petri nets have
a stronger preference for alignments
• H2 is supported

Pushing it a bit further… Process model repair
• Rank statements based on impact
• Visualize differences on top of BPMN model
• Repair process model interactively and incrementally
A. Armas Cervantes, N. van Beest, M. La Rosa, M. Dumas, L. Garcia-Banuelos, Interactive and Incremental Business Process Model Repair,
CoopIS 2017

Tool support: Apromore (apromore.org)
• Open-source BPM analytics platform as Software as a Service
• Focus is on end users (business analysts), not on data scientists
• 50+ plugins

Nirdizati: predictive process monitoring (nirdizati.com)
• Predict process outcome (e.g. “Is this loan offer going to be rejected?”)
• Predict process performance (e.g. “Will this claim take longer than 5 days to be
handled?”)
• Predict future events (e.g. “What activity is likely to be executed next? And after that?”)

BPM Discipline
Information Systems School
Science & Engineering Faculty
Queensland University of Technology
m.larosa@qut.edu.au
marcellolarosa.com
@mlr80
