This scientific presentation proposes event structures as a unifying foundation for two process mining problems: variants analysis and conformance analysis. Whether one compares two event logs, each capturing a different business process variant (variants analysis), or an event log capturing the actual process behavior against a normative process model (conformance analysis), event structures can be derived from both logs and models and then compared with each other. The result is a set of natural-language statements, together with visualizations overlaid on a BPMN model, that describe the differences between the two logs or between a log and a model. Unlike the output of state-of-the-art techniques for variants analysis and conformance analysis, these results can be interpreted directly by an end user such as a business analyst.
3. Process mining in a nutshell
[Figure: process mining overview. Algorithms take live data and historical data and produce a process model, a conformance report (differences, root-causes…), process performance insights and rules of the form A ⇒ B, i.e. “actionable” process knowledge.]
[Figure: discovered process map annotated with frequencies. Nodes: Created (4,360), Waiting for Support (12,587), Waiting for Customer (8,681), Resolved (5,023), Closed (4,360), Waiting for Internal (923), Escalation (42), Waiting for Approval (14), Waiting for Triage (31).]
6. Variants analysis
Given two logs L1 and L2, explain the differences between the two logs
[Figure: process models of two variants, labeled “Simple claims and quick” and “Simple claims and slow”]
S. Suriadi et al.: Understanding Process Behaviours in a Large Insurance Company in Australia: A Case Study. CAiSE 2013
9. Variants analysis: possible approaches
At an Australian hospital…
L1 - Short stay: 448 cases, 7329 events
L2 - Long stay: 363 cases, 7496 events
• Manual visual inspection: time-consuming and error-prone, or
• Automated sequence classification…
10. Variants analysis: possible approaches
Sequence classification
t1: <e11[d111:v111, …, d11m:v11m] e12[d121:v121, …, d12m:v12m] … e1p[d1p1:v1p1, …, d1pm:v1pm]>
…
tq: <eq1[dq11:vq11, …, dq1m:vq1m] eq2[dq21:vq21, …, dq2m:vq2m] … eqp[dqp1:vqp1, …, dqpm:vqpm]>
Find a function F: Trace → Boolean such that
• F is an accurate approximation of the given labeling
• F is explainable, e.g. a set of simple rules
11. Variants analysis: possible approaches
L1 - Short stay: 448 cases, 7329 events
L2 - Long stay: 363 cases, 7496 events
Sequence classification: 106-130 statements, e.g.
IF |“Nursing Progress Notes”| > 7.5 THEN L1
IF |“Nursing Progress Notes”| ≤ 7.5 AND |“Nursing Assessment”| > 1.5 THEN L2
…
H. Nguyen, M. Dumas, M. La Rosa, F. Maggi, S. Suriadi: Mining Business Process Deviance: A Quest for Accuracy. CoopIS 2014
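The two rules quoted above can be read as one possible function F: Trace → Boolean. The following is a minimal sketch, assuming that a trace is reduced to a list of activity labels and that |“X”| denotes the number of occurrences of X in the trace. The function name `classify` and the sample traces are hypothetical, and the rules elided on the slide are not filled in.

```python
from collections import Counter


def classify(trace):
    """Apply the two rules quoted on the slide; return None when neither fires."""
    counts = Counter(trace)
    if counts["Nursing Progress Notes"] > 7.5:
        return "L1"
    if counts["Nursing Progress Notes"] <= 7.5 and counts["Nursing Assessment"] > 1.5:
        return "L2"
    return None  # the remaining rules are elided on the slide


# Hypothetical traces, for illustration only.
short_stay = ["Nursing Progress Notes"] * 8
long_stay = ["Nursing Assessment"] * 2 + ["Nursing Progress Notes"]
```

With more than a hundred such threshold rules, the classifier may be accurate, but an analyst still has to work out what the thresholds mean for the process.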
13. Conformance analysis
1. Compliance auditing
• detect deviations with respect to a normative model (unfitting behavior)
2. Model maintenance
• unfitting behavior
• additional model behavior
3. Automated process model discovery
• iterative model improvement
14. Conformance analysis
Given an event log L and a process model M, explain the differences between L and M in terms of process behavior
[Figure: a log and a model side by side]
15. State of the art: Trace alignment
[Figure: a log trace aligned step by step with a run of the model; activity labels A, B, C, D, E, H]
W. van der Aalst, A. Adriansyah, B. van Dongen: Replaying history on process models for conformance checking and performance analysis. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 2(2), 2012
16. Trace alignment: typical output
[Figure: one alignment per trace over activities A to K, with each position marked according to the legend]
Legend
• Activity occurs in the log only, but occurs in the model in another path
• Activity occurs in the model only and is not observed anywhere in the log
• Activity occurs in the model only, but occurs in the log in another trace
• Activity occurs both in the model and the log
17. Trace alignment: shortcomings
Designed to identify the number and exact location of the differences
Doesn’t provide a “high-level” diagnosis that easily allows analysts to pinpoint differences:
• Unable to identify differences across traces
• Unable to fully characterize extra model behavior not present in the log
19. Solution requirements
Identify all differences between the process behaviors:
• of two logs (variants analysis)
• of a model and a log (conformance analysis)
Describe each difference via a natural language statement
Fully automated, scalable
20. An example (conformance analysis)
Desired conformance output:
• task C is optional in the log
• the cycle including IGDF is not observed in the log
Log
ABCDEH
ACBDEH
ABCDFH
ACBDFH
ABDEH
ABDFH
Model
ABDEH
ABDFH
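To make the first desired statement concrete, here is a rough heuristic sketch. It is not the method presented in this talk: it simply flags an activity as skippable when some trace containing it also occurs in the log with that activity removed. The function name `skippable` is hypothetical, and the input is the six traces listed under “Log” above.

```python
def skippable(log):
    """Return activities a such that some trace with a also occurs in the log without a."""
    traces = set(log)
    activities = set().union(*traces)  # every single-letter activity label
    return sorted(
        a for a in activities
        if any(a in t and t.replace(a, "") in traces for t in traces)
    )


log = ["ABCDEH", "ACBDEH", "ABCDFH", "ACBDFH", "ABDEH", "ABDFH"]
optional = skippable(log)
```

On this log only C is flagged. E and F are not, because they are alternatives to each other rather than optional steps. The second statement (the unobserved cycle) cannot be derived from the log alone, since it concerns behavior that exists only in the model.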
21. Prime Event Structure (PES) as a unifying foundation
Model of concurrency based on events (occurrences of process activities) and three relations:
• Causality
• Conflict
• Concurrency
[Figure: example event structure with causal, conflict and concurrent relations marked]
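The three relations can be sketched as a small data structure. This is an illustrative sketch only: the class name `PES`, its fields and the example events are assumptions made for this note, not an API from the talk or from any tool. Direct causes and direct conflicts are stored explicitly, conflict is inherited by causal successors, and any two events that are neither causally ordered nor in conflict are treated as concurrent.

```python
from dataclasses import dataclass, field


@dataclass
class PES:
    labels: dict                                 # event id -> activity label
    causes: dict                                 # event id -> set of direct causal predecessors
    conflicts: set = field(default_factory=set)  # frozensets {x, y} of directly conflicting events

    def ancestors(self, event):
        """Return every event that causally precedes `event`."""
        found, stack = set(), list(self.causes.get(event, ()))
        while stack:
            pred = stack.pop()
            if pred not in found:
                found.add(pred)
                stack.extend(self.causes.get(pred, ()))
        return found

    def relation(self, a, b):
        """Classify two distinct events as causality, conflict or concurrency."""
        if a in self.ancestors(b) or b in self.ancestors(a):
            return "causality"
        left = self.ancestors(a) | {a}
        right = self.ancestors(b) | {b}
        # Conflict is inherited: it propagates to all causal successors.
        if any(frozenset((x, y)) in self.conflicts for x in left for y in right):
            return "conflict"
        return "concurrency"


# Hypothetical example: A, then either (B and C in any order) or D.
pes = PES(
    labels={"a": "A", "b": "B", "c": "C", "d": "D"},
    causes={"b": {"a"}, "c": {"a"}, "d": {"a"}},
    conflicts={frozenset(("b", "d")), frozenset(("c", "d"))},
)
```

For example, `pes.relation("b", "c")` yields concurrency, while `pes.relation("b", "d")` yields conflict.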
22. From log to PES
Log
Trace    Ref  N
A B C E  t1   3
A C B E  t2   2
A B E    t3   2
A D E    t4   3
Runs
p1 (t1, t2): e0:A, e1:B, e2:C, e3:E
p2 (t3): f0:A, f1:B, f2:E
p3 (t4): g0:A, g1:D, g2:E
PES
{e0,f0,g0}:A
{e1,f1}:B  {e2}:C  {g1}:D
{f2}:E  {e3}:E  {g2}:E
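The first step, grouping traces into runs, can be sketched as follows. The concurrency criterion used here is an assumption made for illustration (two activities are taken to be concurrent when they directly follow each other in both orders somewhere in the log); the slide does not state which criterion is used. All function names are hypothetical. Each trace is turned into a partial order by dropping the ordering between concurrent activities, and traces with the same partial order fall into the same run.

```python
from collections import Counter
from itertools import combinations


def concurrent_pairs(traces):
    """Assumed oracle: a and b are concurrent if both a,b and b,a occur as direct successions."""
    follows = {(a, b) for t in traces for a, b in zip(t, t[1:])}
    return {frozenset((a, b)) for (a, b) in follows if a != b and (b, a) in follows}


def run_of(trace, conc):
    """Partial order of a trace: events plus ordering edges between non-concurrent events."""
    seen = Counter()
    events = []
    for label in trace:
        events.append((label, seen[label]))  # (activity, occurrence index)
        seen[label] += 1
    edges = frozenset(
        (events[i], events[j])
        for i, j in combinations(range(len(events)), 2)
        if frozenset((events[i][0], events[j][0])) not in conc
    )
    return frozenset(events), edges


log = {"t1": "ABCE", "t2": "ACBE", "t3": "ABE", "t4": "ADE"}
conc = concurrent_pairs(log.values())

runs = {}
for ref, trace in log.items():
    runs.setdefault(run_of(trace, conc), []).append(ref)
groups = sorted(runs.values())
```

On the example log this groups t1 and t2 into one run and leaves t3 and t4 as separate runs, matching the mapping to p1, p2 and p3 shown on the slide. Merging the runs on common prefixes into a single PES is a further step not shown here.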
23. From model to PES
[Figure: a BPMN model is transformed into a Petri net, which is unfolded into a branching process]
24. From model to PES
[Figure: the branching process is truncated to a complete prefix unfolding; cutoff events and their corresponding events are marked]
27. Comparing PESs
Log
Trace    Ref  N
A B C E  t1   3
A C B E  t2   2
A B E    t3   2
A D E    t4   3
[Figure: process model with activities A, B, C, D, E]
Log PES
e0:A
e1:B e2:C e3:D
e4:E e5:E e6:E
Model PES
f0:A
f1:B f2:C f3:D
f4:E f5:E
28. Comparing PESs
Log PES
e0:A
e1:B e2:C e3:D
e4:E e5:E e6:E
Model PES
f0:A
f1:B f2:C f3:D
f4:E f5:E
Search states (m = matched event pairs; lh / rh = events hidden on the left (log) / right (model) side):
• start: lh = {}, rh = {}, m = {}
• match A: lh = {}, rh = {}, m = {(e0,f0)A}
• match B: lh = {}, rh = {}, m = {(e0,f0)A,(e1,f1)B}
• match C: lh = {}, rh = {}, m = {(e0,f0)A,(e1,f1)B,(e2,f2)C}
• rhide C: lh = {}, rh = {f2:C}, m = {(e0,f0)A,(e1,f1)B}
• match E: lh = {}, rh = {}, m = {(e0,f0)A,(e1,f1)B,(e2,f2)C,(e5,f4)E}
34. Comparing PESs
[Same log PES, model PES and search states as above, with two further steps shown: match D, match C]
In the log, C is optional after {A,B}, whereas in the model it is not
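The statement above can be reproduced with a much simpler comparison than the match/hide search shown on the slides. The sketch below is not that search: it enumerates the configurations of each PES and compares which activities are enabled after the same history of labels. The causality and conflict relations are an assumed reading of the two PES figures (e4 is taken to be E after B alone, e5 to be E after B and C, e6 and f5 to be E after D), and all function names are hypothetical.

```python
def ancestors(pes, e):
    """Return every event that causally precedes e."""
    found, stack = set(), list(pes["causes"].get(e, ()))
    while stack:
        p = stack.pop()
        if p not in found:
            found.add(p)
            stack.extend(pes["causes"].get(p, ()))
    return found


def in_conflict(pes, a, b):
    """Conflict is inherited along causality."""
    left, right = ancestors(pes, a) | {a}, ancestors(pes, b) | {b}
    return any(frozenset((x, y)) in pes["conflicts"] for x in left for y in right)


def enabled(pes, config):
    """Events whose direct causes are all in config and that conflict with nothing in it."""
    return {
        e for e in pes["labels"]
        if e not in config
        and pes["causes"].get(e, set()) <= config
        and not any(in_conflict(pes, e, c) for c in config)
    }


def next_labels(pes):
    """Map each reachable history (sorted tuple of labels) to the labels enabled after it."""
    result, stack, seen = {}, [frozenset()], set()
    while stack:
        config = stack.pop()
        if config in seen:
            continue
        seen.add(config)
        history = tuple(sorted(pes["labels"][e] for e in config))
        options = enabled(pes, config)
        result.setdefault(history, set()).update(pes["labels"][e] for e in options)
        stack.extend(config | {e} for e in options)
    return result


# Assumed relations for the log PES and the model PES shown on the slide.
log_pes = {
    "labels": {"e0": "A", "e1": "B", "e2": "C", "e3": "D", "e4": "E", "e5": "E", "e6": "E"},
    "causes": {"e1": {"e0"}, "e2": {"e0"}, "e3": {"e0"},
               "e4": {"e1"}, "e5": {"e1", "e2"}, "e6": {"e3"}},
    "conflicts": {frozenset(("e1", "e3")), frozenset(("e2", "e3")), frozenset(("e2", "e4"))},
}
model_pes = {
    "labels": {"f0": "A", "f1": "B", "f2": "C", "f3": "D", "f4": "E", "f5": "E"},
    "causes": {"f1": {"f0"}, "f2": {"f0"}, "f3": {"f0"}, "f4": {"f1", "f2"}, "f5": {"f3"}},
    "conflicts": {frozenset(("f1", "f3")), frozenset(("f2", "f3"))},
}

log_next, model_next = next_labels(log_pes), next_labels(model_pes)
# For each shared history: (labels enabled only in the log, labels enabled only in the model).
differences = {
    h: (log_next[h] - model_next[h], model_next[h] - log_next[h])
    for h in log_next.keys() & model_next.keys()
    if log_next[h] != model_next[h]
}
```

Under these assumptions the only difference is found after the history {A, B}: the log can proceed directly to E, whereas the model can only continue with C. This is the behavior verbalized as “C is optional after {A,B}”.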
35. Mismatch patterns (conformance analysis)
Unfitting behavior patterns:
• Relation mismatch patterns
1. Causality-Concurrency
2. Conflict
• Event mismatch patterns
3. Task skipping
4. Task substitution
5. Unmatched repetition
6. Task relocation
7. Task insertion / absence
Additional model behavior patterns:
8. Unobserved acyclic interval
9. Unobserved cyclic interval
L. Garcia-Banuelos, N.R. van Beest, M. Dumas, M. La Rosa, W. Mertens: Complete and Interpretable Conformance Checking of Business Processes. IEEE Transactions on Software Engineering, 2017
3. Task skipping
36. Additional model behavior: precision vs generalization
Log
⟨A⟩
⟨A,A⟩
⟨A,A,A⟩
In the log, the cycle involving [A] does not occur
40. Coming back to our example (variants analysis)
At an Australian hospital…
L1 - Short stay: 448 cases, 7329 events
L2 - Long stay: 363 cases, 7496 events
Sequence classification: 106-130 statements, e.g.
IF |“Nursing Progress Notes”| > 7.5 THEN L1
IF |“Nursing Progress Notes”| ≤ 7.5 AND |“Nursing Assessment”| > 1.5 THEN L2
…
Our approach (PSP-based): 48 statements, e.g.
In L2, “Nursing Primary Assessment” is repeated after “Medical Assign” and “Triage Request”, while in L1 it is not
…
N.R. van Beest, L. Garcia-Banuelos, M. Dumas, M. La Rosa: Log Delta Analysis: Interpretable Differencing of Business Process Event Logs. BPM 2015: 386-405
41. Evaluation (conformance analysis)
1. Qualitative evaluation on a real-life process:
• Traffic fines management process in Italy
(2000-2013; 150,370 traces; 231 distinct traces)
2. Quantitative evaluation on two large process model collections:
• IBM Business Integration Unit (BIT): 735 models
• SAP R/3: 604 models
3. User evaluation (academics vs practitioners)
42. Qualitative evaluation: traffic fines model
Created from the process specification
[Figure: process model from Start to End with tasks Create Fine, Payment, Send Fine, Insert Fine Notification, Add Penalty, Appeal to Judge, Send for Credit Collection, Notify Result Appeal to Offender, Insert Date Appeal to Prefecture, Receive Result Appeal from Prefecture, Send Appeal to Prefecture, and a node labeled Tau10]
44. Qualitative evaluation: output of our approach
15 statements, e.g.
1. In the log, “Send for credit collection” occurs after “Payment”
2. In the model, after “Insert fine notification”, “Add penalty” occurs before “Appeal to judge”, while in the log they are concurrent
3. In the log, after “Add penalty”, “Receive results appeal from prefecture” is substituted by “Appeal to judge”
4. In the log, the cycle involving “Insert date appeal to prefecture, Send appeal to prefecture, Receive result appeal from prefecture, Notify result appeal to offender” does not occur after “Insert fine notification”
5. …
Callouts:
• Cannot be detected by trace alignment, as diagnostics are provided at the level of individual traces
• Cannot be entirely detected by trace alignment, as this difference concerns additional model behavior
45. Quantitative evaluation
• For each model in the SAP R/3 and IBM BIT collections, we artificially generated an event log
• Injected different levels of noise (0-20%) into the logs to simulate differences
• Total logs: 712
Results:
• Generally slower, but reasonable execution times: < 10 sec
• Extreme cases (8,000+ events, 15-20% noise): < 2 min
• Consistently more compact diagnosis than trace alignment
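The slide does not say how noise was injected. The sketch below shows one plausible scheme, purely as an assumption: each trace is perturbed with probability `level` by removing an event, duplicating an event, or swapping two neighboring events. The function name `inject_noise`, its parameters and the sample log are hypothetical.

```python
import random


def inject_noise(log, level, seed=0):
    """Return a copy of `log` in which each trace is perturbed with probability `level`."""
    rng = random.Random(seed)  # fixed seed so that runs are repeatable
    noisy = []
    for trace in log:
        trace = list(trace)  # copy, so the input log is left untouched
        if trace and rng.random() < level:
            op = rng.choice(["remove", "duplicate", "swap"])
            i = rng.randrange(len(trace))
            if op == "remove":
                del trace[i]
            elif op == "duplicate":
                trace.insert(i, trace[i])
            elif len(trace) > 1:
                # Swap with the next event, or with the previous one at the end of the trace.
                j = i + 1 if i + 1 < len(trace) else i - 1
                trace[i], trace[j] = trace[j], trace[i]
        noisy.append(trace)
    return noisy


clean = [list("ABCE"), list("ACBE"), list("ADE")]
noisy_20 = inject_noise(clean, level=0.2)
```

A level of 0.0 returns the log unchanged, and a level of 0.2 corresponds to the upper end of the 0-20% range mentioned above.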
46. User evaluation
Online survey:
• Simple Petri net model with 31 nodes, created from a real-life
claims handling process
• small size to avoid understandability bias
• anonymized to avoid domain bias
• Accompanied by a log with 53 traces
Output of trace alignment (misalignments)
vs
Output of our approach (list of statements)
47. User evaluation
Respondents stated their experience (years, models created and analyzed) and expertise in Petri nets (familiarity, competence and confidence)
Respondents compared both approaches using the Technology Acceptance Model:
1. What is the easiest approach for checking the conformance of an event log to a process model?
2. What is the easiest approach for identifying the differences between a process model and an event log?
3. What is the most useful approach for checking the conformance of an event log to a process model?
4. What is the most useful approach for identifying the differences between a process model and an event log?
5. Which approach would you likely use for checking the conformance of an event log to a process model?
6. Which approach would you likely use for identifying the differences between a process model and an event log?
48. User evaluation: population
Academics (38 responses)
• Expertise: more familiar, confident and competent in working with Petri nets
• Experience: analysed and created more models in the past 12 months
Professionals (33 responses)
• Less expert and experienced with Petri nets
• Mostly rely on professional training (higher than academics)
49. User evaluation: hypotheses
H1: respondents would have a preference for verbalization
H2: respondents with less experience, familiarity, confidence and
competence in the use of Petri nets would have a stronger
preference for verbalization
50. User evaluation: results
H1: preference for verbalization
• Tested for the full sample and for the two cohorts separately
• For the full sample there is no general preference for our approach: the
median was zero (“neutral”)
• Professionals did show a preference for verbalization (especially along
ease of use) while academics preferred alignment
• H1 is supported for the professionals cohort only
H2: little knowledge of Petri nets -> stronger preference
• Respondents with more experience with and expertise in Petri nets have
a stronger preference for alignments
• H2 is supported
51. Pushing it a bit further… Process model repair
• Rank statements based on impact
• Visualize differences on top of BPMN model
• Repair process model interactively and incrementally
A. Armas Cervantes, N. van Beest, M. La Rosa, M. Dumas, L. Garcia-Banuelos: Interactive and Incremental Business Process Model Repair. CoopIS 2017
53. Tool support: Apromore (apromore.org)
• Open-source BPM analytics platform as Software as a Service
• Focus is on end users (business analysts), not on data scientists
• 50+ plugins
54. Nirdizati: predictive process monitoring (nirdizati.com)
• Predict process outcome (e.g. “Is this loan offer going to be rejected?”)
• Predict process performance (e.g. “Will this claim take longer than 5 days to be handled?”)
• Predict future events (e.g. “What activity is likely to be executed next? And after that?”)
55. BPM Discipline
Information Systems School
Science & Engineering Faculty
Queensland University of Technology
m.larosa@qut.edu.au
marcellolarosa.com
@mlr80