This document discusses managing large collections of business process models. It presents challenges in discovering process models from event logs, coping with evolutionary changes to processes over time, and aligning new event logs with existing process model collections. Approaches are proposed to address each challenge, including techniques for trace clustering, process discovery, merging similar models, and detecting concept drift in processes. The approaches aim to create process model collections that can self-adapt to changing organizational processes and behaviors observed in event logs.
2. Research
• Technology oriented
• Business oriented
Teaching
• BPM specializations
• Master of BPM
Service
• Professional training
• Consultancy
BPM Discipline @ QUT
http://bpm-research-group.org
3. Each process is varied by product & brand…
End-to-end insurance process (source: Guidewire reference models)
Total number of insurance models: 3,000+
30 variations, 500 tasks
Home, Motor, Commercial, Liability, CTP / WC
A few years back… Suncorp insurance
4. Managing large process model collections
• Versions & variants management
• Merging
• Refactoring / standardization
• Clone detection
• Process model repository
• Querying
• Similarity search
80%
R. Dijkman, M. La Rosa, H. Reijers, “Managing large collections of business process models – Current techniques and challenges”, Computers in Industry, 2012
5. The Apromore Initiative (apromore.org)
An open-source, highly scalable SaaS platform to manage process model collections
M. La Rosa, H. Reijers, W. van der Aalst, R. Dijkman, J. Mendling, M. Dumas, L. Garcia-Banuelos, “APROMORE: an advanced process model repository”, Expert Systems with Applications, 2011
6. “Build awareness”
Understand differences and causes for these differences
“Achieve simplification”
Identify and consolidate common business functions
“Achieve centralisation”
Centralise support for non-core processes across LOBs
“Identify opportunities for partnering”
Make better decisions about the processes you can partner/run in-house
Expected benefits (beginning of the project)
7. “The tool is great but it would be pretty useless because our
process models aren’t great and we know they aren’t great” (*)
“If they [the models] would have been updated all along it
would have been worthwhile but now they’re out of date, it’s
not really worth the effort of bringing them up to date” (**)
(*) Realistic Suncorp employee
(**) Disillusioned Suncorp employee
The reality (end of the project)
8. Large process model collections, hard to maintain
Large collections of “dead” process models
The other side of the coin
9. Vision: what if the collection could self-adapt?
Change is endemic to organizations and continuously affects them:
Requirements change
Environments change
Processes change
10. If an organization’s processes change, the changes will be recorded in its system logs
Use process mining techniques to “discover” process changes from logs, and apply these changes to process model collections
Unleash the full potential of management techniques for process model collections
Let’s get more concrete
[Figure: process mining extracts process knowledge (process models, patterns such as “if A then B”, conformance analysis, process performance…) from an event log, a live event stream or a database]
11. A process model collection that:
• is aligned with organizational behavior
• can self-adapt to evolving organizational behavior
Solution: “Liquid” process model collection
W.M.P. van der Aalst, M. La Rosa, A.H.M. ter Hofstede, M.T. Wynn, Liquid Business Process Model Collections. In Modeling and Simulation-based Systems Engineering Handbook, 2014
12. 1. Discovering a collection in the first place
2. Coping with evolution
3. Aligning logs with an existing process model collection
Approach: 3 interrelated challenges
13. Challenge 1: Discovering a collection
Current situation: discovering “all-in-one” models
[Figure: an event log is given to a process discovery algorithm]
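For illustration only: many discovery algorithms start from the directly-follows relation of the log, so mining a whole log at once tends to produce one “all-in-one” model. The Python sketch below (standard library only) builds that relation for the five traces of the trace-clustering example on the next slide. Each trace is encoded as a string of single-letter activities; that encoding is an assumption made for brevity, not the input format of any particular tool.

```python
from collections import Counter

def directly_follows(log):
    """Count how often activity a is immediately followed by activity b."""
    dfg = Counter()
    for trace in log:
        for a, b in zip(trace, trace[1:]):
            dfg[(a, b)] += 1
    return dfg

log = ["ababcdecafgh", "ababkdechfgh", "bcpqprakqrs", "bcphprakqrs", "axphyzttu"]
dfg = directly_follows(log)
print(len(dfg))         # 32 distinct edges, mixing unrelated variants and noise
print(dfg[("a", "b")])  # 4
```

A discovery algorithm fed with all 32 edges has to fit both variants and the noisy trace into a single model.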
14. Trace clustering
Event log:
a b a b c d e c a f g h
a b a b k d e c h f g h
b c p q p r a k q r s
b c p h p r a k q r s
a x p h y z t t u
Trace clustering splits the event log into:
• Cluster 1 (process variant 1): a b a b c d e c a f g h; a b a b k d e c h f g h
• Cluster 2 (process variant 2): b c p q p r a k q r s; b c p h p r a k q r s
• Noise: a x p h y z t t u
Trace clustering
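A minimal sketch of the idea, not the technique of Song et al., Bose et al. or de Medeiros et al.: traces are compared by the Jaccard similarity of their activity sets, linked when the similarity reaches a threshold (single linkage, via union-find), and groups smaller than a minimum size are treated as noise. The similarity measure, the 0.5 threshold and the minimum size of 2 are all assumptions chosen for this example.

```python
from itertools import combinations

def jaccard(x, y):
    """Jaccard similarity of the activity sets of two traces."""
    sx, sy = set(x), set(y)
    return len(sx & sy) / len(sx | sy)

def cluster_traces(log, threshold=0.5, min_size=2):
    """Single-linkage clustering; groups with fewer than min_size traces are noise."""
    parent = list(range(len(log)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(len(log)), 2):
        if jaccard(log[i], log[j]) >= threshold:
            parent[find(i)] = find(j)

    groups = {}
    for i in range(len(log)):
        groups.setdefault(find(i), []).append(i)
    clusters = [g for g in groups.values() if len(g) >= min_size]
    noise = [i for g in groups.values() if len(g) < min_size for i in g]
    return clusters, noise

log = ["ababcdecafgh", "ababkdechfgh", "bcpqprakqrs", "bcphprakqrs", "axphyzttu"]
clusters, noise = cluster_traces(log)
print(clusters, noise)  # [[0, 1], [2, 3]] [4]
```

On the traces of this slide the sketch recovers the two clusters and isolates the last trace as noise; real trace clustering techniques use richer trace profiles than activity sets.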
15. Our approach: Slice, Mine and Dice (SMD)
L. Garcia-Banuelos, M. Dumas, M. La Rosa, J. De Weerdt, C.C. Ekanayake. Controlled Automated Discovery of Collections of Business Process Models. Information Systems, 2014
• Slice the log horizontally, per variant
• Mine a model per slice
• Dice the discovered models hierarchically
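The “slice” step can be pictured as a top-down walk over a hierarchy of trace clusters: mine a model for a cluster and, if it is too complex, descend to the child clusters. The Python sketch below is a simplification under stated assumptions: the hierarchy is given as nested tuples of traces, “mining” means computing the set of directly-follows edges, complexity is the number of those edges, and the threshold of 15 is arbitrary. It omits the “dice” step (clone-based subprocess extraction) of the published technique.

```python
def dfg_edges(traces):
    """Distinct directly-follows edges: a stand-in for a mined model."""
    return {(a, b) for t in traces for a, b in zip(t, t[1:])}

def leaves(node):
    """All traces under a node of the cluster hierarchy (a leaf is a trace string)."""
    if isinstance(node, str):
        return [node]
    return [t for child in node for t in leaves(child)]

def slice_and_mine(node, max_edges):
    """One (traces, model) pair per slice whose model fits the complexity threshold."""
    traces = leaves(node)
    model = dfg_edges(traces)
    if len(model) <= max_edges or isinstance(node, str):
        return [(traces, model)]   # small enough, or a single trace that cannot be split
    return [s for child in node for s in slice_and_mine(child, max_edges)]

hierarchy = (("ababcdecafgh", "ababkdechfgh"), ("bcpqprakqrs", "bcphprakqrs"))
slices = slice_and_mine(hierarchy, max_edges=15)
print([len(model) for _, model in slices])  # [14, 12]
```

Mining the whole hierarchy at once would give 25 edges; slicing yields two smaller models, one per variant.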
20. Refined Process Structure Tree (RPST)
[Figure: models M2 and M3 with their RPSTs; fragments F10–F14 in the RPST of M2 and F20–F25 in the RPST of M3]
J. Vanhatalo, H. Völzer, J. Koehler: The Refined Process Structure Tree. Data Knowl. Eng., 2009
21. RPSDAG
[Figure: the RPSTs of M2 (fragments F10–F14) and M3 (fragments F20–F25) combined into a single RPSDAG, in which F14 (of M2) and F25 (of M3) are shown together]
M. Dumas, L. García-Bañuelos, M. La Rosa, and R. Uba. Fast Detection of Exact Clones in Business Process Model Repositories. Information Systems, 2013
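A simplified sketch of exact clone detection by canonical codes: every non-trivial fragment is mapped to a string that is identical for identical fragments, and fragments sharing a code across models are exact clones. The block-structured `Fragment` tree, its `kind` values and the fragment ids are hypothetical; the cited work computes canonical codes on RPST fragments of arbitrary process graphs, which this sketch does not attempt.

```python
from collections import defaultdict
from dataclasses import dataclass, field

@dataclass
class Fragment:
    """Node of a simplified, block-structured RPST (hypothetical representation)."""
    kind: str                                     # "task", "seq", "and" or "xor"
    label: str = ""                               # activity label (tasks only)
    children: list = field(default_factory=list)
    fid: str = ""                                 # fragment identifier

def canonical_code(f):
    """Identical fragments get the same string; and/xor children are order-insensitive."""
    if f.kind == "task":
        return f.label
    codes = [canonical_code(c) for c in f.children]
    if f.kind in ("and", "xor"):
        codes.sort()
    return f"{f.kind}({','.join(codes)})"

def exact_clones(models):
    """Map each code shared by two or more non-task fragments to their (model, fid) locations."""
    index = defaultdict(list)

    def visit(model_id, f):
        if f.kind == "task":
            return
        index[canonical_code(f)].append((model_id, f.fid))
        for child in f.children:
            visit(model_id, child)

    for model_id, root in models.items():
        visit(model_id, root)
    return {code: locs for code, locs in index.items() if len(locs) > 1}

def task(label):
    return Fragment("task", label=label)

m2 = Fragment("seq", fid="m2-1", children=[
    task("z"),
    Fragment("seq", fid="m2-2", children=[
        task("a"), Fragment("and", fid="m2-3", children=[task("b"), task("c")])]),
])
m3 = Fragment("seq", fid="m3-1", children=[
    task("x"),
    Fragment("seq", fid="m3-2", children=[
        task("a"), Fragment("and", fid="m3-3", children=[task("c"), task("b")])]),
    task("y"),
])
clones = exact_clones({"M2": m2, "M3": m3})
print(clones["seq(a,and(b,c))"])  # [('M2', 'm2-2'), ('M3', 'm3-2')]
```

Sorting the children of parallel and exclusive blocks makes the code insensitive to the order in which their branches are drawn.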
23. Extracting approximate clones
[Figure: fragment F12 of M2 and fragment F22 of M3 as approximate clones, with subprocess S3]
Approximate clones are fragments that are:
• SESE (single-entry, single-exit)
• Non-trivial
• Similar
• Unrelated
M. La Rosa, M. Dumas, C. Ekanayake, L. Garcia-Banuelos, J. Recker, A.H.M. ter Hofstede, Detecting Approximate Clones in Business Process Model Repositories. Information Systems, 2015
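A pairwise check of the criteria, as a sketch under assumptions: each SESE fragment is summarised by its set of labelled edges and the ids of the fragments nested inside it; “similar” uses 1 minus the Jaccard similarity of edge sets as a simple stand-in for graph-edit distance; “unrelated” is read as neither fragment containing the other; the thresholds are arbitrary. The ids F12 and F22 echo the slide, but their edges are invented.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Fragment:
    """A SESE fragment summarised by its labelled edges (simplified representation)."""
    fid: str
    edges: frozenset                   # (source label, target label) pairs
    contains: frozenset = frozenset()  # ids of fragments nested inside this one

def edge_distance(f, g):
    """1 - Jaccard similarity of edge sets: a simple stand-in for graph-edit distance."""
    union = f.edges | g.edges
    return 1 - len(f.edges & g.edges) / len(union) if union else 0.0

def approximate_clone_pairs(fragments, max_distance=0.5, min_edges=3):
    """Pairs that are non-trivial, unrelated (no containment) and similar enough."""
    pairs = []
    for i, f in enumerate(fragments):
        for g in fragments[i + 1:]:
            non_trivial = len(f.edges) >= min_edges and len(g.edges) >= min_edges
            unrelated = g.fid not in f.contains and f.fid not in g.contains
            if non_trivial and unrelated and edge_distance(f, g) <= max_distance:
                pairs.append((f.fid, g.fid))
    return pairs

def edges(*pairs):
    return frozenset(tuple(p) for p in pairs)

fragments = [
    Fragment("F10", edges("ab", "bc", "cd", "de", "ez", "zy"), contains=frozenset({"F12"})),
    Fragment("F12", edges("ab", "bc", "cd", "de")),
    Fragment("F22", edges("ab", "bc", "cd", "df")),
    Fragment("T1", edges("ab", "bc")),  # trivial: too small to be worth extracting
]
print(approximate_clone_pairs(fragments))  # [('F12', 'F22')]
```

F10 is very similar to F12 but contains it, so the pair is rejected as related; T1 is rejected as trivial.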
24. Merging approximate clones
[Figure: a merging algorithm combines fragment F12 of model M2 and fragment F22 of model M3 into S4, using configurable gateways and configurable labels]
M. La Rosa, M. Dumas, R. Uba, and R. M. Dijkman. Business Process Model Merging: An Approach to Business Process Consolidation. ACM TOSEM, 2013.
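A minimal sketch of union-based merging, assuming nodes with the same label are matched one-to-one: the merged graph keeps every edge and annotates it with the models it comes from; a node whose outgoing (or incoming) edges carry different annotations would need a configurable split (or join) gateway. The cited merging algorithm also matches similar but non-identical nodes and produces configurable labels, which this sketch does not do. The fragment contents are invented.

```python
from collections import defaultdict

def merge(models):
    """Union of the models' edges, each annotated with the ids of the models it comes from."""
    merged = defaultdict(set)
    for model_id, model_edges in models.items():
        for edge in model_edges:
            merged[edge].add(model_id)
    return dict(merged)

def configurable_nodes(merged):
    """Nodes whose outgoing (splits) or incoming (joins) edges carry different
    annotations; in the merged model these would need configurable gateways."""
    out_sets, in_sets = defaultdict(set), defaultdict(set)
    for (src, tgt), owners in merged.items():
        out_sets[src].add(frozenset(owners))
        in_sets[tgt].add(frozenset(owners))
    splits = sorted(n for n, s in out_sets.items() if len(s) > 1)
    joins = sorted(n for n, s in in_sets.items() if len(s) > 1)
    return splits, joins

merged = merge({
    "F12": {("a", "b"), ("b", "c"), ("c", "d")},
    "F22": {("a", "b"), ("b", "x"), ("x", "d")},
})
print(merged[("a", "b")] == {"F12", "F22"})  # True: shared edge
print(configurable_nodes(merged))            # (['b'], ['d'])
```

The annotations make the merge reversible: keeping only the edges labelled with one model id gives back that model.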
26. Trace clustering
• M. Song, C.W. Günther, and W.M.P. van der Aalst, Improving Process Mining with Trace Clustering, J. Korean Inst. of Industrial Engineers 34(4), 2008
• R.P.J.C. Bose, W.M.P. van der Aalst, Trace Clustering Based on Conserved Patterns: Towards Achieving Better Process Models, BPM 2009 Workshops
• A.K.A. de Medeiros, A. Guzzo, G. Greco, W.M.P. van der Aalst, A.J.M.M. Weijters, B.F. van Dongen, D. Saccà. Process Mining Based on Clustering: A Quest for Precision, BPM Workshops 2007
Discovery
• A.J.M.M. Weijters, J.T.S. Ribeiro. Flexible Heuristics Miner (FHM), CIDM, 2011.
Evaluation setup
Log | Traces | Events | Event classes | Duplication ratio
Motor | 4,293 | 33,202 | 292 | 114
Commercial | 4,852 | 54,134 | 81 | 668
BPI 2012 | 5,312 | 91,949 | 36 | 2,554
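The slide does not define the duplication ratio, but the values are consistent with the number of events divided by the number of event classes (the average number of events per event class). This reading is inferred from the figures, not stated in the source; a quick check in Python:

```python
# (traces, events, event classes) per log, copied from the evaluation setup table
logs = {
    "Motor": (4293, 33202, 292),
    "Commercial": (4852, 54134, 81),
    "BPI 2012": (5312, 91949, 36),
}
# Assumed definition: duplication ratio = events / event classes
ratios = {name: round(events / classes) for name, (_, events, classes) in logs.items()}
print(ratios)  # {'Motor': 114, 'Commercial': 668, 'BPI 2012': 2554}
```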
27. Evaluation: repository size and number of models
S: Song et al.
B: Bose et al.
M: de Medeiros et al.
• up to 64% reduction in repository size
• up to 66% reduction in the number of top-level process models
• up to 120 sub-processes extracted
[Charts: results per log (Motor, Comm, BPI) for repository size and number of models; highlighted values 14%, 22%, 66%, 64%]
29. Challenge 2: Coping with evolution
[Figure: concept drift between a log at time1 (activities A–G) and a log at time2 > time1 in which activities X and Y appear; a liquid process model collection currently in use, subject to intentional changes by process stakeholders since the last version; a liquid process model collection from new behavior, capturing non-transient changes since the last log; and a consolidated liquid process model collection]
31. A time point when there is a statistically significant difference
between the observed process behavior before and after this
point
Concept drift in a single process
33. 1. Fully automated
2. Highly scalable (online use)
3. Highly accurate
- types of drifts detected
- delay in detecting the drift
4. Explainable
Requirements
34. • Statistically significant difference in process behavior, i.e.
“when are two processes different?”
• Use an appropriate data structure to encode process behavior
Partial-order runs of a process, where concurrency is explicitly captured → configuration equivalence
• Process drift = time point when there is a statistically significant difference in the distribution of the runs before and after it (for a given window size)
Our approach: ProDrift
A. Maaradji, M. Dumas, M. La Rosa, A. Ostovar, Fast and accurate business process drift detection. In BPM 2015
35. 1. Starting from an event log, we consider completed traces
2. For each new trace 𝜎𝑖:
• update the concurrency relation
• transform trace 𝜎𝑖 into run 𝜋𝑖 by encoding the associated concurrency
relation
From a stream of traces to a stream of runs
[Figure: stream of traces → stream of runs]
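A sketch of the two steps, assuming an alpha-style concurrency oracle (a and b are concurrent once both a→b and b→a have been observed as directly-follows pairs); the slide does not say which oracle ProDrift uses. A run is encoded here as the set of ordered pairs of event occurrences (label, k) whose labels are not concurrent, so two traces that differ only in the order of concurrent activities map to the same run. This encoding is a simplification chosen for the example.

```python
def update_concurrency(df, trace):
    """Add the trace's directly-follows pairs to df; return the pairs judged concurrent
    (a || b when both a->b and b->a have been observed, an alpha-style oracle)."""
    df.update(zip(trace, trace[1:]))
    return {(a, b) for (a, b) in df if a != b and (b, a) in df}

def to_run(trace, concurrent):
    """Encode a trace as a partial order over event occurrences (label, k):
    earlier events precede later ones unless their labels are concurrent."""
    counts, events = {}, []
    for label in trace:
        counts[label] = counts.get(label, 0) + 1
        events.append((label, counts[label]))
    return frozenset(
        (e, f)
        for i, e in enumerate(events)
        for f in events[i + 1:]
        if (e[0], f[0]) not in concurrent
    )

df, runs = set(), []
for trace in ["abcd", "acbd", "abcd", "acbd"]:
    concurrent = update_concurrency(df, trace)
    runs.append(to_run(trace, concurrent))

print(runs[0] == runs[1])  # False: b || c was not yet known when the first trace arrived
print(runs[2] == runs[3])  # True: both orders of b and c now give the same run
```

Because the relation is updated online, early traces may be encoded before a concurrency is learned; later traces benefit from the accumulated relation.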
36. 1. Define two juxtaposed sliding windows (reference and
detection) forming the 2𝑤 most recent runs
2. Consider the runs as observations of a categorical variable,
one per window
3. Apply the Chi-square test of independence between the two
windows
Chi-square test of independence
[Figure: over the stream of runs, a reference window (𝜋𝑖+1 … 𝜋𝑖+𝑤) and a detection window (𝜋𝑖+𝑤+1 … 𝜋𝑖+2𝑤) meet at the point of the hypothesis test; a drift is signalled when P-value < threshold]
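A from-scratch sketch of the test between the two windows, using only the Python standard library; it is not ProDrift's implementation. Each distinct run is one category, the 2 × m table holds the run counts per window, and the chi-square upper tail is computed with the closed-form recurrence for integer degrees of freedom. A library routine such as `scipy.stats.chi2_contingency` could be used instead. The run ids ("r1", "r2", "r3") are placeholders.

```python
import math
from collections import Counter

def chi2_sf(x, dof):
    """P(X >= x) for a chi-square variable with integer dof >= 1, using
    Q(x; k+2) = Q(x; k) + (x/2)^(k/2) e^(-x/2) / Gamma(k/2 + 1)."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if dof % 2 == 0:
        term = total = math.exp(-half)           # Q(x; 2)
        for i in range(1, dof // 2):
            term *= half / i                       # (x/2)^i e^(-x/2) / i!
            total += term
        return total
    total = math.erfc(math.sqrt(half))             # Q(x; 1)
    for j in range(1, (dof + 1) // 2):
        total += math.exp((j - 0.5) * math.log(half) - half - math.lgamma(j + 0.5))
    return total

def chi_square_pvalue(reference, detection):
    """Chi-square test of independence on the 2 x m table of run counts
    (rows: windows, columns: distinct runs). Both windows must be non-empty."""
    ref, det = Counter(reference), Counter(detection)
    runs = set(ref) | set(det)
    if len(runs) < 2:
        return 1.0                                 # a single category: nothing to test
    n_ref, n_det = len(reference), len(detection)
    n = n_ref + n_det
    stat = 0.0
    for run in runs:
        column = ref[run] + det[run]
        for observed, row_total in ((ref[run], n_ref), (det[run], n_det)):
            expected = row_total * column / n
            stat += (observed - expected) ** 2 / expected
    return chi2_sf(stat, len(runs) - 1)

reference = ["r1"] * 10 + ["r2"] * 10
print(chi_square_pvalue(reference, ["r2", "r1"] * 10))  # 1.0: same distribution
print(chi_square_pvalue(reference, ["r3"] * 20) < 0.05)  # True: drift candidate
```

Treating runs as categories means that any change in the mix of runs between the windows, not only the appearance of new runs, can lower the P-value.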
37. The detection delay d is the distance between the actual drift and the last trace read in order to detect the drift
To filter out sporadic stochastic oscillations of the P-value, a drift is reported only when P-value < threshold for 𝜙 consecutive tests
Detection delay and noise filter
[Figure: reference window (𝜋𝑖+1 … 𝜋𝑖+𝑤) and detection window (𝜋𝑖+𝑤+1 … 𝜋𝑖+2𝑤) over the stream of runs, with the delay d after the actual drift]
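The noise filter and the delay can be sketched as follows. The mapping from test index to trace index is an assumption (one test per new trace, the first one right after trace 2𝑤), and the P-values, window size and drift position are made up for the example.

```python
def detect_drift(p_values, threshold=0.05, phi=3):
    """Index of the first test completing phi consecutive P-values below threshold."""
    streak = 0
    for k, p in enumerate(p_values):
        streak = streak + 1 if p < threshold else 0
        if streak == phi:
            return k
    return None

def detection_delay(test_index, w, actual_drift):
    """Traces read after the actual drift until the drift is reported, assuming one
    test per new trace and a first test (index 0) right after trace 2w."""
    last_trace_read = 2 * w + test_index
    return last_trace_read - actual_drift

p_values = [0.40, 0.01, 0.30, 0.02, 0.01, 0.03, 0.20]  # made-up test results
k = detect_drift(p_values)                  # the isolated dip at index 1 is filtered out
print(k, detection_delay(k, w=100, actual_drift=150))  # 5 55
```

A larger 𝜙 suppresses more false alarms at the cost of a longer detection delay.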
38. The choice of the window size is critical for drift detection:
• a higher variation needs more observations
• a lower variation needs fewer observations
We use an adaptive window technique, based on the evolution ratio, to make the statistical test more reliable
Adaptive window
[Figure: reference and detection windows over the stream of runs, resized between time points 𝑇𝑗 and 𝑇𝑗+1]
40. We generated a benchmark dataset of 72 logs by simulating a
textbook example (loan origination process) using BIMP
Injected 18 different change patterns
For each pattern, we generated 4 logs of different lengths
(2,500, 5,000, 7,500 and 10,000 traces)
Evaluation: synthetic dataset
41. Change patterns from Weber et al.
12 simple change patterns:
+ 6 complex change patterns (3 nested patterns each):
IRO, IOR, ORI, OIR, RIO, ROI
Weber, B., Reichert, M., Rinderle-Ma, S.: Change patterns and change support features: Enhancing flexibility in process-aware information systems. DKE 66(3), 2008
42. Drift injection: gold standard
Each drift is injected 9 times by composing 10 sublogs
[Figure: sublogs obtained by simulation are juxtaposed into a single log]
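Composing 10 sublogs creates 9 boundaries, hence 9 drifts per log. Assuming equal-size sublogs that alternate between the base model and the altered model (the slide only says the log is composed of 10 sublogs), the gold-standard drift positions for the four log lengths are:

```python
def gold_standard_drifts(log_length, sublogs=10):
    """Drift positions (number of traces preceding each drift) when a log is
    composed of equal-size sublogs; consecutive sublogs come from different models."""
    size = log_length // sublogs
    return [k * size for k in range(1, sublogs)]

def sublog_sources(sublogs=10):
    """Assumed alternation: even-numbered sublogs from the base model, odd from the altered one."""
    return ["base" if k % 2 == 0 else "altered" for k in range(sublogs)]

for n in (2500, 5000, 7500, 10000):
    drifts = gold_standard_drifts(n)
    print(n, len(drifts), drifts[0], drifts[-1])
# 2500 9 250 2250
# 5000 9 500 4500
# 7500 9 750 6750
# 10000 9 1000 9000
```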
43. Time performance: time required to perform a new statistical test
- min: 0.26ms
- max: 2.3ms
- mean: 0.5ms (real time)
Accuracy:
- F-score
- Mean delay
Evaluation measures
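The slide does not say how detected drifts are matched to injected ones. The sketch below assumes a detection is a true positive when it falls at most `tolerance` traces after a still-unmatched actual drift (the nearest preceding one); the F-score is then the harmonic mean of precision and recall, and the mean delay is averaged over true positives. The `tolerance` parameter and all positions are hypothetical.

```python
def evaluate(actual, detected, tolerance):
    """F-score and mean delay under an assumed matching rule: each detection is
    matched to the nearest preceding, still unmatched actual drift within tolerance."""
    unmatched = sorted(actual)
    delays = []
    for d in sorted(detected):
        candidates = [a for a in unmatched if 0 <= d - a <= tolerance]
        if candidates:
            a = max(candidates)  # nearest preceding actual drift
            unmatched.remove(a)
            delays.append(d - a)
    tp = len(delays)
    precision = tp / len(detected) if detected else 0.0
    recall = tp / len(actual) if actual else 0.0
    f_score = 2 * precision * recall / (precision + recall) if tp else 0.0
    mean_delay = sum(delays) / tp if tp else None
    return f_score, mean_delay

f, delay = evaluate(actual=[250, 500, 750], detected=[290, 560, 900], tolerance=100)
print(round(f, 3), delay)  # 0.667 50.0
```

Here the detection at 900 is too late to count, so both precision and recall are 2/3.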
48. • Log from claims management system of a large Australian
insurance company
• 4,509 traces, 29,108 events with 12 event classes
Evaluation on real-life log
49. • Results validated with a business analyst from the insurance
company
• Distribution of the number of active cases over log timeline
confirms the results
Evaluation on real-life log
50. How to explain what happened?
[Figure: the runs of the two windows (Runs1: 𝜋𝑖+1 … 𝜋𝑖+𝑤; Runs2: 𝜋𝑖+𝑤+1 … 𝜋𝑖+2𝑤) are each merged into an event structure (Event structure1, Event structure2), which are then compared through a PSP (partially synchronized product)]
N.R. van Beest, M. Dumas, L. Garcia-Banuelos, M. La Rosa, Log delta analysis: Interpretable differencing of business process event logs. In BPM 2015
Example: before the drift, task “Emit invoice” could be repeated; afterwards it could not.
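The event-structure comparison in the cited work is considerably richer. As a much simpler, swapped-in illustration of the same question (which behavior occurs on one side of the drift but not the other), the sketch compares the directly-follows relations of the traces before and after the drift. Apart from “Emit invoice”, the activity names and traces are invented.

```python
def relations(traces):
    """Directly-follows pairs observed in a set of traces."""
    return {(a, b) for t in traces for a, b in zip(t, t[1:])}

def log_delta(before, after):
    """Relations present on only one side of the drift."""
    rb, ra = relations(before), relations(after)
    return {"only before": rb - ra, "only after": ra - rb}

before = [["Create order", "Emit invoice", "Emit invoice", "Receive payment"],
          ["Create order", "Emit invoice", "Receive payment"]]
after = [["Create order", "Emit invoice", "Receive payment"]]
delta = log_delta(before, after)
print(delta["only before"])  # {('Emit invoice', 'Emit invoice')}
```

The self-loop on “Emit invoice” that disappears after the drift corresponds to the repetition that is no longer possible.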
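51. Challenge 3: Aligning logs with existing collection
[Figure: traces of a log aligned against the models of a process model collection at different granularities (trace, sub-trace, activity, event); alignments may be full or partial, and how to combine them into an overall alignment score is an open question (“score = ?”)]
Mismatch types: lack of accuracy, superfluous activity, missing activity
Example alignment of log trace P Q R S E F G against model P Q F X (>> = no move):
log | model
P | P
Q | Q
R | >>
S | >>
E | >>
F | F
>> | X
G | >>
Example full alignment: A A, B B, C C, D D, E E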
Wil M. P. van der Aalst, Arya Adriansyah, Boudewijn F. van Dongen: Replaying history on process models for conformance checking and performance analysis. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, 2012
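A minimal sketch of an optimal alignment between one trace and one model run, assuming the model is reduced to a single activity sequence (alignment techniques such as those in the cited work search over all runs of a Petri net or similar model). Synchronous moves cost 0; log-only and model-only moves (>>) cost 1. The fitness normalisation 1 − cost / (len(trace) + len(run)) is one common choice, stated here as an assumption.

```python
def align(trace, run):
    """Optimal alignment by dynamic programming; returns (cost, list of (log, model) moves)."""
    n, m = len(trace), len(run)
    # cost[i][j]: cheapest alignment of trace[:i] with run[:j]
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i
    for j in range(1, m + 1):
        cost[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            best = min(cost[i - 1][j], cost[i][j - 1]) + 1  # log move / model move
            if trace[i - 1] == run[j - 1]:
                best = min(best, cost[i - 1][j - 1])        # synchronous move
            cost[i][j] = best
    # Backtrack to recover the moves, ">>" marking "no move" on one side.
    moves, i, j = [], n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and trace[i - 1] == run[j - 1] and cost[i][j] == cost[i - 1][j - 1]:
            moves.append((trace[i - 1], run[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and cost[i][j] == cost[i - 1][j] + 1:
            moves.append((trace[i - 1], ">>"))
            i -= 1
        else:
            moves.append((">>", run[j - 1]))
            j -= 1
    return cost[n][m], moves[::-1]

trace, run = "PQRSEFG", "PQFX"
total_cost, moves = align(trace, run)
fitness = 1 - total_cost / (len(trace) + len(run))
print(total_cost, moves)  # 5 [('P', 'P'), ('Q', 'Q'), ('R', '>>'), ('S', '>>'), ('E', '>>'), ('F', 'F'), ('>>', 'X'), ('G', '>>')]
```

On the slide's example the sketch reproduces the alignment shown, with a cost of 5 and a fitness of 6/11 under the assumed normalisation.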
52. Academic Director (Corporate engagements)
BPM Discipline, IS School
Science & Engineering Faculty
Queensland University of Technology
m.larosa@qut.edu.au
marcellolarosa.com
@mlr80