This document discusses integrating computer log files from different sources for process mining. It presents a genetic-algorithm-inspired technique that merges log files by combining matching indicators such as trace identifiers, attribute values, and timestamps. A proof-of-concept experiment on simulated data tests the technique, showing it can correctly merge logs with increasing accuracy as the number of iterations rises. Future work includes optimizing the genetic algorithm and validating the approach with real-world case studies.
Integrating Computer Log Files Genetic Algorithm
1. FACULTY OF ECONOMICS AND BUSINESS ADMINISTRATION
Integrating Computer Log Files
for Process Mining
A Genetic Algorithm Inspired Technique
Jan Claes
jan.claes@ugent.be
http://processmining.ugent.be
Ghent University, Belgium
Faculty of Economics and Business Administration Jan Claes for INISET@CAiSE 2011
Department of Management Information and Operations Management 21 June, 2011
2. FACULTY OF ECONOMICS AND BUSINESS ADMINISTRATION
1. Process Mining
3. A plane crashed... What happened?
Analyse the ‘black box’
4. A process failed... What happened?
Analyse the ‘black box’: look for historical data
Process Mining:
Reconstruct and analyse processes
From historical process data
• Log files
• Audit trails
• Database history fields/tables
5. Process Mining
Processes are supported by IT systems
IT systems record actual process data
Process data can be used to automatically
Discover process model
Check conformance with existing process info
Extend existing process model
Points of attention for Process Mining
Only As-Is
Only (correctly) recorded information
6. Process Mining steps
Preparation
Collect data: find traces
Merge data: from different sources
Structure data: group per instance
Convert data: to tool specific format
Process mining
Make decisions, take action
7. Process Mining steps
8. FACULTY OF ECONOMICS AND BUSINESS ADMINISTRATION
2. Merging log files
9. Example
Product ordering: registered events:
Sales order: document creation (administration)
Delivery: truck load confirmation (warehouse)
Invoice: document creation (administration)
Logging
from administration software
from warehouse software
How to merge both log files?
10. Example 1
Administration          Warehouse
SO1  SO > Inv           SO1  Deliver
SO2  SO > Inv           SO2  Deliver
SO3  SO > Inv           SO3  Deliver
Merged
SO1  SO > Deliver > Inv
SO2  SO > Deliver > Inv
SO3  SO > Deliver > Inv
Merge based on matching trace identifiers
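A minimal Python sketch of Example 1, assuming each log is held in memory as a mapping from trace identifier to a list of (timestamp, activity) pairs. The integer timestamps and the helper name merge_by_trace_id are illustrative assumptions, not part of the original slides.

```python
# Hypothetical in-memory logs: trace id -> list of (timestamp, activity).
# The integer timestamps are made up and only serve to order the events.
administration = {
    "SO1": [(1, "SO"), (3, "Inv")],
    "SO2": [(4, "SO"), (6, "Inv")],
    "SO3": [(7, "SO"), (9, "Inv")],
}
warehouse = {
    "SO1": [(2, "Deliver")],
    "SO2": [(5, "Deliver")],
    "SO3": [(8, "Deliver")],
}


def merge_by_trace_id(log_a, log_b):
    """Join traces that share an identifier and order their events by time."""
    merged = {}
    for trace_id in sorted(set(log_a) | set(log_b)):
        events = log_a.get(trace_id, []) + log_b.get(trace_id, [])
        merged[trace_id] = [activity for _, activity in sorted(events)]
    return merged


merged = merge_by_trace_id(administration, warehouse)
for trace_id, activities in merged.items():
    print(trace_id, " > ".join(activities))
```

This only works when both systems use the same identifier for the same process instance, which is exactly the situation Example 1 describes.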
11. Example 2
Administration          Warehouse
SO1  SO > Inv           Del1  Deliver (SO1)
SO2  SO > Inv           Del2  Deliver (SO2)
SO3  SO > Inv           Del3  Deliver (SO3)
Merged
SO1  SO > Deliver > Inv
SO2  SO > Deliver > Inv
SO3  SO > Deliver > Inv
Merge based on matching attribute values
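Example 2 could be sketched in Python as follows, assuming the warehouse log stores the sales order number as an event-level or trace-level attribute. The attribute name "order", the data layout, and the function name link_by_attribute are hypothetical choices made for illustration.

```python
# Hypothetical logs. Warehouse traces have their own ids (Del1..Del3) and
# carry the sales order number in an assumed attribute called "order".
administration = {
    "SO1": [(1, "SO"), (3, "Inv")],
    "SO2": [(4, "SO"), (6, "Inv")],
    "SO3": [(7, "SO"), (9, "Inv")],
}
warehouse = {
    "Del1": {"attributes": {"order": "SO1"}, "events": [(2, "Deliver")]},
    "Del2": {"attributes": {"order": "SO2"}, "events": [(5, "Deliver")]},
    "Del3": {"attributes": {"order": "SO3"}, "events": [(8, "Deliver")]},
}


def link_by_attribute(log_a, log_b, attribute):
    """Link each trace of log_b to the log_a trace named in its attribute."""
    links = {}
    for b_id, trace in log_b.items():
        value = trace["attributes"].get(attribute)
        if value in log_a:
            links[b_id] = value
    return links


links = link_by_attribute(administration, warehouse, "order")

# Merge each linked pair chronologically under the administration trace id.
merged = {
    a_id: [act for _, act in sorted(administration[a_id] + warehouse[b_id]["events"])]
    for b_id, a_id in links.items()
}
print(links)
print(merged)
```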
12. Example 3
Administration               Warehouse              Time order
SO1  SO (t1) > Inv (t3)      Arr1  Deliver (t2)     t1<t2<t3
SO2  SO (t4) > Inv (t6)      Arr2  Deliver (t5)     t4<t5<t6
SO3  SO (t7) > Inv (t9)      Arr3  Deliver (t8)     t7<t8<t9
Merged
SO1 SO > Deliver > Inv
SO2 SO > Deliver > Inv
SO3 SO > Deliver > Inv
Merge based on time information
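One simple way to use time information, sketched in Python: link a warehouse trace to the administration trace whose time window contains all of its events. This containment rule is an assumption made for illustration; the slides only state that time information is used. The integers 1 to 9 stand in for t1 to t9.

```python
# Hypothetical logs; integers 1..9 stand in for t1..t9 from the example.
administration = {
    "SO1": [(1, "SO"), (3, "Inv")],
    "SO2": [(4, "SO"), (6, "Inv")],
    "SO3": [(7, "SO"), (9, "Inv")],
}
warehouse = {
    "Arr1": [(2, "Deliver")],
    "Arr2": [(5, "Deliver")],
    "Arr3": [(8, "Deliver")],
}


def link_by_time(log_a, log_b):
    """Link a log_b trace to the single log_a trace whose window contains it."""
    links = {}
    for b_id, b_events in log_b.items():
        b_times = [t for t, _ in b_events]
        candidates = [
            a_id
            for a_id, a_events in log_a.items()
            if a_events[0][0] <= min(b_times) and max(b_times) <= a_events[-1][0]
        ]
        if len(candidates) == 1:  # leave ambiguous or unmatched traces unlinked
            links[b_id] = candidates[0]
    return links


links = link_by_time(administration, warehouse)
merged = {
    a_id: [act for _, act in sorted(administration[a_id] + warehouse[b_id])]
    for b_id, a_id in links.items()
}
print(links)
print(merged)
```

With overlapping process instances the windows are no longer disjoint, so time alone becomes ambiguous and has to be combined with other indicators.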
13. Merging computer log files
Merge based on
Example 1: matching trace identifiers indicator 1
Example 2: matching attribute values indicator 2
Example 3: time information indicator 3
General solution
algorithm combining different indicators
Genetic algorithm
indicators build up fitness function
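The idea of indicators building up a fitness function can be sketched as a weighted sum of per-link indicator scores. The three indicator functions, the trace layout, and the equal weights below are assumptions for illustration; the slides do not specify how the indicators are computed or weighted.

```python
# Hypothetical trace layout: {"id": ..., "attributes": {...}, "events": [(time, activity), ...]}

def indicator_trace_id(a, b):
    """Indicator 1: the two traces share the same identifier."""
    return 1.0 if a["id"] == b["id"] else 0.0


def indicator_attributes(a, b):
    """Indicator 2: number of attribute values of b that also occur in a."""
    a_values = set(a["attributes"].values()) | {a["id"]}
    b_values = set(b["attributes"].values())
    return float(len(a_values & b_values))


def indicator_time(a, b):
    """Indicator 3: all events of b fall inside the time window of a."""
    a_times = [t for t, _ in a["events"]]
    b_times = [t for t, _ in b["events"]]
    inside = min(a_times) <= min(b_times) and max(b_times) <= max(a_times)
    return 1.0 if inside else 0.0


# Assumed weights; tuning them is a design decision left open here.
WEIGHTS = [(indicator_trace_id, 1.0), (indicator_attributes, 1.0), (indicator_time, 1.0)]


def link_score(a, b):
    """Weighted sum of all indicators for one candidate link."""
    return sum(weight * indicator(a, b) for indicator, weight in WEIGHTS)


def fitness(links):
    """Fitness of a full solution: the sum of the scores of its links."""
    return sum(link_score(a, b) for a, b in links)


so1 = {"id": "SO1", "attributes": {}, "events": [(1, "SO"), (3, "Inv")]}
so2 = {"id": "SO2", "attributes": {}, "events": [(4, "SO"), (6, "Inv")]}
del1 = {"id": "Del1", "attributes": {"order": "SO1"}, "events": [(2, "Deliver")]}

print(link_score(so1, del1), link_score(so2, del1))
```

In this toy data the correct link (SO1, Del1) scores on both the attribute and the time indicator, while the wrong link (SO2, Del1) scores on neither.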
14. FACULTY OF ECONOMICS AND BUSINESS ADMINISTRATION
3. Genetic algorithm
15. Genetic algorithm
cross-over
survival of the fittest
mutation
1st generation > 2nd generation > 3rd generation
16. Genetic algorithm
Fitness function score
1st generation   2nd generation   3rd generation
14               18               18
27               29               28
6                5                32
cross-over
survival of the fittest
mutation
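The three operators of a classic genetic algorithm can be illustrated with a toy Python sketch. The problem solved here (maximising the number of ones in a bit string) and all parameter values are assumptions chosen only to keep the example small; they are unrelated to log merging.

```python
import random


def fitness(individual):
    """Toy fitness function: count the ones in the bit string."""
    return sum(individual)


def crossover(parent_1, parent_2, rng):
    """Cross-over: combine the head of one parent with the tail of the other."""
    point = rng.randrange(1, len(parent_1))
    return parent_1[:point] + parent_2[point:]


def mutate(individual, rng, rate=0.1):
    """Mutation: flip each bit with a small probability."""
    return [1 - gene if rng.random() < rate else gene for gene in individual]


def evolve(pop_size=6, length=12, generations=20, seed=0):
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(length)] for _ in range(pop_size)]
    best_per_generation = [max(fitness(ind) for ind in population)]
    for _ in range(generations):
        children = []
        for _ in range(pop_size):
            parent_1, parent_2 = rng.sample(population, 2)
            children.append(mutate(crossover(parent_1, parent_2, rng), rng))
        # Survival of the fittest: keep the best pop_size of parents and children.
        population = sorted(population + children, key=fitness, reverse=True)[:pop_size]
        best_per_generation.append(fitness(population[0]))
    return population, best_per_generation


population, best_per_generation = evolve()
print(best_per_generation)
```

Because parents compete with their children for survival, the best score in this sketch can never decrease from one generation to the next.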
17. Genetic algorithm inspired technique
Find links between traces of both log files and
merge them chronologically in new log file
Steps
Make initial solution (best individual links)
Make pseudo-random changes
(try to improve score for one specific factor)
Evaluate (keep original or changed solution)
Stop condition (fixed number of steps)
Only one solution, no cross-over
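The steps above can be sketched in Python as a loop over a single solution. This is a simplified illustration under stated assumptions, not the author's implementation: traces are reduced to hypothetical (start, end) time windows, the score uses only a time indicator plus an assumed penalty for reusing an administration trace, and the random change reassigns one link uniformly at random rather than targeting one specific factor.

```python
import random

# Hypothetical toy logs: trace id -> (start time, end time).
administration = {"SO1": (1, 3), "SO2": (4, 6), "SO3": (7, 9)}
warehouse = {"Arr1": (2, 2), "Arr2": (5, 5), "Arr3": (8, 8)}


def link_score(a_id, b_id):
    """1.0 if the warehouse trace lies inside the administration time window."""
    a_start, a_end = administration[a_id]
    b_start, b_end = warehouse[b_id]
    return 1.0 if a_start <= b_start and b_end <= a_end else 0.0


def fitness(solution):
    """Sum of link scores, minus one point per reused administration trace."""
    score = sum(link_score(a_id, b_id) for b_id, a_id in solution.items())
    duplicates = len(solution) - len(set(solution.values()))
    return score - duplicates


def initial_solution():
    """Step 1: pick the best individual link for every warehouse trace."""
    return {b: max(administration, key=lambda a: link_score(a, b)) for b in warehouse}


def improve(solution, steps=100, seed=0):
    rng = random.Random(seed)
    current = dict(solution)
    history = [fitness(current)]
    for _ in range(steps):  # stop condition: fixed number of steps
        candidate = dict(current)
        b_id = rng.choice(sorted(warehouse))  # pseudo-random change
        candidate[b_id] = rng.choice(sorted(administration))
        if fitness(candidate) >= fitness(current):  # keep original or changed
            current = candidate
        history.append(fitness(current))
    return current, history


best, history = improve(initial_solution())
print(best, history[-1])
```

Working on one solution without cross-over makes this closer to a hill-climbing search than to a full genetic algorithm, which matches the "inspired" wording in the title.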
18. FACULTY OF ECONOMICS AND BUSINESS ADMINISTRATION
4. Experiment results
19. Experiment: proof of concept
Simulated data
Given model
Generate
• random set of logs
• single log (=solution)
Use merge algorithm to merge set of logs
Check resulting log with solution log
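The experimental setup can be sketched in Python as follows. The fixed model SO > Deliver > Inv, the timestamp ranges, the non-overlapping process instances, and the time-based merge are all assumptions made to keep the sketch short; the actual experiment controls further parameters and uses the full technique.

```python
import random


def simulate(n_traces, seed=0):
    """Generate a solution log and split it into two logs with unrelated ids."""
    rng = random.Random(seed)
    solution, administration, warehouse = [], {}, {}
    arrival_numbers = list(range(1, n_traces + 1))
    rng.shuffle(arrival_numbers)  # hide the correspondence between trace ids
    for i, arrival in enumerate(arrival_numbers):
        t_so = i * 10 + rng.randint(0, 2)  # instances never overlap in time
        t_del = t_so + rng.randint(1, 3)
        t_inv = t_del + rng.randint(1, 3)
        solution.append([(t_so, "SO"), (t_del, "Deliver"), (t_inv, "Inv")])
        administration[f"SO{i + 1}"] = [(t_so, "SO"), (t_inv, "Inv")]
        warehouse[f"Arr{arrival}"] = [(t_del, "Deliver")]
    return solution, administration, warehouse


def merge_by_time(administration, warehouse):
    """Attach each warehouse trace to the administration trace that contains it."""
    merged = []
    for a_events in administration.values():
        start, end = a_events[0][0], a_events[-1][0]
        events = list(a_events)
        for b_events in warehouse.values():
            if all(start <= t <= end for t, _ in b_events):
                events += b_events
        merged.append(sorted(events))
    return merged


solution, administration, warehouse = simulate(n_traces=5)
merged = merge_by_time(administration, warehouse)

# Check the resulting log against the solution log.
correct = sum(1 for trace in merged if trace in solution)
print(f"{correct} of {len(solution)} traces reconstructed")
```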
20. Experiment: proof of concept
Advantages of using simulated data
Solution is known
Controllable parameters
(e.g. noise, overlap, matching id)
Disadvantages of using simulated data
Limited internal validity (are results realistic?)
No external validity (results not generalisable)
21. Experiment results
Incorrect links relative to total links identified
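The reported measure can be computed as a simple ratio, sketched here in Python. The function name incorrect_link_ratio and the example links are hypothetical; they are not data from the experiment.

```python
def incorrect_link_ratio(found_links, true_links):
    """Share of identified links that do not appear in the known solution."""
    if not found_links:
        return 0.0
    incorrect = sum(1 for link in found_links if link not in true_links)
    return incorrect / len(found_links)


# Hypothetical example: links are (administration trace, warehouse trace) pairs.
true_links = {("SO1", "Arr1"), ("SO2", "Arr2"), ("SO3", "Arr3")}
found_links = {("SO1", "Arr1"), ("SO2", "Arr3"), ("SO3", "Arr2")}

ratio = incorrect_link_ratio(found_links, true_links)
print(f"{ratio:.2f}")
```

The denominator is the number of links the algorithm identified, not the number of links in the solution, so links that were missed entirely do not count as incorrect under this measure.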
22. FACULTY OF ECONOMICS AND BUSINESS ADMINISTRATION
5. Discussion
23. Future work
Optimise genetic algorithm
Fewer incorrect links
Faster implementation (AIS algorithm)
Fitness function factors
Validation with real test cases
Ghent University DPO (Human Resources)
Century21 (Real Estate) & FlexPack (Packaging)
BNP Paribas Fortis (Finance)
...
24. Contact information
Jan Claes
jan.claes@ugent.be
http://processmining.ugent.be
Twitter: @janclaesbelgium