Ce diaporama a bien été signalé.
Nous utilisons votre profil LinkedIn et vos données d’activité pour vous proposer des publicités personnalisées et pertinentes. Vous pouvez changer vos préférences de publicités à tout moment.

Process Mining for ERP Systems

2 654 vues

Publié le

Presentation held at the 1st Workshop for Data- and Artifact-Centric Processes, co-located with BPM 2012, September 2012.

Publié dans : Technologie, Business
  • Soyez le premier à commenter

  • Soyez le premier à aimer ceci

Process Mining for ERP Systems

  1. 1. Erik Nooijen, Boudewijn v. Dongen, Dirk FahlandProcess Mining for ERP Systems
  2. 2. Process Discovery process event process discovery log model algorithm c1: A B C D E assumptions c2: A C B D E • case = sequence of events of this case c3: A F D E • cases are isolated: event A in c1 happens only in c1 (and not in c2) … • cases of the same process • one unique case id, • each event associated to exactly one case id PAGE 1
  3. 3. Typical Process in an ERP System Manufacturer Material A Material B order Material B Material B product X orderAlice materials ACME Inc. Material B Material A order Material C Material C product Y orderBob materials Build to Order Mega Corp. PAGE 2
  4. 4. n-to-m relations  database process process discovery model algorithmid attributes time-stamp attributes ProductOrder CustomerpoID cust. … created processed built shipped cust. address …po1 Alice 30-08 9:22 30-08 13:12 01-09 15:12 03-09 10:15 Alice … …po2 Bob 30-08 10:15 30-08 13:14 01-09 16:13 03-09 17:18 Bob … … relations data attributes OrderedMaterial id attributes MaterialOrderpoID moID type added moID suppl. … completed sent receivedpo1 mo3 B 30-08 13:13 mo3 ACME 30-08 13:15 30-08 14:15 01-09 9:05po1 mo4 A 30-08 13:14 mo4 MEGA 30-08 13:17 30-08 16:12 01-09 10:13po2 mo3 B 30-08 13:15po2 mo4 C 30-08 13:16 relations PAGE 3
  5. 5. Process Discovery for ERP Systems process process discovery model algorithm 0..* Customer reality: data in a relational DBProductOrder - cust 1 -… • events stored as time-stamped- poID- cust attributes in tables- created OrderedMat. MaterialOrder- processed - poID- built 1 - moID - moID • multiple primary keys- shipped 1..* - supplier  multiple notions of case - type 1..* - completed - added 1 - sent - received • tables are related  one event related to multiple cases PAGE 4
  6. 6. Process Discovery for ERP Systems process process discovery model algorithm 0..* Customer reality: data in a relational DBProductOrder - cust 1 -… • events stored as time-stamped- poID- cust attributes in tables- created OrderedMat. MaterialOrder- processed - poID- built 1 - moID - moID • multiple primary keys- shipped 1..* - supplier  multiple notions of case - type 1..* - completed - added 1 - sent - received • tables are related  one event related to multiple cases PAGE 5
  7. 7. Outline process model related by primary foreign-key relations decompose by primary keys model f. log f. discovery PO log f. model f. MO PO MO discovery PAGE 6
  8. 8. Find Artifact Schemas process model related by primary foreign-key relations decompose by primary keys model f. log f. discovery PO log f. model f. MO PO MO discovery PAGE 7
  9. 9. Step 0: discover database schema document schema vs. actual schema  identify • column types (esp. time-stamped columns) • primary keys • foreign keys various (non-trivial) techniques available key discovery is NP-complete in the size of the table(s) result: PAGE 8
  10. 10. Step 1: decompose schema into processes= schema summarization find: 1. sets of corresponding tables 2. links between those ProductOrder MaterialOrder PAGE 9
  11. 11. Automatic Schema Summarization= group similar tables through clustering define a distance between any 2 tables • by relations • by information content tables that are close to each other  same cluster # of clusters: user input PAGE 10
  12. 12. Automatic Schema Summarization1. structural distance A between tables 1 2 fanout: 1 = (2+0)/2 fanout ~ avg. # of child fanout: 1 records related to the fanout: 2 same parent record A B A B A B 1 X 1 X 1 X 2 Y 1 Y 1 Y 2 Z 2 U PAGE 11
  13. 13. Automatic Schema Summarization1. structural distance A between tables 1 2 fanout: 1 fanout ~ avg. # of child fanout: 1 m.fr: 2 = 1/ (1/2) records related to the m.fr: 1 fanout: 2 same parent record m.fr: 1 A B A B A B matched fraction ~ 1 X 1 X 1 X 1 / (fraction of records in 2 Y 1 Y 1 Y parent with matching child 2 Z record) 2 U PAGE 12
  14. 14. Grouping by Clustering1. structural distance2. information distance importance of each table = entropy (is maximal if all records are different) distance: 2 tables with high entropies  large distance3. weighted distance by structure + information4. k-means clustering: most important table of cluster k clusters based on = table with least distance to all  key attribute of the cluster weighted distance PAGE 13
  15. 15. Artifact Schema  Artifact Log process model related by primary foreign-key relations decompose by primary keys model f. log f. discovery PO log f. model f. MO PO MO discovery PAGE 14
  16. 16. Log Extraction cluster = set of related tables + primary key of most important table case id poID cust. … created processed built shipped log f. PO po1 Alice 30-08 9:22 30-08 13:12 01-09 15:12 03-09 10:15 po2 Bob 30-08 10:15 30-08 13:14 01-09 16:13 03-09 17:18 poID moID type added po1 mo3 B 30-08 13:13po1: po1 mo4 A 30-08 13:14 po2 mo3 B 30-08 13:15po2: po2 mo4 C 30-08 13:16 PAGE 15
  17. 17. Log Extraction cluster = set of related tables + primary key of most important table case id time-stamped attribute  event poID cust. … created processed built shipped log f. PO po1 Alice 30-08 9:22 30-08 13:12 01-09 15:12 03-09 10:15 po2 Bob 30-08 10:15 30-08 13:14 01-09 16:13 03-09 17:18 poID moID type added po1 mo3 B 30-08 13:13po1: (created, poID=po1, time=30-08 9:22, …) po1 mo4 A 30-08 13:14 po2 mo3 B 30-08 13:15 po2 mo4 C 30-08 13:16 PAGE 16
  18. 18. Log Extraction cluster = set of related tables + primary key of most important table case id time-stamped attribute  event related attributes  event attributes poID cust. … created processed built shipped log f. PO po1 Alice 30-08 9:22 30-08 13:12 01-09 15:12 03-09 10:15 po2 Bob 30-08 10:15 30-08 13:14 01-09 16:13 03-09 17:18 poID moID type added po1 mo3 B 30-08 13:13po1: (created, poID=po1, time=30-08 9:22, cust.=Alice, …)po1 mo4 A 30-08 13:14 po2 mo3 B 30-08 13:15 po2 mo4 C 30-08 13:16 PAGE 17
  19. 19. Log Extraction cluster = set of related tables + primary key of most important table case id time-stamped attribute  event related attributes  event attributes poID cust. … created processed built shipped log f. PO po1 Alice 30-08 9:22 30-08 13:12 01-09 15:12 03-09 10:15 po2 Bob 30-08 10:15 30-08 13:14 01-09 16:13 03-09 17:18 poID moID type added po1 mo3 B 30-08 13:13po1: (created, poID=po1, time=30-08 9:22, cust.=Alice, …)po1 mo4 A 30-08 13:14 (processed, poID=po1, time=30-08 13:12, …) po2 mo3 B 30-08 13:15 po2 mo4 C 30-08 13:16 PAGE 18
  20. 20. Log Extraction cluster = set of related tables + primary key of most important table case id time-stamped attribute  event related attributes  event attributes poID cust. … created processed built shipped log f. PO po1 Alice 30-08 9:22 30-08 13:12 01-09 15:12 03-09 10:15 po2 Bob 30-08 10:15 30-08 13:14 01-09 16:13 03-09 17:18 poID moID type added po1 mo3 B 30-08 13:13po1: (created, poID=po1, time=30-08 9:22, cust.=Alice, …)po1 mo4 A 30-08 13:14 (processed, poID=po1, time=30-08 13:12, …) po2 mo3 B 30-08 13:15 (added, poID=po1, time=30-08 13:13, moID=mo3, …)po2 mo4 C 30-08 13:16 refers to artifact “MaterialOrder” PAGE 19
  21. 21. Outline process model compose by primary foreign-key relations decompose by primary keys model f. log f. discovery order log f. model f. order quote quote discovery PAGE 20
  22. 22. Resulting Model(s) Product Order Material Order 1..* added create completed processed added 1..* sent built received shipped (addded, poID=po1, …, moID=mo3) PAGE 21
  23. 23. Implementation & Evaluation prototype tool • input: relational database (via JDBC), .csv tables • steps − discover database schema (types, keys, relations) − discover artifact schema − by k-means clustering − by user picking tables − extract logs  ProM PAGE 22
  24. 24. Evaluation: SAP System of Sligro > 300 tables, > 40 GiB of data schema extraction time-stamp attributes: 15 hrs primary keys: 4 hrs foreign keys: 5 hrs (single col)/ 6 days (double col.) clustering entropies: 17 hrs table distances: 5 hrs clustering: a few seconds ~20 different artifacts found largest: 47 tables, 869 columns log extraction extract 1000 traces of > 246,000 events query database: 1 hrs write log file: 32 hrs PAGE 23
  25. 25. Sligro: Artikel lifecycle model PAGE 24
  26. 26. Open issues performance • key discovery: NP-complete in R (# of records) • foreign key discovery: NP-complete in R2 • problem is in the “hard part” of NP •  sampling of data, domain knowledge, semi-automatic requires good database structure • proper relations, proper keys • otherwise wrong clusters are formed • events don’t get right attributes •  semi-automatic approach events shared by multiple cases… working on it… PAGE 25
  27. 27. Erik Nooijen, Boudewijn v. Dongen, Dirk FahlandProcess Mining for ERP Systems

×