S-CUBE LP: Mining Lifecycle Event Logs for Enhancing SBAs
1. Exploiting Knowledge on Past Process
Execution to Improve SBA Analysis
Mining Lifecycle Event Logs for
Enhancing SBAs
ISTI-CNR (CNR), TU Wien (TUW)
Franco Maria Nardini, Gabriele Tolomei, CNR
2. Learning Package Categorization
S-Cube
Monitoring and Analysis of SBA
Process Mining
Exploiting Knowledge on Past Process
Execution to Improve SBA Analysis
3. Connections to the S-Cube IRF
Conceptual Research Framework:
– Service Composition and Coordination
– Service Infrastructure
– Adaptation and Monitoring
Logical Run-Time Architecture:
– Monitoring Engine
– Adaptation Engine
– Negotiation Engine
– Runtime QA Engine
– Resource Broker
5. SBA Event Logs
Most complex software systems collect their lifecycle
usage data in event log files
SBA event logs contain a wide range of information about service
components exchanging messages
– e.g., service invocation, service failure, registry querying, etc.
Event logs represent a huge source of “hidden” information
(i.e., knowledge)
6. Mining SBA Event Logs
Data Mining algorithms and techniques allow extracting
valuable knowledge from event logs
Extracted knowledge may refer to several aspects:
– e.g., service usage patterns, service failure patterns, etc.
If properly exploited, such knowledge might help
improve the overall quality of the system:
– recommending frequently invoked services;
– avoiding/handling anomalous situations, etc.
7. Process Mining (PM)
Process Mining (PM) is an application of data mining
techniques to SBA event logs
PM aims at discovering structured process models
derived from patterns that are present in actual traces
of service executions
Each process is usually represented by a digraph and
the problem of PM has been modeled as:
– finite state machine [CW96]
– sequential pattern mining (SPM) [AGL98]
– Petri net [vdAWM04]
8. Another Example: Web Search Engines
Web Search Engines (WSEs) are another example of
systems that benefit from mining their event log data (i.e.,
Query Logs)
Query Log Mining (QLM) has proven to be effective for
enhancing the overall performance of WSEs
We propose a QLM technique for identifying search
patterns (tasks) from the stream of queries recorded in
query logs [LOPST11]
10. Goal
Treat PM as an instance of the SPM problem
Detect frequent sequential patterns of service
invocation, i.e., services that are frequently co-invoked
within the same sequence
– e.g., service Y is usually invoked after service X
Find which services are actually used, and how
– service recommendation
– avoiding/handling anomalous situations
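The recommendation idea above can be sketched in a few lines. This is only a minimal illustration, assuming each sequence is a list of service names in invocation order; the service names and the helper names `successor_counts` and `recommend` are hypothetical, and counting immediate successors is a deliberate simplification of full sequential pattern mining.

```python
from collections import Counter, defaultdict

def successor_counts(sequences):
    """Count how often each service is immediately followed by another one."""
    counts = defaultdict(Counter)
    for seq in sequences:
        for current, nxt in zip(seq, seq[1:]):
            counts[current][nxt] += 1
    return counts

def recommend(counts, service, k=1):
    """Return the k services most often invoked right after `service`."""
    return [name for name, _ in counts.get(service, Counter()).most_common(k)]

# Hypothetical invocation sequences; the service names are made up.
SEQUENCES = [
    ["auth", "search", "book"],
    ["auth", "search", "pay"],
    ["auth", "book", "pay"],
]

COUNTS = successor_counts(SEQUENCES)
print(recommend(COUNTS, "auth"))  # ['search']
```

A service that never appears in the log simply yields an empty recommendation list.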
12. Sequential Pattern Mining
Event logs can be viewed as sequences of events that
change over time (time series)
We are interested in finding sequences of services that are
frequently invoked in a specific order, i.e., sequential patterns
Sequential Pattern Mining (SPM) is the process of extracting
sequential patterns whose support exceeds a predefined
minimum support threshold min_supp
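To make the notion of support concrete, here is a minimal sketch. It assumes that support is the fraction of sequences containing the pattern as an ordered (not necessarily contiguous) subsequence, and it treats a pattern as frequent when its support is at least min_supp, which is a common convention. The sequences and service names are hypothetical.

```python
from typing import Sequence

def is_subsequence(pattern: Sequence[str], sequence: Sequence[str]) -> bool:
    """True if the items of `pattern` appear in `sequence` in order (gaps allowed)."""
    remaining = iter(sequence)
    # `item in remaining` consumes the iterator, so order is enforced.
    return all(item in remaining for item in pattern)

def support(pattern: Sequence[str], sequences: Sequence[Sequence[str]]) -> float:
    """Fraction of sequences that contain `pattern` as a subsequence."""
    if not sequences:
        return 0.0
    return sum(is_subsequence(pattern, s) for s in sequences) / len(sequences)

# Hypothetical invocation sequences; the service names are made up.
LOG = [
    ["auth", "search", "book", "pay"],
    ["auth", "search", "pay"],
    ["search", "auth", "book"],
    ["auth", "book", "pay"],
]

min_supp = 0.6  # assumed threshold, chosen only for this example
for pattern in (["auth", "pay"], ["search", "book"]):
    s = support(pattern, LOG)
    print(pattern, s, "frequent" if s >= min_supp else "infrequent")
```

With this data, the pattern auth followed by pay has support 0.75 and is frequent, while search followed by book has support 0.5 and is not.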
13. PrefixSpan
One of the most efficient algorithms for finding sequential
patterns [PHMP01]
Mines the complete set of patterns but greatly reduces the
effort of candidate subsequence generation
Takes into account only the chronological order between
events
– i.e., it only cares whether X comes before Y, without considering the
actual time interval
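The prefix-projection idea can be illustrated with a simplified sketch. It assumes one service per event (no itemsets) and an absolute threshold `min_count` on the number of supporting sequences; it is not the optimized algorithm of [PHMP01], and the sequences and service names are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple

def prefixspan(sequences: List[List[str]], min_count: int) -> Dict[Tuple[str, ...], int]:
    """Return every sequential pattern contained in at least `min_count` sequences."""
    patterns: Dict[Tuple[str, ...], int] = {}

    def mine(prefix: Tuple[str, ...], projected: List[List[str]]) -> None:
        # Count each item at most once per projected suffix.
        counts: Counter = Counter()
        for suffix in projected:
            counts.update(set(suffix))
        for item, count in sorted(counts.items()):
            if count < min_count:
                continue
            new_prefix = prefix + (item,)
            patterns[new_prefix] = count
            # Projection: keep only what follows the first occurrence of `item`.
            new_projected = [s[s.index(item) + 1:] for s in projected if item in s]
            mine(new_prefix, new_projected)

    mine((), sequences)
    return patterns

# Hypothetical invocation sequences; the service names are made up.
LOG = [
    ["auth", "search", "book", "pay"],
    ["auth", "search", "pay"],
    ["search", "auth", "book"],
    ["auth", "book", "pay"],
]

FOUND = prefixspan(LOG, min_count=3)
for pattern, count in sorted(FOUND.items()):
    print(pattern, count)
```

Patterns are grown only from items that are still frequent in the projected suffixes, so no candidate sequences are generated and then tested against the whole log. Timestamps play no role: only the order of invocations matters.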
14. MiSTA
Hint: observing that two services are invoked close
together rather than far apart in a sequence
could lead to distinct conclusions
MiSTA [GNPP06] is able to deal with the actual time
interval between any two consecutive service
invocations
It needs a time threshold tau for specifying the
maximum time interval of events in a frequent
sequence
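The effect of a time threshold can be shown with a small sketch. This is not the MiSTA algorithm itself: it only checks whether a given pattern occurs with every gap between consecutive matched invocations no larger than tau. The assumptions are that each event is a (service, timestamp) pair, that events are sorted by time, and that timestamps are in seconds; all names and values are hypothetical.

```python
from typing import Optional, Sequence, Tuple

Event = Tuple[str, float]  # (service name, timestamp in seconds), an assumed layout

def occurs_within(pattern: Sequence[str], events: Sequence[Event], tau: float) -> bool:
    """True if `pattern` occurs in order with each consecutive gap <= tau."""

    def match(p_idx: int, start: int, last_time: Optional[float]) -> bool:
        if p_idx == len(pattern):
            return True
        for i in range(start, len(events)):
            name, t = events[i]
            if last_time is not None and t - last_time > tau:
                break  # events are time-ordered, so later ones are even further away
            if name == pattern[p_idx] and match(p_idx + 1, i + 1, t):
                return True
        return False

    return match(0, 0, None)

# Hypothetical timed sequences; the service names and times are made up.
TIMED_LOG = [
    [("auth", 0), ("search", 2), ("pay", 5)],
    [("auth", 0), ("search", 1), ("pay", 120)],
    [("auth", 0), ("auth", 100), ("pay", 103)],
]

PATTERN = ["auth", "pay"]
close = sum(occurs_within(PATTERN, seq, tau=10) for seq in TIMED_LOG)
anytime = sum(occurs_within(PATTERN, seq, tau=float("inf")) for seq in TIMED_LOG)
print(close, anytime)  # 2 3
```

The second sequence supports the pattern only when time is ignored. The third needs backtracking: the first auth is too far from pay, but the second one is close enough.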
16. Data Set: VRESCo
VRESCo is the runtime environment for Service-oriented
Computing developed by VITALab@TUW
It collects usage data (i.e., events) in the form of an XML log
file
The VRESCo event log file contains information about: invoked
services, service rebinding, service failure, etc.
We only focus on service invocation events
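Filtering a log down to invocation events might look like the sketch below. The XML layout is invented purely for illustration and is not VRESCo's actual log schema: the element name `event` and the attributes `type`, `session`, `service` and `time` are all assumptions, as is the grouping of events into one sequence per session.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical log fragment; NOT the real VRESCo format.
SAMPLE_XML = """
<events>
  <event type="invocation" session="s1" service="auth" time="1"/>
  <event type="failure" session="s1" service="auth" time="2"/>
  <event type="invocation" session="s2" service="search" time="3"/>
  <event type="invocation" session="s1" service="pay" time="4"/>
  <event type="rebinding" session="s2" service="search" time="5"/>
  <event type="invocation" session="s2" service="book" time="6"/>
</events>
"""

def invocation_sequences(xml_text):
    """Group invocation events by session and order them by timestamp."""
    root = ET.fromstring(xml_text.strip())
    by_session = defaultdict(list)
    for event in root.iter("event"):
        if event.get("type") != "invocation":
            continue  # skip failures, rebindings and any other event type
        by_session[event.get("session")].append(
            (float(event.get("time")), event.get("service"))
        )
    return {sid: [svc for _, svc in sorted(evs)] for sid, evs in by_session.items()}

SEQS = invocation_sequences(SAMPLE_XML)
print(SEQS)
```

The resulting per-session lists of service names are the kind of input a sequential pattern miner expects.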
23. Results
The service logs coming from the VRESCo runtime
environment contain frequent patterns of services;
Those patterns contain information about: invoked services,
service rebinding, service failure, etc.;
Those patterns could be collected by considering co-occurring
sequences and also by considering the time;
Such inferred knowledge can be used to enhance SBAs:
e.g., by means of novel design tools like service
recommendation.
25. Conclusions
Event logs collected by complex software systems
represent a huge source of information (knowledge)
Find sequences of frequently co-invoked services from
SBA event logs using Sequential Pattern Mining (SPM)
Two SPM algorithms are run on a real-world SBA event log
(VRESCo): PrefixSpan and MiSTA
Experimental results show that some services are often
invoked together in a frequent sequence
Exploit such inferred knowledge to enhance SBAs: e.g., by
means of novel design tools like service recommendation
26. References
– [CW96] J. E. Cook and A. L. Wolf, “Discovering models of software processes
from event-based data”. Technical Report CUCS-819-96,
Computer Science Dept., Univ. of Colorado, 1996.
– [AGL98] R. Agrawal, D. Gunopulos, and F. Leymann, “Mining Process Models
from Workflow Logs”. In Sixth International Conference on Extending Database
Technology, pp. 469–483, 1998.
– [vdAWM04] W. van der Aalst, T. Weijters, and L. Maruster, “Workflow Mining:
Discovering Process Models from Event Logs”. IEEE Transactions on
Knowledge and Data Engineering, vol. 16, no. 9, pp. 1128–1142, Sep. 2004.
– [LOPST11] C. Lucchese, S. Orlando, R. Perego, F. Silvestri, and G. Tolomei,
“Identifying task-based sessions in search engine query logs”, in WSDM ’11.
ACM, 2011, pp. 277–286.
– [PHMP01] J. Pei, J. Han, B. Mortazavi-Asl, and H. Pinto, “PrefixSpan: Mining
sequential patterns efficiently by prefix-projected pattern growth”, in ICDE ’01.
IEEE, 2001.
– [GNPP06] F. Giannotti, M. Nanni, D. Pedreschi, and F. Pinelli, “Mining
sequences with temporal annotations,” in SAC ’06. ACM, 2006, pp. 593–597.