Region-Based Foldings in Process Discovery
ABSTRACT
A central problem in the area of Process Mining is to obtain a formal model that represents the processes
that are conducted in a system. If realized, this simple motivation allows for powerful techniques that can be
used to formally analyze and optimize a system, without the need to resort to its semiformal and sometimes
inaccurate specification. The problem addressed in this paper is known as Process Discovery: to obtain a formal
model from a set of system executions. The theory of regions is a valuable tool in process discovery: it aims at
learning a formal model (a Petri net) from a set of traces. In its original form, the theory is applied to an
automaton, so the traces must first be converted into an acyclic automaton before these techniques can be
applied. Given that the complexity of region-based techniques depends on the size of the input automaton,
revealing the underlying cycles and folding the initial automaton can significantly reduce that
complexity. In this paper, we follow this idea by incorporating region
information into the cycle detection algorithm, enabling the identification of complex cycles that cannot be
obtained efficiently with state-of-the-art techniques. The experimental results obtained with the devised tool
suggest that the techniques presented in this paper are a significant step toward widening the application of the theory of
regions in Process Mining to industrial scenarios.
Existing System
The discovery of global patterns that can be used to make predictions about the future has been one of the key
elements that have made Data Mining one of the most relevant research areas in recent decades. Data
mining techniques can be applied naturally to large amounts of data, such as databases or even the Internet, and with
the help of other disciplines like statistics or machine learning, can effectively reveal important patterns in many
scenarios such as health care, business or transportation. As in data mining, Process Discovery tries to reveal
patterns. However, the patterns sought by Process Discovery techniques are process models, i.e., formal
representations of the processes of a system. Because of this different focus, Process Discovery techniques apply
disciplines different from the ones used in data mining, to allow for the derivation of both the statics and the
dynamics of a system process. Depending on the emphasis, different dimensions can be considered, ranging
from social (the identification of communities) to control-flow (the identification of the complex interplay
between a system's tasks). In this work we consider the latter: discovering a Petri net from a log, that is, from a set of
traces corresponding to executions of a system. The first method to obtain a Petri net from a log was presented.
Disadvantages
The first method can discover only a restricted class of Petri nets. To overcome this limitation, several
extensions have been presented in the literature to widen the class of Petri nets that the algorithm can discover.
The theory of regions was initially proposed to solve the synthesis problem: obtain a Petri net that has a
behavior equivalent to a given transition system.
Proposed System
The theory of regions was initially proposed to solve the synthesis problem: obtain a Petri net whose
behavior is equivalent to that of a given transition system (TS). Three conversions from a language to a TS have been proposed,
namely sequence, multiset, and set. The main difference between them is how it is decided whether the
occurrence of an event in a trace produces a new state in the TS or just introduces an arc to an existing state.
Together with these conversions, a number of additional conversions have been proposed in the literature that
yield smaller TSs by means of abstractions, at the cost of sacrificing regions. We use the term
abstraction techniques to refer to them. The fundamental difference between all these methods and our proposal
is that, in our case, the set of sacrificed regions is controlled by bounds that are already used by
process discovery tools, so the compression of the TS does not involve a quality reduction.
Advantages
An advantage of region theory for process discovery is that it allows label splitting.
Despite the advantages offered by the theory of regions, two main reasons hamper a wider
adoption of region-based Process Discovery methodologies in an industrial setting. One is their
sensitivity to noise.
On the other hand, the benefits for rbminer are twofold, since a smaller region basis also reduces the number of
regions to explore. In this case, both advantages (state and basis reduction) combine to achieve speedups of
orders of magnitude.
Module
1. Get Input Text File
2. Discovery Sentence Word
3. Decided Sentence
4. Tandem Repeats
5. Sequence And Multiset Conversions
6. Counting Data
Module Description
Get Input Text File
Process Discovery differs from synthesis in the knowledge assumption: while in synthesis one
assumes a complete description of the system, only a partial description of the system is assumed in Process
Discovery. Therefore, equivalence or bisimulation is no longer a goal to achieve. Instead, obtaining
approximations that succinctly represent the log under consideration is more valuable.
Discovery Sentence Word
When a discovery algorithm returns a Petri net (PN) whose language is smaller than desired, the result is
referred to as overfitting. A classical strategy to avoid overfitting is to allow the algorithms to restrict their
output to k-bounded PNs (k-bounded discovery), usually for small values of k, as nets with high numbers of tokens are
considered harder for humans to understand than nets with fewer tokens. The particular k used in each case can
be determined either from the desired level of complexity of the resulting PN or from the number of available
resources in the system (since places can represent resources).
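The link between k-bounded nets and regions can be sketched in code. The sketch below is an illustration only and assumes the standard multiset view of regions: a region assigns a token count to every TS state so that all arcs carrying the same event change the count by the same amount, and a k-bounded region never exceeds k tokens. The names `event_gradients`, `is_k_bounded_region`, and the arc and marking encodings are hypothetical and not taken from the original tool.

```python
def event_gradients(arcs, marking):
    """Return {event: gradient} if `marking` is a region of the TS, else None.

    arcs    : iterable of (source, event, target) triples (assumed encoding)
    marking : dict mapping each state to a non-negative token count
    """
    gradients = {}
    for source, event, target in arcs:
        delta = marking[target] - marking[source]
        # Every arc with the same label must change the count identically.
        if gradients.setdefault(event, delta) != delta:
            return None
    return gradients


def is_k_bounded_region(arcs, marking, k):
    """True if `marking` is a region whose token counts stay within 0..k."""
    within_bound = all(0 <= tokens <= k for tokens in marking.values())
    return within_bound and event_gradients(arcs, marking) is not None


# Diamond-shaped TS: events a and b can occur in either order.
arcs = [("s0", "a", "s1"), ("s1", "b", "s2"),
        ("s0", "b", "s3"), ("s3", "a", "s2")]
consumed_by_a = {"s0": 1, "s1": 0, "s2": 0, "s3": 1}
print(is_k_bounded_region(arcs, consumed_by_a, k=1))
```

Restricting the search to small k keeps the candidate markings few, which is one way to read the preference for small bounds stated above.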
Decided Sentence
Three conversions from a language to a transition system (TS) have been proposed, namely sequence, multiset, and set. The main
difference between them is how it is decided whether the occurrence of an event in a trace produces a new state
in the TS or just introduces an arc to an existing state.
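A minimal sketch of the three conversions follows. It assumes a log is a list of traces and each trace is a list of event labels; the function name `build_ts` and the state encodings are hypothetical choices made for illustration. The only thing that changes between conversions is how the prefix seen so far is abstracted into a state.

```python
from collections import Counter


def build_ts(log, conversion="sequence"):
    """Build a TS (states, arcs, initial state) from a log.

    A state is the abstraction of the prefix read so far:
      sequence : the full ordered prefix
      multiset : how many times each event occurred
      set      : which events occurred at least once
    """
    def abstract(prefix):
        if conversion == "sequence":
            return tuple(prefix)
        if conversion == "multiset":
            return frozenset(Counter(prefix).items())
        if conversion == "set":
            return frozenset(prefix)
        raise ValueError(f"unknown conversion: {conversion}")

    initial = abstract([])
    states = {initial}
    arcs = set()
    for trace in log:
        current = initial
        for i, event in enumerate(trace):
            target = abstract(trace[: i + 1])
            states.add(target)  # no effect if the state already exists
            arcs.add((current, event, target))
            current = target
    return states, arcs, initial


log = [["a", "b", "c"], ["b", "a", "c"], ["a", "a", "b"]]
for name in ("sequence", "multiset", "set"):
    states, arcs, _ = build_ts(log, name)
    print(name, len(states), len(arcs))
```

On this small log the set conversion maps the second occurrence of `a` in the third trace to an arc back to an existing state, whereas the sequence and multiset conversions create a new state for it.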
Tandem Repeats
The detection of unfolded cycles in an acyclic TS is a problem related to finding consecutively repeated patterns
in a string. The latter problem has been studied in several fields with many variations and under different
names, although it is often referred to as the problem of finding tandem repeats.
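To make the string version of the problem concrete, here is a naive sketch that scans a single trace for consecutively repeated blocks. It is a brute-force illustration, not the region-aware cycle detection described in this work, and the function name `find_tandem_repeats` is hypothetical.

```python
def find_tandem_repeats(trace, min_period=1):
    """Return a list of (start, period, copies) for repeated adjacent blocks.

    A hit means trace[start:start + period] occurs `copies` >= 2 times in a
    row beginning at `start`. Periods that are multiples of a shorter
    repeat are reported as well.
    """
    n = len(trace)
    repeats = []
    for period in range(min_period, n // 2 + 1):
        start = 0
        while start + 2 * period <= n:
            pattern = trace[start:start + period]
            copies = 1
            # Extend the run while the next block equals the pattern.
            while trace[start + copies * period:
                        start + (copies + 1) * period] == pattern:
                copies += 1
            if copies >= 2:
                repeats.append((start, period, copies))
                start += copies * period  # skip past the whole run
            else:
                start += 1
    return repeats


print(find_tandem_repeats(list("xabcabcabcy")))
```

For the trace `x a b c a b c a b c y` the scan reports a single repeat that starts at position 1, has period 3, and occurs 3 times, which corresponds to a cycle `a b c` that was unfolded three times.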
Sequence And Multiset Conversions
Besides the sequence and multiset conversions, other conversions have been proposed that can yield smaller
TSs at the cost of sacrificing regions. We use the term abstraction techniques to refer to them. The fundamental
difference between all these methods and our proposal is that, in our case, the set of sacrificed regions is
controlled by bounds that are already used by process discovery tools, so the compression of the TS
does not involve a quality reduction.
Counting Data
Because region-based approaches yield PNs that never reject a trace of the log, they are extremely sensitive
to noise. Hence, to be applicable, the approach presented in this paper must be preceded by a noise filtering
phase. The filtering can be done by clustering techniques or by outlier detection. Considering the
frequencies of the states is another possibility in our approach to distinguish between real and noisy states, because the
latter often have low frequency. For instance, only Parikh vector differences between frequent states could be
taken into account to differentiate real folding opportunities from spurious cycle unfoldings caused by noise. An
advantage of region theory for process discovery is that it allows label splitting (i.e., changing the
label of some arcs in the TS so that an event is actually represented by a set of different events). Label splitting
is a technique that can help to improve the visualization of the PN, and also to avoid generalizing too
much. This technique can also be used with the TSs produced by our approach. However, the splitting options
might be reduced because arcs with the same label in the original TS may have been merged
into one arc in the folded TS.
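The frequency idea can be sketched as follows, assuming states are the prefixes of the sequence conversion and a state's frequency is the number of traces that pass through it. The helper names and the threshold parameter `min_freq` are hypothetical and only illustrate the filtering step.

```python
from collections import Counter


def state_frequencies(log):
    """Count how many traces visit each prefix state (sequence conversion)."""
    freq = Counter()
    for trace in log:
        for i in range(len(trace) + 1):
            freq[tuple(trace[:i])] += 1
    return freq


def frequent_states(log, min_freq):
    """Keep only states visited by at least `min_freq` traces."""
    return {s for s, f in state_frequencies(log).items() if f >= min_freq}


def parikh_difference(later, earlier):
    """Per-event count difference between two prefixes, zero entries dropped."""
    diff = Counter(later)
    diff.subtract(Counter(earlier))
    return {event: count for event, count in diff.items() if count != 0}


# Three traces repeat the block "a b"; one noisy trace contains "x".
log = [list("abab")] * 3 + [list("ax")]
kept = frequent_states(log, min_freq=2)
print(("a", "x") in kept)
print(parikh_difference(("a", "b", "a", "b"), ("a", "b")))
```

Here the state reached by the noisy trace falls below the threshold and is ignored, while the difference between the two frequent states shows one extra `a` and one extra `b`, the kind of evidence that could point to an unfolded cycle.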
FLOW CHART
[Flow chart. Boxes: Region-Based Process Discovery; Get The Input Text File; Discovery Sentence Word;
Sequence and Multiset; Tandem Repeats; Counting Data]
CONCLUSION
This paper presents a novel technique for compacting a TS, one of the objects typically used in process discovery
algorithms. The two main characteristics of this technique make it very attractive in the context of
region-based k-bounded process discovery: first, it is one of the most aggressive folding techniques in the literature,
and second, it preserves the important regions that are crucial for PN derivation. The use of region-aware
folding techniques like the one presented in this paper may be a crucial step toward using region-based algorithms
for process discovery in industrial scenarios.
REFFERENCE
[1] W. van der Aalst, H. Reijers, and M. Song, “Discovering Social Networks from Event Logs,” Computer
Supported Cooperative Work, vol. 14, no. 6, pp. 549-593, 2005.
[2] W. van der Aalst, T. Weijters, and L. Maruster, “Workflow Mining: Discovering Process Models from
Event Logs,” IEEE Trans. Knowledge Data Eng., vol. 16, no. 9, pp. 1128-1142, Sept. 2004.
[3] A. de Medeiros, W. van der Aalst, and A. Weijters, “Workflow Mining: Current Status and Future
Directions,” Proc. On The Move to Meaningful Internet Systems 2003: CoopIS, DOA, and ODBASE, pp. 389-
406, 2003.
[4] L. Wen, W. van der Aalst, J. Wang, and J. Sun, “Mining Process Models with Non-Free-Choice
Constructs,” Data Mining and Knowledge Discovery, vol. 15, no. 2, pp. 145-180, 2007.
[5] W. van der Aalst, A. de Medeiros, and A. Weijters, “Genetic Process Mining,” Proc. 26th Int’l Conf.
Applications and Theory of Petri Nets (ICATPN), pp. 48-69, 2005.
[6] A. Ehrenfeucht and G. Rozenberg, “Partial (Set) 2-Structures. Part I, II,” Acta Informatica, vol. 27, pp. 315-
368, 1990.

Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
Tech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfTech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdf
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Evaluating the top large language models.pdf
Evaluating the top large language models.pdfEvaluating the top large language models.pdf
Evaluating the top large language models.pdf
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 

JAVA 2013 IEEE DATAMINING PROJECT Region based foldings in process discovery

  • 1. Region-Based Foldings in Process Discovery
ABSTRACT
A central problem in the area of Process Mining is to obtain a formal model that represents the processes that are conducted in a system. If realized, this simple motivation allows for powerful techniques that can be used to formally analyze and optimize a system, without the need to resort to its semiformal and sometimes inaccurate specification. The problem addressed in this paper is known as Process Discovery: to obtain a formal model from a set of system executions. The theory of regions is a valuable tool in process discovery: it aims at learning a formal model (a Petri net) from a set of traces. In its original form, the theory is applied to an automaton, and therefore the traces must first be converted into an acyclic automaton before these techniques can be applied. Given that the complexity of the region-based techniques depends on the size of the input automaton, revealing the underlying cycles and folding the initial automaton can significantly reduce the complexity of the region-based techniques. In this paper, we follow this idea by incorporating region information in the cycle detection algorithm, enabling the identification of complex cycles that cannot be obtained efficiently with state-of-the-art techniques. The experimental results obtained by the devised tool suggest that the techniques presented in this paper are a big step towards widening the application of the theory of regions in Process Mining for industrial scenarios.
Existing System
The discovery of global patterns that can be used to make predictions about the future has been one of the key elements that have made Data Mining one of the most relevant research areas in recent decades. Data mining techniques can be applied naturally to large amounts of data, such as databases or even the Internet, and, with the help of other disciplines like statistics or machine learning, can effectively reveal important patterns in many scenarios such as health care, business or transportation.
  • 2. As in data mining, Process Discovery tries to reveal patterns. However, the patterns sought by Process Discovery techniques are process models, i.e., formal representations of the processes of a system. Due to this different focus, Process Discovery techniques apply disciplines different from the ones used in data mining, to allow for the derivation of both the statics and the dynamics of a system process. Depending on the emphasis, different dimensions can be considered, ranging from social (the identification of communities) to control-flow (the identification of the complex interplay between a system's tasks). In this work we consider the latter: discovering a Petri net from a log, that is, from a set of traces corresponding to executions of a system. A first method to obtain a Petri net from a log has been presented in the literature.
Disadvantages
That first method can only discover a restricted class of Petri nets. To overcome this limitation, several extensions have been presented in the literature to widen the class of Petri nets that the algorithm can discover.
Proposed System
The theory of regions was initially proposed to solve the synthesis problem: obtain a Petri net that has a behavior equivalent to a given transition system (TS). Three conversions from a language to a TS have been proposed, namely sequence, multiset, and set. The main difference between them is how it is decided whether the occurrence of an event in a trace produces a new state in the TS or just introduces an arc to an existing state.
Besides these conversions, other conversions have been proposed in the literature that can yield smaller TSs by means of abstractions, at the cost of sacrificing regions. We use the term abstraction techniques to refer to them. The fundamental difference between all these methods and our proposal is that, in our case, the set of sacrificed regions is controlled by bounds that are already used by process discovery tools, so the compression of the TS does not involve a quality reduction.
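The difference between the sequence, multiset, and set conversions can be made concrete with a small sketch. The code below is only an illustration under assumptions, not the tool described in the slides: the function names `state_key` and `log_to_ts` and the example log are hypothetical, and a state is simply identified by an abstraction of the trace prefix that leads to it.

```python
from collections import Counter

def state_key(prefix, mode):
    """Map a trace prefix to a state identity under the chosen conversion."""
    if mode == "sequence":
        return tuple(prefix)                       # order and counts both matter
    if mode == "multiset":
        return frozenset(Counter(prefix).items())  # counts matter, order does not
    if mode == "set":
        return frozenset(prefix)                   # only which events occurred
    raise ValueError(f"unknown conversion: {mode}")

def log_to_ts(log, mode="multiset"):
    """Build a transition system (states, labelled arcs) from a list of traces."""
    states = {state_key((), mode)}                 # initial state: empty prefix
    arcs = set()
    for trace in log:
        for i, event in enumerate(trace):
            src = state_key(trace[:i], mode)
            dst = state_key(trace[:i + 1], mode)
            states.add(dst)                        # new state only if the key is new
            arcs.add((src, event, dst))            # otherwise just an arc to an existing state
    return states, arcs

# Hypothetical example log with three traces.
log = [("a", "b", "c"), ("a", "c", "b"), ("a", "b", "b", "c")]
sizes = {mode: len(log_to_ts(log, mode)[0]) for mode in ("sequence", "multiset", "set")}
```

On this example log, the coarser the abstraction, the fewer states are produced. The set conversion even introduces a self-loop on event `b`, which is one way an arc ends up pointing to an existing state.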
  • 3. Advantages
An advantage of region theory for process discovery is that it allows label splitting. Despite the advantages offered by the theory of regions, there are two main reasons that hamper a wider adoption of region-based Process Discovery methodologies in an industrial setting. One is their sensitivity to noise. The other is their complexity, which depends on the size of the input automaton. On the other hand, the benefits for rbminer are twofold, since a smaller region basis reduces the number of regions to explore. In this case, both advantages (state and basis reduction) combine to achieve speedups of orders of magnitude.
Modules
1. Get Input Text File
2. Discovery Sentence Word
3. Decided Sentence
4. Tandem Repeats
5. Sequence And Multiset Conversions
6. Counting Data
Module Description
Get Input Text File
Process Discovery differs from synthesis in the knowledge assumption: while in synthesis one assumes a complete description of the system, only a partial description of the system is assumed in Process Discovery. Therefore, equivalence or bisimulation is no longer a goal to achieve. Instead, obtaining approximations that succinctly represent the log under consideration is more valuable.
Discovery Sentence Word
The fact that a discovery algorithm returns a PN with a smaller language than desired is referred to as overfitting. A classical strategy to avoid overfitting is to allow the algorithms to restrict their output to k-bounded PNs (k-bounded discovery), usually for small values of k, as nets with high numbers of tokens are considered harder for humans to understand than nets with fewer tokens. The particular k used in each case can be determined either from the desired level of complexity of the resulting PN or from the number of available resources in the system (since places can represent resources).
  • 4. Decided Sentence
Three conversions from a language to a TS have been proposed, namely sequence, multiset, and set. The main difference between them is how it is decided whether the occurrence of an event in a trace produces a new state in the TS or just introduces an arc to an existing state.
Tandem Repeats
The detection of unfolded cycles in an acyclic TS is a problem related to finding consecutively repeated patterns in a string. The latter problem has been studied in several fields with many variations and under different names, although it is often referred to as the problem of finding tandem repeats.
Sequence And Multiset Conversions
Besides the sequence and multiset conversions, other conversions have been proposed that can yield smaller TSs at the cost of sacrificing regions. We use the term abstraction techniques to refer to them. The fundamental difference between all these methods and our proposal is that, in our case, the set of sacrificed regions is controlled by bounds that are already used by process discovery tools, so the compression of the TS does not involve a quality reduction.
Counting Data
Since region-based approaches yield PNs that never reject a trace of the log, they are extremely sensitive to noise. Hence, to be applicable, the approach presented in this paper must be preceded by a noise filtering phase. The filtering can be done by clustering techniques or by outlier detection. Considering the frequencies of the states is another possibility in our approach to distinguish between real and noisy states, because the latter often have low frequency. For instance, only Parikh vector differences between frequent states could be taken into account to differentiate real folding opportunities from spurious cycle unfoldings caused by noise.
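The idea of restricting Parikh vector differences to frequent states can be sketched as follows. This is a minimal illustration under stated assumptions rather than the method of the paper: states are taken to be multiset (Parikh vector) abstractions of trace prefixes, the frequency of a state is assumed to be the number of traces that visit it, and the function name `frequent_state_differences`, the threshold `min_freq`, and the example log are hypothetical.

```python
from collections import Counter

def frequent_state_differences(log, min_freq=2):
    """Count Parikh vector differences between frequent states along each trace."""
    # Assumed frequency of a state: how many traces pass through it.
    freq = Counter()
    for trace in log:
        visited = {frozenset(Counter(trace[:i]).items()) for i in range(len(trace) + 1)}
        freq.update(visited)

    diffs = Counter()
    for trace in log:
        vectors = [Counter(trace[:i]) for i in range(len(trace) + 1)]
        # Keep only the states that enough traces visit; rare states are treated as noise.
        frequent = [v for v in vectors if freq[frozenset(v.items())] >= min_freq]
        for i in range(len(frequent)):
            for j in range(i + 1, len(frequent)):
                delta = frequent[j] - frequent[i]   # events fired between the two states
                diffs[frozenset(delta.items())] += 1
    return diffs

# Hypothetical log: two regular traces and one trace ending in a rare event "c".
log = [("a", "b", "a", "b"), ("a", "b", "a", "b"), ("a", "b", "c")]
diffs = frequent_state_differences(log, min_freq=2)
one_ab = frozenset({("a", 1), ("b", 1)})
```

In this example the difference "one `a` and one `b`" recurs between frequent states, hinting at a cycle that could be folded, while the state reached through the rare event `c` contributes no difference at all.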
An advantage of region theory for process discovery is that it allows label splitting (i.e., changing the label of some arcs in the TS so that an event is actually represented by a set of different events). Label splitting is a technique that can help improve the visualization of the PN, and also avoid excessive generalization. This technique can also be used with the TSs produced by our approach. However, the splitting options might be reduced, because arcs with the same label in the original TS may have been merged into a single arc in the folded TS.
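Returning to the Tandem Repeats module described above, the string version of the problem can be illustrated on a single trace. The sketch below is a naive scan, assuming that a trace is a tuple of event labels; the function name `tandem_repeats` is hypothetical, and the code is not the region-aware cycle detection of the paper, only the plain string problem it is related to.

```python
def tandem_repeats(trace, min_copies=2):
    """Return (start, pattern, copies) for leftmost, non-overlapping runs of
    consecutively repeated patterns, scanning each period length in turn."""
    n = len(trace)
    found = []
    for period in range(1, n // 2 + 1):
        start = 0
        while start + 2 * period <= n:
            pattern = trace[start:start + period]
            copies = 1
            # Count how many copies of the pattern follow one another directly.
            while trace[start + copies * period:start + (copies + 1) * period] == pattern:
                copies += 1
            if copies >= min_copies:
                found.append((start, pattern, copies))
                start += copies * period    # continue after the whole run
            else:
                start += 1
    return found

# Hypothetical trace in which the pattern (b, c) is executed three times in a row.
trace = ("a", "b", "c", "b", "c", "b", "c", "d")
repeats = tandem_repeats(trace)
```

A repeated pattern such as `(b, c)` is the kind of evidence that suggests an unfolded cycle in the acyclic TS built from the log. The scan is cubic in the worst case, which is acceptable for an illustration but not for large logs.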
  • 5. FLOW CHART
Region-Based Process Discovery. Boxes, in the order listed: Get The Input Text File; Discovery Sentence Word; Sequence and Multiset; Tandem Repeats; Counting Data.
  • 6. CONCLUSION
This paper presents a novel technique for compacting a TS, one of the objects typically used in process discovery algorithms. Two main characteristics make this technique very attractive in the context of region-based k-bounded process discovery: first, it is one of the most aggressive folding techniques in the literature, and second, it preserves the regions that are crucial for PN derivation. The use of region-aware folding techniques like the one presented in this paper may be a crucial step towards using region-based algorithms for process discovery in industrial scenarios.