Mining Closed Sequential Patterns in
Large Datasets
Presenter: Ildar Nurgaliev
Lab: Dainfos
Innopolis University CloSpan page 1 of 34
Main idea
Instead of mining the complete set of frequent subsequences
we mine frequent closed subsequences
Benefits
• can mine very long sequences
• produces significantly fewer frequent sequences
Preliminary Concepts
Sequence
• items: I = {i1, i2, ..., im}
• itemset (ti): ti ⊆ I
• sequence (ordered list of itemsets): s = ⟨t1, t2, ..., tm⟩
• size |s|: number of itemsets in s
• length l(s): $l(s) = \sum_{i=1}^{m} |t_i|$ (total number of items)
Preliminary Concepts
α is a sub-sequence of β, or β is a super-sequence of α (β contains α)
• α = ⟨α1, α2, ..., αm⟩
• β = ⟨β1, β2, ..., βn⟩
• written α ⊑ β (if α ≠ β, written α ⊏ β)
• iff ∃ i1, i2, ..., im such that
1 ≤ i1 < i2 < ... < im ≤ n and
α1 ⊆ βi1, α2 ⊆ βi2, ..., αm ⊆ βim
• β absorbs α: β contains α and their supports are the same
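The containment test α ⊑ β defined above can be sketched in Python as follows. The representation (a sequence as a tuple of frozensets, one character per item) and the names `seq` and `is_subsequence` are assumptions made for illustration; they do not come from the slides. Greedy left-to-right matching is enough here, because choosing the earliest matching itemset never rules out a later match.

```python
def seq(*itemsets):
    """Build a sequence from strings: seq("e", "abf") has itemsets (e) and (abf)."""
    return tuple(frozenset(t) for t in itemsets)


def is_subsequence(alpha, beta):
    """Return True if alpha is contained in beta (alpha ⊑ beta)."""
    j = 0  # next position in beta that may still be used
    for a in alpha:
        # advance to the first remaining itemset of beta that includes a
        while j < len(beta) and not a <= beta[j]:
            j += 1
        if j == len(beta):
            return False
        j += 1  # matched indices must be strictly increasing
    return True


print(is_subsequence(seq("e", "a"), seq("e", "abf", "bde")))  # True
print(is_subsequence(seq("af"), seq("a", "f")))               # False
```

The second call fails because (af) must fit inside a single itemset of the larger sequence.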
Preliminary Concepts
Support
• D = {s1, s2, ..., sn}: sequence database
• each sequence s has an id (the id of si is i)
• |D|: number of sequences in D
• support(α): number of sequences in D that contain α
support(α) = |{s | s ∈ D and α ⊑ s}|
• min_sup: minimum support threshold
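A small sketch of the support count, assuming sequences are tuples of frozensets. The three-sequence database `DB` is an assumed example chosen for illustration, and the helper names are hypothetical.

```python
def seq(*itemsets):
    """Build a sequence from strings, one character per item."""
    return tuple(frozenset(t) for t in itemsets)


def is_subsequence(alpha, beta):
    """alpha ⊑ beta: each itemset of alpha fits into a later itemset of beta."""
    it = iter(beta)  # shared iterator keeps the matched positions increasing
    return all(any(a <= b for b in it) for a in alpha)


def support(alpha, db):
    """Number of sequences in db that contain alpha."""
    return sum(1 for s in db if is_subsequence(alpha, s))


# Assumed example database, not taken from the slide figures.
DB = [seq("af", "d", "e", "a"), seq("e", "a", "b"), seq("e", "abf", "bde")]

print(support(seq("e", "a"), DB))  # 3
print(support(seq("af"), DB))      # 2
```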
Preliminary Concepts
Frequent sequential pattern (FS) and closed FS (CS)
• FS: all sequences s with support(s) ≥ min_sup
• CS = {α | α ∈ FS and ∄ β ∈ FS
such that α ⊏ β and support(α) = support(β)}
• closed sequence mining: find CS for a given min_sup
• database containment relation D ⊑ D':
if ∃ an injective function f : D → D' s.t.
∀s ∈ D, s ⊑ f(s)
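To make the definitions of FS and CS concrete, here is a brute-force sketch that enumerates every subsequence of a tiny database and then filters by the closure condition. This is not CloSpan: it is exponential and only meant to illustrate the definitions. The database `DB` and all function names are assumptions.

```python
from itertools import combinations, product


def seq(*itemsets):
    return tuple(frozenset(t) for t in itemsets)


def is_subsequence(alpha, beta):
    it = iter(beta)
    return all(any(a <= b for b in it) for a in alpha)


def support(alpha, db):
    return sum(1 for s in db if is_subsequence(alpha, s))


def sub_itemsets(t):
    """All subsets of itemset t, including the empty set."""
    items = sorted(t)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]


def subsequences(s):
    """All non-empty subsequences of s (exponential, tiny inputs only)."""
    result = set()
    for choice in product(*(sub_itemsets(t) for t in s)):
        sub = tuple(t for t in choice if t)  # drop itemsets left empty
        if sub:
            result.add(sub)
    return result


def frequent_and_closed(db, min_sup):
    """Return (FS as a dict pattern -> support, CS as a set of patterns)."""
    candidates = set().union(*(subsequences(s) for s in db))
    fs = {p: support(p, db) for p in candidates}
    fs = {p: n for p, n in fs.items() if n >= min_sup}
    cs = {p for p in fs
          if not any(q != p and fs[q] == fs[p] and is_subsequence(p, q)
                     for q in fs)}
    return fs, cs


# Assumed example database, not taken from the slide figures.
DB = [seq("af", "d", "e", "a"), seq("e", "a", "b"), seq("e", "abf", "bde")]
fs, cs = frequent_and_closed(DB, min_sup=2)
print(len(fs), len(cs))
```

In this assumed database the pattern with itemsets (e)(a) has support 3 and no super-sequence with the same support, so it is closed, while (e) alone is absorbed by it.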
Preliminary Concepts
Item extension
• Given: s = ⟨t1, ..., tm⟩ and item α
• s ⋄ α: concatenation (I-Step or S-Step)
• s ⋄i α = ⟨t1, ..., tm ∪ {α}⟩ if ∀k ∈ tm, k < α
Example: ⟨(ae)⟩ is an I-Step extension of ⟨(a)⟩
• s ⋄s α = ⟨t1, ..., tm, {α}⟩
Example: ⟨(a)(c)⟩ is an S-Step extension of ⟨(a)⟩
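The two item extensions can be sketched as follows, assuming a sequence is a tuple of frozensets and items are ordered as Python strings. The names `i_step` and `s_step` are hypothetical.

```python
def seq(*itemsets):
    """Build a sequence from strings, one character per item."""
    return tuple(frozenset(t) for t in itemsets)


def i_step(s, item):
    """I-Step: add item to the last itemset of s.

    Only allowed when item is greater than every item already in that itemset.
    """
    if not all(k < item for k in s[-1]):
        raise ValueError("item must be greater than all items in the last itemset")
    return s[:-1] + (s[-1] | {item},)


def s_step(s, item):
    """S-Step: append a new itemset that holds only item."""
    return s + (frozenset({item}),)


print(i_step(seq("a"), "e") == seq("ae"))      # True
print(s_step(seq("a"), "c") == seq("a", "c"))  # True
```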
Preliminary Concepts
Sequence extension
• Given: s = ⟨t1, ..., tm⟩ and p = ⟨t'1, ..., t'n⟩
• s ⋄ p: concatenation (itemset-extension or sequence-extension)
• s ⋄i p = ⟨t1, ..., tm ∪ t'1, ..., t'n⟩ if ∀k ∈ tm, j ∈ t'1, k < j
• s ⋄s p = ⟨t1, ..., tm, t'1, ..., t'n⟩
• s' = p ⋄ s: p is a prefix of s' and s is a suffix of s'
Example: ⟨(e)(a)⟩ is a prefix of ⟨(e)(abf)(bde)⟩ and
⟨(bf)(bde)⟩ is its suffix
Preliminary Concepts
s-projected database (physical projection and pseudo projection)
• Ds = {p | s' ∈ D, s' = r ⋄ p s.t. r is the minimum prefix of s'
containing s (s ⊑ r and ∄ r', s ⊑ r' ⊏ r)}
p can be empty
Example
• D⟨(af)⟩ = {⟨(d)(e)(a)⟩, ⟨(bde)⟩}
• D⟨(e)(a)⟩ = {$, ⟨(b)⟩, ⟨(_bf)(bde)⟩}
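A sketch of physical projection under stated assumptions: sequences are tuples of frozensets, items are single characters ordered as strings, and the database `DB` is an assumed example chosen so that it reproduces the two projected databases listed above. A suffix is returned as a pair (`partial`, `rest`), where `partial` holds the items left over in the last matched itemset (the "_" itemset). All names are hypothetical.

```python
def seq(*itemsets):
    return tuple(frozenset(t) for t in itemsets)


def project_one(pattern, s):
    """Suffix of s after the minimum prefix containing pattern, else None."""
    j = -1
    for a in pattern:
        j += 1  # move past the previously matched itemset
        while j < len(s) and not a <= s[j]:
            j += 1
        if j == len(s):
            return None  # s does not contain pattern
    last = max(pattern[-1])  # the prefix ends at this item inside s[j]
    partial = frozenset(x for x in s[j] if x > last)
    return partial, s[j + 1:]


def project(pattern, db):
    """The pattern-projected database, in database order."""
    suffixes = (project_one(pattern, s) for s in db)
    return [p for p in suffixes if p is not None]


def show(suffix):
    """Render a suffix in the slide notation; "$" marks the empty suffix."""
    partial, rest = suffix
    parts = []
    if partial:
        parts.append("(_" + "".join(sorted(partial)) + ")")
    parts += ["(" + "".join(sorted(t)) + ")" for t in rest]
    return "".join(parts) or "$"


# Assumed example database, not taken from the slide figures.
DB = [seq("af", "d", "e", "a"), seq("e", "a", "b"), seq("e", "abf", "bde")]

print([show(p) for p in project(seq("af"), DB)])      # ['(d)(e)(a)', '(bde)']
print([show(p) for p in project(seq("e", "a"), DB)])  # ['$', '(b)', '(_bf)(bde)']
```

Pseudo projection would store only a sequence id and an offset instead of copying each suffix; that variant is not shown here.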
Lexicographic Sequence Tree
Set Lexicographic Order
• Let t = {i1, i2, ..., ik}, t' = {j1, j2, ..., jl}, where
i1 ≤ ... ≤ ik and j1 ≤ ... ≤ jl
• t < t' iff either of the following is true:
1. for some h, 0 ≤ h ≤ min{k, l}, we have ir = jr for r < h, and ih < jh
2. k < l, and i1 = j1, i2 = j2, ..., ik = jk
Example: (a, f) < (b, f), (a, b) < (a, b, c) and
(a, b, c) < (b, c)
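The two conditions above match the way Python compares tuples: the first differing element decides, and a proper prefix is smaller than the longer tuple. A minimal sketch, assuming items are single characters and using the hypothetical name `set_less`:

```python
def set_less(t, u):
    """Set lexicographic order on itemsets given as iterables of items."""
    return tuple(sorted(t)) < tuple(sorted(u))


print(set_less("af", "bf"))   # True: a < b decides at the first position
print(set_less("ab", "abc"))  # True: (a, b) is a proper prefix of (a, b, c)
print(set_less("abc", "bc"))  # True: a < b decides at the first position
```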
Lexicographic Sequence Tree
Sequence Lexicographic Order
i. if s' = s ⋄ p, then s < s'
ii. if s = α ⋄i p and s' = α ⋄s p', then s < s', whatever the order
relation between p and p' is
iii. if s = α ⋄i p and s' = α ⋄i p', then p < p' implies s < s'
iv. if s = α ⋄s p and s' = α ⋄s p', then p < p' implies s < s'
Example: ⟨(a, b)⟩ < ⟨(a, b)(a)⟩; ⟨(a, b)⟩ < ⟨(a)(a)⟩
Lexicographic Sequence Tree
Lexicographic Sequence Tree construction
1. each node in the tree corresponds to a sequence, and the
root is a null sequence;
2. if a parent node corresponds to a sequence s, its child is
either an itemset-extension of s, or a sequence-extension
of s;
3. the left sibling is less than the right sibling in sequence
lexicographic order.
Lexicographic Sequence Tree
Lexicographic Sequence Tree and Prefix Search Tree
Lexicographic Sequence Tree
Example
Lexicographic Sequence Tree with min_sup = 2
Search Space Pruning and Prefix Sequence Lattice
LEMMA 1 (Common Prefix)
LEMMA 1. Given a subsequence s and its projected database
Ds, if ∃α such that α is a common prefix of all the sequences with the
same extension type (either itemset-extension or sequence-extension) in
Ds, then ∀β, if s ⋄ β is closed, α must be a prefix of β. That
means ∀β ⊏ α, we need not search s ⋄ β and its descendants
except the branch of s ⋄ α.
Example: Ds = {⟨(d)(e)(af)⟩, ⟨(d)(e)(fg)⟩}: all the
sequences in Ds share the common prefix α = ⟨(d)(e)⟩, so any
sequence with prefix s but not s ⋄ ⟨(d)(e)⟩ cannot be closed.
So we can jump directly to the branch s ⋄ α.
Search Space Pruning and Prefix Sequence Lattice
LEMMA 2 (Partial Order)
LEMMA 2. Given a sequence s and its projected database Ds,
if among all the sequences in Ds an item α always
occurs before an item β (either in the same itemset for all
sequences in Ds or in a different itemset, but not both), then
D_{s⋄α⋄β} = D_{s⋄β}. Therefore, ∀γ, s ⋄ β ⋄ γ is not closed. We need
not search any sequence in the branch of s ⋄ β.
Search Space Pruning and Prefix Sequence Lattice
Theorem 1 (Equivalence of Projected Databases)
• $I(D) = \sum_{i=1}^{n} l(s_i)$: total number of items in D
Theorem 1: Given two sequences s and s' with s ⊑ s',
Ds = Ds' ⇔ I(Ds) = I(Ds')
Example: Consider the sample database D on slide 15.
• D⟨(af)⟩ = D⟨(f)⟩ = {⟨(d)(e)⟩, ⟨(de)⟩}, and
• I(D⟨(af)⟩) = I(D⟨(f)⟩) = 4.
Based on Theorem 1, the following search space pruning can
be achieved.
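Theorem 1 can be checked on a small example: for s ⊑ s', comparing the item counts of the two projected databases is enough to decide whether the databases are equal. The sketch below uses an assumed database `DB` (not the sample from slide 15, which is not reproduced in this transcript), so the sizes differ from the slide's value of 4. All names are hypothetical.

```python
def seq(*itemsets):
    return tuple(frozenset(t) for t in itemsets)


def project_one(pattern, s):
    """Suffix (partial itemset, remaining itemsets) after the minimum prefix."""
    j = -1
    for a in pattern:
        j += 1
        while j < len(s) and not a <= s[j]:
            j += 1
        if j == len(s):
            return None
    last = max(pattern[-1])
    return frozenset(x for x in s[j] if x > last), s[j + 1:]


def project(pattern, db):
    suffixes = (project_one(pattern, s) for s in db)
    return [p for p in suffixes if p is not None]


def total_items(projected):
    """I(D): total number of items over all suffixes in a projected database."""
    return sum(len(partial) + sum(len(t) for t in rest)
               for partial, rest in projected)


# Assumed example database, not the sample shown on slide 15.
DB = [seq("af", "d", "e", "a"), seq("e", "a", "b"), seq("e", "abf", "bde")]

d_f, d_af, d_a = (project(seq(p), DB) for p in ("f", "af", "a"))
print(total_items(d_f), total_items(d_af), d_f == d_af)  # 6 6 True
print(total_items(d_a), d_a == d_af)                     # 10 False
```

Comparing two integers is much cheaper than comparing two projected databases, which is what makes the size a useful pruning test.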
Search Space Pruning and Prefix Sequence Lattice
Proof of Theorem 1
• Ds = Ds' → I(Ds) = I(Ds') (obvious);
• Since s ⊑ s', Ds' ⊑ Ds and I(Ds') ≤ I(Ds);
• Equality between I(Ds') and I(Ds) holds only if
∀γ ∈ Ds', γ ∈ Ds, and vice versa. Therefore, Ds = Ds'.
Search Space Pruning and Prefix Sequence Lattice
LEMMA 3 (Early Termination by Equivalence)
LEMMA 3. Given two sequences s ⊑ s' with
I(Ds) = I(Ds'), then ∀γ, support(s ⋄ γ) = support(s' ⋄ γ).
Example: Consider the sample database D on slide 15.
• I(D⟨(af)⟩) = I(D⟨(f)⟩);
• both ⟨(af)(d)⟩ and ⟨(af)(e)⟩ are frequent;
We can conclude that the supports of ⟨(af)(d)⟩ and ⟨(f)(d)⟩, and of
⟨(af)(e)⟩ and ⟨(f)(e)⟩, are the same without computing the
supports of ⟨(f)(e)⟩ and ⟨(f)(d)⟩.
Search Space Pruning and Prefix Sequence Lattice
Projected database closed set (LS)
• LS = {s | support(s) ≥ min_sup and ∄ s' s.t. s ⊏ s' and
I(Ds) = I(Ds')};
• CS ⊆ LS ⊆ FS: instead of mining CS directly, the CloSpan
algorithm first produces the complete set LS
• then non-closed sequence elimination is applied to LS to
generate CS, based on Lemma 3.
Search Space Pruning and Prefix Sequence Lattice
Corollary 1 (Backward Sub-Pattern)
Corollary 1. If a sequence s < s' and s ⊐ s', the condition
I(Ds) = I(Ds') is sufficient to stop searching any descendant
of s' in the prefix search tree.
s' is a backward sub-pattern of s if s < s' and s ⊐ s' (s' is discovered
after s)
Example: I(D⟨(f)⟩) = I(D⟨(af)⟩) → D⟨(f)⟩ = D⟨(af)⟩
Search Space Pruning and Prefix Sequence Lattice
Corollary 2 (Backward Super-Pattern)
Corollary 2. If a sequence s < s' and s ⊏ s', and the condition
I(Ds) = I(Ds') holds, it is sufficient to transplant the
descendants of s to s' instead of searching any descendant of
s' in the prefix search tree.
Example: the same logic as in the previous example.
CloSpan: Design and Implementation
2 main steps
CloSpan divides the mining process into 2 stages.
1. It generates the LS set, a superset of the closed frequent
sequences, and stores it in a prefix sequence lattice;
2. it does post-pruning to eliminate non-closed sequences.
CloSpan: Design and Implementation
Algorithm 1: ClosedMining(D, min_sup, L)
CloSpan: Design and Implementation
Algorithm 2: CloSpan(s, Ds, min_sup, L)
CloSpan: Design and Implementation
Algorithm: CloSpan
• A hash index on the size of the projected database speeds up
the check of Theorem 1 (lines 1-4 of CloSpan);
• if I(Ds') = I(Ds), then:
• if s ⊑ s', we do not add ⟨I(Ds), s⟩;
• if s' ⊑ s, we replace ⟨I(Ds'), s'⟩ with ⟨I(Ds), s⟩.
⟨I(Ds), s⟩
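The bookkeeping described above can be sketched with a dictionary keyed by projected database size. This is an illustrative assumption about how the index might be organised, not a reproduction of Algorithm 3 (whose listing is not included in this transcript). The function name, the return labels "sub" and "super", and the example sizes are hypothetical.

```python
from collections import defaultdict


def seq(*itemsets):
    return tuple(frozenset(t) for t in itemsets)


def is_subsequence(alpha, beta):
    it = iter(beta)
    return all(any(a <= b for b in it) for a in alpha)


def check_projected_db_size(s, size, index):
    """Look s up in the size-keyed index and update the index.

    Returns ("sub", s2) when an indexed super-sequence s2 of s has the same
    size (s is not added), ("super", s2) when an indexed sub-sequence s2 has
    the same size (s2 is replaced by s), and None otherwise (s is added).
    """
    bucket = index[size]
    for pos, other in enumerate(bucket):
        if is_subsequence(s, other):
            return ("sub", other)
        if is_subsequence(other, s):
            bucket[pos] = s
            return ("super", other)
    bucket.append(s)
    return None


index = defaultdict(list)
# Sizes are passed in by the caller; 6 is an assumed value for illustration.
print(check_projected_db_size(seq("af"), 6, index))        # None
print(check_projected_db_size(seq("f"), 6, index)[0])      # sub
```

A "sub" result corresponds to the backward sub-pattern case (stop searching), and a "super" result to the backward super-pattern case (reuse the existing descendants).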
CloSpan: Design and Implementation
Algorithm 3: checkProjectedDBSize(s, k, H)
Corresponds to lines 1-4 of Algorithm 2.
CloSpan: Design and Implementation
Algorithm 3: hash function
• The projected database size ranges from 0 to I(D), so if the values of
I(Ds) are dense in a small range, performance degrades;
• by Theorem 1, we can use conditions that are necessary for
Ds = Ds' as part of the hash key;
• $L(D_s) = I(D_s) + \sum_{j=1}^{m} \sum_{k=i_j+1}^{n} l(s_k)$;
• if s ⊑ s', L(Ds) = L(Ds') ↔ I(Ds) = I(Ds').
Non-Closed Sequence Elimination
Checking for super-sequences
• use support(s) as the hash function
• find all the sequences with the same support as s
• check whether any of them is a super-sequence of s.
• if s ⊑ s' and
support(s) = support(s') → T(Ds) = T(Ds')
(T: sum of the ids of the corresponding sequences)
• that is why T(Ds) can be used as the hash
key instead of support (its values are sparser)
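The elimination step can be sketched by bucketing candidates on T, the sum of the ids of the sequences that contain them, and looking for an absorbing super-sequence only inside the same bucket. The database, the candidate list, and all names below are assumptions for illustration. The result is closed only relative to the candidates passed in.

```python
from collections import defaultdict


def seq(*itemsets):
    return tuple(frozenset(t) for t in itemsets)


def is_subsequence(alpha, beta):
    it = iter(beta)
    return all(any(a <= b for b in it) for a in alpha)


def eliminate_non_closed(patterns, db):
    """Keep the patterns that no other candidate with equal support absorbs."""
    # ids of the database sequences (numbered from 1) that contain each pattern
    ids = {p: frozenset(i for i, s in enumerate(db, start=1)
                        if is_subsequence(p, s))
           for p in patterns}
    buckets = defaultdict(list)
    for p in patterns:
        buckets[sum(ids[p])].append(p)  # hash key T = sum of supporting ids
    closed = set()
    for group in buckets.values():
        for p in group:
            absorbed = any(q != p and len(ids[q]) == len(ids[p])
                           and is_subsequence(p, q) for q in group)
            if not absorbed:
                closed.add(p)
    return closed


# Assumed example database and candidate set.
DB = [seq("af", "d", "e", "a"), seq("e", "a", "b"), seq("e", "abf", "bde")]
CANDIDATES = [seq("e"), seq("a"), seq("e", "a"), seq("af"), seq("af", "d")]

result = eliminate_non_closed(CANDIDATES, DB)
print(result == {seq("e", "a"), seq("af", "d")})  # True
```

Patterns with different T values never share a bucket, so they are never compared, which is the saving the slide attributes to the sparser key.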
Conclusion
CloSpan
• solves the closed sequential pattern mining problem;
• CloSpan outperforms PrefixSpan by more than one order
of magnitude;
• capable of mining longer frequent sequences in a large
data set with low min_sup;
• it does not modify the frequent pattern mining algorithm:
it only defines an early termination condition for search
branches;
• this method can be extended to other existing sequential
pattern mining algorithms (SPADE, SPAM).
Possible improvements
CloSpan
• CloSpan's performance comes from its pruning
method, which could be made smarter;
• avoid the need to keep track of historical
frequent closed sequences (or candidates) for a new
pattern's closure checking.
Possible improvements
BIDE algorithm
1. BIDE consumes much less memory and can be an order of
magnitude faster than CloSpan when the support is low;
2. BIDE scales linearly with database size in terms of
runtime efficiency and space usage;
3. the BackScan pruning method is very effective in
enhancing the performance of BIDE.
CloSpan in Trajectory Mining
Sequential Pattern Mining from Trajectory Data
• more study is needed: IPCA and DBSCAN on trajectory data.
• CloSpan could be used as an unsupervised algorithm for
detecting the most crowded paths in a city.
• ...

Contenu connexe

Similaire à CloSapn

Divide-and-Conquer & Dynamic ProgrammingDivide-and-Conqu.docx
Divide-and-Conquer & Dynamic ProgrammingDivide-and-Conqu.docxDivide-and-Conquer & Dynamic ProgrammingDivide-and-Conqu.docx
Divide-and-Conquer & Dynamic ProgrammingDivide-and-Conqu.docx
jacksnathalie
 
Dataflow Analysis
Dataflow AnalysisDataflow Analysis
Dataflow Analysis
Miller Lee
 
Divide_and_Contrast__Source_free_Domain_Adaptation_via_Adaptive_Contrastive_L...
Divide_and_Contrast__Source_free_Domain_Adaptation_via_Adaptive_Contrastive_L...Divide_and_Contrast__Source_free_Domain_Adaptation_via_Adaptive_Contrastive_L...
Divide_and_Contrast__Source_free_Domain_Adaptation_via_Adaptive_Contrastive_L...
Huang Po Chun
 
Query Answering in Probabilistic Datalog+/{ Ontologies under Group Preferences
Query Answering in Probabilistic Datalog+/{ Ontologies under Group PreferencesQuery Answering in Probabilistic Datalog+/{ Ontologies under Group Preferences
Query Answering in Probabilistic Datalog+/{ Ontologies under Group Preferences
Oana Tifrea-Marciuska
 
Jarrar.lecture notes.aai.2011s.descriptionlogic
Jarrar.lecture notes.aai.2011s.descriptionlogicJarrar.lecture notes.aai.2011s.descriptionlogic
Jarrar.lecture notes.aai.2011s.descriptionlogic
PalGov
 
Query Translation for Ontology-extended Data Sources
Query Translation for Ontology-extended Data SourcesQuery Translation for Ontology-extended Data Sources
Query Translation for Ontology-extended Data Sources
Jie Bao
 
Tree distance algorithm
Tree distance algorithmTree distance algorithm
Tree distance algorithm
Trector Rancor
 

Similaire à CloSapn (20)

block-mdp-masters-defense.pdf
block-mdp-masters-defense.pdfblock-mdp-masters-defense.pdf
block-mdp-masters-defense.pdf
 
Answers withexplanations
Answers withexplanationsAnswers withexplanations
Answers withexplanations
 
Chapter 4 ds
Chapter 4 dsChapter 4 ds
Chapter 4 ds
 
Сергей Кольцов —НИУ ВШЭ —ICBDA 2015
Сергей Кольцов —НИУ ВШЭ —ICBDA 2015Сергей Кольцов —НИУ ВШЭ —ICBDA 2015
Сергей Кольцов —НИУ ВШЭ —ICBDA 2015
 
Divide-and-Conquer & Dynamic ProgrammingDivide-and-Conqu.docx
Divide-and-Conquer & Dynamic ProgrammingDivide-and-Conqu.docxDivide-and-Conquer & Dynamic ProgrammingDivide-and-Conqu.docx
Divide-and-Conquer & Dynamic ProgrammingDivide-and-Conqu.docx
 
presentation on important DAG,TRIE,Hashing.pptx
presentation on important DAG,TRIE,Hashing.pptxpresentation on important DAG,TRIE,Hashing.pptx
presentation on important DAG,TRIE,Hashing.pptx
 
Dataflow Analysis
Dataflow AnalysisDataflow Analysis
Dataflow Analysis
 
Divide_and_Contrast__Source_free_Domain_Adaptation_via_Adaptive_Contrastive_L...
Divide_and_Contrast__Source_free_Domain_Adaptation_via_Adaptive_Contrastive_L...Divide_and_Contrast__Source_free_Domain_Adaptation_via_Adaptive_Contrastive_L...
Divide_and_Contrast__Source_free_Domain_Adaptation_via_Adaptive_Contrastive_L...
 
Latent Dirichlet Allocation
Latent Dirichlet AllocationLatent Dirichlet Allocation
Latent Dirichlet Allocation
 
Query Answering in Probabilistic Datalog+/{ Ontologies under Group Preferences
Query Answering in Probabilistic Datalog+/{ Ontologies under Group PreferencesQuery Answering in Probabilistic Datalog+/{ Ontologies under Group Preferences
Query Answering in Probabilistic Datalog+/{ Ontologies under Group Preferences
 
1520 differentiation-l1
1520 differentiation-l11520 differentiation-l1
1520 differentiation-l1
 
Fol
FolFol
Fol
 
Crib Sheet AP Calculus AB and BC exams
Crib Sheet AP Calculus AB and BC examsCrib Sheet AP Calculus AB and BC exams
Crib Sheet AP Calculus AB and BC exams
 
Query Answering in Probabilistic Datalog+/– Ontologies under Group Preferences
Query Answering in Probabilistic Datalog+/– Ontologies under Group PreferencesQuery Answering in Probabilistic Datalog+/– Ontologies under Group Preferences
Query Answering in Probabilistic Datalog+/– Ontologies under Group Preferences
 
Wi13 otm
Wi13 otmWi13 otm
Wi13 otm
 
Approximate Bayesian Computation with Quasi-Likelihoods
Approximate Bayesian Computation with Quasi-LikelihoodsApproximate Bayesian Computation with Quasi-Likelihoods
Approximate Bayesian Computation with Quasi-Likelihoods
 
Jarrar.lecture notes.aai.2011s.descriptionlogic
Jarrar.lecture notes.aai.2011s.descriptionlogicJarrar.lecture notes.aai.2011s.descriptionlogic
Jarrar.lecture notes.aai.2011s.descriptionlogic
 
Query Translation for Ontology-extended Data Sources
Query Translation for Ontology-extended Data SourcesQuery Translation for Ontology-extended Data Sources
Query Translation for Ontology-extended Data Sources
 
Tree distance algorithm
Tree distance algorithmTree distance algorithm
Tree distance algorithm
 
Murphy: Machine learning A probabilistic perspective: Ch.9
Murphy: Machine learning A probabilistic perspective: Ch.9Murphy: Machine learning A probabilistic perspective: Ch.9
Murphy: Machine learning A probabilistic perspective: Ch.9
 

Plus de Ildar Nurgaliev

Plus de Ildar Nurgaliev (7)

Presentation
PresentationPresentation
Presentation
 
Анализ маркетинговой деятельности ООО “БАРС ГРУП”
Анализ маркетинговой деятельности  ООО “БАРС ГРУП”Анализ маркетинговой деятельности  ООО “БАРС ГРУП”
Анализ маркетинговой деятельности ООО “БАРС ГРУП”
 
Fuzzy logic and application in AI
Fuzzy logic and application in AIFuzzy logic and application in AI
Fuzzy logic and application in AI
 
Scala syntax analysis
Scala syntax analysisScala syntax analysis
Scala syntax analysis
 
Kotlin compiler construction (very brief)
Kotlin compiler construction (very brief)Kotlin compiler construction (very brief)
Kotlin compiler construction (very brief)
 
Artificial neural network
Artificial neural networkArtificial neural network
Artificial neural network
 
Social dynamic simulation
Social dynamic simulationSocial dynamic simulation
Social dynamic simulation
 

Dernier

Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
ZurliaSoop
 
Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...
Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...
Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...
gajnagarg
 
In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabia
In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi ArabiaIn Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabia
In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabia
ahmedjiabur940
 
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...
nirzagarg
 
Jodhpur Park | Call Girls in Kolkata Phone No 8005736733 Elite Escort Service...
Jodhpur Park | Call Girls in Kolkata Phone No 8005736733 Elite Escort Service...Jodhpur Park | Call Girls in Kolkata Phone No 8005736733 Elite Escort Service...
Jodhpur Park | Call Girls in Kolkata Phone No 8005736733 Elite Escort Service...
HyderabadDolls
 
Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...
Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...
Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...
gajnagarg
 
+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...
+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...
+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...
Health
 
Top profile Call Girls In Latur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Latur [ 7014168258 ] Call Me For Genuine Models We ...Top profile Call Girls In Latur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Latur [ 7014168258 ] Call Me For Genuine Models We ...
gajnagarg
 

Dernier (20)

Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
 
Gomti Nagar & best call girls in Lucknow | 9548273370 Independent Escorts & D...
Gomti Nagar & best call girls in Lucknow | 9548273370 Independent Escorts & D...Gomti Nagar & best call girls in Lucknow | 9548273370 Independent Escorts & D...
Gomti Nagar & best call girls in Lucknow | 9548273370 Independent Escorts & D...
 
7. Epi of Chronic respiratory diseases.ppt
7. Epi of Chronic respiratory diseases.ppt7. Epi of Chronic respiratory diseases.ppt
7. Epi of Chronic respiratory diseases.ppt
 
RESEARCH-FINAL-DEFENSE-PPT-TEMPLATE.pptx
RESEARCH-FINAL-DEFENSE-PPT-TEMPLATE.pptxRESEARCH-FINAL-DEFENSE-PPT-TEMPLATE.pptx
RESEARCH-FINAL-DEFENSE-PPT-TEMPLATE.pptx
 
Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...
Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...
Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...
 
Predicting HDB Resale Prices - Conducting Linear Regression Analysis With Orange
Predicting HDB Resale Prices - Conducting Linear Regression Analysis With OrangePredicting HDB Resale Prices - Conducting Linear Regression Analysis With Orange
Predicting HDB Resale Prices - Conducting Linear Regression Analysis With Orange
 
Kings of Saudi Arabia, information about them
Kings of Saudi Arabia, information about themKings of Saudi Arabia, information about them
Kings of Saudi Arabia, information about them
 
In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabia
In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi ArabiaIn Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabia
In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabia
 
High Profile Call Girls Service in Jalore { 9332606886 } VVIP NISHA Call Girl...
High Profile Call Girls Service in Jalore { 9332606886 } VVIP NISHA Call Girl...High Profile Call Girls Service in Jalore { 9332606886 } VVIP NISHA Call Girl...
High Profile Call Girls Service in Jalore { 9332606886 } VVIP NISHA Call Girl...
 
Charbagh + Female Escorts Service in Lucknow | Starting ₹,5K To @25k with A/C...
Charbagh + Female Escorts Service in Lucknow | Starting ₹,5K To @25k with A/C...Charbagh + Female Escorts Service in Lucknow | Starting ₹,5K To @25k with A/C...
Charbagh + Female Escorts Service in Lucknow | Starting ₹,5K To @25k with A/C...
 
Vadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book now
Vadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book nowVadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book now
Vadodara 💋 Call Girl 7737669865 Call Girls in Vadodara Escort service book now
 
TrafficWave Generator Will Instantly drive targeted and engaging traffic back...
TrafficWave Generator Will Instantly drive targeted and engaging traffic back...TrafficWave Generator Will Instantly drive targeted and engaging traffic back...
TrafficWave Generator Will Instantly drive targeted and engaging traffic back...
 
Dubai Call Girls Peeing O525547819 Call Girls Dubai
Dubai Call Girls Peeing O525547819 Call Girls DubaiDubai Call Girls Peeing O525547819 Call Girls Dubai
Dubai Call Girls Peeing O525547819 Call Girls Dubai
 
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...
 
Ranking and Scoring Exercises for Research
Ranking and Scoring Exercises for ResearchRanking and Scoring Exercises for Research
Ranking and Scoring Exercises for Research
 
Jodhpur Park | Call Girls in Kolkata Phone No 8005736733 Elite Escort Service...
Jodhpur Park | Call Girls in Kolkata Phone No 8005736733 Elite Escort Service...Jodhpur Park | Call Girls in Kolkata Phone No 8005736733 Elite Escort Service...
Jodhpur Park | Call Girls in Kolkata Phone No 8005736733 Elite Escort Service...
 
Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...
Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...
Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...
 
Statistics notes ,it includes mean to index numbers
Statistics notes ,it includes mean to index numbersStatistics notes ,it includes mean to index numbers
Statistics notes ,it includes mean to index numbers
 
+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...
+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...
+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...
 
Top profile Call Girls In Latur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Latur [ 7014168258 ] Call Me For Genuine Models We ...Top profile Call Girls In Latur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Latur [ 7014168258 ] Call Me For Genuine Models We ...
 

CloSapn

  • 1. Mining Closed Sequential Patterns in Large Datasets Presenter: Ildar Nurgaliev Lab: Dainfos Innopolis University CloSpan page 1 of 34
  • 2. Main idea Instead of mining the complete set of frequent subsequences we mine frequent closed subsequences Innopolis University CloSpan page 2 of 34
  • 3. Benets • can mine really long sequences • produce signicantly less number of discovered frequent sequences Innopolis University CloSpan page 3 of 34
  • 4. Preliminary Concepts Sequence • items: I = {i1, i2, ..., im} • itemset (ti ): ti ⊆ I • sequence (ordered list): s = t1, t2, ..., tm • size |s|: number of itemsets in s • length l(s): l(s) = n i=1 |ti | (total number of items) Innopolis University CloSpan page 4 of 34
  • 5. Preliminary Concepts α sub-sequence of β OR β super-sequence of α (contains) • α = α1, α2, ..., αm • β = β1, β2, ..., βm • α β (if α = β, written as α β) • i ∃i1, i2, ..., im, such that 1 ≤ i1 i2 ... im ≤ n and α1 ⊆ βi , α2 ⊆ βi2, ..., αm ⊆ βim • β absorbs α: if β contains α and their support are the same Innopolis University CloSpan page 5 of 34
  • 6. Preliminary Concepts Support • D = {s1, s2, ..., sn}: sequence database • each s associated with id (id of si is i) • |D|: number of s in D • support(α): number of s in D which contain α support(α) = |{s|s ∈ D and α s}| • min_sup: minimum support threshold Innopolis University CloSpan page 6 of 34
  • 7. Preliminary Concepts Frequent sequential pattern (FS) and closed FS (CS) • FS: includes all s of support(s) ≥ min_sup • CS = {α|α ∈ FS and β ∈ FS such that α β and support(α) = support(β)} • closed sequence mining: nd CS above min_sup • database containment relation D D : if ∃ an injective function f : D → D , s.t. ∀s ∈ D, s f (s) Innopolis University CloSpan page 7 of 34
  • 8. Preliminary Concepts Item extension • Given: s = t1, ..., tm and item α • s α: concatenation (I-Step or S-Step) • s i α = t1, ..., tm ∪ {α} if ∀k ∈ rm, k α Example: (αe) is I-Step extension of (α) • s s α = t1, ..., tm, {α} Example: (α)(c) is S-Step extension of (α) Innopolis University CloSpan page 8 of 34
  • 9. Preliminary Concepts Sequence extension • Given: s = t1, ..., tm and p = t1, ..., tn • s p: concatenation (itemset-extension or sequence-extension) • s i p = t1, ..., tm ∪ t1, ..., tn if ∀k ∈ tm, j ∈ t1, k j • s s p = t1, ..., tm, t1, ..., tn • s = p s: p - prex and s - sux of s Example: (e)(α) is prex of (e)(abf )(bde) and (bf )(bde) is its sux Innopolis University CloSpan page 9 of 34
  • 10. Preliminary Concepts s-projected database (physical projection and pseudo projection) • Ds = {p|s ∈ D, s = r p s.t. r is minimum prex containing s (s r and r , s r r)} p can be empty Example • D (αf ) = { (d)(e)(α) , (bde) } • D (e)(α) = {$, (b) , (_bf )(bde) } Innopolis University CloSpan page 10 of 34
  • 11. Lexicographic Sequence Tree Set Lexicographic Order • Let t = {i1, i2, ..., ik}, t = {j1, j2, ..., jl }, where i1 ≤ ... ≤ ik and j1 ≤ ... ≤ jl • t t i either of the following is true: 1. 0 ≤ h ≤ min{k, l }, we have ir = jr for r h, and ih jh 2. k l , and i1 = j1, i2 = j2, ..., ik = jk Example: (a, f ) (b, f ), (a, b) (a, b, c) and (a, b, c) (b, c) Innopolis University CloSpan page 11 of 34
  • 12. Lexicographic Sequence Tree Sequence Lexicographic Order i if s = s p, then s s ii if s = α i p and s = α s p , no matter what is order relation between p and p is, s s iii if s = α i p and s = α i p , p p indicated s s iv s = α s p and s = α s p , p p indicates s s Example: (a, b) (a, b)(a) ; (a, b) (a)(a) Innopolis University CloSpan page 12 of 34
  • 13. Lexicographic Sequence Tree Lexicographic Sequence Tree construction 1. each node in the tree corresponds to a sequence, and the root is a null sequence; 2. if a parent node corresponds to a sequence s, its child is either an itemset-extension of s, or a sequence-extension of s; 3. the left sibling is less than the right sibling in sequence lexicographic order. Innopolis University CloSpan page 13 of 34
  • 14. Lexicographic Sequence Tree Lexicographic Sequence Tree and Prex Search Tree Innopolis University CloSpan page 14 of 34
  • 15. Lexicographic Sequence Tree Example Lexicographic Sequence Tree with min_sup = 2 Innopolis University CloSpan page 15 of 34
  • 16. Search Space Pruning and Prex Sequence Lattice LEMMA 1 (Common Prex) LEMMA 1. Given a subsequence s, and its projected database Ds, if ∃α, α is a common prex for all the sequences with the same extension type (either itemset or sequence - extension) in Ds, then ∀β, if s β is closed, α must be a prex of β. That means ∀β α, we need not search s β and its descendants except the branch of s α. Example: Ds = { (d)(e)(af ) , (d)(e)(fg) }, all the sequences in Ds share a common prex α = (d)(e) , so any sequence with prex s but not s (d)(e) must not be closed. So we can jump to the branch s α. Innopolis University CloSpan page 16 of 34
  • 17. Search Space Pruning and Prex Sequence Lattice LEMMA 2 (Partial Order) LEMMA 2. Given a sequence s, and its projected database Ds, if among all the sequences in Ds, and item α does always occur before an item β (either in the same itemset for all sequences in Ds or in a dierent itemset, but not both), then Ds α β = Ds β. Therefore, ∀γ, s β γ is not closed. We need not search any sequence in the branch of s β. Innopolis University CloSpan page 17 of 34
  • 18. Search Space Pruning and Prex Sequence Lattice Theorem 1 (Equivalence of Projected Databases) • I(D) = n i=1 l(si ): total number items in D Theorem 1: Given 2 sequences, s, s , s s , then Ds = Ds ⇔ I(Ds) = I(Ds ) Example: Consider D-sample on 15 slide. • D (af ) = D (f ) = { (d)(e) , (de) }, and • I(D (af ) ) = I(D (f ) ) = 4. Based on Theorem 1, the following search space pruning can be achieved. Innopolis University CloSpan page 18 of 34
  • 19. Search Space Pruning and Prex Sequence Lattice Proof of Theorem 1 • Ds = Ds → I(Ds) = I(Ds ) (obvious); • Since s s , then Ds Ds and I(Ds ) ≤ I(Ds); • The equality between I(Ds ) and I(Ds) holds only if ∀γ ∈ Ds , γ ∈ Ds, and vice versa. Therefore, Ds = Ds . Innopolis University CloSpan page 19 of 34
  • 20. Search Space Pruning and Prex Sequence Lattice LEMMA 3 (Early Termination by Equivalence) LEMMA 3. Given 2 sequences, s s and also I(Ds) = I(Ds ), then ∀γ, support(s γ) = support(s γ). Example: Consider D-sample on 15 slide. • I(D (af ) ) = I(D (f ) ); • both ((af )(d)) and (af )(e) are frequent; We can conclude that the support of (af )(d) and (f )(d) , (af )(e) and (f )(e) are the same without knowing the support of (f )(e) and (f )(d) . Innopolis University CloSpan page 20 of 34
  • 21. Search Space Pruning and Prex Sequence Lattice Projected database closed set (LS) • LS = {s|support(s) ≥ min_sup} and s , s.t s s and I(Ds) = I(Ds ); • CS ⊆ LS ⊆ FS: instead of mining CS directly, CloSpan algorithm rst produces the complete set of LS • then non-closed sequence elimination is applied in LS to generate CS based of Lemma 3. Innopolis University CloSpan page 21 of 34
  • 22. Search Space Pruning and Prefix Sequence Lattice: Corollary 1 (Backward Sub-Pattern). If a sequence s < s′ and s ⊐ s′, the condition I(D_s) = I(D_{s′}) is sufficient to stop searching any descendant of s′ in the prefix search tree. s′ is a backward sub-pattern of s if s < s′ and s ⊐ s′ (s′ is discovered after s). Example: I(D_⟨(f)⟩) = I(D_⟨(af)⟩) ⇒ D_⟨(f)⟩ = D_⟨(af)⟩.
  • 23. Search Space Pruning and Prefix Sequence Lattice: Corollary 2 (Backward Super-Pattern). If a sequence s < s′ and s ⊏ s′, and the condition I(D_s) = I(D_{s′}) holds, it is sufficient to transplant the descendants of s to s′ instead of searching any descendant of s′ in the prefix search tree. Example: the same logic as in the previous example.
  • 24. CloSpan: Design and Implementation: 2 main steps. CloSpan divides the mining process into 2 stages: 1. it generates the LS set, a superset of the closed frequent sequences, and stores it in a prefix sequence lattice; 2. it does post-pruning to eliminate non-closed sequences.
  • 25. CloSpan: Design and Implementation: Algorithm 1: ClosedMining(D, min_sup, L)
  • 26. CloSpan: Design and Implementation: Algorithm 2: CloSpan(s, D_s, min_sup, L)
  • 27. CloSpan: Design and Implementation: Algorithm: CloSpan • A hash index on the size of the projected database speeds up the check of Theorem 1 (lines 1-4 of Algorithm 2); index entries are pairs ⟨I(D_s), s⟩; • if I(D_{s′}) = I(D_s), then: • if s ⊑ s′, we do not add ⟨I(D_s), s⟩; • if s′ ⊑ s, we replace ⟨I(D_{s′}), s′⟩ with ⟨I(D_s), s⟩.
  • 28. CloSpan: Design and Implementation: Algorithm 3: checkProjectedDBSize(s, k, H). Corresponds to lines 1-4 of Algorithm 2.
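The hash-index check described on slides 27 and 28 can be sketched as follows. This is a minimal illustration, not the algorithm listing from the slides (which was an image and is not reproduced in the transcript): the function names `is_subsequence` and `check_projected_db_size`, the return labels, and the dict-of-lists index are all assumptions.

```python
# Sketch only: a sequence is a tuple of itemsets (frozensets of items).

def is_subsequence(a, b):
    """True if a is contained in b: the itemsets of a map, in order,
    onto distinct itemsets of b that are supersets of them."""
    j = 0
    for itemset in a:
        # Greedily take the earliest itemset of b that covers this one.
        while j < len(b) and not itemset <= b[j]:
            j += 1
        if j == len(b):
            return False
        j += 1
    return True


def check_projected_db_size(s, size, index):
    """Look up s in a hypothetical index keyed by I(D_s).

    Returns "skip" if an indexed super-sequence has the same size (s is
    not added), "replace" if s takes the place of an indexed sub-sequence,
    and "new" if s is simply added.
    """
    bucket = index.setdefault(size, [])
    for pos, other in enumerate(bucket):
        if is_subsequence(s, other):
            return "skip"
        if is_subsequence(other, s):
            bucket[pos] = s
            return "replace"
    bucket.append(s)
    return "new"


af = (frozenset("af"),)
f = (frozenset("f"),)

index_a = {}
first = check_projected_db_size(af, 4, index_a)   # <(af)> seen first
second = check_projected_db_size(f, 4, index_a)   # <(f)> has a super-sequence

index_b = {}
check_projected_db_size(f, 4, index_b)            # <(f)> seen first
third = check_projected_db_size(af, 4, index_b)   # <(af)> replaces it
```

Keying the index on the size means that only sequences whose projected databases have the same number of items are ever compared for containment.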
  • 29. CloSpan: Design and Implementation: Algorithm 3: hash function • Projected database sizes range from 0 to I(D), so if the values of I(D_s) are dense in a small range, performance degrades; • by Theorem 1, other necessary conditions for D_s = D_{s′} can be used as part of the hash key; • L(D_s) = I(D_s) + Σ_{j=1}^{m} Σ_{k=i_j+1}^{n} l(s_k); • if s ⊑ s′, L(D_s) = L(D_{s′}) ⇔ I(D_s) = I(D_{s′}).
  • 30. Non-Closed Sequence Elimination: Checking for super-sequences • use support(s) as the hash key; • find all the sequences with the same support as s; • check whether any of them is a super-sequence containing s; • if s ⊑ s′ and support(s) = support(s′), then T(D_s) = T(D_{s′}), where T is the sum of the ids of the corresponding sequences; • that is why T(D_s) can be used as the hash key instead of the support (its values are more sparse).
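The elimination step above can be sketched in Python, bucketing candidates by the id sum T(D_s) and then looking for a super-sequence with the same support inside each bucket. The function names, the dict-based input format and the three sample patterns with their sequence ids are hypothetical and chosen only to illustrate the idea.

```python
from collections import defaultdict

# Sketch only: a sequence is a tuple of itemsets (frozensets of items).

def is_subsequence(a, b):
    """True if a is contained in b (ordered, itemset-wise subset match)."""
    j = 0
    for itemset in a:
        while j < len(b) and not itemset <= b[j]:
            j += 1
        if j == len(b):
            return False
        j += 1
    return True


def eliminate_non_closed(patterns):
    """patterns maps each candidate sequence to the set of ids of the
    database sequences that contain it; returns the closed candidates."""
    buckets = defaultdict(list)
    for seq, ids in patterns.items():
        buckets[sum(ids)].append(seq)  # hash key: T(D_s), the id sum
    closed = []
    for seqs in buckets.values():
        for s in seqs:
            absorbed = any(
                s != t
                and len(patterns[s]) == len(patterns[t])  # same support
                and is_subsequence(s, t)
                for t in seqs
            )
            if not absorbed:
                closed.append(s)
    return closed


# Hypothetical candidates with the ids of their supporting sequences.
f = (frozenset("f"),)
af = (frozenset("af"),)
a = (frozenset("a"),)
patterns = {f: {1, 3}, af: {1, 3}, a: {1, 2, 3}}

closed = eliminate_non_closed(patterns)  # <(f)> is absorbed by <(af)>
```

A sub-sequence with the same support as its super-sequence is contained in exactly the same database sequences, so both land in the same bucket; sequences in different buckets never need to be compared.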
  • 31. Conclusion: CloSpan • solves the closed sequential pattern mining problem; • outperforms PrefixSpan by more than one order of magnitude; • is capable of mining longer frequent sequences in a large data set with low min_sup; • does not modify the frequent pattern mining algorithm: it only defines the early termination condition for a search branch; • can be extended to other existing sequential pattern mining algorithms (SPADE, SPAM).
  • 32. Possible improvements: CloSpan • The performance of CloSpan comes from its pruning method, which could be made smarter; • closure checking of a new pattern should not need to keep track of any historical frequent closed sequence (or candidate).
  • 33. Possible improvements: BIDE algorithm 1. BIDE consumes much less memory and can be an order of magnitude faster than CloSpan when the support is low; 2. BIDE scales linearly with the database size in terms of runtime efficiency and space usage; 3. the BackScan pruning method is very effective in enhancing the performance of BIDE.
  • 34. CloSpan in Trajectory Mining: Sequential Pattern Mining from Trajectory Data • more studies are needed: IPCA and DBSCAN on trajectory data; • CloSpan could be used as an unsupervised algorithm for detecting the most crowded paths in a city; • ...