IEEE BigData 2019, December 4-12
[KW '02] K. Wang, L. Tang, J. Han, and J. Liu, "Top down FP-growth for association rule mining," in Proceedings of the 6th Pacific-Asia Conference on Advances in Knowledge Discovery and Data Mining (PAKDD '02).
[F.Z. '16] F. Zhang, P. Di, H. Zhou, X. Liao, and J. Xue, "RegTT: Accelerating tree traversals on GPUs by exploiting regularities," in 2016 ICPP.
[M.G. '13] M. Goldfarb, Y. Jo, and M. Kulkarni, "General transformations for GPU execution of tree traversals," in Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis (SC '13).
[Figure: array-based tree layout. The node at each index stores its item together with (parent item, index of parent node, support), which allows coalesced access.]
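A minimal sketch of the array-based layout described on this slide, written in Python for readability rather than as the authors' GPU code. Assumptions (ours, not the paper's): the tree is a structure of arrays with one list per field, index 0 is the root, and the names `ArrayTree`, `add_node`, and `prefix_path` are hypothetical. Keeping each field in its own contiguous array is what lets adjacent GPU threads read adjacent nodes with coalesced accesses, and storing the parent's item in each node means a path walk only reads the current node's slot.

```python
from dataclasses import dataclass, field
from typing import List

ROOT = 0  # index 0 holds the root: no item (-1) and no parent (-1)


@dataclass
class ArrayTree:
    """Hypothetical structure-of-arrays prefix tree: one list per node field."""
    item: List[int] = field(default_factory=lambda: [-1])
    parent_item: List[int] = field(default_factory=lambda: [-1])
    parent_idx: List[int] = field(default_factory=lambda: [-1])
    support: List[int] = field(default_factory=lambda: [0])

    def add_node(self, item: int, parent: int, support: int) -> int:
        """Append a node under `parent` and return its index."""
        self.item.append(item)
        self.parent_item.append(self.item[parent])
        self.parent_idx.append(parent)
        self.support.append(support)
        return len(self.item) - 1

    def prefix_path(self, node: int) -> List[int]:
        """Items on the path above `node`, nearest ancestor first, root excluded."""
        if node == ROOT:
            raise ValueError("the root has no prefix path")
        path = []
        while self.parent_idx[node] != ROOT:
            # The parent's item is read from this node's own slot, so the walk
            # never loads the `item` array at all.
            path.append(self.parent_item[node])
            node = self.parent_idx[node]
        return path
```

For example, a chain of items 2, 0, 4 hanging from the root gives `[0, 2]` as the prefix path of the item-4 node.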
[Figure: two pairs of panels (a) and (b), annotated "53x".]
[XH '10] X. Huang, C. I. Rodrigues, S. Jones, I. Buck, and W. Hwu, "XMalloc: A scalable lock-free dynamic memory allocator for many-core machines," in 2010 10th IEEE International Conference on Computer and Information Technology.
[MS '12] M. Steinberger, M. Kenzel, B. Kainz, and D. Schmalstieg, "ScatterAlloc: Massively parallel dynamic memory allocation for the GPU," in 2012 Innovative Parallel Computing (InPar).
[Figure: mining iterations 0, 1, 2, ... Each iteration consumes an input table set and produces an output table set. The input set holds header tables 0, 1, ..., k, whose entries are "Info of item 0" to "Info of item k-1" (info of an item: node, support, etc.). Thread blocks process the tables out of order, producing header tables such as 1k, 2k, ..., (k-1)k, 13, and 59. Header table XY denotes the header table of pattern XY.]
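The iteration structure in this figure can be sketched as a loop in which each output table set becomes the next input table set. This is a simplified stand-in, not the paper's method: each "table" here is a hypothetical conditional pattern base, a list of (prefix, count) pairs for one pattern, rather than a header table over a GPU-resident tree, and transactions are assumed to be sorted by item id with no duplicates. The loop over tables has no cross-table dependency, which is why the order in which tables are processed (on a GPU, the order in which thread blocks finish) does not change the result.

```python
from collections import Counter
from typing import Dict, List, Tuple

Pattern = Tuple[int, ...]
Base = List[Tuple[Tuple[int, ...], int]]  # (prefix items, count) pairs for one pattern


def initial_tables(transactions: List[Tuple[int, ...]], threshold: int) -> Dict[Pattern, Base]:
    """Input table set of iteration 0: one table per frequent single item."""
    counts = Counter(i for t in transactions for i in t)
    return {
        (y,): [(t[: t.index(y)], 1) for t in transactions if y in t]
        for y, c in counts.items()
        if c >= threshold
    }


def mine_iteration(inputs: Dict[Pattern, Base], threshold: int) -> Dict[Pattern, Base]:
    """Consume one input table set and produce the output table set.
    Each table is handled independently, so any processing order works."""
    outputs: Dict[Pattern, Base] = {}
    for pattern, base in inputs.items():
        counts: Counter = Counter()
        for prefix, c in base:
            for item in prefix:
                counts[item] += c
        for x, sup in counts.items():
            if sup >= threshold:
                # Table of the extended pattern: the part of each prefix preceding x.
                outputs[(x,) + pattern] = [
                    (prefix[: prefix.index(x)], c) for prefix, c in base if x in prefix
                ]
    return outputs


def mine(transactions: List[Tuple[int, ...]], threshold: int) -> Dict[Pattern, int]:
    """Run iterations until an output table set is empty; return pattern -> support."""
    result: Dict[Pattern, int] = {}
    tables = initial_tables(transactions, threshold)
    while tables:
        for pattern, base in tables.items():
            result[pattern] = sum(c for _, c in base)
        tables = mine_iteration(tables, threshold)
    return result
```

For instance, `mine([(0, 2, 4), (0, 1, 2, 4), (0, 1, 4), (0, 2, 3)], 2)` finds 11 patterns, including `(0, 2, 4)` and `(0, 1, 4)` with support 2.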
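Calculating the write offsets: the remap (2, 0, 1) reorders the table sizes (Size 0, Size 1, Size 2) into (Size 1, Size 2, Size 0), and an exclusive prefix-sum over the reordered sizes gives the write offsets (0, Size 1, Size 1 + Size 2).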
Using the write offsets: following the remap (2, 0, 1), each table is written starting at its write offset (0, Size 1, or Size 1 + Size 2).
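The offset calculation and its use can be sketched in a few lines. This assumes one plausible reading of the figures: `remap[i]` is the destination slot of table i, so that remap (2, 0, 1) places the sizes in the order (Size 1, Size 2, Size 0). The function names are hypothetical, and on a GPU the prefix sum would be a device-wide exclusive scan (for example CUB's `cub::DeviceScan::ExclusiveSum`) rather than a Python loop. Because every table's offset is known before any table is written, all tables can be written concurrently into one preallocated buffer without per-table dynamic allocation.

```python
from itertools import accumulate
from typing import List, Sequence, Tuple


def write_offsets(sizes: Sequence[int], remap: Sequence[int]) -> List[int]:
    """Write offset of each table i, where remap[i] is table i's destination slot."""
    n = len(sizes)
    slot_sizes = [0] * n
    for i, slot in enumerate(remap):
        slot_sizes[slot] = sizes[i]  # (S0, S1, S2) -> (S1, S2, S0) for remap (2, 0, 1)
    # Exclusive prefix sum over slots: (0, s0, s0 + s1, ...), grand total dropped.
    slot_offsets = [0] + list(accumulate(slot_sizes))[:-1]
    return [slot_offsets[remap[i]] for i in range(n)]


def pack(tables: List[list], remap: Sequence[int]) -> Tuple[list, List[int]]:
    """Copy every table into one preallocated buffer at its write offset."""
    sizes = [len(t) for t in tables]
    offsets = write_offsets(sizes, remap)
    out: list = [None] * sum(sizes)
    for table, off in zip(tables, offsets):
        out[off : off + len(table)] = table  # ranges never overlap, so writes are independent
    return out, offsets
```

With sizes (3, 1, 2) and remap (2, 0, 1), the offsets are 3, 0, and 1, that is Size 1 + Size 2, 0, and Size 1.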
[Figure: a thread block of size 4 (threads 0 to 3) and five items I at indices 0 to 4, shown in the order 0, 2, 3, 1, 4.]
[CB '05] C. Borgelt, "An implementation of the FP-growth algorithm," OSDM '05 (workshop).
[FW '14] F. Wang and B. Yuan, "Parallel frequent pattern mining without candidate generation on GPUs," 2014 IEEE ICDMW.
[HJ '17] H. Jiang and H. Meng, "A parallel FP-growth algorithm based on GPU," 2017 IEEE ICEBE.
[WF '09] W. Fang, M. Lu, X. Xiao, B. He, and Q. Luo, "Frequent itemset mining on graphics processors," DaMoN '09.
[Chon '18] K.-W. Chon, S.-H. Hwang, and M.-S. Kim, "GMiner: A fast GPU-based frequent itemset mining method for large-scale data," Information Sciences, vol. 439-440, pp. 19-38, 2018.
Not open source, and its normalized results are poor.
Speedups (performance criterion: execution time; operations that can be processed offline are excluded)

Dataset   #items   #trans   Size    Threshold (%)  vs. CPU FP-growth  vs. best GPU Apriori
chess     75       3196     335KB   35~60          1.2x~0.7x          1.8x~3.3x
retail    16470    88163    4MB     0.07~0.1       2x~1.8x            9.8x~8.6x
accident  468      340184   34MB    20~40          8x~6x              16x~42x
kosarak   41270    990002   30MB    0.3~0.6        6x~7x              12x~40x
Webdoc    5267656  1692082  1.48GB  20~25          12x~7x             12x~86x

Higher thresholds yield fewer patterns.
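[Figure: generated frequent patterns 04, 14, 24, 34, and 4, and the header table of pattern 24 over a tree whose nodes are labelled item:support (0:5, 2:2, 3:2, 4:2, 1:3, 2:1, 3:1, 4:1, 4:2). The header table is a hash table: the length of its index array depends on the hash function, the position of each entry is decided by its hash value, and each entry records the item's number of nodes (# node) and support (# support).]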
Assuming a support threshold of 3, a new frequent pattern 024:3 (pattern 024 with support 3) is generated.
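The header-table step can be sketched as follows. The tree below is a hypothetical example of our own, loosely modelled on the node labels in the figure and chosen so that, with a threshold of 3, it produces 024:3 as the slide states. Python's built-in `dict` stands in for the hash table; a GPU version would instead hash items into a fixed-length index array, as the figure notes. The names `pattern_entries`, `header_table`, and `extend` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical example tree as parent-index arrays (index 0 is the root).
# Root-to-leaf paths: 0-2-3-4 (support 2), 0-1-2-3-4 (support 1), 0-1-4 (support 2).
ITEM = [-1, 0, 2, 3, 4, 1, 2, 3, 4, 4]
PARENT = [-1, 0, 1, 2, 3, 1, 5, 6, 7, 5]
SUPPORT = [0, 5, 2, 2, 2, 3, 1, 1, 1, 2]


def pattern_entries(upper: int, lower: int) -> List[Tuple[int, int]]:
    """For pattern (upper, lower): one (node of `upper`, support) pair per
    `lower` node that has an `upper` ancestor; support comes from the `lower` node."""
    entries = []
    for n, it in enumerate(ITEM):
        if it != lower:
            continue
        p = PARENT[n]
        while p > 0:  # stop before the root
            if ITEM[p] == upper:
                entries.append((p, SUPPORT[n]))
                break
            p = PARENT[p]
    return entries


def header_table(entries: List[Tuple[int, int]]) -> Dict[int, Tuple[int, int]]:
    """Item -> (# node, # support) over the paths above each entry's node."""
    table = defaultdict(lambda: [set(), 0])  # dict plays the role of the hash table
    for node, sup in entries:
        p = PARENT[node]
        while p > 0:
            slot = table[ITEM[p]]
            slot[0].add(p)  # distinct tree nodes carrying this item
            slot[1] += sup  # support contributed by this entry
            p = PARENT[p]
    return {item: (len(nodes), sup) for item, (nodes, sup) in table.items()}


def extend(pattern: Tuple[int, ...], entries, threshold: int) -> Dict[Tuple[int, ...], int]:
    """New frequent patterns formed by prepending a header-table item."""
    return {
        (item,) + pattern: sup
        for item, (_, sup) in header_table(entries).items()
        if sup >= threshold
    }


entries = pattern_entries(2, 4)
print(extend((2, 4), entries, threshold=3))  # {(0, 2, 4): 3}
```

Here the header table of pattern 24 holds item 0 (one node, support 3) and item 1 (one node, support 1), so only item 0 meets the threshold.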


Fast Frequent Pattern Mining without Candidate Generations on GPU by Low Latency Memory Allocation (IEEE BigData 2019)
