Survival of the Fittest - Using Genetic
Algorithm for Data Mining Optimization
July 25, 2013
Or Levi
Introduction
•Better Results
•Higher Accuracy
•Knowledge
•Insights
Big Data
Machine Learning on eBay
Data Mining Optimization
Genetic Algorithm
Agenda
1. What is a Genetic Algorithm?
2. How can GA help improve Cluster Analysis?
3. Where might it be useful? An eBay Use Case
4. Questions and Answers
Genetic Algorithm
A Search Heuristic Inspired by Natural Evolution
Genetic Algorithm
Solution Representation: Chromosome of 7 Genes, e.g. Adi = 0 0 1 0 0 1 0
Fitness Value: Neck Length, e.g. Adi = 5’1
Natural Selection Mechanism: Environment (Tall Trees, Competition)
Genetic Algorithm
Initial Population
Joe: 1 0 0 1 1 0 0
Zoe: 0 1 0 1 0 1 1
Ron: 1 1 0 1 1 1 0
Tom: 1 0 1 0 1 0 1
Adi: 0 0 1 0 0 1 0
Genetic Algorithm
Fitness Function
Fitness: Neck Length
Joe: 1 0 0 1 1 0 0, 5’6
Zoe: 0 1 0 1 0 1 1, 4’2
Ron: 1 1 0 1 1 1 0, 5’8
Tom: 1 0 1 0 1 0 1, 4’9
Adi: 0 0 1 0 0 1 0, 5’1
Genetic Algorithm
Selection
Elitism
Neck Length: Joe 5’6, Zoe 4’2, Ron 5’8, Tom 4’9, Adi 5’1
Genetic Algorithm
Selection
Fitness proportionate selection
Ron
Joe
Adi
Tom
Zoe
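Fitness proportionate selection can be sketched as a roulette wheel in which each individual's slice is proportional to its fitness. The sketch below uses the neck lengths from the slides converted to inches; the function names and the inch conversion are illustrative assumptions, not part of the original deck.

```python
import random

# Neck lengths from the slides, converted to inches (e.g. 5'6 -> 66).
population = {"Joe": 66, "Zoe": 50, "Ron": 68, "Tom": 57, "Adi": 61}

def selection_probabilities(fitness):
    """Each individual's share of the roulette wheel."""
    total = sum(fitness.values())
    return {name: value / total for name, value in fitness.items()}

def select_parent(fitness, rng):
    """Spin the wheel once: walk the cumulative fitness until the spin is covered."""
    spin = rng.uniform(0, sum(fitness.values()))
    cumulative = 0.0
    for name, value in fitness.items():
        cumulative += value
        if spin <= cumulative:
            return name
    return name  # guard against floating-point round-off

probs = selection_probabilities(population)
rng = random.Random(0)
parents = [select_parent(population, rng) for _ in range(4)]
```

Fitter individuals such as Ron get a larger slice, but every individual keeps a non-zero chance of being picked.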
Genetic Algorithm
Crossover
Parents:
Ron: 1 1 0 1 1 1 0, 5’8
Zoe: 0 1 0 1 0 1 1, 4’2
Offspring:
Ron Junior: 1 1 0 1 0 1 1, 6’0
Zoe Junior: 0 1 0 1 1 1 0, 5’3
Crossover Probability
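The slides do not name the crossover operator, but a single-point crossover with the cut after the fourth gene is one choice that reproduces the Ron and Zoe offspring shown. A minimal sketch under that assumption (the function name and the `point` parameter are illustrative):

```python
def single_point_crossover(parent_a, parent_b, point):
    """Swap the tails of two equal-length chromosomes after `point` genes."""
    if len(parent_a) != len(parent_b):
        raise ValueError("parents must have the same number of genes")
    child_a = parent_a[:point] + parent_b[point:]
    child_b = parent_b[:point] + parent_a[point:]
    return child_a, child_b

ron = [1, 1, 0, 1, 1, 1, 0]
zoe = [0, 1, 0, 1, 0, 1, 1]
ron_junior, zoe_junior = single_point_crossover(ron, zoe, point=4)
```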
Genetic Algorithm
Mutation
Ron Junior: Chromosome 1 1 0 1 0 1 1, Fitness 6’0
Zoe Junior: Chromosome 0 1 0 1 1 1 0, Fitness 5’3
No Mutation
Mutation Probability: 0.1
Genetic Algorithm
Crossover
Joe: 1 0 0 1 1 0 0, 5’6
Adi: 0 0 1 0 0 1 0, 5’1
No Crossover
Crossover Probability
Genetic Algorithm
Mutation
Joe: 1 0 0 1 1 0 0, 5’6
Adi: 0 0 1 0 0 1 0, 5’1
Adi Junior: 0 0 1 0 1 1 0, 5’5
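Adi Junior differs from Adi in exactly one gene (the fifth), which is what a bit-flip mutation produces. The sketch below shows both a deterministic flip that reproduces the slide and a random variant using the mutation probability of 0.1 mentioned earlier; the helper names are illustrative assumptions.

```python
import random

def mutate(chromosome, probability, rng):
    """Flip each gene independently with the given probability."""
    return [1 - gene if rng.random() < probability else gene for gene in chromosome]

def flip_gene(chromosome, index):
    """Deterministic single flip, used here to reproduce the slide."""
    mutated = list(chromosome)
    mutated[index] = 1 - mutated[index]
    return mutated

adi = [0, 0, 1, 0, 0, 1, 0]
adi_junior = flip_gene(adi, index=4)
random_variant = mutate(adi, probability=0.1, rng=random.Random(1))
```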
Genetic Algorithm
New Generation
Ron: 1 1 0 1 1 1 0, 5’8
Ron Junior: 1 1 0 1 0 1 1, 6’0
Joe: 1 0 0 1 1 0 0, 5’6
Adi Junior: 0 0 1 0 1 1 0, 5’5
Zoe Junior: 0 1 0 1 1 1 0, 5’3
Neck Length: Previous 5’1, New 5’5
Genetic Algorithm
Results
1 0 1 1 0 1 0
Adi Junior VIII 7’0
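The steps walked through so far (initial population, fitness, selection with elitism, crossover, mutation, new generation) can be combined into one loop. This is a minimal sketch, not the presenter's implementation: the deck does not say how a chromosome maps to neck length, so the fitness here is a hypothetical stand-in (the count of 1-genes), and the crossover probability, population size, and number of generations are assumed values.

```python
import random

GENES = 7

def fitness(chromosome):
    # Hypothetical stand-in for "neck length": the number of 1-genes.
    return sum(chromosome)

def roulette(population, rng):
    # +1 keeps every weight positive so all-zero chromosomes can still be drawn.
    weights = [fitness(c) + 1 for c in population]
    return rng.choices(population, weights=weights, k=1)[0]

def crossover(a, b, rng):
    point = rng.randint(1, GENES - 1)
    return a[:point] + b[point:], b[:point] + a[point:]

def mutate(c, probability, rng):
    return [1 - g if rng.random() < probability else g for g in c]

def evolve(generations=8, size=5, p_cross=0.75, p_mut=0.1, seed=0):
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(GENES)] for _ in range(size)]
    best_per_generation = []
    for _ in range(generations):
        elite = max(population, key=fitness)       # elitism: keep the fittest
        next_population = [list(elite)]
        while len(next_population) < size:
            a, b = roulette(population, rng), roulette(population, rng)
            if rng.random() < p_cross:
                a, b = crossover(a, b, rng)
            next_population.append(mutate(a, p_mut, rng))
            if len(next_population) < size:
                next_population.append(mutate(b, p_mut, rng))
        population = next_population
        best_per_generation.append(fitness(max(population, key=fitness)))
    return population, best_per_generation

final_population, history = evolve()
```

Because the elite individual is copied unchanged into each new generation, the best fitness in `history` can never decrease from one generation to the next.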
Overview – Genetic Algorithm
eBay Structured Data
What inventory is on our shelves?
Structured Data
Taxonomy
Products
Item Finders
Attributes
Structured Data
Data Vendors
eBay Sellers
eBay Items
Products
Everywhere
Products
ISBN
UPI
Choose Aggregation Set
Brand
Model
Color
Creating Products from Items
Product Features
Network: 4G
Camera: 8.0MP
Screen Size: 4 in.
Used iOS
16GB New
Unlocked
$525.00
17 Bids
$649.99
Buy It Now
$579.99
or Best Offer
Storage
Carrier
Product Type: Smartphones
Product: Apple iPhone 5 – Black
eBay View Items: Apple iPhone 5 Black; Apple iPhone 5 Black; Black Apple iPhone 5
Other Features
Bluetooth: Yes
GPS: Yes
Dimensions:
Height: 4.87 in.
Depth: 0.30 in.
Width: 2.31 in.
Choose Aggregation Set → Extract Relevant Attributes → Aggregate Similar Items
Creating Products from Items
Structured Aggregation
Top-Down
Unstructured Clustering
Bottom-Up
Items
Products
Aggregation Set
Aggregation Set
Overview – eBay Use Case
Use Case Example
Cluster Analysis
Discovering groups and structures that are in some way similar
K-Means Cluster Analysis
Model
Objective: minimize the Total Within Cluster Variance
\min \sum_{i=1}^{K} SSE_i, \qquad SSE_i = \sum_{x_j \in C_i} \lVert x_j - \mu_i \rVert^2
x_j: Observation
\mu_i: Center of Cluster i
C_i: Cluster i
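The objective can be computed directly from its definition. The sketch below is illustrative: the function names and the two small example clusters are assumptions made up for this example, not data from the deck.

```python
def squared_distance(x, mu):
    return sum((a - b) ** 2 for a, b in zip(x, mu))

def mean(points):
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))

def total_within_cluster_variance(clusters):
    """Sum over clusters C_i of the sum over x_j in C_i of ||x_j - mu_i||^2."""
    total = 0.0
    for points in clusters:
        mu = mean(points)
        total += sum(squared_distance(x, mu) for x in points)
    return total

clusters = [
    [(0.0, 0.0), (2.0, 0.0)],   # mean (1, 0): SSE = 1 + 1 = 2
    [(5.0, 5.0), (5.0, 9.0)],   # mean (5, 7): SSE = 4 + 4 = 8
]
score = total_within_cluster_variance(clusters)
```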
Standard K-Means Algorithm
1. Choose K Random Points as the Initial Centers
2. Assign Points to Clusters (nearest Cluster Center)
3. Recalculate the Clusters' Means
Repeat steps 2 and 3.
Solution Score (Total Within Cluster Variance) in each Iteration:
Iteration 1: 33.7
Iteration 2: 26.8
Iteration 3: 23.6
Iteration 4: 21.6
Iteration 5: 19.5
Iteration 6: 18.8
Iteration 7: 18.7 (Local Optimum)
The Local Optimum reached depends on the Initial Cluster Centers.
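The standard K-Means loop described above can be sketched in plain Python. This is a minimal illustration under assumptions: the sample points, the seed, and the stopping rule (stop when the score no longer changes) are chosen for the example and are not taken from the deck.

```python
import random

def squared_distance(x, mu):
    return sum((a - b) ** 2 for a, b in zip(x, mu))

def assign(points, centers):
    """Index of the nearest center for every point."""
    return [min(range(len(centers)), key=lambda i: squared_distance(p, centers[i]))
            for p in points]

def recompute(points, labels, centers):
    """Mean of each cluster; an empty cluster keeps its previous center."""
    new_centers = []
    for i, old in enumerate(centers):
        members = [p for p, l in zip(points, labels) if l == i]
        if members:
            new_centers.append(tuple(sum(c) / len(members) for c in zip(*members)))
        else:
            new_centers.append(old)
    return new_centers

def score(points, labels, centers):
    return sum(squared_distance(p, centers[l]) for p, l in zip(points, labels))

def k_means(points, k, seed=0, max_iterations=100):
    rng = random.Random(seed)
    centers = rng.sample(points, k)              # K random points as initial centers
    history = []
    for _ in range(max_iterations):
        labels = assign(points, centers)
        centers = recompute(points, labels, centers)
        history.append(score(points, labels, centers))
        if len(history) > 1 and history[-1] == history[-2]:
            break                                 # no further improvement
    return centers, labels, history

points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
centers, labels, history = k_means(points, k=2)
```

The score in `history` never increases from one iteration to the next, which is why the procedure settles into whatever optimum is nearest to its starting centers.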
Overview – Standard K-Means
Use Case
Standard K-Means
Local Optimum
Genetic K-Means Algorithm
Applying genetic algorithm to the standard K-Means heuristic
Genetic K-Means Algorithm
Solution Representation
Chromosome: 0 0 1 0 0 1 0 (Adi, 7 Genes)
Genetic K-Means Algorithm
Solution Fitness
Neck Length → Total Within Cluster Variance:
\sum_{i=1}^{K} \sum_{x_j \in C_i} \lVert x_j - \mu_i \rVert^2
K-Means Iteration
Genetic K-Means Algorithm
Crossover: Arithmetic Crossover
Parents: Solution 1 (Ron), Solution 2 (Zoe)
Offspring: Offspring 1 (Ron Junior), Offspring 2 (Zoe Junior)
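Arithmetic crossover blends two real-valued parents into weighted averages. The sketch below assumes each solution is a list of K cluster centers and that centers are paired by position; that representation, the weight `alpha`, and the example coordinates are assumptions for illustration, since the transcript does not preserve the slides' diagrams.

```python
def blend(center_a, center_b, weight):
    """Weighted average of two centers: weight * a + (1 - weight) * b."""
    return tuple(weight * a + (1.0 - weight) * b for a, b in zip(center_a, center_b))

def arithmetic_crossover(solution_1, solution_2, alpha):
    """Offspring 1 leans toward solution 1, offspring 2 toward solution 2."""
    offspring_1 = [blend(a, b, alpha) for a, b in zip(solution_1, solution_2)]
    offspring_2 = [blend(b, a, alpha) for a, b in zip(solution_1, solution_2)]
    return offspring_1, offspring_2

ron = [(0.0, 0.0), (4.0, 8.0)]   # hypothetical Solution 1: two cluster centers
zoe = [(2.0, 4.0), (8.0, 0.0)]   # hypothetical Solution 2
ron_junior, zoe_junior = arithmetic_crossover(ron, zoe, alpha=0.75)
```

With `alpha=0.5` both offspring would coincide at the midpoint, so a weight away from 0.5 keeps the two offspring distinct.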
Genetic K-Means Algorithm
Mutation: Adi → Adi Junior
Overview – Genetic K-Means Algorithm
Use Case
Apply GA to K-Means
Standard K-Means
Local Optimum
Demo
Genetic Algorithm VS Standard K-Means
Demo
Solution Fitness (Total Within Cluster Variance) of the Best Solution in each Generation:
Generation 1: 32.5
Generation 2: 21.9
Generation 3: 14.7
Generation 4: 12.3
Generation 5: 9.7
Generation 6: 9.1
Generation 7: 8.9
Genetic Algorithm VS Standard K-Means
[Figure: Total Within Cluster Variance Per Generation. x-axis: Generations; y-axis: Total Within Cluster Variance; series: Genetic Algorithm, K-Means]
Genetic Algorithm VS Standard K-Means
[Figure: Total Within-Cluster Variance on Different Runs. y-axis: Total Within Cluster Variance; series: K-Means (Local Optimum), Multiple K-Means (High Volatility), Genetic Algorithm (Global Optimum)]
Average Improvement Across 20 Different Runs: 51% VS Standard K-Means, 32% VS Multiple K-Means
Overview – GA VS Standard K-Means
Use Case
Apply GA to K-Means
Global Optimum
Standard K-Means
Local Optimum
eBay Use Case
Extract Structured Data from groups of similar items
eBay Use Case
Lumia 920 Red 32GB Lumia 520 Yellow 8GB Lumia 620 Green 8GB
Lumia 800 Blue 16GB
eBay Use Case
50 Random Items by model: 520 (9), 620 (7), 800 (9), 920 (25)
Original Title: Nokia Lumia 800 Blue 8GB phone
Clean Up: remove {Stop Words} (e.g. brand new) → NOKIA LUMIA 800 BLUE 8GB PHONE
Terms: NOKIA | LUMIA | 800 | BLUE | 8GB | PHONE
TF-IDF Weights (Importance of a term to a title): 0.05 0.12 0 0.31 0 0.20 0.12 0 0.14
Vector length: Number of Unique Terms in All Titles (Text Dictionary: All Titles)
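The title-to-vector step can be sketched as follows. The deck does not state which TF-IDF variant was used, so this example assumes term frequency divided by title length and an unsmoothed logarithmic inverse document frequency; the three titles are hypothetical, assembled from the model names shown on the slides.

```python
import math

STOP_WORDS = {"BRAND", "NEW"}   # the slide lists "brand new" as stop words

def clean(title):
    """Upper-case the title, split it into terms, and drop stop words."""
    return [t for t in title.upper().split() if t not in STOP_WORDS]

def tf_idf_vectors(titles):
    documents = [clean(t) for t in titles]
    vocabulary = sorted({term for doc in documents for term in doc})
    n = len(documents)
    document_frequency = {term: sum(term in doc for doc in documents)
                          for term in vocabulary}
    vectors = []
    for doc in documents:
        vector = []
        for term in vocabulary:
            tf = doc.count(term) / len(doc)                 # assumed TF variant
            idf = math.log(n / document_frequency[term])    # assumed IDF variant
            vector.append(tf * idf)
        vectors.append(vector)
    return vocabulary, vectors

titles = [
    "Brand new Nokia Lumia 800 Blue 8GB phone",   # hypothetical example titles
    "Nokia Lumia 620 Green 8GB",
    "Nokia Lumia 920 Red 32GB",
]
vocabulary, vectors = tf_idf_vectors(titles)
```

Terms that appear in every title, such as NOKIA and LUMIA, get a weight of zero under this variant, while distinguishing terms such as the model number and color carry the weight.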
eBay Use Case
Aggregation Set: Model, Color, Storage
1 Item: NOKIA | LUMIA | 800 | BLUE | 8GB | PHONE
Cluster Center (Average Weight): 0.08 0.11 0 0.13 0 0.06 0.05 0 0.03
Terms shown: 8GB, 620, GREEN, 5MP, CAMERA, PHONE
Accurate Item Classifications, Average Improvement Across 20 Different Runs: 46% VS Standard K-Means, 23% VS Multiple K-Means
Overview – Example
Use Case Example
Apply GA to K-Means
Global Optimum
Standard K-Means
Local Optimum
Questions & Answers
Open Discussion
?
Conclusion
Summing it all up
Conclusion
Use Case Example
Apply GA to K-Means
Global Optimum
Standard K-Means
Local Optimum
+50% Accuracy
Thank You!
Or Levi
Data Analyst
Catalog & Classification
eBay Structured Data
olevi@ebay.com
Appendix – Genetic Algorithm Parameters
[Figure: Normalized Score (0 to 100) of Total Within Cluster Variance, Average of 5 Runs, by Crossover Probability (65% to 90%) and Mutation Probability (5% to 10%)]
Population Size: 10
Number of Generations: 10
Crossover Probability: 75%
Mutation Probability: 9%

More Related Content

What's hot

Genetic Algorithms - Artificial Intelligence
Genetic Algorithms - Artificial IntelligenceGenetic Algorithms - Artificial Intelligence
Genetic Algorithms - Artificial IntelligenceSahil Kumar
 
Introduction to Genetic Algorithms and Evolutionary Computation
Introduction to Genetic Algorithms and Evolutionary ComputationIntroduction to Genetic Algorithms and Evolutionary Computation
Introduction to Genetic Algorithms and Evolutionary ComputationAleksander Stensby
 
Genetic algorithm ppt
Genetic algorithm pptGenetic algorithm ppt
Genetic algorithm pptMayank Jain
 
Genetic algorithms
Genetic algorithmsGenetic algorithms
Genetic algorithmszamakhan
 
Introduction to Evolutionary Algorithms
Introduction to Evolutionary AlgorithmsIntroduction to Evolutionary Algorithms
Introduction to Evolutionary Algorithmsherbps10
 
Genetic algorithm raktim
Genetic algorithm raktimGenetic algorithm raktim
Genetic algorithm raktimRaktim Halder
 
Genetic Algorithm by Example
Genetic Algorithm by ExampleGenetic Algorithm by Example
Genetic Algorithm by ExampleNobal Niraula
 
Fuzzy Genetic Algorithm
Fuzzy Genetic AlgorithmFuzzy Genetic Algorithm
Fuzzy Genetic AlgorithmPintu Khan
 
Genetic algorithm
Genetic algorithmGenetic algorithm
Genetic algorithmJari Abbas
 
Genetic programming
Genetic programmingGenetic programming
Genetic programmingMeghna Singh
 
Analysis of Parameter using Fuzzy Genetic Algorithm in E-learning System
Analysis of Parameter using Fuzzy Genetic Algorithm in E-learning SystemAnalysis of Parameter using Fuzzy Genetic Algorithm in E-learning System
Analysis of Parameter using Fuzzy Genetic Algorithm in E-learning SystemHarshal Jain
 
GENETIC ALGORITHM
GENETIC ALGORITHMGENETIC ALGORITHM
GENETIC ALGORITHMHarsh Sinha
 
Genetic Algorithms
Genetic AlgorithmsGenetic Algorithms
Genetic Algorithmsanas_elf
 

What's hot (20)

Genetic Algorithm
Genetic AlgorithmGenetic Algorithm
Genetic Algorithm
 
Genetic Algorithms - Artificial Intelligence
Genetic Algorithms - Artificial IntelligenceGenetic Algorithms - Artificial Intelligence
Genetic Algorithms - Artificial Intelligence
 
Introduction to Genetic Algorithms and Evolutionary Computation
Introduction to Genetic Algorithms and Evolutionary ComputationIntroduction to Genetic Algorithms and Evolutionary Computation
Introduction to Genetic Algorithms and Evolutionary Computation
 
Genetic algorithm ppt
Genetic algorithm pptGenetic algorithm ppt
Genetic algorithm ppt
 
Genetic algorithms
Genetic algorithmsGenetic algorithms
Genetic algorithms
 
Introduction to Evolutionary Algorithms
Introduction to Evolutionary AlgorithmsIntroduction to Evolutionary Algorithms
Introduction to Evolutionary Algorithms
 
RM 701 Genetic Algorithm and Fuzzy Logic lecture
RM 701 Genetic Algorithm and Fuzzy Logic lectureRM 701 Genetic Algorithm and Fuzzy Logic lecture
RM 701 Genetic Algorithm and Fuzzy Logic lecture
 
Genetic algorithm raktim
Genetic algorithm raktimGenetic algorithm raktim
Genetic algorithm raktim
 
Ga ppt (1)
Ga ppt (1)Ga ppt (1)
Ga ppt (1)
 
Ga
GaGa
Ga
 
Genetic Algorithm by Example
Genetic Algorithm by ExampleGenetic Algorithm by Example
Genetic Algorithm by Example
 
Introduction to Genetic Algorithms
Introduction to Genetic AlgorithmsIntroduction to Genetic Algorithms
Introduction to Genetic Algorithms
 
Fuzzy Genetic Algorithm
Fuzzy Genetic AlgorithmFuzzy Genetic Algorithm
Fuzzy Genetic Algorithm
 
Genetic algorithm
Genetic algorithmGenetic algorithm
Genetic algorithm
 
Genetic algorithm
Genetic algorithmGenetic algorithm
Genetic algorithm
 
Genetic programming
Genetic programmingGenetic programming
Genetic programming
 
Analysis of Parameter using Fuzzy Genetic Algorithm in E-learning System
Analysis of Parameter using Fuzzy Genetic Algorithm in E-learning SystemAnalysis of Parameter using Fuzzy Genetic Algorithm in E-learning System
Analysis of Parameter using Fuzzy Genetic Algorithm in E-learning System
 
Genetic Algorithms
Genetic AlgorithmsGenetic Algorithms
Genetic Algorithms
 
GENETIC ALGORITHM
GENETIC ALGORITHMGENETIC ALGORITHM
GENETIC ALGORITHM
 
Genetic Algorithms
Genetic AlgorithmsGenetic Algorithms
Genetic Algorithms
 

Viewers also liked

Genetic Algorithms Made Easy
Genetic Algorithms Made EasyGenetic Algorithms Made Easy
Genetic Algorithms Made EasyPrakash Pimpale
 
Genetic algorithm
Genetic algorithmGenetic algorithm
Genetic algorithmgarima931
 
Week 12 future computing 2014 tr2
Week 12 future computing 2014 tr2Week 12 future computing 2014 tr2
Week 12 future computing 2014 tr2karenmclaughlin1961
 
Genetic algorithm and graph partitioning problem
Genetic algorithm and graph partitioning problemGenetic algorithm and graph partitioning problem
Genetic algorithm and graph partitioning problemshrinivasvasala
 
Amiina Bakunowicz: Genetic Algorithm, Fitness vs Crossover vs Mutation
Amiina Bakunowicz: Genetic Algorithm, Fitness vs Crossover vs MutationAmiina Bakunowicz: Genetic Algorithm, Fitness vs Crossover vs Mutation
Amiina Bakunowicz: Genetic Algorithm, Fitness vs Crossover vs MutationArchiLab 7
 
Secrets of Landing Page Testing [132] - Steffek
Secrets of Landing Page Testing [132] - SteffekSecrets of Landing Page Testing [132] - Steffek
Secrets of Landing Page Testing [132] - SteffekRobin Steffek
 
Lecture 28 genetic algorithm
Lecture 28 genetic algorithmLecture 28 genetic algorithm
Lecture 28 genetic algorithmHema Kashyap
 
Genetic Algorithm for Process Scheduling
Genetic Algorithm for Process SchedulingGenetic Algorithm for Process Scheduling
Genetic Algorithm for Process SchedulingLogin Technoligies
 
STUDY ON PROJECT MANAGEMENT THROUGH GENETIC ALGORITHM
STUDY ON PROJECT MANAGEMENT THROUGH GENETIC ALGORITHMSTUDY ON PROJECT MANAGEMENT THROUGH GENETIC ALGORITHM
STUDY ON PROJECT MANAGEMENT THROUGH GENETIC ALGORITHMAvay Minni
 
Rule discovery from time series
Rule discovery from time seriesRule discovery from time series
Rule discovery from time seriesbutest
 
Mis(data mning)
Mis(data mning)Mis(data mning)
Mis(data mning)774474
 
A hybrid genetic algorithm and chaotic function model for image encryption
A hybrid genetic algorithm and chaotic function model for image encryptionA hybrid genetic algorithm and chaotic function model for image encryption
A hybrid genetic algorithm and chaotic function model for image encryptionsadique_ghitm
 
Analysis of Nature-Inspried Optimization Algorithms
Analysis of Nature-Inspried Optimization AlgorithmsAnalysis of Nature-Inspried Optimization Algorithms
Analysis of Nature-Inspried Optimization AlgorithmsXin-She Yang
 
Selection in Evolutionary Algorithm
Selection in Evolutionary AlgorithmSelection in Evolutionary Algorithm
Selection in Evolutionary AlgorithmRiyad Parvez
 
Data Mining: an Introduction
Data Mining: an IntroductionData Mining: an Introduction
Data Mining: an IntroductionAli Abbasi
 
Genetic Algorithm for task scheduling in Cloud Computing Environment
Genetic Algorithm for task scheduling in Cloud Computing EnvironmentGenetic Algorithm for task scheduling in Cloud Computing Environment
Genetic Algorithm for task scheduling in Cloud Computing EnvironmentSwapnil Shahade
 
Airline scheduling and pricing using a genetic algorithm
Airline scheduling and pricing using a genetic algorithmAirline scheduling and pricing using a genetic algorithm
Airline scheduling and pricing using a genetic algorithmAlan Walker
 
01 Introduction to Data Mining
01 Introduction to Data Mining01 Introduction to Data Mining
01 Introduction to Data MiningValerii Klymchuk
 
Sequential pattern mining
Sequential pattern miningSequential pattern mining
Sequential pattern miningkiran said
 

Viewers also liked (20)

Genetic Algorithms Made Easy
Genetic Algorithms Made EasyGenetic Algorithms Made Easy
Genetic Algorithms Made Easy
 
Genetic algorithm
Genetic algorithmGenetic algorithm
Genetic algorithm
 
Week 12 future computing 2014 tr2
Week 12 future computing 2014 tr2Week 12 future computing 2014 tr2
Week 12 future computing 2014 tr2
 
Genetic algorithm and graph partitioning problem
Genetic algorithm and graph partitioning problemGenetic algorithm and graph partitioning problem
Genetic algorithm and graph partitioning problem
 
Amiina Bakunowicz: Genetic Algorithm, Fitness vs Crossover vs Mutation
Amiina Bakunowicz: Genetic Algorithm, Fitness vs Crossover vs MutationAmiina Bakunowicz: Genetic Algorithm, Fitness vs Crossover vs Mutation
Amiina Bakunowicz: Genetic Algorithm, Fitness vs Crossover vs Mutation
 
Secrets of Landing Page Testing [132] - Steffek
Secrets of Landing Page Testing [132] - SteffekSecrets of Landing Page Testing [132] - Steffek
Secrets of Landing Page Testing [132] - Steffek
 
Lecture 28 genetic algorithm
Lecture 28 genetic algorithmLecture 28 genetic algorithm
Lecture 28 genetic algorithm
 
Genetic Algorithm for Process Scheduling
Genetic Algorithm for Process SchedulingGenetic Algorithm for Process Scheduling
Genetic Algorithm for Process Scheduling
 
STUDY ON PROJECT MANAGEMENT THROUGH GENETIC ALGORITHM
STUDY ON PROJECT MANAGEMENT THROUGH GENETIC ALGORITHMSTUDY ON PROJECT MANAGEMENT THROUGH GENETIC ALGORITHM
STUDY ON PROJECT MANAGEMENT THROUGH GENETIC ALGORITHM
 
Rule discovery from time series
Rule discovery from time seriesRule discovery from time series
Rule discovery from time series
 
Mis(data mning)
Mis(data mning)Mis(data mning)
Mis(data mning)
 
Data mining
Data miningData mining
Data mining
 
A hybrid genetic algorithm and chaotic function model for image encryption
A hybrid genetic algorithm and chaotic function model for image encryptionA hybrid genetic algorithm and chaotic function model for image encryption
A hybrid genetic algorithm and chaotic function model for image encryption
 
Analysis of Nature-Inspried Optimization Algorithms
Analysis of Nature-Inspried Optimization AlgorithmsAnalysis of Nature-Inspried Optimization Algorithms
Analysis of Nature-Inspried Optimization Algorithms
 
Selection in Evolutionary Algorithm
Selection in Evolutionary AlgorithmSelection in Evolutionary Algorithm
Selection in Evolutionary Algorithm
 
Data Mining: an Introduction
Data Mining: an IntroductionData Mining: an Introduction
Data Mining: an Introduction
 
Genetic Algorithm for task scheduling in Cloud Computing Environment
Genetic Algorithm for task scheduling in Cloud Computing EnvironmentGenetic Algorithm for task scheduling in Cloud Computing Environment
Genetic Algorithm for task scheduling in Cloud Computing Environment
 
Airline scheduling and pricing using a genetic algorithm
Airline scheduling and pricing using a genetic algorithmAirline scheduling and pricing using a genetic algorithm
Airline scheduling and pricing using a genetic algorithm
 
01 Introduction to Data Mining
01 Introduction to Data Mining01 Introduction to Data Mining
01 Introduction to Data Mining
 
Sequential pattern mining
Sequential pattern miningSequential pattern mining
Sequential pattern mining
 

Similar to Survival of the Fittest: Using Genetic Algorithm for Data Mining Optimization

The roles communities play in improving bioinformatics: better software, bett...
The roles communities play in improving bioinformatics: better software, bett...The roles communities play in improving bioinformatics: better software, bett...
The roles communities play in improving bioinformatics: better software, bett...Iddo
 
Presentation from the Life Technologies booth at PAG
Presentation from the Life Technologies booth at PAGPresentation from the Life Technologies booth at PAG
Presentation from the Life Technologies booth at PAGThermo Fisher Scientific
 
Artificial Intelligence - 2
Artificial Intelligence - 2Artificial Intelligence - 2
Artificial Intelligence - 2Muhd Mu'izuddin
 
An innovative approach for feature selection based on chicken swarm optimization
An innovative approach for feature selection based on chicken swarm optimizationAn innovative approach for feature selection based on chicken swarm optimization
An innovative approach for feature selection based on chicken swarm optimizationAboul Ella Hassanien
 
Genetic Algorithms for Evolving Computer Chess Programs
Genetic Algorithms for Evolving Computer Chess Programs   Genetic Algorithms for Evolving Computer Chess Programs
Genetic Algorithms for Evolving Computer Chess Programs Patrick Walter
 
Overview of tree algorithms from decision tree to xgboost
Overview of tree algorithms from decision tree to xgboostOverview of tree algorithms from decision tree to xgboost
Overview of tree algorithms from decision tree to xgboostTakami Sato
 
GIAB-GRC workshop oct2015 giab introduction 151005
GIAB-GRC workshop oct2015 giab introduction 151005GIAB-GRC workshop oct2015 giab introduction 151005
GIAB-GRC workshop oct2015 giab introduction 151005GenomeInABottle
 
Deep Parameters Tuning for Android Mobile Apps
Deep Parameters Tuning for Android Mobile AppsDeep Parameters Tuning for Android Mobile Apps
Deep Parameters Tuning for Android Mobile AppsDavide De Chiara
 
Big Data Day LA 2015 - Applications of the Apriori Algorithm on Open Data by ...
Big Data Day LA 2015 - Applications of the Apriori Algorithm on Open Data by ...Big Data Day LA 2015 - Applications of the Apriori Algorithm on Open Data by ...
Big Data Day LA 2015 - Applications of the Apriori Algorithm on Open Data by ...Data Con LA
 
Data generation, the hard parts
Data generation, the hard partsData generation, the hard parts
Data generation, the hard partsEric Torreborre
 
2019 03 05_biological_databases_part4_v_upload
2019 03 05_biological_databases_part4_v_upload2019 03 05_biological_databases_part4_v_upload
2019 03 05_biological_databases_part4_v_uploadProf. Wim Van Criekinge
 
Ga presentation
Ga presentationGa presentation
Ga presentationziad zohdy
 
GENETIC ALGORITHM ( GA )
GENETIC ALGORITHM ( GA )GENETIC ALGORITHM ( GA )
GENETIC ALGORITHM ( GA )abuamo
 
Winning Data Science Competitions
Winning Data Science CompetitionsWinning Data Science Competitions
Winning Data Science CompetitionsJeong-Yoon Lee
 
Search Marketing 101
Search Marketing 101Search Marketing 101
Search Marketing 101Rob Goldman
 
WIX3001 Lecture 6 Principles of GA.pptx
WIX3001 Lecture 6 Principles of GA.pptxWIX3001 Lecture 6 Principles of GA.pptx
WIX3001 Lecture 6 Principles of GA.pptxKelvinCheah4
 
Structural Optimization using Genetic Algorithms - Artificial Intelligence Fu...
Structural Optimization using Genetic Algorithms - Artificial Intelligence Fu...Structural Optimization using Genetic Algorithms - Artificial Intelligence Fu...
Structural Optimization using Genetic Algorithms - Artificial Intelligence Fu...Ahmed Gamal Abdel Gawad
 

Similar to Survival of the Fittest: Using Genetic Algorithm for Data Mining Optimization (20)

The roles communities play in improving bioinformatics: better software, bett...
The roles communities play in improving bioinformatics: better software, bett...The roles communities play in improving bioinformatics: better software, bett...
The roles communities play in improving bioinformatics: better software, bett...
 
Presentation from the Life Technologies booth at PAG
Presentation from the Life Technologies booth at PAGPresentation from the Life Technologies booth at PAG
Presentation from the Life Technologies booth at PAG
 
Artificial Intelligence - 2
Artificial Intelligence - 2Artificial Intelligence - 2
Artificial Intelligence - 2
 
An innovative approach for feature selection based on chicken swarm optimization
An innovative approach for feature selection based on chicken swarm optimizationAn innovative approach for feature selection based on chicken swarm optimization
An innovative approach for feature selection based on chicken swarm optimization
 
Genetic Algorithms for Evolving Computer Chess Programs
Genetic Algorithms for Evolving Computer Chess Programs   Genetic Algorithms for Evolving Computer Chess Programs
Genetic Algorithms for Evolving Computer Chess Programs
 
Overview of tree algorithms from decision tree to xgboost
Overview of tree algorithms from decision tree to xgboostOverview of tree algorithms from decision tree to xgboost
Overview of tree algorithms from decision tree to xgboost
 
GIAB-GRC workshop oct2015 giab introduction 151005
GIAB-GRC workshop oct2015 giab introduction 151005GIAB-GRC workshop oct2015 giab introduction 151005
GIAB-GRC workshop oct2015 giab introduction 151005
 
Deep Parameters Tuning for Android Mobile Apps
Deep Parameters Tuning for Android Mobile AppsDeep Parameters Tuning for Android Mobile Apps
Deep Parameters Tuning for Android Mobile Apps
 
Big Data Day LA 2015 - Applications of the Apriori Algorithm on Open Data by ...
Big Data Day LA 2015 - Applications of the Apriori Algorithm on Open Data by ...Big Data Day LA 2015 - Applications of the Apriori Algorithm on Open Data by ...
Big Data Day LA 2015 - Applications of the Apriori Algorithm on Open Data by ...
 
Genetic algo
Genetic algoGenetic algo
Genetic algo
 
Genetic Algorithm
Genetic AlgorithmGenetic Algorithm
Genetic Algorithm
 
Genetic algorithm
Genetic algorithmGenetic algorithm
Genetic algorithm
 
Data generation, the hard parts
Data generation, the hard partsData generation, the hard parts
Data generation, the hard parts
 
2019 03 05_biological_databases_part4_v_upload
2019 03 05_biological_databases_part4_v_upload2019 03 05_biological_databases_part4_v_upload
2019 03 05_biological_databases_part4_v_upload
 
Ga presentation
Ga presentationGa presentation
Ga presentation
 
GENETIC ALGORITHM ( GA )
GENETIC ALGORITHM ( GA )GENETIC ALGORITHM ( GA )
GENETIC ALGORITHM ( GA )
 
Winning Data Science Competitions
Winning Data Science CompetitionsWinning Data Science Competitions
Winning Data Science Competitions
 
Search Marketing 101
Search Marketing 101Search Marketing 101
Search Marketing 101
 
WIX3001 Lecture 6 Principles of GA.pptx
WIX3001 Lecture 6 Principles of GA.pptxWIX3001 Lecture 6 Principles of GA.pptx
WIX3001 Lecture 6 Principles of GA.pptx
 
Structural Optimization using Genetic Algorithms - Artificial Intelligence Fu...
Structural Optimization using Genetic Algorithms - Artificial Intelligence Fu...Structural Optimization using Genetic Algorithms - Artificial Intelligence Fu...
Structural Optimization using Genetic Algorithms - Artificial Intelligence Fu...
 

Survival of the Fittest: Using Genetic Algorithm for Data Mining Optimization

  • 1. Survival of the Fittest - Using Genetic Algorithm for Data Mining Optimization July 25, 2013 Or Levi
  • 2. Introduction: Better Results, Higher Accuracy, Knowledge, Insights. Big Data; Machine Learning on eBay; Data Mining Optimization; Genetic Algorithm
  • 3. Agenda: (1) What is a Genetic Algorithm? (2) How can GA help improve Cluster Analysis? (3) Where might it be useful? An eBay Use Case (4) Questions and Answers
  • 4. Genetic Algorithm: A Search Heuristic Inspired by Natural Evolution
  • 5. Genetic Algorithm. Chromosome (Solution Representation): Adi, 0 0 1 0 0 1 0, 7 Genes. Neck Length (Fitness Value): 5’1. Environment (Natural Selection Mechanism): Tall Trees, Competition
  • 6. Genetic Algorithm: Initial Population. Joe: 1 0 0 1 1 0 0; Zoe: 0 1 0 1 0 1 1; Ron: 1 1 0 1 1 1 0; Tom: 1 0 1 0 1 0 1; Adi: 0 0 1 0 0 1 0
  • 7. Genetic Algorithm: Fitness Function (Neck Length). Joe: 1 0 0 1 1 0 0, 5’6; Zoe: 0 1 0 1 0 1 1, 4’2; Ron: 1 1 0 1 1 1 0, 5’8; Tom: 1 0 1 0 1 0 1, 4’9; Adi: 0 0 1 0 0 1 0, 5’1
  • 8. Genetic Algorithm: Selection (Elitism). Neck Length: Joe: 1 0 0 1 1 0 0, 5’6; Zoe: 0 1 0 1 0 1 1, 4’2; Ron: 1 1 0 1 1 1 0, 5’8; Tom: 1 0 1 0 1 0 1, 4’9; Adi: 0 0 1 0 0 1 0, 5’1
  • 9. Genetic Algorithm: Selection (Fitness proportionate selection). Ron, Joe, Adi, Tom, Zoe
  • 10. Genetic Algorithm: Crossover (Crossover Probability). Parents: Ron: 1 1 0 1 1 1 0, 5’8; Zoe: 0 1 0 1 0 1 1, 4’2. Offspring: Ron Junior: 1 1 0 1 0 1 1, 6’0; Zoe Junior: 0 1 0 1 1 1 0, 5’3
  • 11. Genetic Algorithm: Mutation (Mutation Probability: 0.1). Chromosome and Fitness: Ron Junior: 1 1 0 1 0 1 1, 6’0; Zoe Junior: 0 1 0 1 1 1 0, 5’3. No Mutation
  • 12. Genetic Algorithm: Crossover (Crossover Probability). Joe: 1 0 0 1 1 0 0, 5’6; Adi: 0 0 1 0 0 1 0, 5’1. No Crossover
  • 13. Genetic Algorithm: Mutation. Joe: 1 0 0 1 1 0 0, 5’6; Adi: 0 0 1 0 0 1 0, 5’1; Adi Junior: 0 0 1 0 1 1 0, 5’5
  • 14. Genetic Algorithm: New Generation. Ron: 1 1 0 1 1 1 0, 5’8; Ron Junior: 1 1 0 1 0 1 1, 6’0; Joe: 1 0 0 1 1 0 0, 5’6; Adi Junior: 0 0 1 0 1 1 0, 5’5; Zoe Junior: 0 1 0 1 1 1 0, 5’3. Neck Length: Previous 5’1, New 5’5
  • 15. Genetic Algorithm: Results. Adi Junior VIII: 1 0 1 1 0 1 0, 7’0
  • 16. Overview – Genetic Algorithm
  • 17. eBay Structured Data: What inventory is on our shelves?
  • 18. Structured Data: Taxonomy, Products, Item Finders, Attributes
  • 19. Structured Data
  • 20. Structured Data
  • 21. Structured Data: Data Vendors, eBay Sellers, eBay Items, Products Everywhere, Products, ISBN, UPI
  • 22. Creating Products from Items. Steps: Choose Aggregation Set; Extract Relevant Attributes; Aggregate Similar Items. Aggregation Set: Brand, Model, Color. Product: Apple iPhone 5 – Black; Product Type: Smartphones. Product Features: Network: 4G; Camera: 8.0MP; Screen Size: 4 in. Other Features: Bluetooth: Yes; GPS: Yes; Dimensions: Height: 4.87 in., Depth: 0.30 in., Width: 2.31 in. eBay View Items: Apple iPhone 5 Black; Apple iPhone 5 Black; Black Apple iPhone 5. Item details shown: Used, New, iOS, 16GB (Storage), Unlocked (Carrier); $525.00, 17 Bids; $649.99 Buy It Now; $579.99 or Best Offer
  • 23. Creating Products from Items: Structured Aggregation (Top-Down); Unstructured Clustering (Bottom-Up). Items, Products, Aggregation Set
  • 24. Overview – eBay Use Case: Use Case, Example
  • 25. Cluster Analysis: Discovering groups and structures that are in some way similar
  • 26. K-Means Cluster Analysis. Model: Observation $x_j$, Cluster $C_i$, Center $\mu_i$. Objective: $\min \sum_{i=1}^{K} SSE_i$ (Total Within Cluster Variance), where $SSE_i = \sum_{x_j \in C_i} \lVert x_j - \mu_i \rVert^2$
  • 27. Standard K-Means Algorithm: Choose K Random Points (Initial Center)
  • 28. Standard K-Means Algorithm: Assign Points to Clusters (Cluster Center)
  • 29. Standard K-Means Algorithm: Recalculate the Cluster Means
  • 30. Standard K-Means Algorithm: Recalculate the Cluster Means. Solution in Iteration 1: Total Within Cluster Variance (Solution Score) 33.7
  • 31. Standard K-Means Algorithm. Solution in Iteration 2: Total Within Cluster Variance (Solution Score) 26.8
  • 32. Standard K-Means Algorithm. Solution in Iteration 3: Total Within Cluster Variance (Solution Score) 23.6
  • 33. Standard K-Means Algorithm. Solution in Iteration 4: Total Within Cluster Variance (Solution Score) 21.6
  • 34. Standard K-Means Algorithm. Solution in Iteration 5: Total Within Cluster Variance (Solution Score) 19.5
  • 35. Standard K-Means Algorithm. Solution in Iteration 6: Total Within Cluster Variance (Solution Score) 18.8
  • 36. Standard K-Means Algorithm. Solution in Iteration 7: Total Within Cluster Variance (Solution Score) 18.7. Local Optimum
  • 37. Standard K-Means Algorithm: Initial Cluster Centers (Initial Center). Local Optimum
  • 38. Overview – Standard K-Means: Use Case; Standard K-Means: Local Optimum
  • 39. Genetic K-Means Algorithm: Applying genetic algorithm to the standard K-Means heuristic
  • 40. Genetic K-Means Algorithm: Solution Representation. Chromosome: Adi, 0 0 1 0 0 1 0, 7 Genes
  • 41. Genetic K-Means Algorithm: Solution Representation
  • 42. Genetic K-Means Algorithm: Solution Representation
  • 43. Genetic K-Means Algorithm: Solution Fitness (Neck Length)
  • 44. Genetic K-Means Algorithm: Solution Fitness. Total Within Cluster Variance: $\sum_{i=1}^{K} \sum_{x_j \in C_i} \lVert x_j - \mu_i \rVert^2$
  • 45. Genetic K-Means Algorithm: Solution Fitness. K-Means Iteration
  • 46. Genetic K-Means Algorithm: Crossover (Arithmetic Crossover). Solution 1 (Ron), Solution 2 (Zoe)
  • 47. Genetic K-Means Algorithm: Crossover (Arithmetic Crossover). Offspring 1 (Ron Junior), Offspring 2 (Zoe Junior)
  • 48. Genetic K-Means Algorithm: Mutation. Adi
  • 49. Genetic K-Means Algorithm: Mutation. Adi Junior
  • 50. Overview – Genetic K-Means Algorithm: Use Case; Standard K-Means: Local Optimum; Apply GA to K-Means
  • 51. Demo: Genetic Algorithm VS Standard K-Means
  • 52. Demo. Best Solution in Generation 1 (Cluster Center shown): Total Within Cluster Variance (Solution Fitness) 32.5
  • 53. Demo. Best Solution in Generation 2: Total Within Cluster Variance (Solution Fitness) 21.9
  • 54. Demo. Best Solution in Generation 3: Total Within Cluster Variance (Solution Fitness) 14.7
  • 55. Demo. Best Solution in Generation 4: Total Within Cluster Variance (Solution Fitness) 12.3
  • 56. Demo. Best Solution in Generation 5: Total Within Cluster Variance (Solution Fitness) 9.7
  • 57. Demo. Best Solution in Generation 6: Total Within Cluster Variance (Solution Fitness) 9.1
  • 58. Demo. Best Solution in Generation 7: Total Within Cluster Variance (Solution Fitness) 8.9
  • 59. Genetic Algorithm VS Standard K-Means. [Chart: Total Within Cluster Variance Per Generation; x-axis: Generations; y-axis: Total Within Cluster Variance; series: Genetic Algorithm, K-Means]
  • 60. Genetic Algorithm VS Standard K-Means. [Chart: Total Within-Cluster Variance on Different Runs; series: K-Means (High Volatility), Multiple K-Means (Local Optimum), Genetic Algorithm (Global Optimum)] Average Improvement Across 20 Different Runs: 51% VS Standard K-Means, 32% VS Multiple K-Means
  • 61. Overview – GA VS Standard K-Means: Use Case; Standard K-Means: Local Optimum; Apply GA to K-Means: Global Optimum
  • 62. eBay Use Case: Extract Structured Data from groups of similar items
  • 63. eBay Use Case: Lumia 920 Red 32GB; Lumia 520 Yellow 8GB; Lumia 620 Green 8GB; Lumia 800 Blue 16GB
  • 64. eBay Use Case. Original Title: Nokia Lumia 800 Blue 8GB phone. Clean Up ({Stop Words}: brand new): NOKIA LUMIA 800 BLUE 8GB PHONE, tokenized as NOKIA | LUMIA | 800 | BLUE | 8GB | PHONE. TF-IDF Weights (Importance of a term to a title): 0.05 0.12 0 0.31 0 0.20 0.12 0 0.14. Text Dictionary: All Titles (Number of Unique Terms in All Titles). 50 Random Items: 9, 7, 9, 25 for Lumia 520, 620, 800, 920
  • 65. eBay Use Case. Aggregation Set: Model, Color, Storage. Item: NOKIA | LUMIA | 800 | BLUE | 8GB | PHONE. Cluster Center (Average Weight): 0.08 0.11 0 0.13 0 0.06 0.05 0 0.03; terms shown: 8GB, 620, GREEN, 5MP, CAMERA, PHONE. Accurate Item Classifications, Average Improvement Across 20 Different Runs: 46% VS Standard K-Means, 23% VS Multiple K-Means
  • 66. Overview – Example: Use Case Example; Standard K-Means: Local Optimum; Apply GA to K-Means: Global Optimum
  • 67. Questions & Answers: Open Discussion
  • 69. Conclusion: Use Case Example; Standard K-Means: Local Optimum; Apply GA to K-Means: Global Optimum; +50% Accuracy
  • 70. Thank You! Or Levi, Data Analyst, Catalog & Classification, eBay Structured Data. olevi@ebay.com Linked
  • 71. Appendix – Genetic Algorithm Parameters. Population Size: 10; Number of Generations: 10; Crossover Probability: 75%; Mutation Probability: 9%. [Chart: Normalized Score (0 to 100) of Total Within Cluster Variance, Average of 5 Runs, for Crossover Probability 65% to 90% and Mutation Probability 5% to 10%]
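
The giraffe walkthrough on slides 5 to 14 (elitism, fitness proportionate selection, crossover, mutation) can be sketched in a few lines of Python. This is only an illustration, not the presenter's code. The slides do not say how a chromosome maps to neck length, so `fitness` here is a hypothetical stand-in (number of 1-genes plus one). Single-point crossover and per-gene bit-flip mutation are assumptions that happen to reproduce the Ron/Zoe and Adi examples. The default parameters are taken from the appendix slide.

```python
import random

def fitness(chrom):
    # Hypothetical stand-in for "neck length": count of 1-genes, plus 1 so
    # that selection weights are never all zero.
    return sum(chrom) + 1

def crossover(a, b, point):
    # Single-point crossover: swap the tails of the two parents.
    return a[:point] + b[point:], b[:point] + a[point:]

def mutate(chrom, p_mut, rng):
    # Bit-flip mutation, applied independently to each gene (an assumption).
    return [1 - g if rng.random() < p_mut else g for g in chrom]

def evolve(pop_size=10, n_genes=7, generations=10,
           p_cross=0.75, p_mut=0.09, seed=0):
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_genes)] for _ in range(pop_size)]
    for _ in range(generations):
        # Elitism: the fittest individual survives unchanged.
        new_pop = [max(pop, key=fitness)[:]]
        weights = [fitness(c) for c in pop]
        while len(new_pop) < pop_size:
            # Fitness proportionate selection of two parents.
            a, b = rng.choices(pop, weights=weights, k=2)
            if rng.random() < p_cross:
                a, b = crossover(a, b, rng.randint(1, n_genes - 1))
            new_pop.extend([mutate(a, p_mut, rng), mutate(b, p_mut, rng)])
        pop = new_pop[:pop_size]
    return max(pop, key=fitness)

if __name__ == "__main__":
    ron, zoe = [1, 1, 0, 1, 1, 1, 0], [0, 1, 0, 1, 0, 1, 1]
    print(crossover(ron, zoe, 4))  # Ron Junior and Zoe Junior from slide 10
    print(evolve())
```

Cutting Ron and Zoe after the fourth gene yields the Ron Junior and Zoe Junior chromosomes shown on slide 10, which is why single-point crossover was chosen for the sketch.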

Editor's Notes

  1. Still, you could say: “Of course the genetic algorithm is better. In GA you have a population size of, say, 10 individuals, and in each generation you run 1 iteration of k-means for all the solutions, so it’s like you’re running k-means 10 times simultaneously.” So I’ve also compared the genetic algorithm to what you could call multiple k-means, and what I found was really interesting. The genetic algorithm mostly returned the optimal solution, multiple k-means kept getting stuck around local optima, and standard k-means was just all over the place. So you can see that multiple k-means can help us reduce the volatility of the results, but it still can’t get past local optima, and this really shows the added value of evolution, and in particular of the crossover and mutation operators. What it means is that, on average, the genetic algorithm can help us reduce the total variance within each cluster by more than half compared to standard k-means. (Across 20 different runs, on average, the genetic algorithm was able to find solutions that are more than 50% better than standard k-means and more than 30% better than multiple k-means.) * Don’t just describe what you analyzed and the results
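
The setup described in this note (a population of candidate solutions, one K-Means iteration applied to every solution in each generation, then selection, crossover and mutation) could look roughly like the sketch below. It is a minimal pure-Python illustration for 2-D points, not the code behind the slides. Each chromosome is assumed to be a list of K cluster centers, fitness is the total within-cluster variance (lower is better), and crossover is the arithmetic crossover named on slides 46 and 47. The Gaussian mutation, its `sigma` parameter, and the inverse-SSE selection weights are assumptions, since the deck does not specify them.

```python
import random

def dist2(p, c):
    return (p[0] - c[0]) ** 2 + (p[1] - c[1]) ** 2

def sse(points, centers):
    # Total within-cluster variance: squared distance to the nearest center.
    return sum(min(dist2(p, c) for c in centers) for p in points)

def kmeans_step(points, centers):
    # One standard K-Means iteration: assign points, then recompute means.
    groups = [[] for _ in centers]
    for p in points:
        nearest = min(range(len(centers)), key=lambda i: dist2(p, centers[i]))
        groups[nearest].append(p)
    return [(sum(x for x, _ in g) / len(g), sum(y for _, y in g) / len(g))
            if g else c  # an empty cluster keeps its old center
            for g, c in zip(groups, centers)]

def arithmetic_crossover(c1, c2, alpha):
    # Each offspring center is a weighted average of the parents' centers.
    def mix(a, b):
        return (alpha * a[0] + (1 - alpha) * b[0],
                alpha * a[1] + (1 - alpha) * b[1])
    return ([mix(a, b) for a, b in zip(c1, c2)],
            [mix(b, a) for a, b in zip(c1, c2)])

def mutate(centers, sigma, rng):
    # Assumed mutation: jitter one randomly chosen center with Gaussian noise.
    i = rng.randrange(len(centers))
    out = list(centers)
    out[i] = (out[i][0] + rng.gauss(0, sigma), out[i][1] + rng.gauss(0, sigma))
    return out

def genetic_kmeans(points, k, pop_size=10, generations=10,
                   p_cross=0.75, p_mut=0.09, sigma=1.0, seed=0):
    rng = random.Random(seed)
    pop = [rng.sample(points, k) for _ in range(pop_size)]
    for _ in range(generations):
        pop = [kmeans_step(points, c) for c in pop]
        scores = [sse(points, c) for c in pop]
        new_pop = [pop[scores.index(min(scores))]]      # elitism
        weights = [1.0 / (s + 1e-9) for s in scores]    # lower SSE, higher weight
        while len(new_pop) < pop_size:
            a, b = rng.choices(pop, weights=weights, k=2)
            if rng.random() < p_cross:
                a, b = arithmetic_crossover(a, b, rng.random())
            for child in (a, b):
                if rng.random() < p_mut:
                    child = mutate(child, sigma, rng)
                new_pop.append(child)
        pop = new_pop[:pop_size]
    return min(pop, key=lambda c: sse(points, c))
```

"Multiple k-means" in the note corresponds to running the same loop with crossover and mutation switched off (`p_cross=0`, `p_mut=0`) and no selection, so the comparison isolates what the evolutionary operators add. One caveat of this sketch: arithmetic crossover pairs centers by list position, so two parents describing the same clusters in a different order may blend poorly.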