Machine Learning and Data Mining: case studies
April 2, 2013, 14:00
Dmitry Efimov
http://mech.math.msu.su/~efimov/
Outline
1. Machine Learning problems
2. Methods: Regression, Distance, Probability
3. Case studies
4. How to solve problems?

How to teach a computer to grade students' essays?
Essay grading

How to predict prices for the next year?
Heavy Machines sales

How to predict the response of molecules for medicines?
Molecule response

How to restore missing connections?
How to assign weights to connections?
People relationships

What is Kaggle?
Definitions
•
Regression
• What about this case?
• Or if there are many features?
• Powerful method: Neural Networks
But…
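As a minimal illustration of the regression approach named above, the sketch below fits a straight line y = a*x + b by ordinary least squares in plain Python. The function name `fit_line` and the toy data are assumptions made for this example; they do not come from the slides.

```python
def fit_line(xs, ys):
    """Fit y = a * x + b by ordinary least squares (single feature)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance(x, y) / variance(x)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    a = cov_xy / var_x
    # The fitted line passes through the point of means
    b = mean_y - a * mean_x
    return a, b


# Hypothetical toy data: points lying exactly on y = 2x + 1
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
a, b = fit_line(xs, ys)
print(a, b)
```

A single line like this stops being adequate once the relationship is curved or the number of features grows, which is the situation the bullets above point to.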
Distance approach: SVM
•
Vapnik, 1995
SVM (non-linear case)
•
Vapnik, 1995
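The formulas on the two SVM slides did not survive extraction, so the following is only a rough sketch of the linear case. It minimizes the usual soft-margin objective (a regularization term plus the mean hinge loss) by plain subgradient descent, which is a simplification swapped in for the quadratic-programming formulation. The names `train_linear_svm` and `predict`, the hyperparameters, and the toy points are all assumptions. The non-linear case would replace the dot products with a kernel and is not shown.

```python
def train_linear_svm(points, labels, lam=0.01, lr=0.1, epochs=200):
    """Minimize lam/2 * ||w||^2 + mean hinge loss by subgradient descent."""
    dim = len(points[0])
    n = len(points)
    w = [0.0] * dim
    b = 0.0
    for _ in range(epochs):
        grad_w = [lam * wi for wi in w]  # gradient of the regularizer
        grad_b = 0.0
        for x, y in zip(points, labels):
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            if margin < 1:  # point is inside the margin or misclassified
                for j in range(dim):
                    grad_w[j] -= y * x[j] / n
                grad_b -= y / n
        w = [wi - lr * gi for wi, gi in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict(w, b, x):
    """Return +1 or -1 depending on the side of the separating hyperplane."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1


# Hypothetical, linearly separable toy data with labels +1 / -1
points = [(2, 2), (3, 3), (2, 3), (-2, -2), (-3, -3), (-2, -3)]
labels = [1, 1, 1, -1, -1, -1]
w, b = train_linear_svm(points, labels)
print([predict(w, b, x) for x in points])
```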
Probability approach: decision trees
Ensembling: Random Forests
• Bagging = average of many simple algorithms, each trained on a bootstrap sample
• Simple algorithm = one decision tree
• Bagging + decision trees (with random feature selection) = Random Forests
Breiman, 2001
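The idea of averaging many simple trees can be sketched in a few lines. The code below is a deliberately reduced version, not a full Random Forest: each "tree" is a one-split decision stump, each stump sees a bootstrap sample and a single randomly chosen feature, and the ensemble predicts by majority vote. All function names, the number of trees, the seed, and the toy data are assumptions for illustration.

```python
import random
from collections import Counter


def fit_stump(rows, labels, feature):
    """Pick the threshold on one feature that misclassifies the fewest rows."""
    majority = Counter(labels).most_common(1)[0][0]
    values = sorted(set(r[feature] for r in rows))
    best = None
    # Candidate thresholds: midpoints between consecutive distinct values
    for lo, hi in zip(values, values[1:]):
        t = (lo + hi) / 2
        left = [y for r, y in zip(rows, labels) if r[feature] <= t]
        right = [y for r, y in zip(rows, labels) if r[feature] > t]
        left_label = Counter(left).most_common(1)[0][0]
        right_label = Counter(right).most_common(1)[0][0]
        errors = (sum(y != left_label for y in left)
                  + sum(y != right_label for y in right))
        if best is None or errors < best[0]:
            best = (errors, (feature, t, left_label, right_label))
    if best is None:  # only one distinct value: predict the majority class
        return (feature, None, majority, majority)
    return best[1]


def predict_stump(stump, row):
    feature, t, left_label, right_label = stump
    if t is None:
        return left_label
    return left_label if row[feature] <= t else right_label


def fit_forest(rows, labels, n_trees=25, seed=0):
    rng = random.Random(seed)
    n = len(rows)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap sample
        feature = rng.randrange(len(rows[0]))       # random feature
        forest.append(fit_stump([rows[i] for i in idx],
                                [labels[i] for i in idx], feature))
    return forest


def predict_forest(forest, row):
    """Majority vote over all stumps."""
    votes = Counter(predict_stump(s, row) for s in forest)
    return votes.most_common(1)[0][0]


# Hypothetical toy data: class 0 in the lower-left, class 1 in the upper-right
rows = [(1, 1), (2, 1), (1, 2), (2, 2), (8, 8), (9, 8), (8, 9), (9, 9)]
labels = [0, 0, 0, 0, 1, 1, 1, 1]
forest = fit_forest(rows, labels)
print(predict_forest(forest, (0, 0)), predict_forest(forest, (10, 10)))
```

A real Random Forest grows full trees and draws a fresh random subset of features at every split; the stump version only keeps the two ingredients named on the slide.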
Case 1. Social ties strength
• Organized by Panjia (www.panjiaco.com)
• Problem: predict the strength of social ties
• The prize pool: 75 000 $
• Training set size: 50 000
• Test set size: 40 000
Description of problem
• Number of features: more than 500!
• Feature examples:
1) Number of friends (node feature)
2) Number of common friends (edge feature)
3) Number of common albums / Number of all albums (combined feature)
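The three kinds of features listed above can be computed from a graph stored as adjacency sets. The sketch below uses a small made-up friendship graph and album data; the user names, the helper names, and the reading of "all albums" as the union of both users' albums are assumptions, not details from the competition.

```python
# Hypothetical friendship graph: user -> set of friends
friends = {
    "ann": {"bob", "cat", "dan"},
    "bob": {"ann", "cat"},
    "cat": {"ann", "bob"},
    "dan": {"ann"},
}

# Hypothetical album membership: user -> set of album ids
albums = {
    "ann": {"a1", "a2"},
    "bob": {"a2", "a3"},
    "cat": {"a1"},
    "dan": set(),
}


def node_feature(user):
    """Number of friends of a single user."""
    return len(friends[user])


def edge_feature(u, v):
    """Number of friends that two users have in common."""
    return len(friends[u] & friends[v])


def combined_feature(u, v):
    """Common albums divided by all albums of the pair (assumed: the union)."""
    all_albums = albums[u] | albums[v]
    if not all_albums:
        return 0.0  # avoid division by zero when neither user has albums
    return len(albums[u] & albums[v]) / len(all_albums)


print(node_feature("ann"), edge_feature("ann", "bob"),
      combined_feature("ann", "bob"))
```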
Feature engineering
Stochastic gradient boosting with decision trees (GBM)
Ridgeway, 2007
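To make the GBM idea concrete, here is a small sketch for squared-error loss on one feature: each round fits a regression stump to the current residuals (the negative gradient of the loss) on a random subsample and adds a shrunken copy of it to the model. It is an illustration under stated assumptions, not the gbm package's implementation; the function names, the defaults for `n_rounds`, `learning_rate`, and `subsample`, and the toy data are hypothetical.

```python
import random


def fit_stump(xs, residuals):
    """Least-squares regression stump: one threshold, one mean per side.

    Assumes xs holds at least two distinct values.
    """
    best = None
    values = sorted(set(xs))
    for lo, hi in zip(values, values[1:]):
        t = (lo + hi) / 2
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, t, left_mean, right_mean)
    return best[1:]


def gbm_fit(xs, ys, n_rounds=50, learning_rate=0.1, subsample=0.5, seed=0):
    rng = random.Random(seed)
    n = len(xs)
    base = sum(ys) / n          # initial model: the mean target
    preds = [base] * n
    stumps = []
    for _ in range(n_rounds):
        # For squared loss, the negative gradient is the residual
        residuals = [y - p for y, p in zip(ys, preds)]
        # "Stochastic": each stump is fitted on a random subsample
        k = max(2, int(subsample * n))
        idx = rng.sample(range(n), k)
        t, left_mean, right_mean = fit_stump(
            [xs[i] for i in idx], [residuals[i] for i in idx])
        stumps.append((t, left_mean, right_mean))
        preds = [p + learning_rate * (left_mean if x <= t else right_mean)
                 for x, p in zip(xs, preds)]
    return base, learning_rate, stumps


def gbm_predict(model, x):
    base, learning_rate, stumps = model
    return base + learning_rate * sum(
        left_mean if x <= t else right_mean
        for t, left_mean, right_mean in stumps)


def mse(model, xs, ys):
    return sum((gbm_predict(model, x) - y) ** 2
               for x, y in zip(xs, ys)) / len(xs)


# Hypothetical toy data: y = x^2 on ten points
xs = [float(i) for i in range(10)]
ys = [x * x for x in xs]
model = gbm_fit(xs, ys)
print(mse(model, xs, ys))
```

Setting `subsample=1.0` turns off the stochastic part; in that case every round can only lower the training error, because a least-squares stump fitted to the residuals and scaled by a learning rate between 0 and 1 never increases the squared loss.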
Obtained accuracy
Case 2. Biological Response prediction
Functional Ensembling
•
Efimov & Nikulin, 2012
Functional Ensembling: Example
•
Efimov & Nikulin, 2012
Functional Ensembling: Algorithm
•
Final ensembling
•
[Figure: final ensembling diagram; node labels: min, mean, max; values shown: 0.1, 0.55, 0.75, 0.9]
Obtained accuracy
[Figure: chart of results, axis range 0.3705 to 0.374]
Winner result: 0.37356
Our result: 0.37363
Our best result: 0.37093
How to solve problems?

Overfitting
• The algorithm works perfectly on the Training set
• But the algorithm does not work on the Test set!
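A tiny example makes the gap between training and test performance visible. The sketch below uses a 1-nearest-neighbour classifier, chosen here only because it memorizes its training data; the classifier, the function names, and the hand-made data (which contain two deliberately noisy labels) are assumptions for illustration.

```python
def predict_1nn(train, x):
    """Return the label of the training point closest to x."""
    nearest = min(train, key=lambda point: abs(point[0] - x))
    return nearest[1]


def accuracy(train, data):
    """Share of points in `data` whose label is predicted correctly."""
    hits = sum(predict_1nn(train, x) == y for x, y in data)
    return hits / len(data)


# Hypothetical data: the true rule is "label 1 if x > 5";
# the training points at x = 3.0 and x = 8.0 carry noisy labels.
train = [(1.0, 0), (2.0, 0), (3.0, 1), (4.0, 0),
         (6.0, 1), (7.0, 1), (8.0, 0), (9.0, 1)]
test = [(2.9, 0), (3.2, 0), (5.5, 1), (7.9, 1), (8.8, 1)]

print("training accuracy:", accuracy(train, train))
print("test accuracy:", accuracy(train, test))
```

The model reproduces every training label, including the noisy ones, and that is exactly why it fails on new points near them.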
Cross-validation
• Target is unknown for the Test set
• Split the Training set into two parts:
• 1st part: New Training set
• 2nd part: New Test set (with known target)
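The split described above is usually repeated so that every part of the Training set serves once as the New Test set. A minimal k-fold sketch follows; the function name `k_fold_indices`, the default `k=5`, and the fixed seed are assumptions chosen for the example.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (training, validation) index lists for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)        # shuffle once, reproducibly
    folds = [indices[i::k] for i in range(k)]   # k roughly equal parts
    for i in range(k):
        validation = folds[i]                   # New Test set (known target)
        training = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield training, validation              # training = New Training set


for training, validation in k_fold_indices(10, k=5):
    print(sorted(validation), len(training))
```

Averaging the score over the k validation parts gives an estimate of how the algorithm would behave on the real Test set, whose target is unknown.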
What's next?
If you are interested in this topic…
• Read papers and books about Machine Learning
• Communicate with people (Kaggle, LinkedIn)
• Participate in competitions
• Study Mathematics
Thank you!
Any questions?
Dmitry Efimov
defimov@aus.edu

Introduction to Machine Learning (case studies)