An introduction to machine learning and probabilistic graphical models
Kevin Murphy, MIT AI Lab
Presented at Intel’s workshop on “Machine learning for the life sciences”, Berkeley, CA, 3 November 2003
Overview. Thanks to Nir Friedman, Stuart Russell, Leslie Kaelbling and various web sources for letting me use many of their slides.
Supervised learning. Learn to approximate the function F(x1, x2, x3) -> t from a training set of (x, t) pairs. Examples are labeled yes or no.
Color   Shape    Size    Output
Blue    Torus    Big     Y
Blue    Square   Small   Y
Blue    Star     Small   Y
Red     Arrow    Small   N
Supervised learning. Training data goes to the learner, which outputs a hypothesis; the hypothesis maps testing data to a prediction.
Training data:
X1   X2   X3   T
B    T    B    Y
B    S    S    Y
B    S    S    Y
R    A    S    N
Testing data and predictions:
X1   X2   X3   T
B    A    S    ?  (prediction: Y)
Y    C    S    ?  (prediction: N)
Key issue: generalization. Can’t just memorize the training set (overfitting). Figure: examples labeled yes and no, with unlabeled items marked “?”.
Hypothesis spaces
Perceptron (neural net with no hidden layers). Linearly separable data.
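A perceptron on linearly separable data can be sketched as follows. This is a minimal illustration and not code from the talk: the function names `train_perceptron` and `predict`, the toy points, and the learning rate are assumptions chosen for the example. It uses the classic mistake-driven update rule, which stops once every training point is on the correct side of the hyperplane.

```python
def train_perceptron(points, labels, epochs=100, lr=1.0):
    """Classic perceptron rule. Labels must be +1 or -1."""
    w = [0.0] * len(points[0])
    b = 0.0
    for _ in range(epochs):
        mistakes = 0
        for x, t in zip(points, labels):
            activation = sum(wi * xi for wi, xi in zip(w, x)) + b
            if t * activation <= 0:  # misclassified, or exactly on the boundary
                w = [wi + lr * t * xi for wi, xi in zip(w, x)]
                b += lr * t
                mistakes += 1
        if mistakes == 0:  # a full pass with no errors: the data is separated
            break
    return w, b


def predict(w, b, x):
    """Return +1 or -1 depending on the side of the hyperplane."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else -1


# Hypothetical, linearly separable toy data.
points = [(2.0, 1.0), (3.0, 2.5), (2.5, 3.0), (-1.0, -0.5), (-2.0, -1.5), (-1.5, -3.0)]
labels = [1, 1, 1, -1, -1, -1]
w, b = train_perceptron(points, labels)
```

The loop is only guaranteed to terminate early when the data really is linearly separable; otherwise it simply runs for `epochs` passes.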
Which separating hyperplane?
The linear separator with the largest margin is the best one to pick.
What if the data is not linearly separable?
Kernel trick. The kernel implicitly maps from 2D (x1, x2) to 3D (z1, z2, z3), making the problem linearly separable.
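One common way to illustrate a 2D-to-3D kernel map is the degree-2 polynomial kernel. The sketch below assumes that map, phi(x) = (x1^2, sqrt(2) x1 x2, x2^2); the slide does not say which kernel its figure uses, so the choice of map, the names `phi` and `poly_kernel`, and the sample points are all assumptions. It checks that the kernel value equals the inner product in the 3D space, and that points inside and outside the unit circle are split by the plane z1 + z3 = 1.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def phi(x):
    """Explicit feature map from 2D to 3D (assumed degree-2 polynomial map)."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2.0) * x1 * x2, x2 * x2)


def poly_kernel(x, y):
    """Same inner product, computed without ever building phi(x)."""
    return dot(x, y) ** 2


x, y = (1.0, 2.0), (3.0, -1.0)
explicit = dot(phi(x), phi(y))  # inner product in the 3D feature space
implicit = poly_kernel(x, y)    # kernel evaluated in the original 2D space

# Points inside vs. outside the unit circle are not linearly separable in 2D,
# but in 3D they fall on opposite sides of the plane z1 + z3 = 1.
inner = [(0.1, 0.2), (-0.5, 0.3)]
outer = [(1.5, 0.0), (-1.0, 1.2)]
```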
Support Vector Machines (SVMs)
Boosting. Simple classifiers (weak learners) can have their performance boosted by taking weighted combinations. Boosting tends to increase the margin.
Supervised learning success stories
Unsupervised learning
K-means clustering. Reiterate.
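A minimal sketch of the standard K-means loop (Lloyd's algorithm): assign each point to its nearest center, recompute each center as the mean of its points, and repeat until the centers stop moving. The function name `kmeans`, the toy points, and the choice of initial centers are assumptions for illustration, not details taken from the slide.

```python
def kmeans(points, centers, max_iters=100):
    """Lloyd's algorithm with squared Euclidean distance."""
    centers = [tuple(c) for c in centers]
    clusters = [[] for _ in centers]
    for _ in range(max_iters):
        # Assignment step: each point joins the cluster of its nearest center.
        clusters = [[] for _ in centers]
        for p in points:
            dists = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centers]
            clusters[dists.index(min(dists))].append(p)
        # Update step: move each center to the mean of its members.
        new_centers = []
        for c, members in zip(centers, clusters):
            if members:
                new_centers.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centers.append(c)  # keep an empty cluster's center where it was
        if new_centers == centers:  # nothing moved: converged
            break
        centers = new_centers
    return centers, clusters


# Hypothetical data: two well-separated groups of three points each.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centers, clusters = kmeans(points, centers=[points[0], points[3]])
```

The result depends on the initial centers; in practice the loop is usually restarted from several initializations.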
AutoClass (Cheeseman et al, 1986)
Hierarchical clustering
Principal Component Analysis (PCA). PCA seeks a projection that best represents the data in a least-squares sense. It reduces the dimensionality of feature space by restricting attention to those directions along which the scatter of the cloud is greatest.
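For two-dimensional data the direction of greatest scatter has a closed form, which makes for a compact sketch. The code below is an illustration restricted to 2D (general PCA needs an eigendecomposition of the covariance matrix); the names `pca_2d` and `project` and the sample points are assumptions.

```python
import math


def pca_2d(points):
    """Return the mean and the unit vector along the first principal axis."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / n
    syy = sum((p[1] - my) ** 2 for p in points) / n
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / n
    # Angle of the major axis of the 2x2 covariance matrix [[sxx, sxy], [sxy, syy]].
    theta = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
    return (mx, my), (math.cos(theta), math.sin(theta))


def project(points, mean, direction):
    """1D coordinates of the centered points along the principal direction."""
    return [(p[0] - mean[0]) * direction[0] + (p[1] - mean[1]) * direction[1]
            for p in points]


# Hypothetical data lying exactly on the line y = 2x.
points = [(-2.0, -4.0), (-1.0, -2.0), (0.0, 0.0), (1.0, 2.0), (2.0, 4.0)]
mean, direction = pca_2d(points)
scores = project(points, mean, direction)

# Reconstruct from the 1D scores and measure the squared error.
reconstructed = [(mean[0] + s * direction[0], mean[1] + s * direction[1]) for s in scores]
residual = sum((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2
               for p, q in zip(points, reconstructed))
```

Because the sample points are collinear, one component reconstructs them with essentially zero error; for noisy data the residual is the scatter discarded by the projection.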
Discovering nonlinear manifolds
Combining supervised and unsupervised learning
Discovering rules (data mining). Find the most frequent patterns (association rules).
Sex   Married   Age   Occup.    Income   Educ.
M     S         22    Student   $10k     MA
F     S         24    Student   $20k     PhD
M     M         30    Doctor    $80k     MD
F     M         60    Retired   $30k     HS
Num in household = 1 ^ num children = 0 => language = English
Language = English ^ Income < $40k ^ Married = false ^ num children = 0 => education ∈ {college, grad school}
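Association rules are usually ranked by support (how often the whole pattern occurs) and confidence (how often the consequent holds when the antecedent does). The sketch below computes both on four records modeled on the slide's table; the row alignment, the field names, the helper names `support` and `confidence`, and the example rule are assumptions made for illustration.

```python
# Four records modeled on the slide's table (row alignment is an assumption).
records = [
    {"sex": "M", "married": "S", "age": 22, "occup": "Student", "income_k": 10, "educ": "MA"},
    {"sex": "F", "married": "S", "age": 24, "occup": "Student", "income_k": 20, "educ": "PhD"},
    {"sex": "M", "married": "M", "age": 30, "occup": "Doctor", "income_k": 80, "educ": "MD"},
    {"sex": "F", "married": "M", "age": 60, "occup": "Retired", "income_k": 30, "educ": "HS"},
]


def support(rows, condition):
    """Fraction of rows that satisfy the condition."""
    return sum(1 for r in rows if condition(r)) / len(rows)


def confidence(rows, antecedent, consequent):
    """Fraction of rows matching the antecedent that also match the consequent."""
    matching = [r for r in rows if antecedent(r)]
    return sum(1 for r in matching if consequent(r)) / len(matching)


# Hypothetical rule: occup = Student => married = S
is_student = lambda r: r["occup"] == "Student"
is_single = lambda r: r["married"] == "S"

rule_support = support(records, lambda r: is_student(r) and is_single(r))
rule_confidence = confidence(records, is_student, is_single)
```

With only four rows the numbers are not meaningful statistically; the point is the two definitions.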
Unsupervised learning: summary
Discovering networks. From data visualization to causal discovery.
Networks in biology (levels listed in order of decreasing detail)
Molecular level: Lysis-Lysogeny circuit in Lambda phage. Arkin et al. (1998), Genetics 149(4):1633-48.
Concentration level: metabolic pathways. Figure: nodes g1 to g5 with edge weights such as w12, w23, w55.
Qualitative level: Boolean Networks
Probabilistic graphical models. "The actual science of logic is conversant at present only with things either certain, impossible, or entirely doubtful. Therefore the true logic for this world is the calculus of probabilities." -- James Clerk Maxwell. "Probability theory is nothing but common sense reduced to calculation." -- Pierre Simon Laplace
Graphical models: outline
Simple probabilistic model: linear regression. Y = α + βX + noise, where α + βX is the deterministic (functional) relationship between X and Y. “Learning” = estimating the parameters α, β, σ from (x, y) pairs. Callouts on the parameters: “can be estimated by least squares”, “is the empirical mean”, “is the residual variance”.
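Estimating the line by least squares takes only a few lines. The sketch below uses the symbols alpha, beta and sigma2 for the intercept, slope and residual variance; these names, the function `fit_line`, and the data are assumptions for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = alpha + beta * x + noise."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    beta = sxy / sxx                  # slope
    alpha = mean_y - beta * mean_x    # intercept
    residuals = [y - (alpha + beta * x) for x, y in zip(xs, ys)]
    sigma2 = sum(r * r for r in residuals) / n  # residual variance (maximum likelihood form)
    return alpha, beta, sigma2


# Hypothetical data scattered around y = 1 + 2x.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.1, 2.9, 5.2, 6.8, 9.0]
alpha, beta, sigma2 = fit_line(xs, ys)
```

Dividing the squared residuals by n gives the maximum likelihood estimate of the noise variance; dividing by n - 2 instead gives the usual unbiased estimate.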
Piecewise linear regression. Latent “switch” variable: a hidden process at work.
Probabilistic graphical model for piecewise linear regression. Nodes: X (input), Q (hidden), Y (output). Learning is harder because Q is hidden, so we don’t know which data points to assign to each line; this can be solved with EM (cf. K-means).
Classes of graphical models. Probabilistic models include graphical models, which are either directed (Bayes nets, DBNs) or undirected (MRFs).
Bayesian Networks. Compact representation of probability distributions via conditional independence. Quantitative part: set of conditional probability distributions. Together: define a unique distribution in a factored form. Example network over Earthquake, Radio, Burglary, Alarm, Call. Family of Alarm: conditional probability table P(A | E,B) over the four settings of (E, B), with extracted entries 0.9 0.1, 0.2 0.8, 0.01 0.99, 0.9 0.1.
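The factored form can be made concrete with the five-node example. The sketch below assumes the usual arcs for this network (Earthquake -> Radio, Earthquake -> Alarm, Burglary -> Alarm, Alarm -> Call) and uses made-up probabilities; neither the arcs nor the numbers are taken from the slide. With binary variables the factored model needs 1 + 1 + 4 + 2 + 2 = 10 parameters instead of the 31 needed for an unrestricted joint table.

```python
from itertools import product

# Hypothetical parameters: each entry is the probability that the variable is True.
P_B = 0.01                                   # P(Burglary)
P_E = 0.02                                   # P(Earthquake)
P_A = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}   # P(Alarm | Burglary, Earthquake)
P_R = {True: 0.9, False: 0.0001}             # P(Radio | Earthquake)
P_C = {True: 0.9, False: 0.05}               # P(Call | Alarm)


def bern(p_true, value):
    return p_true if value else 1.0 - p_true


def joint(b, e, a, r, c):
    """Joint probability as a product of one local factor per node."""
    return (bern(P_B, b) * bern(P_E, e) * bern(P_A[(b, e)], a)
            * bern(P_R[e], r) * bern(P_C[a], c))


def posterior_burglary(**evidence):
    """P(Burglary = True | evidence) by brute-force enumeration of the joint."""
    names = ["b", "e", "a", "r", "c"]
    numerator = denominator = 0.0
    for values in product([True, False], repeat=len(names)):
        assignment = dict(zip(names, values))
        if any(assignment[k] != v for k, v in evidence.items()):
            continue
        p = joint(**assignment)
        denominator += p
        if assignment["b"]:
            numerator += p
    return numerator / denominator


total = sum(joint(*values) for values in product([True, False], repeat=5))
p_call = posterior_burglary(c=True)                # a call raises belief in a burglary
p_call_radio = posterior_burglary(c=True, r=True)  # a radio report "explains away" the alarm
```

Enumeration is exponential in the number of variables, so it only serves as a reference for small networks.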
Example: “ICU Alarm” network. Nodes in the figure: PCWP, CO, HRBP, HREKG, HRSAT, ERRCAUTER, HR, HISTORY, CATECHOL, SAO2, EXPCO2, ARTCO2, VENTALV, VENTLUNG, VENITUBE, DISCONNECT, MINVOLSET, VENTMACH, KINKEDTUBE, INTUBATION, PULMEMBOLUS, PAP, SHUNT, ANAPHYLAXIS, MINOVL, PVSAT, FIO2, PRESS, INSUFFANESTH, TPR, LVFAILURE, ERRBLOWOUTPUT, STROEVOLUME, LVEDVOLUME, HYPOVOLEMIA, CVP, BP.
Success stories for graphical models
Graphical models: outline
Probabilistic Inference. Figure: the network over Earthquake, Radio, Burglary, Alarm and Call, with Radio and Call labeled separately.
Viterbi decoding. Compute the most probable explanation (MPE) of the observed data. Hidden Markov Model (HMM): hidden X1, X2, X3; observed Y1, Y2, Y3. Example word: “Tomato”.
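Viterbi decoding is a dynamic program over the hidden chain. The sketch below is a generic implementation on a small two-state model; the state names, the observation names, and all probabilities are hypothetical and unrelated to the “Tomato” example on the slide.

```python
def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most probable hidden state sequence and its joint probability."""
    # best[t][s]: probability of the best path that ends in state s at time t.
    best = [{s: start_p[s] * emit_p[s][observations[0]] for s in states}]
    back = [{}]
    for obs in observations[1:]:
        scores, pointers = {}, {}
        for s in states:
            prev = max(states, key=lambda r: best[-1][r] * trans_p[r][s])
            scores[s] = best[-1][prev] * trans_p[prev][s] * emit_p[s][obs]
            pointers[s] = prev
        best.append(scores)
        back.append(pointers)
    # Trace back from the best final state.
    last = max(states, key=lambda s: best[-1][s])
    path = [last]
    for pointers in reversed(back[1:]):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, best[-1][last]


# Hypothetical two-state model.
states = ["Healthy", "Fever"]
start_p = {"Healthy": 0.6, "Fever": 0.4}
trans_p = {"Healthy": {"Healthy": 0.7, "Fever": 0.3},
           "Fever": {"Healthy": 0.4, "Fever": 0.6}}
emit_p = {"Healthy": {"normal": 0.5, "cold": 0.4, "dizzy": 0.1},
          "Fever": {"normal": 0.1, "cold": 0.3, "dizzy": 0.6}}

path, prob = viterbi(["normal", "cold", "dizzy"], states, start_p, trans_p, emit_p)
```

For long sequences the products underflow, so practical implementations work with log probabilities and sums instead.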
Inference: computational issues. From easy to hard: chains, trees, grids, dense loopy graphs. Many different inference algorithms, both exact and approximate. Figure: the ICU Alarm network.
Bayesian inference. Figure: pairs (X1, Y1), ..., (Xn, Yn). Parameters are tied (shared) across repetitions of the data.
Bayesian inference
Graphical models: outline
Why Struggle for Accurate Structure? Figure: three networks over Earthquake, Burglary, Alarm Set and Sound: the truth, one with an added arc, and one with a missing arc.
Score-based Learning. Define a scoring function that evaluates how well a structure matches the data, then search for a structure that maximizes the score. Data over (E, B, A): <Y,N,N>, <Y,Y,Y>, <N,N,Y>, <N,Y,Y>, ..., <N,Y,Y>. Figure: candidate structures over E, B, A.
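One widely used scoring function is BIC: the maximized log-likelihood minus a penalty proportional to the number of free parameters. The slide does not name its score, so BIC is an assumption here, as are the function name `bic_score`, the synthetic data set (A is true exactly when E or B is), and the three candidate structures. The code assumes binary variables and does not check that a structure is acyclic.

```python
import math
from collections import Counter


def bic_score(data, parents):
    """BIC of a structure. parents maps each variable to a tuple of its parents."""
    n = len(data)
    score = 0.0
    for var, pa in parents.items():
        joint = Counter((tuple(row[p] for p in pa), row[var]) for row in data)
        marginal = Counter(tuple(row[p] for p in pa) for row in data)
        # Maximized log-likelihood of var given its parents.
        for (pa_values, _), count in joint.items():
            score += count * math.log(count / marginal[pa_values])
        # Binary variable: one free parameter per parent configuration.
        score -= 0.5 * math.log(n) * (2 ** len(pa))
    return score


# Hypothetical data set of 40 records over (E, B, A).
data = ([{"E": "N", "B": "N", "A": "N"}] * 20
        + [{"E": "N", "B": "Y", "A": "Y"}] * 8
        + [{"E": "Y", "B": "N", "A": "Y"}] * 8
        + [{"E": "Y", "B": "Y", "A": "Y"}] * 4)

candidates = {
    "empty": {"E": (), "B": (), "A": ()},
    "E -> A": {"E": (), "B": (), "A": ("E",)},
    "E -> A <- B": {"E": (), "B": (), "A": ("E", "B")},
}
scores = {name: bic_score(data, graph) for name, graph in candidates.items()}
best = max(scores, key=scores.get)
```

Scoring every candidate is only feasible for a handful of variables, which is why the search over structures matters.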
Learning Trees
Heuristic Search
Local Search Operations. Add C → D; delete C → E; reverse C → E. Adding C → D changes the score by Δscore = S({C,E} → D) − S({E} → D). Figure: four networks over S, C, E, D.
Problems with local search. Easy to get stuck in local optima. Figure: the score S(G|D) over structures, with “truth” and “you” marked.
Problems with local search II. Picking a single best model can be misleading. Figure: several networks over E, R, B, A, C under the posterior P(G|D).
Bayesian Approach to Structure Learning. Callouts on the slide's formula: feature of G, e.g., X → Y; indicator function for feature f; Bayesian score for G.
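The three callouts on this slide (a feature of G, an indicator function for that feature, and the Bayesian score for G) are consistent with the usual posterior-averaging formula for structural features. The following is a reconstruction under that assumption, not a transcription of the slide; here f(G) is 1 if graph G has the feature (for example the arc X → Y) and 0 otherwise, and the proportionality is the standard Bayes-rule expansion.

```latex
P(f \mid D) \;=\; \sum_{G} f(G)\, P(G \mid D),
\qquad
P(G \mid D) \;\propto\; P(D \mid G)\, P(G)
```

Read this way, the posterior probability of a feature is the total posterior mass of the graphs that contain it, rather than a yes/no read off a single best graph.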
Bayesian approach: computational issues. How can we compute a sum over a super-exponential number of graphs?
Structure learning: other issues
Discovering latent variables. Figure: (a) 17 parameters, (b) 59 parameters. There are some techniques for automatically detecting the possible presence of latent variables.
Learning causal models
Learning causal models. Figure: four graphs over X, Y, Z.
Learning from interventional data. Network: smoking → yellow fingers. Observing: P(smoker | observe(yellow)) >> prior. Intervening: P(smoker | do(paint yellow)) = prior. Cut arcs coming into nodes which were set by intervention.
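The difference between observing and intervening can be checked numerically on the two-node network smoking → yellow fingers. The probabilities below are made up for illustration, and the function names are assumptions. Under the intervention the arc into yellow fingers is cut, so the finger color no longer carries any information about smoking.

```python
# Hypothetical parameters for the network smoking -> yellow fingers.
P_SMOKER = 0.2
P_YELLOW_GIVEN_SMOKER = {True: 0.8, False: 0.05}


def p_smoker_given_observed_yellow():
    """Bayes rule on the intact network: yellow fingers are evidence of smoking."""
    numerator = P_SMOKER * P_YELLOW_GIVEN_SMOKER[True]
    denominator = numerator + (1.0 - P_SMOKER) * P_YELLOW_GIVEN_SMOKER[False]
    return numerator / denominator


def p_smoker_given_do_yellow():
    """Intervention: the arc into yellow fingers is cut and the node is set to True.

    Everyone now has yellow fingers with probability 1, whether or not they smoke,
    so the likelihood term is the same for smokers and non-smokers.
    """
    numerator = P_SMOKER * 1.0
    denominator = numerator + (1.0 - P_SMOKER) * 1.0
    return numerator / denominator


observed = p_smoker_given_observed_yellow()
intervened = p_smoker_given_do_yellow()
```

With these numbers the observational posterior is four times the prior, while the interventional one equals the prior.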
Active learning
Learning from relational data. Can we learn concepts from a set of relations between objects, instead of, or in addition to, just their attributes?
Learning from relational data: approaches
ILP for learning protein folding: input. Positive (yes) and negative (no) examples; 100 conjuncts describe the structure of each pos/neg example, e.g. TotalLength(D2mhr, 118) ^ NumberHelices(D2mhr, 6) ^ …
ILP for learning protein folding: results
ILP: Pros and Cons
The future of machine learning for bioinformatics? Oracle.
The future of machine learning for bioinformatics. Diagram components: learner, prior knowledge, replicated experiments, biological literature, hypotheses, expt. design, real world.
The end
Decision trees. Example tree with tests blue?, big?, oval? and yes/no branches. + Handles mixed variables; + Handles missing data; + Efficient for large data sets; + Handles irrelevant attributes; + Easy to understand; - Predictive power.
Feedforward neural network. Input, hidden layer, output; weights on each arc; sigmoid function at each node. - Handles mixed variables; - Handles missing data; - Efficient for large data sets; - Handles irrelevant attributes; - Easy to understand; + Predicts poorly.
Nearest Neighbor. - Handles mixed variables; - Handles missing data; - Efficient for large data sets; - Handles irrelevant attributes; - Easy to understand; + Predictive power.
Support Vector Machines (SVMs)
SVM: mathematical details
Replace all inner products with kernels (callout: kernel function).
SVMs: summary. - Handles mixed variables; - Handles missing data; - Efficient for large data sets; - Handles irrelevant attributes; - Easy to understand; + Predictive power. General lessons from SVM success.
Boosting: summary. + Handles mixed variables; + Handles missing data; + Efficient for large data sets; + Handles irrelevant attributes; - Easy to understand; + Predictive power.
Supervised learning: summary
Inference. Figure: the network over Earthquake, Radio, Burglary, Alarm and Call, with Radio and Call labeled separately.
Assumption needed to make learning work
Structure learning success stories: gene regulation network (Friedman et al.)
Structure learning success stories II: Phylogenetic Tree Reconstruction (Friedman et al.). Uses structural EM, with max-spanning-tree in the inner loop. Figure labels: leaf; 10 billion years.
Instances of graphical models. Probabilistic models include graphical models, which are either directed (Bayes nets, DBNs) or undirected (MRFs). Instances: Hidden Markov Model (HMM), Naïve Bayes classifier, mixtures of experts, Kalman filter model, Ising model.
ML enabling technologies


. An introduction to machine learning and probabilistic ...

  • 1. An introduction to machine learning and probabilistic graphical models Kevin Murphy MIT AI Lab Presented at Intel’s workshop on “Machine learning for the life sciences”, Berkeley, CA, 3 November 2003
  • 2.
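  • 3. Supervised learning: learn to approximate a function F(x1, x2, x3) -> t from a training set of (x, t) pairs. Training examples (Color, Shape, Size -> Output): (Blue, Torus, Big) -> Y; (Blue, Square, Small) -> Y; (Blue, Star, Small) -> Y; (Red, Arrow, Small) -> N.
  • 4. Supervised learning. Diagram labels: Training data, Learner, Hypothesis, Testing data, Prediction. Training data (X1, X2, X3 -> T): (B, T, B) -> Y; (B, S, S) -> Y; (B, S, S) -> Y; (R, A, S) -> N. Testing data (X1, X2, X3 -> T): (B, A, S) -> ?; (Y, C, S) -> ?. Prediction (T): Y, N.
  • 5. Key issue: generalization. Can’t just memorize the training set (overfitting). [Figure: unseen examples marked “?”, to be labelled yes or no]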
  • 6.
  • 7. Perceptron (neural net with no hidden layers) Linearly separable data
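The perceptron on linearly separable data can be illustrated with a minimal sketch of the classic mistake-driven update rule. The function names and the four toy points are hypothetical, chosen only for illustration; they do not come from the slides.

```python
def train_perceptron(points, labels, epochs=100):
    """Classic perceptron rule. Labels must be +1 or -1."""
    w = [0.0] * len(points[0])
    b = 0.0
    for _ in range(epochs):
        mistakes = 0
        for x, t in zip(points, labels):
            activation = sum(wi * xi for wi, xi in zip(w, x)) + b
            if t * activation <= 0:  # misclassified, or exactly on the boundary
                # Move the hyperplane towards the misclassified point.
                w = [wi + t * xi for wi, xi in zip(w, x)]
                b += t
                mistakes += 1
        if mistakes == 0:  # every training point is on the correct side
            break
    return w, b


def predict(w, b, x):
    """Return +1 or -1 depending on the side of the hyperplane."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else -1


# Hypothetical, linearly separable toy data (not from the slides).
points = [(2.0, 1.0), (3.0, 2.0), (-1.0, -1.0), (-2.0, -3.0)]
labels = [1, 1, -1, -1]
w, b = train_perceptron(points, labels)
```

On separable data the loop stops as soon as a full pass makes no mistakes; on non-separable data it simply runs out of epochs.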
  • 9. The linear separator with the largest margin is the best one to pick. [Figure label: margin]
  • 10. What if the data is not linearly separable?
  • 11. Kernel trick: the kernel implicitly maps from 2D to 3D, making the problem linearly separable. [Figure: input axes x1, x2; feature axes z1, z2, z3]
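One common way to make the 2D-to-3D idea concrete is the degree-2 polynomial kernel. The sketch below assumes the feature map (x1, x2) -> (x1^2, sqrt(2) x1 x2, x2^2); the slide does not say which map it uses, so this choice is an assumption. It shows that the kernel value equals the inner product in the 3D space without ever building the 3D vectors.

```python
import math


def phi(x):
    """Explicit 2D -> 3D feature map (assumed; not specified on the slide)."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2.0) * x1 * x2, x2 * x2)


def dot(u, v):
    """Ordinary inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def poly_kernel(x, y):
    """Degree-2 polynomial kernel, computed entirely in the 2D input space."""
    return dot(x, y) ** 2


x, y = (1.0, 2.0), (3.0, -1.0)
explicit = dot(phi(x), phi(y))  # inner product after mapping to 3D
implicit = poly_kernel(x, y)    # same value, never leaving 2D
```

With this map, a circle x1^2 + x2^2 = r^2 in the input space becomes the plane z1 + z3 = r^2 in feature space, which is why points inside and outside a circle become linearly separable.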
  • 12.
  • 13. Boosting: simple classifiers (weak learners) can have their performance boosted by taking weighted combinations. Boosting maximizes the margin.
  • 14.
  • 15.
  • 16.
  • 17.
  • 19. Principal Component Analysis (PCA) PCA seeks a projection that best represents the data in a least-squares sense. PCA reduces the dimensionality of feature space by restricting attention to those directions along which the scatter of the cloud is greatest.
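The projection described above can be sketched with NumPy's singular value decomposition. This is a minimal illustration under stated assumptions: the function name `pca` and the synthetic data are hypothetical, and the data is generated to lie close to the direction (1, 2).

```python
import numpy as np


def pca(X, k):
    """Project the rows of X onto the k directions of greatest scatter."""
    X_centered = X - X.mean(axis=0)
    # Rows of Vt are the principal directions, ordered by singular value.
    U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    components = Vt[:k]
    projected = X_centered @ components.T
    explained_variance = (S[:k] ** 2) / (X.shape[0] - 1)
    return components, projected, explained_variance


# Hypothetical data: 200 points lying close to the line x2 = 2 * x1.
rng = np.random.default_rng(0)
t = rng.normal(size=200)
X = np.column_stack([t, 2.0 * t + 0.05 * rng.normal(size=200)])
components, projected, variance = pca(X, 1)
```

Keeping only the first component reduces each 2D point to a single coordinate along the direction of greatest scatter.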
  • 21. Combining supervised and unsupervised learning
  • 22. Discovering rules (data mining): find the most frequent patterns (association rules). Examples: Num in household = 1 ^ num children = 0 => language = English; Language = English ^ Income < $40k ^ Married = false ^ num children = 0 => education ∈ {college, grad school}. Data table (Sex, Married, Age, Occup., Income, Educ.): (M, S, 22, Student, $10k, MA); (F, S, 24, Student, $20k, PhD); (M, M, 30, Doctor, $80k, MD); (F, M, 60, Retired, $30k, HS).
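Association rules are usually ranked by support and confidence. The sketch below computes both for one rule of the form used on the slide; the four records and the field names are hypothetical, not the survey data behind the slide.

```python
def support(records, conditions):
    """Fraction of records that satisfy every attribute = value condition."""
    matches = sum(
        all(record.get(key) == value for key, value in conditions.items())
        for record in records
    )
    return matches / len(records)


def confidence(records, antecedent, consequent):
    """Estimate P(consequent | antecedent) from the records."""
    both = support(records, {**antecedent, **consequent})
    ante = support(records, antecedent)
    return both / ante if ante > 0 else 0.0


# Hypothetical records (field names are illustrative assumptions).
records = [
    {"household": 1, "children": 0, "language": "English"},
    {"household": 1, "children": 0, "language": "English"},
    {"household": 1, "children": 0, "language": "Spanish"},
    {"household": 4, "children": 2, "language": "English"},
]
antecedent = {"household": 1, "children": 0}
consequent = {"language": "English"}
rule_support = support(records, {**antecedent, **consequent})
rule_confidence = confidence(records, antecedent, consequent)
```

A frequent-pattern miner would enumerate many candidate rules and keep those whose support and confidence exceed chosen thresholds; this sketch only scores a single rule.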
  • 23.
  • 24. Discovering networks: from data visualization to causal discovery
  • 25.
  • 26.
  • 27.
  • 29.
  • 30.
  • 31. Simple probabilistic model: linear regression. Y = α + β X + noise. [Figure: Y plotted against X; label: deterministic (functional) relationship]
  • 32. Simple probabilistic model: linear regression. Y = α + β X + noise. “Learning” = estimating the parameters α, β, σ from (x, y) pairs. Figure annotations on the parameters: “can be estimated by least squares”, “is the empirical mean”, “is the residual variance”. [Figure: Y plotted against X; label: deterministic (functional) relationship]
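Estimating the regression parameters from (x, y) pairs can be sketched with the closed-form least-squares solution. The names `alpha` (intercept), `beta` (slope) and `sigma2` (noise variance) are my labels for the three parameters and are assumptions, as is the toy data.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = alpha + beta * x, plus residual variance."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    beta = sxy / sxx                  # slope
    alpha = y_mean - beta * x_mean    # intercept
    residuals = [y - (alpha + beta * x) for x, y in zip(xs, ys)]
    sigma2 = sum(r * r for r in residuals) / n  # maximum-likelihood noise variance
    return alpha, beta, sigma2


# Hypothetical noise-free data generated from y = 1 + 2x.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
alpha, beta, sigma2 = fit_line(xs, ys)
```

Under Gaussian noise the least-squares estimates coincide with the maximum-likelihood estimates, which is why the residual variance is divided by n here rather than n - 2.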
  • 33. Piecewise linear regression Latent “switch” variable – hidden process at work
  • 34.
  • 35. Classes of graphical models: probabilistic models include graphical models, which are either directed (Bayes nets, DBNs) or undirected (MRFs).
  • 36.
  • 37.
  • 38.
  • 39.
  • 40.
  • 41. Viterbi decoding: compute the most probable explanation (MPE) of the observed data in a Hidden Markov Model (HMM). [Figure: chain over X1, X2, X3 and Y1, Y2, Y3, with nodes labelled hidden and observed; example word “Tomato”]
  • 42. Inference: computational issues. From easy to hard: chains, trees, grids, dense loopy graphs. [Figure: large example network with nodes including PCWP, CO, HRBP, HREKG, HRSAT, ..., HYPOVOLEMIA, CVP, BP]
  • 43. Inference: computational issues. From easy to hard: chains, trees, grids, dense loopy graphs. Many different inference algorithms, both exact and approximate. [Figure: large example network with nodes including PCWP, CO, HRBP, HREKG, HRSAT, ..., HYPOVOLEMIA, CVP, BP]
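Viterbi decoding in an HMM can be sketched as a dynamic program over the chain. The two-state weather model below is a hypothetical textbook-style example, not the speech model pictured on the slide, and all probabilities are made up for illustration.

```python
def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most probable hidden state path and its joint probability."""
    # best[t][s]: probability of the best path that ends in state s at time t.
    best = [{s: start_p[s] * emit_p[s][observations[0]] for s in states}]
    back = [{}]
    for t in range(1, len(observations)):
        best.append({})
        back.append({})
        for s in states:
            prev, prob = max(
                ((p, best[t - 1][p] * trans_p[p][s]) for p in states),
                key=lambda pair: pair[1],
            )
            best[t][s] = prob * emit_p[s][observations[t]]
            back[t][s] = prev
    # Trace the back-pointers from the best final state.
    last = max(states, key=lambda s: best[-1][s])
    path = [last]
    for t in range(len(observations) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path, best[-1][last]


# Hypothetical toy HMM (all numbers are illustrative assumptions).
states = ["Rainy", "Sunny"]
start_p = {"Rainy": 0.6, "Sunny": 0.4}
trans_p = {
    "Rainy": {"Rainy": 0.7, "Sunny": 0.3},
    "Sunny": {"Rainy": 0.4, "Sunny": 0.6},
}
emit_p = {
    "Rainy": {"walk": 0.1, "shop": 0.4, "clean": 0.5},
    "Sunny": {"walk": 0.6, "shop": 0.3, "clean": 0.1},
}
path, prob = viterbi(["walk", "shop", "clean"], states, start_p, trans_p, emit_p)
```

The cost is linear in the sequence length and quadratic in the number of states. For long sequences, summing log probabilities instead of multiplying probabilities avoids numerical underflow.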
  • 44.
  • 45.
  • 46.
  • 47.
  • 48. Score-based learning: define a scoring function that evaluates how well a structure matches the data, then search for a structure that maximizes the score. Data over E, B, A: <Y,N,N> <Y,Y,Y> <N,N,Y> <N,Y,Y> . . <N,Y,Y>. [Figure: candidate structures over E, B, A]
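One possible scoring function is the BIC score: the maximized log-likelihood minus a penalty on the number of parameters. The slide does not say which score it uses, so BIC is an assumption here, and the data set is hypothetical (the slide's data rows are elided and are not reproduced). As a further simplification, parameters are counted per observed parent configuration.

```python
import math
from collections import Counter


def bic_score(data, structure):
    """BIC of a discrete Bayes net.

    data: list of dicts mapping variable -> value.
    structure: dict mapping variable -> tuple of parent variables.
    """
    n = len(data)
    score = 0.0
    for var, parents in structure.items():
        joint = Counter((tuple(row[p] for p in parents), row[var]) for row in data)
        parent_counts = Counter(tuple(row[p] for p in parents) for row in data)
        # Log-likelihood of this node under maximum-likelihood parameters.
        for (parent_values, _), count in joint.items():
            score += count * math.log(count / parent_counts[parent_values])
        # Penalty: (number of values - 1) free parameters per parent configuration.
        n_values = len({row[var] for row in data})
        n_params = (n_values - 1) * len(parent_counts)
        score -= 0.5 * math.log(n) * n_params
    return score


# Hypothetical data in which A always copies B and E is unrelated to both.
data = [
    {"E": e, "B": b, "A": b}
    for e in ("Y", "N")
    for b in ("Y", "N")
    for _ in range(2)
]
no_edges = {"E": (), "B": (), "A": ()}
b_to_a = {"E": (), "B": (), "A": ("B",)}
score_no_edges = bic_score(data, no_edges)
score_b_to_a = bic_score(data, b_to_a)
```

A structure search would evaluate this score on many candidate graphs, for example by adding, deleting or reversing one edge at a time, and keep the best-scoring one.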
  • 49.
  • 50.
  • 51.
  • 52. Problems with local search: easy to get stuck in local optima. [Figure: score S(G|D), with markers “truth” and “you”]
  • 53. Problems with local search II: picking a single best model can be misleading. [Figure: P(G|D) for structures over E, R, B, A, C]
  • 54.
  • 55.
  • 56.
  • 57.
  • 58. Discovering latent variables: there are some techniques for automatically detecting the possible presence of latent variables. [Figure: a) 17 parameters; b) 59 parameters]
  • 59.
  • 60.
  • 61.
  • 62.
  • 63. Learning from relational data: can we learn concepts from a set of relations between objects, instead of, or in addition to, just their attributes?
  • 64.
  • 65. ILP for learning protein folding: input. TotalLength(D2mhr, 118) ^ NumberHelices(D2mhr, 6) ^ … (100 conjuncts describing the structure of each pos/neg example). [Figure labels: yes, no]
  • 66.
  • 67.
  • 68. The future of machine learning for bioinformatics? Oracle
  • 69.
  • 71. Decision trees. [Figure: tree with tests blue?, big?, oval? and yes/no branches]
  • 72. Decision trees. + Handles mixed variables; + Handles missing data; + Efficient for large data sets; + Handles irrelevant attributes; + Easy to understand; - Predictive power. [Figure: tree with tests blue?, big?, oval? and yes/no branches]
  • 73. Feedforward neural network: input, hidden layer, output; weights on each arc; sigmoid function at each node.
  • 74. Feedforward neural network (input, hidden layer, output). - Handles mixed variables; - Handles missing data; - Efficient for large data sets; - Handles irrelevant attributes; - Easy to understand; + Predictive power.
  • 75.
  • 76. Nearest Neighbor. - Handles mixed variables; - Handles missing data; - Efficient for large data sets; - Handles irrelevant attributes; - Easy to understand; + Predictive power. [Figure: query point marked “?”]
  • 77.
  • 78.
  • 79. Replace all inner products with kernels Kernel function
  • 80.
  • 81.
  • 82.
  • 83.
  • 84.
  • 85.
  • 86.
  • 87. Instances of graphical models: probabilistic models include graphical models, which are either directed (Bayes nets, DBNs) or undirected (MRFs). Instances: Hidden Markov Model (HMM), Naïve Bayes classifier, mixtures of experts, Kalman filter model, Ising model.
  • 88.