1. An introduction to machine learning and probabilistic graphical models. Kevin Murphy, MIT AI Lab. Presented at Intel’s workshop on “Machine learning for the life sciences”, Berkeley, CA, 3 November 2003.
3. Supervised learning. Learn to approximate a function F(x1, x2, x3) -> t from a training set of (x, t) pairs.
   Color  Shape   Size   Output
   Blue   Torus   Big    Y
   Blue   Square  Small  Y
   Blue   Star    Small  Y
   Red    Arrow   Small  N
   [Figure: examples labeled yes / no]
4. Supervised learning. The learner turns training data into a hypothesis; the hypothesis turns testing data into a prediction.
   Training data:
   X1  X2  X3  T
   B   T   B   Y
   B   S   S   Y
   B   S   S   Y
   R   A   S   N
   Testing data:
   X1  X2  X3  T
   B   A   S   ?
   Y   C   S   ?
   Prediction (T): Y, N
19. Principal Component Analysis (PCA) PCA seeks a projection that best represents the data in a least-squares sense. PCA reduces the dimensionality of feature space by restricting attention to those directions along which the scatter of the cloud is greatest.
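The projection idea can be illustrated with a minimal sketch in plain Python. It finds only the first principal component (the direction of greatest scatter) by power iteration on the covariance matrix; the function names and the five sample points are hypothetical, and a real analysis would more likely use a linear algebra library.

```python
import math

def first_principal_component(points, iterations=200):
    """Return (mean, direction); direction is a unit vector along the greatest scatter."""
    n = len(points)
    d = len(points[0])
    mean = [sum(p[j] for p in points) / n for j in range(d)]
    centered = [[p[j] - mean[j] for j in range(d)] for p in points]
    # Covariance (scatter) matrix of the centered cloud.
    cov = [[sum(c[i] * c[j] for c in centered) / n for j in range(d)]
           for i in range(d)]
    # Power iteration converges to the eigenvector with the largest eigenvalue
    # (assumes the starting vector is not orthogonal to it).
    v = [1.0] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return mean, v

def project(points, mean, direction):
    """One-dimensional coordinates of each point along the given direction."""
    return [sum((p[j] - mean[j]) * direction[j] for j in range(len(mean)))
            for p in points]

# Hypothetical 2-D cloud lying close to the diagonal.
points = [(1.0, 1.1), (2.0, 1.9), (3.0, 3.2), (4.0, 3.8), (5.0, 5.0)]
mean, direction = first_principal_component(points)
coords = project(points, mean, direction)
```

Here the two-dimensional points are reduced to one coordinate each, measured along a direction close to the diagonal.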
22. Discovering rules (data mining). Find the most frequent patterns (association rules), for example:
   Num in household = 1 ^ num children = 0 => language = English
   Language = English ^ Income < $40k ^ Married = false ^ num children = 0 => education ∈ {college, grad school}
   Sex  Married  Age  Occup.   Income  Educ.
   M    S        22   Student  $10k    MA
   F    S        24   Student  $20k    PhD
   M    M        30   Doctor   $80k    MD
   F    M        60   Retired  $30k    HS
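One common way to judge how frequent and how reliable such a rule is uses its support and confidence. The sketch below assumes records stored as dictionaries; the five records, the attribute names, and the helper functions are hypothetical and are not taken from the slide's table.

```python
def support(records, conditions):
    """Fraction of records that satisfy every attribute = value pair."""
    hits = sum(all(r.get(k) == v for k, v in conditions.items())
               for r in records)
    return hits / len(records)

def confidence(records, antecedent, consequent):
    """Estimated P(consequent | antecedent)."""
    both = support(records, {**antecedent, **consequent})
    return both / support(records, antecedent)

# Hypothetical survey records.
records = [
    {"household": 1, "children": 0, "language": "English"},
    {"household": 1, "children": 0, "language": "English"},
    {"household": 1, "children": 0, "language": "Spanish"},
    {"household": 3, "children": 2, "language": "English"},
    {"household": 2, "children": 0, "language": "Spanish"},
]

# Rule: household = 1 ^ children = 0 => language = English
antecedent = {"household": 1, "children": 0}
consequent = {"language": "English"}
rule_support = support(records, {**antecedent, **consequent})   # 2 of 5 records
rule_confidence = confidence(records, antecedent, consequent)   # 2 of the 3 matching records
```

A rule miner would enumerate many candidate rules and keep those whose support and confidence exceed chosen thresholds.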
31. Simple probabilistic model: linear regression. Y = α + βX + noise. [Figure: Y plotted against X, showing the deterministic (functional) relationship]
32. Simple probabilistic model: linear regression. Y = α + βX + noise. “Learning” = estimating the parameters α, β, σ² from (x, y) pairs: α is the empirical mean (of Y, when X is centered), β can be estimated by least squares, and σ² is the residual variance. [Figure: Y plotted against X, showing the deterministic (functional) relationship]
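The estimation step can be sketched in a few lines of plain Python. The names alpha (intercept), beta (slope), and sigma2 (noise variance) are chosen here for illustration, and the data points are hypothetical.

```python
def fit_linear_regression(xs, ys):
    """Least-squares fit of y = alpha + beta * x; returns (alpha, beta, sigma2)."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    beta = sxy / sxx                   # least-squares slope
    alpha = y_mean - beta * x_mean     # equals the mean of y when x is centered
    residuals = [y - (alpha + beta * x) for x, y in zip(xs, ys)]
    sigma2 = sum(r * r for r in residuals) / n   # residual variance
    return alpha, beta, sigma2

# Hypothetical (x, y) pairs lying near the line y = 1 + 2x.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.1, 2.9, 5.2, 6.8, 9.0]
alpha, beta, sigma2 = fit_linear_regression(xs, ys)
```

The residual variance here divides by n (the maximum-likelihood estimate); dividing by n - 2 instead would give the usual unbiased estimate.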
41. Viterbi decoding. Compute the most probable explanation (MPE) of the observed data. [Figure: Hidden Markov Model (HMM) with nodes X1, X2, X3 and Y1, Y2, Y3, labeled hidden and observed; example word “Tomato”]
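A minimal sketch of Viterbi decoding for a discrete HMM follows. The two-state model (states, observations, and all probabilities) is a hypothetical textbook-style example, not the model behind the slide.

```python
def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return (most probable hidden state path, its joint probability)."""
    # best[t][s]: probability of the best path that ends in state s at time t.
    best = [{s: start_p[s] * emit_p[s][observations[0]] for s in states}]
    back = [{}]
    for t in range(1, len(observations)):
        best.append({})
        back.append({})
        for s in states:
            prev, prob = max(
                ((r, best[t - 1][r] * trans_p[r][s]) for r in states),
                key=lambda pair: pair[1],
            )
            best[t][s] = prob * emit_p[s][observations[t]]
            back[t][s] = prev
    # Trace the back-pointers from the best final state.
    last = max(states, key=lambda s: best[-1][s])
    path = [last]
    for t in range(len(observations) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path, best[-1][last]

# Hypothetical model: hidden health state, observed symptom.
states = ("Healthy", "Fever")
start_p = {"Healthy": 0.6, "Fever": 0.4}
trans_p = {
    "Healthy": {"Healthy": 0.7, "Fever": 0.3},
    "Fever": {"Healthy": 0.4, "Fever": 0.6},
}
emit_p = {
    "Healthy": {"normal": 0.5, "cold": 0.4, "dizzy": 0.1},
    "Fever": {"normal": 0.1, "cold": 0.3, "dizzy": 0.6},
}
path, prob = viterbi(["normal", "cold", "dizzy"], states, start_p, trans_p, emit_p)
# path is ["Healthy", "Healthy", "Fever"] with probability 0.01512
```

For long sequences the products underflow, so practical implementations usually work with log probabilities and sums instead.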
42. Inference: computational issues. From easy to hard: chains, trees, grids, dense loopy graphs. [Figure: a large network whose node labels include PCWP, CO, HRBP, HR, SAO2, CVP, and BP, among others]
43. Inference: computational issues. From easy to hard: chains, trees, grids, dense loopy graphs. There are many different inference algorithms, both exact and approximate. [Figure: a large network whose node labels include PCWP, CO, HRBP, HR, SAO2, CVP, and BP, among others]
48. Score-based learning. Define a scoring function that evaluates how well a structure matches the data, then search for a structure that maximizes the score. Data over (E, B, A): <Y,N,N>, <Y,Y,Y>, <N,N,Y>, <N,Y,Y>, ..., <N,Y,Y>. [Figure: three candidate structures over the nodes E, B, A]
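As an illustration of scoring candidate structures, the sketch below uses the BIC score (maximized log-likelihood minus a complexity penalty), which is one common choice and is swapped in here as an assumption; the slide does not say which score is used. The data set, the three candidate structures, and all names are hypothetical, and every variable is assumed binary.

```python
import math
from collections import Counter

def bic_score(data, parents):
    """BIC of a structure given as {variable: tuple of parent variables}."""
    n = len(data)
    score = 0.0
    for var, pa in parents.items():
        joint = Counter((tuple(row[p] for p in pa), row[var]) for row in data)
        marg = Counter(tuple(row[p] for p in pa) for row in data)
        # Log-likelihood at the maximum-likelihood parameters.
        score += sum(c * math.log(c / marg[cfg]) for (cfg, _), c in joint.items())
        # Penalty: a binary variable has one free parameter per parent configuration.
        score -= 0.5 * math.log(n) * (2 ** len(pa))
    return score

# Hypothetical data: E and B vary independently, and A always copies B.
data = [{"E": e, "B": b, "A": b} for e in "YN" for b in "YN"] * 2

candidates = {
    "no edges": {"E": (), "B": (), "A": ()},
    "B -> A": {"E": (), "B": (), "A": ("B",)},
    "E -> A <- B": {"E": (), "B": (), "A": ("E", "B")},
}
scores = {name: bic_score(data, g) for name, g in candidates.items()}
best_name = max(scores, key=scores.get)
```

On this data the structure with the single edge B -> A scores highest: it explains A perfectly, and the extra edge from E only adds penalty. A real search would generate candidates by adding, deleting, or reversing edges rather than from a fixed list.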
52. Problems with local search. It is easy to get stuck in local optima. [Figure: score S(G|D) over structures, with points marked “truth” and “you”]
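The effect is easy to reproduce on a toy problem. The sketch below runs greedy hill climbing on a hypothetical one-dimensional score landscape (a list of numbers standing in for S(G|D)); where the search ends depends entirely on where it starts.

```python
def hill_climb(score, start, neighbors):
    """Greedy local search: move to the best neighbor until none improves the score."""
    current = start
    while True:
        best = max(neighbors(current), key=score)
        if score(best) <= score(current):
            return current
        current = best

# Hypothetical landscape: a local optimum at index 2, the global optimum at index 7.
landscape = [1, 3, 5, 4, 2, 3, 6, 9, 7, 2]

def score(i):
    return landscape[i]

def neighbors(i):
    return [j for j in (i - 1, i + 1) if 0 <= j < len(landscape)]

stuck = hill_climb(score, 0, neighbors)   # stops at index 2 (score 5)
found = hill_climb(score, 5, neighbors)   # reaches index 7 (score 9)
```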
53. Problems with local search II. Picking a single best model can be misleading. [Figure: posterior P(G|D) over structures on the nodes E, R, B, A, C]
58. Discovering latent variables. There are some techniques for automatically detecting the possible presence of latent variables. [Figure: two models, (a) with 17 parameters and (b) with 59 parameters]
63. Learning from relational data. Can we learn concepts from a set of relations between objects, instead of, or in addition to, just their attributes?
65. ILP for learning protein folding: input. TotalLength(D2mhr, 118) ^ NumberHelices(D2mhr, 6) ^ … (100 conjuncts describing the structure of each positive/negative example). [Figure: examples labeled yes / no]
68. The future of machine learning for bioinformatics? Oracle
72. Decision trees. [Figure: tree with the tests blue?, big?, oval? and yes / no branches] Strong (+) on: handling mixed variables; handling missing data; efficiency on large data sets; handling irrelevant attributes; ease of understanding. Weak (-) on: predictive power.
74. Feedforward neural network. [Figure: input, hidden layer, output] Weak (-) on: handling mixed variables; handling missing data; efficiency on large data sets; handling irrelevant attributes; ease of understanding. Strong (+) on: predictive power.
76. Nearest Neighbor. [Figure: query point marked “?”] Weak (-) on: handling mixed variables; handling missing data; efficiency on large data sets; handling irrelevant attributes; ease of understanding. Strong (+) on: predictive power.
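A minimal sketch of a 1-nearest-neighbor classifier, assuming purely numeric features and Euclidean distance; the training points, labels, and function name are hypothetical.

```python
import math

def nearest_neighbor_predict(train, query):
    """Return the label of the training example closest to `query`.

    `train` is a list of (feature_vector, label) pairs.
    """
    _, label = min(train, key=lambda example: math.dist(example[0], query))
    return label

# Hypothetical training set with two numeric features per example.
train = [
    ((1.0, 1.0), "N"),
    ((1.2, 0.8), "N"),
    ((4.0, 4.2), "Y"),
    ((4.5, 3.9), "Y"),
]
prediction_a = nearest_neighbor_predict(train, (4.1, 4.0))   # closest example is labeled "Y"
prediction_b = nearest_neighbor_predict(train, (0.9, 1.1))   # closest example is labeled "N"
```

Each prediction scans the whole training set, and the distance treats every feature as numeric and equally important, which is consistent with the weaknesses listed on the slide.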