Machine Learning for Dummies

  1. Introduction to Machine Learning: The Philosophy, Advantages, Applications, Algorithms. Venkat Reddy
  2. The Learning [images 1 to 6] (Bigdata Analytics, Venkat Reddy)
  3. Machine Learning
     • How did our brain process the images?
     • How did the grouping happen?
     • The human brain processed the given images: this is learning.
     • After learning, the brain simply looked at the new image, compared it with the groups, and assigned it to the closest group: this is classification.
     • If a machine has to perform the same operation, we use machine learning.
     • We write programs for learning and then for classification; this is all that machine learning is.
  4. Machine Learning [diagram: training data feeds a learning algorithm, which produces a trained machine; the trained machine takes a query and returns an answer]
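
The learn-then-classify flow on slides 3 and 4 can be sketched as follows. This is only an illustration, assuming a nearest-center rule as the learning algorithm; the feature names (brightness, roughness) and all data values are hypothetical and do not come from the slides.

```python
from math import dist  # Euclidean distance (Python 3.8+)


def train(examples):
    """Learning step: average the feature vectors of each label."""
    sums, counts = {}, {}
    for features, label in examples:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    # The trained machine is one center (mean vector) per label.
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def classify(model, query):
    """Classification step: return the label of the closest group center."""
    return min(model, key=lambda label: dist(model[label], query))


# Hypothetical training data: (brightness, roughness) -> label
training_data = [
    ((0.9, 0.1), "water"), ((0.8, 0.2), "water"),
    ((0.2, 0.9), "rock"), ((0.3, 0.8), "rock"),
]

model = train(training_data)            # training data -> trained machine
answer = classify(model, (0.85, 0.15))  # query -> answer
print(answer)
```

Here `train` plays the role of the learning algorithm and `model` the trained machine; any other classifier could be substituted without changing the overall flow.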
  5. Example: Classification using ML
     • Image processing: machine learning can be used to classify images and the objects in an image:
       • Ship
       • Water
       • Rock
       • Iron object
       • Fiber object, etc.
     • Does this really help?
  6. Image processing example
  7. Machine learning application
  8. The main advantage of ML
     • Learning and writing an algorithm
       • This is easy for the human brain but tough for a machine. A machine needs some time and a good amount of training data to classify objects accurately.
     • Implementation and automation
       • This is easy for a machine. Once it has learnt, a machine can process one million images without any fatigue, whereas a human brain cannot.
     • That is why ML with big data is a deadly combination.
  9. Applications of Machine Learning
     • Banking / Telecom / Retail
     • Identify:
       • Prospective customers
       • Dissatisfied customers
       • Good customers
       • Bad payers
     • Obtain:
       • More effective advertising
       • Less credit risk
       • Less fraud
       • Decreased churn rate
  10. Applications of Machine Learning
      • Biomedical / Biometrics
      • Medicine:
        • Screening
        • Diagnosis and prognosis
        • Drug discovery
      • Security:
        • Face recognition
        • Signature / fingerprint / iris verification
        • DNA fingerprinting
  11. Applications of Machine Learning
      • Computer / Internet
      • Computer interfaces:
        • Troubleshooting wizards
        • Handwriting and speech
        • Brain waves
      • Internet:
        • Hit ranking
        • Spam filtering
        • Text categorization
        • Text translation
        • Recommendation
  12. Main Parts in the Machine Learning Process
      • Any machine learning algorithm has three parts:
        1. The output
        2. The objective function or performance metric
        3. The input
      • Email spam classification
        • The output: categorize email messages as spam or legitimate.
        • Objective function: percentage of email messages correctly classified.
        • The input: database of emails, some with human-given labels.
      • Handwriting recognition
        • The output: recognized handwritten words.
        • Objective function: percentage of words correctly classified.
        • The input: database of human-labeled images of handwritten words.
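
The objective function named on slide 12, the percentage of messages correctly classified, can be written in a few lines. The label lists below are made-up illustrations, not data from the slides.

```python
def accuracy(predicted, actual):
    """Percentage of items whose predicted label matches the human-given label."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not actual:
        raise ValueError("need at least one labelled example")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return 100.0 * correct / len(actual)


# Hypothetical labels for five emails
human_labels = ["spam", "legitimate", "spam", "legitimate", "legitimate"]
machine_labels = ["spam", "legitimate", "legitimate", "legitimate", "legitimate"]

score = accuracy(machine_labels, human_labels)
print(score)  # 4 of 5 labels match
```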
  13. Some Machine Learning Methods
  14. Everything should be made as simple as possible, but no simpler
  15. Bayes Network
  16. The Bayes Method
      • [Diagram: machines A (2%), B (90%), C (8%); second row of percentages: 1%, 60%, 2%]
      • Machines A, B and C generate bolts. Given a machine, what is the probability that it will generate a defective bolt?
      • Now, given a defective bolt, can we tell which machine generated it?
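
The second question on slide 16 is an application of Bayes' rule. The sketch below assumes one particular reading of the diagram: that the percentages next to A, B and C are each machine's share of production, and that 1%, 60% and 2% are the corresponding defect rates in the same order. The transcript does not confirm this pairing.

```python
# Assumed reading of slide 16: share of all bolts produced by each machine
share = {"A": 0.02, "B": 0.90, "C": 0.08}
# Assumed reading: probability that a bolt from each machine is defective
defect_rate = {"A": 0.01, "B": 0.60, "C": 0.02}

# Law of total probability: P(defective) = sum of P(machine) * P(defective | machine)
p_defect = sum(share[m] * defect_rate[m] for m in share)

# Bayes' rule: P(machine | defective) = P(machine) * P(defective | machine) / P(defective)
posterior = {m: share[m] * defect_rate[m] / p_defect for m in share}

most_likely = max(posterior, key=posterior.get)
for machine, p in posterior.items():
    print(f"P({machine} | defective) = {p:.4f}")
```

Under these assumed numbers almost all defective bolts come from B, because B both produces the most bolts and has the highest defect rate.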
  17. Several Machines in Several Layers
      • [Diagram: A (20%), B (50%), C (30%); 1%, 5%, 2%; D (60%), E (40%); F, G, H]
      • There are several intermediate machines.
      • Each has its own chance of error. This problem requires a Bayesian network to solve.
      • Does C have anything to do with it if the bolt finally comes out of F?
  18. Artificial Neural Networks
  19. The Philosophy of Neural Networks
      • Suppose we have to predict whether we will win a chess game.
      • Can we use logistic regression?
      • Since there can be an almost infinite number of intermediate steps, it is almost impossible to predict the probability of winning.
      • If each move is considered an input variable, then there are almost (countably) infinitely many moves, and the coefficient for each of them is negligible. A neural network uses a simple technique: instead of predicting the final output, it predicts the intermediate outputs that previously led to a win, and these become inputs for predicting the next move.
  20. Biological Motivation for NN
      • Artificial neural networks are built out of a densely interconnected set of simple units, where each unit takes a number of real-valued inputs (possibly the outputs of other units) and produces a single real-valued output (which may become the input to many other units).
      • The human brain is estimated to contain a densely interconnected network of approximately 10^11 neurons, each connected, on average, to 10^4 others.
      • Neuron activity is typically excited or inhibited through connections to other neurons.
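
The "simple unit" described on slide 20 takes several real-valued inputs and produces one real-valued output that can feed other units. A minimal sketch follows, assuming a weighted sum passed through a sigmoid; the slides do not specify the function a unit computes, and all weights and inputs here are hypothetical.

```python
from math import exp


def unit(inputs, weights, bias):
    """One unit: weighted sum of real-valued inputs squashed to a single output in (0, 1)."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1.0 / (1.0 + exp(-z))  # sigmoid activation (an assumption, not from the slides)


# Two hidden units read the same inputs; their outputs become inputs of a third unit.
inputs = [0.5, -1.0]
hidden = [
    unit(inputs, [0.8, 0.2], 0.1),
    unit(inputs, [-0.4, 0.9], 0.0),
]
output = unit(hidden, [1.5, -2.0], 0.3)
print(output)
```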
  21. ANN in a Self-Driving Car
  22. PCA and FA
  23. Look at the cricket team players data below

      Player  Avg Runs  Total Wickets  Height  Not Outs  Highest Score  Best Bowling
      1       45        3              5.5     15        120            1
      2       50        34             5.2     34        209            2
      3       38        0              6       36        183            0
      4       46        9              6.1     78        160            3
      5       37        45             5.8     56        98             1
      6       32        0              5.10    89        183            0
      7       18        123            6       2         35             4
      8       19        239            6.1     3         56             5
      9       18        96             6.6     5         87             7
      10      16        83             5.9     7         32             7
      11      17        138            5.10    9         12             6

  24. Describe the players
      • If we have to describe or segregate the players, do we really need Avg Runs, Total Wickets, Height, Not Outs, Highest Score and Best Bowling?
      • Can we simply take
        • Avg Runs + Not Outs + Highest Score as one factor?
        • Total Wickets + Height + Best Bowling as a second factor?
      • Defining such imaginary variables, each a linear combination of the original variables, in order to reduce the dimensions is called PCA (principal component analysis) or FA (factor analysis).
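
The two hand-picked factors suggested on slide 24 can be computed directly from the table on slide 23. Note that this is not PCA or FA itself: those methods derive the weights of the linear combination from the data, whereas this sketch simply uses equal weights on standardized columns. It also assumes that a height such as 5.10 means 5 feet 10 inches, which the slides do not state.

```python
from statistics import mean, pstdev

# Rows from slide 23. Height stays as text because "5.10" is assumed to mean
# 5 ft 10 in rather than 5.1 ft.
players = [
    # avg_runs, wickets, height, not_outs, highest_score, best_bowling
    (45, 3, "5.5", 15, 120, 1),
    (50, 34, "5.2", 34, 209, 2),
    (38, 0, "6", 36, 183, 0),
    (46, 9, "6.1", 78, 160, 3),
    (37, 45, "5.8", 56, 98, 1),
    (32, 0, "5.10", 89, 183, 0),
    (18, 123, "6", 2, 35, 4),
    (19, 239, "6.1", 3, 56, 5),
    (18, 96, "6.6", 5, 87, 7),
    (16, 83, "5.9", 7, 32, 7),
    (17, 138, "5.10", 9, 12, 6),
]


def height_in_inches(text):
    """Convert 'feet.inches' text to inches (assumed notation)."""
    feet, _, inches = text.partition(".")
    return 12 * int(feet) + int(inches or 0)


def zscores(values):
    """Standardize a column so that columns on different scales can be added."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]


avg_runs, wickets, height, not_outs, highest, best_bowl = (list(c) for c in zip(*players))
height = [height_in_inches(h) for h in height]

# Two composite variables replace six original ones (equal weights, chosen by hand).
batting = [sum(z) for z in zip(zscores(avg_runs), zscores(not_outs), zscores(highest))]
bowling = [sum(z) for z in zip(zscores(wickets), zscores(height), zscores(best_bowl))]

for number, (bat, bowl) in enumerate(zip(batting, bowling), start=1):
    print(f"Player {number:2d}: batting {bat:+.2f}, bowling {bowl:+.2f}")
```

With these scores each player is described by two numbers instead of six, which is the dimension reduction the slide is pointing at.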
  25. Look at the cricket team players data below (this slide repeats the table from slide 23)
  26. SVM Classification
  27. Linear Classifier: Consider a two-dimensional dataset with two classes. How would we classify this dataset?
  28. Linear Classifier: Both of the lines can be linear classifiers.
  29. Linear Classifier: There are many lines that can be linear classifiers. Which one is the optimal classifier?
  30. Classifier Margin: Define the margin of a linear classifier as the width by which the boundary could be increased before hitting a data point.
  31. Maximum Margin
      • The maximum margin linear classifier is the linear classifier with the maximum margin.
      • This is the simplest kind of SVM (called linear SVM).
  32. Support Vectors
      • Examples closest to the hyperplane are support vectors.
      • The margin ρ of the separator is the distance between the support vectors of the two classes, measured perpendicular to the hyperplane.
      • [Diagram: the hyperplane w · x + b = 0 separates the region w · x + b > 0 from the region w · x + b < 0; the support vectors and the margin ρ are marked]
      • f(x) = sign(w · x + b); red is +1, blue is -1.
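
The decision rule f(x) = sign(w · x + b) from slide 32 can be sketched as below. The weight vector and bias are hypothetical values chosen for illustration, not the result of training an SVM, and the tie-breaking choice for points exactly on the boundary is arbitrary.

```python
from math import sqrt


def decision_value(w, b, x):
    """Signed score w . x + b; its sign says which side of the hyperplane x lies on."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def svm_classify(w, b, x):
    """f(x) = sign(w . x + b): +1 (red) or -1 (blue). Boundary points go to -1 here."""
    return 1 if decision_value(w, b, x) > 0 else -1


def distance_to_boundary(w, b, x):
    """Perpendicular distance from x to the hyperplane w . x + b = 0."""
    return abs(decision_value(w, b, x)) / sqrt(sum(wi * wi for wi in w))


# Hypothetical separator: the line x1 + x2 = 3
w, b = (1.0, 1.0), -3.0

print(svm_classify(w, b, (3.0, 3.0)))          # above the line
print(svm_classify(w, b, (0.0, 0.0)))          # below the line
print(distance_to_boundary(w, b, (3.0, 3.0)))
```

A maximum margin classifier chooses w and b so that the smallest `distance_to_boundary` over the training points is as large as possible; the points that attain that smallest distance are the support vectors.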
  33. Difference Between Traditional Models and Machine Learning
      • ML is more heuristic.
      • It focuses on improving the performance of a learning agent.
      • It also looks at real-time self-learning and robotics, areas that are not part of data mining.
      • Some algorithms are so heuristic that there is no single right or wrong answer.
      • Let's take K-Means as an example.
  34. K-Means clustering: Overall population
  35. K-Means clustering: Fix the number of clusters
  36. K-Means clustering: Calculate the distance of each case from all cluster centers
  37. K-Means clustering: Recalculate the cluster centers
  38. K-Means clustering: Continue until there is no significant change between two iterations
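
The K-Means steps on slides 34 to 38 can be sketched in plain Python as follows. The number of clusters is fixed by the number of initial centers passed in; how those initial centers are chosen is not covered by the slides, so the starting values and the sample points below are assumptions made for illustration.

```python
from math import dist  # Euclidean distance (Python 3.8+)


def kmeans(points, centers, max_iter=100, tol=1e-6):
    """Alternate between assigning points and recomputing centers until centers stop moving."""
    centers = [tuple(c) for c in centers]
    clusters = [[] for _ in centers]
    for _ in range(max_iter):
        # Step 1: calculate the distance of each case from all centers, keep the nearest.
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: dist(p, centers[i]))
            clusters[nearest].append(p)
        # Step 2: recalculate each center as the mean of its members.
        new_centers = []
        for old, members in zip(centers, clusters):
            if members:
                new_centers.append(tuple(sum(axis) / len(members) for axis in zip(*members)))
            else:
                new_centers.append(old)  # an empty cluster keeps its previous center
        # Step 3: stop when there is no significant change between two iterations.
        shift = max(dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    return centers, clusters


# Hypothetical population with two visible groups, and two assumed starting centers
points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
final_centers, clusters = kmeans(points, [(0, 0), (10, 10)])
print(final_centers)
print(clusters)
```

Different starting centers or a different distance function can lead to different final clusters, which is the heuristic behaviour slide 33 refers to.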
  39. Calculating the distance

              Weight
      Cust1   68
      Cust2   72
      Cust3   100
      Which two of the customers are similar?

              Weight  Age
      Cust1   68      25
      Cust2   72      70
      Cust3   100     28
      Which two of the customers are similar now?

              Weight  Age  Income
      Cust1   68      25   60,000
      Cust2   72      70   9,000
      Cust3   100     28   62,000
      Which two of the customers are similar in this case?

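
The three questions on slide 39 can be answered numerically. The sketch below uses Euclidean distance on the raw values, which is one possible choice among the measures the deck lists; the slide itself does not say which measure to use.

```python
from itertools import combinations
from math import dist  # Euclidean distance (Python 3.8+)

# (weight, age, income) for each customer, as on slide 39
customers = {
    "Cust1": (68, 25, 60_000),
    "Cust2": (72, 70, 9_000),
    "Cust3": (100, 28, 62_000),
}


def closest_pair(n_features):
    """Pair of customers with the smallest Euclidean distance on the first n_features columns."""
    return min(
        combinations(customers, 2),
        key=lambda pair: dist(customers[pair[0]][:n_features], customers[pair[1]][:n_features]),
    )


weight_only = closest_pair(1)
weight_age = closest_pair(2)
weight_age_income = closest_pair(3)
print(weight_only, weight_age, weight_age_income)
```

On weight alone Cust1 and Cust2 are closest; once age is added the closest pair becomes Cust1 and Cust3. With income included, the income column dominates the raw distance because its values are far larger than the others, so rescaling the columns first would be a reasonable refinement.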
  40. Distance Measures
      • Euclidean distance
      • City-block (Manhattan) distance
      • Chebyshev distance
      • Minkowski distance
      • Mahalanobis distance
      • Maximum distance
      • Cosine similarity
      • Simple correlation between observations
      • Minimum distance
      • Weighted distance
      • Venkat Reddy distance
      • It is not certain that all these measures will produce the same clusters in the example above.
      • So the same ML algorithm can give different results. Generally this is not the case with conventional methods.
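
A few of the measures listed on slide 40 have short standard definitions, sketched here for points given as equal-length tuples of numbers. The function names are illustrative choices, and measures that need extra inputs (for example Mahalanobis distance, which needs a covariance matrix) are left out.

```python
from math import sqrt


def euclidean(a, b):
    """Straight-line distance."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    """City-block distance: sum of absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def chebyshev(a, b):
    """Largest absolute coordinate difference."""
    return max(abs(x - y) for x, y in zip(a, b))


def minkowski(a, b, p):
    """Generalization: p=1 gives Manhattan, p=2 gives Euclidean."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)


def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors (a similarity, not a distance)."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b)))


origin, point = (0, 0), (3, 4)
print(euclidean(origin, point), manhattan(origin, point), chebyshev(origin, point))
```

The same pair of points is 5, 7 or 4 apart depending on the measure, which is why a clustering algorithm built on distances can return different groups when the measure changes.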
  41. Thank you
  42. More: http://www.slideshare.net/21_venkat/presentations https://www.facebook.com/groups/SASanalysts/
