
Supervised Learning Algorithms - Analysis of different approaches

This talk covers why AI/ML is so important today and how to take supervised machine learning algorithms from prototype to production: how to choose and transform the data, how to choose the model/algorithm, and how to evaluate it so that it performs well on future data.



  1. 1. Supervised Learning Algorithms: Analysis of Different Approaches. Evgeniy Marinov, ML Consultant; Philip Yankov, x8academy
  2. 2. ML Definition •  There are plenty of definitions... •  Informal: The field of study that gives computers the ability to learn without being explicitly programmed (Arthur Samuel, 1959) •  Formal: A computer program is said to learn from experience E, with respect to some task T, and some performance measure P, if its performance on T as measured by P improves with experience E (Tom Mitchell, 1998).
  3. 3. From Wikipedia •  Machine learning is: – a subfield of computer science that evolved from the study of pattern recognition and computational learning theory in AI in the 1980s (ML is a separate field flourishing from the 1990s; it first benefited from statistics and then from the increasing availability of digitized information at that time).
  4. 4. Why ML?
  5. 5. Why ML?
  6. 6. Key factors enabling ML growth today •  Cloud Computing •  Internet of Things •  Big Data (+ Unstructured Data)
  7. 7. Why is data so important?
  8. 8. Why is data so important? •  Google Photos – Unlimited storage •  Google voice – OK, Google
  9. 9. Nowadays •  It is easy to get the data you need and to use a company's API or service to experiment with it
  10. 10. Methods for collecting data
  11. 11. Methods for collecting data •  Download – Spreadsheet – Text •  API •  Crawling / scraping
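The options listed above are easy to try from Python. A minimal sketch of the first two (the URLs below are placeholders for illustration, not sources used in the talk):

```python
# Download: read a published CSV/spreadsheet export straight into a DataFrame.
# API: fetch JSON from a service endpoint and flatten it into rows.
import pandas as pd
import requests

df = pd.read_csv("https://example.com/data/export.csv")          # placeholder URL

resp = requests.get("https://api.example.com/v1/items",          # placeholder URL
                    params={"limit": 100})
items = pd.json_normalize(resp.json())
print(df.head(), items.head())
```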
  12. 12. Supervised Learning
  13. 13. Task Description
  14. 14. Pipeline
  15. 15. Initial example
  16. 16. Notation
  17. 17. The regression function f(x)
  19. 19. How to evaluate our model?
  20. 20. Pipeline
  21. 21. Assessing the Model Accuracy
  22. 22. Bias-variance trade-off
  23. 23. Bias-variance trade-off
  24. 24. Cross-validation
  25. 25. Generalization Error and Overfitting
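Cross-validation is the standard way to estimate the generalization error discussed on the last two slides. A minimal sketch, assuming scikit-learn (which the references point to) and synthetic data made up for illustration:

```python
# 5-fold cross-validation: average held-out error approximates the
# generalization error without touching a final test set.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LinearRegression

X = np.random.rand(200, 3)
y = 2 * X[:, 0] - X[:, 1] + np.random.normal(scale=0.1, size=200)

scores = cross_val_score(LinearRegression(), X, y, cv=5,
                         scoring="neg_mean_squared_error")
print(-scores.mean())   # average held-out MSE across the 5 folds
```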
  26. 26. Choosing a Model by the data type of the response
  27. 27. Pipeline
  28. 28. Data types and Generalized Linear model •  Simple and General linear models •  Restrictions of the linear model •  Data type of the response Y: 1) (General) Linear model: Y in R, Y ~ Gaussian(µ, σ²) -- continuous data 2) Logistic regression: Y in {0, 1}, Y ~ Bernoulli(p) -- binary data 3) Poisson regression: Y in {0, 1, ...}, Y ~ Poisson(µ) -- count data
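A minimal sketch of the three cases listed on this slide, using the statsmodels library (an assumption, not a tool named in the talk) purely to show how the response type maps onto a GLM family:

```python
# Continuous -> Gaussian, binary -> Bernoulli/Binomial, counts -> Poisson.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 200
X = sm.add_constant(rng.normal(size=(n, 2)))      # design matrix with intercept

# 1) Continuous response: ordinary linear regression
y_cont = X @ np.array([1.0, 2.0, -1.0]) + rng.normal(size=n)
gaussian_fit = sm.GLM(y_cont, X, family=sm.families.Gaussian()).fit()

# 2) Binary response: logistic regression
p = 1.0 / (1.0 + np.exp(-(X @ np.array([0.5, 1.5, -0.5]))))
y_bin = rng.binomial(1, p)
logistic_fit = sm.GLM(y_bin, X, family=sm.families.Binomial()).fit()

# 3) Count response: Poisson regression
mu = np.exp(X @ np.array([0.2, 0.4, 0.1]))
y_cnt = rng.poisson(mu)
poisson_fit = sm.GLM(y_cnt, X, family=sm.families.Poisson()).fit()

print(gaussian_fit.params, logistic_fit.params, poisson_fit.params)
```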
  29. 29. Simple and General linear models
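The equations on this slide are not reproduced in the transcript; in the usual notation the two models it contrasts are (a standard formulation, not necessarily the slide's exact layout):

$$\text{Simple: } Y = \beta_0 + \beta_1 x + \varepsilon, \qquad \text{General: } Y = X\beta + \varepsilon, \quad \varepsilon \sim \mathcal{N}(0, \sigma^2 I).$$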
  30. 30. Error of the General Linear model
  31. 31. Restrictions of Linear models Although the General linear model is a useful framework, it is not appropriate in the following cases: •  The range of Y is restricted (e.g. binary, count, positive/negative) •  Var[Y] depends on the mean E[Y] (for the Gaussian they are independent) Name / Mean / Variance: Bernoulli(p): p, p(1 − p); Binomial(n, p): np, np(1 − p); Poisson(µ): µ, µ
  32. 32. Binary response Y in {0, 1} •  Bernoulli(p) is a discrete r.v. with two possible outcomes, with probabilities p and q = 1 − p •  The parameter p does not change over time •  Bernoulli is a building block for other, more complicated distributions •  Examples: coin flips {Heads, Tails} (p = 0.5 if the coin is unbiased), click on an ad, fail/success on an exam
  33. 33. Generalized Linear model - Intuition
  34. 34. Exponential Family
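The formulas from this slide are not captured either; one common parameterization of the exponential (dispersion) family underlying GLMs, given here as a standard reference rather than the slide's exact notation, is

$$f(y;\, \theta, \phi) = \exp\!\left(\frac{y\theta - b(\theta)}{a(\phi)} + c(y, \phi)\right),$$

and the Gaussian, Bernoulli and Poisson distributions from the earlier slide can all be written in this form.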
  35. 35. General linear model
  36. 36. Binary Data
  37. 37. Modeling Count / Poisson Data
  38. 38. Maximizing the Log-Likelihood and Parameter Estimation
  39. 39. Preprocessing
  40. 40. Pipeline
  41. 41. Problems with feature types •  Large number of features -> Dimensionality reduction -> SVD, PCA – Dimensionality reduction: "compress" the data from a high-dimensional representation into a lower-dimensional one (useful for visualization or as an internal transformation for other ML algorithms) •  Sparse features -> Hashing
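A minimal sketch of both remedies, assuming scikit-learn (PCA/TruncatedSVD for dimensionality reduction, FeatureHasher for the hashing trick); the data and feature names are made up for illustration:

```python
import numpy as np
from sklearn.decomposition import PCA, TruncatedSVD
from sklearn.feature_extraction import FeatureHasher

# Many dense features -> project onto a few components
X_dense = np.random.rand(100, 50)                             # 100 samples, 50 features
X_pca = PCA(n_components=5).fit_transform(X_dense)
X_svd = TruncatedSVD(n_components=5).fit_transform(X_dense)   # also works on sparse input

# Sparse, high-cardinality categorical features -> fixed-size hashed representation
raw = [{"city": "Sofia", "browser": "firefox"},
       {"city": "Plovdiv", "browser": "chrome"}]
X_hashed = FeatureHasher(n_features=16, input_type="dict").transform(raw)

print(X_pca.shape, X_svd.shape, X_hashed.shape)   # (100, 5) (100, 5) (2, 16)
```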
  42. 42. SVD – Dimensionality Reduction •  Instead of using two coordinates (x, y) to describe point locations, let's use only one coordinate (z) •  A point's position is its location along the vector v₁ •  How to choose v₁? Minimize the reconstruction error (figure: movie-rating points projected onto v₁, the first right singular vector; axes: Movie 1 rating, Movie 2 rating)
  43. 43. SVD – Dimensionality Reduction: More details •  Q: How exactly is dim. reduction done? •  A: Set the smallest singular values to zero (worked example: a 7×5 user-movie rating matrix A is factored as A ≈ U Σ Vᵀ with singular values 12.4, 9.5 and 1.3)
  44. 44. SVD – Dimensionality Reduction: More details •  The same factorization A ≈ U Σ Vᵀ, shown before truncation
  45. 45. SVD – Dimensionality Reduction: More details •  After setting the smallest singular value (1.3) to zero, U, Σ and Vᵀ are truncated to rank 2
  46. 46. SVD – Dimensionality Reduction (PCA generalization): More details •  Q: How exactly is dim. reduction done? •  A: Set the smallest singular values to zero •  The rank-2 reconstruction B stays close to the original rating matrix A: ‖A − B‖F = √(Σij (Aij − Bij)²) is "small" •  Frobenius norm: ‖M‖F = √(Σij Mij²)
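The numeric example on slides 43-46 can be reproduced directly; a small numpy sketch using the same 7×5 rating matrix:

```python
# Factor the user-movie rating matrix, zero out the smallest singular value,
# and measure the reconstruction error in the Frobenius norm.
import numpy as np

A = np.array([[1, 1, 1, 0, 0],
              [3, 3, 3, 0, 0],
              [4, 4, 4, 0, 0],
              [5, 5, 5, 0, 0],
              [0, 2, 0, 4, 4],
              [0, 0, 0, 5, 5],
              [0, 1, 0, 2, 2]], dtype=float)

U, s, Vt = np.linalg.svd(A, full_matrices=False)
print(np.round(s, 1))                      # ~ [12.4  9.5  1.3  0.  0.]

k = 2                                      # keep the two largest singular values
B = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]  # rank-2 reconstruction

frob_err = np.linalg.norm(A - B, "fro")    # ||A - B||_F, ~1.3, i.e. "small"
print(np.round(B, 2), frob_err)
```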
  47. 47. Feature selection - example
  48. 48. Dummy Encoding
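A minimal sketch of dummy (one-hot) encoding, assuming pandas; the toy DataFrame is made up for illustration:

```python
# Turn a categorical column into 0/1 indicator columns before fitting a linear model.
import pandas as pd

df = pd.DataFrame({"city": ["Sofia", "Plovdiv", "Varna", "Sofia"],
                   "clicks": [3, 1, 2, 5]})

# drop_first=True avoids the dummy-variable trap (collinearity with the intercept)
encoded = pd.get_dummies(df, columns=["city"], drop_first=True)
print(encoded)
```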
  49. 49. (De)Motivation
  50. 50. Solution to those problems with features
  51. 51. Pipeline
  52. 52. Factorization Machine (degree 2)
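The degree-2 factorization machine from Rendle (2010), cited in the references, predicts ŷ(x) = w0 + Σᵢ wᵢxᵢ + Σ_{i<j} ⟨vᵢ, vⱼ⟩xᵢxⱼ. A small numpy sketch of that prediction (the parameters are random placeholders, just to show the O(k·n) computation of the pairwise term):

```python
import numpy as np

def fm_predict(x, w0, w, V):
    """x: (n,) feature vector, w0: bias, w: (n,) linear weights, V: (n, k) latent factors."""
    linear = w0 + w @ x
    # sum_{i<j} <v_i, v_j> x_i x_j = 0.5 * sum_f [ (sum_i V_if x_i)^2 - sum_i V_if^2 x_i^2 ]
    pairwise = 0.5 * np.sum((V.T @ x) ** 2 - (V ** 2).T @ (x ** 2))
    return linear + pairwise

rng = np.random.default_rng(0)
n, k = 6, 3                            # 6 (typically sparse, one-hot) features, 3 factors
x = np.array([1, 0, 0, 1, 0, 2.5])     # e.g. user indicator, item indicator, numeric feature
w0, w, V = 0.1, rng.normal(size=n), rng.normal(scale=0.1, size=(n, k))
print(fm_predict(x, w0, w, V))
```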
  53. 53. General Applications of FMs
  54. 54. Summary Pipeline
  55. 55. Pipeline
  56. 56. From prototype to production •  Prototype vs. production time? – the model (pipeline) should stay the same
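A minimal sketch of that idea, assuming scikit-learn and joblib (not named on the slide): preprocessing and model live in a single Pipeline object, so the artifact moved to production is exactly what was prototyped.

```python
import joblib
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression

X = np.random.rand(100, 4)
y = (X[:, 0] + X[:, 1] > 1).astype(int)

pipe = Pipeline([("scale", StandardScaler()),
                 ("clf", LogisticRegression())])
pipe.fit(X, y)                        # prototype: fit the whole pipeline at once

joblib.dump(pipe, "model.joblib")     # production: ship the pipeline, not its pieces
served = joblib.load("model.joblib")
print(served.predict(X[:5]))
```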
  57. 57. Libraries
  58. 58. Questions?
  59. 59. Thank you!!!
  60. 60. References •  https://www.coursera.org/learn/machine-learning •  http://www.cs.cmu.edu/~tom/ •  http://scikit-learn.org/stable/ •  http://www.scalanlp.org/ •  http://www.algo.uni-konstanz.de/members/rendle/pdf/Rendle2010FM.pdf •  https://securityintelligence.com/factorization-machines-a-new-way-of-looking-at-machine-learning/
  61. 61. References •  An Introduction to Generalized Linear Models – Annette Dobson, Adrian Barnett •  Applying Generalized Linear Models – James Lindsey •  https://www.codementor.io/jadianes/building-a-recommender-with-apache-spark-python-example-app-part1-du1083qbw •  https://www.chrisstucchio.com/blog/index.html
