
Introduction to Data Science and Large-scale Machine Learning


  1. Large Scale Distributed Data Science using Spark, KDD 2015 © James G. Shanahan. Contact: James.Shanahan @ gmail.com. How Data and Data Science are Revolutionizing the World. James G. Shanahan (IoTGurus; iSchool, UC Berkeley, CA). Email: James_DOT_Shanahan_AT_gmail_DOT_com. November 16, 2016.
  2. Outline • Introduction • Artificial Intelligence • Machine Learning – Empirical: Sport – Netflix – Dashboards • Data Science • Applications • Architecture • What’s next?
  3. James G. Shanahan: 25+ years in data science. Technology (25+ years): systems, parallel computing, Hadoop, Spark, Python, R, Scala, Java. Domain expertise (25+ years): digital advertising and marketing; web, mobile, and local search; anticipatory information systems; cellular networks; social networks. Math & theory (16+ years): statistics, optimization theory, probability, social network analytics, geo-information science, HCI, graphs, NLP. Leadership, business acumen, and teaching (16+ years): led R&D teams at Xerox Research, AT&T, Turn, NativeX, and Adobe; entrepreneur; teaches at UC Berkeley.
  4. James G. Shanahan • 25+ years in data science • Currently – Principal and Founder, Data Science Consultancy • Clients: Target, Adobe, Akamai, Ancestry, AT&T, Nokia Siemens, SearchMe, … – Teaching • Co-creator of the UC Berkeley MIDS program; curriculum development • Teaches Large Scale Machine Learning (Fall 2014, 2015, 2016) • Teaches Machine Learning and Optimization Theory at University of California Santa Cruz (UCSC): TIM 206, TIM 209, TIM 250, TIM 251 (since 2008) – Advising: Quixey, InferSystems, Knotch • Previously – NativeX: SVP of Data Science, Chief Scientist, and board member – Founding Chief Scientist, Turn Inc. – Principal Scientist, Clairvoyance Corp (CMU spinoff; sister lab to JRC) – Research Scientist, Xerox Research – Entrepreneur: co-founder of Document Souls and RTB Fast • Education: PhD in ML, University of Bristol, UK; B.Sc. CS, University of Limerick, Ireland
  5. Large-Scale Machine Learning, MIDS, UC Berkeley © 2015 James G. Shanahan. Contact: James.Shanahan @ gmail.com. Audience participation is encouraged!
  6. Outline • Introduction • Artificial Intelligence • Machine Learning • Data Science • Applications • What’s next?
  7. Data science everywhere
  8. Traditional Data Science
  9. Deep Learning
  11. What is Intelligence? • Intelligence: – “the capacity to learn and solve problems” (Webster’s dictionary) – in particular, • the ability to solve novel problems • the ability to act rationally • the ability to act like humans • Artificial Intelligence – build and understand intelligent entities or agents – 2 main approaches: “engineering” versus “cognitive modeling”
  12. What’s involved in Intelligence? • Ability to interact with the real world – to perceive, understand, and act – e.g., speech recognition, understanding, and synthesis – e.g., image understanding – e.g., ability to take actions, have an effect • Reasoning and Planning – modeling the external world, given input – solving new problems, planning, and making decisions – ability to deal with unexpected problems, uncertainties • Learning and Adaptation – we are continuously learning and adapting – our internal models are always being “updated” • e.g., a baby learning to categorize and recognize animals
  13. Can machines think? → The Turing Test • In the test, an interrogator converses with a man and a machine via a text-based channel. – If the interrogator fails to guess which one is the machine, then the machine is said to have passed the Turing test. (This is a simplification; there are more nuances in and variants of the Turing test, but these are not relevant for our present purposes.) • The beauty of the Turing test is its simplicity and its objectivity: it is only a test of behavior, not of the internals of the machine. It doesn’t care whether the machine uses logical methods or neural networks. This decoupling of what to solve from how to solve is an important theme in this class.
  15. What can AI do for you? • Instead of asking what AI is, let us turn to the more pragmatic question of what AI can do for you. We will go through some examples where AI has already had a substantial impact on society.
  16. Academic Disciplines relevant to AI • Philosophy: logic, methods of reasoning, mind as physical system, foundations of learning, language, rationality • Mathematics: formal representation and proof, algorithms, computation, (un)decidability, (in)tractability • Probability/Statistics: modeling uncertainty, learning from data • Economics: utility, decision theory, rational economic agents • Neuroscience: neurons as information-processing units • Psychology/Cognitive Science: how people behave, perceive, process information, represent knowledge • Computer Engineering: building fast computers • Control Theory: designing systems that maximize an objective function over time • Linguistics: knowledge representation, grammars
  17. History of AI • 1943: early beginnings – McCulloch & Pitts: Boolean circuit model of the brain • 1950: Turing – Turing’s “Computing Machinery and Intelligence” • 1956: birth of AI – Dartmouth meeting: “Artificial Intelligence” name adopted • 1950s: initial promise – Early AI programs, including – Samuel’s checkers program – Newell & Simon’s Logic Theorist • 1955-65: “great enthusiasm” – Newell and Simon: GPS, general problem solver – Gelernter: Geometry Theorem Prover – McCarthy: invention of LISP
  18. History of AI • 1966-73: Reality dawns – Realization that many AI problems are intractable – Limitations of existing neural network methods identified • Neural network research almost disappears • 1969-85: Adding domain knowledge – Development of knowledge-based systems – Success of rule-based expert systems • E.g., DENDRAL, MYCIN • But these were brittle and did not scale well in practice • 1986-: Rise of machine learning – Neural networks return to popularity – Major advances in machine learning algorithms and applications • 1990-: Role of uncertainty – Bayesian networks as a knowledge representation framework • 1995-: AI as Science – Integration of learning, reasoning, knowledge representation – AI methods used in vision, language, data mining, etc.
  19. http://www.andreykurenkov.com/writing/images/2016-4-15-a-brief-history-of-game-ai/0-history.png
  20. Success Stories • Deep Blue defeated the reigning world chess champion Garry Kasparov in 1997 • An AI program proved a mathematical conjecture (the Robbins conjecture) unsolved for decades • During the 1991 Gulf War, US forces deployed an AI logistics planning and scheduling program that involved up to 50,000 vehicles, cargo, and people • NASA’s on-board autonomous planning program controlled the scheduling of operations for a spacecraft • Proverb solves crossword puzzles better than most humans
  21. Can Computers beat Humans at Chess? • Chess playing is a classic AI problem – well-defined problem – very complex: difficult for humans to play well • (Chart: chess ratings, 1966-1997 – human world champion vs. Deep Thought and Deep Blue.)
  22. Summary of State of AI Systems in Practice • Speech synthesis, recognition and understanding – very useful for limited vocabulary applications – unconstrained speech understanding is still too hard • Computer vision – works for constrained problems (hand-written zip-codes) – understanding real-world, natural scenes is still too hard • Learning – adaptive systems are used in many applications: have their limits • Planning and Reasoning – only works for constrained problems: e.g., chess – real-world is too complex for general systems • Overall: – many components of intelligent systems are “doable” – there are many interesting research problems remaining
  23. Can Computers Talk? • This is known as “speech synthesis” – translate text to phonetic form • e.g., “fictitious” -> fik-tish-es – use pronunciation rules to map phonemes to actual sound • e.g., “tish” -> sequence of basic audio sounds • Difficulties – sounds made by this “lookup” approach sound unnatural – sounds are not independent • e.g., “act” and “action” • modern systems (e.g., at AT&T) can handle this pretty well – a harder problem is emphasis, emotion, etc. • humans understand what they are saying • machines don’t: so they sound unnatural • Conclusion: – NO, for complete sentences – YES, for individual words
  24. Can Computers Recognize Speech? • Speech Recognition: – mapping sounds from a microphone into a list of words – classic problem in AI, very difficult • “Lets talk about how to wreck a nice beach” • (I really said “________________________”) • Recognizing single words from a small vocabulary • systems can do this with high accuracy (order of 99%) • e.g., directory inquiries – limited vocabulary (area codes, city names) – computer tries to recognize you first, if unsuccessful hands you over to a human operator – saves millions of dollars a year for the phone companies
  25. Recognizing human speech (ctd.) • Recognizing normal speech is much more difficult – speech is continuous: where are the boundaries between words? • e.g., “John’s car has a flat tire” – large vocabularies • can be many thousands of possible words • we can use context to help figure out what someone said – e.g., hypothesize and test – try telling a waiter in a restaurant: “I would like some dream and sugar in my coffee” – background noise, other speakers, accents, colds, etc. – on normal speech, modern systems are only about 60-70% accurate • Conclusion: – NO, normal speech is too complex to accurately recognize – YES, for restricted problems (small vocabulary, single speaker)
  26. Can Computers Understand Speech? • Understanding is different from recognition: – “Time flies like an arrow” • assume the computer can recognize all the words • how many different interpretations are there? – 1. time passes quickly, like an arrow – 2. command: time the flies the way an arrow times the flies – 3. command: only time those flies which are like an arrow – 4. “time-flies” are fond of arrows • only 1. makes any sense – but how could a computer figure this out? – clearly humans use a lot of implicit commonsense knowledge in communication • Conclusion: NO, much of what we say is beyond the capabilities of a computer to understand at present
  29. Can Computers Learn and Adapt? • Learning and Adaptation – consider a computer learning to drive on the freeway – we could teach it lots of rules about what to do – or we could let it drive and steer it back on course when it heads for the embankment • systems like this are under development (e.g., Daimler Benz) • e.g., RALPH at CMU – in the mid-’90s it drove 98% of the way from Pittsburgh to San Diego without any human assistance – machine learning allows computers to learn to do things without explicit programming – many successful applications • requires some “set-up”: does not mean your PC can learn to forecast the stock market or become a brain surgeon • Conclusion: YES, computers can learn and adapt, when presented with information in the appropriate way
  30. Can Computers “See”? • Recognition vs. Understanding (as with speech) – Recognition and understanding of objects in a scene • look around this room • you can effortlessly recognize objects • the human brain can map a 2D visual image to a 3D “map” • Why is visual recognition a hard problem? • Conclusion: – mostly NO: computers can only “see” certain types of objects under limited circumstances – YES for certain constrained problems (e.g., face recognition)
  31. Can computers plan and make optimal decisions? • Intelligence – involves solving problems and making decisions and plans – e.g., you want to take a holiday in Brazil • you need to decide on dates, flights • you need to get to the airport, etc. • involves a sequence of decisions, plans, and actions • What makes planning hard? – the world is not predictable: • your flight is canceled or there’s a backup on the 405 – there are a potentially huge number of details • do you consider all flights? all dates? – no: commonsense constrains your solutions – AI systems are only successful in constrained planning problems • Conclusion: NO, real-world planning and decision-making is still beyond the capabilities of modern computers – exception: very well-defined, constrained problems
  35. Separate what to compute (modeling) from how to compute it (algorithms).
  36. Lecture Outline • Introduction • Artificial Intelligence • Machine Learning • Data Science • Applications • What’s next?
  38. What is machine learning?
  39. Machine learning • Supporting all of these models is machine learning. • In the non-machine-learning approach, one would write a complex program (remember, we are solving tasks of significant complexity), but this gets very tedious. – For example, how should a spellchecker know that for “hte”, “the” (transposition) is more likely to be the correct output than “hate” (insertion)? • The machine learning approach is instead to write a really simple program with unknown parameters (e.g., numbers measuring how bad it is to transpose or insert characters). • Then, we obtain a set of training examples that partially specifies the desired system behavior. A learning algorithm takes these training examples and sets the parameters of our simple program so that the resulting program approximately produces the desired system behavior. • Abstractly, machine learning allows us to shift the complexity from the program to the data, which is much easier to obtain (either naturally occurring or via crowdsourcing).
  40. Equation of a line: y = mx + b, i.e., f(x) = mx + b
  41. Machine Learning in one slide • Machine learning, a branch of artificial intelligence, is a scientific discipline concerned with the design and development of algorithms that allow computers to evolve behaviors based on empirical data, such as sensor data or databases. • A learner can take advantage of examples (data) to capture characteristics of interest of their unknown underlying probability distribution. Data can be seen as examples that illustrate relations between observed variables. • A major focus of machine learning research is to automatically learn to recognize complex patterns and make intelligent decisions based on data; the difficulty lies in the fact that the set of all possible behaviors given all possible inputs is too large to be covered by the set of observed examples (training data). • Hence the learner must generalize from the given examples, so as to be able to produce a useful output in new cases. Machine learning, like all subjects in artificial intelligence, requires cross-disciplinary proficiency in several areas, such as probability theory, statistics, pattern recognition, cognitive science, data mining, adaptive control, computational neuroscience, and theoretical computer science.
  42. What is the Learning Problem? • Improve over Task T • with respect to performance measure P • based on experience E • Learning = improving with experience at some task
  43. Types of Learning • Supervised learning – generates a function that maps inputs to desired outputs. For example, in a classification problem, the learner approximates a function mapping a vector into classes by looking at input-output examples of the function. • Unsupervised learning – models a set of inputs: like clustering. • Semi-supervised learning – combines both labeled and unlabeled examples to generate an appropriate function or classifier. • Reinforcement learning – learns how to act given an observation of the world. Every action has some impact in the environment, and the environment provides feedback in the form of rewards that guides the learning algorithm. • Transduction – tries to predict new outputs based on training inputs, training outputs, and test inputs.
  44. Supervised Learning: Regression • Regression – Linear Regression • Classification – Logistic Regression • Generalized Linear Models (GLMs) – a broader family of models (subsumes linear regression, logistic regression, and more) – In R, check out ?glm() • Parametric approaches vs. non-parametric • Convex/concave • Discriminative versus generative
  45. Classification versus Regression • Classification is just like a regression problem, except that the values of y we now want to predict take on only a small number of discrete values (assume no order in y) • Binary logistic regression – For now let’s focus on binary classification, where y can take on two values, 0 and 1 (this can be generalized to the multi-class case) • E.g., building an ancestor classifier: a person is an ancestor (y = 1) or not (y = 0) – Given Xi, the corresponding yi is known as the label for the training data
  46. Families of Supervised Learning • Generative Classifier (bottom-up learning) – Build a model of each class – Assume the underlying form of the classes and estimate their parameters (e.g., a Gaussian) • Discriminative Classifier (top-down) – Build a model of the boundary between classes – Assume the underlying form of the discriminant and estimate its parameters (e.g., a hyperplane) • (Figure: document classes Sports, Arts, Business, Health – modeled per class vs. separated by a boundary.)
  47. Terminology: linear regression • Model: y = w0 + w1x1 + w2x2 + … + wnxn • The wi are the model coefficients; w0 is the y-intercept/threshold • y is called the response variable, outcome variable, dependent variable, or predicted variable • The xi’s are called predictor variables, explanatory variables, covariables, or independent variables
  48. Pr(Click): Advertising Problem • Predict Pr(Click | dwellTimeOnWebpage) – at the times 1, 2, 3, 4, and 5 seconds after loading the page. • Graph each data point with time on the x-axis and CTR on the y-axis. Your data should follow a straight line. • Use locator() to input data • Find the equation of this line. • Data (x = time in seconds, y = CTR %): (1, 2), (2, 3), (3, 7), (4, 8), (5, 9) • X are features, aka variables: continuous, discrete, ordinal (X ∈ ℝⁿ)
  49. Least Squares Fit Approximations • Suppose we want to fit the data set below. We would like to find the best straight line to fit the data. • Data (x, y): (1, 2), (2, 3), (3, 7), (4, 8), (5, 9)
  50. Fit a line based on… • If we assume that the first two points are correct and choose the line that goes through them, we get the line y = 1 + x. • If we substitute our points (x-values) into this equation, we can compare the predictions to the observed y-values. • How good is this line? – The sum of the squares of the errors is 27 (SSE = 27). Do you think that we can do better than this?
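The SSE calculation on this slide can be sketched in a few lines of Python, assuming the five data points from the earlier table; the helper name `sse` is illustrative, not from the slides:

```python
# Data points from the table: x = 1..5, y = 2, 3, 7, 8, 9.
xs = [1, 2, 3, 4, 5]
ys = [2, 3, 7, 8, 9]

def sse(intercept, slope):
    """Sum of squared errors for the candidate line y = intercept + slope * x."""
    return sum((intercept + slope * x - y) ** 2 for x, y in zip(xs, ys))

# The line y = 1 + x, fit through the first two points:
print(sse(1, 1))  # -> 27
```

The same helper reproduces the SSE values quoted on the later slides for y = 4 + x and y = 2.5 + x.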
  51. Linear Model More Generally • E.g., y = mx + b can more generally be seen as a function of the form y = f(x0, x1) = w0x0 + w1x1 = Σᵢ wᵢxᵢ = WᵀX (with a dummy x0 = 1; sometimes w is used instead of W) • Here the W’s are the parameters (also called weights) parameterizing the space of linear functions mapping from X to Y = f(X) • Augmented data (x0, x1, y): (1, 1, 2), (1, 2, 3), (1, 3, 7), (1, 4, 8), (1, 5, 9) • (Figure: line with slope m and intercept b.)
  53. Machine Learning Background • Machine Learning (ML): “a computer program that improves its performance at some task through experience” [Mitchell 1997] • GIVEN: input data is a table of attribute values and associated class values (in the case of supervised learning) • GOAL: approximate f(x1, …, xn) -> y • The table has L (aka m) instances with attributes x1, x2, …, xn and a class value y ∈ {−1, +1}; here y is categorical
  54. Machine Learning: Regression • Machine Learning (ML): “a computer program that improves its performance at some task through experience” [Mitchell 1997] • GIVEN: input data is a table of attribute values and associated target values • GOAL: approximate f(x1, …, xn) -> y • Same table of L (aka m) instances, but now y is real-valued (e.g., 73, 76, …, 97)
  55. Machine Learning: Semi-supervised • Machine Learning (ML): “a computer program that improves its performance at some task through experience” [Mitchell 1997] • GIVEN: input data is a table of attribute values • GOAL: approximate f(x1, …, xn) -> y • Same table of instances, but y is only partially available
  56. Machine Learning: Unsupervised • Machine Learning (ML): “a computer program that improves its performance at some task through experience” [Mitchell 1997] • GIVEN: input data is a table of attribute values • GOAL: approximate f(x1, …, xn) -> y • Same table of instances, but y is not available
  58. Generative vs. Discriminative • Generative learning (e.g., Bayesian Networks, HMM, Naïve Bayes, EM GMM) typically more flexible – More complex problems – More flexible predictions • Discriminative learning (e.g., ANN, SVM) typically more accurate – Better with small datasets – Faster to train
  59. Parametric vs. Non-Parametric ML Algorithms • Parametric ML algorithms (e.g., OLS, decision trees, SVMs, NNs) – Model-based methods, such as neural networks and the mixture of Gaussians, use the data to build a parameterized model. After training, the model is used for predictions and the data are generally discarded. • Non-parametric (lowess(); kNN; some flavours of SVMs) – In contrast, “memory-based” methods are non-parametric approaches that explicitly retain the training data and use it each time a prediction needs to be made. – The term “non-parametric” (roughly) refers to the fact that the amount of stuff we need to keep in order to represent the hypothesis/model grows linearly with the size of the training set.
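As a minimal illustration of a memory-based method, here is a sketch of 1-nearest-neighbour regression in Python; the data reuses the slide's five points, and the name `predict_1nn` is illustrative:

```python
# Memory-based (non-parametric) learning: the "model" is simply the
# retained training set, which is scanned at prediction time.
train_x = [1.0, 2.0, 3.0, 4.0, 5.0]
train_y = [2.0, 3.0, 7.0, 8.0, 9.0]

def predict_1nn(x):
    # Return the target of the closest stored training point.
    i = min(range(len(train_x)), key=lambda j: abs(train_x[j] - x))
    return train_y[i]

print(predict_1nn(3.2))  # -> 7.0
```

Note that the storage (and prediction cost) grows with the training set, which is exactly the sense in which the method is "non-parametric".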
  60. Linear Model: Ordinary Least Squares • How do we pick, or learn, the parameters W (aka θ)? • One reasonable method is to make f(x) close to y, at least for the training examples. • To formalize, let’s define a function that measures, for each possible model/hypothesis W, how close the f(xᵢ)’s are to the corresponding yᵢ’s: J(W) = ½ Σᵢ₌₁ᵐ (XᵢW − yᵢ)² • This is the sum of squared errors, AKA the residual sum of squares. • Is this error minimization going to have problems?
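The cost J(W) can be evaluated directly with NumPy; a sketch assuming the slide's five data points, with a dummy intercept column x0 = 1:

```python
import numpy as np

# Augmented design matrix (intercept column of ones) and targets.
X = np.array([[1, 1], [1, 2], [1, 3], [1, 4], [1, 5]], dtype=float)
y = np.array([2, 3, 7, 8, 9], dtype=float)

def J(W):
    """Half the residual sum of squares: 1/2 * sum_i (X_i . W - y_i)^2."""
    r = X @ W - y
    return 0.5 * float(r @ r)

print(J(np.array([1.0, 1.0])))  # -> 13.5 (half of SSE = 27 for y = 1 + x)
```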
  61. Residual • residualᵢ = XᵢW − yᵢ • (Figure: scatter plot of x vs. y with a fitted line; the vertical distance from each point to the line is its residual.)
  62. 62. Large-Scale Machine Learning, MIDS, UC Berkeley © 2015 James G. Shanahan Contact:James.Shanahan @ gmail.com 63 Which Line is it anyway? • Select another two points and build a line • If we choose the line that goes through the points when x = 3 and 4, we get the line y = 4 + x. Will we get a better fit? Let's look at it. SSE = 18. Getting better but can we do better?
  63. 63. Large-Scale Machine Learning, MIDS, UC Berkeley © 2015 James G. Shanahan Contact:James.Shanahan @ gmail.com 64 Can we do better than guesswork? • Let's try the line that is half way between these two lines. The equation would be y = 2.5 + x. • Is there a more scientific or efficient way than guessing at which line would give the best fit. – Surely there is a methodical way to determine the best fit line. Let's think about what we want. SSE = 11.25. Getting better but can we do better?
64. 64. Large-Scale Machine Learning, MIDS, UC Berkeley © 2015 James G. Shanahan Contact:James.Shanahan @ gmail.com 65 Hypothesis Space of Linear Models • Here the W’s are the parameters (also called weights) parameterizing the space of linear functions mapping from X to Y: y = f(x0, x1) = w0 x0 + w1 x1 = Σ_{i=0}^{n} w_i x_i = W^T X (sometimes w is used instead of W) • Augment Training Data with a dummy intercept variable x0 = 1 (simplifies notation and modeling): rows (x0, x1, y) = (1, 1, 2), (1, 2, 3), (1, 3, 7), (1, 4, 8), (1, 5, 9)
65. 65. Large-Scale Machine Learning, MIDS, UC Berkeley © 2015 James G. Shanahan Contact:James.Shanahan @ gmail.com 66 Space of Hypotheses: Weights example.OLS_Heatmap() • Each model is in our case a coefficient for the y-intercept (bias) and a coefficient for the feature variable (time) • Plot weight-space in 2D where the third dimension is the error J(W) = ½ Σ_{i=1}^{m} (W X_i − y_i)² • Select the combination that minimizes the sum of squared error • HeatMap with isolines overlaid; 3D error surface (z on a log scale)
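The heatmap idea, evaluate the error over a grid in weight space and keep the minimising (w0, w1) pair, can be sketched as follows (the grid ranges and step are arbitrary choices for illustration):

```python
# Sketch of the slide's weight-space error surface: grid-search (w0, w1).

def sq_error(w0, w1, data):
    """Sum of squared residuals for the line y = w0 + w1*x."""
    return sum((w0 + w1 * x - y) ** 2 for x, y in data)

data = [(1, 2), (2, 3), (3, 7), (4, 8), (5, 9)]

# Grid of candidate weights, step 0.1 (w0 in [-2, 4], w1 in [0, 3]).
grid = [(w0 / 10, w1 / 10) for w0 in range(-20, 41) for w1 in range(0, 31)]

best_w0, best_w1 = min(grid, key=lambda w: sq_error(w[0], w[1], data))
```

Each grid cell corresponds to one pixel of the heatmap; the brightest (lowest-error) cell is the fitted line, which beats the hand-guessed lines from the previous slides.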
66. 66. Large-Scale Machine Learning, MIDS, UC Berkeley © 2015 James G. Shanahan Contact:James.Shanahan @ gmail.com 67 Hyperplanes partition the input space (a line in a 2-input-variable problem) and do NOT predict real values • Many methods in machine learning are based on finding parameters that minimize some objective function. • Very often, the objective function is a weighted sum of two terms: a cost function and a regularization term (in statistics terms, the (log-)likelihood and the (log-)prior). – If both of these components are convex, then their sum is also convex. – Loss functions are summed over examples, and a sum of convex functions is a convex function. • Given a linear model W, consider the loss function for y = f(X1) where y is real-valued vs. y = f(X1, X2) where y is in {0,1} or {−1,1}: – Regression: the prediction line y = mX + b predicts real values. – Classification: the separating hyperplane AX1 + BX2 + C = 0 partitions the input space; Class(X1, X2) = sign(AX1 + BX2 + C). Partitions versus predicts.
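The partitions-versus-predicts distinction can be illustrated with a tiny sketch: the same linear score serves both roles, and only the final step differs. The weights below are made up for illustration.

```python
# Sketch: one linear score, two uses (regression vs. classification).

def score(w, b, x):
    """Linear score w . x + b."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def predict_regression(w, b, x):
    """Regression: the score itself is the real-valued prediction."""
    return score(w, b, x)

def classify(w, b, x):
    """Classification: only which side of the hyperplane x falls on."""
    return 1 if score(w, b, x) >= 0 else -1

w, b = [2.0, -1.0], 0.5      # hypothetical weights (A, B) and intercept C
reg_val = predict_regression(w, b, [1.0, 1.0])
cls_pos = classify(w, b, [1.0, 1.0])
cls_neg = classify(w, b, [0.0, 1.0])
```

The decision boundary is the set of points where the score is exactly zero, i.e. the hyperplane AX1 + BX2 + C = 0 from the slide.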
67. 67. Large-Scale Machine Learning, MIDS, UC Berkeley © 2015 James G. Shanahan Contact:James.Shanahan @ gmail.com 68 Unsupervised Learning (Clustering) Input data We want 3 clusters: red, green and blue
  68. 68. Large-Scale Machine Learning, MIDS, UC Berkeley © 2015 James G. Shanahan Contact:James.Shanahan @ gmail.com 69 UNIVERSITY OF SCIENCE AND TECHNOLOGY OF CHINA SCHOOL OF SOFTWARE ENGINEERING OF USTC Center of a cluster Let’s compute the center of those points
69. 69. Large-Scale Machine Learning, MIDS, UC Berkeley © 2015 James G. Shanahan Contact:James.Shanahan @ gmail.com 70 Center of a cluster We can use the mean on each dimension
70. 70. Large-Scale Machine Learning, MIDS, UC Berkeley © 2015 James G. Shanahan Contact:James.Shanahan @ gmail.com 71 Center of a cluster We can use the mean on each dimension
71. 71. Large-Scale Machine Learning, MIDS, UC Berkeley © 2015 James G. Shanahan Contact:James.Shanahan @ gmail.com 72 Center of a cluster We can use the mean on each dimension
72. 72. Large-Scale Machine Learning, MIDS, UC Berkeley © 2015 James G. Shanahan Contact:James.Shanahan @ gmail.com 73 Center of a cluster But the mean has trouble with outliers
73. 73. Large-Scale Machine Learning, MIDS, UC Berkeley © 2015 James G. Shanahan Contact:James.Shanahan @ gmail.com 74 Center of a cluster Using the median on each dimension is more robust
74. 74. Large-Scale Machine Learning, MIDS, UC Berkeley © 2015 James G. Shanahan Contact:James.Shanahan @ gmail.com 75 Assignment All points coloured properly already ⇒ we are done!
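The mean-versus-median point from these slides in a quick sketch (the point coordinates, including the outlier, are made up for illustration):

```python
# Sketch: per-dimension mean vs. median as a cluster centre.
from statistics import mean, median

points = [(1, 1), (2, 1), (1, 2), (2, 2), (50, 50)]   # one extreme outlier

# Mean centre: pulled far toward the outlier.
mean_centre = tuple(mean(c) for c in zip(*points))

# Median centre: stays with the bulk of the cluster.
median_centre = tuple(median(c) for c in zip(*points))
```

Four of the five points sit near (1.5, 1.5); the single outlier drags the mean centre an order of magnitude away, while the median centre barely moves.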
75. 75. Large-Scale Machine Learning, MIDS, UC Berkeley © 2015 James G. Shanahan Contact:James.Shanahan @ gmail.com 76 Three generations of machine learning • First generation: datasets that fit in memory – Single-node learning: summary statistics and some batch modeling (at small scale); SQL, R – Down-sampling the data • Second generation: general-purpose clusters and frameworks – Distributed frameworks that allow us to divide and conquer problems – Learning using general-purpose frameworks such as Hadoop: big data analysis offline, real-time decision making, homegrown specialist systems; Hadoop, R – In-house purpose-built systems; a specialist sport • Third generation: purpose-built libraries and frameworks – Built for the iterative algorithms that are commonplace in ML – Huge-scale real-time analysis and decision-making systems – Specialized frameworks for large-scale manipulation of the type of data you are working with – For example, machine learning libraries like MLlib in Spark, graph processing libraries like Apache Giraph or GraphX in Spark
  76. 76. Large-Scale Machine Learning, MIDS, UC Berkeley © 2015 James G. Shanahan Contact:James.Shanahan @ gmail.com 77 Evolution of Map-Reduce frameworks for big data processing mid 90s Jimi’s PhD First generation 2nd generation 2015 Spark 1.5 As of 10/2015Spark 1.0 3rd generation Hadoop V2.0
  77. 77. Large-Scale Machine Learning, MIDS, UC Berkeley © 2015 James G. Shanahan Contact:James.Shanahan @ gmail.com 78 Top 10 ML Algorithms • .. https://www.dezyre.com/article/top-10- machine-learning-algorithms/202 Naïve Bayes Classifier K Means Clustering Algorithm Nearest Neighbours Apriori Algorithm Linear Regression Logistic Regression Support Vector Machine Decision Trees Ensembles/Forests Artificial Neural Networks/Deep Learning Reinforcement learning Forecasting Many more!
  78. 78. Large-Scale Machine Learning, MIDS, UC Berkeley © 2015 James G. Shanahan Contact:James.Shanahan @ gmail.com 79 • .. 2005
  79. 79. Large-Scale Machine Learning, MIDS, UC Berkeley © 2015 James G. Shanahan Contact:James.Shanahan @ gmail.com 80 Lecture Outline • Introduction • Artificial Intelligence • Machine Learning • Data Science • Applications • What’s next?
  80. 80. Large-Scale Machine Learning, MIDS, UC Berkeley © 2015 James G. Shanahan Contact:James.Shanahan @ gmail.com 81 Internet companies started the revolution • ..
  81. 81. Large-Scale Machine Learning, MIDS, UC Berkeley © 2015 James G. Shanahan Contact:James.Shanahan @ gmail.com 82 Internet companies started the revolution • .. But more traditional companies are leveraging their data and DS Tech
  82. 82. Large-Scale Machine Learning, MIDS, UC Berkeley © 2015 James G. Shanahan Contact:James.Shanahan @ gmail.com 83 Data Analysis Has Been Around for a While R.A. Fisher Howard Dresner Peter Luhn W.E. Demming 2012: Deep Learning 2013: Spark 1997 Google
83. 83. Large-Scale Machine Learning, MIDS, UC Berkeley © 2015 James G. Shanahan Contact:James.Shanahan @ gmail.com 84 Data Science DS Skillset • Linear regression, DT models for domain experts Domain Expertise A Venn diagram with a Danger Zone [adapted from Drew Conway]
  84. 84. Large-Scale Machine Learning, MIDS, UC Berkeley © 2015 James G. Shanahan Contact:James.Shanahan @ gmail.com 85 Data Science Technology Hadoop, Spark,Python, Scala, Java, R Digital Advertising & Marketing, Econometrics, Web Search, Cellular Networks, Social Networks Statistics, Optimization Theory, Social Network Analytics, Geo-Informational Science Math Domain Expertise Mobile Advertising Adapted from Drew Conway’s Venn diagram of data science DS
  85. 85. Large-Scale Machine Learning, MIDS, UC Berkeley © 2015 James G. Shanahan Contact:James.Shanahan @ gmail.com 86 Data Scientist Technology Hadoop, Spark,Python, Scala, Java, R Digital Advertising & Marketing, Econometrics, Web Search, Cellular Networks, Social Networks Statistics, Optimization Theory, Social Network Analytics, Geo-Informational Science MathDomain Expertise Mobile Advertising Communication DS
  86. 86. Large-Scale Machine Learning, MIDS, UC Berkeley © 2015 James G. Shanahan Contact:James.Shanahan @ gmail.com 87 • .. RockStars and Super Models Technology Math Domain expertise RockStar
87. 87. Large-Scale Machine Learning, MIDS, UC Berkeley © 2015 James G. Shanahan Contact:James.Shanahan @ gmail.com 88 Data Analytics at Scale • Algorithms: Machine Learning and Analytics, Representation, Visualization • Big Data: human-centric, M2M, IoT • Machines: Cloud Computing, storage and compute • Frameworks: MapReduce, HDFS, Hadoop, Spark, MPI • Security/Privacy
88. 88. Large-Scale Machine Learning, MIDS, UC Berkeley © 2015 James G. Shanahan Contact:James.Shanahan @ gmail.com 89 DS is Systems + Theory + Verticals • .. http://support.sas.com/resources/papers/proceedings14/SAS313-2014.pdf • Systems: NoSQL, Hadoop, Spark, MPI • Verticals: Advertising, Voting, Sports, Autonomous Agents, Healthcare, Education • Theory • Visualization • Legal
89. 89. Large-Scale Machine Learning, MIDS, UC Berkeley © 2015 James G. Shanahan Contact:James.Shanahan @ gmail.com 90 Typical Abstract Data Analytics Pipeline 1 Understand domain, collect requirements 2 Exploratory data analysis 3 Feature Engineering 4 Modeling 5 Lab-based experiments 6 Deploy models in the wild (e.g., A/B test) 7 Reports and Decisions; Models and decisions; Data Warehouse
  90. 90. Large-Scale Machine Learning, MIDS, UC Berkeley © 2015 James G. Shanahan Contact:James.Shanahan @ gmail.com 91 Lecture Outline • Google Doc and Group • Welcome & Class Introductions • Big Data and Applications • Course introduction • Class logistics • Systems (part 1 of N)
  91. 91. Large-Scale Machine Learning, MIDS, UC Berkeley © 2015 James G. Shanahan Contact:James.Shanahan @ gmail.com 92 Data Science at Scale Security/Privacy Big Data: human- centric, M2M, IoT Machines: Cloud Computing Parallel Frameworks: MapReduce:cmdLine, Hadoop, MRJob,Spark Algorithms: Machine Learning and Analytics Machine learning at Scale
92. 92. Large-Scale Machine Learning, MIDS, UC Berkeley © 2015 James G. Shanahan Contact:James.Shanahan @ gmail.com 93 Big data Definition: use • Big data is a broad term for data sets so large or complex that traditional data processing applications are inadequate. – PROCESSING: • Think of your laptop that gets overwhelmed with 3-4 gig of data (disk space is 1 TB) – STORAGE: • Laptop: 1 TB (10^12 bytes) – THROUGHPUT: • Reading at ~10^8 bytes/sec (100 MB/sec), 1 TB takes ~10^4 seconds, i.e., about 3 hours to read on your laptop • Challenges – Challenges include analysis, capture, data curation, search, sharing, storage, transfer, visualization, security, and information privacy.
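The slide's back-of-the-envelope throughput arithmetic, checked in a few lines:

```python
# How long does a single laptop disk take to read 1 TB?
terabyte = 10 ** 12        # bytes
throughput = 10 ** 8       # bytes/second, i.e. ~100 MB/s for one disk

seconds = terabyte / throughput   # 10^4 seconds
hours = seconds / 3600            # roughly the slide's "~3 hours"
```

This is the core motivation for distributed frameworks: spreading the same terabyte across 100 disks read in parallel cuts the scan time by roughly 100x.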
  93. 93. Large-Scale Machine Learning, MIDS, UC Berkeley © 2015 James G. Shanahan Contact:James.Shanahan @ gmail.com 94 Big Data • In 2012, Gartner updated its definition as follows: "Big data is high volume, high velocity, and/or high variety information assets that require new forms of processing to enable enhanced decision making, insight discovery and process optimization."[18]
94. 94. Large-Scale Machine Learning, MIDS, UC Berkeley © 2015 James G. Shanahan Contact:James.Shanahan @ gmail.com 95 Big Data: V3 • Volume: 4 ZB (10^21 bytes) today → 40 ZB in 2020; 2015: 1-2 TB per online individual • Velocity: speed of generation of data, or how fast the data is generated and processed • [axis scale: 10^12 to 10^21 bytes]
  95. 95. Large-Scale Machine Learning, MIDS, UC Berkeley © 2015 James G. Shanahan Contact:James.Shanahan @ gmail.com 96 Sources Driving Big Data It’s All Happening On-line Every: Click Ad impression Billing event Fast Forward, pause,… Friend Request Transaction Network message Fault … User Generated (Web, Social & Mobile) … .. Internet of Things / M2M Scientific Computing Quantified Self
96. 96. Large-Scale Machine Learning, MIDS, UC Berkeley © 2015 James G. Shanahan Contact:James.Shanahan @ gmail.com 97 Big Data Infographic • .. http://www.ibmbigdatahub.com/sites/default/files/infographic_file/4-Vs-of-big-data.jpg http://www.ibmbigdatahub.com/infographic/four-vs-big-data • By 2005 we had 120×10^18 bytes • By 2007 we had 280×10^18 bytes • By 2020 we will have 40×10^21 bytes • The quality of the data being captured can vary greatly
97. 97. Large-Scale Machine Learning, MIDS, UC Berkeley © 2015 James G. Shanahan Contact:James.Shanahan @ gmail.com 98 3 Vs of Big Data • … 40 TB per person by 2020; 1-2 TB per person today (2014/2015)
  98. 98. Large-Scale Machine Learning, MIDS, UC Berkeley © 2015 James G. Shanahan Contact:James.Shanahan @ gmail.com 100 Lecture Outline • Introduction • Artificial Intelligence • Machine Learning • Data Science • Applications • What’s next?
  99. 99. Large-Scale Machine Learning, MIDS, UC Berkeley © 2015 James G. Shanahan Contact:James.Shanahan @ gmail.com 101 Why all the excitement? • Government: – Obama used 80 pieces of information on each person; 4 year history (versus Romney) – Nate Silver used Bayesian techniques to publish analyses and predictions related to the 2008 and 2012 United States presidential election • Sports: – Oakland Athletics baseball team and its manager Billy Beane • Transportation ( e.g., Autonomous Vehicles) • HCI: Speech Recognition and Translation • Healthcare – AI Cure: Do you know if your patients are taking their meds? • Digital Advertising • Search (web, local, mobile)
  100. 100. Large-Scale Machine Learning, MIDS, UC Berkeley © 2015 James G. Shanahan Contact:James.Shanahan @ gmail.com 102 How does data, ML, data science work?
  101. 101. Large-Scale Machine Learning, MIDS, UC Berkeley © 2015 James G. Shanahan Contact:James.Shanahan @ gmail.com 103 • ..
  102. 102. Large-Scale Machine Learning, MIDS, UC Berkeley © 2015 James G. Shanahan Contact:James.Shanahan @ gmail.com 104 • .
  103. 103. Large-Scale Machine Learning, MIDS, UC Berkeley © 2015 James G. Shanahan Contact:James.Shanahan @ gmail.com 105 Web search lifecycle • .. http://www.slideshare.net/GaneshVenkataraman3/learn-to-rank-using-machine-learning https://en.wikipedia.org/wiki/Monty_Hall_problem
  104. 104. Large-Scale Machine Learning, MIDS, UC Berkeley © 2015 James G. Shanahan Contact:James.Shanahan @ gmail.com 106 Understand user intent • ,,
  105. 105. Large-Scale Machine Learning, MIDS, UC Berkeley © 2015 James G. Shanahan Contact:James.Shanahan @ gmail.com 107 Fixing user errors • ..
  106. 106. Large-Scale Machine Learning, MIDS, UC Berkeley © 2015 James G. Shanahan Contact:James.Shanahan @ gmail.com 108 • ,,
  107. 107. Large-Scale Machine Learning, MIDS, UC Berkeley © 2015 James G. Shanahan Contact:James.Shanahan @ gmail.com 109 Like the Index at end of book • ..
  108. 108. Large-Scale Machine Learning, MIDS, UC Berkeley © 2015 James G. Shanahan Contact:James.Shanahan @ gmail.com 110 PageRank • ..
  109. 109. Large-Scale Machine Learning, MIDS, UC Berkeley © 2015 James G. Shanahan Contact:James.Shanahan @ gmail.com 111 Search is a ranking problem • ..
  110. 110. Large-Scale Machine Learning, MIDS, UC Berkeley © 2015 James G. Shanahan Contact:James.Shanahan @ gmail.com 112 Learning to rank • ..
  111. 111. Large-Scale Machine Learning, MIDS, UC Berkeley © 2015 James G. Shanahan Contact:James.Shanahan @ gmail.com 113 • ..
  112. 112. Large-Scale Machine Learning, MIDS, UC Berkeley © 2015 James G. Shanahan Contact:James.Shanahan @ gmail.com 114 Training Data • ..
  113. 113. Large-Scale Machine Learning, MIDS, UC Berkeley © 2015 James G. Shanahan Contact:James.Shanahan @ gmail.com 115 Supervised Feedback Loop Guided by human editors • ..
  114. 114. Large-Scale Machine Learning, MIDS, UC Berkeley © 2015 James G. Shanahan Contact:James.Shanahan @ gmail.com 116 Mining relevance judgements • ..
  115. 115. Large-Scale Machine Learning, MIDS, UC Berkeley © 2015 James G. Shanahan Contact:James.Shanahan @ gmail.com 117 Search ranking (web, jobs, local, etc) And Ads • ..
116. 116. Large-Scale Machine Learning, MIDS, UC Berkeley © 2015 James G. Shanahan Contact:James.Shanahan @ gmail.com 118 DB size = 100s of billions of sites Google server farms 2 million machines (est.) 10^11 pages × 10^4 bytes = 10^15 bytes ≈ 1 petabyte of data
  117. 117. Large-Scale Machine Learning, MIDS, UC Berkeley © 2015 James G. Shanahan Contact:James.Shanahan @ gmail.com 119 Learning to Rank at SearchMe • Page Quality, Page Category, Webspam, Query understanding LETOR
  118. 118. Large-Scale Machine Learning, MIDS, UC Berkeley © 2015 James G. Shanahan Contact:James.Shanahan @ gmail.com 120 LeToR: Improve in a measured way Doubled size of index More labeled training data
  119. 119. Large-Scale Machine Learning, MIDS, UC Berkeley © 2015 James G. Shanahan Contact:James.Shanahan @ gmail.com 121 More data or more data science? • ..
120. 120. Large-Scale Machine Learning, MIDS, UC Berkeley © 2015 James G. Shanahan Contact:James.Shanahan @ gmail.com 122 Advertising ~2% of US GDP; $140B WW "Half the money I spend on advertising is wasted; the trouble is, I don't know which half." - John Wanamaker, father of modern advertising. – Less than 1% of all impressions lead to measurable ROI, despite its problems (attribution, etc.) • US GDP = $14.1 Trillion (Global $56 Trillion, 56×10^12) • US Advertising Spend – ~$275 Billion across all media • (2% of GDP since the early 1900s) • In 2014, worldwide online advertising was $140 billion – i.e., about 20% of all ad spending across all media – $42 billion global mobile-advertising market in 2014
  121. 121. Large-Scale Machine Learning, MIDS, UC Berkeley © 2015 James G. Shanahan Contact:James.Shanahan @ gmail.com 123 • Stopped here 11/15/2016 • Jgs
122. 122. Large-Scale Machine Learning, MIDS, UC Berkeley © 2015 James G. Shanahan Contact:James.Shanahan @ gmail.com 124 Making Money from Apps • 93% of downloaded apps in 2013 (globally) are free apps! • 76% of revenue generated from apps (globally) in 2013 is from in-app purchases – [http://www.forbes.com/sites/chuckjones/2013/03/31/apps-with-in-app-purchase-generate-the-highest-revenue/] • In the Freemium economy – To make money from apps, publishers must maintain customer satisfaction through superior app performance and design, – then monetize through advertising and in-app purchases [http://venturebeat.com/2014/03/27/mobile-app-monetization-freemium-is-king-but-in-app-ads-are-growing-fast/, IDC, AppAnnie]
  123. 123. Large-Scale Machine Learning, MIDS, UC Berkeley © 2015 James G. Shanahan Contact:James.Shanahan @ gmail.com 125 Mobile Publisher: How do I make money? Auction Ad Which Ad? Publisher: App Developer Consumer: App user • Paid app download • In app purchases • In app Advertising
  124. 124. Large-Scale Machine Learning, MIDS, UC Berkeley © 2015 James G. Shanahan Contact:James.Shanahan @ gmail.com 126 • ..
  125. 125. Large-Scale Machine Learning, MIDS, UC Berkeley © 2015 James G. Shanahan Contact:James.Shanahan @ gmail.com 127 Native Advertising
  126. 126. Large-Scale Machine Learning, MIDS, UC Berkeley © 2015 James G. Shanahan Contact:James.Shanahan @ gmail.com 128 Rich Media Templates • Advertiser Template/Configuration • Defines an offer design/display for a specific ad unit • Publisher Template/Configuration • Defined and designed to provide native experience in publisher games • Controls allowable content (ad units) with a placement • Is a “shell” to an ad (advertiser offer template) • Tracks placement performance • Allows to control the behavior and look/design from the server
  127. 127. Large-Scale Machine Learning, MIDS, UC Berkeley © 2015 James G. Shanahan Contact:James.Shanahan @ gmail.com 129 Native Design and Dynamic Creative Optimization Ad Frame Treatment Variable Intro Text Publisher Game Art or Character Integration Variable Integrated Call to Action Context, where is this solution being shown in the game? N Native Design Blends with Content Dynamic UI elements adapt to the ad and the audience
  128. 128. Large-Scale Machine Learning, MIDS, UC Berkeley © 2015 James G. Shanahan Contact:James.Shanahan @ gmail.com 130 • .. http://venturebeat.com/2014/04/29/mobile-apps-could-hit-70b-in-revenues-by-2017-as-non-game- categories-take-off/
  129. 129. Large-Scale Machine Learning, MIDS, UC Berkeley © 2015 James G. Shanahan Contact:James.Shanahan @ gmail.com 131 Mobile Ad Spend to Top $100 Billion Worldwide in 2016, 51% of Digital Market • US and China will account for nearly 62% of global mobile ad spending next year • http://www.emarketer.com/Article/Mobile-Ad-Spend-Top-100- Billion-Worldwide-2016-51-of-Digital- Market/1012299#sthash.FBfZAlaC.dpuf
  130. 130. Large-Scale Machine Learning, MIDS, UC Berkeley © 2015 James G. Shanahan Contact:James.Shanahan @ gmail.com 132 CPMs on Mobile are catching up • Mobile Advertising: What is the average CPM on mobile? – The effective cost per thousand impressions (CPM) for desktop web ads is about $3.50, while the CPM for mobile ads is just $0.75. – Video-based CPMs typically > $15 http://www.quora.com/Mobile-Advertising/What-is-the-average-CPM-on-mobile http://mashable.com/2012/10/23/mobile-ad-prices/
  131. 131. Large-Scale Machine Learning, MIDS, UC Berkeley © 2015 James G. Shanahan Contact:James.Shanahan @ gmail.com 133 NativeX: Art and Science of Native Mobile Advertising A p p P u b l i s h e r N a t I v e X S S P A d N e t E x c h a n g e D S P A d v e r t i s e r A d A g e n c y SUPPLY/PublishersCONSUMERS DEMAND/Advertisers
  132. 132. Large-Scale Machine Learning, MIDS, UC Berkeley © 2015 James G. Shanahan Contact:James.Shanahan @ gmail.com 134 NativeX: Art and Science of Native Mobile Advertising A p p P u b l i s h e r N a t I v e X S S P A d N e t E x c h a n g e D S P A d v e r t i s e r A d A g e n c y • DOE NativeAds • Yieldmgt • LTV/Churn • SDK • LTV/Churn • Event-based CPA • Flexible, multiple conversions • Segment-based targeting • Forecasting • Coldstart • Pacing • Metrics and Evaluation SUPPLY/PublishersCONSUMERS DEMAND/Advertisers
133. 133. Large-Scale Machine Learning, MIDS, UC Berkeley © 2015 James G. Shanahan Contact:James.Shanahan @ gmail.com 135 “OLTP” Data Pipeline “OLAP” Data Pipeline • Offline Data • Logging Data • Used for Reporting and Modeling • Online Data • Used in Real Time • Used for Offer Serving Realtime Batch NativeX Data Pipelines Data Science Predictive Analytics Pipeline • Offline Batch Modeling • Real-time Ad Serving ETL (Extract, Transform, and Load) Ad serving data pipelines
  135. 135. Large-Scale Machine Learning, MIDS, UC Berkeley © 2015 James G. Shanahan Contact:James.Shanahan @ gmail.com 137 Devices Ad serving architecture SDK Kinesis Lambda Spark & Scala Spark Ad Servers Aurora SSAS Cassandra SQL Server EMR Modeling Java / Python / R Excel Pivots Self-Service S3 S3 S3 Ad Hoc / Deep Analysis Pipeline BI Pipeline Data Science Pipeline Glacie r Spark ELB HA Proxy Elasticache Activity Tracking Raw Data Archived Activity Tracking EC2 Cluster Tableau Reporting Services Reporting APIs Hourly ETL EC2 Instance Data Warehouse Alerts Dashboards Debugging / Ops Ad-hoc Analysis EventTracking Data (Logs) Device Profiles Device Data Configuration / Lookup Data
  136. 136. Large-Scale Machine Learning, MIDS, UC Berkeley © 2015 James G. Shanahan Contact:James.Shanahan @ gmail.com 138 Publisher: Which ad to show? Bids Auction getAd Ad Which Ad?
  137. 137. Large-Scale Machine Learning, MIDS, UC Berkeley © 2015 James G. Shanahan Contact:James.Shanahan @ gmail.com 139 Publisher: Which ad to show? Ads, Bid (CPI) Auction getAd Ad Which Ad?
  138. 138. Large-Scale Machine Learning, MIDS, UC Berkeley © 2015 James G. Shanahan Contact:James.Shanahan @ gmail.com 140 NativeX conducts an eCPM-based Auction Ads Pick best ads Bids Auction Action argmaxAd eCPM=bid*CR getAd Ad Transaction Logs $5×0.010×1000=$50 $10×0.002×1000=$20 $3×0.002×1000=$6 $4×0.001×1000=$4
139. 139. Large-Scale Machine Learning, MIDS, UC Berkeley © 2015 James G. Shanahan Contact:James.Shanahan @ gmail.com 141 NativeX conducts an eCPM-based Auction Ads Pick best ads Bids Auction Action argmaxAd eCPM = bid × CR getAd Ad Transaction Logs $5×0.010×1000=$50 $10×0.002×1000=$20 $3×0.002×1000=$6 $4×0.001×1000=$4 eCPM_Ad = CR_Ad × Bid_Ad × 1000
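The eCPM auction on this slide can be sketched in a few lines, using the slide's own bid/conversion-rate numbers (the ad names A-D are placeholders):

```python
# Sketch of the slide's eCPM-based auction: serve the argmax ad.

def ecpm(bid, cr):
    """Expected revenue per thousand impressions: bid * conversion rate * 1000."""
    return bid * cr * 1000

# (bid in $, predicted conversion rate) per candidate ad, from the slide.
ads = {
    "A": (5.0, 0.010),   # eCPM $50
    "B": (10.0, 0.002),  # eCPM $20
    "C": (3.0, 0.002),   # eCPM $6
    "D": (4.0, 0.001),   # eCPM $4
}

winner = max(ads, key=lambda a: ecpm(*ads[a]))
```

Note the highest bidder (B, at $10) does not win: the $5 bid with a 10x better predicted conversion rate yields the higher expected revenue, which is why the CR model of the following slides matters so much.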
140. 140. Large-Scale Machine Learning, MIDS, UC Berkeley © 2015 James G. Shanahan Contact:James.Shanahan @ gmail.com 142 7 Steps in Modeling: e.g., Conversion Rate Modeling 1 Understand domain, collect requirements 2 Exploratory data analysis 3 Feature Engineering 4 Modeling: Conversion Rate Models 5 Lab-based experiments 6 Deploy models in the wild (e.g., A/B test) 7 Data Warehouse
  141. 141. Large-Scale Machine Learning, MIDS, UC Berkeley © 2015 James G. Shanahan Contact:James.Shanahan @ gmail.com 143 Multiple Ad Sources: • DSPs,Exchanges • Ad Networks • Internal/Self-service Multiple conversiontypes: • CPM, CPC, CPI, CPCV, CPA,CPE De-duplication Optimization by geo Modeling Features: • Geo location • Device • Reviews (star rating, review text; Geo location of reviews) • Social media Tweets/ FB posts • Categories on Android and iOS • Creative Message • User profiles (RFM based on network behavior) • Device Behavioral (based on installed apps on device RFM, recommendations, categories) • Graph-based features • Others…. Campaign-specific models for CTR/CR
  142. 142. Large-Scale Machine Learning, MIDS, UC Berkeley © 2015 James G. Shanahan Contact:James.Shanahan @ gmail.com 144 Modeling • ML Approaches – Gradient boosted decision trees – Bayesian hierarchical approaches – Segmentation via matrix factorization • Feature engineering – Feature invention • Metrics and evaluation • Storing and accessing data • Perennial Challenges – Coldstart – Bias – Scale
  143. 143. Large-Scale Machine Learning, MIDS, UC Berkeley © 2015 James G. Shanahan Contact:James.Shanahan @ gmail.com 145 • .. https://upload.wikimedia.org/wikipedia/commons/ thumb/5/5f/Minard%27s_Map_%28vectorized%29. svg/2023px- Minard%27s_Map_%28vectorized%29.svg.png
144. 144. Large-Scale Machine Learning, MIDS, UC Berkeley © 2015 James G. Shanahan Contact:James.Shanahan @ gmail.com 146 If we can’t measure it then… • … Data Science Updates: 2013/10/25 ©2013 NativeX Holdings, LLC 16% → 40%
145. 145. Large-Scale Machine Learning, MIDS, UC Berkeley © 2015 James G. Shanahan Contact:James.Shanahan @ gmail.com 147 From a systems perspective: Three generations of machine learning • First generation: datasets that fit in memory – Single-node learning: summary statistics and some batch modeling (at small scale); SQL, R – Down-sampling the data • Second generation: general-purpose clusters and frameworks – Distributed frameworks that allow us to divide and conquer problems – Learning using general-purpose frameworks such as Hadoop: big data analysis offline, real-time decision making, homegrown specialist systems; Hadoop, R – In-house purpose-built systems; a specialist sport • Third generation: purpose-built libraries and frameworks – Built for the iterative algorithms that are commonplace in ML – Huge-scale real-time analysis and decision-making systems – Specialized frameworks for large-scale manipulation of the type of data you are working with – For example, machine learning libraries like MLlib in Spark, graph processing libraries like Apache Giraph or GraphX in Spark
  146. 146. Large-Scale Machine Learning, MIDS, UC Berkeley © 2015 James G. Shanahan Contact:James.Shanahan @ gmail.com 148 Ranking Ads (more) at Turn Inc.
  147. 147. Large-Scale Machine Learning, MIDS, UC Berkeley © 2015 James G. Shanahan Contact:James.Shanahan @ gmail.com 149 Text Processing • .. http://aylien.com/ http://aylien.com/ Deep Learning based CNN RNN
  148. 148. Large-Scale Machine Learning, MIDS, UC Berkeley © 2015 James G. Shanahan Contact:James.Shanahan @ gmail.com 150 • .. Linking other things such as groups
  149. 149. Large-Scale Machine Learning, MIDS, UC Berkeley © 2015 James G. Shanahan Contact:James.Shanahan @ gmail.com 151 • .. Growing
  150. 150. Large-Scale Machine Learning, MIDS, UC Berkeley © 2015 James G. Shanahan Contact:James.Shanahan @ gmail.com 152 • Deep Learning
  151. 151. Large-Scale Machine Learning, MIDS, UC Berkeley © 2015 James G. Shanahan Contact:James.Shanahan @ gmail.com 153 • ..
152. 152. Large-Scale Machine Learning, MIDS, UC Berkeley © 2015 James G. Shanahan Contact:James.Shanahan @ gmail.com 154 Logistic Regression Model • Inputs (independent variables x1, x2, x3): Age = 34, Gender = 1, Stage = 4 • Coefficients a, b, c • Output (dependent variable p): prediction, the “Probability of being Alive” = 0.6 • The inputs are combined into a weighted sum S
153. 153. Large-Scale Machine Learning, MIDS, UC Berkeley © 2015 James G. Shanahan Contact:James.Shanahan @ gmail.com 155 S is the sum of inputs × weights • Inputs: Age = 34, Gender = 1, Stage = 4 • S = 34×0.5 + 1×0.4 + 4×0.8 = 20.6
154. 154. Large-Scale Machine Learning, MIDS, UC Berkeley © 2015 James G. Shanahan Contact:James.Shanahan @ gmail.com 156 Neural Network Model • Inputs (independent variables): Age = 34, Gender = 2, Stage = 4 • Input-to-hidden weights: .6 .5 .8 .2 .1 .3 .7 .2 • Hidden-layer-to-output weights: .4 .2 • Output (dependent variable): prediction, the “Probability of being Alive” = 0.6
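A sketch of the logistic unit from the previous slides: the weighted sum S is squashed by a sigmoid to give a probability. The sigmoid step is what the S-shaped node in the diagrams denotes; the slides' displayed output of 0.6 presumably comes from a different (scaled) weight setting than the worked example S = 20.6.

```python
# Sketch of a single logistic unit (the building block of the NN slide).
import math

def sigmoid(s):
    """Squash a real-valued score into a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-s))

inputs = [34, 1, 4]            # age, gender, stage (the slides' example)
weights = [0.5, 0.4, 0.8]      # the slides' coefficients

S = sum(w * x for w, x in zip(weights, inputs))   # weighted sum, 20.6
p = sigmoid(S)                                    # probability of being alive
```

A neural network simply stacks such units: hidden-layer units each compute their own sigmoid(weighted sum) of the inputs, and the output unit does the same over the hidden activations.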
155. 155. Large-Scale Machine Learning, MIDS, UC Berkeley © 2015 James G. Shanahan Contact:James.Shanahan @ gmail.com 157 Intelligent Systems in Your Everyday Life • Post Office – automatic address recognition and sorting of mail • Banks – automatic check readers, signature verification systems – automated loan application classification • Customer Service – automatic voice recognition • The Web – Identifying your age, gender, location, from your Web surfing – Automated fraud detection • Digital Cameras – Automated face detection and focusing • Computer Games – Intelligent characters/agents
165. [figure: AlexNet deep convolutional network architecture] http://3.bp.blogspot.com/-iEx-C0ljkKk/VV38zjj_vdI/AAAAAAAAA7w/aron8CBjmos/s1600/alexnet.png
166. Data requirements [figure]
170. Conversational UI
• We're witnessing an explosion of applications that no longer have a graphical user interface (GUI).
• They've actually been around for a while, but they've only recently started spreading into the mainstream.
• They are called bots, virtual assistants, invisible apps.
• They can run on Slack, WeChat, Facebook Messenger, plain SMS, or Amazon Echo.
• They can be entirely driven by artificial intelligence, or there can be a human behind the curtain.
171. Conversational UI [figure]
172. [screenshot: chat-style banking UI with commands such as Check Balance, Replenish, Charts]
173. Conversational UI — Amazon Echo is controlled by voice, but has a companion app.
174. Microsoft Research: Cortana
177. Speech Recognition Breakthrough for the Spoken, Translated Word
• Published on Nov 8, 2012
• Chief Research Officer Rick Rashid demonstrates a speech recognition breakthrough via machine translation that converts his spoken English words into computer-generated Chinese. The breakthrough is patterned after deep neural networks and significantly reduces errors in spoken as well as written translation.
• For more information on speech recognition and translation, visit http://www.microsoft.com/translator/skype.aspx
• Excellent video (please watch all of it!): https://www.youtube.com/watch?v=Nu-nlQqFCKg (minute 7:11)
• Pipeline: English text (ASR) → Chinese text → text-to-speech system (sounds like the English speaker)
179. ASR (audio signal → word sequence) — HMMs, deep learning, language models
180. Tipping point: humans are no longer the center of the data universe
181. IoT/IoE [figure]
182. Personal; society; M2M; crowdsourcing
• Society – graphs: social, professional; quantified self: eating, sleeping, exercising; voting; education; healthcare; economics, shopping, etc.
• Internet of things – tracking wildebeest in the Serengeti, Tanzania (not just with GPS tags, but also with cameras at key strategic locations throughout the Serengeti): population changes in species; scheduling safaris
• 1 billion smart meters by 2020 – roughly 1 petabyte of data per day: if each meter is polled 1,000 times per day at 1,000 bytes per poll, that is 10^6 bytes (1 MB) per meter per day, so 10^9 meters × 10^6 bytes = 10^15 bytes (1 PB)
• Smart cities, etc.
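The smart-meter back-of-the-envelope estimate above can be written out directly (the polling rate and record size are the slide's stated assumptions):

```python
# Back-of-the-envelope data volume for 1 billion smart meters.
meters = 10**9          # 1 billion smart meters
polls_per_day = 1000    # each meter polled 1,000 times per day
bytes_per_poll = 1000   # ~1 KB per reading

bytes_per_meter_per_day = polls_per_day * bytes_per_poll  # 1 MB per meter
total_bytes_per_day = meters * bytes_per_meter_per_day    # 1 PB in total
print(total_bytes_per_day == 10**15)  # True
```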
183. Japanese to English [figure] http://www.ustar-consortium.com/research.html
184.–186. From analytics to closed-loop control systems — [build chart, y-axis: customer exit rate] Historical → Analytical ($); Realtime → Now, Predictive ($$); Future → Decisive ($$$)
187. Managers and CEOs see the value of DA: data (science) improves KPIs dramatically. [chart: KPI performance improvement (e.g., sales) from Historical to Realtime to Future capabilities]
• Summary stats and reports (advanced BI, regional sales): 10–20%
• Offline data mining (e.g., user profiles; churn, repeat, big spender): 20–30%
• Realtime decision making (realtime recommendations, look-alike modeling): 2X–10X
• Personalization, LTV: 10X+
• Tools and exemplars along the way: Oracle/SQL (Omniture, Hyperion), Hadoop, SAS/SPSS, Cloudera, R; Ads (DSP/DMP), Amazon, Google, Netflix
188. Autonomous Vehicles [figure]
190. Autonomous Vehicles — [figure] An image of what Google's self-driving car sees when it makes a left turn. http://www.rand.org/pubs/research_briefs/RB9755.html
191. Autonomous vehicles
• Research in autonomous cars started in the 1980s, but the technology wasn't there yet.
• Perhaps the first significant event was the 2005 DARPA Grand Challenge, in which the goal was to have a driverless car complete a 132-mile off-road course. Stanford finished in first place. The car was equipped with various sensors (laser, vision, radar), whose readings needed to be synthesized (using probabilistic techniques) to localize the car and then to generate control signals for the steering, throttle, and brake.
• In 2007, DARPA created an even harder Urban Challenge, which was won by CMU.
• In 2009, Google started a self-driving car program, and since then their self-driving cars have driven over 1 million miles on freeways and streets.
• In January 2015, Uber hired about 50 people from CMU's robotics department to build self-driving cars.
• While there are still technological and policy issues to be worked out, the potential impact on transportation is huge.
192. [figure] 800 million parking spots in the US. http://www.nature.com/news/autonomous-vehicles-no-drivers-required-1.16832 ; http://asirt.org/initiatives/informing-road-users/road-safety-facts/road-crash-statistics
193. Save fuel, safer logistics [figure] http://peloton-tech.com/
194. Data Science in Ecommerce [figure] (this is just a subset)
195. Defining product strategy for the optimum product mix
• Ecommerce and bricks-and-mortar businesses ask: what products should they sell? What price should be offered for the products, and when?
• Data science algorithms help ecommerce businesses define and optimize the product mix. Every ecommerce business has a product team that looks into the design process, where data science algorithms can help the business with forecasting: What are the loopholes in the product mix? What should they make? How many units should be ordered in the initial batch from the factory? When should they halt the supply of those products? When should they sell?
• Data scientists versus data analysts: data scientists work on advanced predictive and prescriptive analytics, whereas data analysts mainly perform retrospective analysis.
196. https://www.aicure.com/ [figure]
197. Do you know if your patients are taking their meds? [figure]
198. Trust, but verify! [figure]
199. Rank patients [figure]
200. Alerts [figure]
201. Machine learning at scale = algorithms (machine learning and analytics) + big data (human-centric, M2M, IoT) + machines (cloud computing) + parallel frameworks (MapReduce: command line, Hadoop, MRJob, Spark) + security/privacy
202. Lecture Outline
• Introduction
• Artificial Intelligence
• Machine Learning
• Data Science
• Applications
• What's next?
203. 150,000 data scientists needed in the US [McKinsey report on big data, 2011]. With such enormous potential to change the world, it will come as no surprise that data scientists are in huge demand.
204. Top 10 best jobs in the US as of 2/2016 — criteria: how much you make, the demand for your skills, and how easily you can advance. $117K median salary; 1,700 openings right now.
205. From analytics to closed-loop control systems — [chart, y-axis: customer exit rate] Historical → Analytical ($); Realtime → Now, Predictive ($$); Future → Decisive ($$$). Enablers: IoE, deep learning, GPUs, data, bandwidth (5G).
206. Architecture
207. Cool Thing #2: Schema on Read — load data first, ask questions later. Data is parsed/interpreted as it is loaded out of HDFS. What implications does this have?
• Before: ETL, schema design upfront, tossing out original data, comprehensive data study
• With Hadoop: keep the original data around! Have multiple views of the same data! Work with unstructured data sooner! Store first, figure out what to do with it later!
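A minimal illustration of the schema-on-read idea (the raw line format and field names here are invented for the example): the raw record is stored untouched, and each query applies its own parse at read time.

```python
# Schema-on-read: store raw lines untouched; parse at query time.
raw_log = "2015-11-16 09:01 user=42 action=click"  # hypothetical raw record

def view_as_event(line):
    """One 'view' of the raw data: parse into a dict at read time."""
    date, time, *kvs = line.split()
    event = {"date": date, "time": time}
    event.update(kv.split("=", 1) for kv in kvs)
    return event

event = view_as_event(raw_log)
print(event["action"])  # a later query could parse the same raw line differently
```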
208. Cool Thing #4: Unstructured Data
• Unstructured data: media, text, forms, log data — often lumped in with structured data
• Query languages like SQL and Pig assume some sort of "structure"
• MapReduce is just Java: you can do anything Java can do in a Mapper or Reducer
209. Left outer join: return all rows from the left table even if there are no matches in the right table. [diagram: Customers (left) joined to Orders] Result:
CustomerName | OrderID
A | 2
A | 4
A | 3
B | (blank)
210. Inner join (or simply "join"): return only rows with matches in both tables — the B/(blank) row from the left outer join is dropped:
CustomerName | OrderID
A | 2
A | 4
A | 3
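The two join flavors above can be sketched in plain Python (the table contents follow the slide's example):

```python
customers = ["A", "B"]                   # left table: customer names
orders = [("A", 2), ("A", 4), ("A", 3)]  # right table: (customer, order id)

def left_outer_join(custs, ords):
    """Every customer appears; unmatched customers get a None order."""
    rows = []
    for c in custs:
        matches = [(c, oid) for cust, oid in ords if cust == c]
        rows.extend(matches if matches else [(c, None)])
    return rows

def inner_join(custs, ords):
    """Only customers with at least one matching order appear."""
    return [(c, oid) for c in custs for cust, oid in ords if cust == c]

print(left_outer_join(customers, orders))  # includes ('B', None)
print(inner_join(customers, orders))       # only the three 'A' rows
```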
211. Join question: ecommerce company (TASK)
• Given: a transaction log file/DB/CSV file (1 billion transactions) — user ID, date, time, referring URL, item purchased, price, etc. — and a user information/location file/DB (1 million records) — user ID, home country, home state, home zip code, etc.
• The user table is small: roughly 5 fields × 2 bytes × 10^6 records ≈ 10^7 bytes (around 10 MB).
• Join the transaction DB with the location DB on USER_ID (e.g., phone number). Complete this job within one hour, every hour!
• Using Hadoop, what type of join would you recommend? (NOTE: specify the type of join, the role of each table, and how to do it in Hadoop.)
212. Answer: an in-memory (map-side) join, with the user table broadcast to all nodes. Left = user table; right = transactions table. Right outer join: keep every transaction, enriched with user information where available.
213. Join, part 2 — left table: customer information; right table: transactions. Question: left/right/inner/outer join? Right join, since some customers may not exist in the customer table. Hash join (map-side) or reduce-side join?
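The map-side (broadcast hash) join recommended above can be sketched as follows: the small user table is loaded into memory on every mapper, and the large transaction stream is joined against it one record at a time. The field names and values are illustrative, not from the slides.

```python
# Map-side (broadcast hash) join: small table in memory, big table streamed.
users = {                    # small table, broadcast to every mapper
    "u1": {"state": "CA"},
    "u2": {"state": "NY"},
}

transactions = [             # large table, processed as a stream
    ("u1", "itemA", 9.99),
    ("u2", "itemB", 4.50),
    ("u3", "itemC", 1.25),   # no matching user record
]

def mapper(txn):
    """Right-outer semantics: keep every transaction, enrich when possible."""
    user_id, item, price = txn
    profile = users.get(user_id)  # in-memory hash lookup, no shuffle needed
    state = profile["state"] if profile else None
    return (user_id, item, price, state)

joined = [mapper(t) for t in transactions]
print(joined[-1])  # ('u3', 'itemC', 1.25, None)
```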
214. Advertising: ~2% of US GDP; $140B worldwide. "Half the money I spend on advertising is wasted; the trouble is, I don't know which half." – John Wanamaker, father of modern advertising.
• Less than 1% of all impressions lead to measurable ROI, despite attribution and other efforts.
• US GDP = $14.1 trillion (global: $56 trillion).
• US advertising spend: ~$275 billion across all media (about 2% of GDP since the early 1900s).
• In 2015, worldwide online advertising was ~$150 billion, i.e., about 20% of all ad spending across all media; the global mobile-advertising market was $42 billion in 2014 and ~$100 billion in 2016.
• $400 million on Super Bowl advertising (TV/online). Covered in more detail in Week 12.
215. NativeX: the art and science of native mobile advertising. [diagram: consumers → supply/publishers → demand/advertisers — App Publisher → NativeX → SSP → Ad Net → Exchange → DSP → Advertiser → Ad Agency]
216. NativeX data pipelines — ad-serving data pipelines
• "OLTP" pipeline (realtime): online data, used in real time for offer serving (devices/SDK; 100 milliseconds to respond)
• "OLAP" pipeline (batch): offline logging data, used for reporting and modeling; ETL (extract, transform, and load)
• Data science predictive-analytics pipeline: offline batch modeling, real-time ad serving
• Ranking: Bid × CTR(Ad, Context) × 1000 = eCPM(Ad)
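The ranking formula on this slide — expected revenue per thousand impressions — can be computed directly (the bid and CTR values below are made up for illustration):

```python
def ecpm(bid, ctr):
    """eCPM = bid per click x predicted click-through rate x 1000 impressions."""
    return bid * ctr * 1000

# Hypothetical ads: (bid in $ per click, predicted click-through rate).
ads = {"ad1": (0.50, 0.020), "ad2": (2.00, 0.004)}

# Rank ads by expected revenue per thousand impressions, highest first.
ranked = sorted(ads, key=lambda a: ecpm(*ads[a]), reverse=True)
print(ranked[0])  # ad1 (eCPM ~10.0) outranks ad2 (eCPM ~8.0)
```

Note the higher bidder does not necessarily win: a high predicted CTR can outweigh a larger bid.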
218. Ranking ads (more) at Turn Inc. [figure]
219. Potential ad-serving architecture (December 2015) — [diagram] Devices/SDK → streaming (Spark) → ad servers (MemCache, Cassandra, device profiles). BI pipeline: raw data on S3, hourly ETL (EC2 cluster), data warehouse (SQL Server), Aurora cube, Tableau/reporting services/reporting APIs, self-service Excel pivots. Data science pipeline: EMR modeling (Java/Python/R), Spark. Archived activity tracking in Glacier. Inputs: event-tracking data (logs), device data, configuration/lookup data.
220. NativeX: the art and science of native mobile advertising — data science touchpoints along the chain [diagram: App Publisher → NativeX → SSP → Ad Net → Exchange → DSP → Advertiser → Ad Agency]. Publisher side: DOE native ads, yield management, LTV/churn, SDK. Advertiser side: LTV/churn, event-based CPA, flexible multiple conversions, segment-based targeting, forecasting, cold start, pacing, metrics and evaluation.
221. End of the deep artificial intelligence talk.
222. ICS 278: Data Mining — Lecture 18: Credit Scoring. Padhraic Smyth, Department of Information and Computer Science, University of California, Irvine.
223. Presentations for next week
• Names for each day will be emailed out by tomorrow.
• Instructions: email me your presentations by 12 noon on the day of your presentation (no later, please); I will load them on my laptop (so no need to bring a machine); each presentation will be 6 minutes long + 2 minutes for questions — so probably about 4 to 8 (max) slides per presentation.
224. References on credit scoring
• D. J. Hand and W. E. Henley, "Statistical Classification Methods in Consumer Credit Scoring: a Review," Journal of the Royal Statistical Society: Series A, 160(3), November 1997. Available online at the class Web page under lecture notes.
• Also: Credit Scoring and its Applications, L. C