Ce diaporama a bien été signalé.
Nous utilisons votre profil LinkedIn et vos données d’activité pour vous proposer des publicités personnalisées et pertinentes. Vous pouvez changer vos préférences de publicités à tout moment.

Takes a Village to Raise a Machine Learning Model

555 vues

Publié le

Video and slides synchronized, mp3 and slide download available at URL http://bit.ly/1Qm6RNP.

Lucian Vlad Lita focuses on the crucial next step in personalization: well-designed software architectures for storing, computing, and delivering responsive, accurate in-product predictions and experiments. To make it concrete, he presents several Intuit use cases around anomaly detection and personalized in-product experiences. Filmed at qconsf.com.

Lucian Vlad Lita is director of Data Engineering at Intuit, leading a big data platform and large-scale real-time data services group in the US and the EU.

Publié dans : Technologie
  • Soyez le premier à commenter

Takes a Village to Raise a Machine Learning Model

  1. 1. It Takes a Village to Raise a Machine Learning Model Lucian Lita @datariver
  2. 2. InfoQ.com: News & Community Site • 750,000 unique visitors/month • Published in 4 languages (English, Chinese, Japanese and Brazilian Portuguese) • Post content from our QCon conferences • News 15-20 / week • Articles 3-4 / week • Presentations (videos) 12-15 / week • Interviews 2-3 / week • Books 1 / month Watch the video with slide synchronization on InfoQ.com! http://www.infoq.com/presentations /intuit-machine-learning
  3. 3. Purpose of QCon - to empower software development by facilitating the spread of knowledge and innovation Strategy - practitioner-driven conference designed for YOU: influencers of change and innovation in your teams - speakers and topics driving the evolution and innovation - connecting and catalyzing the influencers and innovators Highlights - attended by more than 12,000 delegates since 2007 - held in 9 cities worldwide Presented at QCon San Francisco www.qconsf.com
  4. 4. It Takes a Village to Raise a Machine Learning Model Lucian Lita @datariver
  5. 5. @datariver Algorithms
  6. 6. @datariver more clean data is better than more data #BigData Big Data Sheep @bigdatasheep n 4yr more labeled data is better than more data #BigData Big Data Sheep @bigdatasheep n 3yr more smart data is better than purple data #BigData Big Data Sheep @bigdatasheep n 2yr Data more data is better than complex algorithms #BigData Big Data Sheep @bigdatasheep n 5yr **inflated historical depiction
  7. 7. @datariver Data
  8. 8. @datariver Next Frontier: well designed software architectures Personalization, experimentation, anomaly detection, fraud detection …
  9. 9. @datariver Battle Plan Anomaly detection quick peek Personalization deep dive sw architecture flavor Music streaming, advertising, medical informatics brief stories
  10. 10. @datariver
  11. 11. @datariver Reasonable coverage. Segmentation. Reasonable coverage. Personalization. Product as is. No customization. x all x 1 … x 1 … x 1 … x 1 … x 1
  12. 12. @datariver Childhood. Approaches.
  13. 13. @datariver DeepBroad
  14. 14. @datariver Push-scientistPush-button storage delivery API AppApp Optimization -- ML algorithms -- data: more, better, smarter -- features, selection
  15. 15. @datariver Push-scientistPush-button storage delivery API AppApp storage delivery API Scale & Automation -- model build -- model deploy -- single instrumentation Optimization -- ML algorithms -- data: more, better, smarter -- features, selection
  16. 16. @datariver Push-scientist Invest in ML; start with a thin system How much effort put into Platform & Automation? (A)  best you can do in x weeks (B)  one step above prototype (C)  enough baling wire & duct tape to support a first use case
  17. 17. @datariver Push-button Invest in scale & automation; basic ML How much effort put into ML? (A)  best generic model setup in y weeks? (B)  noticeably better than random? (C)  pack enough punch to be visible, but not more
  18. 18. @datariver Push-scientistPush-button
  19. 19. @datariver Adolescence. Platform Patterns.
  20. 20. @datariver periodically batch train model App API (retrieve) pre-computed content personalized content API (capture) feedback periodically run models (A) Stored
  21. 21. @datariver periodically batch train model App API (compute) compute on-the-fly personalized content API (capture) feedback (B) On-the Fly
  22. 22. @datariver App API (deliver) personalized content API (capture) feedback Challenge accepted: asymptotically real time! (C) Aggressive
  23. 23. @datariver App API (deliver) personalized content API (capture) feedback Challenge accepted: asymptotically real time! (C) Aggressive
  24. 24. @datariver Maturity. Patterns and Assumptions.
  25. 25. @datariver Content Delivery Data Capture Model Deployment Model Building Analytics Data Store What do you really need? Do you need it now?
  26. 26. @datariver Model Building. What do you really need? algos space data eval compute scalability HAsecuritymetrics 101010 operators
  27. 27. @datariver Model Building. What do you really need? algos space data eval compute scalability HAsecuritymetrics 101010 operators
  28. 28. @datariver Model Deployment. What do you really need? API envt ditto versioning deploy sharing scalability HAsecurityperformance Mi Mi+1
  29. 29. @datariver Personalization Delivery. What do you really need?
  30. 30. @datariver Personalization Delivery. What do you really need? API instrument ditto exploit explore sharing scalability HAsecurityperformance
  31. 31. @datariver Data Store. What do you really need? API t content ditto performance HA history scalability triggers consumers governance sharing
  32. 32. @datariver Data Store. To HA or not to HA. in-app revenue driver infrastructure cost build & operate now later (blasphemy) critical user benefit known use cases
  33. 33. @datariver Data Store. APIs
  34. 34. @datariver Data Capture. What do you really need? API t content ditto historytriggers consumers sharing scalability HAsecurityperformance
  35. 35. @datariver Analytics. What do you really need? API t content ditto performance history scalability consumersflexibility
  36. 36. @datariver Analytics. Experimentation & Personalization
  37. 37. @datariver Data Lake. What do you really need? say ‘big data lake’ one more time!
  38. 38. @datariver Evolving Architecture. Before you know it…
  39. 39. Apps API (delivery) personalized content API (capture) feedback API (compute) in-app data personalized content API (push) direct content Event Lograw data or features run models train models periodically re-run new models periodically 1 1 2 2 3 3 RT Analytics Model Deployment Model Building 4 API (analytics) **terribly incomplete, mildly inaccurate 4
  40. 40. Not an Exact Blueprint
  41. 41. As you embark … Know this non-trivial no one-size fits all Upfront what do you really need? know thy target architecture Do it! working system in weeks fast iterations – ship & test interfaaaaaaaces!
  42. 42. village model **not drawn to effort scale
  43. 43. @datariver Software architecture is the next frontier! Fail fast still applies! Personalize your personalization platform!
  44. 44. @datariver better algorithms more, better, smarter data well designed software architectures next frontier
  45. 45. @datariver A Brief Look at Anomaly Detection
  46. 46. @datariver Applications ¡  System health – servers, network ¡  Cyber-intrusion detection ¡  Enterprise anomaly detection ¡  Image processing ¡  Textual anomaly detection ¡  Sensor networks ¡  Fraud detection ¡  Medical anomaly detection ¡  Industrial damage detection ¡  …
  47. 47. @datariver Algorithms ¡  Supervised ¡  Unsupervised ¡  Generic statistical ¡  Information theory ¡  … “What algorithms are you going to use?”
  48. 48. @datariver Data Low data volume Invest in data acquisition Invest in high coverage High data volume Invest in defining signal Invest in labeling, tools, and crowdsourcing
  49. 49. @datariver Architectures Again Capture Data Collectors Clickstream, User Input … Real time, DBs … Compute run models Labeling Labeling Crowdsourcing Active learning Processors (M&A) broad: time bounded deep: open ended **check assumptions
  50. 50. @datariver Advertising
  51. 51. @datariver Music Streaming
  52. 52. @datariver Medical Informatics
  53. 53. @datariver better algorithms more, better, smarter data well designed software architectures next frontier
  54. 54. @datariver Thank you! Lucian Lita @datariver [always hiring] data@intuit.com
  55. 55. @datariver Thank you! Lucian Lita @datariver [always hiring] data@intuit.com
  56. 56. @datariver
  57. 57. @datariver Extra Content
  58. 58. @datariver Security. What do you really need?
  59. 59. @datariver
  60. 60. @datariver App. Who does the App talk to? App API (compute) -- retrieve static data -- apply op logic -- compute features -- run model -- log actions App API (retrieve) -- apply op logic -- retrieve pre-computed content personalized content dynamic data personalized content (a) (b)
  61. 61. Watch the video with slide synchronization on InfoQ.com! http://www.infoq.com/presentations/intuit- machine-learning

×