It Takes a Village to Raise a
Machine Learning Model
Lucian Lita
@datariver
Watch the video with slide synchronization on InfoQ.com!
http://www.infoq.com/presentations/intuit-machine-learning
Presented at QCon San Francisco
www.qconsf.com
Algorithms
Data
more data is better than complex algorithms #BigData
Big Data Sheep @bigdatasheep · 5yr
more clean data is better than more data #BigData
Big Data Sheep @bigdatasheep · 4yr
more labeled data is better than more data #BigData
Big Data Sheep @bigdatasheep · 3yr
more smart data is better than purple data #BigData
Big Data Sheep @bigdatasheep · 2yr
**inflated historical depiction
Next Frontier: well designed software architectures
Personalization, experimentation, anomaly detection,
fraud detection …
Battle Plan
Anomaly detection: quick peek
Personalization: deep dive, software architecture flavor
Music streaming, advertising, medical informatics: brief stories
[Figure: three levels of tailoring]
Product as is. No customization. (x all)
Segmentation. Reasonable coverage.
Personalization. Reasonable coverage. (x 1, x 1, x 1, …)
Childhood. Approaches.
Deep / Broad
Push-scientist / Push-button
[Diagram: App, API, delivery, storage]
Optimization
-- ML algorithms
-- data: more, better, smarter
-- features, selection
Push-scientist / Push-button
[Diagram: two stacks, each with App, API, delivery, storage]
Optimization
-- ML algorithms
-- data: more, better, smarter
-- features, selection
Scale & Automation
-- model build
-- model deploy
-- single instrumentation
Push-scientist
Invest in ML; start with a thin system
How much effort put into Platform & Automation?
(A)  best you can do in x weeks
(B)  one step above prototype
(C)  enough baling wire & duct tape to support a first use case
Push-button
Invest in scale & automation; basic ML
How much effort put into ML?
(A)  best generic model setup in y weeks?
(B)  noticeably better than random?
(C)  pack enough punch to be visible, but not more
Push-scientist / Push-button
Adolescence. Platform Patterns.
(A) Stored
-- periodically batch train model
-- periodically run models to produce pre-computed content
-- API (retrieve): pre-computed, personalized content to the App
-- API (capture): feedback from the App
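A minimal sketch of the Stored pattern, assuming an in-memory dict stands in for the content store and a plain popularity count stands in for the model. All names here (`capture`, `train_model`, `precompute`, `retrieve`) are hypothetical and not taken from the talk.

```python
from collections import Counter

feedback_log = []   # filled by the capture API
precomputed = {}    # user_id -> list of item ids (the stored content)


def capture(user_id, item_id):
    """API (capture): record one feedback event."""
    feedback_log.append((user_id, item_id))


def train_model(log):
    """Periodic batch training: here, just global item popularity."""
    return Counter(item for _, item in log)


def precompute(model, log, users, k=2):
    """Periodic model run: store the top-k items each user has not seen."""
    seen = {}
    for user, item in log:
        seen.setdefault(user, set()).add(item)
    for user in users:
        ranked = [i for i, _ in model.most_common()
                  if i not in seen.get(user, set())]
        precomputed[user] = ranked[:k]


def retrieve(user_id):
    """API (retrieve): a plain lookup, no model call at request time."""
    return precomputed.get(user_id, [])


for user, item in [("u1", "a"), ("u2", "a"), ("u3", "a"),
                   ("u2", "b"), ("u3", "b"), ("u3", "c")]:
    capture(user, item)

model = train_model(feedback_log)
precompute(model, feedback_log, ["u1", "u2", "u3"])
```

The request path is cheap because all model work happens in the periodic jobs; the cost is that content is only as fresh as the last run.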
(B) On-the-Fly
-- periodically batch train model
-- API (compute): compute on-the-fly, personalized content to the App
-- API (capture): feedback from the App
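A minimal sketch of the On-the-Fly pattern, assuming the model is still trained in batch but scoring happens inside the request using fresh context. The function names and the `recent_items` parameter are hypothetical.

```python
def train_model(log):
    """Periodic batch training: item -> feedback count."""
    counts = {}
    for _, item in log:
        counts[item] = counts.get(item, 0) + 1
    return counts


def compute(model, candidates, recent_items, k=2):
    """API (compute): rank candidates at request time.

    recent_items is per-request context (for example, items shown in this
    session), which a pre-computed store could not know about.
    """
    fresh = [c for c in candidates if c not in recent_items]
    # Highest score first; ties broken alphabetically for determinism.
    return sorted(fresh, key=lambda c: (-model.get(c, 0), c))[:k]


model = train_model([("u1", "a"), ("u2", "a"), ("u2", "b")])
result = compute(model, ["a", "b", "c"], recent_items={"a"})
```

Compared with the Stored pattern, nothing is pre-computed per user, so the API pays the model cost on every call.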
(C) Aggressive
-- API (deliver): personalized content to the App
-- API (capture): feedback from the App
-- Challenge accepted: asymptotically real time!
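One way to read "asymptotically real time" is that every feedback event updates the model immediately instead of waiting for a batch job. The sketch below assumes that reading and uses a decayed score per item; `OnlineModel` and the decay factor are illustrative choices, not the speaker's design.

```python
class OnlineModel:
    """Keeps an exponentially decayed score per item, updated per event."""

    def __init__(self, decay=0.9):
        self.decay = decay
        self.scores = {}

    def update(self, item_id, reward=1.0):
        # Age every existing score, then credit the item just observed.
        for item in self.scores:
            self.scores[item] *= self.decay
        self.scores[item_id] = self.scores.get(item_id, 0.0) + reward

    def top(self, k=1):
        return sorted(self.scores, key=lambda i: (-self.scores[i], i))[:k]


model = OnlineModel()


def capture(item_id, reward=1.0):
    """API (capture): feedback goes straight into the model."""
    model.update(item_id, reward)


def deliver(k=1):
    """API (deliver): serve from the always-current model."""
    return model.top(k)


capture("a")
capture("a")
first = deliver()
capture("b")
capture("b")
capture("b")
second = deliver()
```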
Maturity. Patterns and Assumptions.
Content Delivery
Data Capture
Model Deployment
Model Building
Analytics
Data Store
What do you really need?
Do you need it now?
Model Building. What do you really need?
algos space data eval compute
scalability HA security metrics
operators
Model Deployment. What do you really need?
API
envt ditto versioning deploy
sharing scalability HA security performance
Mi -> Mi+1 (model versions)
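The versioning and deploy items can be sketched as a small registry that keeps every model version and moves a "live" pointer from one version to the next. This is an illustrative sketch only; `ModelRegistry` and its methods are hypothetical names.

```python
class ModelRegistry:
    """Keeps all registered model versions and tracks which one is live."""

    def __init__(self):
        self._versions = []   # version n is stored at index n - 1
        self._live = None

    def register(self, model):
        """Store a new model and return its version number."""
        self._versions.append(model)
        return len(self._versions)

    def promote(self, version):
        """Make the given version live."""
        if not 1 <= version <= len(self._versions):
            raise ValueError(f"unknown version {version}")
        self._live = version

    def rollback(self):
        """Fall back to the previous version."""
        if self._live is None or self._live == 1:
            raise RuntimeError("no earlier version to roll back to")
        self._live -= 1

    def predict(self, x):
        if self._live is None:
            raise RuntimeError("no live model")
        return self._versions[self._live - 1](x)


registry = ModelRegistry()
v1 = registry.register(lambda x: x * 2)
v2 = registry.register(lambda x: x * 3)
registry.promote(v1)
```

Keeping old versions around is what makes rollback a pointer move rather than a rebuild.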
Personalization Delivery. What do you really need?
API
instrument ditto exploit explore
sharing scalability HA security performance
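The exploit and explore items can be illustrated with an epsilon-greedy policy, one common way to balance the two. The slide does not say which policy is used, so treat this as an assumption; the class and parameter names are hypothetical.

```python
import random


class EpsilonGreedy:
    """Serve the best-known content most of the time, explore occasionally."""

    def __init__(self, arms, epsilon=0.1, seed=0):
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.counts = {arm: 0 for arm in arms}
        self.values = {arm: 0.0 for arm in arms}   # running mean reward

    def choose(self):
        if self.rng.random() < self.epsilon:
            return self.rng.choice(list(self.counts))    # explore
        return max(self.values, key=self.values.get)     # exploit

    def record(self, arm, reward):
        """Update the running mean reward for the arm that was served."""
        self.counts[arm] += 1
        n = self.counts[arm]
        self.values[arm] += (reward - self.values[arm]) / n


bandit = EpsilonGreedy(["layout_a", "layout_b"], epsilon=0.1)
bandit.record("layout_b", 1.0)
choice = bandit.choose()   # usually "layout_b", sometimes a random arm
```

Instrumentation matters here: the policy only learns if every served choice and its outcome are recorded.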
Data Store. What do you really need?
API t
content ditto performance HA history
scalability triggers consumers governance sharing
Data Store. To HA or not to HA.
[Figure: factors in the decision]
in-app
revenue driver
infrastructure cost
build & operate
critical user benefit
known use cases
now / later (blasphemy)
Data Store. APIs
Data Capture. What do you really need?
API t
content ditto history triggers consumers
sharing scalability HA security performance
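A minimal sketch of a capture API covering the time, content, history, triggers, and consumers items, assuming an in-memory list as the history and plain callbacks as consumers. `EventLog` and its fields are hypothetical.

```python
import time


class EventLog:
    """Appends timestamped events and notifies subscribed consumers."""

    def __init__(self):
        self.events = []      # history
        self.consumers = []   # callbacks triggered on every event

    def subscribe(self, consumer):
        self.consumers.append(consumer)

    def capture(self, user_id, content_id, action, t=None):
        event = {
            "t": time.time() if t is None else t,
            "user": user_id,
            "content": content_id,
            "action": action,
        }
        self.events.append(event)
        for consumer in self.consumers:
            consumer(event)
        return event


log = EventLog()
received = []
log.subscribe(received.append)
event = log.capture("u1", "c42", "click", t=1.0)
```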
Analytics. What do you really need?
API t
content ditto performance history
scalability consumers flexibility
Analytics. Experimentation & Personalization
Data Lake. What do you really need?
say ‘big data lake’
one more time!
Evolving Architecture. Before you know it…
[Diagram, with flows numbered 1 to 4]
Apps
API (delivery): personalized content
API (capture): feedback
API (compute): in-app data, personalized content
API (push): direct content
API (analytics)
Event Log: raw data or features
Model Building: train models periodically
Model Deployment: run models; re-run new models periodically
Analytics
RT
**terribly incomplete, mildly inaccurate
Not an Exact Blueprint
As you embark …
Know this
non-trivial
no one-size fits all
Upfront
what do you really need?
know thy target architecture
Do it!
working system in weeks
fast iterations – ship & test
interfaaaaaaaces!
village
model
**not drawn to effort scale
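The emphasis on interfaces can be illustrated with a small contract between delivery and model code: if both sides agree on one method, a model can be swapped without touching the delivery layer. This is a sketch under that assumption; the `Model` protocol and the two toy models are hypothetical.

```python
from typing import Protocol, Sequence


class Model(Protocol):
    """The only thing the delivery layer needs to know about a model."""

    def predict(self, features: Sequence[float]) -> float: ...


class ContentDelivery:
    def __init__(self, model: Model):
        self.model = model

    def score(self, features: Sequence[float]) -> float:
        return self.model.predict(features)


class MeanModel:
    def predict(self, features: Sequence[float]) -> float:
        return sum(features) / len(features)


class MaxModel:
    def predict(self, features: Sequence[float]) -> float:
        return max(features)


# Swapping the model requires no change to ContentDelivery.
first_version = ContentDelivery(MeanModel())
second_version = ContentDelivery(MaxModel())
```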
Software architecture is the next frontier!
Fail fast still applies!
Personalize your personalization platform!
better algorithms
more, better, smarter data
well designed software architectures: next frontier
A Brief Look at Anomaly Detection
Applications
• System health (servers, network)
• Cyber-intrusion detection
• Enterprise anomaly detection
• Image processing
• Textual anomaly detection
• Sensor networks
• Fraud detection
• Medical anomaly detection
• Industrial damage detection
• …
Algorithms
• Supervised
• Unsupervised
• Generic statistical
• Information theory
• …
“What algorithms are you going to use?”
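As one concrete instance of the "generic statistical" family, here is a z-score detector that flags points far from the sample mean. It is an illustrative sketch, not a method named in the talk; the function name and thresholds are assumptions.

```python
from statistics import mean, stdev


def zscore_anomalies(values, threshold=3.0):
    """Return indices of values more than `threshold` sample standard
    deviations away from the mean."""
    mu = mean(values)
    sigma = stdev(values)
    if sigma == 0:
        return []   # all values identical: nothing stands out
    return [i for i, v in enumerate(values)
            if abs(v - mu) / sigma > threshold]


# With only nine points, a single outlier cannot reach a z-score of 3,
# so a lower threshold is used for this small example.
latencies_ms = [10, 11, 9, 10, 10, 11, 9, 10, 50]
flagged = zscore_anomalies(latencies_ms, threshold=2.0)
```

The outlier inflates both the mean and the standard deviation, which is one reason simple detectors like this need care on small or heavily contaminated samples.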
Data
Low data volume
Invest in data acquisition
Invest in high coverage
High data volume
Invest in defining signal
Invest in labeling, tools, and crowdsourcing
Architectures Again
Capture: data collectors (clickstream, user input …; real time, DBs …)
Compute: run models
Labeling: crowdsourcing, active learning
Processors (M&A)
broad: time bounded
deep: open ended
**check assumptions
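The active learning item under Labeling can be sketched as uncertainty sampling: send the items the model is least sure about to human labelers first. The slide does not specify a strategy, so this choice and the function name are assumptions.

```python
def select_for_labeling(scores, budget):
    """Pick the `budget` items whose anomaly probability is closest to 0.5.

    scores maps item id -> model probability in [0, 1] that the item is
    anomalous. Ties are broken by item id for determinism.
    """
    ranked = sorted(scores, key=lambda item: (abs(scores[item] - 0.5), item))
    return ranked[:budget]


scores = {"txn_a": 0.95, "txn_b": 0.52, "txn_c": 0.10, "txn_d": 0.40}
to_label = select_for_labeling(scores, budget=2)
```

Items scored near 0 or 1 are skipped because a label there is unlikely to change the model much.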
Advertising
Music Streaming
Medical Informatics
Thank you! Lucian Lita
@datariver
[always hiring]
data@intuit.com
Extra Content
Security. What do you really need?
App. Who does the App talk to?
(a) App and API (compute): dynamic data in, personalized content out
-- retrieve static data
-- apply op logic
-- compute features
-- run model
-- log actions
(b) App and API (retrieve): personalized content out
-- apply op logic
-- retrieve pre-computed content
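The five steps of option (a) can be sketched as a single request handler. This assumes "op logic" means operational or business rules such as an opt-out check; that reading, the function name, and the field names (`opted_out`, `tenure`, `clicks`) are all hypothetical.

```python
def handle_compute(user_id, dynamic_data, static_store, model, action_log):
    """API (compute): one request, following the five steps in order."""
    # 1. retrieve static data
    static = static_store.get(user_id, {})
    # 2. apply op logic (assumed here to be an opt-out rule)
    if static.get("opted_out"):
        action_log.append((user_id, "skipped"))
        return None
    # 3. compute features from static and dynamic data
    features = [static.get("tenure", 0), dynamic_data.get("clicks", 0)]
    # 4. run model
    score = model(features)
    # 5. log actions
    action_log.append((user_id, "scored", score))
    return score


store = {"u1": {"tenure": 2}, "u2": {"opted_out": True}}
actions = []
score_u1 = handle_compute("u1", {"clicks": 3}, store, sum, actions)
score_u2 = handle_compute("u2", {"clicks": 9}, store, sum, actions)
```

Option (b) would replace steps 3 and 4 with a lookup of pre-computed content.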