Video and slides synchronized, mp3 and slide download available at URL http://bit.ly/1Qm6RNP.
Lucian Vlad Lita focuses on the crucial next step in personalization: well-designed software architectures for storing, computing, and delivering responsive, accurate in-product predictions and experiments. To make it concrete, he presents several Intuit use cases around anomaly detection and personalized in-product experiences. Filmed at qconsf.com.
Lucian Vlad Lita is director of Data Engineering at Intuit, leading a big data platform and large-scale real-time data services group in the US and the EU.
It Takes a Village to Raise a Machine Learning Model
1. It Takes a Village to Raise a Machine Learning Model
Lucian Lita
@datariver
2. InfoQ.com: News & Community Site
• 750,000 unique visitors/month
• Published in 4 languages (English, Chinese, Japanese and Brazilian Portuguese)
• Post content from our QCon conferences
• News 15-20 / week
• Articles 3-4 / week
• Presentations (videos) 12-15 / week
• Interviews 2-3 / week
• Books 1 / month
Watch the video with slide synchronization on InfoQ.com!
http://www.infoq.com/presentations/intuit-machine-learning
3. Purpose of QCon
- to empower software development by facilitating the spread of knowledge and innovation
Strategy
- practitioner-driven conference designed for YOU: influencers of change and innovation in your teams
- speakers and topics driving the evolution and innovation
- connecting and catalyzing the influencers and innovators
Highlights
- attended by more than 12,000 delegates since 2007
- held in 9 cities worldwide
Presented at QCon San Francisco
www.qconsf.com
6. @datariver
Data
more data is better than complex algorithms #BigData
Big Data Sheep @bigdatasheep · 5yr
more clean data is better than more data #BigData
Big Data Sheep @bigdatasheep · 4yr
more labeled data is better than more data #BigData
Big Data Sheep @bigdatasheep · 3yr
more smart data is better than purple data #BigData
Big Data Sheep @bigdatasheep · 2yr
**inflated historical depiction
16. @datariver
Push-scientist
Invest in ML; start with a thin system
How much effort put into Platform & Automation?
(A) best you can do in x weeks
(B) one step above prototype
(C) enough baling wire & duct tape to support a first use case
17. @datariver
Push-button
Invest in scale & automation; basic ML
How much effort put into ML?
(A) best generic model setup in y weeks?
(B) noticeably better than random?
(C) pack enough punch to be visible, but not more
31. @datariver
Data Store. What do you really need?
API t
content ditto performance HA history
scalability triggers consumers governance sharing
32. @datariver
Data Store. To HA or not to HA.
- in-app revenue driver
- infrastructure cost
- build & operate
- critical user benefit
- known use cases
- now / later (blasphemy)
39. Apps
[Architecture diagram; labels recovered from the slide. The numbered flows 1-4 connect the components but their exact endpoints are not recoverable.]
- API (delivery): personalized content
- API (capture): feedback
- API (compute): in-app data in, personalized content out
- API (push): direct content
- API (analytics)
- Event Log: raw data or features
- Model Deployment: run models; re-run new models periodically
- Model Building: train models periodically
- RT Analytics
**terribly incomplete, mildly inaccurate
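The components on this slide form a loop: feedback is captured into an event log, a model is trained from the log periodically, and the deployed model then drives delivery. A minimal sketch of that loop follows. Everything in it is an assumption made for illustration: the class name `PersonalizationLoop`, its method names, and the toy "model" (a per-content click rate) do not come from the talk, which does not describe Intuit's actual implementation.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


class PersonalizationLoop:
    """Hypothetical, in-memory stand-in for the capture/train/deploy/deliver loop."""

    def __init__(self) -> None:
        self.event_log: List[Tuple[str, bool]] = []   # raw (content_id, clicked) events
        self.deployed_model: Dict[str, float] = {}    # content_id -> score

    def capture(self, content_id: str, clicked: bool) -> None:
        """API (capture): append user feedback to the event log."""
        self.event_log.append((content_id, clicked))

    def train(self) -> Dict[str, float]:
        """Model building: here just the observed click rate per content item."""
        shown: Dict[str, int] = defaultdict(int)
        clicks: Dict[str, int] = defaultdict(int)
        for content_id, clicked in self.event_log:
            shown[content_id] += 1
            clicks[content_id] += int(clicked)
        return {c: clicks[c] / shown[c] for c in shown}

    def deploy(self, model: Dict[str, float]) -> None:
        """Model deployment: swap in a newly trained model."""
        self.deployed_model = model

    def deliver(self, candidates: List[str]) -> str:
        """API (delivery): return the highest-scoring candidate (unseen items score 0)."""
        return max(candidates, key=lambda c: self.deployed_model.get(c, 0.0))


loop = PersonalizationLoop()
loop.capture("tip_a", True)
loop.capture("tip_a", False)
loop.capture("tip_b", True)
loop.deploy(loop.train())          # in a real system this would run on a schedule
print(loop.deliver(["tip_a", "tip_b"]))
```

Keeping `train` and `deploy` as separate steps mirrors the slide's split between Model Building and Model Deployment: a new model can be built from the log without affecting what is currently served.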
41. As you embark …
Know this
non-trivial
no one-size fits all
Upfront
what do you really need?
know thy target architecture
Do it!
working system in weeks
fast iterations – ship & test
interfaaaaaaaces!
48. @datariver
Data
Low data volume
Invest in data acquisition
Invest in high coverage
High data volume
Invest in defining signal
Invest in labeling, tools, and crowdsourcing
60. @datariver
App. Who does the App talk to?
(a) App to API (compute): the App sends dynamic data and receives personalized content
-- retrieve static data
-- apply op logic
-- compute features
-- run model
-- log actions
(b) App to API (retrieve): the App receives personalized content
-- apply op logic
-- retrieve pre-computed content
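The two options on this slide can be contrasted in code: in (a) the API does the work at request time, in (b) it only looks up content computed ahead of time. The sketch below is illustrative only. The class names, the 0.5 threshold, the `"offer"`/`"default"` content values, and the reading of "op logic" as a per-user opt-out check are all assumptions, not details from the talk.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set


@dataclass
class ComputeAPI:
    """Option (a), hypothetical: compute personalized content per request."""
    static_data: Dict[str, Dict[str, float]]            # stored per-user attributes
    model: Callable[[Dict[str, float]], float]          # returns a score
    opted_out: Set[str] = field(default_factory=set)
    action_log: List[dict] = field(default_factory=list)

    def personalize(self, user_id: str, dynamic_data: Dict[str, float]) -> str:
        profile = self.static_data.get(user_id, {})     # retrieve static data
        if user_id in self.opted_out:                   # apply op logic (assumed: opt-out)
            content = "default"
        else:
            features = {**profile, **dynamic_data}      # compute features
            score = self.model(features)                # run model
            content = "offer" if score >= 0.5 else "default"
        self.action_log.append({"user": user_id, "content": content})  # log actions
        return content


@dataclass
class RetrieveAPI:
    """Option (b), hypothetical: serve content computed ahead of time."""
    precomputed: Dict[str, str]                         # user_id -> content
    opted_out: Set[str] = field(default_factory=set)

    def personalize(self, user_id: str) -> str:
        if user_id in self.opted_out:                   # apply op logic (assumed: opt-out)
            return "default"
        return self.precomputed.get(user_id, "default") # retrieve pre-computed content


compute_api = ComputeAPI(
    static_data={"u1": {"tenure": 0.4}},
    model=lambda features: sum(features.values()),      # toy scoring function
)
retrieve_api = RetrieveAPI(precomputed={"u1": "offer"}, opted_out={"u3"})
print(compute_api.personalize("u1", {"clicks": 0.3}))
print(retrieve_api.personalize("u1"))
```

Option (a) can react to data that only exists at request time, at the cost of running features and a model on the request path; option (b) keeps the request path to a lookup but can only be as fresh as the last pre-computation.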