3. Dato Confidential
Business must be intelligent
Machine learning applications
• Recommenders
• Fraud detection
• Ad targeting
• Financial models
• Personalized medicine
• Churn prediction
• Smart UX (video & text)
• Personal assistants
• IoT
• Social networks
• Log analysis
Last decade: Data management
Last 5 years: Traditional analytics
Now: Intelligent apps
9. Dato Confidential
Why use GraphLab Create?
• Efficient storage
GraphLab SFrame compressed column store:
• 20x smaller than pandas
• 2x smaller than gzip
[Chart: size on disk; lower is better]
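One reason column stores compress well is that each column holds values of a single type, and repeated values often sit next to each other. The slide does not describe SFrame's actual on-disk encodings, so the following is only a minimal sketch of one common column-compression technique, run-length encoding, applied to a hypothetical low-cardinality column.

```python
from itertools import groupby
from typing import Any, List, Tuple


def rle_encode(column: List[Any]) -> List[Tuple[Any, int]]:
    """Collapse consecutive repeated values into (value, run_length) pairs."""
    return [(value, sum(1 for _ in run)) for value, run in groupby(column)]


def rle_decode(runs: List[Tuple[Any, int]]) -> List[Any]:
    """Expand (value, run_length) pairs back into the original column."""
    out: List[Any] = []
    for value, count in runs:
        out.extend([value] * count)
    return out


# Hypothetical column, stored contiguously as a column store would keep it.
country = ["US"] * 4 + ["DE"] * 3 + ["US"] * 2
runs = rle_encode(country)
print(runs)  # [('US', 4), ('DE', 3), ('US', 2)]
assert rle_decode(runs) == country  # encoding is lossless
```

Nine stored values shrink to three pairs here; a row-oriented layout interleaves columns, so such runs rarely form.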
10. Dato Confidential
No need for huge RAM!
[Chart: effective delay vs. RAM]
• Data size is limited by disk size, not RAM
• Works even when the data is larger than the machine's RAM
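The slide does not explain how GraphLab Create works from disk. As a minimal sketch of the general out-of-core idea, a one-pass aggregate can stream rows from a file and keep only a running sum and count in memory, so memory use stays flat however large the file grows. The file name data.csv and the column name price are hypothetical.

```python
import csv
import io
from typing import Iterable, TextIO


def streaming_mean(rows: Iterable[dict], column: str) -> float:
    """Compute a column mean in one pass, holding only a running sum and count."""
    total = 0.0
    count = 0
    for row in rows:
        total += float(row[column])
        count += 1
    if count == 0:
        raise ValueError("no rows to average")
    return total / count


def mean_from_csv(f: TextIO, column: str) -> float:
    # csv.DictReader yields one row at a time instead of loading the whole file.
    return streaming_mean(csv.DictReader(f), column)


# In-memory demo; on real data use open("data.csv", newline="") (hypothetical file).
data = io.StringIO("id,price\n1,10\n2,20\n3,30\n")
print(mean_from_csv(data, "price"))  # 20.0
```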
12. Dato Confidential
Summary of differences vs. sklearn
• Better multicore support
• Out-of-core implementation (works from disk)
• Automatic feature expansion
• Automatic parameter selection
• Support for model serving
• Additional algorithms
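The slide lists automatic feature expansion without saying how it works. The sketch below assumes one common approach and is not GraphLab Create's implementation: mixed-type rows become flat numeric feature dicts, with numbers passed through, strings one-hot encoded as "column=value", and dict-valued columns (for example, hypothetical word counts) expanded into one feature per key.

```python
from typing import Dict, List, Union

Value = Union[int, float, str, Dict[str, float]]


def expand_features(rows: List[Dict[str, Value]]) -> List[Dict[str, float]]:
    """Flatten mixed-type rows into sparse numeric feature dicts.

    Features absent from a row are implicitly 0 (sparse representation).
    """
    expanded: List[Dict[str, float]] = []
    for row in rows:
        out: Dict[str, float] = {}
        for col, val in row.items():
            if isinstance(val, (int, float)):
                out[col] = float(val)               # numeric: pass through
            elif isinstance(val, str):
                out[f"{col}={val}"] = 1.0           # categorical: one-hot indicator
            elif isinstance(val, dict):
                for key, num in val.items():
                    out[f"{col}.{key}"] = float(num)  # dict: one feature per key
            else:
                raise TypeError(f"unsupported type in column {col!r}: {type(val).__name__}")
        expanded.append(out)
    return expanded


# Hypothetical rows with numeric, categorical, and dict-valued columns.
rows = [
    {"age": 34, "city": "Seattle", "word_counts": {"great": 2, "slow": 1}},
    {"age": 51, "city": "Boston", "word_counts": {"great": 1}},
]
for features in expand_features(rows):
    print(features)
```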
14. Dato Confidential
Dato on Coursera
40,000 students in 4 months
https://www.coursera.org/learn/ml-foundations
Specialization content:
● Machine Learning Foundations
● Regression
● Classification
● Clustering & Retrieval
● Recommendation Systems & Dimensionality Reduction
● Capstone: An Intelligent Application with Deep Learning
17. Dato Confidential
Create an intelligent world!
Data Engineering
• Fast & scalable
• Rich data types
• Built for ML
Sophisticated ML
• App-oriented ML
• Scalable ML
• Extensibility
Deployment
• Batch & always-on
• RESTful interface
• Elastic & robust
bickson@dato.com
27. Dato Confidential
• Subscription license, which includes support and upgrades
• Licensed per user for GraphLab Create and per machine for production use
• Training & technical services also available
• Discounts available for 10 or more users
28. Dato Confidential
Deployment Scenarios
Key: GraphLab Create, Dato Predictive Services, Dato Distributed

“Getting Started”
GraphLab Create – installed on each team member machine
• Working with data, training new models, doing ad-hoc analysis
GraphLab Create – installed on central team server
• Trains production models periodically (e.g., nightly)
• Generates predictions and writes records to a data store

“Real-time Predictions”
GraphLab Create – installed on each team member machine (e.g., laptops)
• Working with data, doing ad-hoc analysis, training new models
• Deploys new models to Dato Predictive Services
GraphLab Create – installed on central team server
• Trains production models periodically (e.g., nightly)
• Deploys models to Dato Predictive Services
Dato Predictive Services – installed on central team cluster
• Hosting and serving deployed models
• REST API for application integration

“Scaling Up”
GraphLab Create – installed on each team member machine
• Working with data, training new models, doing ad-hoc analysis
• Deploys models to Dato Predictive Services
• Submits jobs to Dato Distributed
Dato Distributed – installed on central team cluster
• Trains models in parallel on larger datasets periodically (e.g., nightly)
• Deploys newly trained models to Dato Predictive Services
Dato Predictive Services – installed on central team cluster
• Hosting deployed models
• REST API for application integration
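To illustrate what "REST API for application integration" means for an app developer, here is a minimal sketch of a JSON prediction endpoint built only on the Python standard library. It is not the Dato Predictive Services API: the /predict path, the {"features": {...}} request shape, and the fixed linear "model" (WEIGHTS, BIAS) are all hypothetical stand-ins for a deployed model.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Dict

# Hypothetical deployed model: a fixed linear scorer standing in for a trained one.
WEIGHTS: Dict[str, float] = {"age": 0.02, "visits": 0.1}
BIAS = -1.0


def predict(features: Dict[str, float]) -> float:
    """Score one example; features missing from the request count as 0."""
    return BIAS + sum(w * float(features.get(name, 0.0)) for name, w in WEIGHTS.items())


class PredictHandler(BaseHTTPRequestHandler):
    def do_POST(self) -> None:
        if self.path != "/predict":  # hypothetical endpoint path
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        try:
            body = json.loads(self.rfile.read(length))
            score = predict(body["features"])
        except (ValueError, KeyError, TypeError, AttributeError):
            self.send_error(400, 'expected JSON like {"features": {...}}')
            return
        payload = json.dumps({"prediction": score}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)

    def log_message(self, format: str, *args) -> None:
        pass  # silence per-request logging for the demo


def make_server(port: int = 8000) -> HTTPServer:
    """Bind the prediction service to localhost; call serve_forever() to run it."""
    return HTTPServer(("127.0.0.1", port), PredictHandler)
```

Running make_server().serve_forever() and POSTing {"features": {"age": 50, "visits": 10}} to http://127.0.0.1:8000/predict returns a prediction of about 1.0 for this toy model. Keeping the model behind a plain HTTP/JSON contract is what lets any application call it without a Python dependency.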