
Scaling Up Machine Learning: How to Benchmark GraphLab Create on Huge Datasets

478 views


Presented by Guy Rapaport

Published in: Technology


  1. Dato Confidential. GraphLab Create Benchmarks. April 21, 2016. Guy Rapaport, Data Scientist, Dato EMEA. guy@dato.com
  2. Dato: We Intelligent Applications
  3. Some of our Customers
  4. Business must be intelligent
     Machine learning applications:
     • Recommenders
     • Fraud detection
     • Ad targeting
     • Financial models
     • Personalized medicine
     • Churn prediction
     • Smart UX (video & text)
     • Personal assistants
     • IoT
     • Social networks
     • Log analysis
     Last decade: Data management
     Last 5 years: Traditional analytics
     Now: Intelligent apps ?
  5. Example Intelligent Applications
     • images
     • text
     • graphs
     • tabular data
  6. Creating a model pipeline: exploration, data, modeling
  7. Creating a model pipeline: Ingest, Transform, Model, Deploy. Unstructured Data
  8. Dato Confidential
  9. GraphLab Create in a Line
     “A general-purpose machine learning Python library that scales on large datasets.”
     • General purpose: classification, graph analytics…
     • Python API on top, C++ open-source engine below.
     • Scales vertically: more CPUs, RAM and faster disks.
     • Large datasets: disk bound, not RAM bound.
  10. What will we cover today?
     1. Instantiating a machine in the Amazon EC2 cloud
        • r3.8xlarge instance
        • 32 cores, 244 GB of RAM, 2 SSDs of 320 GB each
     2. Run PageRank on a large graph
        • CommonCrawl 2012 dataset (the internet as a graph)
        • 3.5 billion nodes, 128 billion links
     3. Run Gradient Boosted Trees on a large dataset
        • Criteo 1TB Click Logs Dataset
        • 4.3 billion rows, 39 features (13 numerical, 26 categorical)
  11. What will you be able to do afterwards? Instantiate an EC2 instance, grab our benchmark notebooks, and try it yourself! Everything is publicly available on GitHub: https://github.com/guy4261/glc_pagerank_benchmark
  12. Screen Primer
     Command                                  Action
     sudo apt-get install -y screen           Install screen
     screen -S my_session                     Start a session named my_session
     PS1='\u@\h(${STY}:${WINDOW}):\w$'        Change your screen prompt (helpful)
     CTRL+A, then D                           Key combination to detach
     screen -ls                               List all open screens
     screen -r my_session                     Reattach to your screen
     exit                                     Exit the session and terminate the screen
  13. Confidential – Dato internal use only. ©2015 Dato, Inc.
     Questions?
     “For the purpose of learning the Answer to the Ultimate Question of Life, The Universe, and Everything, the supercomputer Deep Thought was specially built. It takes Deep Thought 7½ million years to compute and check the answer, which turns out to be 42. Deep Thought points out that the answer seems meaningless because the beings who instructed it never actually knew what the Question was.”
     Douglas Adams, “The Hitchhiker’s Guide to the Galaxy”
  14. Our Machine Learning Specialization on Coursera: https://www.coursera.org/learn/ml-foundations
  15. Thanks!
     • Install using pip: $ pip install -U graphlab-create
     • Dato Launcher Download: https://dato.com/download/
     • The benchmarks on GitHub: https://github.com/guy4261/glc_pagerank_benchmark
     • Coursera Course: https://www.coursera.org/learn/ml-foundations
     • Reach out: guy@dato.com
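
For readers who want a feel for what the PageRank benchmark on slide 10 computes, here is a minimal toy-scale sketch in plain Python. It is not GraphLab Create's implementation and does not use its API. The function name `pagerank`, its parameters, and the four-node example graph are all hypothetical and chosen only for illustration. It uses standard power iteration with a damping factor of 0.85, which is an assumption on our part and not a setting taken from the slides.

```python
def pagerank(edges, num_nodes, damping=0.85, tol=1e-9, max_iter=100):
    """Power-iteration PageRank over a list of (src, dst) edges.

    Illustrative only: names and defaults are assumptions, not the
    GraphLab Create API.
    """
    # Count outgoing links per node so each node can split its rank evenly.
    out_degree = [0] * num_nodes
    for src, _ in edges:
        out_degree[src] += 1

    # Start from a uniform distribution.
    rank = [1.0 / num_nodes] * num_nodes
    for _ in range(max_iter):
        # Rank held by nodes without outgoing links is spread over all nodes.
        dangling = sum(rank[n] for n in range(num_nodes) if out_degree[n] == 0)
        base = (1.0 - damping) / num_nodes + damping * dangling / num_nodes
        new_rank = [base] * num_nodes
        # Each edge passes a share of the source's rank to the destination.
        for src, dst in edges:
            new_rank[dst] += damping * rank[src] / out_degree[src]
        delta = sum(abs(a - b) for a, b in zip(new_rank, rank))
        rank = new_rank
        if delta < tol:
            break
    return rank


# Hypothetical toy graph: node 2 is linked to by every other node.
edges = [(0, 1), (0, 2), (1, 2), (2, 0), (3, 2)]
ranks = pagerank(edges, num_nodes=4)
for node, score in enumerate(ranks):
    print(node, round(score, 4))
```

Each iteration makes one pass over the edge list, so the work per iteration grows with the number of links. At the scale quoted on slide 10 (128 billion links), that edge pass is the part a disk-bound engine has to stream efficiently.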
