Part 1: Introducing the Cloudera Data Science Workbench

4,786 views

3 Things to Learn About:

* The emergence of open source tools for data science
* Common gaps in the ecosystem
* Introduction of a new tool from Cloudera

Published in: Software

  1. Unlocking Data Science in the Enterprise Webinar Series, Part 1: Introducing Cloudera Data Science Workbench
     Matt Brandwein | Director, Products
     Tristan Zajonc | Head of Data Science Engineering
     © Cloudera, Inc. All rights reserved.
  2. Age of Machine Learning
     [Chart: cost of compute and data volume plotted against time, 1950s through 2010s, with regions labeled "No Machine Learning" and "Machine Learning"]
  3. The Enterprise Platform for Data Science and Machine Learning
     The data is now here:
     • 30B connected devices
     • 440x more data
     Cloudera first to integrate Spark:
     • Modern platform for machine learning and advanced analytics
     Leading adoption among enterprises:
     • 500 customers run Spark on [name missing from transcript]
  4. Sample data science / machine learning workflow: from data to exploration to action
     Stages: Data Engineering, Data Science (Exploratory), Production (Operational)
     Diagram labels: Data Wrangling; Visualization and Analysis; Model Training & Testing; Production Data Pipelines; Batch Scoring; Online Scoring; Serving; Data Governance; Governance; Processing; Acquisition; Reports, Dashboards
  5. The good news
     (Workflow diagram repeated from slide 4.)
     • Data has never been more plentiful
     • Open source data science and machine learning libraries are rapidly evolving
     • Commodity (and on-demand) compute makes scalable production machine learning affordable
  6. Poll: Which of the following languages/tools do you use?
     • Python
     • R
     • Scala
     • Spark MLlib
     • H2O
     • TensorFlow
     • Other deep learning tool(s)
  7. The bad news
     (Workflow diagram repeated from slide 4.)
     • Most data science is done at small scale, individually, and is difficult to replicate
     • Very few models reach production
     • Teams have different, conflicting requests for languages & libraries
     • Data needs to move across multiple different systems
  8. Additional challenges
     Access
     • For sensitive data, secure clusters are difficult to access, and IT typically doesn’t want random packages installed on a secure cluster.
     • Popular open source tools don’t easily connect to these environments and don’t always support Hadoop data formats.
     Scale
     • Laptops rarely have capacity for medium data, let alone big data. This leads to a lot of sampling.
     • Popular frameworks don’t easily parallelize on a cluster. Typically, code has to be rewritten for production.
     Developer Experience
     • Notebooks, while awesome, don’t easily support virtual environments and dependency management, especially for teams. This makes sharing and reproducibility hard.
     • Notebooks are also challenging to “put into production.”
  9. This year, our goal is to enable data science and machine learning at scale.
  10. Open data science in the enterprise
     • IT: drive adoption while maintaining compliance
     • Data Scientist: explore, experiment, iterate
  11. Our goal: An open platform for data science at scale
     Help more data scientists use the power of Hadoop (Data Scientist, Data Engineer)
     • Use a powerful, familiar environment with direct access to Hadoop data and compute
     Make it easy and secure to add new users, use cases (Enterprise Architect, Hadoop Admin)
     • Offer secure self-service analytics and a faster path to production on common, affordable infrastructure
  12. Introducing Cloudera Data Science Workbench: self-service data science for the enterprise
     Accelerates data science from development to production with:
     • Secure self-service environments for data scientists to work against Cloudera clusters
     • Support for Python, R, and Scala, plus project dependency isolation for multiple library versions
     • Workflow automation, version control, collaboration and sharing
  13. Demo
  14. With Cloudera Data Science Workbench…
     Data scientists can:
     • Use R, Python, or Scala from a web browser, with no desktop footprint
     • Install any library or framework within isolated project environments
     • Directly access data in secure clusters with Spark and Impala
     • Share insights with their team for reproducible, collaborative research
     • Automate and monitor data pipelines using built-in job scheduling
     IT can:
     • Give their data science team the freedom to work how they want, when they want
     • Stay compliant with out-of-the-box support for full platform security, especially Kerberos
     • Run on-premises or in the cloud, wherever data is managed
  15. Poll: Which of the following describes your production machine learning use case?
     • Reports, dashboards, or notebooks
     • Batch scoring or ETL
     • Online scoring or model serving
     • Streaming application
  16. Solving Data Science is a Full-Stack Problem
     • Support unlimited data: ✓ Hadoop
     • Provide sufficient tools for Analysts: ✓ Impala, Hive, Hue
     • Provide sufficient tools for Data Scientists + Data Engineers: ✓ Spark, Data Science Workbench
     • Enable real-time use cases: ✓ Kafka, Spark Streaming
     • Provide data governance: ✓ Navigator + Partners
     • Provide full-stack security: ✓ Kerberos, Sentry, Record Service, KMS/KTS
     • Deploy in the cloud: ✓ Cloudera Director
     • Integrate with partner tools: ✓ Rich Ecosystem
     • Be easy for IT to deploy/maintain: ✓ Cloudera Manager + Director
  17. The importance of an open ecosystem
     [Diagram: Open Ecosystem | Black Box]
  18. Join us again for…
     • April 20th: A Visual Dive into Machine Learning and Deep Learning
     • May 4th: Models in Production: A Look From Beginning to End
  19. Thank you!
     matt@cloudera.com
     tristanz@cloudera.com
     Continue the webinar series: http://go.cloudera.com/LP=1383
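
To make the workflow stages named on slide 4 more concrete (data wrangling, model training and testing, batch scoring, online scoring), here is a minimal sketch in plain Python. It is not taken from the webinar and does not use Cloudera Data Science Workbench, Spark, or a cluster. The toy data, the function names, and the choice of a one-variable least-squares model are all assumptions made for illustration only.

```python
"""Toy workflow: wrangle -> train/test -> batch and online scoring.

Everything here is hypothetical and runs on the standard library only.
"""
from statistics import mean


def wrangle(raw_rows):
    """Data wrangling: drop incomplete records and cast values to float."""
    clean = []
    for x, y in raw_rows:
        if x is None or y is None:
            continue
        clean.append((float(x), float(y)))
    return clean


def train(rows):
    """Model training: fit y = slope * x + intercept by ordinary least squares."""
    xs = [x for x, _ in rows]
    ys = [y for _, y in rows]
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in rows)
    slope = sxy / sxx
    return {"slope": slope, "intercept": y_bar - slope * x_bar}


def score_online(model, x):
    """Online scoring: one prediction per incoming request."""
    return model["slope"] * x + model["intercept"]


def score_batch(model, xs):
    """Batch scoring: predictions for a whole dataset in one pass."""
    return [score_online(model, x) for x in xs]


def evaluate(model, rows):
    """Model testing: mean absolute error on held-out rows."""
    return mean([abs(score_online(model, x) - y) for x, y in rows])


# Hypothetical raw data following y = 2x + 1, with one incomplete record.
raw = [(0, 1), (1, 3), (2, 5), (None, 7), (3, 7), (4, 9), (5, 11)]
rows = wrangle(raw)
train_rows, test_rows = rows[:4], rows[4:]  # simple holdout split
model = train(train_rows)
test_error = evaluate(model, test_rows)
batch_predictions = score_batch(model, [10.0, 20.0])
```

The same fitted `model` serves both scoring paths: `score_batch` corresponds to the "Batch Scoring" label and `score_online` to "Online Scoring". In a real deployment these would typically run in different systems, which is part of what slide 7 means by data needing to move across multiple systems.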
