Optimize your cloud strategy for machine learning and analytics

Join industry superstars Mike Olson (Cloudera CSO and co-founder) and Jim Curtis (451 Research senior analyst) as they outline the best practices for cloud-based machine learning and analytics in this “can’t miss” webinar.
Hot topics include:
Why enterprises are moving their analytics to the public cloud
How to select the best cloud deployment model
Design tricks that make cloud economics work
Success stories, cautionary tales, and lessons learned
James will share 451 Research findings and offer insights learned from surveying both the vendor landscape and enterprise practitioners.
Mike will regale you with his vision for the future of multi-disciplinary machine learning and analytics in hybrid- and multi-cloud environments.

Published in: Technology

Optimize your cloud strategy for machine learning and analytics

  1. © Cloudera, Inc. All rights reserved. Optimize your cloud strategy for machine learning and analytics. Mike Olson, CSO and co-founder, Cloudera; James Curtis, Senior Analyst, 451 Research.
  2. Optimizing Cloud Strategy for ML & Analytics. James Curtis, Senior Analyst, Data Platforms & Analytics.
  3. 451 Research is a leading IT research & advisory company. Founded in 2000. 300+ employees, including over 120 analysts. 2,000+ clients: technology & service providers, corporate advisory, finance, professional services, and IT decision makers. 70,000+ IT professionals, business users and consumers in our research community. Over 52 million data points published each quarter and 4,500+ reports published each year. 3,000+ technology & service providers under coverage. 451 Research and its sister company, Uptime Institute, are the two divisions of The 451 Group. Headquartered in New York City, with offices in London, Boston, San Francisco, Washington DC, Mexico, Costa Rica, Brazil, Spain, UAE, Russia, Taiwan, Singapore and Malaysia. Offerings: Research & Data, Advisory, Events, Go 2 Market.
  4. “Every morning in Africa, an antelope wakes up. It knows it must outrun the fastest lion, or it will be killed. Every morning in Africa, a lion wakes up. It knows it must run faster than the antelope, or it will starve....”
  5. “…It doesn’t matter if you’re a lion or an antelope: when the sun comes up, you’d better be running.”
  6. Approximately what percent of your workloads are deployed in the following environments today? In 2 years? Source: 451 Research, Voice of the Enterprise: Workloads and Key Projects, Cloud Transformation, 2017.
  7. Thinking of all applications your organization runs, what percentage run in which environments? In 2 years? Key points: cloud deployments will be the dominant environment in every category; every cloud deployment environment will see increases in every workload category; analytics and app development areas are expected to see strong gains. Source: 451 Research, Voice of the Enterprise: Workloads and Key Projects, Cloud Transformation, 2017.
  8. Drivers for Cloud Adoption: data gravity; transformational change; IT rejuvenation; flexibility; cost avoidance.
  9. Challenges for Cloud Adoption: cost; people and process change; performance; liability; security issues (perceived and real).
  10. Big Data Cloud Segmentation. USER and CSP/VENDOR responsibility (low to high) across IaaS, PaaS, and SaaS.
  11. Big Data Cloud Segmentation. USER and CSP/VENDOR responsibility (low to high) across IaaS, PaaS, and SaaS. Depending on the segment, the responsibilities borne by the user and the CSP/vendor vary considerably.
  12. Big Data on Infrastructure-as-a-Service. USER and CSP/VENDOR responsibilities across IaaS, PaaS, and SaaS. IaaS: provides cloud infrastructure; configures environment; selects resources; develops jobs; adopts financial risk. Deployment options: manually deployed; marketplace image; on bare metal. When it makes sense: control over the environment is required; customized or specialized use cases; procurement is a barrier.
  13. Big Data on Infrastructure-as-a-Service + PLUS. USER and CSP/VENDOR responsibilities across IaaS, PaaS, and SaaS. IaaS PLUS: automated tools config/resources; provides cloud infrastructure; aided/configures environment; aided/selects resources; develops jobs; adopts financial risk. Deployment options: CSP/vendor-specific tools for deploying and configuring the environment. When it makes sense: control over the environment is required; customized or specialized use cases; procurement is a barrier.
  14. Big Data-as-a-Service. USER and CSP/VENDOR responsibilities across IaaS, PaaS, and SaaS. IaaS PLUS: automated tools config/resources; provides cloud infrastructure; aided/configures environment; aided/selects resources; develops jobs; adopts financial risk. -as-a-Service: configures environment; develops jobs; adopts financial risk; provides cloud infrastructure; masks complexity. Availability: by CSP (single cloud); by vendor that leverages cloud infrastructure on behalf of user. When it makes sense: full control not required; limited resources or don’t want responsibility for certain tasks; alignment with a service provider.
  15. Managed Big Data Services. USER and CSP/VENDOR responsibilities across IaaS, PaaS, and SaaS. IaaS PLUS: automated tools config/resources; provides cloud infrastructure; aided/configures environment; aided/selects resources; develops jobs; adopts financial risk. -as-a-Service: configures environment; develops jobs; adopts financial risk; provides cloud infrastructure; masks complexity. Managed Service: develops jobs and manages workloads; provides cloud infrastructure; masks complexity; configures; adopts financial risk. Considerations: processing engines, features, and capabilities can vary; professional services optional; pricing varies. When it makes sense: focus is on the job instead of infrastructure; the ‘managed’ services serve the organization; resources are not available or the organization is not willing to invest in in-house skills.
  16. Managed Big Data Services. USER and CSP/VENDOR responsibilities across IaaS, PaaS, and SaaS. IaaS PLUS: automated tools config/resources; provides cloud infrastructure; aided/configures environment; aided/selects resources; develops jobs; adopts financial risk. -as-a-Service: configures environment; develops jobs; adopts financial risk; provides cloud infrastructure; masks complexity. Managed Service and Managed Proc.: develops jobs and manages workloads; perform data processing and analysis; provides cloud infrastructure; masks complexity; configures; adopts financial risk; provides cloud infrastructure; masks complexity; configures environment; adopts financial risk. Where are we headed? Processing frameworks, engines, and databases do not matter to organizations; automation and advanced methods leveraging machine learning will be integrated; focus on an ‘outcome’ desired by the organization; user base can be expanded as complexity is abstracted out of the system.
  17. Big data analytics in the cloud. Why not just use HDFS, either on Amazon EC2 or Azure VMs or via Hadoop as a cloud service?
  18. Big data analytics in the cloud. Why not just use HDFS, either on Amazon EC2 or Azure VMs or via Hadoop as a cloud service? Cost: It costs significantly less to store data in S3 (to use AWS as an example) than HDFS running on EC2. HDFS requires storing three copies of each block of data for resiliency. S3 offers automated backups and file compression. Users only pay for the compute resources they consume as and when they analyze the data.
  19. Big data analytics in the cloud. Why not just use HDFS, either on Amazon EC2 or Azure VMs or via Hadoop as a cloud service? Scalability: HDFS relies on local storage. HDFS in the cloud requires manual configuration and management of associated storage. Cloud storage is designed to automatically scale as more data is added, without any direct user involvement.
  20. Big data analytics in the cloud. Why not just use HDFS, either on Amazon EC2 or Azure VMs or via Hadoop as a cloud service? Durability and persistence: Data is persisted in EC2 storage instances only for the life of the instance itself, whereas data is always persisted in S3. If you’re running HDFS on EC2, it’s highly likely that you’ll be storing the data in a persistent data store like S3 anyway and moving it to and fro for the purposes of analysis. S3 is also designed to deliver durability of 99.999999999%, which would be hard for even the most highly skilled Hadoop administrator to match.
  21. Artificial intelligence: the quest to build software running on machines that can ‘think’ and act like humans. Machine learning: a subset of artificial intelligence focused on using algorithms that learn and improve without being explicitly programmed to do so. Deep learning: a branch of machine learning based on a specific set of algorithms that attempt to mimic the human brain in the form of multi-layered neural networks.
  22. ML and the Cloud. Success in ML depends on a combination of data, algorithms, skills and compute resources. While the public cloud is by no means essential for ML, low-cost storage and compute services enable storing and processing data at larger volumes. Deep learning, which typically involves modeling many layers of neural networks and, thus, is highly resource-intensive, particularly benefits from recent computing advancements and increasing comfort levels with the cloud.
  23. Some final thoughts to consider. Organizations succeed when they match their cloud requirements with their resources in selecting their cloud deployment preferences, thus enabling the strengths of the user and CSP/vendor. The journey to the cloud requires significant planning, but the goal remains the same: to better manage and leverage the data. The cloud is a means to that end. Carrying out analytics in the cloud works best when organizations utilize the infrastructure advantages of the cloud, such as the ability to scale and secure large amounts of compute and storage, especially for ML.
  24. Thank you. james.curtis@451research.com, @jmscrts, www.451research.com
  25. Cloudera & The Cloud. Mike Olson, CSO and co-founder, Cloudera.
  26. My organization is moving to the cloud, why should we consider Cloudera?
  27. Current State
  28. Stakeholders. Knowledge workers: instant, self-service access to data and IT resources; application performance; job-oriented tools; choice and integration. Infrastructure team: secure, controlled provisioning of data and IT resources; predictable infrastructure costs; systems-oriented tools; standardization and portability.
  29. Cloud benefits (+): speed of deployment; tenant isolation; self-service; workload elasticity; shared storage; pay-as-you-go; bring your own tools; bring your own data; powerful network. Cloud setbacks (-): proliferation of data copies; multiple security frameworks; difficult to troubleshoot workloads; no shared metadata; unable to track data lineage; disjointed services; few on-premise integration services; proprietary services; cloud lock-in.
  30. Deployment Options. On-premise: bare metal, private cloud. Cloud: infrastructure, services.
  31. Deployment model choices. Columns: Bare Metal, Private Cloud, IaaS, PaaS. Each column stacks the same layers: Applications, Clusters, Operating System, Network, Storage, Servers. Legend: customer managed, vendor managed.
  32. Traditional applications. Five silos (Data Exploration, SQL & BI Analytics, Operational Real-Time DB, ETL & Data Processing, Custom Functions), each with its own storage, security, governance, workload management, ingest & replication, and data catalog. Many data silos, each with its own proprietary tools and infrastructure. Different vendors, products, and services on-premises versus in cloud. A fragmented approach is difficult, expensive, and risky.
  33. The Answer
  34. The modern platform for machine learning and analytics, with multiple deployment options and one shared data experience.
  35. One platform. Multiple workloads. Data engineering (data processing): cost efficient; reliable; scalable; based on Spark, MapReduce, Hive & Pig; supported by Workload Analytics. Analytic database (fast BI & SQL): flexibility; elastic scale; go beyond SQL; based on Impala & Hive; SQL dev environment; supported by Workload Analytics. Data science (machine learning): fast dev to production; secure self-serve; based on Python, R, and Spark; ML dev environment (CDSW). Operational database (online & real-time): high throughput, low latency; strongly consistent; based on HBase, Kudu & Spark Streaming.
  36. Multiple deployment options. Operational database; data science; analytic database; data engineering; data engineering; analytic database. Private cloud; bare metal; infrastructure; services (in beta soon).
  37. Open platform services. Built for multi-function analytics | Optimized for cloud. Shared catalog; unified security; consistent governance; easy workload management; flexible ingest and replication.
  38. The modern platform for machine learning and analytics optimized for the cloud. Cloudera Enterprise. Extensible services: data engineering, operational database, analytic database, data science. Core services: data catalog, security, governance, workload management, ingest & replication. Storage services: S3, ADLS, HDFS, Kudu. Deployment options: private cloud, bare metal, infrastructure, services.
  39. Run anywhere. Deploy any way. Simple; unified; enterprise. Proven at scale; trusted security; hybrid or multi cloud; Platform-as-a-Service; simplifies operations; works with your tools.
  40. "Better to bet on cloud providers for infrastructure, Cloudera for data, compute and security fabric, and leave the rest to the ecosystem" (Sean Owen, Director, Data Science at Cloudera)
  41. Thank you. Mike Olson, @mikeolson
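
The storage comparison on slides 18 and 20 can be sketched as a small calculation. This is a minimal sketch under stated assumptions, not a pricing model. The replication factor of three and the 99.999999999% durability figure come from the slides. The function names and the per-terabyte prices in the usage section are hypothetical placeholders, made up for illustration. The durability figure is read here as an annual, per-object probability.

```python
"""Back-of-the-envelope storage comparison: replicated HDFS vs. object storage."""

# From slide 18: HDFS stores three copies of each block for resiliency.
HDFS_REPLICATION_FACTOR = 3

# From slide 20: S3 is designed for 99.999999999% durability. Assumption:
# read as an annual per-object figure, i.e. a 1e-11 chance of loss per year.
S3_ANNUAL_LOSS_PROBABILITY = 1e-11


def raw_capacity_tb(logical_tb, replication_factor=HDFS_REPLICATION_FACTOR):
    """Raw capacity needed to hold `logical_tb` of data under replication."""
    return logical_tb * replication_factor


def monthly_storage_cost(logical_tb, price_per_tb_month, replication_factor=1):
    """Monthly cost of storing `logical_tb`, paying for every replica."""
    return raw_capacity_tb(logical_tb, replication_factor) * price_per_tb_month


def expected_objects_lost_per_year(object_count,
                                   loss_probability=S3_ANNUAL_LOSS_PROBABILITY):
    """Expected number of objects lost in one year."""
    return object_count * loss_probability


if __name__ == "__main__":
    # Hypothetical prices, for illustration only; these are not real list prices.
    block_price = 100.0   # per TB-month of instance-attached block storage
    object_price = 25.0   # per TB-month of object storage

    logical_tb = 100
    hdfs_cost = monthly_storage_cost(logical_tb, block_price, HDFS_REPLICATION_FACTOR)
    object_cost = monthly_storage_cost(logical_tb, object_price)
    print(f"HDFS raw capacity: {raw_capacity_tb(logical_tb)} TB")
    print(f"HDFS monthly cost: {hdfs_cost:.2f}")
    print(f"Object store monthly cost: {object_cost:.2f}")

    losses = expected_objects_lost_per_year(10_000_000)
    print(f"Expected objects lost per year: {losses:.6f}")
```

Under that reading of the durability figure, ten million stored objects give an expected loss of about one object every 10,000 years. The size of the cost gap depends entirely on the prices plugged in; only the threefold raw-capacity multiplier is taken from the slides.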
