Multi-Tenant Operations with Cloudera 5.7 & BT

One benefit of Apache Hadoop is the ability to power multiple workloads, across many different users and departments, all within a single, shared cluster. Hear how BT is doing this today and learn about new features in Cloudera Manager that provide better visibility into multi-tenant operations.

  1. Multi-Tenant Operations with Cloudera Enterprise: A look inside British Telecommunications
     Phill Radley | Chief Data Architect | BT
     Matt Schumpert | Director Product Management | Cloudera
     © Cloudera, Inc. All rights reserved.
  2. What is Multi-Tenant Hadoop?
     • Single general-purpose Hadoop cluster
     • Multiple distinct user groups with code & data that need to be separated
     • Sharing storage (HDFS) & processing resources (cores & RAM)
     • Storage allocated with HDFS quota
     • Compute managed with the Fair Share Scheduler (at run time)
     • Mixed workloads: storage only, batch & interactive processing
     • Typically on-premise, run by an in-house data centre team
  3. Why Implement Multi-Tenant Hadoop?
     • A single place for all raw enterprise data, kept for as long as needed
       • Universally popular concept in the business, except in Finance
       • Target data sets the business will be interested in
     • Highly efficient use of infrastructure
     • Allows small tenants access to big resources
     • Self-service, fast provisioning, enabling fast project spin-up
     • New low unit cost makes old business cases viable (e.g. active archives)
     • Start small, with one or two small tenants, but plan for many more
       • E.g. find a struggling old batch application & re-platform it as an internal IT project
     • Once the platform is up and running, go after a high-profile flagship tenant
  4. Platform as a Service – Hadoop as a Service
     Target users
     • Application developers, testers & production
     • Business analysts/data scientists wanting access to live data
     Service specification
     • HaaS version 1.0, change control & roadmap
     • Features (e.g. HDFS (HttpFS/NFS/API), MapReduce, Hue, Pig, Hive, HBase, Search)
     Service management
     • Ordering form & process, helpdesk
     • Service manager, capacity manager
  5. Security & Governance
     • Tenant data privacy
       • Microsoft Active Directory integration with Kerberos
       • All user groups & accounts managed in AD
       • HDFS encryption zones
     • Data governance to control data sharing
       • Identified data stewards who approve creation of shared views and grants
     • Security logging & reporting
  6. The genesis of HaaS
     Research & Innovation, Adastral Park | Business HQ, London
  7. From Hadoop to HaaS
     • Standing up a cluster is straightforward
       • Buy Hadoop-optimized servers (lots of local disk); the unit cost is a fraction of a typical private cloud
       • Install Linux (integrate with Active Directory/Kerberos)
       • Use Cloudera Manager to create the cluster
     • Decide what services to offer based on the pipeline of tenant workloads
       • Feb 2014, HaaS R1: a “minimum viable product”: storage + batch compute (M/R) + UI (Hue) + Kerberos
       • Oct 2015, HaaS R2: added interactive SQL use: Impala + Sqoop + Sentry
       • Aug 2016, HaaS R3: in-memory: Spark + second site + Search…
  8. What is a HaaS Tenant?
     • A tenant is synonymous with a HaaS service instance:
       1. An identifying group in Active Directory
       2. A set of Hadoop resources owned by the group
          • HDFS quota
          • YARN resource pool
          • Hive database
          • (+ other options, e.g. Flume port/agent, data wrangling tool)
     • All services are accessed through common access points
     Diagram: service ID HAASAAP00307_12126 (fields labelled CLUSTER, SERVICE TYPE, SERVICE NO., BUS. APP. ID), mapped to a Microsoft Active Directory group. The HaaS service instance admin (e.g. developer, data scientist) sends a service request to the Hadoop platform admin, whose provisioning script sets up the tenant and sends a “Welcome to HaaS” message:
       • HDFS storage: /user/HAASAAP00307_12126 (default quota 500GB)
       • YARN resource pool: HAASAAP00307_12126
       • Hive database: HAASAAP00307_12126 (tables and views, used from Pig, HQL and Java)
  9. HaaS Tenant Reporting
     BT has developed a range of supporting tools & training materials to help onboard tenants and monitor the service, for example the provisioning script and weekly HDFS capacity reports.
     Report annotations: one project (NAD) with multiple services, e.g. service 123 = prod, service 153 = test; P for Production, T for Test, D for Dev
  10. e.g. HAASAAP0067_05038: CMF (Customer Master File)
      • A 10-year-old batch app needing to re-platform (2014)
      • Data from 12 source systems merged, with D&B legal entities used as reference data
      • Existing SQL modules ported to HQL + Pig
      Benefits
      • Business able to do multiple runs in a day (15x faster)
      • Adding new sources is quicker (schema on read)
      • Data available for self-service teams (DQ/data science)
      Diagram: source systems (CSS, COSMOSS, DISE, BTC, C2B, Antillia, Glossi, Cyclone, Phoenix, Radianz, Siebel OV, Siebel OS), old CMF DB, staging, reference data, and the HAASAAP0067_05038 pipeline stages: 1 Pre-Load, 2 Load, 3 Match/De-Dupe, 4 Key Gen, 5 Business Rule, 6 Publish, 7 Post Load
  11. Target for Self-Service Data Access using HaaS
      • Three existing business applications (CRM, Orders, Faults) extended into HaaS: each RDBMS table is loaded with Sqoop into a table in the application's own Hive DB and exposed through a view
        • CRM (2029): RDBMS Customer table → T_Customer / V_Customer in HAASAAP00101_2029
        • Orders (3531): RDBMS Orders table → T_Orders / V_Orders in HAASAAP00202_3531
        • Faults (4369): RDBMS Faults table → T_Faults / V_Faults in HAASAAP00303_4369
      • Business data stewards and business analysts/data scientists work through a data catalogue:
        1. Browse & select data
        2. Get steward approval
        3. Create VIEWs & GRANTs
        4. Select/join views
      • Self-service, workflow-driven access to any table on any system (contrast with the design/develop approach of a legacy warehouse)
      • Option to add homomorphic encryption to any table to anonymize PII data and further reduce risk
  12. Cloudera Manager 5.7: Easier Multi-Tenant Operations
  13. Major Enablers of Multi-Tenancy in Cloudera Manager
      • Dynamic Resource Pools
      • Cluster Utilization Reporting
      • HDFS Usage Reports
  14. Dynamic Resource Pools: Define Tenants!
      • Hierarchical buckets that:
        • Express prioritization
        • Protect fixed capacity
        • Create sensible guardrails
  15. Dynamic Resource Pools: Define Tenants!
      • Hierarchical buckets that:
        • Express prioritization
        • Protect fixed capacity
        • Create sensible guardrails
      • Make an admin's life easy with:
        • User/group-based creation
        • ACLs
        • Automatic preemption
        • Rotating service windows
  16. Dynamic Resource Pools: Configuration
  17. Roadmap: Dynamic Resource Pools
      • Automatic user/group-based job placement under a tenant's pool
  18. Cluster Utilization Reporting
      Diagram: tenants BI, Marketing, Engineering
  19. Cluster Utilization Reporting
      Diagram: usage data + resource allocations for tenants BI, Marketing, Engineering
  20. Cluster Utilization Reporting
      Diagram: usage data + resource allocations → report, for tenants BI, Marketing, Engineering
      • Configurable time window
      • Tenant aggregation view
      • User aggregation view
  21. Cluster Utilization Reporting
      Diagram: usage data + resource allocations → report, for tenants BI, Marketing, Engineering
      • “How much CPU & memory did each tenant use?”
      • “I set up the Fair Scheduler. Did each of my tenants get their fair share?”
      • “Which tenants had to wait the longest for their applications to get resources?”
      • “Which tenants asked for the most memory but used the least?”
      • “When do I need to add nodes to my cluster?”
      • Configurable time window
      • Tenant aggregation view
      • User aggregation view
  22. Cluster Utilization Reporting
  23. Cluster Utilization Reporting
  24. Cluster Utilization Reporting
  25. Roadmap: Cluster Utilization Reporting
      • Container allocation latency
        • A definitive wait metric for each bit of YARN workload
      • Support for more components
        • HDFS, HBase, Search, etc.
      • Support additional metrics
        • Disk I/O, network I/O
      • Add additional tools to existing metrics:
        • Showback/chargeback: associate $$ with resource usage
        • Capacity planning: trend lines
        • DBA tools: identify/flag rogue queries (Hive, Impala, HBase)
        • Workload management: tag critical apps with SLAs
  26. HDFS Usage Reports
      • Recently revamped based on known HaaS implementations
      • Drill down by user/tenant to do housecleaning
  27. More Information & Next Steps
      • Get started: download C5.7 at www.cloudera.com/downloads
      • Release notes: www.cloudera.com/documentation/enterprise/latest/topics/rg_release_notes.html
      • Training classes: university.cloudera.com
      • Check out Cloudera Manager demo videos at go.cloudera.com/hadoop-demo-cm1
  28. Questions?
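Slide 8 (“What is a HaaS Tenant?”) says a provisioning script creates each tenant's resources but does not show it. A minimal sketch of the HDFS and Hive part follows. It assumes the AD group is named after the service ID and that the ID splits into the four labelled fields the way the regular expression guesses; both are assumptions, not BT's documented convention. `parse_service_id` and `provisioning_plan` are hypothetical names. The commands themselves (`hdfs dfs -mkdir/-chgrp/-chmod`, `hdfs dfsadmin -setSpaceQuota`, Hive `CREATE DATABASE ... LOCATION`) are standard. The YARN pool is left to Cloudera Manager.

```python
import re
from dataclasses import dataclass

# Hypothetical ID layout, guessed from the slide's field labels
# (CLUSTER | SERVICE TYPE | SERVICE NO. | BUS. APP. ID), e.g. "HAASAAP00307_12126".
SERVICE_ID = re.compile(
    r"^(?P<cluster>HAAS[A-Z])(?P<service_type>[A-Z]{2})"
    r"(?P<service_no>\d+)_(?P<app_id>\d+)$"
)


@dataclass(frozen=True)
class Tenant:
    service_id: str
    cluster: str
    service_type: str
    service_no: str
    app_id: str


def parse_service_id(service_id: str) -> Tenant:
    """Split a service ID into its (assumed) fields."""
    match = SERVICE_ID.match(service_id)
    if match is None:
        raise ValueError(f"unrecognized service ID: {service_id!r}")
    return Tenant(service_id, **match.groupdict())


def provisioning_plan(tenant: Tenant, space_quota: str = "500g") -> tuple[list[str], str]:
    """Return (HDFS shell commands, HiveQL statement) for one tenant; nothing is executed.

    Assumes the AD group has the same name as the service ID.
    """
    group = tenant.service_id
    home = f"/user/{tenant.service_id}"
    hdfs_commands = [
        f"hdfs dfs -mkdir -p {home}",
        f"hdfs dfs -chgrp {group} {home}",
        f"hdfs dfs -chmod 770 {home}",
        # Space quotas are charged in raw bytes, i.e. for every replica.
        f"hdfs dfsadmin -setSpaceQuota {space_quota} {home}",
    ]
    hiveql = (
        f"CREATE DATABASE IF NOT EXISTS {tenant.service_id} "
        f"LOCATION '{home}/hive'"
    )
    return hdfs_commands, hiveql


tenant = parse_service_id("HAASAAP00307_12126")
commands, hiveql = provisioning_plan(tenant)
print("\n".join(commands + [hiveql]))
```

Generating the plan instead of running it keeps each tenant's setup reviewable before a platform admin applies it. HDFS reads `500g` as 500 GiB of raw, replicated space.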
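Slide 9 (“HaaS Tenant Reporting”) mentions weekly HDFS capacity reports without showing how they are built. One way to produce something similar is to parse the output of `hdfs dfs -count -q` for each tenant directory. Its columns are QUOTA, REMAINING_QUOTA, SPACE_QUOTA, REMAINING_SPACE_QUOTA, DIR_COUNT, FILE_COUNT, CONTENT_SIZE and PATHNAME. The sketch below assumes tenant directories carry a space quota. The function names and the 80% warning threshold are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class QuotaUsage:
    path: str
    space_quota: Optional[int]  # raw bytes (replicas included); None = no quota set
    raw_used: Optional[int]     # raw bytes charged against the quota
    logical_size: int           # CONTENT_SIZE: bytes before replication

    @property
    def pct_used(self) -> Optional[float]:
        if self.space_quota is None:
            return None
        return 100 * self.raw_used / self.space_quota


def parse_count_q(line: str) -> QuotaUsage:
    """Parse one line of `hdfs dfs -count -q <path>` output (without -h)."""
    (_quota, _rem_quota, space_quota, rem_space,
     _dirs, _files, size, path) = line.split(maxsplit=7)
    if space_quota == "none":
        return QuotaUsage(path, None, None, int(size))
    quota = int(space_quota)
    return QuotaUsage(path, quota, quota - int(rem_space), int(size))


def weekly_report(lines, warn_pct: float = 80.0):
    """Return (path, % of space quota used, status) rows, fullest tenant first."""
    rows = []
    for line in lines:
        usage = parse_count_q(line)
        pct = usage.pct_used
        if pct is None:
            status = "NO QUOTA"
        elif pct >= warn_pct:
            status = "WARN"
        else:
            status = "OK"
        rows.append((usage.path, pct, status))
    # Directories without a quota sort last.
    return sorted(rows, key=lambda row: -1.0 if row[1] is None else row[1], reverse=True)


sample = [
    "none inf none inf 1 2 1024 /user/scratch",
    "none inf 536870912000 375809638400 3 120 53687091200 /user/HAASAAP00307_12126",
]
for row in weekly_report(sample):
    print(row)
```

HDFS charges space quotas for every replica, so `raw_used / logical_size` comes out near the replication factor (exactly 3.0 for the made-up sample line). At replication 3, a 500 GB space quota therefore holds only about a third of that in file data.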
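Slide 10 names the CMF pipeline stages (Load, Match/De-Dupe, Key Gen, …) but not the matching logic. The sketch below only illustrates what the Match/De-Dupe and Key Gen stages do in general; it is not BT's rule set. It groups source records on an exact match of a normalized company name. Groups found in reference data (standing in for the D&B legal entities) are keyed by the reference entity ID, and the rest get a stable hash key. The suffix list, record layout and key format are all assumptions.

```python
import hashlib
import re
from collections import defaultdict

# Hypothetical normalization rules; a production matcher would be much fuzzier.
LEGAL_SUFFIXES = {"LTD", "LIMITED", "PLC", "INC", "LLP"}


def normalize(name: str) -> str:
    """Upper-case, strip punctuation and drop common legal-form suffixes."""
    tokens = re.sub(r"[^A-Z0-9 ]", " ", name.upper()).split()
    return " ".join(t for t in tokens if t not in LEGAL_SUFFIXES)


def build_master(records, reference):
    """Match/de-dupe and key generation in one pass.

    records:   iterable of (source_system, source_id, company_name)
    reference: dict of normalized name -> reference entity ID
    Returns {master_key: sorted [(source_system, source_id), ...]}.
    """
    groups = defaultdict(list)
    for source, source_id, name in records:
        # Match/de-dupe: exact match on the normalized name.
        groups[normalize(name)].append((source, source_id))

    master = defaultdict(list)
    for norm_name, members in groups.items():
        # Key gen: reuse the reference ID when matched, otherwise a stable hash.
        key = reference.get(norm_name) or "U" + hashlib.sha1(norm_name.encode()).hexdigest()[:12]
        master[key].extend(members)
    return {key: sorted(members) for key, members in master.items()}


records = [
    ("Siebel OV", "1", "Acme Ltd"),
    ("CSS", "9", "ACME LIMITED"),
    ("DISE", "4", "Widgets plc"),
]
print(build_master(records, {"ACME": "REF-001"}))
```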
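Step 3 of the self-service workflow on slide 11 (“Create VIEWs & GRANTs”) can be sketched as statement generation that runs only after a steward has approved a request. This assumes Sentry-style role-based authorization in Hive (`CREATE ROLE`, `GRANT ROLE ... TO GROUP`, `GRANT SELECT ON TABLE ... TO ROLE`). It also assumes a hypothetical `<group>_reader` role-naming convention. The `AccessRequest` type, the consumer group and the column names are illustrative only.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AccessRequest:
    owner_db: str                       # producing tenant's Hive database
    table: str                          # e.g. "T_Customer"
    columns: tuple                      # columns the steward agreed to expose
    consumer_group: str                 # consuming team's AD group
    approved_by: Optional[str] = None   # data steward; None until approved


def grant_statements(req: AccessRequest, create_role: bool = False) -> list:
    """Return the HiveQL for one approved request as a list of statements."""
    if not req.approved_by:
        raise PermissionError("no data steward approval recorded")
    view = "V_" + req.table.removeprefix("T_")
    role = f"{req.consumer_group}_reader"  # hypothetical naming convention
    statements = [
        f"USE {req.owner_db}",
        f"CREATE VIEW IF NOT EXISTS {view} AS SELECT {', '.join(req.columns)} FROM {req.table}",
    ]
    if create_role:  # CREATE ROLE fails if the role already exists
        statements += [f"CREATE ROLE {role}", f"GRANT ROLE {role} TO GROUP {req.consumer_group}"]
    statements.append(f"GRANT SELECT ON TABLE {view} TO ROLE {role}")
    return statements


req = AccessRequest("HAASAAP00101_2029", "T_Customer", ("cust_id", "segment"),
                    "analytics_team", approved_by="crm_steward")
print(";\n".join(grant_statements(req, create_role=True)) + ";")
```

Granting on the view rather than on the underlying `T_` table lets the steward expose only the approved columns.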
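Slides 14 and 15 describe Dynamic Resource Pools as buckets that express prioritization (weights), protect fixed capacity (minimums) and create guardrails (maximums). The sketch below models how those three settings interact for sibling pools competing for one resource. Each pool gets `weight × r`, clamped to its min/max, and bisection finds the `r` at which the shares add up to the capacity. This is a simplified single-resource model written for illustration. The real YARN Fair Scheduler also balances memory and vcores together, preempts containers and handles hierarchy. Pool names and numbers are made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pool:
    name: str
    weight: float = 1.0               # relative priority
    min_share: float = 0.0            # protected capacity
    max_share: float = float("inf")   # guardrail


def _share(pool: Pool, ratio: float) -> float:
    # Weighted share, clamped to the pool's [min, max] range.
    return min(max(pool.weight * ratio, pool.min_share), pool.max_share)


def fair_shares(pools, capacity: float, iterations: int = 100) -> dict:
    """Split `capacity` (e.g. vcores) across sibling pools by weight, honoring min/max."""
    if any(p.weight <= 0 for p in pools):
        raise ValueError("weights must be positive")
    if sum(p.min_share for p in pools) > capacity:
        raise ValueError("minimum shares exceed capacity")
    if sum(p.max_share for p in pools) <= capacity:
        return {p.name: p.max_share for p in pools}

    def total(ratio: float) -> float:
        return sum(_share(p, ratio) for p in pools)

    lo, hi = 0.0, 1.0
    while total(hi) < capacity:   # grow the bracket until it over-allocates
        hi *= 2
    for _ in range(iterations):   # bisect for the ratio that uses exactly `capacity`
        mid = (lo + hi) / 2
        if total(mid) < capacity:
            lo = mid
        else:
            hi = mid
    return {p.name: _share(p, hi) for p in pools}


pools = [Pool("BI", weight=2), Pool("Marketing", max_share=10), Pool("Engineering", min_share=40)]
print(fair_shares(pools, capacity=100))
```

With 100 vcores, BI gets about 50 (twice the weight), Marketing is held at its 10-vcore cap and Engineering keeps its protected 40. For hierarchical pools, the same function would be applied again inside each parent's share.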
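The slide 17 roadmap item, automatic user/group-based placement under a tenant's pool, amounts to an ordered list of placement rules in which the first match wins. The YARN Fair Scheduler has rules in this spirit (for example `specified`, `primaryGroup`, `nestedUserQueue` and `default` in its `queuePlacementPolicy`). The Python below is an independent sketch that does not reproduce that implementation. It assumes tenant pools are named `root.<AD group>`, and all function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, Optional, Sequence


@dataclass(frozen=True)
class Submission:
    user: str
    groups: tuple                        # first entry treated as the primary group
    requested_pool: Optional[str] = None


Rule = Callable[[Submission, FrozenSet[str]], Optional[str]]


def specified(sub, pools):
    # Honor an explicit request, but only for a pool that exists (ACLs are checked elsewhere).
    return sub.requested_pool if sub.requested_pool in pools else None


def primary_group(sub, pools):
    pool = f"root.{sub.groups[0]}" if sub.groups else None
    return pool if pool in pools else None


def user_under_tenant(sub, pools):
    # Per-user child pool beneath the first group that owns a tenant pool.
    for group in sub.groups:
        if f"root.{group}" in pools:
            return f"root.{group}.{sub.user}"
    return None


def default(sub, pools):
    return "root.default"


def place(sub: Submission, pools: FrozenSet[str],
          rules: Sequence[Rule] = (specified, primary_group, default)) -> str:
    """Return the pool chosen by the first rule that matches."""
    for rule in rules:
        pool = rule(sub, pools)
        if pool is not None:
            return pool
    raise LookupError(f"no placement rule matched for {sub.user}")


pools = frozenset({"root.default", "root.HAASAAP00307_12126"})
print(place(Submission("alice", ("HAASAAP00307_12126", "staff")), pools))
```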
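Several of the questions on slide 21 come down to aggregating per-application usage by tenant. The sketch below assumes a hypothetical per-application record that carries both allocated and actually used memory. YARN itself reports allocated memory-seconds and vcore-seconds per application, so “used” memory would have to come from another source. It answers “how much CPU did each tenant use”, “which tenant asked for the most memory but used the least” and “which tenant waited longest”.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class AppRecord:
    """One finished application (hypothetical record layout)."""
    tenant: str                    # resource pool the app ran in
    user: str
    vcore_seconds: float
    mem_alloc_mb_seconds: float    # memory the containers were granted
    mem_used_mb_seconds: float     # memory actually used (assumed to be measured)
    wait_seconds: float            # time before the app got its resources


def tenant_summary(apps):
    totals = defaultdict(lambda: defaultdict(float))
    for app in apps:
        t = totals[app.tenant]
        t["apps"] += 1
        t["vcore_s"] += app.vcore_seconds
        t["alloc"] += app.mem_alloc_mb_seconds
        t["used"] += app.mem_used_mb_seconds
        t["wait"] += app.wait_seconds
    return {
        tenant: {
            "vcore_hours": t["vcore_s"] / 3600,
            "memory_efficiency": t["used"] / t["alloc"] if t["alloc"] else None,
            "avg_wait_s": t["wait"] / t["apps"],
        }
        for tenant, t in totals.items()
    }


def least_memory_efficient(summary):
    """Tenant with the lowest used/allocated memory ratio."""
    scored = [(s["memory_efficiency"], name) for name, s in summary.items()
              if s["memory_efficiency"] is not None]
    return min(scored)[1] if scored else None


def longest_waiting(summary):
    """Tenant with the highest average wait for resources."""
    return max(summary, key=lambda name: summary[name]["avg_wait_s"]) if summary else None


apps = [
    AppRecord("BI", "ann", 7200, 4096 * 3600, 3072 * 3600, 30),
    AppRecord("BI", "ann", 3600, 2048 * 3600, 2048 * 3600, 10),
    AppRecord("Marketing", "raj", 1800, 8192 * 3600, 1024 * 3600, 300),
]
summary = tenant_summary(apps)
print(summary, least_memory_efficient(summary), longest_waiting(summary))
```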
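For the capacity-planning trend lines on the slide 25 roadmap, and slide 21's question “When do I need to add nodes to my cluster?”, a least-squares line through weekly utilization gives a rough estimate of when a threshold will be reached. This is a minimal sketch that assumes steady linear growth. The 80% threshold is chosen only for illustration.

```python
def linear_fit(xs, ys):
    """Ordinary least-squares slope and intercept."""
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def weeks_until_threshold(weekly_pct, threshold=80.0):
    """Weeks from the latest sample until the fitted trend reaches `threshold` %.

    Returns None if utilization is flat or falling, 0.0 if the trend is already past it.
    """
    xs = list(range(len(weekly_pct)))
    slope, intercept = linear_fit(xs, weekly_pct)
    if slope <= 0:
        return None
    crossing = (threshold - intercept) / slope
    return max(0.0, crossing - xs[-1])


print(weeks_until_threshold([50, 52, 54, 56, 58]))  # growth of 2 points per week
```

A linear fit ignores step changes such as a new flagship tenant being onboarded. The estimate is best treated as a prompt to compare against hardware procurement lead times, not as a forecast.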
