Cloudera, Data Warehouse Optimisation 
Jérôme Campo, Systems Engineering 
MAY 2014
The Enterprise Data Warehouse 
[Diagram labels: SERVERS, MARTS, DW, DOCUMENTS, STORAGE, SEARCH, ARCHIVE; ERP, CRM, RDBMS, MACHINES; FILES, IMAGES, VIDEOS, LOGS, CLICKSTREAMS; EXTERNAL DATA SOURCES]
1. Visibility
• Leaving data behind
• Risk and compliance
• High cost of storage
2. Time to Data
• Up-front modeling
• Transforms slow
• Transforms lose data
3. Cost of Analytics
• Existing systems strained
• No agility
• BI backlog
4. Complex Architecture
• Many special-purpose systems, silos of data
• Moving data around
• No complete views
Cloudera for the Enterprise Data Hub 
1. Active archive
• Full fidelity original data
• Indefinite time, any source
• Lowest cost storage
2. Data management, transforms
• One source of data for all analytics
• Persist state of transformed data
• Significantly faster & cheaper
3. Self-service exploratory BI
• Simple search + BI tools
• “Schema on read” agility
• Reduce BI user backlog requests
4. Multi-workload analytic platform
• Bring applications to data
• Combine different workloads on common data (e.g. SQL + Search)
• True BI agility
[Diagram labels: SERVERS, MARTS, DW, DOCUMENTS, STORAGE, SEARCH, ARCHIVE; ERP, CRM, RDBMS, MACHINES; FILES, IMAGES, VIDEOS, LOGS, CLICKSTREAMS; EXTERNAL DATA SOURCES]
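The “schema on read” idea from the self-service BI item can be illustrated with a minimal Python sketch: raw records are kept at full fidelity, and a schema is applied only when a question is asked. The sample records, field names, and the `read_with_schema` helper are hypothetical, chosen for illustration; they are not Cloudera APIs.

```python
import json

# Raw events kept exactly as received (hypothetical sample data).
RAW_LINES = [
    '{"user": "a", "amount": "12.50", "channel": "web"}',
    '{"user": "b", "amount": "7", "extra": "ignored until needed"}',
]


def read_with_schema(lines, schema):
    """Apply a schema (field name -> converter) at read time.

    Fields missing from a record become None; fields not named in the
    schema stay in the raw data and are simply not projected.
    """
    for line in lines:
        record = json.loads(line)
        yield {
            name: (convert(record[name]) if name in record else None)
            for name, convert in schema.items()
        }


# Two different views over the same raw data, decided at query time.
amounts = list(read_with_schema(RAW_LINES, {"user": str, "amount": float}))
channels = list(read_with_schema(RAW_LINES, {"user": str, "channel": str}))
```

Because nothing is discarded at load time, a later question about the `extra` field would only need a new schema, not a reload of the source data.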
Cloudera for the Enterprise Data Hub
Cloudera for Data Warehouse optimisation
EDW optimisation: Active Archive 
What to Migrate
• Archive datasets
• Infrequently accessed tables
• Large corpus of data
Influencing Factors
• Frequency of data access
• Changing regulatory compliance requirements
• Data volume growth
Better in Cloudera
• Data remains accessible
• Data is not lost
• 1/10th the cost
Only Available with Cloudera
• Reliability for mission-critical workloads: high availability, disaster recovery, zero-downtime upgrades
• Low-latency SQL processing, ability to absorb short-cycle ELT
• Broad support of leading data integration tools
Key Partners
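As a rough illustration of the "what to migrate" criteria for an active archive (large tables, infrequently accessed), the selection could be sketched as below. The `TableStats` record, the thresholds, and the sample tables are all assumptions made for the example; in practice the numbers would come from the warehouse's own usage statistics.

```python
from dataclasses import dataclass


@dataclass
class TableStats:
    """Hypothetical per-table usage summary."""
    name: str
    size_gb: float
    queries_last_90_days: int


def archive_candidates(tables, max_queries=5, min_size_gb=100.0):
    """Return large, rarely queried tables, biggest first."""
    picked = [
        t for t in tables
        if t.queries_last_90_days <= max_queries and t.size_gb >= min_size_gb
    ]
    return sorted(picked, key=lambda t: t.size_gb, reverse=True)


# Illustrative inventory, not real data.
TABLES = [
    TableStats("orders_2009", 800.0, 1),
    TableStats("orders_current", 500.0, 4200),
    TableStats("clickstream_2011", 2500.0, 0),
    TableStats("lookup_codes", 0.1, 2),
]

names = [t.name for t in archive_candidates(TABLES)]
```

The thresholds are the part to tune: regulatory retention rules or expected data growth may justify archiving tables that are still queried occasionally.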
EDW optimisation: Transformation 
What to Migrate
• High-scale batch data processing
• Implemented as SQL + scripting or ETL running on expensive HW infrastructure
• Staging data stored across diverse temp tables
Influencing Factors
• High fraction of overall EDW utilization (25–80%)
• Difficult to store, manage staging data in relational form
• Limited user adoption risk to migrate
• ETL tools to simplify migration
Better in Cloudera
• Over 2X the performance
• 1/10th the cost
• Persistent staging, tracked lineage
Only Available with Cloudera
• Reliability for mission-critical workloads: high availability, disaster recovery, zero-downtime upgrades
• Low-latency SQL processing, ability to absorb short-cycle ELT
• Broad support of leading data integration tools
Key Partners
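To put the 25–80% utilization figure in perspective, here is a back-of-the-envelope sketch of how much capacity the remaining workloads would gain if the transformation workload were moved off the warehouse. It assumes, purely for illustration, that the whole transformation share is offloaded and that utilization is the only constraint; the function name is hypothetical.

```python
def freed_capacity_factor(etl_fraction):
    """Capacity multiplier for the remaining workloads once the
    transformation share of warehouse utilization is offloaded."""
    if not 0.0 <= etl_fraction < 1.0:
        raise ValueError("etl_fraction must be in [0, 1)")
    return 1.0 / (1.0 - etl_fraction)


low = freed_capacity_factor(0.25)   # transforms use 25% of the warehouse
high = freed_capacity_factor(0.80)  # transforms use 80% of the warehouse
```

Under these assumptions the remaining workloads get roughly 1.3 times their current capacity at the low end and about 5 times at the high end.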
EDW optimisation: Self Service BI 
Workload
• Self-Service BI, Exploratory BI, Data Discovery
• Uncertain business questions and uncertain data
Migration Priority
• Fastest growing workload for many warehouses
• Comparable support for end user tools between Cloudera and DBMS products
Better in Cloudera
• Schema flexibility
• End user self-service on full fidelity data
• 1/10th the cost
Only Available with Cloudera
• Open source parallel interactive SQL engine: Cloudera Impala
• Integration and certification of every leading SSBI vendor
Key Partners
EDW optimisation: Multi-workload 
Workload and Data
• Training & scoring predictive models
• Deep and broad data sets, within and beyond the warehouse
Influencing Factors
• Statisticians want unconstrained analysis; limited DW compute resources
• Paying top dollar for warehouse data storage only to load into ML tools
• Inability to analyze data beyond the warehouse
Better in Cloudera
• Greater user productivity (pre-packaged ML libraries, no more down-sampling)
• Support for 3rd party ML tools
• Greater flexibility (SQL + MR + Search + Spark + SAS procs)
• 1/10th the cost
Only Available with Cloudera
• Ability to run SAS, R natively on the same cluster
• Interactive search and SQL experience for data exploration
• Built-in analytics libraries (Mahout, DataFu, ClouderaML)
• Support from Cloudera’s Data Science team
Key Partners
Why EDW optimisation? 
1. Lower costs of data management, allow growth
2. Improve quality of service
• Shorten ETL windows
• Faster BI queries
3. Extend existing warehouse capacity
• Increase ROI from current investments
• More operational data (volume and schemas)
• More business intelligence and analytics workloads
4. Retain all data for more varied analysis
5. Deliver a foundation for innovation
• Bring more applications to Hadoop data for low incremental cost
Customers agree, Cloudera delivers 
Leading Payments Company
• Workload: Analytics, ETL Processing, DR
• Results: Largest fraud discovery in firm history; time to report collapsed from 2 days to 2 hours; saved $30M on DR
Global Money Center Bank
• Workload: Data Processing (ELT)
• Results: Avoided tens of millions in expansion purchases; 42% faster processing
Mobile Device Manufacturer
• Workload: Data Processing (ELT)
• Results: Offloaded 90% of data volume; keep all data
Fortune 500 Retailer
• Workload: Analytics
• Results: More insights by supporting more exploration of more extensive & granular data
Leading Financial Regulator
• Workload: Data Processing (ELT) and DR
• Results: Shrank EDW footprint by 4PB, 20X perf. boost
BigDataBx #1 - Atelier 1 Cloudera Datawarehouse Optimisation
