Optimize Your Data Warehouse for Improved ROI & Data Utilization
Making the Decision
* Investments
* Opportunities
* Constraints
2
Considering the Return-on-Byte
* Accessibility
* Density
* Equality
3
Optimizing Investments
[Architecture diagram: OLTP systems and enterprise applications feed the data warehouse (query) via ETL; Cloudera sits alongside to transform, query, and store data, feeding business intelligence.]
4
Key Takeaways
* All data has value
* Storage & compute together
* Right tool for the job
15
Starting Point – Consolidate your ILM
* Select a key historical data set
* Create a staging area environment
* Process and push into existing systems
16
Questions?
Thank You!
cloudera.com/clouderasessions

Editor's notes

  1. IN THIS SESSION, WE WILL EXPLORE USING HADOOP TO ADDRESS QUESTIONS AND ISSUES SURROUNDING:
     * Cost of storage
     * Value of accessibility
     * Getting maximum return on your IT investments and all of your data
  2. WHAT CONSTITUTES GOOD ROI FOR YOUR DATA? IT MIGHT HELP TO ASK YOURSELF SOME QUESTIONS:
     * Are you able to use all the data, or the data you want, in BI and analysis (current & historical data)?
     * Are you able to take on new projects without fear of impacting the performance of existing processes & reports?
     * Are you concerned about maintaining a satisfactory data retention strategy in the face of uncertain data value?
     * If any of these questions make you stop and think, then you might not be using your data to its maximum potential.
     SO WHAT IS THE COMMON SOLUTION TO PROVIDE SOME RELIEF TO THESE ISSUES?
     * Typically it's to make more space within existing analytical systems
     * That means "archiving" data to mediums that are more cost-effective but lack analytical capabilities (tape, filers)
     BUT WHAT DOES THIS MEAN FOR YOUR BUSINESS?
     * It means that the data is unavailable for analysis & reporting
     * Retrieval is troublesome, and in the case of tape, the data has essentially been thrown away
     * But what is the opportunity lost with this data? Today, tomorrow, a year from now?
     THIS RETURNS TO THE CORE ISSUE FACING BUSINESS TODAY:
     * Are you knowingly working with constrained or limited information when making business decisions?
     THIS ISSUE IS PARTICULARLY ACUTE WHEN DEALING WITH OUTLIER & ANOMALY DETECTION EXERCISES LIKE FRAUD ANALYSIS & ADVERSE DRUG EVENTS
     * Sampling is a powerful tool for data scientists
     * But if you have to sample just so you can run your analyses within your system requirements, what data points are going missing?
     THIS ISSUE ONLY STANDS TO BECOME MORE PREVALENT
     * As data points from expanding instrumentation become more valuable (in aggregate or as singular entities)
     * Or as "abundance of data" requirements feed this new era of exploration
     * Both of these macro trends will exert tremendous influence on IT infrastructures
     IN SHORT, IT ORGANIZATIONS MUST KEEP ACCESSIBLE GROWING AMOUNTS OF DATA
     * with potential, if not questionable, value
     * lest they close the door on the discovery of a significant windfall or failure
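To make the "what data points are going missing?" question concrete, here is a small illustrative Python sketch (not from the talk): it estimates how likely a rare pattern is to be missed entirely by a small sample. The 1% sample rate and the occurrence counts are hypothetical numbers.

```python
# Illustrative only: a back-of-the-envelope look at what sampling can miss.
# The 1% sample rate and the occurrence counts are hypothetical, chosen to
# mirror the "what data points are going missing?" question above.

def probability_all_missed(occurrences: int, sample_rate: float) -> float:
    """Chance that every record of a rare pattern falls outside the sample,
    assuming records are sampled independently at `sample_rate`."""
    return (1.0 - sample_rate) ** occurrences

if __name__ == "__main__":
    sample_rate = 0.01  # analyze only a 1% sample to fit system constraints
    for occurrences in (5, 25, 100, 500):
        miss = probability_all_missed(occurrences, sample_rate)
        print(f"pattern seen in {occurrences:>3} records -> "
              f"{miss:6.1%} chance the sample never sees it")
```

A pattern that appears in only a handful of records is overwhelmingly likely to be invisible to the sample, which is exactly the risk in fraud or adverse-event detection.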
  3. SO WHY NOT JUST KEEP THE DATA AROUND IN A MORE ACCESSIBLE FORM?
     * That typically means storing it in a data warehouse
     DATA WAREHOUSES ARE GOOD AT STORING & RETRIEVING DATA AS LONG AS
     * the access patterns are understood & can be effectively modeled
     * the data has a high degree of intrinsic value
     * the cost model makes for poor economics for data that isn't particularly valuable in & of itself
     WE LIKE TO CHARACTERIZE THIS AS "DATA DENSITY"
     * the value of each individual piece of data compared to the cost of storing it
     * This "return on byte" metric varies depending on organization, stage of the lifecycle, and other factors - it is not a static equation
     AND, IMPORTANTLY, THE VALUE OF LOW-DENSITY DATA CAN INCREASE TREMENDOUSLY WHEN VIEWED IN LARGE, AGGREGATE AMOUNTS
     * Distillation can yield extremely high-value signals and metrics
     * Ex: ATM transactions
     * These types of examples are prolific now with the rise of instrumentation and exploration
     THE PROBLEM IS THAT NOT ALL DATA IS CREATED EQUAL OR SHOULD "FLY FIRST CLASS"
     * Some is immediately and clearly valuable = a candidate for the data warehouse
     * Other data is worth more than being banished to tape
     * But where do you put it so you can make sure it's usable when needed?
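As a rough illustration of the "return on byte" idea above, the following Python sketch compares value per dollar of storage for two hypothetical data sets. The data sets, value estimates, and per-TB costs are placeholders, not figures from the deck.

```python
# A minimal sketch of the "return on byte" comparison described above.
# The data sets, their estimated business value, and the per-TB storage
# costs are hypothetical placeholders; the point is the shape of the
# calculation, not the numbers.
from dataclasses import dataclass

@dataclass
class DataSet:
    name: str
    size_tb: float
    est_annual_value: float    # value the business attributes to keeping it queryable
    storage_cost_per_tb: float

    def return_on_byte(self) -> float:
        """Estimated value per dollar of storage: higher = denser data."""
        return self.est_annual_value / (self.size_tb * self.storage_cost_per_tb)

candidates = [
    DataSet("current-quarter transactions",   5, 2_000_000, 20_000),  # warehouse-class pricing
    DataSet("7-year transaction history",   400, 1_500_000,  2_000),  # Hadoop-class pricing
]
for ds in candidates:
    print(f"{ds.name:32s} return-on-byte ~ {ds.return_on_byte():.1f}")
```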
  4. IN THIS CONTEXT
     * Where do you put data that has value, but isn't valuable enough for the data warehouse?
     * Hadoop offers a compelling alternative to offline archives and SAN/NAS systems for storing low-density data
     IN THIS FORM, HADOOP COMPLEMENTS THE TRADITIONAL DATA WAREHOUSE
     * It optimizes data alignment across the two systems
     * According to value density and the business's analysis needs
     AT THIS POINT, I'M GOING TO ASK TED MALASKA TO JOIN ME ON STAGE
     * to talk about our experiences with customers taking this approach to data management.
  5. A financial regulatory body saves tens of millions with a more efficient disaster recovery solution that also offers 20X faster data processing performance.
     The Challenge:
     * (PRIMARY) Data reliability requirements mandate 7 years' storage + replication between production & DR
     * (PRIMARY) 40% annual data volume growth; 850TB collected each year from every Wall Street trade
     * 80% of the firm's costs are IT-related
     The Solution:
     * Cloudera Enterprise replaces Greenplum + SAN for DR, starting with 2 years' data
     * 4-5PB on CDH by end of 2013
     --++--++--
     Link to account record in SFDC (valid for Cloudera employees only): https://na6.salesforce.com/0018000000pvwoy?srPos=0&srKp=001
     A financial regulatory body saves tens of millions with a more efficient disaster recovery solution that also offers 20X faster data processing performance.
     Background: A large regulatory body has a data reliability requirement to store 7 years of historical data, and to replicate that data between their production and disaster recovery environments. Meanwhile, the firm's data volumes are growing 40% every year -- they collect 850TB each year from every Wall Street trade.
     Challenge: Recognizing that 80% of the firm's costs were IT-related, they realized the need to investigate other options for data storage, processing, and/or disaster recovery.
     Solution: The company decided to improve the operational efficiency of their data storage and disaster recovery environment with Cloudera. They're initially migrating two years' disaster recovery data from Greenplum onto CDH, and will eventually migrate all 7 years of DR data onto the platform. They'll have 4-5PB on CDH before the end of 2013. Upon successful completion of the DR migration, the company may consider moving their enterprise data warehouse onto Cloudera.
     Results: The company is saving tens of millions of dollars by replacing their Greenplum + SAN costs with Cloudera. Meanwhile, they've recognized a processing performance boost of 20X.
  6. BlackBerry realized ROI on their Cloudera investment through storage savings alone, while reducing ETL code by 90%.
     The Challenge:
     * (PRIMARY) BlackBerry Services generates .5PB (50-60TB compressed) of data per day
     * (PRIMARY) RDBMS is expensive - limited to 1% data sampling for analytics
     The Solution:
     * (PRIMARY) Cloudera Enterprise manages a global data set of ~100PB
     * (PRIMARY) Collecting device content, machine-generated log data, audit details
     * 90% ETL code base reduction
     * (PRIMARY) No longer have to rely on a 1% data sample for analytics; they can query all of their data -- faster, on a much larger data set, and with greater flexibility than before
     * (PRIMARY) Predicted the impact that the London Olympics would have on their network so they could take proactive measures and prevent a negative customer experience
     --++--++--
     Link to account record in SFDC (valid for Cloudera employees only): https://na6.salesforce.com/0018000000l7Xji
     BlackBerry realized ROI on their Cloudera investment through storage savings alone, while reducing ETL code by 90%.
     Background: BlackBerry transformed the mobile devices market in 1999 with their introduction of the BlackBerry smartphone. Since then, other industry innovators have introduced devices that compete against BlackBerry, and the company must leverage all of the data it can collect in order to understand its customers, what they need and want in mobile devices, and how to remain an industry leader.
     Challenge: BlackBerry Services generate 1/2 PB of data every single day -- or 50-60TB compressed. They couldn't afford to store all of this data on their relational database, so their analytics were limited to a 1% data sample, which reduced the accuracy of those analytic insights. And it took a long time to try to access data in the archive. Their incumbent system couldn't cope with the multiplying growth of data volumes or constant access requests -- BlackBerry had to pipeline their data flows to prevent the data from hitting disk.
     Solution: BlackBerry deployed Cloudera Enterprise to provide a queryable data storage environment that would allow them to put all of their data to use. Today, BlackBerry has a global dataset of ~100PB stored on Cloudera. The platform collects device content, machine-generated log data, audit details and more. BlackBerry has also converted ETL processes to run in Cloudera, and Cloudera feeds data into the data warehouse. Hadoop components in use include Flume, Hive, Hue, MapReduce, Pig and ZooKeeper.
     Results: BlackBerry's investment in Cloudera was justified through data storage cost savings alone. And by moving data processing over to Hadoop, their ETL code base has been reduced by 90%. They no longer have to rely on a 1% data sample for analytics; they can query all of their data -- faster, on a much larger data set, and with greater flexibility than before. One ad hoc query that used to take 4 days to run now finishes in 53 minutes on Cloudera. BlackBerry's new environment allowed them to do things like predict the impact that the London Olympics would have on their network so they could take proactive measures and prevent a negative customer experience.
  7. Shortcomings
     * Too much data: Forces a big DW; Expensive: $$ and FTE; Forced windows or sampling
     * The 10%: Iceberg model; The other 90% below the waterline exists; Inaccessible, high cost to retrieve/use; "Break glass in case of emergency"; Can satisfy some compliance; Opportunity cost of storage vs. storage + compute
     * Network storage: Low-cost storage; Easier to retrieve; Not compute; Data movement $$ at scale
     * Overall capacity: Existing is growing; Hitting thresholds; New workloads feasible?
     "Forces you as a business to make decisions based on inadequacies, not on opportunities"
     --++--++--
     All of these customers have faced similar issues. Let's talk about where the bottlenecks and shortcomings are exposed in current data management infrastructures.
     The first shortcoming is something we have seen in lots of clients: defining a small (and getting smaller) window of data for analysis because they have to, because they are forced to by their current infrastructures. Visa, for example, was constrained to 6 months of transactions to look for fraud until they turned to Hadoop, and they had a 100TB data warehouse designed specifically for this task. This stems from two things: first, the cost/TB calculations that we have discussed earlier – is it worth it to the business to make this investment in storing this data? – and second, is it even physically possible to do so? A 100TB data warehouse is no small feat, in terms of hardware, systems, skill set, et al., and might be out of reach for the company. So, companies often turn to sampling and high data turnover to get around these two bottlenecks, and this can be less than ideal for making better decisions.
     The second shortcoming refers back to the cost/TB that we just mentioned, and a common solution is to put that data in a "side pocket." This is the "iceberg" model for data – you only really see the top 10% of data within the organization, i.e. in the data warehouse and BI systems, because the rest is below the waterline. And as discussed, anything below the waterline is typically more difficult to access, which drives up the real cost of storage, and thus relegates that data to a "break glass in case of emergency" model – it's there, but the cost to retrieve and use it is expensive, so do so judiciously. To be fair, this can satisfy some compliance and retention strategies. However, you need to look at the opportunity cost of not being able to use this data – it could be valuable if you can get at it cheaply and easily. Why not have both compliance and accessibility?!
     The third shortcoming is that the options that do scale, like a SAN or NAS, don't provide anything but storage. If we want to do something with the data, we need to move it. And that can be costly, both in terms of network and also in terms of where the data will land for processing. If you need to look at 10 years of ATM transactions, what if that volume doesn't fit in your data warehouse or staging systems? You are back to some form of windowing, sampling, partitioning, etc., and that can be a lot of overhead and complexity, and you are also back to the topic of the cost/TB for that processing system.
     And that gets to the fourth shortcoming, which is the overall capacity of the system. Many of our clients are hitting or approaching maximum capacity with their existing systems, and to take on new workloads, they are facing escalating costs that prohibit expansion, or inadequate functionality if they do so.
     And this shortcoming really affects the previous points – it potentially threatens your compliance policies, constrains your reporting latitude, and in short, forces you as a business to make decisions based on inadequacies, not on opportunities.
     ----
     How do these relate to the following business issues?
     * Access to relevant and/or all the data in BI and analysis (current and historical)
     * New projects affecting existing projects
     * Compliance, regulations, and data retention strategies
  8. Compliance and Data Retention
     * Scalability: All data, all types; Costs 10x less
     * Mechanics: DAS; Schema-on-read; Cluster fault tolerance; Compression; Replication
     * Expand your data capacity at will: "Need more space? It can be as simple as adding another node to the cluster"
     * Confidence in data assurance and protection due to distributed storage mechanics
     * Accessibility: Query frameworks; BI tool integration; Security; WORM
     New Workloads
     * Compute, like storage, is "as simple as adding another node to the cluster"
     * Built for new workloads: Crunching web pages; New sites, internet expansion
     * Resource management: Multiple computes; Multiple groups; Play well together
     Need to Analyze More Data
     * Combination on one node: Storage; Compute
     * "Cost-effectively store all the data that you want to analyze and at the same time" offer compute
     * Orders of magnitude less: "A typical data warehouse might run $2-$10M incremental spend to add 100TB to the system. With Cloudera, adding 100TB will cost roughly $200k – 1/10th the spend"
     --++--++--
     How then does Hadoop fit into the infrastructure and business processes to enable you to meet these challenges?
     Let's start with compliance and data retention strategies: how does Hadoop work to support these policies? Hadoop provides linearly scalable storage for all data, regardless of type, in its raw, native form, at a cost point far below that of traditional systems like the data warehouse or SAN. It can do this by relying on a couple of fundamental features of the framework: direct-access storage, schema-on-read, multiple layers of fault tolerance, pluggable compression, and block replication. (The first of the afternoon clinics will go into the details of these features.) Without going deeply into the details, these features provide you and your business the ability to expand your data capacity at will – need more space? it can be as simple as adding another node to the cluster – and arm you with the knowledge that the data is intelligently distributed throughout the cluster in order to afford a high degree of assurance against data loss. But unlike backups and offline archives, these features also allow Hadoop to provide this data immediately, on demand, through many means of access, including query languages like SQL, various purpose-built connectors, and many industry-leading applications that your organizations already employ, like your BI tools. Moreover, Hadoop provides industry-standard security mechanisms for controlling access and visibility, so when coupled with Hadoop's write-once features, you can maintain your compliance policy while still allowing analysis and work. With Hadoop, you get a cost-effective way to store, access, query, and process your data, all of it, regardless of its "data density." It's compliance and then some!
     How about new projects and workloads? What role can Hadoop play in these kinds of situations? Hadoop provides scalability for processing as well, so just like with storage, if you need more computing horsepower, it can be as simple as adding another node to the cluster. It does this by using a computing framework called MapReduce and also takes advantage of the block replication we just mentioned to speed things up even faster. (Again, MapReduce is a topic covered during the first afternoon clinic.) Hadoop was initially designed for this kind of problem, which was crunching web pages to build a search index, so adding new workloads, just like new sets of web sites, can be scaled in a straightforward and cost-effective manner.
     In addition, there are several other compute frameworks that can use the same underlying data, such as Cloudera Impala, and we fully expect to see more and more capabilities work in the same manner – single data set, multiple ways of computing. Hadoop also has features that help keep different units of work or projects from monopolizing all of a cluster. These resource management features are also configurable, so you have the opportunity to tune how new workloads operate with existing projects.
     Lastly, when you are feeling the effects of sampling and windowing, when you need to be able to look at more data than your current systems can handle or allow, how can Hadoop work to address these situations? This really is the combination of the previous two situations, because you can use Hadoop to cost-effectively store all the data that you want to analyze and, at the same time, since Hadoop is both storage and compute on the same node, offer your business computing power across the entire range of data. This is very effective for low-value, low-density, low Return-on-Byte data like historical records and couples really well with many of the compliance and data retention needs we encounter. It's also a natural fit for scenarios where you really must have full access to data across a broad dimension, like our examples in fraud or anomaly detection. Given the characteristics of the infrastructure needed to build a Hadoop cluster, your structural cost/TB is typically an order of magnitude less than with a traditional data warehouse. For example, a typical data warehouse might run $2-$10M incremental spend to add 100TB to the system. With Cloudera, adding 100TB will cost roughly $200k – 1/10th the spend.
     ----
     How do these relate to the following business issues?
     * Access to relevant and/or all the data in BI and analysis (current and historical)
     * New projects affecting existing projects
     * Compliance, regulations, and data retention strategies
     Too feature-y, need to translate into needs/business drivers
     Get slides from Impala launch – check Box/Gdrive/Justin
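The cost claim above lends itself to quick arithmetic. The sketch below uses only the figures quoted in the note ($2M-$10M per incremental 100TB for a typical data warehouse versus roughly $200K with Cloudera) and converts them to per-TB numbers.

```python
# Per-TB comparison using only the figures quoted in the note above:
# a typical data warehouse at $2M-$10M per incremental 100TB versus
# roughly $200K on Cloudera for the same 100TB.
increment_tb = 100

dw_low, dw_high = 2_000_000, 10_000_000   # incremental DW spend for 100TB
hadoop_cost = 200_000                     # rough Cloudera spend for 100TB

print(f"Data warehouse:  ${dw_low / increment_tb:,.0f} - "
      f"${dw_high / increment_tb:,.0f} per TB")
print(f"Hadoop/Cloudera: ${hadoop_cost / increment_tb:,.0f} per TB")
print(f"Ratio at the low end: {dw_low / hadoop_cost:.0f}x")
```

That works out to roughly $20K-$100K per TB for the warehouse versus about $2K per TB on the cluster, i.e. the "1/10th the spend" quoted above at the low end of the range.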
  9. Features of HDFS
     * Files split into blocks; blocks distributed across the cluster: disk, node, rack; blocks replicated for protection and accessibility; transparent replication
     * Tests for compromises; self-healing via replication
     * High bandwidth; DAS for IO; bring compute to the closest replicated block; minimize network overhead; read small parts from many places simultaneously
     * Clustered storage; need more space? Add a node; that simple
     * Data stored in native fidelity; byte streams on disk; SerDe; no schema enforcement; key to flexibility of compute and storage
     Features of MapReduce
     * Fault-tolerant; distributed blocks == options for recompute on compute failure; easy to persist intermediate results for replay
     * Distributed processing; compute brought to blocks, not the file; options for best compute time due to block dispersion in the cluster; parallel programming details abstracted away
     * Schema-on-read; read byte streams at query time; determine schema at runtime; key to flexibility; key to multiple computes; core to current and future workloads
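The "schema-on-read" point is easy to show without any Hadoop machinery. The following is a plain-Python analogy, not HDFS or SerDe code: the stored data is raw text, and each reader applies its own schema at query time. The log format is made up.

```python
# A minimal, self-contained analogy for schema-on-read: the "stored" data is
# just raw text lines, and each reader applies whatever schema it needs at
# read time. The event format below is hypothetical.
raw_events = [
    "2013-05-01T09:13:22 atm-0042 withdrawal 200.00 USD",
    "2013-05-01T09:13:29 atm-0042 balance_check 0.00 USD",
]

def read_as_transactions(lines):
    """One possible schema: timestamp, terminal, type, amount, currency."""
    for line in lines:
        ts, terminal, kind, amount, currency = line.split()
        yield {"ts": ts, "terminal": terminal, "kind": kind,
               "amount": float(amount), "currency": currency}

def read_as_terminal_activity(lines):
    """A different schema over the same bytes: just terminal and event type."""
    for line in lines:
        _, terminal, kind, *_ = line.split()
        yield (terminal, kind)

print(list(read_as_transactions(raw_events)))
print(list(read_as_terminal_activity(raw_events)))
```

The data on disk never changes; only the interpretation does, which is what makes a single stored data set usable by multiple compute frameworks.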
  10. Snappy, g
  11. Interactive BI
     * Had been lacking: Some analysis impractical; Batch design decisions; "you could run analysis, but it could take a while to get results"; Limited audience for data in the cluster; Lack of interaction via common BI tools/SQL
     * Results: Lower ROI due to limited audience; Push data to high-cost, but approachable, systems; DW not always the ideal landing spot for this analysis
     * Impala: Familiar to a larger audience; Makes Hadoop data accessible; Improved ROI
     * "If your business analysts know SQL and BI tools, they can get immediate value from all your data, from the first-class data within your data warehouse to the data within your Hadoop cluster, too."
     --++--++--
     While Hadoop does offer tremendous value and analytical capabilities for businesses, the lack of a true, Hadoop-native interactive BI and analytics engine has made some analysis impractical. Yes, collecting all of your data into a primary data hub – some call it a data refinery – alone might be value enough to your business, but the ways in which you could evaluate and explore this source might have discounted that value because they focused on resilience and fault tolerance, or even access to complete expression and programming latitude, at the expense of speed and rapid dialogue. In a nutshell, you could run analysis, but it could take a while to get results. This meant that the data in the cluster, while available and accessible, wasn't quite accessible enough to larger groups within your business because of that particular focus. The lack of interactivity made it frustrating to use many common BI tools to analyze and use the data, and in turn, this led to two outcomes. First, it limited the ROI of the data within Hadoop because only tens of people within your organization – your ETL developers and your patient data scientists – could get at it, rather than the hundreds or more who could if the means were more approachable, more familiar. The other outcome is that it forced organizations to push data that was otherwise not ideal into the data warehouse, and all of the topics we have discussed about cost/TB, capabilities, and capacities remained at large.
     This is why the introduction of Cloudera Impala is so important to these scenarios, because with this real-time query engine, the users that are familiar with the speed and rapid dialogue common to BI tools and the data warehouse now have that quickness and agility with your Hadoop data. More people accessing more data and getting more value from the cluster -- this all adds up to improved ROI for Hadoop. So now, if your business analysts know SQL and BI tools, they can get immediate value from all your data, from the first-class data within your data warehouse to the data within your Hadoop cluster, too.
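As a sketch of the "analysts who know SQL" point, the snippet below issues an aggregate query against Impala using the impyla client library (a separate Python package, not something from the deck). The hostname, port 21050 (commonly used for Impala's HiveServer2 interface), and the atm_transactions table are all assumptions for illustration.

```python
# Hedged sketch: querying Impala over SQL from Python with the impyla client.
# The host, port, and atm_transactions table are hypothetical.
from impala.dbapi import connect

conn = connect(host="impala-daemon.example.com", port=21050)
cur = conn.cursor()
cur.execute("""
    SELECT terminal_id, COUNT(*) AS txns, SUM(amount) AS total
    FROM atm_transactions
    WHERE txn_date >= '2013-01-01'
    GROUP BY terminal_id
    ORDER BY total DESC
    LIMIT 10
""")
for terminal_id, txns, total in cur.fetchall():
    print(terminal_id, txns, total)
cur.close()
conn.close()
```

The same SQL could just as easily come from a BI tool over ODBC/JDBC, which is the point of the note: the audience for cluster data widens to anyone who already speaks SQL.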
  12. Right Tool, Right Job
     * Hadoop: Low-density storage and extraction; "What if" and "how about this" questions; Exploratory analysis; Just one tool, though
     * Don't ditch the DW: Complex transactions; Known, highly planned reporting; Best using schema; Fast; High-value data
     * Real power: Relationship between the two
     * Facebook example: 10TB DW; Then 1PB Hadoop cluster; Resulting in 40TB DW; Found much more important data in the cluster
     --++--++--
     So, why not just forgo the data warehouse altogether if Hadoop can provide cost-effective, scalable storage and an array of methods for analyses, reporting, and getting value out of all your data, not just the selected, high-value, high-density data? The simple answer is that Hadoop gives you and your business the "right tools for the right job." Hadoop offers you and your business the ability to query big and growing data sets – it lets you get at the value within the low-density data. And it offers you the flexibility to keep that data in its raw form so that you can ask the "what if" and "how about this" questions on that data at any time, with no data duplication, using the tools that your business knows and uses on a daily basis. Hadoop with Impala excels at this kind of exploratory analysis. Hadoop, though, is only one tool.
     The data warehouse offers a number of capabilities that are either difficult or missing with Hadoop. If you need to ensure that the order is incremented at the same time the store's inventory is decremented and that the sales person gets credited for the transaction, then you need the power of a data warehouse. That's a contrived example, but the point is, data warehouses excel at complex transactions, and these transactions are commonplace throughout your business. If you need to provide the sales team with their quarterly pipeline reports, or if the floor manager needs to know how many boxes were shipped to Los Angeles on Thursday, then you should use a data warehouse. These kinds of questions – known, highly planned, and often repeated with drill-down variations – are best served using the speed and structure offered by the underlying technology powering the data warehouse, the relational schema. This is the high-value data; this is the data that needs to fly first-class.
     Where the real power lies in this relationship is that Hadoop can feed the data warehouse this high-value data and thus makes the data warehouse even more valuable to its users than previously thought. One example comes from one of our founders and current chief scientist at Cloudera, Jeff Hammerbacher. When Jeff was leading the data science team at Facebook, they had a 10TB data warehouse. He and his team started to use Hadoop to capture the multitude of interaction points that the web property offers, and they quickly established a formidable 10PB cluster. And what happened next? They found so much high-value information in the cluster that they wanted in their traditional BI environments that their data warehouse grew to 40TB.
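One common way Hadoop "feeds the data warehouse" is a Sqoop export of a refined result set. The sketch below is illustrative, not the workflow from the talk: the JDBC URL, warehouse table, and HDFS directory are hypothetical, the aggregate is assumed to already exist in HDFS, and credential handling is omitted.

```python
# Hedged sketch: pushing a refined, high-value aggregate from HDFS into the
# data warehouse with Sqoop's export mode, invoked from Python.
# JDBC URL, table name, and HDFS export directory are hypothetical;
# credential options are omitted for brevity.
import subprocess

subprocess.run([
    "sqoop", "export",
    "--connect", "jdbc:oracle:thin:@dw.example.com:1521/EDW",
    "--username", "etl_user",
    "--table", "DAILY_TERMINAL_SUMMARY",
    "--export-dir", "/user/etl/output/daily_terminal_summary",
    "--input-fields-terminated-by", ",",
], check=True)
```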
  13. [Just keeping for graphics possibilities -- added to the Solutions/Differentiators section]
  14. More with the relationship
     * Hadoop as pre-processor: EDW staging area; Significant, high-cost data cleansing
     * As staging: Need storage; Need processing; Need query; Need low costs
     * Disaster recovery alternate: Low-cost storage; Satisfactory alternative for query during rebuild
     --++--++--
     So what else can be done with this complementary relationship of data warehouse and Hadoop?
     Let's say you need to take some data and run it through some paces in order to make some decisions as to whether or not to include it in your BI reporting. Or perhaps you have some significant data cleansing efforts that might take a considerable amount of time, like days or weeks, before the data might be ready for your business consumers to use. Like we mentioned in Jeff's story, Hadoop can be the ideal staging area for your data warehouse. Such a system requires scalable, flexible storage. It needs a high degree of processing capability, and it needs query abilities. All of which fit Hadoop very well, and with a significant cost advantage over your data warehouse, too.
     Hadoop can also act as a disaster recovery option for your data warehouse. Using standard tools and connectors, the data flowing into your data warehouse can also be sent to the Hadoop cluster and stored in its native format. So when the time comes, you have your data available and accessible, yet at a fraction of the cost of a duplicate data warehouse. In this scenario, when coupled with Impala, you and your business can still enjoy most of the speed and analytical features of your data warehouse while that system is concurrently rebuilt from the very data you are now serving.
  15. SO, TO WRAP UP, HERE ARE SOME KEY TAKEAWAYS
     ALL DATA HAS VALUE
     * Why compromise decision making by focusing on only high-density or selected data?
     * Use Hadoop to maximize the Return-on-Byte for all your data
     EXPLOIT BOTH STORAGE AND COMPUTE
     * Hadoop gives you storage and computation at a similar cost to storage-only alternatives
     * The computation is flexible - you can bring multiple processing frameworks to bear on a single set of data
     * This multi-function is expected & natural for exploratory-type workloads
     USE THE RIGHT TOOL FOR THE RIGHT JOB
     * This approach drives better data & workload alignment
     * Focus on what's best in each system
     * DW for high-density, operational reporting
     * Hadoop for low-density, exploratory analytics
  16. Consolidate your Information Lifecycle Management
     * Find your valuable off-line archives
     * Make them available and accessible
     * Connect the archives with your existing reporting systems
     * Create a staging area
     * Process and push into your systems
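A minimal sketch of the first steps above, assuming a restored archive extract already sits on local disk and a standard `hadoop fs` client is on the path; all paths are hypothetical. Downstream processing (transforms, pushes back into existing systems) would pick up from the staging directory.

```python
# Hedged sketch: land an off-line historical extract into an HDFS staging area
# so it becomes available and accessible again. Local extract path and HDFS
# layout are hypothetical.
import subprocess

LOCAL_EXTRACT = "/archive/restored/atm_transactions_2006_2010.csv.gz"
STAGING_DIR = "/data/staging/atm_transactions/historical"

# Create the staging directory and push the extract into HDFS as-is
# (native fidelity; schema gets applied later, at read time).
subprocess.run(["hadoop", "fs", "-mkdir", "-p", STAGING_DIR], check=True)
subprocess.run(["hadoop", "fs", "-put", LOCAL_EXTRACT, STAGING_DIR], check=True)
print(f"Staged {LOCAL_EXTRACT} under {STAGING_DIR}")
```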