Hw09 Rethinking The Data Warehouse With Hadoop And Hive

Cloudera, Inc.

Rethinking Data Warehousing & Analytics. Ashish Thusoo, Facebook Data Infrastructure Team

Why Another Data Warehousing System?

Trends Leading to More Data: free or low-cost user services; the realization that more insights are derived from simple algorithms on more data.

Deficiencies of Existing Technologies: the cost of analysis and storage on proprietary systems does not support the trend towards more data; systems are closed and proprietary; limited scalability does not support the trend towards more data.

Hadoop Advantages

What is HIVE?

Hive: Simplifying Hadoop – New Technology Familiar Interfaces

Hive: Open and Extensible

Hive: Smart Execution Plans for Performance

Interoperability

Information

Data Warehousing @ Facebook using Hive & Hadoop

Data Flow Architecture at Facebook (diagram; components: Web Servers, Scribe MidTier, Filers, Production Hive-Hadoop Cluster, Oracle RAC, Federated MySQL, Scribe-Hadoop Cluster, Adhoc Hive-Hadoop Cluster, Hive replication)

Looks like this (diagram: six nodes, each with its own disks; 1 Gigabit and 4 Gigabit network links; Node = DataNode + Map-Reduce)

Hadoop & Hive Cluster @ Facebook

Hive & Hadoop Usage @ Facebook

Facebook’s contributions…

Editor's Notes

The cost of training people is high, so the system has to be easy to use to reduce that cost.
Why Hive? Petabytes of structured data. The user base is familiar with SQL and Python/Perl/PHP. Commercial warehousing software does not scale, is very expensive and inflexible, is closed source, and is not programmable using Python/Perl/PHP. Solution: a SQL layer on top of scalable storage and map-reduce (Hadoop). Openness: use any data format, embed any programming language.
Nomenclature: Core switch and Top of Rack
1GB connectivity within a rack, 100MB across racks? Are all disks 7200 SATA?
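
The "SQL layer on top of map-reduce" idea from the Why Hive? note can be illustrated with a small, self-contained Python sketch. It is not Hive's actual planner or API: the `PAGE_VIEWS` rows, the table and column names, and all function names are hypothetical, chosen only to show how a simple GROUP BY count could, in principle, be broken into map, shuffle, and reduce steps.

```python
from collections import defaultdict

# Hypothetical rows of a page_views table: (user_id, page). Illustrative data only.
PAGE_VIEWS = [
    (1, "home"),
    (2, "home"),
    (1, "profile"),
    (3, "home"),
    (2, "photos"),
]


def map_phase(rows):
    """Emit one (key, 1) pair per row, keyed by the GROUP BY column."""
    for _user_id, page in rows:
        yield page, 1


def shuffle(pairs):
    """Group values by key, as a map-reduce framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Sum the values for each key, which corresponds to COUNT(*)."""
    return {key: sum(values) for key, values in groups.items()}


def count_by_page(rows):
    # Similar in spirit to: SELECT page, COUNT(*) FROM page_views GROUP BY page
    return reduce_phase(shuffle(map_phase(rows)))


if __name__ == "__main__":
    print(count_by_page(PAGE_VIEWS))
```

In this sketch the user only has to think about the one-line SQL statement in the comment; the three functions stand in for the work a SQL layer would generate on the user's behalf.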
