© Copyright 2013 Pivotal. All rights reserved.
A NEW PLATFORM FOR A NEW ERA
Pivotal Confidential–Internal Use Only
Pivotal HD
Pivotal HD Architecture
[Diagram: Pivotal HD architecture stack]
• Apache: HDFS, HBase, Pig, Hive, Mahout, MapReduce, Sqoop, Flume
• Resource Management & Workflow: YARN, ZooKeeper
• Pivotal HD Added Value (Pivotal HD Enterprise): Command Center (Configure, Deploy, Monitor, Manage), Hadoop Virtualization (HVE), Data Loader
• HAWQ – Advanced Database Services: Xtension Framework, Catalog Services, Query Optimizer, Dynamic Pipelining, ANSI SQL + Analytics
Pivotal HD Components
• HDFS – The Hadoop Distributed File System, which acts as the storage layer for Hadoop
• MapReduce – Parallel processing framework used for data computation in Hadoop
• Hive – Structured data warehouse implementation for data in HDFS that provides a SQL-like interface to Hadoop
• Pig – High-level procedural language for data pipeline/data flow processing in Hadoop
• HBase – NoSQL key-value data store on top of HDFS
• Mahout – Library of scalable machine-learning algorithms
• Spring Hadoop – Integrates the Spring framework into Hadoop
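The MapReduce model listed above can be illustrated with a minimal single-process word count in Python. This is only a sketch of the map, shuffle, and reduce steps, not the Hadoop API; the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are illustrative assumptions. A real Hadoop job runs the map and reduce steps in parallel across the cluster.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(line: str) -> Iterator[tuple[str, int]]:
    """Map step: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Shuffle step: group all emitted values by their key."""
    grouped: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Reduce step: collapse the values for one key into a single count."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> dict[str, int]:
    """Run map, shuffle, and reduce in sequence (hypothetical driver)."""
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


if __name__ == "__main__":
    sample = ["HDFS stores data", "MapReduce processes data"]
    print(word_count(sample))
```

Each step depends only on its own input, which is what lets a framework distribute map calls per input split and reduce calls per key.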
Pivotal HD Value-Added Components
GPHD Includes…
• Installation and Configuration Manager (ICM) – cluster installation, upgrade, and expansion tools.
• GP Command Center – visual interface for cluster health, system metrics, and job monitoring.
• Hadoop Virtualization Extension (HVE) – enhances Hadoop to support virtual node awareness and enables greater cluster elasticity.
• GP Data Loader – parallel loading infrastructure that supports “line speed” data loading into HDFS.
• Isilon Integration – extensively tested at scale with guidelines for compute-heavy, storage-heavy, and balanced configurations.
Pivotal HD Adds the Following to GPHD…
• Advanced Database Services (HAWQ) – high-performance, “True SQL” query interface running within the Hadoop cluster.
• Extensions Framework (GPXF) – support for HAWQ interfaces on external data providers (HBase, Avro, etc.).
• Advanced Analytics Functions (MADlib) – ability to access parallelized machine-learning and data-mining functions at scale.
Pivotal Core Components & Versions
Component       GPHD 1.2 Core Distribution   Pivotal HD Enterprise
Hadoop          1.0.3                        2.0.2
HBase           0.92.1                       0.94.2
Hive            0.8.1                        0.9.1
Mahout          0.6                          0.8.0
Pig             0.9.2                        0.10.0
ZooKeeper       3.3.5                        3.4.5
Flume           1.2.0                        1.3.1
Sqoop           1.4.1                        1.4.2
Spring Hadoop   (none listed)                1.0.0
DataLoader
[Diagram: DataLoader architecture]
• Stream sources: push and pull connectors, Flume
• File sources: HDFS, NFS, HTTP, FTP, local
• DataLoader functions: data source registration, data destination registration, copy strategy optimization, data copy job management, data processing
• Interfaces: Web GUI and CLI, REST APIs
• Destination: HDFS
Command Center
Simple and complete cluster management
• Install and configure Hadoop components and services
• Centralized interface for Pivotal HD cluster monitoring, diagnostics, and management
• Live and historical Hadoop system metrics analysis
[Diagram: Configure, Deploy, Monitor, Manage, Analyze]
Command Center – Monitor, Manage, and Analyze
• Host-, application-, and job-level performance monitoring across the entire Pivotal HD cluster
• Visualize and analyze live and historical Hadoop cluster information through the Command Center dashboard
• Quick diagnostics of functional or performance issues
Hadoop Virtualization Extensions (HVE)
• HVE enables Hadoop to support more effective virtual deployments
• This creates the opportunity to provision and scale the compute and storage processes independently, resulting in:
  • Much better resource utilization
  • Improved resource allocation and consumption
  • Support for multi-tenancy
HAWQ Delivers
• SQL compliant
• World-class query optimizer
• Interactive query
• Horizontal scalability
• Robust data management
• Common Hadoop formats
• Deep analytics
Xtension Framework
• An advanced version of GPDB external tables
• Enables combining HAWQ data and Hadoop data in a single query
• Supports connectors for HDFS, HBase, and Hive
• Provides an extensible framework API to enable custom connector development for other data sources
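The extensible connector idea above can be sketched as a plug-in registry in Python. This is a hypothetical illustration only and does not reproduce the actual Xtension Framework API: every name here (`Connector`, `register`, `external_table`, `InMemoryConnector`, `join_on`) is an assumption made for the example, and the in-memory connector merely stands in for an HDFS, HBase, or Hive connector.

```python
from abc import ABC, abstractmethod
from typing import Iterator

Row = dict[str, object]


class Connector(ABC):
    """Hypothetical base class that a custom connector would implement."""

    @abstractmethod
    def read_rows(self, location: str) -> Iterator[Row]:
        """Yield rows from the external source identified by `location`."""


# Registry mapping a source name (for example "hdfs") to its connector.
_CONNECTORS: dict[str, Connector] = {}


def register(name: str, connector: Connector) -> None:
    """Make a connector available under `name`."""
    _CONNECTORS[name] = connector


def external_table(name: str, location: str) -> Iterator[Row]:
    """Look up a registered connector and stream rows from it."""
    return _CONNECTORS[name].read_rows(location)


class InMemoryConnector(Connector):
    """Stand-in for a real connector, backed by a plain dict."""

    def __init__(self, data: dict[str, list[Row]]) -> None:
        self._data = data

    def read_rows(self, location: str) -> Iterator[Row]:
        yield from self._data[location]


def join_on(native: list[Row], external: Iterator[Row], key: str) -> list[Row]:
    """Combine native rows with external rows sharing `key`.

    Assumes `key` is unique among the native rows (simple hash join).
    """
    index = {row[key]: row for row in native}
    return [{**index[r[key]], **r} for r in external if r[key] in index]
```

A usage example under the same assumptions: register a connector once, then join its rows with rows held natively.

```python
register("demo", InMemoryConnector({"/clicks": [{"id": 1, "clicks": 7}]}))
customers = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}]
print(join_on(customers, external_table("demo", "/clicks"), "id"))
```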


Pivotal HD Platform Overview

  • 1. 1© Copyright 2013 Pivotal. All rights reserved. 1© Copyright 2013 Pivotal. All rights reserved. A NEW PLATFORM FOR A NEW ERA
  • 2. 2Pivotal Confidential–Internal Use Only 2© Copyright 2013 Pivotal. All rights reserved. Pivotal HD
  • 3. 3Pivotal Confidential–Internal Use Only HDFS HBase Pig, Hive, Mahout Map Reduce Sqoop Flume Resource Management & Workflow Yarn Zookeeper Apache Pivotal HD Added Value Configure, Deploy, Monitor, Manage Command Center Hadoop Virtualization (HVE) Data Loader Pivotal HD Enterprise Xtension Framework Catalog Services Query Optimizer Dynamic Pipelining ANSI SQL + Analytics HAWQ– Advanced Database Services Pivotal HD Architecture
  • 4. Pivotal HD Components
      • HDFS – The Hadoop Distributed File System acts as the storage layer for Hadoop
      • MapReduce – Parallel processing framework used for data computation in Hadoop
      • Hive – Structured data warehouse implementation for data in HDFS that provides a SQL-like interface to Hadoop
      • Pig – High-level procedural language for data pipeline/data flow processing in Hadoop
      • HBase – NoSQL, key-value data store on top of HDFS
      • Mahout – Library of scalable machine-learning algorithms
      • Spring Hadoop – Integrates the Spring framework into Hadoop
  • 5. Pivotal HD Value-Added Components
      • Installation and Configuration Manager (ICM) – cluster installation, upgrade, and expansion tools.
      • GP Command Center – visual interface for cluster health, system metrics, and job monitoring.
      • Hadoop Virtualization Extension (HVE) – enhances Hadoop to support virtual node awareness and enables greater cluster elasticity.
      • GP Data Loader – parallel loading infrastructure that supports “line speed” data loading into HDFS.
      • Isilon Integration – extensively tested at scale with guidelines for compute-heavy, storage-heavy, and balanced configurations.
      • Advanced Database Services (HAWQ) – high-performance, “True SQL” query interface running within the Hadoop cluster.
      • Extensions Framework (GPXF) – support for HAWQ interfaces on external data providers (HBase, Avro, etc.).
      • Advanced Analytics Functions (MADlib) – ability to access parallelized machine-learning and data-mining functions at scale.
      Group labels on the slide: “GPHD Includes…” and “Pivotal HD Adds the Following to GPHD…”
  • 6. Pivotal Core Components & Versions

      Component        GPHD 1.2 Core Distribution    Pivotal HD Enterprise
      Hadoop           1.0.3                         2.0.2
      HBase            0.92.1                        0.94.2
      Hive             0.8.1                         0.9.1
      Mahout           0.6                           0.8.0
      Pig              0.9.2                         0.10.0
      Zookeeper        3.3.5                         3.4.5
      Flume            1.2.0                         1.3.1
      Sqoop            1.4.1                         1.4.2
      Spring Hadoop    (not listed)                  1.0.0
  • 7. DataLoader (diagram)
      • Streams: push, pull, connectors, Flume
      • Files: HDFS, NFS, HTTP, FTP, local
      • DataLoader: data source registration, data destination registration, copy strategy optimization, data copy job management, data processing
      • Interfaces: web GUI and CLI, REST APIs
      • Target: HDFS
  • 8. Command Center: simple and complete cluster management
      • Install and configure Hadoop components and services
      • Centralized interface for Pivotal HD cluster monitoring, diagnostics, and management
      • Live and historical Hadoop system metrics analysis
      Diagram labels: Configure, Monitor, Manage, Analyze, Deploy
  • 9. Command Center – Monitor, Manage, and Analyze
      • Host-, application-, and job-level performance monitoring across the entire Pivotal HD cluster
      • Visualize and analyze live and historical Hadoop cluster information through the Command Center Dashboard
      • Quick diagnostics of functional or performance issues
  • 10. Hadoop Virtualization Extensions (HVE)
      • HVE enables Hadoop to support more effective virtual deployments
      • This creates the opportunity to provision and scale the compute and storage processes independently, resulting in:
          • Much better resource utilization
          • Improved resource allocation and consumption
          • Support for multi-tenancy
  • 11. HAWQ Delivers
      • SQL compliant
      • World-class query optimizer
      • Interactive query
      • Horizontal scalability
      • Robust data management
      • Common Hadoop formats
      • Deep analytics
  • 12. Xtension Framework
      • An advanced version of GPDB external tables
      • Enables combining HAWQ data and Hadoop data in a single query
      • Supports connectors for HDFS, HBase, and Hive
      • Provides an extensible framework API to enable custom connector development for other data sources

Editor's notes

  1. Start with basic HD and then comment about the addition of a true SQL interface
  2. DataLoader copy strategies:
       • Uniform – Distribute copy tasks uniformly between workers to maximize throughput.
       • Data Locality Driven – For HDFS and local-disk sources. The job planner assigns copy tasks to the local or closest worker node. When deployed on the source, it assigns reads to the local worker; when deployed to the destination HDFS, it assigns writes to local nodes. The job planner gets locality information from the NameNode. Patches to HDFS schedulers: HDFS rack awareness to reduce inter-rack traffic, and local-disk awareness to assign read/write MapReduce tasks to workers local to the data.
       • Dynamic – Used for loading a large number of small files. Assigns small tasks to workers and re-assigns them at runtime.
       • Connection Limited – Limits the number of read connections to the source FTP/HTTP server.
       • Intelligent – Chooses the correct copy strategy based on source type and data.
  3. ICM
     What it is: GPHD Manager for Greenplum HD
       • GPHD Manager is part of the Command Center package.
       • GPHD Manager supports installation and default configuration of Hadoop, MapReduce, Hive, HBase, Zookeeper, Pig, and Mahout.
       • GPHD Manager supports a command-line interface, built on a RESTful web services API, to install, configure, and start/stop various Hadoop services.
       • It stores all metadata from the Hadoop cluster nodes and services in a PostgreSQL database to keep track of cluster configuration and runtime statistics.
     How it works
       • GPHD Manager is installed on an Admin node that is typically separate from the Hadoop cluster nodes.
       • The functionality of the GPHD Manager Admin node is exposed as REST-based web service APIs.
       • It leverages Puppet to manage the installation of Hadoop services (master/slave mode).
     Benefits
       • Provides a centralized, role-based configuration and deployment tool.
       • Includes validation: machine validation, reachability validation, dependency validation.
       • A single GPHD Manager Admin node can manage multiple Hadoop clusters (integrated into GP Command Center in the next release).
     Command Center
     What it is: application for monitoring and management of the GPHD environment
       • Web-based interface that provides standard infrastructure system metrics and Hadoop-specific metrics.
       • Designed to make a Hadoop administrator’s job easier.
     How it works
       • Visualizes live and historical data in the GPCC Dashboard to display the state of the Hadoop cluster (stored in the backend GPDB).
       • GPCC provides host-level monitoring (all information specific to a particular host), application-level monitoring (HDFS information across the whole cluster), and job-level monitoring and analysis (information on particular MapReduce jobs).
     Benefits
       • Monitor that the GPHD cluster is running efficiently and without any problems.
       • Quickly diagnose functional or performance issues with the Hadoop cluster.
     GPSM
     What it is: GPHD System Management & Monitoring
       • Web-services component of GPHD 1.2 that lets applications monitor and manage one or more Hadoop clusters.
       • GP-SM is designed to work with Greenplum Command Center as the UI.
       • Leverages GPDB to store and analyze both GPHD application and system metrics.
     How it works
       • GP-SM provides a Thrift plugin to retrieve data from GPHD.
       • Live and historical data are stored in a GPDB instance with a pre-defined schema (gpperfmon).
       • The Thrift plugin exposes its APIs via web services.
       • GPCC uses this web service to visualize live and historical data of the GPHD environment.
     Benefits
       • Serves as the backend system for Greenplum Command Center.
       • Enables users to analyze both live and historical Hadoop system information.
  4. Command Center
     What it is: application for monitoring and management of the GPHD environment
       • Web-based interface that provides standard infrastructure system metrics and Hadoop-specific metrics.
       • Designed to make a Hadoop administrator’s job easier.
     How it works
       • Visualizes live and historical data in the GPCC Dashboard to display the state of the Hadoop cluster (stored in the backend GPDB).
       • GPCC provides host-level monitoring (all information specific to a particular host), application-level monitoring (HDFS information across the whole cluster), and job-level monitoring and analysis (information on particular MapReduce jobs).
     Benefits
       • Monitor that the GPHD cluster is running efficiently and without any problems.
       • Quickly diagnose functional or performance issues with the Hadoop cluster.
     GPSM
     What it is: GPHD System Management & Monitoring
       • Web-services component of GPHD 1.2 that lets applications monitor and manage one or more Hadoop clusters.
       • GP-SM is designed to work with Greenplum Command Center as the UI.
       • Leverages GPDB to store and analyze both GPHD application and system metrics.
     How it works
       • GP-SM provides a Thrift plugin to retrieve data from GPHD.
       • Live and historical data are stored in a GPDB instance with a pre-defined schema (gpperfmon).
       • The Thrift plugin exposes its APIs via web services.
       • GPCC uses this web service to visualize live and historical data of the GPHD environment.
     Benefits
       • Serves as the backend system for Greenplum Command Center.
       • Enables users to analyze both live and historical Hadoop system information.
  5. Topology extensions:
       • Enable Hadoop to recognize the additional virtualization layer for read/write/balancing.
       • Enable compute/data node separation without losing locality.
     Elasticity extensions:
       • Ability to adjust resource allocation (CPU, memory, map/reduce slots) to compute nodes.
       • Enables multiple compute VMs sharing common HDFS data VMs.
     HVE – Allows HDFS to be virtualization aware.
     Serengeti – Deployment tool for virtualized Hadoop.
  6. This is the first true SQL engine for Hadoop.
  7. Xtension Framework connectors and formats:
       • HDFS: delimited text, sequence file, GPDB writable format, Protocol Buffer, Avro
       • HBase: predicate pushdown
       • Hive: RCFile, text file, sequence file
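
The copy strategies described in note 2 can be illustrated with a small Python sketch. This is not DataLoader's implementation or API: `CopyTask`, `assign_uniform`, `assign_by_locality`, `choose_strategy`, and the small-file thresholds are hypothetical names and values chosen only to show the idea of dealing tasks out evenly, preferring the worker that already holds the data, and picking a strategy from the source type.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

# Assumed thresholds for the "many small files" case; not taken from the slides.
SMALL_FILE_BYTES = 1024 * 1024
MIN_SMALL_FILE_COUNT = 1000


@dataclass(frozen=True)
class CopyTask:
    """One unit of copy work (hypothetical structure, for illustration only)."""
    path: str
    size_bytes: int
    host: Optional[str] = None  # node that holds the data, if known


def assign_uniform(tasks: List[CopyTask], workers: List[str]) -> Dict[str, List[CopyTask]]:
    """Uniform: deal tasks out round-robin so every worker gets an even share."""
    plan: Dict[str, List[CopyTask]] = {w: [] for w in workers}
    for i, task in enumerate(tasks):
        plan[workers[i % len(workers)]].append(task)
    return plan


def assign_by_locality(tasks: List[CopyTask], workers: List[str]) -> Dict[str, List[CopyTask]]:
    """Locality driven: use the worker on the data's host, else the least-loaded worker."""
    plan: Dict[str, List[CopyTask]] = {w: [] for w in workers}
    for task in tasks:
        if task.host in plan:
            target = task.host
        else:
            target = min(workers, key=lambda w: len(plan[w]))
        plan[target].append(task)
    return plan


def choose_strategy(source_type: str, tasks: List[CopyTask]) -> str:
    """'Intelligent' selection based on source type and data (rules are assumptions)."""
    if source_type in ("ftp", "http"):
        return "connection_limited"
    small = sum(1 for t in tasks if t.size_bytes < SMALL_FILE_BYTES)
    if small >= MIN_SMALL_FILE_COUNT:
        return "dynamic"
    if source_type in ("hdfs", "local"):
        return "locality"
    return "uniform"


tasks = [
    CopyTask("/a", 10, "w1"),
    CopyTask("/b", 10, "w1"),
    CopyTask("/c", 10, None),
    CopyTask("/d", 10, "w9"),
]
workers = ["w1", "w2"]
uniform_plan = assign_uniform(tasks, workers)
locality_plan = assign_by_locality(tasks, workers)
```

In this toy run the uniform plan alternates tasks between `w1` and `w2`, while the locality plan keeps `/a` and `/b` on `w1` and sends the tasks without a matching host to the less-loaded `w2`. A real planner would obtain locality from the NameNode, as the note says, rather than from a field on the task.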