Comparison of MPP
Data Warehouse Platforms
- David Portnoy-
- 312.970.9740-
http://LinkedIn.com/in/DavidPortnoy
© 2013-2014
What’s MPP in data warehousing?
MPP (massively parallel processing) data warehouse systems
differ from SMP (symmetric multiprocessing) databases in that they:
1. Use shared-nothing architectures, with no single point of failure
and often hot-swappable components
2. Scale horizontally by adding nodes, rather than moving to a
server with more CPUs or higher storage capacity
3. Break large queries across nodes for simultaneous
processing
4. Achieve higher data ingestion rates through parallelized
data movement
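The points above can be illustrated with a minimal sketch, assuming a toy "cluster" in which each node is just a Python list and worker threads stand in for independent servers. The names `distribute`, `local_sum` and `parallel_group_sum`, the hash-on-customer-id distribution key and the node count are all hypothetical; this is not how any particular vendor implements it.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

NUM_NODES = 4  # hypothetical cluster size


def distribute(rows, num_nodes=NUM_NODES):
    """Hash-distribute (customer_id, amount) rows so each node owns its own slice."""
    nodes = [[] for _ in range(num_nodes)]
    for customer_id, amount in rows:
        nodes[hash(customer_id) % num_nodes].append((customer_id, amount))
    return nodes


def local_sum(node_rows):
    """Work done independently on one node: partial SUM(amount) GROUP BY customer."""
    totals = defaultdict(int)
    for customer_id, amount in node_rows:
        totals[customer_id] += amount
    return totals


def parallel_group_sum(rows, num_nodes=NUM_NODES):
    """Scatter the query to every node, then merge the partial results."""
    nodes = distribute(rows, num_nodes)
    with ThreadPoolExecutor(max_workers=num_nodes) as pool:
        partials = list(pool.map(local_sum, nodes))
    merged = defaultdict(int)
    for partial in partials:
        for customer_id, total in partial.items():
            merged[customer_id] += total
    return dict(merged)


rows = [(1, 10), (2, 5), (1, 7), (3, 2), (2, 1)]
result = parallel_group_sum(rows)
```

Because every row for a given customer lands on the same node, each node can aggregate without talking to the others, and adding nodes only changes `num_nodes`. That is the sense in which the design scales horizontally.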
Who are the players?
Previously, we discussed just the specialized MPP data warehouse vendors:
• Teradata
• Netezza
• Vertica
• Greenplum
But we should keep in mind that most major database vendors also have
their own MPP products for data warehousing. Examples include:
• Microsoft PDW (Parallel Data Warehouse)
• DB2 UDB with Database Partitioning Feature (DPF)
• Oracle Big Data Appliance, which just provides a gateway between Hadoop and
their SMP RDBMS platform
Finally, we need to consider the emergence of SQL-oriented, low-latency
Hadoop solutions. Examples include:
• Impala; Stinger; Apache Drill; Phoenix; Shark; Hadapt
• Teradata’s SQL-H (Aster Data); EMC’s HAWQ; IBM’s BigSQL
See related writeup: http://www.slideshare.net/DavidPortnoy/hybrid-data-warehouse-hadoop-implementations
How do the architectures compare?
Looking at the specialized MPP data warehouse vendors:

Hardware (architecture)
• Teradata: Custom MPP, shared nothing
• Netezza: Custom MPP: SPU + FPGA logic
• Greenplum: Commodity hardware
• Vertica: Custom hybrid MPP, shared everything

Type of processing
• Teradata: OLTP or OLAP; can handle high user load
• Netezza: OLAP; assumes few users for heavy analytics
• Greenplum: OLAP
• Vertica: OLAP optimized for large fact tables

Inception / maturity
• Teradata: 1979, from Caltech
• Netezza: 2000, by Saxena & Hinshaw
• Greenplum: 2003, from Metapa & Didera
• Vertica: 2005, by MIT’s Stonebraker

Performance & maintenance
• Teradata: Auto-recommended optimization; columnar compression available
• Netezza: No need for performance tuning; must manually reclaim space
• Greenplum: Based on PostgreSQL, but optimized for MPP and enterprise maintenance
• Vertica: Column-oriented optimization for ingestion, storage/compression, and access

Hardware (proprietary or commodity)
• Teradata: Proprietary
• Netezza: Proprietary
• Greenplum: Commodity
• Vertica: Commodity

Definitions
* OLAP: Online Analytical Processing
* OLTP: Online Transaction Processing
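
To make the column-oriented storage and compression entries in the comparison more concrete, here is a small sketch of why a columnar layout suits OLAP. It assumes a made-up fact table and uses run-length encoding as a stand-in for columnar compression; the helper names `to_columns` and `run_length_encode` are hypothetical, and this does not describe the actual on-disk format of Vertica or Teradata.

```python
from itertools import groupby

# Hypothetical fact-table rows: (order_id, region, amount)
rows = [
    (1, "EAST", 10),
    (2, "EAST", 12),
    (3, "EAST", 7),
    (4, "WEST", 3),
    (5, "WEST", 9),
]


def to_columns(table_rows):
    """Pivot row-oriented tuples into one list per column."""
    return [list(column) for column in zip(*table_rows)]


def run_length_encode(values):
    """Collapse consecutive repeated values into (value, count) pairs."""
    return [(value, len(list(group))) for value, group in groupby(values)]


order_ids, regions, amounts = to_columns(rows)

# A sorted, low-cardinality column shrinks to a handful of pairs.
encoded_regions = run_length_encode(regions)

# An analytical query such as SUM(amount) reads one column, not whole rows.
total_amount = sum(amounts)
```

An OLTP workload, by contrast, usually reads or writes whole rows at a time, which is why row-oriented storage remains the norm there.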
The industry is moving towards open, commodity solutions
Traditional database servers, such as IBM DB2, Oracle Exadata and
Microsoft SQL Server, license proprietary software but run on
commodity hardware, although the nature of SMP architecture typically
favors having a few large, expensive servers.
But the biggest MPP data warehouse vendors all have proprietary
software. That’s despite the fact that Netezza and Vertica were built on the
open source PostgreSQL database. Teradata and Netezza even
implement custom hardware, which drives up the price.
Hadoop has open-sourced the software component, leading to a vibrant
ecosystem of tools and applications. And with built-in redundancy, it’s
easy to deploy on cheap commodity servers.
So the trend looks something like this
[Figure: quadrant chart with a hardware axis (specialized vs. commodity) and a software axis (proprietary vs. open source, standardized), positioning Traditional Database, MPP Data Warehouse and Hadoop**.]
** While the up-front cost of Hadoop may be lower, the TCO (total cost of ownership)
could be much higher. This is due to the current maturity of the product, the complexity of
solutions and the scarcity of talent.
So let’s look at the relative cost breakdown
Teradata
Hardware and licenses are the most
expensive of all options. Staff costs can
be high, and it takes a great deal of
effort to configure and administer.
IBM Netezza
Hardware and licenses used to cost much
less than Teradata, but prices have been
converging. Some of the highest staff
costs due to scarcity, but that’s tempered
by the lower effort for configuration and
admin of a single-purpose appliance.
Greenplum
Commodity hardware. Moderately priced
licenses. Few Greenplum specialists, but
can be staffed by PostgreSQL DBAs and
developers.
Vertica
Commodity hardware. Moderately priced
licenses, but special-purpose orientation
limits usefulness. Few specialists, but
can be staffed by traditional DBAs and
developers.
Hadoop HBase
Commodity hardware and no license
cost, resulting in the lowest up-front cost.
Likely to buy more hardware for
redundancy and load. But it requires
highly technical staff, and implementation
is less productive than with more mature
options.
[Figure: bar chart of the relative cost breakdown per platform, split into hardware, licenses and development.]
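
The cost comparison can be sketched as simple arithmetic. The sketch below assumes that hardware and licenses are one-off costs and that development (staff) cost recurs yearly; both the `PlatformCost` class and every number in it are invented placeholders for illustration, not figures from these slides.

```python
from dataclasses import dataclass


@dataclass
class PlatformCost:
    hardware: float     # up-front hardware spend (assumed one-off)
    licenses: float     # license spend (assumed one-off; 0 for open source)
    development: float  # yearly staff and development spend

    def tco(self, years: int) -> float:
        """Total cost of ownership: up-front costs plus recurring staff cost."""
        return self.hardware + self.licenses + self.development * years

    def breakdown(self, years: int) -> dict:
        """Share of the TCO attributable to each cost component."""
        total = self.tco(years)
        return {
            "hardware": self.hardware / total,
            "licenses": self.licenses / total,
            "development": self.development * years / total,
        }


# Placeholder units, purely illustrative.
appliance = PlatformCost(hardware=100, licenses=100, development=50)
open_source = PlatformCost(hardware=60, licenses=0, development=120)

appliance_tco = appliance.tco(years=3)
open_source_tco = open_source.tco(years=3)
```

With these placeholder inputs the platform with the lower up-front cost ends up with the higher three-year total, which is the pattern the Hadoop TCO caveat describes. Real figures would have to come from vendor quotes and staffing data.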
What’s their relative adoption today?
Comparing the supply and demand for administrators and developers can
be a proxy for the strength and staying power of a platform.
Teradata has been around for many years longer than the alternatives and
still dominates the market in terms of install base (3 times the next rival) and
the size of its development community (6 times the next rival).
But in recent years Hadoop solutions have outstripped Teradata by a
significant margin. (Of course, it should be noted that Hadoop includes use
cases outside of traditional data warehousing.)
Over time, interest in market leader Teradata has been consistent, but flat
While Netezza, Vertica, and Greenplum have grown, they haven’t taken significant
market share away from Teradata.
(The spike in Netezza interest is attributed to its acquisition by IBM.)
But when Hadoop is added into the mix, the picture changes drastically
Interest in Hadoop has quickly overtaken interest in even the traditional market leader, Teradata.
That might explain why Teradata has been on an acquisition spree for
Hadoop-related products and services, such as Aster Data.
The future of its next biggest rival, Netezza, is uncertain as it seeks its
niche within IBM’s product lineup.
Related Reading
Hybrid Data Warehouse-Hadoop Implementations:
http://www.slideshare.net/DavidPortnoy/hybrid-data-warehouse-hadoop-implementations
Agile Business Intelligence:
http://www.slideshare.net/DavidPortnoy/agile-bi-18491924
Blog:
http://david.portnoy.us
