Pivotal Big Data Suite is a comprehensive platform that allows companies to modernize their data infrastructure, gain insights through advanced analytics, and build analytic applications at scale. It includes components for data processing, storage, analytics, in-memory processing, and application development. The suite is based on open source software, supports multiple deployment options, and provides an agile approach to help companies transform into data-driven enterprises.
1. pivotal.io
PIVOTAL HANDOUT
Pivotal Big Data Suite
Product Suite
COMPLETE PLATFORM FOR DATA-DRIVEN ENTERPRISES
Many industry stalwarts have found their traditional business models under threat by a
new generation of fast growing competitors that leverage big data and analytics. These
new companies are transforming and redefining markets by creating innovative customer
experiences with intelligent, customer-centered applications.
Powering these applications are significant advances made in data processing and analytics
with technologies such as scale-out processing, machine learning, and in-memory
computation. These advances leverage hardware trends such as cloud computing,
convergence of storage and compute resources, and rapidly increasing RAM per system.
Collectively known as big data and advanced analytics, these technologies are developed
within open source communities.
Pivotal Software is a leading contributor to many big data and analytics open source
software projects, and is dedicated to driving innovation in the open source ecosystem.
To help companies adopt big data and analytics and create data-driven business models,
Pivotal has rolled these open source technologies into a comprehensive platform called
Pivotal Big Data Suite, as depicted in Figure 1. Big Data Suite allows companies to
modernize their data infrastructure, discover more insights with advanced analytics, and
build analytic applications at scale.
KEY ADVANTAGES
• Quickly deploy and manage an
analytics-optimized business data lake
based on Hadoop
• Discover more insights using advanced
analytics with SQL on Hadoop or an
analytics data warehouse
• Innovate at scale with smart,
predictive applications backed by
distributed in-memory data stores
FEATURES OF BIG DATA SUITE
• Comprehensive offering covering
data processing & storage, advanced
analytics, in-memory data processing
& messaging
• Works with Pivotal Cloud Foundry –
deploy with Ops Manager, consume
as services within Pivotal Cloud
Foundry apps
• Compatible with Open Data
Platform (ODP) core-based
distributions of Hadoop
• Based on open source
• Processing core-based subscription
license for 1 to 3 years
• Flexible licensing – reallocate licensed
core capacity between components
depending on need
• Multiple deployment options:
commodity hardware, appliance,
virtualized, cloud and hybrid cloud
Overview
MODERNIZE DATA INFRASTRUCTURE
Store and Process Any Size and Type of Data
A first step for many companies in becoming a data-driven enterprise is to deploy a
modern data infrastructure for storage and data processing based on Hadoop. Pivotal
Big Data Suite helps companies with this transformation at the data processing layer by
including Spring XD, Pivotal HD, and Cloud Foundry Operations Manager.
In an agile infrastructure, data scientists and architects need a rapid, scalable way to
develop specific data flows for ingestion and processing. Spring XD helps customers
quickly create data pipelines to orchestrate the flow of data from any source, between
processing steps, and into any final repository.
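The source, processing step, and repository flow described above can be sketched in a few lines of Python. This is a minimal illustration of the pipeline idea only, not the Spring XD API: the function names, the "user,amount" record format, and the in-memory list standing in for a final repository are all assumptions made for the example.

```python
from typing import Dict, Iterable, Iterator, List


def source(lines: Iterable[str]) -> Iterator[str]:
    """Source step: emit raw records one at a time."""
    for line in lines:
        yield line


def parse(records: Iterable[str]) -> Iterator[Dict]:
    """Processing step: turn 'user,amount' text into structured records."""
    for record in records:
        user, amount = record.split(",")
        yield {"user": user.strip(), "amount": float(amount)}


def only_large(records: Iterable[Dict], threshold: float) -> Iterator[Dict]:
    """Processing step: keep records at or above the threshold."""
    for record in records:
        if record["amount"] >= threshold:
            yield record


def sink(records: Iterable[Dict], repository: List[Dict]) -> int:
    """Sink step: write records to the repository and return the count."""
    count = 0
    for record in records:
        repository.append(record)
        count += 1
    return count


def run_pipeline(lines: Iterable[str], repository: List[Dict],
                 threshold: float = 100.0) -> int:
    """Chain source -> parse -> filter -> sink (hypothetical pipeline)."""
    return sink(only_large(parse(source(lines)), threshold), repository)


if __name__ == "__main__":
    repo: List[Dict] = []  # stand-in for a final repository such as HDFS
    written = run_pipeline(["alice, 250.0", "bob, 12.5", "carol, 100"], repo)
    print(written, repo)
```

Because each step is a generator, records stream through one at a time, so the same chain could be applied to a batch file or to a continuous feed.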
Massive data volumes and enterprise IT transformation will require that future data
storage be based on HDFS. Pivotal HD is a distribution of Hadoop based on Open
Data Platform (ODP) core that is targeted for analytical use cases. Pivotal HD provides a
scale-out flexible data management framework that can handle any data type. Pivotal HD
can work with any big data ecosystem applications or tools that support ODP-based
Hadoop distributions.
PIVOTAL BIG DATA SUITE
COMPONENTS OF BIG DATA SUITE
• Pivotal HD - ODP core-based Hadoop
distribution targeting SQL and
advanced analytics
• Pivotal Greenplum Database® - Leading
analytical massively-parallel
processing data warehouse
• Pivotal HAWQ® - Highly scalable
ANSI-compliant SQL on Hadoop
analytic query engine
• Pivotal GemFire® - High-performing
distributed in-memory NoSQL
database
• Spring XD - Distributed data pipeline
for data ingestion, stream processing,
and orchestration
• Redis - Leading scalable key-value
store and data structure server
• RabbitMQ™ - Leading scalable open
source reliable message queue for
applications
• Pivotal Big Data Suite on Pivotal
Cloud Foundry - Big Data Suite
components exposed as data
services in Pivotal Cloud Foundry
• Pivotal Cloud Foundry Ops Manager -
deployment and management of
Cloud Foundry PaaS
Figure 1. Pivotal Big Data Suite is the advanced analytics and in-memory processing stack for
data-driven enterprises.
In-memory computing, where entire data sets reside in memory, is the future state of
the art for analytics and processing. Pivotal HD includes the powerful Spark stack for
in-memory distributed data processing.
IT infrastructures are migrating to open cloud platforms. To help customers make this
transition, an instance of Pivotal Cloud Foundry Ops Manager is provided to automate
deployment of Big Data Suite components and help Cloud Foundry applications leverage
Big Data Suite capabilities as services. This delivers a complete agile data stack, in a single
subscription offering.
Modernizing data infrastructure allows customers to implement a business data lake.
Data from any source can be ingested in any format, whether as batch files or at
real-time streaming velocity. Now customers have additional flexibility for performing
large scale ETL such as processing a data stream before storage. It becomes practical to
run SQL queries on very large data sets at interactive speed.
DISCOVER MORE INSIGHTS WITH ADVANCED ANALYTICS
Massively Parallel Processing on Large Data Sets
A key capability of data-driven enterprises is their ability to leverage data science and
advanced analytics. For advanced analytics, Pivotal Big Data Suite includes two massively
scalable SQL engines: HAWQ and Pivotal Greenplum Database. HAWQ is the most
advanced SQL on Hadoop engine in the industry. It provides interactive and complex query
processing on very large data sets, leveraging compute resources directly on Hadoop nodes.
Pivotal Greenplum Database is the leading analytical data warehouse with a shared-nothing
scale-out architecture, fast data loading, and enterprise-grade reliability, administration,
and advanced security capabilities. Both HAWQ and Greenplum Database share the cost-
based Pivotal Query Optimizer technology which dramatically speeds up execution of
complex joins. Both engines provide massively parallel execution of powerful open source
data science libraries such as MADlib.
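The shared-nothing, massively parallel execution described above can be sketched as a scatter, local aggregate, and merge pattern. The Python below is a conceptual illustration under stated assumptions and does not reflect how HAWQ or Greenplum Database are implemented: the function names are hypothetical, and threads stand in for independent segment hosts purely to show the structure.

```python
import zlib
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def distribute(rows, num_segments):
    """Hash-distribute (key, value) rows so each segment owns its own slice."""
    segments = [[] for _ in range(num_segments)]
    for key, value in rows:
        segment_id = zlib.crc32(key.encode("utf-8")) % num_segments
        segments[segment_id].append((key, value))
    return segments


def local_aggregate(segment_rows):
    """Each segment computes partial (sum, count) per key on its own data."""
    partial = defaultdict(lambda: [0.0, 0])
    for key, value in segment_rows:
        partial[key][0] += value
        partial[key][1] += 1
    return dict(partial)


def parallel_average(rows, num_segments=4):
    """Coordinator: scatter rows, aggregate per segment, merge the partials."""
    segments = distribute(rows, num_segments)
    with ThreadPoolExecutor(max_workers=num_segments) as pool:
        partials = list(pool.map(local_aggregate, segments))
    merged = defaultdict(lambda: [0.0, 0])
    for partial in partials:
        for key, (total, count) in partial.items():
            merged[key][0] += total
            merged[key][1] += count
    return {key: total / count for key, (total, count) in merged.items()}


if __name__ == "__main__":
    sales = [("east", 10.0), ("west", 4.0), ("east", 20.0), ("west", 6.0)]
    print(parallel_average(sales))
```

The sketch is roughly what a grouped average would look like if each segment worked only on its own rows and a coordinator combined the partial results.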
HAWQ will run in any ODP-based distribution of Hadoop and tightly integrates with
management tools within Hadoop such as Ambari, YARN, and HCatalog. Pivotal Greenplum
Database provides import and export integration with most leading Hadoop distributions.
By deploying an advanced analytics platform, customers can apply data science to discover
new insights for solving business problems. Data scientists can run complex queries at
breakthrough speed on petabyte-scale data sets, and access powerful predictive analytics
and machine learning capabilities based on SQL.
BUILD ANALYTIC APPLICATIONS AT SCALE
Scale-Out Apps with Elastic, Distributed In-memory Data Stores
Data-driven enterprises are able to take insights they glean from their data and
operationalize them through massively scaled analytic-driven applications.
Pivotal Big Data Suite provides key building blocks for rapid development and deployment
of high scale data-centric applications. These include Big Data Suite on Pivotal Cloud
Foundry, Pivotal GemFire, Redis, and RabbitMQ.
Big data application development teams can radically speed time to market by leveraging
Pivotal Cloud Foundry as their development and deployment environment. All components
of Big Data Suite can be accessed as services within Pivotal Cloud Foundry, and Big
Data Suite includes an instance of Pivotal Cloud Foundry Ops Manager to automate
this deployment.
Pivotal GemFire is a distributed, in-memory NoSQL database. This enables enterprises
to build scaled-out, highly available transactional systems with sub-second latency
requirements. GemFire-powered applications can process many simultaneous operations
and maintain sub-second response time at linear scale. Examples of such applications
include large scale ticketing or financial trading applications.
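As a rough illustration of how a distributed in-memory store can stay available when a node is lost, the Python sketch below partitions keys across nodes by hash and keeps one redundant copy on a neighboring node. It is not the GemFire API: the class, its methods, and the single-backup placement rule are assumptions chosen to keep the example small.

```python
import zlib


class PartitionedStore:
    """Toy in-memory key-value store spread across several nodes."""

    def __init__(self, num_nodes=3):
        if num_nodes < 2:
            raise ValueError("need at least two nodes to hold a backup copy")
        self.nodes = [dict() for _ in range(num_nodes)]
        self.failed = set()

    def _owners(self, key):
        """Return (primary, backup) node indexes for a key."""
        primary = zlib.crc32(key.encode("utf-8")) % len(self.nodes)
        backup = (primary + 1) % len(self.nodes)
        return primary, backup

    def put(self, key, value):
        """Write the value to every live owner of the key."""
        for node in self._owners(key):
            if node not in self.failed:
                self.nodes[node][key] = value

    def get(self, key):
        """Read from the primary, falling back to the backup if needed."""
        for node in self._owners(key):
            if node not in self.failed and key in self.nodes[node]:
                return self.nodes[node][key]
        return None

    def fail(self, node):
        """Simulate losing a node and everything it held in memory."""
        self.failed.add(node)
        self.nodes[node].clear()


if __name__ == "__main__":
    store = PartitionedStore(num_nodes=3)
    store.put("ticket:42", "reserved")
    primary, _ = store._owners("ticket:42")
    store.fail(primary)
    print(store.get("ticket:42"))
```

Adding nodes spreads keys over more memory, which is the intuition behind scaling out; a production system would also rebalance data and restore redundancy after a failure, which this sketch omits.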
The large volumes of historical data typically generated by these kinds of applications
can be archived into a traditional RDBMS or pipelined to the analytical components
within Big Data Suite using Spring XD.
Big Data Suite also provides support for Redis and RabbitMQ either as services within
Pivotal Cloud Foundry, or as part of a standalone application stack.
With Pivotal Big Data Suite, data-driven companies can rapidly turn their insights into
action and deploy high scale analytic applications. Such applications can support
mobile customer experiences, mass market transactions, and global Internet of
Things networks, leading to new revenue opportunities and competitive advantages.