© 2016 Pivotal Software, Inc. All rights reserved.
Introduction to Greenplum Database
January 2016
Forward Looking Statements
This presentation contains “forward-looking statements” as defined under the Federal Securities Laws. Actual results could differ materially
from those projected in the forward-looking statements as a result of certain risk factors, including but not limited to: (i) adverse changes in
general economic or market conditions; (ii) delays or reductions in information technology spending; (iii) the relative and varying rates of
product price and component cost declines and the volume and mixture of product and services revenues; (iv) competitive factors, including
but not limited to pricing pressures and new product introductions; (v) component and product quality and availability; (vi) fluctuations in
VMware, Inc.’s operating results and risks associated with trading of VMware stock; (vii) the transition to new products, the uncertainty of
customer acceptance of new product offerings and rapid technological and market change; (viii) risks associated with managing the growth
of our business, including risks associated with acquisitions and investments and the challenges and costs of integration, restructuring and
achieving anticipated synergies; (ix) the ability to attract and retain highly qualified employees; (x) insufficient, excess or obsolete inventory;
(xi) fluctuating currency exchange rates; (xii) threats and other disruptions to our secure data centers and networks; (xiii) our ability to
protect our proprietary technology; (xiv) war or acts of terrorism; and (xv) other one-time events and other important factors disclosed
previously and from time to time in the filings of EMC Corporation, the parent company of Pivotal, with the U.S. Securities and Exchange
Commission. EMC and Pivotal disclaim any obligation to update any such forward-looking statements after the date of this release.
Greenplum Database Mission & Strategy
•  Relational database system for big data
•  Mission-critical, system-of-record product with supporting tools and ecosystem
•  Fully open source with a global community of developers and users
•  Implements the world’s leading database research across all components
–  Optimizer, Query Execution
–  Transaction Processing, Database Storage, Compression, High Availability
–  Embedded Programming Languages (Python, R, Java, etc.)
–  In-Database analytics in domains (e.g. Geospatial, Text, Machine Learning, Mathematics)
•  Performance tuned for multiple workload profiles
–  Analytics, long-running queries, short-running queries, mixed workloads
•  Focused on large industrial systems
–  Financial, Government, Telecom, Retail, Manufacturing, Oil & Gas, etc.
Greenplum Open Source
•  An ambitious project
–  10 years in the making
–  Investment of hundreds of millions of dollars
–  Potential to define a new market and disrupt traditional EDW vendors
•  www.greenplum.org
–  GitHub code
–  Mailing lists / community engagement
–  Global project with external contributors
•  Pivotal Greenplum
–  Enterprise software distribution & release management
–  Pivotal expertise
–  24-hour global support
–  5.0 release in early Q2 2016
PostgreSQL Compatibility
Roadmap
•  Strategically backport key features from PostgreSQL to Greenplum: JSONB, UUID, variadic functions, default function arguments, etc.
•  Consistently backport patches from older PostgreSQL releases to Greenplum, with an initial goal of reaching 9.0
GPDB Architecture Overview
MPP Shared Nothing Architecture
Performance Through Segment Instance Parallelism
•  Master Host and Standby Master Host: the Master coordinates work with the Segment Hosts
•  Interconnect: high speed interconnect for continuous pipelining of data processing
•  Segment Host with one or more Segment Instances: Segment Instances process queries in parallel
•  Segment Hosts have their own CPU, disk and memory (shared nothing)
[Diagram: SQL enters at the Master Host, which has a Standby Master; the Interconnect links it to Segment Hosts node1, node2, node3 … nodeN, each running four Segment Instances.]
Master Host
•  Accepts client connections and incoming user requests, and performs authentication
•  Parser enforces syntax and semantics and produces a parse tree
[Diagram: a Client connects to the Master Segment on the Master Host, which contains the Parser, Query Optimizer, Dispatcher, Query Executor, Catalog and Distributed TM.]
Pivotal Query Optimizer
•  Query Optimizer: consumes the parse tree and produces the query plan
•  The query execution plan describes how the query is executed
[Diagram: the Master Segment on the Master Host (Parser, Query Optimizer, Dispatcher, Query Executor, Catalog, Distributed TM, Local Storage) linked by the Interconnect to the Segment Hosts; every Segment Instance has its own Local TM, Query Executor, Catalog and Local Storage.]
Query Dispatcher
•  Dispatcher: responsible for communicating the query plan to the segments
•  Allocates the cluster resources required to perform the job, then accumulates and presents the final results
[Diagram: Master Host, Interconnect and Segment Hosts.]
Query Executor
•  Query Executor: responsible for executing the steps in the plan (e.g. open file, iterate over tuples)
•  Communicates its intermediate results to other executor processes
[Diagram: Master Host, Interconnect and Segment Hosts.]
Interconnect
•  Interconnect: responsible for serving tuples from one segment to another (motion operations) to perform joins, etc.
•  Uses UDP for optimal performance and scalability
[Diagram: Master Host, Interconnect and Segment Hosts.]
System Catalog
•  Catalog: stores and manages metadata for databases, tables, columns, etc.
•  The Master keeps a copy of the metadata coordinated on every segment host
[Diagram: Master Host, Interconnect and Segment Hosts.]
Distributed Transaction Management
•  Distributed TM (DTM): resides on the master and coordinates the commit and abort actions of segments
•  Segments have their own commit and replay logs and decide when to commit or abort their own transactions
[Diagram: Master Host, Interconnect and Segment Hosts.]
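One common way for a coordinator to drive commit and abort across independent nodes is a two-phase commit. The sketch below assumes that protocol and uses hypothetical `Segment` and `DistributedTM` classes; it illustrates the division of labor described on this slide and is not Greenplum's actual implementation.

```python
from dataclasses import dataclass, field


@dataclass
class Segment:
    """Hypothetical segment that keeps its own local commit log."""
    name: str
    healthy: bool = True
    log: list = field(default_factory=list)

    def prepare(self, txid: int) -> bool:
        # The segment decides locally whether it is able to commit.
        vote = self.healthy
        self.log.append((txid, "PREPARED" if vote else "REFUSED"))
        return vote

    def commit(self, txid: int) -> None:
        self.log.append((txid, "COMMITTED"))

    def abort(self, txid: int) -> None:
        self.log.append((txid, "ABORTED"))


class DistributedTM:
    """Hypothetical coordinator that lives on the master."""

    def __init__(self, segments):
        self.segments = list(segments)

    def run(self, txid: int) -> str:
        # Phase 1: collect a vote from every segment.
        votes = [seg.prepare(txid) for seg in self.segments]
        # Phase 2: commit only if every segment voted yes, otherwise abort everywhere.
        if all(votes):
            for seg in self.segments:
                seg.commit(txid)
            return "COMMITTED"
        for seg in self.segments:
            seg.abort(txid)
        return "ABORTED"


segments = [Segment("seg0"), Segment("seg1"), Segment("seg2", healthy=False)]
dtm = DistributedTM(segments)
outcome_bad = dtm.run(txid=1)    # seg2 refuses, so every segment aborts
segments[2].healthy = True
outcome_good = dtm.run(txid=2)   # all segments vote yes
print(outcome_bad, outcome_good)
```

Each segment's `log` stays local to that segment, while the decision to commit or abort is taken once, on the master side.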
GPDB High Availability
•  Master Host mirroring
–  Warm Standby Master Host
▪  Replica of Master Host system catalogs
–  Eliminates single point of failure
–  Synchronization process between Master Host and Standby Master Host
▪  Uses PostgreSQL WAL Replication
•  Segment mirroring
–  Creates a mirror segment for every primary segment
▪  Uses a custom file block replication process
–  If a primary segment becomes unavailable, the system fails over to the mirror automatically
Fault Detection and Recovery
•  The ftsprobe fault detection process monitors and scans segments and database processes at configurable intervals
•  Query the gp_segment_configuration catalog table for detailed information about a failed segment
▪  $ psql -c "SELECT * FROM gp_segment_configuration WHERE status='d';"
•  When ftsprobe cannot connect to a segment, it marks the segment as down
–  The segment remains down until an administrator manually recovers it using the gprecoverseg utility
•  Automatic failover to the mirror segment
–  Subsequent connection requests are switched to the mirror segment
CREATE TABLE: Define Data Distributions
•  One of the most important aspects of GPDB!
•  Every table has a distribution method
•  DISTRIBUTED BY (column)
–  Uses a hash distribution
•  DISTRIBUTED RANDOMLY
–  Uses a random distribution, which is not guaranteed to provide a perfectly even distribution
•  Explicitly define a column or random distribution for all tables
–  Do not use the default
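To see why the choice of distribution column matters, the following Python sketch places 10,000 synthetic rows on four segments in three ways. The segment count, the MD5-based `hash_segment` function and the `order_id` / `status` columns are assumptions made for the illustration; Greenplum's own hash function is different.

```python
import hashlib
import random
from collections import Counter

NUM_SEGMENTS = 4  # assumed cluster size for the illustration


def hash_segment(key, num_segments=NUM_SEGMENTS):
    """Map a distribution-key value to a segment (stand-in hash, not Greenplum's)."""
    digest = hashlib.md5(str(key).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_segments


def distribute_by_hash(rows, key):
    """Count how many rows land on each segment when hashing on `key`."""
    return Counter(hash_segment(row[key]) for row in rows)


def distribute_randomly(rows, seed=0):
    """Count rows per segment when each row is placed at random."""
    rng = random.Random(seed)
    return Counter(rng.randrange(NUM_SEGMENTS) for _ in rows)


rows = [{"order_id": i, "status": "open" if i % 10 else "closed"}
        for i in range(10_000)]

by_order_id = distribute_by_hash(rows, "order_id")  # high-cardinality key
by_status = distribute_by_hash(rows, "status")      # low-cardinality key
at_random = distribute_randomly(rows)

print("order_id:", sorted(by_order_id.items()))
print("status:  ", sorted(by_status.items()))
print("random:  ", sorted(at_random.items()))
```

A key with only two distinct values can reach at most two segments, so most of the cluster sits idle; a unique key or random placement spreads rows over every segment, although random placement gives up the ability to know where a given key lives.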
DISTRIBUTED BY (column_name)
•  Use a single column that will distribute data across all
segments evenly
•  For large tables significant performance gains can be
obtained with local joins (co-located joins)
–  Distribute on the same column for tables commonly joined together
•  Co-located join is performed within the segment
–  Segment operates independently of other segments
•  Co-located join eliminates or minimizes motion operations
–  Broadcast motion or Redistribute motion
Use the Same Distribution Key for Commonly Joined Tables
Distribute on the same key used in the join to obtain local joins.
[Diagram: on Segment 1A and on Segment 2A, customer (c_customer_id) = freq_shopper (f_customer_id).]
Redistribution Motion
WHERE customer.c_customer_id = freq_shopper.f_customer_id
The freq_shopper table is dynamically redistributed on f_customer_id.
[Diagram: customer is distributed on c_customer_id and freq_shopper on f_trans_number across Segments 1A, 2A and 3A. The customer rows for customer_id=102 and customer_id=745 sit on Segments 1A and 2A, while the freq_shopper rows for customer_id=102 and customer_id=745 sit on Segments 2A and 3A.]
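The effect of a redistribute motion can be sketched in a few lines of Python. The three-segment cluster, the modulo stand-in for the hash function and the sample rows are assumptions for the illustration; only the table and column names come from the slide.

```python
from collections import defaultdict

NUM_SEGMENTS = 3  # assumed cluster size


def segment_for(value):
    # Stand-in for the real hash function; integer keys only in this sketch.
    return value % NUM_SEGMENTS


def distribute(rows, key):
    """Place each row on the segment chosen by hashing `key`."""
    segments = defaultdict(list)
    for row in rows:
        segments[segment_for(row[key])].append(row)
    return segments


def local_join(customers, shoppers):
    """Join segment by segment; no segment reads another segment's rows."""
    result = []
    for seg in range(NUM_SEGMENTS):
        by_id = {c["c_customer_id"]: c for c in customers[seg]}
        for s in shoppers[seg]:
            match = by_id.get(s["f_customer_id"])
            if match is not None:
                result.append((match["c_customer_id"], s["f_trans_number"]))
    return sorted(result)


customer = [{"c_customer_id": 102}, {"c_customer_id": 745}]
freq_shopper = [
    {"f_trans_number": 1, "f_customer_id": 102},
    {"f_trans_number": 2, "f_customer_id": 745},
    {"f_trans_number": 3, "f_customer_id": 102},
]

customers = distribute(customer, "c_customer_id")
# Stored on f_trans_number: matching rows may sit on a different segment.
stored = distribute(freq_shopper, "f_trans_number")
joined_without_motion = local_join(customers, stored)

# Redistribute motion: re-hash every freq_shopper row on the join key.
all_rows = [row for rows in stored.values() for row in rows]
redistributed = distribute(all_rows, "f_customer_id")
joined_after_motion = local_join(customers, redistributed)

print(joined_without_motion)
print(joined_after_motion)
```

Without the motion, a purely local join misses every pair whose rows live on different segments; after re-hashing on the join key, each segment can complete its share of the join alone. Distributing both tables on the join key up front avoids paying for this data movement at query time.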
Broadcast Motion
WHERE customer.c_statekey = state.s_statekey
The state table is dynamically broadcast to all segments.
[Diagram: Segments 1A, 2A and 3A each hold customer (c_customer_id) together with a copy of state (s_statekey): AK, AL, AZ, CA…]
Data Distribution: The Key to Parallelism
The primary strategy and goal is to spread data evenly across all segment instances. This is most important in an MPP shared-nothing architecture!
Order
Order#   Order Date    Customer ID
43       Oct 20 2005   12
64       Oct 20 2005   111
45       Oct 20 2005   42
46       Oct 20 2005   64
77       Oct 20 2005   32
48       Oct 20 2005   12
50       Oct 20 2005   34
56       Oct 20 2005   213
63       Oct 20 2005   15
44       Oct 20 2005   102
53       Oct 20 2005   82
55       Oct 20 2005   55
Parallel Data Scans Across All Segments
SELECT COUNT(*)
FROM orders
WHERE order_date >= 'Oct 20 2007'
AND order_date < 'Oct 27 2007'
Result: 4,423,323
Each Segment Scans Data Simultaneously in Parallel
Steps: Develop Query Plan, Send Plan to Segments, Segments Return Results, Return Results
[Diagram: the Master above Segments 1A to 1D, 2A to 2D and 3A to 3D.]
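The scatter and gather pattern behind this query can be sketched as follows. The twelve segments mirror the diagram, while the synthetic data, the round-robin placement and the thread pool standing in for segment processes are assumptions for the illustration.

```python
from concurrent.futures import ThreadPoolExecutor
from datetime import date, timedelta

NUM_SEGMENTS = 12  # assumed: 3 hosts x 4 segment instances, as in the diagram


def build_segments():
    """Spread synthetic orders round-robin over the segments."""
    segments = [[] for _ in range(NUM_SEGMENTS)]
    start = date(2007, 10, 1)
    for order_id in range(3100):
        order_date = start + timedelta(days=order_id % 31)
        segments[order_id % NUM_SEGMENTS].append((order_id, order_date))
    return segments


def scan_segment(rows, lo, hi):
    """Work done on one segment: count its own rows with lo <= date < hi."""
    return sum(1 for _, order_date in rows if lo <= order_date < hi)


def parallel_count(segments, lo, hi):
    # The master sends the same plan to every segment, then adds up
    # the partial counts that come back.
    with ThreadPoolExecutor(max_workers=len(segments)) as pool:
        partials = list(pool.map(lambda rows: scan_segment(rows, lo, hi), segments))
    return sum(partials), partials


segments = build_segments()
total, partials = parallel_count(segments, date(2007, 10, 20), date(2007, 10, 27))
print(total, partials)
```

Each segment touches only its own slice of the table, so the elapsed time is governed by the slowest segment, which is why even distribution matters so much.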
CREATE TABLE: Define Partitioning
•  Reduces the amount of data to be scanned by reading only the relevant data needed to satisfy a query
–  The only goal of partitioning is to achieve partition elimination, also known as partition pruning
•  Is not a substitute for distribution
–  A good distribution strategy combined with partitioning that achieves partition elimination unlocks performance magic
•  Uses table inheritance and constraints
–  Persistent relationship between parent and child tables
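Partition elimination can be illustrated with a small Python model. The `PartitionedTable` class, its monthly range partitions and the sample data are hypothetical; the point is only that partitions whose date range cannot match the predicate are never read.

```python
from datetime import date


def next_month(year, month):
    """Return the (year, month) that follows the given month."""
    return (year + 1, 1) if month == 12 else (year, month + 1)


class PartitionedTable:
    """Hypothetical table range-partitioned by month on order_date."""

    def __init__(self):
        self.partitions = {}  # (year, month) -> list of rows

    def insert(self, row):
        d = row["order_date"]
        self.partitions.setdefault((d.year, d.month), []).append(row)

    def count_between(self, lo, hi):
        """Count rows with lo <= order_date < hi, skipping partitions
        whose month cannot contain such a date."""
        scanned, count = [], 0
        for (year, month), rows in sorted(self.partitions.items()):
            first = date(year, month, 1)
            after = date(*next_month(year, month), 1)
            if after <= lo or first >= hi:
                continue  # partition eliminated: its rows are never read
            scanned.append((year, month))
            count += sum(1 for r in rows if lo <= r["order_date"] < hi)
        return count, scanned


orders = PartitionedTable()
for month in range(1, 13):
    for day in range(1, 29):
        orders.insert({"order_date": date(2007, month, day)})

count, scanned = orders.count_between(date(2007, 10, 20), date(2007, 10, 27))
print(count, scanned)
```

Of the twelve monthly partitions, only October 2007 is scanned for this predicate; the other eleven are skipped based on their boundaries alone.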
Distributions and Partitioning
SELECT COUNT(*)
FROM orders
WHERE order_date >= 'Oct 20 2007'
AND order_date < 'Oct 27 2007'
•  Distribution: evenly distribute orders data across all segments
•  Partitioning: only scans the relevant order partitions
[Diagram: Segments 1A to 1D, 2A to 2D and 3A to 3D.]
CREATE TABLE: Define the Storage Model
•  Heap Tables versus Append Optimized (AO) Tables
•  Row oriented storage versus Column oriented storage
•  Compression
–  Table level compression applied to entire table
–  Column level compression applied to a specific column w/ columnar storage
–  Zlib level with Run Length Encoding optional
Heap Tables or AO Tables
•  Use heap for tables and partitions that will receive singleton
UPDATE, DELETE and INSERT operations
•  Use heap storage for tables and partitions that will receive
concurrent UPDATE, DELETE and INSERT operations
•  Use AO for tables and partitions that are updated
infrequently after the initial load, with subsequent inserts or
updates performed only in large batch operations
GPDB Data Loading Options
Loading method: INSERTS
Common uses:
•  Operational workloads
•  ODBC/JDBC interfaces
Example:
INSERT INTO performers (name, specialty)
VALUES ('Sinatra', 'Singer');
Loading method: COPY
Common uses:
•  Quick and easy data in
•  Legacy PostgreSQL applications
•  Output sample results from SQL statements
Example:
COPY performers
FROM '/tmp/comedians.dat'
WITH DELIMITER '|';
Loading method: External Tables
Common uses:
•  High speed bulk loads
•  Parallel loading using gpfdist protocol
•  Local file, remote file, HTTP or HDFS based sources
Example:
INSERT INTO craps_bets
SELECT g.bet_type
, g.bet_dttm
, g.bt_amt
FROM x_allbets b
JOIN games g
ON ( g.id = b.game_id )
WHERE g.name = 'CRAPS';
Loading method: GPLOAD
Common uses:
•  Simplifies external table method (YAML wrapper)
•  Supports Insert, Merge & Update
Example:
gpload -f blackjack_bets.yml
Example Load Architectures
[Diagram: two ETL Hosts, each holding data files served by two gpfdist processes; a Master Host running the Master Instance; Segment Hosts each running Segment Instances. Legend: singleton INSERT statement; COPY statement; INSERT via external table or gpload.]
Load Using Regular External Tables
•  File based (flat files)
–  gpfdist provides the best performance
=# CREATE EXTERNAL TABLE ext_expenses (name text,
date date, amount float4, category text, description text)
LOCATION
('gpfdist://etlhost:8081/*.txt', 'gpfdist://etlhost:8082/*.txt')
FORMAT 'TEXT' (DELIMITER '|');
$ gpfdist -d /var/load_files1/expenses -p 8081 -l /home/gpadmin/log1 &
$ gpfdist -d /var/load_files2/expenses -p 8082 -l /home/gpadmin/log2 &
ANALYZEDB and Database Statistics
•  Accurate statistics are critical for the query optimizer to generate optimal query plans
–  When a table is analyzed, information about its data is stored in system catalog tables
•  Always update statistics after loading data
•  Always update statistics after CREATE INDEX operations
•  Always update statistics after INSERT, UPDATE and DELETE operations that significantly change the underlying data
ANALYZEDB Parallel ANALYZE sessions
•  Invoke concurrent ANALYZE sessions
•  Each session is at individual table/partition level
•  For example:
analyzedb -d myDB -t public.big_fact_table -p 4
•  Parallel level p between 1 and 10. Default value 5.
•  Tune parallel level according to system load
•  In general, a 3x to 5x speedup over a single session
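The idea of a bounded number of concurrent ANALYZE sessions can be sketched with a worker pool. The `analyze_all` helper, the `SessionTracker` class and the partition names are hypothetical, and the sleep call merely stands in for real ANALYZE work; this is not how analyzedb itself is written.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def analyze_all(tables, analyze_one, parallel_level=5):
    """Run analyze_one(table) for every table with at most
    parallel_level sessions in flight (1 to 10, default 5)."""
    if not 1 <= parallel_level <= 10:
        raise ValueError("parallel level must be between 1 and 10")
    with ThreadPoolExecutor(max_workers=parallel_level) as pool:
        return list(pool.map(analyze_one, tables))


class SessionTracker:
    """Counts how many simulated ANALYZE sessions overlap."""

    def __init__(self):
        self.lock = threading.Lock()
        self.active = 0
        self.peak = 0

    def analyze(self, table):
        with self.lock:
            self.active += 1
            self.peak = max(self.peak, self.active)
        time.sleep(0.01)  # stand-in for the real ANALYZE work
        with self.lock:
            self.active -= 1
        return f"ANALYZE {table}"


tracker = SessionTracker()
# Illustrative partition names for the table used in the example above.
partitions = [f"public.big_fact_table_1_prt_{i}" for i in range(1, 13)]
statements = analyze_all(partitions, tracker.analyze, parallel_level=4)
print(len(statements), "tables analyzed, peak sessions:", tracker.peak)
```

Raising the parallel level shortens the total run only while the system has spare capacity, which is why the level should follow the current load.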
ANALYZEDB Incremental ANALYZE
•  If a table/partition has not changed (DML, DDL) since the last run of ANALYZEDB, it is skipped automatically
•  After each run, ANALYZEDB records on disk which tables have up-to-date statistics, in $MASTER_DATA_DIRECTORY/db_analyze
•  ANALYZEDB compares the current catalog with the state files of the last run to determine which tables need to be analyzed
•  ANALYZEDB captures the root partition table statistics required by the Pivotal Query Optimizer (PQO)
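The compare-with-last-run logic can be sketched as follows. The JSON state file, the per-table change marker and the table names are assumptions made for the illustration; the real state files under `$MASTER_DATA_DIRECTORY/db_analyze` have their own format.

```python
import json
import tempfile
from pathlib import Path


def tables_to_analyze(current, state_file):
    """Return the tables whose change marker differs from the last run.

    `current` maps a table name to a hypothetical marker that changes
    whenever the table sees DML or DDL."""
    previous = json.loads(state_file.read_text()) if state_file.exists() else {}
    return sorted(t for t, marker in current.items() if previous.get(t) != marker)


def record_run(current, state_file):
    """Persist the markers so the next run can skip unchanged tables."""
    state_file.write_text(json.dumps(current))


with tempfile.TemporaryDirectory() as tmp:
    state = Path(tmp) / "analyze_state.json"
    catalog = {"public.sales_1_prt_jan": 1, "public.sales_1_prt_feb": 1}

    first_run = tables_to_analyze(catalog, state)   # no state yet: everything
    record_run(catalog, state)

    catalog["public.sales_1_prt_feb"] = 2           # simulated DML on one partition
    second_run = tables_to_analyze(catalog, state)  # only the changed partition

print(first_run)
print(second_run)
```

On a large partitioned table where only the newest partitions change, this kind of bookkeeping keeps repeated runs proportional to the amount of changed data.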
ANALYZEDB Details
•  Incremental analyze does not apply to heap tables
•  Heap tables are always analyzed
•  Catalog tables, views and external tables are automatically
skipped
ANALYZEDB Miscellaneous
•  Stop analyzedb gently with Ctrl+C or by sending SIGINT; when restarted, it resumes where it left off
•  Prints a progress report while running
•  Refreshes root partition statistics for the Pivotal Query Optimizer automatically
•  Analyzes tables in descending OID order
•  Use analyzedb -? for other options (config file, include/exclude columns, dry run, force non-incremental, quiet mode)
Greenplum source code major differences w/ PostgreSQL
https://github.com/greenplum-db/gpdb/tree/master/gpMgmt
Python cluster management code
https://github.com/greenplum-db/gpdb/tree/master/gpAux/gpperfmon
Performance and system management code
https://github.com/greenplum-db/gpdb/tree/master/src/backend/access/appendonly
Append-optimized and columnar tables
https://github.com/greenplum-db/gpdb/tree/master/src/backend/access/external
External tables
https://github.com/greenplum-db/gpdb/tree/master/src/backend/cdb
Main cluster database code, such as mirroring etc
https://github.com/greenplum-db/gpdb/tree/master/src/backend/cdb/motion
Interconnect between nodes
Live Demo
Core Greenplum Engine
UDP Interconnect Flow Control
Roadmap
•  Replicated Tables; High Performance Temp Tables
•  Faster Query Dispatch; Short Query Performance
•  Query Plan Code Generation
•  Small Material Aggregates
•  Refactor Analyze for Performance Gains
Polymorphic Storage™
User Definable Storage Layout
Column-oriented
•  Columnar storage compresses better
•  Optimized for retrieving a subset of the columns when querying
•  Compression can be set differently per column: gzip (1-9), quicklz, delta, RLE
Row-oriented
•  Row oriented faster when returning all columns
•  HEAP for many updates and deletes
•  Use indexes for drill through queries
External HDFS
•  Less accessed partitions on HDFS with external partitions to seamlessly query all data
•  Text, CSV, Binary, Avro, Parquet format
•  All major HDP Distros
[Diagram: TABLE 'SALES' with partitions Jun, Jul, Aug, Sep, Oct, Nov, Dec, Year -1 and Year -2 spread over row-oriented, column-oriented and external HDFS storage.]
Roadmap
•  GPHDFS Predicate Pushdown
•  S3 Object Store External Tables
•  GPDB to GPDB External Tables
•  HAWQ External Tables
Pivotal Greenplum Roadmap Highlights
●  S3 External Tables
●  Performance tuned for AWS
●  Dynamic Code Generation using LLVM
●  Short running query performance enhancements
●  Faster analyze
●  WAL Replication Segment Mirroring
●  Incremental restore MVP
●  Disk space full warnings
●  Snapshot Backup
●  Anaconda Python Modules: NLTK, etc
●  Time Series Gap Filling
●  Complex Numbers
●  PostGIS Raster Support
●  Geospatial Trajectories
●  Path analytics
●  Enhanced SVM module
●  Py-Madlib
●  Lock Free Backup
Highlighted Greenplum successes
•  Government detection of benefit payments that should not be made
•  Government detection of tax fraud
•  Government economic statistics research database
•  Commercial banking wealth management data science and product development
•  Commercial clearing corporation's risk and trade repositories reporting
•  Pharmaceutical company vaccine potency prediction based on manufacturing sensors
•  401K providers analytics on investment choices
•  Auto manufacturer’s analytics on predictive maintenance
•  Corporate/Financial internal email and communication surveillance and reporting
•  Oil drilling equipment predictive maintenance
•  Mobile telephone company enterprise data warehouse
•  Retail store chain customer purchases analytics
•  Airlines loyalty program analytics
•  Telecom company network performance and availability analytics
•  Corporate network anomalous behavior and intrusion detection
•  Semiconductor Fab sensor analytics and reporting

 
HAWQ: a massively parallel processing SQL engine in hadoop
HAWQ: a massively parallel processing SQL engine in hadoopHAWQ: a massively parallel processing SQL engine in hadoop
HAWQ: a massively parallel processing SQL engine in hadoop
 
Pandas, Data Wrangling & Data Science
Pandas, Data Wrangling & Data SciencePandas, Data Wrangling & Data Science
Pandas, Data Wrangling & Data Science
 
Greenplum: Driving the future of Data Warehousing and Analytics
Greenplum: Driving the future of Data Warehousing and AnalyticsGreenplum: Driving the future of Data Warehousing and Analytics
Greenplum: Driving the future of Data Warehousing and Analytics
 
Greenplum: O banco de dados open source massivamente paralelo baseado em Post...
Greenplum: O banco de dados open source massivamente paralelo baseado em Post...Greenplum: O banco de dados open source massivamente paralelo baseado em Post...
Greenplum: O banco de dados open source massivamente paralelo baseado em Post...
 
From Beginners to Experts, Data Wrangling for All
From Beginners to Experts, Data Wrangling for AllFrom Beginners to Experts, Data Wrangling for All
From Beginners to Experts, Data Wrangling for All
 
Pivotal hawq internals
Pivotal hawq internalsPivotal hawq internals
Pivotal hawq internals
 
Apache HAWQ Architecture
Apache HAWQ ArchitectureApache HAWQ Architecture
Apache HAWQ Architecture
 
Cloud-native Data: Every Microservice Needs a Cache
Cloud-native Data: Every Microservice Needs a CacheCloud-native Data: Every Microservice Needs a Cache
Cloud-native Data: Every Microservice Needs a Cache
 
White Paper: Monitoring EMC Greenplum DCA with Nagios - EMC Greenplum Data Co...
White Paper: Monitoring EMC Greenplum DCA with Nagios - EMC Greenplum Data Co...White Paper: Monitoring EMC Greenplum DCA with Nagios - EMC Greenplum Data Co...
White Paper: Monitoring EMC Greenplum DCA with Nagios - EMC Greenplum Data Co...
 

Similaire à Introduction to Greenplum

Data Science at Scale on MPP databases - Use Cases & Open Source Tools
Data Science at Scale on MPP databases - Use Cases & Open Source ToolsData Science at Scale on MPP databases - Use Cases & Open Source Tools
Data Science at Scale on MPP databases - Use Cases & Open Source ToolsEsther Vasiete
 
Java EE, What's Next? by Anil Gaur
Java EE, What's Next? by Anil GaurJava EE, What's Next? by Anil Gaur
Java EE, What's Next? by Anil GaurTakashi Ito
 
Integration Patterns for Big Data Applications
Integration Patterns for Big Data ApplicationsIntegration Patterns for Big Data Applications
Integration Patterns for Big Data ApplicationsMichael Häusler
 
Salesforce Multitenant Architecture: How We Do the Magic We Do
Salesforce Multitenant Architecture: How We Do the Magic We DoSalesforce Multitenant Architecture: How We Do the Magic We Do
Salesforce Multitenant Architecture: How We Do the Magic We DoSalesforce Developers
 
Prakash_Profile(279074)
Prakash_Profile(279074)Prakash_Profile(279074)
Prakash_Profile(279074)Prakash s
 
Kamanja: Driving Business Value through Real-Time Decisioning Solutions
Kamanja: Driving Business Value through Real-Time Decisioning SolutionsKamanja: Driving Business Value through Real-Time Decisioning Solutions
Kamanja: Driving Business Value through Real-Time Decisioning SolutionsGreg Makowski
 
times ten in-memory database for extreme performance
times ten in-memory database for extreme performancetimes ten in-memory database for extreme performance
times ten in-memory database for extreme performanceOracle Korea
 
Pivotal Big Data Suite: A Technical Overview
Pivotal Big Data Suite: A Technical OverviewPivotal Big Data Suite: A Technical Overview
Pivotal Big Data Suite: A Technical OverviewVMware Tanzu
 
Quieting noisy neighbor with Intel® Resource Director Technology
Quieting noisy neighbor with Intel® Resource Director TechnologyQuieting noisy neighbor with Intel® Resource Director Technology
Quieting noisy neighbor with Intel® Resource Director TechnologyMichelle Holley
 
Using IBM Rational Change as an Enterprise-Wide Error Management Solution – ...
 Using IBM Rational Change as an Enterprise-Wide Error Management Solution – ... Using IBM Rational Change as an Enterprise-Wide Error Management Solution – ...
Using IBM Rational Change as an Enterprise-Wide Error Management Solution – ...Contribyte
 
Cognos CIO CEE 2010 Prague CZE
Cognos CIO CEE 2010 Prague CZECognos CIO CEE 2010 Prague CZE
Cognos CIO CEE 2010 Prague CZEStepan Kutaj
 
Times ten 18.1_overview_meetup
Times ten 18.1_overview_meetupTimes ten 18.1_overview_meetup
Times ten 18.1_overview_meetupByung Ho Lee
 
Monitoring IAAS & PAAS Solutions
Monitoring IAAS & PAAS SolutionsMonitoring IAAS & PAAS Solutions
Monitoring IAAS & PAAS SolutionsColloquium
 
Genomics Deployments - How to Get Right with Software Defined Storage
 Genomics Deployments -  How to Get Right with Software Defined Storage Genomics Deployments -  How to Get Right with Software Defined Storage
Genomics Deployments - How to Get Right with Software Defined StorageSandeep Patil
 
PaaS on Openstack
PaaS on OpenstackPaaS on Openstack
PaaS on OpenstackOpen Stack
 
Infrastructure Specification - Hosting & Mota HQ IT
Infrastructure Specification - Hosting & Mota HQ ITInfrastructure Specification - Hosting & Mota HQ IT
Infrastructure Specification - Hosting & Mota HQ ITGregory Weiss
 
Understanding Multitenancy and the Architecture of the Salesforce Platform
Understanding Multitenancy and the Architecture of the Salesforce PlatformUnderstanding Multitenancy and the Architecture of the Salesforce Platform
Understanding Multitenancy and the Architecture of the Salesforce PlatformSalesforce Developers
 

Similaire à Introduction to Greenplum (20)

Data Science at Scale on MPP databases - Use Cases & Open Source Tools
Data Science at Scale on MPP databases - Use Cases & Open Source ToolsData Science at Scale on MPP databases - Use Cases & Open Source Tools
Data Science at Scale on MPP databases - Use Cases & Open Source Tools
 
Java EE, What's Next? by Anil Gaur
Java EE, What's Next? by Anil GaurJava EE, What's Next? by Anil Gaur
Java EE, What's Next? by Anil Gaur
 
Integration Patterns for Big Data Applications
Integration Patterns for Big Data ApplicationsIntegration Patterns for Big Data Applications
Integration Patterns for Big Data Applications
 
Salesforce Multitenant Architecture: How We Do the Magic We Do
Salesforce Multitenant Architecture: How We Do the Magic We DoSalesforce Multitenant Architecture: How We Do the Magic We Do
Salesforce Multitenant Architecture: How We Do the Magic We Do
 
Prakash_Profile(279074)
Prakash_Profile(279074)Prakash_Profile(279074)
Prakash_Profile(279074)
 
Vineet Kurrewar
Vineet KurrewarVineet Kurrewar
Vineet Kurrewar
 
Kamanja: Driving Business Value through Real-Time Decisioning Solutions
Kamanja: Driving Business Value through Real-Time Decisioning SolutionsKamanja: Driving Business Value through Real-Time Decisioning Solutions
Kamanja: Driving Business Value through Real-Time Decisioning Solutions
 
times ten in-memory database for extreme performance
times ten in-memory database for extreme performancetimes ten in-memory database for extreme performance
times ten in-memory database for extreme performance
 
Pivotal Big Data Suite: A Technical Overview
Pivotal Big Data Suite: A Technical OverviewPivotal Big Data Suite: A Technical Overview
Pivotal Big Data Suite: A Technical Overview
 
Quieting noisy neighbor with Intel® Resource Director Technology
Quieting noisy neighbor with Intel® Resource Director TechnologyQuieting noisy neighbor with Intel® Resource Director Technology
Quieting noisy neighbor with Intel® Resource Director Technology
 
Using IBM Rational Change as an Enterprise-Wide Error Management Solution – ...
 Using IBM Rational Change as an Enterprise-Wide Error Management Solution – ... Using IBM Rational Change as an Enterprise-Wide Error Management Solution – ...
Using IBM Rational Change as an Enterprise-Wide Error Management Solution – ...
 
Cognos CIO CEE 2010 Prague CZE
Cognos CIO CEE 2010 Prague CZECognos CIO CEE 2010 Prague CZE
Cognos CIO CEE 2010 Prague CZE
 
Legacy Migration Overview
Legacy Migration OverviewLegacy Migration Overview
Legacy Migration Overview
 
Legacy Migration
Legacy MigrationLegacy Migration
Legacy Migration
 
Times ten 18.1_overview_meetup
Times ten 18.1_overview_meetupTimes ten 18.1_overview_meetup
Times ten 18.1_overview_meetup
 
Monitoring IAAS & PAAS Solutions
Monitoring IAAS & PAAS SolutionsMonitoring IAAS & PAAS Solutions
Monitoring IAAS & PAAS Solutions
 
Genomics Deployments - How to Get Right with Software Defined Storage
 Genomics Deployments -  How to Get Right with Software Defined Storage Genomics Deployments -  How to Get Right with Software Defined Storage
Genomics Deployments - How to Get Right with Software Defined Storage
 
PaaS on Openstack
PaaS on OpenstackPaaS on Openstack
PaaS on Openstack
 
Infrastructure Specification - Hosting & Mota HQ IT
Infrastructure Specification - Hosting & Mota HQ ITInfrastructure Specification - Hosting & Mota HQ IT
Infrastructure Specification - Hosting & Mota HQ IT
 
Understanding Multitenancy and the Architecture of the Salesforce Platform
Understanding Multitenancy and the Architecture of the Salesforce PlatformUnderstanding Multitenancy and the Architecture of the Salesforce Platform
Understanding Multitenancy and the Architecture of the Salesforce Platform
 

Dernier

Introduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptxIntroduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptxfirstjob4
 
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptxBPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptxMohammedJunaid861692
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% SecureCall me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% SecurePooja Nehwal
 
Industrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfIndustrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfLars Albertsson
 
VidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxVidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxolyaivanovalion
 
Unveiling Insights: The Role of a Data Analyst
Unveiling Insights: The Role of a Data AnalystUnveiling Insights: The Role of a Data Analyst
Unveiling Insights: The Role of a Data AnalystSamantha Rae Coolbeth
 
April 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's AnalysisApril 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's Analysismanisha194592
 
Ravak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptxRavak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptxolyaivanovalion
 
Invezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz1
 
RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998YohFuh
 
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Callshivangimorya083
 
Mature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxMature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxolyaivanovalion
 
B2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docxB2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docxStephen266013
 
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfMarket Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfRachmat Ramadhan H
 
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Callshivangimorya083
 
CebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxCebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxolyaivanovalion
 
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Serviceranjana rawat
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfMarinCaroMartnezBerg
 

Dernier (20)

Introduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptxIntroduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptx
 
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptxBPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% SecureCall me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
 
Industrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfIndustrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdf
 
VidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxVidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptx
 
Unveiling Insights: The Role of a Data Analyst
Unveiling Insights: The Role of a Data AnalystUnveiling Insights: The Role of a Data Analyst
Unveiling Insights: The Role of a Data Analyst
 
April 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's AnalysisApril 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's Analysis
 
Ravak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptxRavak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptx
 
Invezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signals
 
RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998
 
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
 
Mature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxMature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptx
 
B2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docxB2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docx
 
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfMarket Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
 
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
 
CebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxCebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptx
 
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdf
 
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in  KishangarhDelhi 99530 vip 56974 Genuine Escort Service Call Girls in  Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
 

Introduction to Greenplum

  • 1. 1© 2016 Pivotal Software, Inc. All rights reserved. Introduction to Greenplum Database January, 2016
  • 2. 2© 2016 Pivotal Software, Inc. All rights reserved. Forward Looking Statements This presentation contains “forward-looking statements” as defined under the Federal Securities Laws. Actual results could differ materially from those projected in the forward-looking statements as a result of certain risk factors, including but not limited to: (i) adverse changes in general economic or market conditions; (ii) delays or reductions in information technology spending; (iii) the relative and varying rates of product price and component cost declines and the volume and mixture of product and services revenues; (iv) competitive factors, including but not limited to pricing pressures and new product introductions; (v) component and product quality and availability; (vi) fluctuations in VMware’s Inc.’s operating results and risks associated with trading of VMware stock; (vii) the transition to new products, the uncertainty of customer acceptance of new product offerings and rapid technological and market change; (viii) risks associated with managing the growth of our business, including risks associated with acquisitions and investments and the challenges and costs of integration, restructuring and achieving anticipated synergies; (ix) the ability to attract and retain highly qualified employees; (x) insufficient, excess or obsolete inventory; (xi) fluctuating currency exchange rates; (xii) threats and other disruptions to our secure data centers and networks; (xiii) our ability to protect our proprietary technology; (xiv) war or acts of terrorism; and (xv) other one-time events and other important factors disclosed previously and from time to time in the filings EMC Corporation, the parent company of Pivotal, with the U.S. Securities and Exchange Commission. EMC and Pivotal disclaim any obligation to update any such forward-looking statements after the date of this release.
  • 3. Greenplum Database Mission & Strategy
      • Relational database system for big data
      • Mission critical & system of record product with supporting tools and ecosystem
      • Fully open source with a global community of developers and users
      • Implements the world’s leading research in database technology across all components
          – Optimizer, Query Execution
          – Transaction Processing, Database Storage, Compression, High Availability
          – Embedded Programming Languages (Python, R, Java, etc.)
          – In-Database analytics in domains (e.g. Geospatial, Text, Machine Learning, Mathematics, etc.)
      • Performance tuned for multiple workload profiles
          – Analytics, long running queries, short running queries, mixed workloads
      • Large industrial focused system
          – Financial, Government, Telecom, Retail, Manufacturing, Oil & Gas, etc.
  • 4. Greenplum Open Source
      • An ambitious project
          – 10 years in the making
          – Investment of hundreds of millions of dollars
          – Potential to define a new market and disrupt traditional EDW vendors
      • www.greenplum.org
          – GitHub code
          – Mailing lists / community engagement
          – Global project w/ external contributors
      • Pivotal Greenplum
          – Enterprise software distribution & release management
          – Pivotal expertise
          – 24-hour global support
          – 5.0 release in early Q2 2016
  • 5. PostgreSQL Compatibility Roadmap
      • Strategically backport key features from PostgreSQL to Greenplum: JSONB, UUID, variadic functions, default function arguments, etc.
      • Consistently backport patches from older PostgreSQL releases to Greenplum, with an initial goal of reaching 9.0
  • 6. GPDB Architecture Overview
  • 7. MPP Shared Nothing Architecture
      • Master Host and Standby Master Host: Master coordinates work with Segment Hosts
      • Interconnect: high speed interconnect for continuous pipelining of data processing
      • Segment Host with one or more Segment Instances: Segment Instances process queries in parallel
      • Segment Hosts have their own CPU, disk and memory (shared nothing)
      • Performance through Segment Instance parallelism
      [Diagram: SQL arrives at the Master Host (with a Standby Master), which connects through the Interconnect to Segment Hosts node1, node2, node3 … nodeN, each running four Segment Instances]
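
The division of labour on this slide can be illustrated with a small scatter/gather sketch. It is only a toy model, not Greenplum code: `SEGMENT_SLICES`, `segment_partial_sum` and `master_sum` are hypothetical names, and threads stand in for segment instances that would really be separate processes on separate hosts.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical data: each inner list is the slice of one table
# held by one segment instance (shared nothing: no slice overlaps).
SEGMENT_SLICES = [
    [12, 7, 30],
    [5, 18],
    [22, 1, 9, 14],
    [3],
]


def segment_partial_sum(rows):
    """Work done locally by one segment instance on its own slice."""
    return sum(rows)


def master_sum(slices):
    """Master role: give every segment the same task, then combine the partial results."""
    with ThreadPoolExecutor(max_workers=len(slices)) as pool:
        partials = list(pool.map(segment_partial_sum, slices))
    return sum(partials)


total = master_sum(SEGMENT_SLICES)
print(total)
```

Each worker touches only its own slice, so adding more slices (segments) adds parallelism without any shared state between workers.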
  • 8. Master Host
      • Accepts client connections and incoming user requests, and performs authentication
      • Parser enforces syntax and semantics, and produces a parse tree
      [Diagram: Client connecting to the Master Host; Master Segment components: Parser, Query Optimizer, Dispatch, Query Executor, Catalog, Distributed TM]
  • 9. Pivotal Query Optimizer
      • Query Optimizer consumes the parse tree and produces the query plan
      • Query execution plan contains how the query is executed
      [Diagram: Master Host (Master Segment: Parser, Query Optimizer, Dispatcher, Query Executor, Catalog, Distributed TM, Local Storage) joined by the Interconnect to Segment Hosts; each Segment Instance has a Local TM, Query Executor, Catalog and Local Storage]
  • 10. Query Dispatcher
      • Responsible for communicating the query plan to segments
      • Allocates the cluster resources required to perform the job, and accumulates and presents the final results
      [Diagram: Master Host (Master Segment: Parser, Query Optimizer, Dispatcher, Query Executor, Catalog, Distributed TM, Local Storage) joined by the Interconnect to Segment Hosts; each Segment Instance has a Local TM, Query Executor, Catalog and Local Storage]
  • 11. Query Executor
      • Responsible for executing the steps in the plan (e.g. open file, iterate over tuples)
      • Communicates its intermediate results to other executor processes
      [Diagram: Master Host (Master Segment: Parser, Query Optimizer, Query Dispatcher, Query Executor, Catalog, Distributed TM, Local Storage) joined by the Interconnect to Segment Hosts; each Segment Instance has a Local TM, Query Executor, Catalog and Local Storage]
  • 12. Interconnect
      • Responsible for serving tuples from one segment to another (motion operations) to perform joins, etc.
      • Uses UDP for optimal performance and scalability
      [Diagram: Master Host (Master Segment: Parser, Query Optimizer, Query Dispatcher, Query Executor, Catalog, Distributed TM, Local Storage) joined by the Interconnect to Segment Hosts; each Segment Instance has a Local TM, Query Executor, Catalog and Local Storage]
  • 13. System Catalog
      • Stores and manages metadata for databases, tables, columns, etc.
      • Master keeps a copy of the metadata coordinated on every segment host
      [Diagram: Master Host (Master Segment: Parser, Query Optimizer, Query Dispatcher, Query Executor, Catalog, Distributed TM, Local Storage) joined by the Interconnect to Segment Hosts; each Segment Instance has a Local TM, Query Executor, Catalog and Local Storage]
  • 14. Distributed Transaction Management
      • Segments have their own commit and replay logs and decide when to commit or abort their own transactions
      • DTM resides on the master and coordinates the commit and abort actions of segments
      [Diagram: Master Host (Master Segment: Parser, Query Optimizer, Query Dispatcher, Query Executor, Catalog, Distributed TM, Local Storage) joined by the Interconnect to Segment Hosts; each Segment Instance has a Local TM, Query Executor, Catalog and Local Storage]
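
One way to picture a master-side coordinator driving commit and abort on segments is a two-phase commit style exchange. The slide does not name the protocol, so treat the sketch below as an assumption made for illustration; the `Segment` class and `coordinate` function are hypothetical and model no real Greenplum API.

```python
class Segment:
    """Toy segment with a local transaction manager that votes on its own work."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.state = "active"

    def prepare(self):
        # Local decision: the segment reports whether it is able to commit.
        if self.can_commit:
            self.state = "prepared"
        return self.can_commit

    def commit(self):
        self.state = "committed"

    def abort(self):
        self.state = "aborted"


def coordinate(segments):
    """Coordinator on the master: commit everywhere only if every segment voted yes."""
    votes = [segment.prepare() for segment in segments]
    if all(votes):
        for segment in segments:
            segment.commit()
        return "committed"
    for segment in segments:
        segment.abort()
    return "aborted"


healthy = [Segment("seg0"), Segment("seg1")]
print(coordinate(healthy))

one_failure = [Segment("seg0"), Segment("seg1", can_commit=False)]
print(coordinate(one_failure))
```

The point of the sketch is the all-or-nothing outcome: a single segment that cannot commit forces every segment to abort.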
  • 15. GPDB High Availability
      • Master Host mirroring
          – Warm Standby Master Host
              ▪ Replica of Master Host system catalogs
          – Eliminates single point of failure
          – Synchronization process between Master Host and Standby Master Host
              ▪ Uses PostgreSQL WAL Replication
      • Segment mirroring
          – Creates a mirror segment for every primary segment
              ▪ Uses a custom file block replication process
          – If a primary segment becomes unavailable, the system fails over automatically to the mirror
  • 16. Fault Detection and Recovery
      • ftsprobe fault detection process monitors and scans segments and database processes at configurable intervals
      • Query the gp_segment_configuration catalog table for detailed information about a failed segment
          ▪ $ psql -c "SELECT * FROM gp_segment_configuration WHERE status='d';"
      • When ftsprobe cannot connect to a segment it marks it as down
          – The segment will remain down until an administrator manually recovers it using the gprecoverseg utility
      • Automatic failover to the mirror segment
          – Subsequent connection requests are switched to the mirror segment
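
The detect, mark down, fail over sequence can be sketched as a toy simulation. The rows below are a simplified, hypothetical stand-in for `gp_segment_configuration` (only `dbid`, `content`, `role` and `status`, with `'p'`/`'m'` for primary/mirror and `'u'`/`'d'` for up/down); `probe_once` is an illustrative function and not the real ftsprobe implementation.

```python
def probe_once(config, reachable_dbids):
    """Mark unreachable primaries as down and promote their mirrors.

    Returns the dbids currently marked down, like the status='d' query on the slide.
    """
    failed = [
        row for row in config
        if row["role"] == "p" and row["status"] == "u"
        and row["dbid"] not in reachable_dbids
    ]
    for primary in failed:
        primary["status"] = "d"
        for row in config:
            is_live_mirror = (
                row is not primary
                and row["content"] == primary["content"]
                and row["status"] == "u"
            )
            if is_live_mirror:
                # Failover: the mirror takes over the primary role.
                primary["role"], row["role"] = "m", "p"
    return [row["dbid"] for row in config if row["status"] == "d"]


# Two content ids (data slices), each with a primary and a mirror.
config = [
    {"dbid": 2, "content": 0, "role": "p", "status": "u"},
    {"dbid": 3, "content": 1, "role": "p", "status": "u"},
    {"dbid": 4, "content": 0, "role": "m", "status": "u"},
    {"dbid": 5, "content": 1, "role": "m", "status": "u"},
]

# Assume the probe can no longer reach dbid 3.
down = probe_once(config, reachable_dbids={2, 4, 5})
print(down)
```

In this model the failed segment stays marked down on later probes; bringing it back would be the manual recovery step the slide assigns to gprecoverseg.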
  • 17. CREATE TABLE: Define Data Distributions • One of the most important aspects of GPDB! • Every table has a distribution method • DISTRIBUTED BY (column) – Uses a hash distribution • DISTRIBUTED RANDOMLY – Uses a random distribution, which is not guaranteed to provide a perfectly even distribution • Explicitly define a column or random distribution for all tables – Do not use the default
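To see why the choice of distribution key matters, here is a small Python sketch that places rows on segments by hashing a key. It is an illustration only: `NUM_SEGMENTS`, `hash_segment` and `rows_per_segment` are hypothetical names, and `zlib.crc32` is a stand-in, not the hash function Greenplum uses.

```python
import random
import zlib
from collections import Counter

NUM_SEGMENTS = 4  # assumed cluster size for illustration


def hash_segment(key, num_segments=NUM_SEGMENTS):
    """Map a distribution-key value to a segment.

    crc32 is only a stand-in; it is not the hash function Greenplum uses.
    """
    return zlib.crc32(str(key).encode("utf-8")) % num_segments


def rows_per_segment(keys, placer):
    """Count how many rows each segment receives under a placement function."""
    counts = Counter(placer(k) for k in keys)
    return [counts.get(seg, 0) for seg in range(NUM_SEGMENTS)]


# A high-cardinality key (such as a customer id) versus a low-cardinality key (such as a flag).
customer_ids = list(range(10_000))
flags = ["Y"] * 10_000

by_customer = rows_per_segment(customer_ids, hash_segment)
by_flag = rows_per_segment(flags, hash_segment)

# DISTRIBUTED RANDOMLY modelled as a uniformly random pick per row.
rng = random.Random(0)
by_random = rows_per_segment(customer_ids, lambda _key: rng.randrange(NUM_SEGMENTS))
```

Hashing a key with many distinct values spreads rows over every segment, while hashing a key with a single value sends every row to one segment. The random placement spreads rows too, but equal key values no longer land together, so joins on that key cannot rely on co-location.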
  • 18. DISTRIBUTED BY (column_name) • Use a single column that will distribute data across all segments evenly • For large tables, significant performance gains can be obtained with local joins (co-located joins) – Distribute on the same column for tables commonly joined together • A co-located join is performed within the segment – The segment operates independently of other segments • A co-located join eliminates or minimizes motion operations – Broadcast motion or Redistribute motion
  • 19. Use the Same Distribution Key for Commonly Joined Tables. Distribute on the same key used in the join to obtain local joins. [Diagram: on Segment 1A and on Segment 2A, customer (c_customer_id) = freq_shopper (f_customer_id), so matching rows are joined locally on each segment.]
  • 20. Redistribution Motion. WHERE customer.c_customer_id = freq_shopper.f_customer_id. The freq_shopper table is dynamically redistributed on f_customer_id. [Diagram: customer is distributed on c_customer_id and freq_shopper on f_trans_number across Segments 1A, 2A and 3A. The customer row for customer_id=102 is on Segment 1A and the one for customer_id=745 is on Segment 2A, while the freq_shopper rows for customer_id=102 are on Segment 2A and those for customer_id=745 are on Segment 3A, so they must be moved to the segment holding the matching customer row.]
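The effect of a redistribute motion can be sketched in a few lines of Python. This is a toy model: the `place` and `local_joins` helpers are hypothetical, integer modulo stands in for the real hash function, and the row values are made up apart from the customer ids 102 and 745 shown on the slide.

```python
NUM_SEGMENTS = 3  # assumed cluster size


def place(rows, key, num_segments=NUM_SEGMENTS):
    """Distribute rows (dicts) across segments by key value modulo segment count.

    Modulo on integers is a stand-in for the real hash function.
    """
    segments = [[] for _ in range(num_segments)]
    for row in rows:
        segments[row[key] % num_segments].append(row)
    return segments


def local_joins(left_segments, right_segments, left_key, right_key):
    """Join each segment's left rows only with the right rows on the same segment."""
    matches = []
    for left_rows, right_rows in zip(left_segments, right_segments):
        for left in left_rows:
            for right in right_rows:
                if left[left_key] == right[right_key]:
                    matches.append((left[left_key], right["f_trans_number"]))
    return sorted(matches)


customer = [{"c_customer_id": cid} for cid in (102, 745, 300)]
freq_shopper = [
    {"f_trans_number": 1, "f_customer_id": 102},
    {"f_trans_number": 2, "f_customer_id": 745},
    {"f_trans_number": 3, "f_customer_id": 102},
]

customer_segs = place(customer, "c_customer_id")

# freq_shopper as stored: distributed on f_trans_number, so matches may sit on other segments.
stored = place(freq_shopper, "f_trans_number")
without_motion = local_joins(customer_segs, stored, "c_customer_id", "f_customer_id")

# Redistribute motion: re-place freq_shopper rows on the join key before joining.
redistributed = place([row for seg in stored for row in seg], "f_customer_id")
with_motion = local_joins(customer_segs, redistributed, "c_customer_id", "f_customer_id")
```

Joining segment by segment without moving any rows finds only the one match that happens to be co-located. After the rows are re-placed on the join key, the same segment-local join finds all three matches.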
  • 21. Broadcast Motion. WHERE customer.c_statekey = state.s_statekey. The state table is dynamically broadcast to all segments. [Diagram: each of Segment 1A, Segment 2A and Segment 3A holds its own slice of customer (c_customer_id) plus a full copy of state (s_statekey): AK, AL, AZ, CA…]
  • 22. Data Distribution: The Key to Parallelism. The primary strategy and goal is to spread data evenly across all segment instances. Most important in an MPP shared-nothing architecture! [Diagram: rows of the Order table (Order#, Order Date, Customer ID) spread across segments: (43, Oct 20 2005, 12); (64, Oct 20 2005, 111); (45, Oct 20 2005, 42); (46, Oct 20 2005, 64); (77, Oct 20 2005, 32); (48, Oct 20 2005, 12); (50, Oct 20 2005, 34); (56, Oct 20 2005, 213); (63, Oct 20 2005, 15); (44, Oct 20 2005, 102); (53, Oct 20 2005, 82); (55, Oct 20 2005, 55).]
  • 23. Parallel Data Scans Across All Segments. SELECT COUNT(*) FROM orders WHERE order_date >= 'Oct 20 2007' AND order_date < 'Oct 27 2007' returns 4,423,323. Steps: (1) the Master develops the query plan; (2) the Master sends the plan to the segments; (3) each segment scans data simultaneously in parallel; (4) the segments return results; (5) the Master returns results. [Diagram: Master above Segments 1A to 1D, 2A to 2D and 3A to 3D.]
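The plan, scan and combine flow on this slide can be mimicked in Python. This is a sketch with synthetic data: `build_segments`, `segment_count` and `master_count` are hypothetical names, and threads merely stand in for segment processes.

```python
from concurrent.futures import ThreadPoolExecutor
from datetime import date, timedelta

NUM_SEGMENTS = 4  # assumed cluster size


def build_segments():
    """Spread synthetic orders round-robin over the segments (illustrative data only)."""
    segments = [[] for _ in range(NUM_SEGMENTS)]
    start = date(2007, 10, 1)
    for order_id in range(3100):
        order_date = start + timedelta(days=order_id % 31)
        segments[order_id % NUM_SEGMENTS].append((order_id, order_date))
    return segments


def segment_count(rows, lo, hi):
    """The work one segment does: scan only its own rows and count the matches."""
    return sum(1 for _order_id, order_date in rows if lo <= order_date < hi)


def master_count(segments, lo, hi):
    """The master's role: send the same plan to every segment and add up the partial counts."""
    with ThreadPoolExecutor(max_workers=len(segments)) as pool:
        partials = list(pool.map(lambda rows: segment_count(rows, lo, hi), segments))
    return sum(partials), partials


segments = build_segments()
total, partials = master_count(segments, date(2007, 10, 20), date(2007, 10, 27))
```

Each segment touches only its own quarter of the rows, and the master's only work is to add up four small numbers. That is why an even spread of data across segments translates directly into scan speed.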
  • 24. CREATE TABLE: Define Partitioning • Reduces the amount of data to be scanned by reading only the relevant data needed to satisfy a query – The only goal of partitioning is to achieve partition elimination, also known as partition pruning • Is not a substitute for distribution – A good distribution strategy combined with partitioning that achieves partition elimination unlocks performance magic • Uses table inheritance and constraints – Persistent relationship between parent and child tables
  • 25. Distributions and Partitioning. SELECT COUNT(*) FROM orders WHERE order_date >= 'Oct 20 2007' AND order_date < 'Oct 27 2007'. Evenly distribute orders data across all segments & only scan the relevant order partitions. [Diagram: Segments 1A to 1D, 2A to 2D and 3A to 3D, each holding a slice of every orders partition; only the partitions matching the date range are scanned.]
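Partition elimination boils down to a range-overlap check between the query predicate and each partition's bounds. The Python sketch below assumes monthly range partitions; the partition names and the `partitions_to_scan` helper are hypothetical.

```python
from datetime import date

# Hypothetical monthly range partitions of an orders table: name -> [start, end)
PARTITIONS = {
    "orders_2007_09": (date(2007, 9, 1), date(2007, 10, 1)),
    "orders_2007_10": (date(2007, 10, 1), date(2007, 11, 1)),
    "orders_2007_11": (date(2007, 11, 1), date(2007, 12, 1)),
}


def partitions_to_scan(lo, hi):
    """Return partitions whose [start, end) range overlaps the query range [lo, hi)."""
    return [
        name
        for name, (start, end) in PARTITIONS.items()
        if start < hi and lo < end
    ]


# WHERE order_date >= 'Oct 20 2007' AND order_date < 'Oct 27 2007'
one_week = partitions_to_scan(date(2007, 10, 20), date(2007, 10, 27))

# A range that straddles a month boundary needs two partitions.
straddling = partitions_to_scan(date(2007, 10, 28), date(2007, 11, 3))
```

For the one-week query only the October partition qualifies, so the other partitions are never read on any segment.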
  • 26. CREATE TABLE: Define the Storage Model • Heap tables versus Append Optimized (AO) tables • Row oriented storage versus column oriented storage • Compression (optional) – Table level compression applied to the entire table – Column level compression applied to a specific column with columnar storage – Zlib level with Run Length Encoding
  • 27. Heap Tables or AO Tables • Use heap for tables and partitions that will receive singleton UPDATE, DELETE and INSERT operations • Use heap storage for tables and partitions that will receive concurrent UPDATE, DELETE and INSERT operations • Use AO for tables and partitions that are updated infrequently after the initial load, where subsequent inserts or updates are performed only in large batch operations
  • 28. GPDB Data Loading Options (loading method, common uses, example) • INSERTS. Common uses: operational workloads; ODBC/JDBC interfaces. Example: INSERT INTO performers (name, specialty) VALUES ('Sinatra', 'Singer'); • COPY. Common uses: quick and easy data in; legacy PostgreSQL applications; output sample results from SQL statements. Example: COPY performers FROM '/tmp/comedians.dat' WITH DELIMITER '|'; • External Tables. Common uses: high speed bulk loads; parallel loading using the gpfdist protocol; local file, remote file, HTTP or HDFS based sources. Example: INSERT INTO craps_bets SELECT g.bet_type, g.bet_dttm, g.bt_amt FROM x_allbets b JOIN games g ON (g.id = b.game_id) WHERE g.name = 'CRAPS'; • GPLOAD. Common uses: simplifies the external table method (YAML wrapper); supports Insert, Merge & Update. Example: gpload -f blackjack_bets.yml
  • 29. Example Load Architectures. [Diagram: a singleton INSERT statement and a COPY statement enter through the Master Instance on the Master Host; an INSERT via external table or gpload pulls data files served by gpfdist processes on two ETL Hosts directly into the Segment Instances on the Segment Hosts.]
  • 30. Load Using Regular External Tables • File based (flat files) – gpfdist provides the best performance =# CREATE EXTERNAL TABLE ext_expenses (name text, date date, amount float4, category text, description text) LOCATION ('gpfdist://etlhost:8081/*.txt', 'gpfdist://etlhost:8082/*.txt') FORMAT 'TEXT' (DELIMITER '|'); $ gpfdist -d /var/load_files1/expenses -p 8081 -l /home/gpadmin/log1 & $ gpfdist -d /var/load_files2/expenses -p 8082 -l /home/gpadmin/log2 &
  • 31. ANALYZEDB and Database Statistics • Accurate statistics are critical for the query optimizer to generate optimal query plans – When a table is analyzed, information about its data is stored in system catalog tables • Always update statistics after loading data • Always update statistics after CREATE INDEX operations • Always update statistics after INSERT, UPDATE and DELETE operations that significantly change the underlying data
  • 32. ANALYZEDB: Parallel ANALYZE sessions • Invoke concurrent ANALYZE sessions • Each session works at the individual table/partition level • For example: analyzedb -d myDB -t public.big_fact_table -p 4 • Parallel level (-p) is between 1 and 10; the default value is 5 • Tune the parallel level according to system load • In general, a 3-5x speedup over a single session
  • 33. ANALYZEDB: Incremental ANALYZE • If a table/partition has not changed (DML, DDL) since the last run of ANALYZEDB, it is skipped automatically • After a run, ANALYZEDB keeps a record on disk of which tables have up-to-date stats, in $MASTER_DATA_DIRECTORY/db_analyze • ANALYZEDB compares the current catalog with the state files of the last run to determine the incremental set of tables to analyze • ANALYZEDB captures the statistics on the root partition table that are required by the Pivotal Query Optimizer (PQO)
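The skip logic on this slide amounts to comparing a saved state with the current one. The Python sketch below illustrates only that idea: the `tables_to_analyze` function, the table names and the integer change markers are hypothetical and do not reflect the real format of the analyzedb state files.

```python
def tables_to_analyze(previous_state, current_state):
    """Pick tables whose recorded change marker differs from the last run.

    Both arguments map table name -> an opaque change marker (hypothetical;
    the real state files use their own format).
    """
    return sorted(
        table
        for table, marker in current_state.items()
        if previous_state.get(table) != marker
    )


last_run = {"public.sales_2015": 7, "public.sales_2016": 3}
now = {
    "public.sales_2015": 7,   # unchanged since the last run: skipped
    "public.sales_2016": 4,   # changed by DML or DDL: analyzed again
    "public.returns": 1,      # new table with no recorded state: analyzed
}

pending = tables_to_analyze(last_run, now)
```

Only the changed partition and the new table end up in `pending`. On a large partitioned table where most partitions are static, this is what saves most of the ANALYZE time.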
  • 34. ANALYZEDB Details • Incremental analyze does not apply to heap tables • Heap tables are always analyzed • Catalog tables, views and external tables are automatically skipped
  • 35. ANALYZEDB Miscellaneous • Gently kill analyzedb with Ctrl+C or by sending SIGINT – it resumes where it left off when restarted • Prints a progress report while running • Refreshes root partition stats for the Pivotal Query Optimizer automatically • Analyzes tables in descending OID order • Use analyzedb -? for other options (using a config file, include/exclude columns, dry run, force non-incremental, quiet mode)
  • 36. Greenplum source code: major differences with PostgreSQL • https://github.com/greenplum-db/gpdb/tree/master/gpMgmt (Python cluster management code) • https://github.com/greenplum-db/gpdb/tree/master/gpAux/gpperfmon (performance and system management code) • https://github.com/greenplum-db/gpdb/tree/master/src/backend/access/appendonly (append-optimized and columnar tables) • https://github.com/greenplum-db/gpdb/tree/master/src/backend/access/external (external tables) • https://github.com/greenplum-db/gpdb/tree/master/src/backend/cdb (main cluster database code, such as mirroring) • https://github.com/greenplum-db/gpdb/tree/master/src/backend/cdb/motion (interconnect between nodes)
  • 37. Live Demo
  • 38. Core Greenplum Engine: UDP Interconnect Flow Control. Roadmap • Replicated Tables; High Performance Temp Tables • Faster Query Dispatch; Short Query Performance • Query Plan Code Generation • Small Material Aggregates • Refactor Analyze for Performance Gains
  • 39. Polymorphic Storage™: User Definable Storage Layout. Column-oriented • Columnar storage compresses better • Optimized for retrieving a subset of the columns when querying • Compression can be set differently per column: gzip (1-9), quicklz, delta, RLE. Row-oriented • Row oriented is faster when returning all columns • HEAP for many updates and deletes • Use indexes for drill through queries. External HDFS • Less accessed partitions on HDFS with external partitions to seamlessly query all data • Text, CSV, Binary, Avro, Parquet format • All major HDP Distros. [Diagram: TABLE 'SALES' partitioned by time (Jun, Jul, Aug, Sep, Oct, Nov, Dec, Year -1, Year -2), with partitions stored as row-oriented, column-oriented or external HDFS.] Roadmap • GPHDFS Predicate Pushdown • S3 Object Store External Tables • GPDB to GPDB External Tables • HAWQ External Tables
  • 40. Pivotal Greenplum Roadmap Highlights ● S3 External Tables ● Performance tuned for AWS ● Dynamic Code Generation using LLVM ● Short running query performance enhancements ● Faster analyze ● WAL Replication Segment Mirroring ● Incremental restore MVP ● Disk space full warnings ● Snapshot Backup ● Anaconda Python Modules: NLTK, etc ● Time Series Gap Filling ● Complex Numbers ● PostGIS Raster Support ● Geospatial Trajectories ● Path analytics ● Enhanced SVM module ● Py-Madlib ● Lock Free Backup
  • 41. Highlighted Greenplum successes • Government detection of benefit payments that should not be made • Government detection of tax fraud • Government economic statistics research database • Commercial banking wealth management data science and product development • Commercial clearing corporation's risk and trade repositories reporting • Pharmaceutical company vaccine potency prediction based on manufacturing sensors • 401K providers' analytics on investment choices • Auto manufacturer's analytics on predictive maintenance • Corporate/Financial internal email and communication surveillance and reporting • Oil drilling equipment predictive maintenance • Mobile telephone company enterprise data warehouse • Retail store chain customer purchases analytics • Airline loyalty program analytics • Telecom company network performance and availability analytics • Corporate network anomalous behavior and intrusion detection • Semiconductor fab sensor analytics and reporting