Understand the Query Plan to Optimize Performance with EXPLAIN and EXPLAIN ANALYZE
Jeevan Chalke
3rd Dec 2020
© Copyright EnterpriseDB Corporation, 2020. All rights reserved.
Agenda
• Brief on optimizer/planner
• Paths and Plan
• EXPLAIN and EXPLAIN ANALYZE
• Options used with EXPLAIN
• Scans - seq, index, bitmap
• Joins
• Sort
• Limit
• Parallelism
• Grouping
• Partitioning
• FDW
• Who is EDB?
• Q&A
Brief on optimizer/planner
• Relies on statistics, gathered by ANALYZE on the table
• Creates different paths
• A path is one way the query can be evaluated to give the desired output
• Assigns cost metrics to each path
• Chooses the cheapest path
• Converts the chosen path into the plan
• The executor then executes the query according to the chosen plan
Paths and Plan
• Costs
• cpu_tuple_cost (0.01)
• cpu_operator_cost (0.0025)
• seq_page_cost (1.0)
• ...
• Rows
• Estimated
• Actual
• Width
• Time
• Actual
• Planning
• Execution
Basic EXPLAIN with seq scan, setup
# create table mytab1(a int, b int, c int);
# create table mytab2(x int, y int);
# insert into mytab1 select i, i%50, i%100 from generate_series(1, 1000000) i;
# insert into mytab2 select i, i%30 from generate_series(1, 1000) i;
# analyze mytab1;
# analyze mytab2;
# select relname, relpages, reltuples from pg_class where relname like 'mytab%';
 relname | relpages | reltuples
---------+----------+-----------
 mytab1  |     5406 |     1e+06
 mytab2  |        5 |      1000
(2 rows)
Basic EXPLAIN with seq scan
# EXPLAIN select * from mytab1;
                           QUERY PLAN
-----------------------------------------------------------------
 Seq Scan on mytab1  (cost=0.00..15406.00 rows=1000000 width=12)
(1 row)
• Syntax:
• EXPLAIN [ ( option [, ...] ) ] statement
• EXPLAIN [ ANALYZE ] [ VERBOSE ] statement
• For a seq scan, the cost estimate has two components
• Read all the pages [cost = 5406 (relpages) * 1.0 (seq_page_cost) = 5406]
• Process all the tuples on each page [cost = 1000000 (reltuples) * 0.01 (cpu_tuple_cost) = 10000]
• Total cost = 5406 + 10000 = 15406
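The arithmetic above can be checked directly; here is a small sketch in Python, using PostgreSQL's default cost parameters and the relpages/reltuples values from the earlier pg_class query:

```python
# Seq scan cost model: read every page, then process every tuple.
seq_page_cost = 1.0      # default cost of one sequential page read
cpu_tuple_cost = 0.01    # default cost of processing one tuple

relpages = 5406          # pg_class.relpages for mytab1
reltuples = 1_000_000    # pg_class.reltuples for mytab1

io_cost = relpages * seq_page_cost       # 5406.0
cpu_cost = reltuples * cpu_tuple_cost    # 10000.0
total_cost = io_cost + cpu_cost
print(total_cost)  # 15406.0 -- matches cost=0.00..15406.00 in the plan
```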
With WHERE clause
# EXPLAIN select * from mytab1 where a < 800000;
                           QUERY PLAN
----------------------------------------------------------------
 Seq Scan on mytab1  (cost=0.00..17906.00 rows=798805 width=12)
   Filter: (a < 800000)
(2 rows)
• A Filter line is added to the output, showing the condition
• The estimated row count drops
• The total cost increases, because every tuple is still read and checked
• cpu_operator_cost = 0.0025 (Filtering cost = 0.0025 * 1000000 = 2500)
• Total cost = 15406 + 2500 = 17906
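The increment is again easy to verify; a sketch in Python, building on the seq scan total from the previous slide:

```python
# A WHERE clause adds cpu_operator_cost per tuple checked; all tuples are
# still read, so the filter raises the cost even though fewer rows come out.
cpu_operator_cost = 0.0025
seq_scan_cost = 15406.0        # total cost of the plain seq scan
reltuples = 1_000_000

filter_cost = reltuples * cpu_operator_cost   # 2500.0
total_cost = seq_scan_cost + filter_cost
print(total_cost)  # 17906.0 -- matches cost=0.00..17906.00
```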
With an Index
# create index myidx1 on mytab1(a);
# EXPLAIN select * from mytab1 where a < 800000;
                           QUERY PLAN
----------------------------------------------------------------
 Seq Scan on mytab1  (cost=0.00..17906.00 rows=798805 width=12)
   Filter: (a < 800000)
(2 rows)
• Let’s add an index on mytab1 and run the same query again
• Same result, why?
Let’s understand EXPLAIN ANALYZE first...
EXPLAIN ANALYZE
• Gives the EXPLAIN plan
• Executes the query and discards the output
• Gives actual timing and row counts
• Gives finer details, such as loops, and allows the BUFFERS option
• Provides a summary showing planning and execution time
• Risky with DML commands
• For example, an EXPLAIN ANALYZE DELETE... will actually delete the rows from the table; wrap it in BEGIN ... ROLLBACK if you only want the plan
With an Index (continued...)
# EXPLAIN (ANALYZE) select * from mytab1 where a < 800000;
                           QUERY PLAN
-------------------------------------------------------------------------------
 Seq Scan on mytab1  (cost=0.00..17906.00 rows=798805 width=12) (actual time=0.019..173.306 rows=799999 loops=1)
   Filter: (a < 800000)
   Rows Removed by Filter: 200001
 Planning Time: 0.127 ms
 Execution Time: 201.754 ms
# set enable_seqscan to off; -- Forces Index scan
# EXPLAIN (ANALYZE) select * from mytab1 where a < 800000;
                           QUERY PLAN
-------------------------------------------------------------------------------
 Index Scan using myidx1 on mytab1  (cost=0.42..27073.51 rows=798805 width=12) (actual time=0.226..372.299 rows=799999 loops=1)
   Index Cond: (a < 800000)
 Planning Time: 0.098 ms
 Execution Time: 399.717 ms
With an Index (continued...)
# set enable_seqscan to on;
# EXPLAIN (ANALYZE) select * from mytab1 where a > 800000;
                           QUERY PLAN
-------------------------------------------------------------------------------
 Index Scan using myidx1 on mytab1  (cost=0.42..6767.22 rows=199588 width=12) (actual time=0.135..115.500 rows=200000 loops=1)
   Index Cond: (a > 800000)
 Planning Time: 0.156 ms
 Execution Time: 123.236 ms
(4 rows)
• So, having an index doesn’t always improve performance.
• It does in most cases, though; look at your data to figure out which applies.
With an Index, with specific column
# EXPLAIN (ANALYZE) select a from mytab1 where a > 800000;
                           QUERY PLAN
-------------------------------------------------------------------------------
 Index Only Scan using myidx1 on mytab1  (cost=0.42..6767.22 rows=199588 width=4) (actual time=0.060..77.517 rows=200000 loops=1)
   Index Cond: (a > 800000)
   Heap Fetches: 200000
 Planning Time: 0.091 ms
 Execution Time: 85.114 ms
(5 rows)
• Index Scan is changed to Index Only Scan
• The required column is part of the index itself, so the table need not be scanned (Heap Fetches counts visibility checks that still had to touch the heap)
• Available planner options
• enable_seqscan, enable_indexscan, enable_indexonlyscan, enable_bitmapscan
With Bitmap Index/Heap Scan
# create index myidx2 on mytab1(c);
# EXPLAIN (COSTS OFF) select a, c from mytab1 where a > 800000 or c <= 100000;
                   QUERY PLAN
-------------------------------------------------
 Bitmap Heap Scan on mytab1
   Recheck Cond: ((a > 800000) OR (c <= 100000))
   ->  BitmapOr
         ->  Bitmap Index Scan on myidx1
               Index Cond: (a > 800000)
         ->  Bitmap Index Scan on myidx2
               Index Cond: (c <= 100000)
• Two-step scanning
• The inner plan (Bitmap Index Scan) locates the matching rows and builds a bitmap of their physical locations
• Multiple conditions are combined with BitmapAnd/BitmapOr as needed
• The outer plan (Bitmap Heap Scan) then fetches only those rows
• The outer plan sorts the rows’ physical locations first, so pages are fetched in physical order
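The two-step idea can be illustrated with a toy sketch in Python (the page numbers are hypothetical, and this is not PostgreSQL's actual bitmap data structure):

```python
# Toy model of BitmapOr + Bitmap Heap Scan: each index scan yields a set of
# heap page numbers; the sets are OR-ed, then visited in physical order.
pages_from_myidx1 = {12, 3, 47, 8}   # pages matching a > 800000 (hypothetical)
pages_from_myidx2 = {3, 21, 8, 5}    # pages matching c <= 100000 (hypothetical)

bitmap = pages_from_myidx1 | pages_from_myidx2   # BitmapOr
fetch_order = sorted(bitmap)                     # heap pages read sequentially
print(fetch_order)  # [3, 5, 8, 12, 21, 47]
```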
JOINs
# EXPLAIN (ANALYZE, SETTINGS, COSTS OFF)
select a, x from mytab1 t1 left join mytab2 t2 on (t1.a = t2.x);
                                   QUERY PLAN
--------------------------------------------------------------------------------
 Hash Left Join (actual time=0.535..358.964 rows=1000000 loops=1)
   Hash Cond: (t1.a = t2.x)
   ->  Seq Scan on mytab1 t1 (actual time=0.013..106.310 rows=1000000 loops=1)
   ->  Hash (actual time=0.513..0.514 rows=1000 loops=1)
         Buckets: 1024  Batches: 1  Memory Usage: 44kB
         ->  Seq Scan on mytab2 t2 (actual time=0.013..0.239 rows=1000 loops=1)
 Settings: enable_hashagg = 'off', enable_mergejoin = 'off'
 Planning Time: 0.137 ms
 Execution Time: 393.929 ms
(9 rows)
JOINs (continued...)
# EXPLAIN (COSTS OFF) select a from mytab1 t1 where a in (select x from mytab2);
                   QUERY PLAN
-------------------------------------------------
 Merge Semi Join
   Merge Cond: (t1.a = mytab2.x)
   ->  Index Only Scan using myidx1 on mytab1 t1
   ->  Sort
         Sort Key: mytab2.x
         ->  Seq Scan on mytab2
# EXPLAIN (COSTS OFF) select a from mytab1 t1 where a in (select x from mytab2);
                   QUERY PLAN
-------------------------------------------------
 Nested Loop
   ->  Unique
         ->  Sort
               ->  Seq Scan on mytab2
   ->  Index Only Scan using myidx1 on mytab1 t1
         Index Cond: (a = mytab2.x)
JOINs summary
• Merge Join
• Sorts both inputs, and then merges
• Faster for a bigger data set
• Hash Join
• Works with equality constraints
• Faster provided we have enough memory
• Mostly used where one table is smaller
• Nested Loop Join
• For a smaller data set
• Mostly used for cross joins
• Join Types
• Left/Right/Full/Anti/Semi/Inner
• Planner parameters
• enable_hashjoin, enable_mergejoin, enable_nestloop
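The hash join idea above can be sketched in a few lines of Python (a toy with made-up rows, not PostgreSQL's implementation):

```python
# Toy hash join: hash the smaller input once, then probe it with each row of
# the larger one -- a single pass over each table, equality keys only.
small = [(1, 'a'), (2, 'b'), (3, 'c')]        # hashed side (like mytab2)
large = [(2, 10), (3, 20), (2, 30), (5, 40)]  # probe side (like mytab1)

build = {}
for key, val in small:            # build phase: hash table on the join key
    build.setdefault(key, []).append(val)

joined = []
for key, val in large:            # probe phase: look up each outer row's key
    for match in build.get(key, []):
        joined.append((key, val, match))
print(joined)  # [(2, 10, 'b'), (3, 20, 'c'), (2, 30, 'b')]
```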
Sort
# EXPLAIN (ANALYZE, BUFFERS) select * from mytab1 t1 where a < 200000 order by b;
                                   QUERY PLAN
-------------------------------------------------------------------------------
 Sort  (cost=27679.73..28177.42 rows=199079 width=12) (actual time=147.275..178.516 rows=199999 loops=1)
   Sort Key: b
   Sort Method: external merge  Disk: 4320kB
   Buffers: shared hit=1631, temp read=540 written=543
   ->  Index Scan using myidx1 on mytab1 t1  (cost=0.42..6752.31 rows=199079 width=12) (actual time=0.035..67.219 rows=199999 loops=1)
         Index Cond: (a < 200000)
         Buffers: shared hit=1631
 Planning Time: 0.130 ms
 Execution Time: 190.342 ms
• Sort Key
• Sort Method is external merge sort, meaning the sort spilled to disk
• With the BUFFERS option, temp read/written are reported in blocks (8 kB each)
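The block counts and the reported disk usage line up; a quick check in Python:

```python
# BUFFERS reports temp file I/O in 8 kB blocks; the count matches the
# "Disk: 4320kB" figure shown by the external merge sort above.
block_size_kb = 8
temp_read = 540                      # temp read=540 from the plan
disk_kb = temp_read * block_size_kb
print(disk_kb)  # 4320
```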
Sort (continued...)
# set work_mem to '32MB';
# EXPLAIN (ANALYZE, BUFFERS) select * from mytab1 t1 where a < 200000 order by b;
                                   QUERY PLAN
-------------------------------------------------------------------------------
 Sort  (cost=24274.23..24771.92 rows=199079 width=12) (actual time=127.852..157.664 rows=199999 loops=1)
   Sort Key: b
   Sort Method: quicksort  Memory: 17082kB
   Buffers: shared hit=1631
   ->  Index Scan using myidx1 on mytab1 t1  (cost=0.42..6752.31 rows=199079 width=12) (actual time=0.035..78.813 rows=199999 loops=1)
         Index Cond: (a < 200000)
         Buffers: shared hit=1631
 Planning Time: 0.150 ms
 Execution Time: 174.043 ms
• Sort Method is now quicksort: with the increased work_mem, sorting is done in memory.
Limit
# EXPLAIN (ANALYZE, TIMING OFF) select * from mytab1 t1 where b < 20 limit 10;
                              QUERY PLAN
---------------------------------------------------------------------
 Limit  (cost=0.00..0.45 rows=10 width=12) (actual rows=10 loops=1)
   ->  Seq Scan on mytab1 t1  (cost=0.00..17906.00 rows=401200 width=12) (actual rows=10 loops=1)
         Filter: (b < 20)
 Planning Time: 0.110 ms
 Execution Time: 0.096 ms
(5 rows)
• Scan stops as soon as LIMIT is reached
• Also notice the TIMING option: with TIMING OFF, per-node times are omitted from the output
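The early-stop behaviour is the same as pulling from a lazy generator; a small sketch in Python:

```python
# A LIMIT node pulls rows from its child one at a time, so the scan below it
# stops as soon as enough rows have been produced.
from itertools import islice

scanned = 0

def seq_scan(rows):
    """Lazy scan: yields one row per request, counting rows touched."""
    global scanned
    for row in rows:
        scanned += 1
        yield row

limited = list(islice(seq_scan(range(1_000_000)), 10))  # LIMIT 10
print(len(limited), scanned)  # 10 10 -- only 10 of the million rows scanned
```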
Parallelism
# create table mytab3(x int, y int);
# insert into mytab3 select i, i%30 from generate_series(1, 10000000) i;
# analyze mytab3;
# set parallel_leader_participation to off;
# set parallel_tuple_cost = 0.1;
• Look for parameters related to Parallelism
• parallel_leader_participation, parallel_setup_cost, parallel_tuple_cost, max_parallel_workers,
max_parallel_workers_per_gather, etc.
• New nodes in EXPLAIN
• Gather or Gather Merge
• Workers
Parallelism (continued...)
# EXPLAIN (ANALYZE, VERBOSE, SUMMARY OFF) select x from mytab3 t1 where x < 2000000;
                                   QUERY PLAN
---------------------------------------------------------------------
 Gather  (cost=1000.00..127883.81 rows=2013581 width=4) (actual time=6.694..1011.544 rows=1999999 loops=1)
   Output: x
   Workers Planned: 2
   Workers Launched: 2
   ->  Parallel Seq Scan on public.mytab3 t1  (cost=0.00..106748.00 rows=1006790 width=4) (actual time=0.025..773.107 rows=1000000 loops=2)
         Output: x
         Filter: (t1.x < 2000000)
         Rows Removed by Filter: 4000000
         Worker 0: actual time=0.022..772.607 rows=1023227 loops=1
         Worker 1: actual time=0.029..773.607 rows=976772 loops=1
• Gather collects tuples from all the workers
• Gather’s result is always unsorted
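Why the result is unsorted can be seen with a toy interleaving in Python (hypothetical per-worker outputs, not real tuples):

```python
# Toy Gather: tuples from parallel workers arrive interleaved, so even when
# each worker's own output is ordered, the gathered stream is not.
# (Gather Merge, by contrast, does a real merge and preserves sort order.)
worker0 = [10, 30, 50]   # each worker's output happens to be sorted
worker1 = [5, 40, 60]

gathered = []
for a, b in zip(worker0, worker1):   # arrival order interleaves the streams
    gathered.extend([a, b])
print(gathered)  # [10, 5, 30, 40, 50, 60] -- not sorted
```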
Grouping
# EXPLAIN (ANALYZE, VERBOSE)
select count(*) from mytab2 t1 group by y having sum(x) > 17000;
                                   QUERY PLAN
--------------------------------------------------------------------------------
 HashAggregate  (cost=22.50..22.88 rows=10 width=12) (actual time=0.480..0.486 rows=5 loops=1)
   Output: count(*), y
   Group Key: t1.y
   Filter: (sum(t1.x) > 17000)
   Rows Removed by Filter: 25
   ->  Seq Scan on public.mytab2 t1  (cost=0.00..15.00 rows=1000 width=8) (actual time=0.013..0.108 rows=1000 loops=1)
         Output: x, y
 Planning Time: 0.081 ms
 Execution Time: 0.545 ms
• Group keys are displayed, and Filter shows the HAVING clause
• Parameter enable_hashagg can be used to disable HashAggregate
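A hash aggregate with a HAVING filter can be mimicked on the same data; this Python sketch rebuilds mytab2's rows and reproduces the plan's counts (5 groups kept, 25 removed):

```python
# Toy HashAggregate: rows are hashed by group key; HAVING filters whole
# groups *after* aggregation (the Filter / Rows Removed by Filter lines).
rows = [(x, x % 30) for x in range(1, 1001)]   # mytab2: (x, y)

groups = {}
for x, y in rows:                  # aggregate phase: per-group count and sum
    cnt, s = groups.get(y, (0, 0))
    groups[y] = (cnt + 1, s + x)

# HAVING sum(x) > 17000 keeps whole groups, discards the rest
kept = {y: cnt for y, (cnt, s) in groups.items() if s > 17000}
print(len(kept), len(groups) - len(kept))  # 5 25
```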
Grouping with parallelism
# EXPLAIN VERBOSE select count(*) from mytab3 t1 group by y having sum(x) > 17000;
                                   QUERY PLAN
----------------------------------------------------------------------
 Finalize GroupAggregate  (cost=132749.06..132756.89 rows=10 width=12)
   Output: count(*), y
   Group Key: t1.y
   Filter: (sum(t1.x) > 17000)
   ->  Gather Merge  (cost=132749.06..132756.06 rows=60 width=20)
         Output: y, (PARTIAL count(*)), (PARTIAL sum(x))
         Workers Planned: 2
         ->  Sort  (cost=131749.04..131749.11 rows=30 width=20)
               Output: y, (PARTIAL count(*)), (PARTIAL sum(x))
               Sort Key: t1.y
               ->  Partial HashAggregate  (cost=131748.00..131748.30 rows=30 width=20)
                     Output: y, PARTIAL count(*), PARTIAL sum(x)
                     Group Key: t1.y
                     ->  Parallel Seq Scan on public.mytab3 t1  (cost=0.00..94248.00 rows=5000000 width=8)
                           Output: x, y
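The Partial/Finalize split can be sketched with a toy example in Python (hypothetical rows split across two workers):

```python
# Partial/Finalize aggregation: each worker computes PARTIAL count/sum over
# its chunk of the table; the Finalize step combines the partials per group.
chunks = [[(1, 5), (2, 5)], [(3, 5), (4, 7)]]   # (x, y) rows, one list per worker

partials = []
for chunk in chunks:                    # Partial HashAggregate in each worker
    agg = {}
    for x, y in chunk:
        cnt, s = agg.get(y, (0, 0))
        agg[y] = (cnt + 1, s + x)
    partials.append(agg)

final = {}                              # Finalize: add up the partials
for agg in partials:
    for y, (cnt, s) in agg.items():
        fcnt, fs = final.get(y, (0, 0))
        final[y] = (fcnt + cnt, fs + s)
print(final)  # {5: (3, 6), 7: (1, 4)} -- per-group (count, sum)
```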
MIN()/MAX() aggregates
# EXPLAIN (ANALYZE, COSTS OFF) select min(b) from mytab1 t1;
                                  QUERY PLAN
-------------------------------------------------------------------------------
 Aggregate (actual time=214.125..214.125 rows=1 loops=1)
   ->  Seq Scan on mytab1 t1 (actual time=0.018..104.833 rows=1000000 loops=1)
 Planning Time: 0.134 ms
 Execution Time: 214.161 ms
(4 rows)
• Finding just a min value seems very slow
• What if we have an index on that column?
MIN()/MAX() aggregates (continued...)
# EXPLAIN (ANALYZE, COSTS OFF) select max(a) from mytab1 t1;
                          QUERY PLAN
------------------------------------------------------------
 Result (actual time=0.100..0.101 rows=1 loops=1)
   InitPlan 1 (returns $0)
     ->  Limit (actual time=0.094..0.094 rows=1 loops=1)
           ->  Index Only Scan Backward using myidx1 on mytab1 t1 (actual time=0.091..0.092 rows=1 loops=1)
                 Index Cond: (a IS NOT NULL)
                 Heap Fetches: 0
 Planning Time: 0.226 ms
 Execution Time: 0.149 ms
(8 rows)
• With an index, min/max executes really fast
Partitioning
# create table pagg_tab (a int, b int, c text, d int) partition by list(c);
# create table pagg_tab_p1 partition of pagg_tab
for values in ('0000', '0001', '0002', '0003');
# create table pagg_tab_p2 partition of pagg_tab
for values in ('0004', '0005', '0006', '0007');
# create table pagg_tab_p3 partition of pagg_tab
for values in ('0008', '0009', '0010', '0011');
# insert into pagg_tab select i%20, i%30, to_char(i%12, 'FM0000'), i%30
from generate_series(0, 2999) i;
# analyze pagg_tab;
# set enable_partitionwise_join to on;
# set enable_partitionwise_aggregate to on;
# set edb_enable_pruning to false;
Partitioning (continued...)
# EXPLAIN (ANALYZE, COSTS OFF)
select * from pagg_tab;
                                 QUERY PLAN
----------------------------------------------------------------------------
 Append (actual time=0.016..0.650 rows=3000 loops=1)
   ->  Seq Scan on pagg_tab_p1 (actual time=0.015..0.132 rows=1000 loops=1)
   ->  Seq Scan on pagg_tab_p2 (actual time=0.008..0.130 rows=1000 loops=1)
   ->  Seq Scan on pagg_tab_p3 (actual time=0.012..0.137 rows=1000 loops=1)
 Planning Time: 0.189 ms
 Execution Time: 0.803 ms
(6 rows)
• An Append node over all the children, each scanned sequentially
• Look for parameters related to Partitioning
• enable_partition_pruning, enable_partitionwise_aggregate, enable_partitionwise_join
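List partitioning routes each row by its partition key value; a toy sketch of the routing for the pagg_tab setup above, in Python:

```python
# Toy list-partition routing: the value of the partition key (column c)
# decides which child table a row lands in, mirroring the pagg_tab setup.
partitions = {
    'pagg_tab_p1': {'0000', '0001', '0002', '0003'},
    'pagg_tab_p2': {'0004', '0005', '0006', '0007'},
    'pagg_tab_p3': {'0008', '0009', '0010', '0011'},
}

def route(c):
    """Return the partition that accepts partition-key value c."""
    for name, values in partitions.items():
        if c in values:
            return name
    raise ValueError(f"no partition accepts {c!r}")

print(route('0005'))  # pagg_tab_p2
```

Partition pruning is the reverse of the same lookup: given a WHERE clause on c, the planner can skip every partition whose value list cannot match.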
Partition-wise Join and Aggregation
# EXPLAIN (COSTS OFF) select t1.c, count(*) from pagg_tab t1, pagg_tab t2
where t1.c = t2.c and t1.c in ('0000', '0004') group by (t1.c);
                          QUERY PLAN
---------------------------------------------------------------
 Append
   ->  HashAggregate
         Group Key: t1.c
         ->  Hash Join
               Hash Cond: (t1.c = t2.c)
               ->  Seq Scan on pagg_tab_p1 t1
                     Filter: (c = ANY ('{0000,0004}'::text[]))
               ->  Hash
                     ->  Seq Scan on pagg_tab_p1 t2
   ->  HashAggregate
         Group Key: t1_1.c
         ->  Hash Join
               Hash Cond: (t1_1.c = t2_1.c)
               ->  Seq Scan on pagg_tab_p2 t1_1
                     Filter: (c = ANY ('{0000,0004}'::text[]))
               ->  Hash
                     ->  Seq Scan on pagg_tab_p2 t2_1
FDW - Foreign Scan
# EXPLAIN (ANALYZE, VERBOSE, COSTS OFF)
select count(*) from fprt1 t1 inner join fprt2 t2 on (t1.a = t2.b) group by t1.a;
                                           QUERY PLAN
--------------------------------------------------------------------------------------------------
 Append (actual time=0.799..1.612 rows=84 loops=1)
   ->  Foreign Scan (actual time=0.798..0.802 rows=42 loops=1)
         Output: (count(*)), t1.a
         Relations: Aggregate on ((public.ftprt1_p1 t1) INNER JOIN (public.ftprt2_p1 t2))
         Remote SQL: SELECT count(*), r5.a FROM (public.fprt1_p1 r5 INNER JOIN public.fprt2_p1 r8 ON (((r5.a = r8.b)))) GROUP BY 2
   ->  Foreign Scan (actual time=0.799..0.802 rows=42 loops=1)
         Output: (count(*)), t1_1.a
         Relations: Aggregate on ((public.ftprt1_p2 t1) INNER JOIN (public.ftprt2_p2 t2))
         Remote SQL: SELECT count(*), r6.a FROM (public.fprt1_p2 r6 INNER JOIN public.fprt2_p2 r9 ON (((r6.a = r9.b)))) GROUP BY 2
 Planning Time: 5.245 ms
 Execution Time: 1.865 ms
• Foreign Scan and Remote SQL
• Both the join and the aggregation are pushed down to the remote server
INSERT Plan
# EXPLAIN (ANALYZE, VERBOSE, WAL, COSTS OFF)
insert into mytab1 values(0, 0, 0);
                             QUERY PLAN
-------------------------------------------------------------------
 Insert on public.mytab1 (actual time=0.075..0.076 rows=0 loops=1)
   WAL: records=3 bytes=195
   ->  Result (actual time=0.002..0.002 rows=1 loops=1)
         Output: 0, 0, 0
 Planning Time: 0.028 ms
 Execution Time: 0.108 ms
(6 rows)
• EXPLAIN works with
• SELECT, INSERT, UPDATE, DELETE, VALUES, EXECUTE, DECLARE, CREATE TABLE AS, or CREATE MATERIALIZED VIEW queries
Explain and Explain Analyze Options
• ANALYZE [ boolean ]
• Executes the query. Shows actual run times
and other statistics.
• VERBOSE [ boolean ]
• Detailed output.
• COSTS [ boolean ]
• Shows estimated statistics and rows.
Default TRUE.
• SETTINGS [ boolean ]
• Shows configuration parameters.
• WAL [ boolean ]
• Shows WAL-related details. Used only with
ANALYZE.
• BUFFERS [ boolean ]
• Displays information on buffer usage. Used
only with ANALYZE.
• TIMING [ boolean ]
• Shows actual times. Used only with
ANALYZE, default TRUE.
• SUMMARY [ boolean ]
• Gives summary on planning and execution
times.
• FORMAT { TEXT | XML | JSON | YAML }
• Output display format. Default TEXT.
Format
# EXPLAIN (FORMAT JSON, verbose off, analyze,
summary, timing, costs, buffers off)
select * from mytab1;
QUERY PLAN
-------------------------------------
[ +
{ +
"Plan": { +
"Node Type": "Seq Scan", +
"Parallel Aware": false, +
"Relation Name": "mytab1", +
"Alias": "mytab1", +
"Startup Cost": 0.00, +
"Total Cost": 15406.00, +
"Plan Rows": 1000000, +
"Plan Width": 12, +
"Actual Startup Time": 0.018,+
"Actual Total Time": 154.808,+
"Actual Rows": 1000000, +
"Actual Loops": 1 +
}, +
"Planning Time": 0.081, +
"Triggers": [ +
], +
"Execution Time": 198.105 +
} +
]
# EXPLAIN (FORMAT XML, verbose off, analyze,
summary, timing, costs, buffers off)
select * from mytab1;
QUERY PLAN
----------------------------------------------------------
<explain xmlns="http://www.postgresql.org/2009/explain">+
  <Query>                                               +
    <Plan>                                              +
      <Node-Type>Seq Scan</Node-Type>                   +
      <Parallel-Aware>false</Parallel-Aware>            +
      <Relation-Name>mytab1</Relation-Name>             +
      <Alias>mytab1</Alias>                             +
      <Startup-Cost>0.00</Startup-Cost>                 +
      <Total-Cost>15406.00</Total-Cost>                 +
      <Plan-Rows>1000000</Plan-Rows>                    +
      <Plan-Width>12</Plan-Width>                       +
      <Actual-Startup-Time>0.016</Actual-Startup-Time>  +
      <Actual-Total-Time>133.351</Actual-Total-Time>    +
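The structured formats are meant for programs rather than people; for instance, JSON output like the sample above can be walked with Python's standard json module (the plan text here is abridged from the earlier output):

```python
# Parse an EXPLAIN (FORMAT JSON) result and pick fields out of the plan tree.
import json

plan_json = """
[ { "Plan": { "Node Type": "Seq Scan",
              "Relation Name": "mytab1",
              "Total Cost": 15406.00,
              "Plan Rows": 1000000 },
    "Execution Time": 198.105 } ]
"""
plan = json.loads(plan_json)[0]["Plan"]
print(plan["Node Type"], plan["Total Cost"])  # Seq Scan 15406.0
```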
References
• EXPLAIN
• https://www.postgresql.org/docs/13/sql-explain.html
• Using EXPLAIN
• https://www.postgresql.org/docs/13/using-explain.html
• Other web links
• http://www.postgresqltutorial.com/postgresql-explain/
• https://momjian.us/main/writings/pgsql/optimizer.pdf
• https://wiki.postgresql.org/images/4/45/Explaining_EXPLAIN.pdf
• https://www.dalibo.org/_media/understanding_explain.pdf
• Query Planning Gone Wrong by Robert Haas
• http://rhaas.blogspot.com/2013/05/query-planning-gone-wrong.html
Who is EDB?
Questions?
Thank You !
 

Dernier

Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Safe Software
 
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Victor Rentea
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Victor Rentea
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
WSO2
 

Dernier (20)

Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
 
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
 
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challenges
 
Cyberprint. Dark Pink Apt Group [EN].pdf
Cyberprint. Dark Pink Apt Group [EN].pdfCyberprint. Dark Pink Apt Group [EN].pdf
Cyberprint. Dark Pink Apt Group [EN].pdf
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
 
Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
CNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In PakistanCNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In Pakistan
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 

Understand the Query Plan to Optimize Performance with EXPLAIN and EXPLAIN ANALYZE

• 1. Understand the Query Plan to Optimize Performance with EXPLAIN and EXPLAIN ANALYZE
     Jeevan Chalke
     3rd Dec 2020
• 2. Agenda
     © Copyright EnterpriseDB Corporation, 2020. All rights reserved.
     • Brief on optimizer/planner
     • Paths and Plan
     • EXPLAIN and EXPLAIN ANALYZE
     • Options used with EXPLAIN
     • Who is EDB?
     • Q&A
     • Scans - seq, index, bitmap
     • Joins
     • Sort
     • Limit
     • Parallelism
     • Grouping
     • Partitioning
     • FDW
• 3. Brief on optimizer/planner
     • Statistics
       • ANALYZE table
     • Creates different paths
       • The way a query can be evaluated to give the desired output
     • Gives metrics to them
     • Chooses the cheapest path
     • Converts the chosen path to a plan
     • The executor then executes the query according to the chosen plan
• 4. Paths and Plan
     • Costs
       • cpu_tuple_cost (0.01)
       • cpu_operator_cost (0.0025)
       • seq_page_cost (1.0)
       • ...
     • Rows
       • Estimated
       • Actual
     • Width
     • Time
       • Actual
       • Planning
       • Execution
• 5. Basic EXPLAIN with seq scan, setup
     # create table mytab1(a int, b int, c int);
     # create table mytab2(x int, y int);
     # insert into mytab1 select i, i%50, i%100 from generate_series(1, 1000000) i;
     # insert into mytab2 select i, i%30 from generate_series(1, 1000) i;
     # analyze mytab1;
     # analyze mytab2;
     # select relname, relpages, reltuples from pg_class where relname like 'mytab%';
      relname | relpages | reltuples
     ---------+----------+-----------
      mytab1  |     5406 |     1e+06
      mytab2  |        5 |      1000
     (2 rows)
• 6. Basic EXPLAIN with seq scan
     # EXPLAIN select * from mytab1;
                                QUERY PLAN
     -----------------------------------------------------------------
      Seq Scan on mytab1  (cost=0.00..15406.00 rows=1000000 width=12)
     (1 row)
     • Syntax:
       • EXPLAIN [ ( option [, ...] ) ] statement
       • EXPLAIN [ ANALYZE ] [ VERBOSE ] statement
     • For a seq scan, the planner has to account for two things:
       • Reading all the pages [cost = 5406 (relpages) * 1.0 (seq_page_cost) = 5406]
       • Reading all the tuples from each page [cost = 1000000 (reltuples) * 0.01 (cpu_tuple_cost) = 10000]
     • Total cost = 5406 + 10000 = 15406
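The cost arithmetic above can be reproduced directly from the catalog and the cost settings; a minimal sketch, assuming the mytab1 table from the setup slide exists and its statistics are current:

```sql
-- Recompute the planner's seq-scan estimate for mytab1:
-- total cost = relpages * seq_page_cost + reltuples * cpu_tuple_cost
SELECT relpages * current_setting('seq_page_cost')::float8
     + reltuples * current_setting('cpu_tuple_cost')::float8 AS est_total_cost
FROM pg_class
WHERE relname = 'mytab1';
-- With relpages = 5406 and reltuples = 1e+06 this yields 15406,
-- matching the cost=0.00..15406.00 shown by EXPLAIN above.
```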
• 7. With WHERE clause
     # EXPLAIN select * from mytab1 where a < 800000;
                               QUERY PLAN
     ----------------------------------------------------------------
      Seq Scan on mytab1  (cost=0.00..17906.00 rows=798805 width=12)
        Filter: (a < 800000)
     (2 rows)
     • A Filter line is added to the output showing the condition
     • The estimated number of rows is reduced
     • The cost increased
       • cpu_operator_cost = 0.0025 (filtering cost = 0.0025 * 1000000 = 2500)
       • Total cost = 15406 + 2500 = 17906
• 8. With an Index
     # create index myidx1 on mytab1(a);
     # EXPLAIN select * from mytab1 where a < 800000;
                               QUERY PLAN
     ----------------------------------------------------------------
      Seq Scan on mytab1  (cost=0.00..17906.00 rows=798805 width=12)
        Filter: (a < 800000)
     (2 rows)
     • Let’s add an index on mytab1 and run the same query again
     • Same result, why? Let’s understand EXPLAIN ANALYZE first...
• 9. EXPLAIN ANALYZE
     • Gives the EXPLAIN plan
     • Executes the query; discards the output
     • Gives actual timing and row details
     • Gives a few finer details, like loops, and allows using the BUFFERS option
     • Provides a summary showing planning and execution time
     • Risky with DML commands
       • For example, an EXPLAIN ANALYZE DELETE ... will actually delete the rows from the table
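Because EXPLAIN ANALYZE really executes DML, a common safeguard is to wrap it in an explicit transaction and roll back; a sketch, using the mytab1 table from the setup slide:

```sql
BEGIN;
-- The DELETE below is actually executed, so real timings are measured...
EXPLAIN ANALYZE DELETE FROM mytab1 WHERE a < 100;
-- ...but rolling back discards the deletions, leaving the table intact.
ROLLBACK;
```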
• 10. With an Index (continued...)
     # EXPLAIN (ANALYZE) select * from mytab1 where a < 800000;
                                  QUERY PLAN
     -------------------------------------------------------------------------------
      Seq Scan on mytab1  (cost=0.00..17906.00 rows=798805 width=12)
                          (actual time=0.019..173.306 rows=799999 loops=1)
        Filter: (a < 800000)
        Rows Removed by Filter: 200001
      Planning Time: 0.127 ms
      Execution Time: 201.754 ms
     # set enable_seqscan to off; -- Forces Index scan
     # EXPLAIN (ANALYZE) select * from mytab1 where a < 800000;
                                  QUERY PLAN
     -------------------------------------------------------------------------------
      Index Scan using myidx1 on mytab1  (cost=0.42..27073.51 rows=798805 width=12)
                                         (actual time=0.226..372.299 rows=799999 loops=1)
        Index Cond: (a < 800000)
      Planning Time: 0.098 ms
      Execution Time: 399.717 ms
• 11. With an Index (continued...)
     # set enable_seqscan to on;
     # EXPLAIN (ANALYZE) select * from mytab1 where a > 800000;
                                  QUERY PLAN
     -------------------------------------------------------------------------------
      Index Scan using myidx1 on mytab1  (cost=0.42..6767.22 rows=199588 width=12)
                                         (actual time=0.135..115.500 rows=200000 loops=1)
        Index Cond: (a > 800000)
      Planning Time: 0.156 ms
      Execution Time: 123.236 ms
     (4 rows)
     • So, having an index doesn’t always improve performance.
     • But it does in most cases, and you have to figure it out by looking at your data.
• 12. With an Index, with specific column
     # EXPLAIN (ANALYZE) select a from mytab1 where a > 800000;
                                  QUERY PLAN
     -------------------------------------------------------------------------------
      Index Only Scan using myidx1 on mytab1  (cost=0.42..6767.22 rows=199588 width=4)
                                              (actual time=0.060..77.517 rows=200000 loops=1)
        Index Cond: (a > 800000)
        Heap Fetches: 200000
      Planning Time: 0.091 ms
      Execution Time: 85.114 ms
     (5 rows)
     • Index Scan changes to Index Only Scan
     • The required column is part of the index itself, so there is no need to scan the actual table
     • Available planner options
       • enable_seqscan, enable_indexscan, enable_indexonlyscan, enable_bitmapscan
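The enable_* settings listed above are session-local, which makes them handy for plan experiments like the ones on the previous slides; a sketch:

```sql
-- Temporarily discourage sequential scans in this session only,
-- to see what the planner would do without them
SET enable_seqscan = off;
EXPLAIN select a from mytab1 where a > 800000;
-- Restore the default once the experiment is done
RESET enable_seqscan;
```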
• 13. With Bitmap Index/Heap Scan
     # create index myidx2 on mytab1(c);
     # EXPLAIN (COSTS OFF) select a, c from mytab1 where a > 800000 or c <= 100000;
                        QUERY PLAN
     -------------------------------------------------
      Bitmap Heap Scan on mytab1
        Recheck Cond: ((a > 800000) OR (c <= 100000))
        -> BitmapOr
           -> Bitmap Index Scan on myidx1
                Index Cond: (a > 800000)
           -> Bitmap Index Scan on myidx2
                Index Cond: (c <= 100000)
     • Two-step scanning
       • Find the final rows in the output and locate them (inner plan)
         • ANDs or ORs the multiple conditions as needed
       • Actually fetch only those rows (outer plan)
         • The outer plan sorts the rows' physical locations and then fetches them
• 14. JOINs
     # EXPLAIN (ANALYZE, SETTINGS, COSTS OFF) select a, x from mytab1 t1 left join mytab2 t2 on (t1.a = t2.x);
                                      QUERY PLAN
     --------------------------------------------------------------------------------
      Hash Left Join (actual time=0.535..358.964 rows=1000000 loops=1)
        Hash Cond: (t1.a = t2.x)
        -> Seq Scan on mytab1 t1 (actual time=0.013..106.310 rows=1000000 loops=1)
        -> Hash (actual time=0.513..0.514 rows=1000 loops=1)
             Buckets: 1024 Batches: 1 Memory Usage: 44kB
             -> Seq Scan on mytab2 t2 (actual time=0.013..0.239 rows=1000 loops=1)
      Settings: enable_hashagg = 'off', enable_mergejoin = 'off'
      Planning Time: 0.137 ms
      Execution Time: 393.929 ms
     (9 rows)
• 15. JOINs (continued...)
     # EXPLAIN (COSTS OFF) select a from mytab1 t1 where a in (select x from mytab2);
                        QUERY PLAN
     -------------------------------------------------
      Merge Semi Join
        Merge Cond: (t1.a = mytab2.x)
        -> Index Only Scan using myidx1 on mytab1 t1
        -> Sort
             Sort Key: mytab2.x
             -> Seq Scan on mytab2
     # EXPLAIN (COSTS OFF) select a from mytab1 t1 where a in (select x from mytab2);
                        QUERY PLAN
     -------------------------------------------------
      Nested Loop
        -> Unique
             -> Sort
                  -> Seq Scan on mytab2
        -> Index Only Scan using myidx1 on mytab1 t1
             Index Cond: (a = mytab2.x)
• 16. JOINs summary
     • Merge Join
       • Sorts, and then merges
       • Faster for bigger data-sets
     • Hash Join
       • Works with equality constraints
       • Faster provided we have enough memory
       • Mostly used where one table is smaller
     • Nested Loop Join
       • For smaller data-sets
       • Mostly used for cross joins
     • Join Types
       • Left/Right/Full/Anti/Semi/Inner
     • Planner parameters
       • enable_hashjoin, enable_mergejoin, enable_nestloop
• 17. Sort
     # EXPLAIN (ANALYZE, BUFFERS) select * from mytab1 t1 where a < 200000 order by b;
                                  QUERY PLAN
     -------------------------------------------------------------------------------
      Sort (cost=27679.73..28177.42 rows=199079 width=12)
           (actual time=147.275..178.516 rows=199999 loops=1)
        Sort Key: b
        Sort Method: external merge  Disk: 4320kB
        Buffers: shared hit=1631, temp read=540 written=543
        -> Index Scan using myidx1 on mytab1 t1 (cost=0.42..6752.31 rows=199079 width=12)
                                                (actual time=0.035..67.219 rows=199999 loops=1)
             Index Cond: (a < 200000)
             Buffers: shared hit=1631
      Planning Time: 0.130 ms
      Execution Time: 190.342 ms
     • Sort Key
     • Sort Method is external merge sort
     • BUFFERS option: temp read/written is reported in blocks (8K)
• 18. Sort (continued...)
     # set work_mem to '32MB';
     # EXPLAIN (ANALYZE, BUFFERS) select * from mytab1 t1 where a < 200000 order by b;
                                  QUERY PLAN
     -------------------------------------------------------------------------------
      Sort (cost=24274.23..24771.92 rows=199079 width=12)
           (actual time=127.852..157.664 rows=199999 loops=1)
        Sort Key: b
        Sort Method: quicksort  Memory: 17082kB
        Buffers: shared hit=1631
        -> Index Scan using myidx1 on mytab1 t1 (cost=0.42..6752.31 rows=199079 width=12)
                                                (actual time=0.035..78.813 rows=199999 loops=1)
             Index Cond: (a < 200000)
             Buffers: shared hit=1631
      Planning Time: 0.150 ms
      Execution Time: 174.043 ms
     • Sort Method is now quicksort: with the increased work_mem, sorting is done in-memory.
• 19. Limit
     # EXPLAIN (ANALYZE, TIMING OFF) select * from mytab1 t1 where b < 20 limit 10;
                                  QUERY PLAN
     -------------------------------------------------------------------------------
      Limit (cost=0.00..0.45 rows=10 width=12) (actual rows=10 loops=1)
        -> Seq Scan on mytab1 t1 (cost=0.00..17906.00 rows=401200 width=12)
                                 (actual rows=10 loops=1)
             Filter: (b < 20)
      Planning Time: 0.110 ms
      Execution Time: 0.096 ms
     (5 rows)
     • The scan stops as soon as the LIMIT is reached
     • Also, notice the new option TIMING
• 20. Parallelism
     # create table mytab3(x int, y int);
     # insert into mytab3 select i, i%30 from generate_series(1, 10000000) i;
     # analyze mytab3;
     # set parallel_leader_participation to off;
     # set parallel_tuple_cost = 0.1;
     • Look for parameters related to parallelism
       • parallel_leader_participation, parallel_setup_cost, parallel_tuple_cost, max_parallel_workers, max_parallel_workers_per_gather, etc.
     • New nodes in EXPLAIN
       • Gather or Gather Merge
       • Workers
• 21. Parallelism (continued...)
     # EXPLAIN (ANALYZE, VERBOSE, SUMMARY OFF) select x from mytab3 t1 where x < 2000000;
                                  QUERY PLAN
     -------------------------------------------------------------------------------
      Gather (cost=1000.00..127883.81 rows=2013581 width=4)
             (actual time=6.694..1011.544 rows=1999999 loops=1)
        Output: x
        Workers Planned: 2
        Workers Launched: 2
        -> Parallel Seq Scan on public.mytab3 t1 (cost=0.00..106748.00 rows=1006790 width=4)
                                                 (actual time=0.025..773.107 rows=1000000 loops=2)
             Output: x
             Filter: (t1.x < 2000000)
             Rows Removed by Filter: 4000000
             Worker 0: actual time=0.022..772.607 rows=1023227 loops=1
             Worker 1: actual time=0.029..773.607 rows=976772 loops=1
     • Gather collects tuples from all the workers
     • Gather’s result is always unsorted
• 22. Grouping
     # EXPLAIN (ANALYZE, VERBOSE) select count(*) from mytab2 t1 group by y having sum(x) > 17000;
                                  QUERY PLAN
     -------------------------------------------------------------------------------
      HashAggregate (cost=22.50..22.88 rows=10 width=12)
                    (actual time=0.480..0.486 rows=5 loops=1)
        Output: count(*), y
        Group Key: t1.y
        Filter: (sum(t1.x) > 17000)
        Rows Removed by Filter: 25
        -> Seq Scan on public.mytab2 t1 (cost=0.00..15.00 rows=1000 width=8)
                                        (actual time=0.013..0.108 rows=1000 loops=1)
             Output: x, y
      Planning Time: 0.081 ms
      Execution Time: 0.545 ms
     • Group keys are displayed, and Filter shows the HAVING clause
     • The parameter enable_hashagg can be used to disable HashAggregate
• 23. Grouping with parallelism
     # EXPLAIN VERBOSE select count(*) from mytab3 t1 group by y having sum(x) > 17000;
                                  QUERY PLAN
     -------------------------------------------------------------------------------
      Finalize GroupAggregate (cost=132749.06..132756.89 rows=10 width=12)
        Output: count(*), y
        Group Key: t1.y
        Filter: (sum(t1.x) > 17000)
        -> Gather Merge (cost=132749.06..132756.06 rows=60 width=20)
             Output: y, (PARTIAL count(*)), (PARTIAL sum(x))
             Workers Planned: 2
             -> Sort (cost=131749.04..131749.11 rows=30 width=20)
                  Output: y, (PARTIAL count(*)), (PARTIAL sum(x))
                  Sort Key: t1.y
                  -> Partial HashAggregate (cost=131748.00..131748.30 rows=30 width=20)
                       Output: y, PARTIAL count(*), PARTIAL sum(x)
                       Group Key: t1.y
                       -> Parallel Seq Scan on public.mytab3 t1 (cost=0.00..94248.00 rows=5000000 width=8)
                            Output: x, y
• 24. MIN()/MAX() aggregates
     # EXPLAIN (ANALYZE, COSTS OFF) select min(b) from mytab1 t1;
                                    QUERY PLAN
     -------------------------------------------------------------------------------
      Aggregate (actual time=214.125..214.125 rows=1 loops=1)
        -> Seq Scan on mytab1 t1 (actual time=0.018..104.833 rows=1000000 loops=1)
      Planning Time: 0.134 ms
      Execution Time: 214.161 ms
     (4 rows)
     • Finding just a min value seems very slow
     • What if we have an index on that column?
• 25. MIN()/MAX() aggregates (continued...)
     # EXPLAIN (ANALYZE, COSTS OFF) select max(a) from mytab1 t1;
                                    QUERY PLAN
     -------------------------------------------------------------------------------
      Result (actual time=0.100..0.101 rows=1 loops=1)
        InitPlan 1 (returns $0)
          -> Limit (actual time=0.094..0.094 rows=1 loops=1)
               -> Index Only Scan Backward using myidx1 on mytab1 t1
                                (actual time=0.091..0.092 rows=1 loops=1)
                    Index Cond: (a IS NOT NULL)
                    Heap Fetches: 0
      Planning Time: 0.226 ms
      Execution Time: 0.149 ms
     (8 rows)
     • With an index, min/max executes really fast
• 26. Partitioning
     # create table pagg_tab (a int, b int, c text, d int) partition by list(c);
     # create table pagg_tab_p1 partition of pagg_tab for values in ('0000', '0001', '0002', '0003');
     # create table pagg_tab_p2 partition of pagg_tab for values in ('0004', '0005', '0006', '0007');
     # create table pagg_tab_p3 partition of pagg_tab for values in ('0008', '0009', '0010', '0011');
     # insert into pagg_tab select i%20, i%30, to_char(i%12, 'FM0000'), i%30 from generate_series(0, 2999) i;
     # analyze pagg_tab;
     # set enable_partitionwise_join to on;
     # set enable_partitionwise_aggregate to on;
     # set edb_enable_pruning to false;
• 27. Partitioning (continued...)
     # EXPLAIN (ANALYZE, COSTS OFF) select * from pagg_tab;
                                    QUERY PLAN
     ----------------------------------------------------------------------------
      Append (actual time=0.016..0.650 rows=3000 loops=1)
        -> Seq Scan on pagg_tab_p1 (actual time=0.015..0.132 rows=1000 loops=1)
        -> Seq Scan on pagg_tab_p2 (actual time=0.008..0.130 rows=1000 loops=1)
        -> Seq Scan on pagg_tab_p3 (actual time=0.012..0.137 rows=1000 loops=1)
      Planning Time: 0.189 ms
      Execution Time: 0.803 ms
     (6 rows)
     • An Append of all children, scanned sequentially
     • Look for parameters related to partitioning
       • enable_partition_pruning, enable_partitionwise_aggregate, enable_partitionwise_join
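With enable_partition_pruning on (the default), a WHERE clause on the partition key lets the planner skip partitions entirely rather than appending all children; a sketch using the pagg_tab setup above (the exact plan shape may vary by version):

```sql
-- Only pagg_tab_p1 holds the value '0000', so the other two
-- partitions can be pruned from the plan at planning time
EXPLAIN (COSTS OFF) select * from pagg_tab where c = '0000';
```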
• 28. Partition-wise Join and Aggregation
     # EXPLAIN (COSTS OFF) select t1.c, count(*) from pagg_tab t1, pagg_tab t2 where t1.c = t2.c and t1.c in ('0000', '0004') group by (t1.c);
                            QUERY PLAN
     ---------------------------------------------------------------
      Append
        -> HashAggregate
             Group Key: t1.c
             -> Hash Join
                  Hash Cond: (t1.c = t2.c)
                  -> Seq Scan on pagg_tab_p1 t1
                       Filter: (c = ANY ('{0000,0004}'::text[]))
                  -> Hash
                       -> Seq Scan on pagg_tab_p1 t2
        -> HashAggregate
             Group Key: t1_1.c
             -> Hash Join
                  Hash Cond: (t1_1.c = t2_1.c)
                  -> Seq Scan on pagg_tab_p2 t1_1
                       Filter: (c = ANY ('{0000,0004}'::text[]))
                  -> Hash
                       -> Seq Scan on pagg_tab_p2 t2_1
• 29. FDW - Foreign Scan
     # EXPLAIN (ANALYZE, VERBOSE, COSTS OFF) select count(*) from fprt1 t1 inner join fprt2 t2 on (t1.a = t2.b) group by t1.a;
                                       QUERY PLAN
     --------------------------------------------------------------------------------------------------
      Append (actual time=0.799..1.612 rows=84 loops=1)
        -> Foreign Scan (actual time=0.798..0.802 rows=42 loops=1)
             Output: (count(*)), t1.a
             Relations: Aggregate on ((public.ftprt1_p1 t1) INNER JOIN (public.ftprt2_p1 t2))
             Remote SQL: SELECT count(*), r5.a FROM (public.fprt1_p1 r5 INNER JOIN public.fprt2_p1 r8 ON (((r5.a = r8.b)))) GROUP BY 2
        -> Foreign Scan (actual time=0.799..0.802 rows=42 loops=1)
             Output: (count(*)), t1_1.a
             Relations: Aggregate on ((public.ftprt1_p2 t1) INNER JOIN (public.ftprt2_p2 t2))
             Remote SQL: SELECT count(*), r6.a FROM (public.fprt1_p2 r6 INNER JOIN public.fprt2_p2 r9 ON (((r6.a = r9.b)))) GROUP BY 2
      Planning Time: 5.245 ms
      Execution Time: 1.865 ms
     • Foreign Scan and Remote SQL
     • Both the join and the aggregation are pushed down to the remote server
• 30. INSERT Plan
     # EXPLAIN (ANALYZE, VERBOSE, WAL, COSTS OFF) insert into mytab1 values(0, 0, 0);
                               QUERY PLAN
     -------------------------------------------------------------------
      Insert on public.mytab1 (actual time=0.075..0.076 rows=0 loops=1)
        WAL: records=3 bytes=195
        -> Result (actual time=0.002..0.002 rows=1 loops=1)
             Output: 0, 0, 0
      Planning Time: 0.028 ms
      Execution Time: 0.108 ms
     (6 rows)
     • EXPLAIN works with
       • SELECT, INSERT, UPDATE, DELETE, VALUES, EXECUTE, DECLARE, CREATE TABLE AS, and CREATE MATERIALIZED VIEW queries
• 31. EXPLAIN and EXPLAIN ANALYZE Options
     • ANALYZE [ boolean ]
       • Executes the query. Shows actual run times and other statistics.
     • VERBOSE [ boolean ]
       • Detailed output.
     • COSTS [ boolean ]
       • Shows estimated costs, rows, and widths. Default TRUE.
     • SETTINGS [ boolean ]
       • Shows configuration parameters changed from their defaults.
     • WAL [ boolean ]
       • Shows WAL-related details. Used only with ANALYZE.
     • BUFFERS [ boolean ]
       • Displays information on buffer usage. Used only with ANALYZE.
     • TIMING [ boolean ]
       • Shows actual times. Used only with ANALYZE; default TRUE.
     • SUMMARY [ boolean ]
       • Gives a summary of planning and execution times.
     • FORMAT { TEXT | XML | JSON | YAML }
       • Output display format. Default TEXT.
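The options above can be combined freely inside the parentheses; a sketch that pairs several of them, using the mytab1 table from the setup slide:

```sql
-- Run the query, report buffer usage, skip per-node timing
-- (reducing instrumentation overhead), and emit the plan as JSON
EXPLAIN (ANALYZE, BUFFERS, TIMING OFF, FORMAT JSON)
select * from mytab1 where a < 800000;
```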
• 32. Format
     # EXPLAIN (FORMAT JSON, verbose off, analyze, summary, timing, costs, buffers off) select * from mytab1;
                 QUERY PLAN
     -------------------------------------
      [                                  +
        {                                +
          "Plan": {                      +
            "Node Type": "Seq Scan",     +
            "Parallel Aware": false,     +
            "Relation Name": "mytab1",   +
            "Alias": "mytab1",           +
            "Startup Cost": 0.00,        +
            "Total Cost": 15406.00,      +
            "Plan Rows": 1000000,        +
            "Plan Width": 12,            +
            "Actual Startup Time": 0.018,+
            "Actual Total Time": 154.808,+
            "Actual Rows": 1000000,      +
            "Actual Loops": 1            +
          },                             +
          "Planning Time": 0.081,        +
          "Triggers": [                  +
          ],                             +
          "Execution Time": 198.105      +
        }                                +
      ]
     # EXPLAIN (FORMAT XML, verbose off, analyze, summary, timing, costs, buffers off) select * from mytab1;
                             QUERY PLAN
     ----------------------------------------------------------
      <explain xmlns="http://www.postgresql.org/2009/explain">+
        <Query>                                               +
          <Plan>                                              +
            <Node-Type>Seq Scan</Node-Type>                   +
            <Parallel-Aware>false</Parallel-Aware>            +
            <Relation-Name>mytab1</Relation-Name>             +
            <Alias>mytab1</Alias>                             +
            <Startup-Cost>0.00</Startup-Cost>                 +
            <Total-Cost>15406.00</Total-Cost>                 +
            <Plan-Rows>1000000</Plan-Rows>                    +
            <Plan-Width>12</Plan-Width>                       +
            <Actual-Startup-Time>0.016</Actual-Startup-Time>  +
            <Actual-Total-Time>133.351</Actual-Total-Time>    +
• 33. References
     • EXPLAIN
       • https://www.postgresql.org/docs/13/sql-explain.html
     • Using EXPLAIN
       • https://www.postgresql.org/docs/13/using-explain.html
     • Other web links
       • http://www.postgresqltutorial.com/postgresql-explain/
       • https://momjian.us/main/writings/pgsql/optimizer.pdf
       • https://wiki.postgresql.org/images/4/45/Explaining_EXPLAIN.pdf
       • https://www.dalibo.org/_media/understanding_explain.pdf
     • Query Planning Gone Wrong by Robert Haas
       • http://rhaas.blogspot.com/2013/05/query-planning-gone-wrong.html