Becoming a better developer with EXPLAIN
Understanding the Postgres query planner
Louise Grandjonc
About me
Louise Grandjonc (louise@ulule.com)
Lead developer at Ulule (www.ulule.com)
Python / Django developer - Postgres enthusiast
@louisemeta on twitter
Today’s agenda
1. Can understanding the query plan help me?
2. What if I’m using an ORM?
3. So what does my database do when I query stuff?
Can understanding the query plan
help me?
The mystery of querying stuff
You → query (SELECT … FROM …) → result:
id  name
1   Louise
2   Alfred
…   …
Zooming into it
SQL query → Parse → Planner / Optimizer → Execute → result
The planner / optimizer:
- Generates execution plans for a query
- Calculates the cost of each plan.
- The best one is used to execute your query
So what can I learn from the query plan?
- Understand why your filter / join / order is slow.
- Know if the new index you just added is used.
- Stop guessing which index can be useful.
In conclusion: you will understand why your query is slow.
What if I’m using an ORM?
But why can’t we trust it?
1. The ORM executes queries that you might not expect
2. Your queries might not be optimised and you won’t know about it
The story of the owls
Data model:
- Owl: id, name, employer_name, feather_color, favourite_food + a foreign key to Job
- Job: id, name
- Letters: id, sender_id, receiver_id, sent_at, delivered_by
- Human: id, first_name, last_name
10 002 owls, 10 000 humans, 411 861 letters
Loops (with django)
SELECT id, name, employer_name, favourite_food, job_id,
feather_color FROM owl WHERE employer_name = 'Hogwarts'
SELECT id, name FROM job WHERE id = 1
SELECT id, name FROM job WHERE id = 1
SELECT id, name FROM job WHERE id = 1
SELECT id, name FROM job WHERE id = 1
SELECT id, name FROM job WHERE id = 1
SELECT id, name FROM job WHERE id = 1
SELECT id, name FROM job WHERE id = 1
SELECT id, name FROM job WHERE id = 1
…
owls = Owl.objects.filter(employer_name='Hogwarts')
for owl in owls:
    print(owl.job)  # 1 query per loop iteration
Awesome, right?
Loops (with django)
Owl.objects.filter(employer_name='Ulule')
    .select_related('job')
SELECT … FROM "owl" LEFT OUTER JOIN "job" ON ("owl"."job_id" =
"job"."id")
WHERE "owl"."employer_name" = 'Ulule'

Owl.objects.filter(employer_name='Ulule')
    .prefetch_related('job')
SELECT … FROM "owl" WHERE "owl"."employer_name" = 'Ulule'
SELECT … FROM "job" WHERE "job"."id" IN (2)
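To check which queries the ORM really executed, Django records them when settings.DEBUG is True; a minimal sketch (connection.queries is standard Django, the Owl model is the one from the slides):

from django.db import connection, reset_queries

reset_queries()
owls = list(Owl.objects.filter(employer_name='Ulule').select_related('job'))
for q in connection.queries:  # one dict per executed query: {'sql': ..., 'time': ...}
    print(q['time'], q['sql'])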
Where are my logs?
Terminal command:
$ psql -U user -d your_database_name
In the psql interface:
owl_conference=# show log_directory ;
 log_directory
---------------
 pg_log
owl_conference=# show data_directory ;
 data_directory
-------------------------
 /usr/local/var/postgres
owl_conference=# show log_filename ;
 log_filename
-------------------------
 postgresql-%Y-%m-%d.log
Having good looking logs
(and logging everything like a crazy owl)
owl_conference=# SHOW config_file;
config_file
-----------------------------------------
/usr/local/var/postgres/postgresql.conf
(1 row)
In your postgresql.conf
log_filename = 'postgresql-%Y-%m-%d.log'
log_statement = 'all'
logging_collector = on
log_min_duration_statement = 0
Finding dirty queries
pg_stat_statements
In your psql
CREATE EXTENSION pg_stat_statements;
The module must be added to your shared_preload_libraries.
You have to change your postgresql.conf and restart.
shared_preload_libraries = 'pg_stat_statements'
pg_stat_statements.max = 10000
pg_stat_statements.track = all
Finding the painful queries
SELECT total_time, min_time, max_time, mean_time,
calls, query
FROM pg_stat_statements
ORDER BY mean_time DESC
LIMIT 100;
-[ RECORD 6 ]---------------------------------------------------------
total_time | 519.491
min_time | 519.491
max_time | 519.491
mean_time | 519.491
calls | 1
query | SELECT COUNT(*) FROM letters;
What is my database doing when I query?
Let’s talk about EXPLAIN!
What is EXPLAIN
Gives you the execution plan chosen by the query planner that your
database will use to execute your SQL statement
Using ANALYZE will actually execute your query! (Don’t worry, you
can ROLLBACK)
EXPLAIN (ANALYZE) my super query;
BEGIN;
EXPLAIN ANALYZE UPDATE owl SET … WHERE …;
ROLLBACK;
So, what does it look like?
EXPLAIN ANALYZE SELECT * FROM owl WHERE employer_name='Ulule';
QUERY PLAN
-------------------------------------
Seq Scan on owl (cost=0.00..205.01 rows=1 width=35)
(actual time=1.945..1.946 rows=1 loops=1)
Filter: ((employer_name)::text = 'Ulule'::text)
Rows Removed by Filter: 10001
Planning time: 0.080 ms
Execution time: 1.965 ms
(5 rows)
Let’s go step by step!
Costs
(cost=0.00..205.01 rows=1 width=35)
- 0.00: cost of retrieving the first row
- 205.01: cost of retrieving all rows
- rows: number of rows returned
- width: average width of a row (in bytes)
If you use ANALYZE:
(actual time=1.945..1.946 rows=1 loops=1)
- loops: number of times your seq scan (index scan etc.) was executed
Sequential Scan
Seq Scan on owl ...
Filter: ((employer_name)::text = 'Ulule'::text)
Rows Removed by Filter: 10001
- Scans the entire table.
- Retrieves the rows matching your WHERE.
It can be expensive!
Would an index make this query faster?
What is an index then?
In an encyclopaedia, if you
want to read about owls,
you don’t read the entire
book, you go to the index
first!
A database index contains
the column value and
pointers to the row that has
this value.
CREATE INDEX ON owl (employer_name);
employer_name    Pointer to
Hogwarts         Owl 1
Hogwarts         Owl 12
Hogwarts         Owl 23
…                …
Post office      Owl m
…                …
Ulule            Owl n
…                …
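A toy Python model of what the index above stores; illustration only (a real Postgres B-tree is an ordered on-disk structure, not a hash map):

# column value -> pointers to the rows holding that value
index = {
    'Hogwarts': ['Owl 1', 'Owl 12', 'Owl 23'],
    'Post office': ['Owl m'],
    'Ulule': ['Owl n'],
}
print(index['Ulule'])  # jump straight to the matching rows, no full table scan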
Index scan
Index Scan using owl_employer_name_idx on owl
(cost=0.29..8.30 rows=1 width=35) (actual
time=0.055..0.056 rows=1 loops=1)
Index Cond: ((employer_name)::text =
'Ulule'::text)
Planning time: 0.191 ms
Execution time: 0.109 ms
The index is visited row by row in order to
retrieve the data corresponding to your clause.
Index scan or sequential scan?
EXPLAIN SELECT * FROM owl
WHERE employer_name = 'post office';
QUERY PLAN
-------------------------------------------------
Seq Scan on owl (cost=0.00..205.03 rows=7001 width=35)
Filter: ((employer_name)::text = 'post office'::text)
With an index and a really common value!
7000/10 000 owls work at the post office
Why is it using a sequential scan?
An index scan follows the order of the index, so the disk's read
head has to jump between rows.
Moving the read head is about 1000 times slower than reading
the next physical block.
Conclusion: for common values it’s quicker to read all the
table's data in physical order
By the way… Retrieving 7000 rows might not be a great idea :).
Bitmap Heap Scan
EXPLAIN ANALYZE SELECT * FROM owl WHERE owl.employer_name = 'Hogwarts';
QUERY PLAN
-------------------------------------------------
Bitmap Heap Scan on owl (cost=23.50..128.50 rows=2000 width=35)
(actual time=1.524..4.081 rows=2000 loops=1)
Recheck Cond: ((employer_name)::text = 'Hogwarts'::text)
Heap Blocks: exact=79
-> Bitmap Index Scan on owl_employer_name_idx1 (cost=0.00..23.00
rows=2000 width=0) (actual time=1.465..1.465 rows=2000 loops=1)
Index Cond: ((employer_name)::text = 'Hogwarts'::text)
Planning time: 15.642 ms
Execution time: 5.309 ms
(7 rows)
With an index and a common value
2000 owls work at Hogwarts
Bitmap Heap Scan
- The tuple pointers coming from the index are ordered by physical location.
- The heap is read following that map, limiting physical jumps between rows.
Why a recheck condition? If the bitmap gets too big:
- the bitmap only keeps the pages containing matching rows,
- so each page is read and the condition is rechecked on its rows.
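A rough Python sketch of the idea; the function and the page layout are made up for illustration:

def bitmap_heap_scan(index_entries, pages, predicate):
    # 1. The bitmap index scan collects the pages containing matching rows.
    matching_pages = sorted({page_no for (page_no, _row) in index_entries})
    # 2. Each page is then read once, in physical order...
    for page_no in matching_pages:
        for row in pages[page_no]:
            if predicate(row):  # ... and the condition is rechecked on its rows
                yield row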
So we have 3 types of scan
1. Sequential scan
2. Index scan
3. Bitmap heap scan
And now let’s join stuff!
Nested loops
EXPLAIN ANALYZE SELECT * FROM owl JOIN job ON (job.id = owl.job_id)
WHERE job.id=1;
QUERY PLAN
-------------------------------------------------------------------
Nested Loop (cost=0.00..296.14 rows=9003 width=56)
(actual time=0.093..4.081 rows=9001 loops=1)
-> Seq Scan on job (cost=0.00..1.09 rows=1 width=21) (actual
time=0.064..0.064 rows=1 loops=1)
Filter: (id = 1)
Rows Removed by Filter: 6
-> Seq Scan on owl (cost=0.00..205.03 rows=9003 width=35)
(actual time=0.015..2.658 rows=9001 loops=1)
Filter: (job_id = 1)
Rows Removed by Filter: 1001
Planning time: 0.188 ms
Execution time: 4.757 ms
Nested loops
Python version
jobs = Job.objects.all()
owls = Owl.objects.all()
for owl in owls:
    for job in jobs:
        if owl.job_id == job.id:
            owl.job = job
            break
- Used for small tables
- Complexity of O(n*m)
Hash Join
EXPLAIN ANALYZE SELECT * FROM owl JOIN job ON (job.id = owl.job_id) WHERE job.id > 1;
QUERY PLAN
-------------------------------------------------------------------------------------------------
Hash Join (cost=1.17..318.70 rows=10001 width=56) (actual time=0.058..3.830 rows=1000 loops=1)
Hash Cond: (owl.job_id = job.id)
-> Seq Scan on owl (cost=0.00..180.01 rows=10001 width=35) (actual time=0.039..2.170
rows=10002 loops=1)
-> Hash (cost=1.09..1.09 rows=7 width=21) (actual time=0.010..0.010 rows=6 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 9kB
-> Seq Scan on job (cost=0.00..1.09 rows=7 width=21) (actual time=0.007..0.009 rows=6
loops=1)
Filter: (id > 1)
Rows Removed by Filter: 1
Planning time: 2.327 ms
Execution time: 3.905 ms
(10 rows)
Hash Join
Python version
jobs = Job.objects.all()
jobs_dict = {}
for job in jobs:
    jobs_dict[job.id] = job
owls = Owl.objects.all()
for owl in owls:
    owl.job = jobs_dict[owl.job_id]
- Used for small tables: the hash table has to fit in memory (you
wouldn’t build a Python dictionary with 1M rows ;))
- If the table is really small, a nested loop is used instead, because
of the cost of creating a hash table
Merge Join
EXPLAIN ANALYZE SELECT * FROM letters JOIN human ON (receiver_id = human.id) ORDER BY
letters.receiver_id;
QUERY PLAN
-------------------------------------------------------------
Merge Join (cost=55300.94..62558.10 rows=411861 width=51)
(actual time=264.207..502.215 rows=411861 loops=1)
Merge Cond: (human.id = letters.receiver_id)
-> Sort (cost=894.39..919.39 rows=10000 width=23) (actual time=3.191..4.413 rows=10000 loops=1)
Sort Key: human.id
Sort Method: quicksort Memory: 1166kB
-> Seq Scan on human (cost=0.00..230.00 rows=10000 width=23) (actual time=0.067..1.131
rows=10000 loops=1)
-> Materialize (cost=54406.35..56465.66 rows=411861 width=24) (actual time=261.009..414.465
rows=411861 loops=1)
-> Sort (cost=54406.35..55436.00 rows=411861 width=24) (actual time=261.008..369.777
rows=411861 loops=1)
Sort Key: letters.receiver_id
Sort Method: external merge Disk: 15232kB
-> Seq Scan on letters (cost=0.00..7547.61 rows=411861 width=24) (actual
time=0.010..37.253 rows=411861 loops=1)
Planning time: 0.205 ms
Execution time: 537.119 ms
(13 rows)
Merge Join - 2
Used for big tables, an index can be
used to avoid sorting
Both inputs sorted:
Human                  Letters
Lida (id = 1)          Letter (receiver_id = 1)
Mattie (id = 2)        Letter (receiver_id = 1)
Cameron (id = 3)       Letter (receiver_id = 1)
Carol (id = 4)         Letter (receiver_id = 2)
Maxie (id = 5)         Letter (receiver_id = 2)
Candy (id = 6)         Letter (receiver_id = 3)
…                      …
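The deck gives Python versions for the other joins, so here is a sketch of the merge join too (mine, not from the slides): sort both sides on the join key, then walk them in lockstep.

humans = sorted(humans, key=lambda h: h.id)
letters = sorted(letters, key=lambda l: l.receiver_id)

result, i = [], 0
for letter in letters:
    # advance the human cursor until it catches up with this letter
    while i < len(humans) and humans[i].id < letter.receiver_id:
        i += 1
    if i < len(humans) and humans[i].id == letter.receiver_id:
        result.append((letter, humans[i]))
# O(n + m) once both inputs are sorted; an index can provide the order for free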
So we have 3 types of joins
1. Nested loop
2. Hash join
3. Merge join
And now, ORDER BY
And now let’s order stuff…
EXPLAIN ANALYZE SELECT * FROM human ORDER BY last_name;
QUERY PLAN
-------------------------------------------------------------
Sort (cost=894.39..919.39 rows=10000 width=23)
(actual time=163.228..164.211 rows=10000 loops=1)
Sort Key: last_name
Sort Method: quicksort Memory: 1166kB
-> Seq Scan on human (cost=0.00..230.00 rows=10000 width=23)
(actual time=14.341..17.593 rows=10000 loops=1)
Planning time: 0.189 ms
Execution time: 164.702 ms
(6 rows)
Everything is sorted in memory
(which is why it can be costly in terms of memory)
ORDER BY LIMIT
EXPLAIN ANALYZE SELECT * FROM human
ORDER BY last_name LIMIT 3;
QUERY PLAN
---------------------------------------------------------------
Limit (cost=446.10..446.12 rows=10 width=23)
(actual time=11.942..11.944 rows=3 loops=1)
-> Sort (cost=446.10..471.10 rows=10000 width=23)
(actual time=11.942..11.942 rows=3 loops=1)
Sort Key: last_name
Sort Method: top-N heapsort Memory: 25kB
-> Seq Scan on human (cost=0.00..230.00 rows=10000
width=23) (actual time=0.074..0.947 rows=10000 loops=1)
Planning time: 0.074 ms
Execution time: 11.966 ms
(7 rows)
Like with quicksort, all the data has to be considered… so why is the memory usage so much smaller?
Top-N heap sort
- A heap (a sort of tree) with a bounded size is used
- For each row:
  - If the heap is not full: add the row to the heap
  - Else:
    - If the value is smaller than the current values (for ASC):
      insert the row into the heap, pop the last one
    - Else: skip the row
(a Python sketch follows)
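A minimal numeric sketch of the same idea, using a bounded max-heap (via negation) so memory stays proportional to the LIMIT, not to the table size:

import heapq

def top_n_heapsort(values, limit):
    heap = []  # max-heap (negated values) of the `limit` smallest values seen
    for v in values:
        if len(heap) < limit:
            heapq.heappush(heap, -v)     # heap not full: add the row
        elif v < -heap[0]:               # smaller than the worst value kept
            heapq.heapreplace(heap, -v)  # insert the row, pop the last
        # else: skip, v cannot be in the top `limit`
    return sorted(-x for x in heap)

print(top_n_heapsort([9, 2, 7, 1, 8, 3], 3))  # [1, 2, 3]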
Top-N heap sort
Example LIMIT 3
Human
id last_name
1 Potter
2 Bailey
3 Acosta
4 Weasley
5 Caroll
… …
Ordering with an index
EXPLAIN ANALYZE SELECT * FROM human ORDER BY last_name LIMIT 5;
QUERY PLAN
-------------------------------------------------------------
Limit (cost=0.29..0.70 rows=5 width=23)
(actual time=0.606..0.611 rows=5 loops=1)
-> Index Scan using human_last_name_idx on human
(cost=0.29..834.27 rows=10000 width=23)
(actual time=0.605..0.610 rows=5 loops=1)
Planning time: 0.860 ms
Execution time: 0.808 ms
(4 rows)
CREATE INDEX ON human (last_name);
The plan simply uses the index order.
Be careful when you ORDER BY!
1. Sorting on a sort key without a limit or an index can be
heavy in terms of memory!
2. You might need an index; only EXPLAIN will tell you
Let’s get Sirius!
An example a bit more complex
The ministry of magic wants to identify suspect humans.
To do that they ask their DBM (Database Magician) for
- The list of the letters sent by Voldemort
- For each we want the id and date of the reply (if it exists)
- All of this with the first and last name of the receiver
Expected result
first_name last_name receiver_id letter_id sent_at answer_id answer_sent_at
Leticia O'Takatakaer 812 577732 2010-03-05 20:18:00
Almeda Steinkingjacer 899 577735 2010-03-09 11:44:00
Sherita Butchchterjac 1444 577744 2010-03-18 09:11:00
Devon Jactakadurmc 1812 577760 2010-04-02 14:38:00
Mariah Mctakachenmc 2000 577733 2010-03-06 18:00:00
Lupita Duro'Otosan 2332 577751 2010-03-24 18:04:00
Chloe Royo'Erfür 2464 577737 2010-03-10 14:46:00
Agustin Erillsancht 2668 577752 2010-03-25 22:57:00
Hiram Ibnchensteinill 2689 577746 2010-03-19 13:55:00
Jayme Erersteinvur 2856 577766 2010-04-08 23:03:00 578834 2010-6-16 6:26:36
The Python version
letters_from_voldemort = (Letters.objects.filter(sender_id=3267)
                          .select_related('receiver').order_by('sent_at'))
letters_to_voldemort = Letters.objects.filter(receiver_id=3267).order_by('sent_at')
data = []
for letter in letters_from_voldemort:
    for answer in letters_to_voldemort:
        if letter.receiver_id == answer.sender_id and letter.sent_at < answer.sent_at:
            data.append([letter.receiver.first_name, letter.receiver.last_name,
                         letter.receiver.id, letter.id, letter.sent_at,
                         answer.id, answer.sent_at])
            break
    else:
        data.append([letter.receiver.first_name, letter.receiver.last_name,
                     letter.receiver.id, letter.id, letter.sent_at])
Takes about 1540 ms
Why not have fun with SQL?
SELECT human.first_name, human.last_name, receiver_id, letter_id,
sent_at, answer_id, answer_sent_at
FROM (
SELECT
id as letter_id, receiver_id, sent_at, sender_id
FROM letters
WHERE
sender_id=3267
ORDER BY sent_at
) l1 LEFT JOIN LATERAL (
SELECT
id as answer_id,
sent_at as answer_sent_at
FROM letters
WHERE
sender_id = l1.receiver_id
AND sent_at > l1.sent_at
AND receiver_id = l1.sender_id
LIMIT 1
) l2 ON true JOIN human ON (human.id=receiver_id)
Oh well, it takes 48 seconds…
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------
Nested Loop Left Join (cost=8579.33..434310.73 rows=40 width=47) (actual time=99.106..48459.498 rows=1067
loops=1)
-> Hash Join (cost=8579.33..8847.23 rows=40 width=39) (actual time=53.903..60.985 rows=1067 loops=1)
Hash Cond: (human.id = l1.receiver_id)
-> Seq Scan on human (cost=0.00..230.00 rows=10000 width=23) (actual time=0.060..1.303 rows=10000
loops=1)
-> Hash (cost=8578.83..8578.83 rows=40 width=20) (actual time=53.828..53.828 rows=1067 loops=1)
Buckets: 2048 (originally 1024) Batches: 1 (originally 1) Memory Usage: 71kB
-> Subquery Scan on l1 (cost=8578.33..8578.83 rows=40 width=20) (actual time=53.377..53.580
rows=1067 loops=1)
-> Sort (cost=8578.33..8578.43 rows=40 width=20) (actual time=53.376..53.439 rows=1067
loops=1)
Sort Key: letters.sent_at
Sort Method: quicksort Memory: 132kB
-> Seq Scan on letters (cost=0.00..8577.26 rows=40 width=20) (actual
time=0.939..53.127 rows=1067 loops=1)
Filter: (sender_id = 3267)
Rows Removed by Filter: 410794
-> Limit (cost=0.00..10636.57 rows=1 width=12) (actual time=45.356..45.356 rows=0 loops=1067)
-> Seq Scan on letters letters_1 (cost=0.00..10636.57 rows=1 width=12) (actual time=45.346..45.346
rows=0 loops=1067)
Filter: ((sent_at > l1.sent_at) AND (sender_id = l1.receiver_id) AND (receiver_id = l1.sender_id))
Rows Removed by Filter: 410896
Planning time: 0.859 ms
Execution time: 48460.225 ms
(19 rows)
What is my problem?
-> Sort (cost=8578.33..8578.43 rows=40 width=20) (actual
time=53.376..53.439 rows=1067 loops=1)
Sort Key: letters.sent_at
Sort Method: quicksort Memory: 132kB
-> Seq Scan on letters (cost=0.00..8577.26 rows=40
width=20) (actual time=0.939..53.127 rows=1067 loops=1)
Filter: (sender_id = 3267)
Rows Removed by Filter: 410794
-> Limit (cost=0.00..10636.57 rows=1 width=12) (actual
time=45.356..45.356 rows=0 loops=1067)
-> Seq Scan on letters letters_1 (cost=0.00..10636.57
rows=1 width=12) (actual time=45.346..45.346 rows=0
loops=1067)
Filter: ((sent_at > l1.sent_at) AND (sender_id =
l1.receiver_id) AND (receiver_id = l1.sender_id))
Rows Removed by Filter: 410896
The query planner is using a sequential scan to filter almost our entire table…
So clearly we are missing some indexes on the columns receiver_id, sender_id and sent_at.
Let’s create the following index
CREATE INDEX ON letters (sender_id, sent_at, receiver_id);
Quick reminder on multi-column indexes
sender_id   sent_at                receiver_id   Pointer to
1           2010-03-05 20:18:00    2             letter (id=1)
2           2010-03-09 11:44:00    7             letter (id=11)
3           2010-04-02 14:38:00    9             letter (id=16)
4           2010-03-05 20:18:00    3             letter (id=2)
5           2015-03-05 20:18:00    1             …
6           2010-01-05 20:18:00    4             …
…           …                      …             …
Multi-column indexes
The first column of the index is ordered
The index will be used for
SELECT … FROM letters WHERE sender_id=…;
The same is true for
SELECT … FROM letters WHERE sender_id=…
AND sent_at = …;
The order of the columns matters!
(taken alone, the receiver_id column of the index is not ordered)
In our previous query we had
SELECT … FROM letters WHERE sender_id=3267 ORDER BY sent_at
And then in the LATERAL JOIN
SELECT … FROM letters WHERE sender_id = l1.receiver_id
AND sent_at > l1.sent_at
AND receiver_id = l1.sender_id
The index won’t be used for
SELECT … FROM letters WHERE receiver_id=…;
The index (sender_id, sent_at, receiver_id) works fine
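A toy analogy for why column order matters, treating the index as a plain Python list sorted on (sender_id, sent_at, receiver_id); illustration only, assuming letter objects as above:

import bisect

keys = sorted((l.sender_id, l.sent_at, l.receiver_id) for l in letters)

# Prefix lookups on the leading column(s) are a cheap binary search:
lo = bisect.bisect_left(keys, (3267,))
hi = bisect.bisect_left(keys, (3268,))
from_voldemort = keys[lo:hi]  # already ordered by sent_at: no extra sort needed

# But a WHERE on receiver_id alone can't use this order:
# every tuple would have to be checked, like a sequential scan.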
Better, no? 180 times faster than Python
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------
Nested Loop Left Join (cost=154.76..767.34 rows=40 width=47) (actual time=1.092..7.694 rows=1067 loops=1)
-> Hash Join (cost=154.34..422.24 rows=40 width=39) (actual time=1.080..3.471 rows=1067 loops=1)
Hash Cond: (human.id = l1.receiver_id)
-> Seq Scan on human (cost=0.00..230.00 rows=10000 width=23) (actual time=0.076..0.797 rows=10000 loops=1)
-> Hash (cost=153.84..153.84 rows=40 width=20) (actual time=0.992..0.992 rows=1067 loops=1)
Buckets: 2048 (originally 1024) Batches: 1 (originally 1) Memory Usage: 71kB
-> Subquery Scan on l1 (cost=153.34..153.84 rows=40 width=20) (actual time=0.520..0.813 rows=1067 loops=1)
-> Sort (cost=153.34..153.44 rows=40 width=20) (actual time=0.520..0.630 rows=1067 loops=1)
Sort Key: letters.sent_at
Sort Method: quicksort Memory: 132kB
-> Bitmap Heap Scan on letters (cost=4.73..152.27 rows=40 width=20) (actual time=0.089..0.304
rows=1067 loops=1)
Recheck Cond: (sender_id = 3267)
Heap Blocks: exact=74
-> Bitmap Index Scan on letters_sender_id_sent_at_receiver_id_idx (cost=0.00..4.72
rows=40 width=0) (actual time=0.079..0.079 rows=1067 loops=1)
Index Cond: (sender_id = 3267)
-> Limit (cost=0.42..8.61 rows=1 width=12) (actual time=0.004..0.004 rows=0 loops=1067)
-> Index Scan using letters_sender_id_sent_at_receiver_id_idx on letters letters_1 (cost=0.42..8.61 rows=1
width=12) (actual time=0.003..0.003 rows=0 loops=1067)
Index Cond: ((sender_id = l1.receiver_id) AND (sent_at > l1.sent_at) AND (receiver_id = l1.sender_id))
Planning time: 0.565 ms
Execution time: 7.804 ms
(20 rows)
Better, no?
-> Bitmap Heap Scan on letters (cost=4.73..152.27 rows=40 width=20) (actual time=0.089..0.304 rows=1067 loops=1)
Recheck Cond: (sender_id = 3267)
Heap Blocks: exact=74
-> Bitmap Index Scan on letters_sender_id_sent_at_receiver_id_idx (cost=0.00..4.72 rows=40 width=0) (actual
time=0.079..0.079 rows=1067 loops=1)
Index Cond: (sender_id = 3267)
-> Limit (cost=0.42..8.61 rows=1 width=12) (actual time=0.004..0.004 rows=0 loops=1067)
-> Index Scan using letters_sender_id_sent_at_receiver_id_idx on letters letters_1 (cost=0.42..8.61 rows=1 width=12)
(actual time=0.003..0.003 rows=0 loops=1067)
Index Cond: ((sender_id = l1.receiver_id) AND (sent_at > l1.sent_at) AND (receiver_id = l1.sender_id))
The index is used where there
were sequential scans.
Focusing on this hash join…
-> Hash Join (cost=154.34..422.24 rows=40 width=39) (actual time=1.080..3.471 rows=1067 loops=1)
Hash Cond: (human.id = l1.receiver_id)
-> Seq Scan on human (cost=0.00..230.00 rows=10000 width=23) (actual time=0.076..0.797 rows=10000 loops=1)
-> Hash (cost=153.84..153.84 rows=40 width=20) (actual time=0.992..0.992 rows=1067 loops=1)
Buckets: 2048 (originally 1024) Batches: 1 (originally 1) Memory Usage: 71kB
-> Subquery Scan on l1 (cost=153.34..153.84 rows=40 width=20) (actual time=0.520..0.813 rows=1067 loops=1)
A hash is built from the subquery (1067 rows);
the human table, with its 10 000 rows, is then scanned and each
human.id is probed against the hash table keyed on receiver_id.
Let’s use pagination…
SELECT human.first_name, human.last_name, receiver_id, letter_id,
sent_at, answer_id, answer_sent_at
FROM (
SELECT
id as letter_id, receiver_id, sent_at, sender_id
FROM letters
WHERE
sender_id=3267 AND sent_at > '2010-03-22'
ORDER BY sent_at LIMIT 20
) l1 LEFT JOIN LATERAL (
SELECT
id as answer_id,
sent_at as answer_sent_at
FROM letters
WHERE
sender_id = l1.receiver_id
AND sent_at > l1.sent_at
AND receiver_id = l1.sender_id
LIMIT 1
) l2 ON true JOIN human ON (human.id=receiver_id)
Let’s use pagination…
QUERY PLAN
----------------------------------------------------------------------------------------------------------
Nested Loop (cost=1.13..417.82 rows=20 width=47) (actual time=0.049..0.365 rows=20 loops=1)
-> Nested Loop Left Join (cost=0.84..255.57 rows=20 width=28) (actual time=0.040..0.231 rows=20
loops=1)
-> Limit (cost=0.42..82.82 rows=20 width=20) (actual time=0.022..0.038 rows=20 loops=1)
-> Index Scan using letters_sender_id_sent_at_receiver_id_idx on letters
(cost=0.42..165.22 rows=40 width=20) (actual time=0.020..0.035 rows=20 loops=1)
Index Cond: ((sender_id = 3267) AND (sent_at > '2010-03-22 00:00:00+01'::timestamp
with time zone))
-> Limit (cost=0.42..8.61 rows=1 width=12) (actual time=0.009..0.009 rows=0 loops=20)
-> Index Scan using letters_sender_id_sent_at_receiver_id_idx on letters letters_1
(cost=0.42..8.61 rows=1 width=12) (actual time=0.008..0.008 rows=0 loops=20)
Index Cond: ((sender_id = letters.receiver_id) AND (sent_at > letters.sent_at) AND
(receiver_id = letters.sender_id))
-> Index Scan using humain_pkey on human (cost=0.29..8.10 rows=1 width=23) (actual time=0.006..0.006
rows=1 loops=20)
Index Cond: (id = letters.receiver_id)
Planning time: 0.709 ms
Execution time: 0.436 ms
(12 rows)
Conclusion
Thank you for your attention!
Any questions?
Owly design: zimmoriarty (https://www.instagram.com/zimmoriarty/)
To go further - sources
https://momjian.us/main/writings/pgsql/optimizer.pdf
https://use-the-index-luke.com/sql/plans-dexecution/postgresql/operations
http://tech.novapost.fr/postgresql-application_name-django-settings.html
Slides made with the owls and paintings from
https://www.instagram.com/zimmoriarty

Contenu connexe

Tendances

Palestra sobre Collections com Python
Palestra sobre Collections com PythonPalestra sobre Collections com Python
Palestra sobre Collections com Pythonpugpe
 
[131]해커의 관점에서 바라보기
[131]해커의 관점에서 바라보기[131]해커의 관점에서 바라보기
[131]해커의 관점에서 바라보기NAVER D2
 
The Ring programming language version 1.5.2 book - Part 45 of 181
The Ring programming language version 1.5.2 book - Part 45 of 181The Ring programming language version 1.5.2 book - Part 45 of 181
The Ring programming language version 1.5.2 book - Part 45 of 181Mahmoud Samir Fayed
 
yt: An Analysis and Visualization System for Astrophysical Simulation Data
yt: An Analysis and Visualization System for Astrophysical Simulation Datayt: An Analysis and Visualization System for Astrophysical Simulation Data
yt: An Analysis and Visualization System for Astrophysical Simulation DataJohn ZuHone
 
Clustering com numpy e cython
Clustering com numpy e cythonClustering com numpy e cython
Clustering com numpy e cythonAnderson Dantas
 
High performance GPU computing with Ruby RubyConf 2017
High performance GPU computing with Ruby  RubyConf 2017High performance GPU computing with Ruby  RubyConf 2017
High performance GPU computing with Ruby RubyConf 2017Prasun Anand
 
The Ring programming language version 1.10 book - Part 56 of 212
The Ring programming language version 1.10 book - Part 56 of 212The Ring programming language version 1.10 book - Part 56 of 212
The Ring programming language version 1.10 book - Part 56 of 212Mahmoud Samir Fayed
 
Highlight Utility Styles
Highlight Utility StylesHighlight Utility Styles
Highlight Utility StylesAngela Byron
 
Elixir & Phoenix – fast, concurrent and explicit
Elixir & Phoenix – fast, concurrent and explicitElixir & Phoenix – fast, concurrent and explicit
Elixir & Phoenix – fast, concurrent and explicitTobias Pfeiffer
 
Elixir & Phoenix – fast, concurrent and explicit
Elixir & Phoenix – fast, concurrent and explicitElixir & Phoenix – fast, concurrent and explicit
Elixir & Phoenix – fast, concurrent and explicitTobias Pfeiffer
 
30 分鐘學會實作 Python Feature Selection
30 分鐘學會實作 Python Feature Selection30 分鐘學會實作 Python Feature Selection
30 分鐘學會實作 Python Feature SelectionJames Huang
 
Dive into EXPLAIN - PostgreSql
Dive into EXPLAIN  - PostgreSqlDive into EXPLAIN  - PostgreSql
Dive into EXPLAIN - PostgreSqlDmytro Shylovskyi
 
[Droid knights 2019] Tensorflow Lite 부터 ML Kit, Mobile GPU 활용 까지
[Droid knights 2019] Tensorflow Lite 부터 ML Kit, Mobile GPU 활용 까지[Droid knights 2019] Tensorflow Lite 부터 ML Kit, Mobile GPU 활용 까지
[Droid knights 2019] Tensorflow Lite 부터 ML Kit, Mobile GPU 활용 까지Jeongah Shin
 
Advanced Python, Part 2
Advanced Python, Part 2Advanced Python, Part 2
Advanced Python, Part 2Zaar Hai
 

Tendances (20)

Palestra sobre Collections com Python
Palestra sobre Collections com PythonPalestra sobre Collections com Python
Palestra sobre Collections com Python
 
[131]해커의 관점에서 바라보기
[131]해커의 관점에서 바라보기[131]해커의 관점에서 바라보기
[131]해커의 관점에서 바라보기
 
The Ring programming language version 1.5.2 book - Part 45 of 181
The Ring programming language version 1.5.2 book - Part 45 of 181The Ring programming language version 1.5.2 book - Part 45 of 181
The Ring programming language version 1.5.2 book - Part 45 of 181
 
R intro 20140716-advance
R intro 20140716-advanceR intro 20140716-advance
R intro 20140716-advance
 
PythonOOP
PythonOOPPythonOOP
PythonOOP
 
A tour of Python
A tour of PythonA tour of Python
A tour of Python
 
yt: An Analysis and Visualization System for Astrophysical Simulation Data
yt: An Analysis and Visualization System for Astrophysical Simulation Datayt: An Analysis and Visualization System for Astrophysical Simulation Data
yt: An Analysis and Visualization System for Astrophysical Simulation Data
 
Clustering com numpy e cython
Clustering com numpy e cythonClustering com numpy e cython
Clustering com numpy e cython
 
Statistical computing 01
Statistical computing 01Statistical computing 01
Statistical computing 01
 
Docopt
DocoptDocopt
Docopt
 
High performance GPU computing with Ruby RubyConf 2017
High performance GPU computing with Ruby  RubyConf 2017High performance GPU computing with Ruby  RubyConf 2017
High performance GPU computing with Ruby RubyConf 2017
 
The Ring programming language version 1.10 book - Part 56 of 212
The Ring programming language version 1.10 book - Part 56 of 212The Ring programming language version 1.10 book - Part 56 of 212
The Ring programming language version 1.10 book - Part 56 of 212
 
Highlight Utility Styles
Highlight Utility StylesHighlight Utility Styles
Highlight Utility Styles
 
Leaks & Zombies
Leaks & ZombiesLeaks & Zombies
Leaks & Zombies
 
Elixir & Phoenix – fast, concurrent and explicit
Elixir & Phoenix – fast, concurrent and explicitElixir & Phoenix – fast, concurrent and explicit
Elixir & Phoenix – fast, concurrent and explicit
 
Elixir & Phoenix – fast, concurrent and explicit
Elixir & Phoenix – fast, concurrent and explicitElixir & Phoenix – fast, concurrent and explicit
Elixir & Phoenix – fast, concurrent and explicit
 
30 分鐘學會實作 Python Feature Selection
30 分鐘學會實作 Python Feature Selection30 分鐘學會實作 Python Feature Selection
30 分鐘學會實作 Python Feature Selection
 
Dive into EXPLAIN - PostgreSql
Dive into EXPLAIN  - PostgreSqlDive into EXPLAIN  - PostgreSql
Dive into EXPLAIN - PostgreSql
 
[Droid knights 2019] Tensorflow Lite 부터 ML Kit, Mobile GPU 활용 까지
[Droid knights 2019] Tensorflow Lite 부터 ML Kit, Mobile GPU 활용 까지[Droid knights 2019] Tensorflow Lite 부터 ML Kit, Mobile GPU 활용 까지
[Droid knights 2019] Tensorflow Lite 부터 ML Kit, Mobile GPU 활용 까지
 
Advanced Python, Part 2
Advanced Python, Part 2Advanced Python, Part 2
Advanced Python, Part 2
 

Similaire à Becoming a better developer with EXPLAIN

The amazing world behind your ORM
The amazing world behind your ORMThe amazing world behind your ORM
The amazing world behind your ORMLouise Grandjonc
 
Postgres can do THAT?
Postgres can do THAT?Postgres can do THAT?
Postgres can do THAT?alexbrasetvik
 
A Deeper Dive into EXPLAIN
A Deeper Dive into EXPLAINA Deeper Dive into EXPLAIN
A Deeper Dive into EXPLAINEDB
 
Effective Numerical Computation in NumPy and SciPy
Effective Numerical Computation in NumPy and SciPyEffective Numerical Computation in NumPy and SciPy
Effective Numerical Computation in NumPy and SciPyKimikazu Kato
 
Profiling and optimization
Profiling and optimizationProfiling and optimization
Profiling and optimizationg3_nittala
 
Pygrunn 2012 down the rabbit - profiling in python
Pygrunn 2012   down the rabbit - profiling in pythonPygrunn 2012   down the rabbit - profiling in python
Pygrunn 2012 down the rabbit - profiling in pythonRemco Wendt
 
PyCon 2010 SQLAlchemy tutorial
PyCon 2010 SQLAlchemy tutorialPyCon 2010 SQLAlchemy tutorial
PyCon 2010 SQLAlchemy tutorialjbellis
 
Problem 1 Show the comparison of runtime of linear search and binar.pdf
Problem 1 Show the comparison of runtime of linear search and binar.pdfProblem 1 Show the comparison of runtime of linear search and binar.pdf
Problem 1 Show the comparison of runtime of linear search and binar.pdfebrahimbadushata00
 
Demystifying Oak Search
Demystifying Oak SearchDemystifying Oak Search
Demystifying Oak SearchJustin Edelson
 
Postgres performance for humans
Postgres performance for humansPostgres performance for humans
Postgres performance for humansCraig Kerstiens
 
JDD 2016 - Tomasz Borek - DB for next project? Why, Postgres, of course
JDD 2016 - Tomasz Borek - DB for next project? Why, Postgres, of course JDD 2016 - Tomasz Borek - DB for next project? Why, Postgres, of course
JDD 2016 - Tomasz Borek - DB for next project? Why, Postgres, of course PROIDEA
 
Down the rabbit hole, profiling in Django
Down the rabbit hole, profiling in DjangoDown the rabbit hole, profiling in Django
Down the rabbit hole, profiling in DjangoRemco Wendt
 
2018 db-rainer schuettengruber-beating-oracles_optimizer_at_its_own_game-pres...
2018 db-rainer schuettengruber-beating-oracles_optimizer_at_its_own_game-pres...2018 db-rainer schuettengruber-beating-oracles_optimizer_at_its_own_game-pres...
2018 db-rainer schuettengruber-beating-oracles_optimizer_at_its_own_game-pres...Rainer Schuettengruber
 
Python高级编程(二)
Python高级编程(二)Python高级编程(二)
Python高级编程(二)Qiangning Hong
 
Gotcha! Ruby things that will come back to bite you.
Gotcha! Ruby things that will come back to bite you.Gotcha! Ruby things that will come back to bite you.
Gotcha! Ruby things that will come back to bite you.David Tollmyr
 

Similaire à Becoming a better developer with EXPLAIN (20)

The amazing world behind your ORM
The amazing world behind your ORMThe amazing world behind your ORM
The amazing world behind your ORM
 
Conf orm - explain
Conf orm - explainConf orm - explain
Conf orm - explain
 
Postgres can do THAT?
Postgres can do THAT?Postgres can do THAT?
Postgres can do THAT?
 
A Deeper Dive into EXPLAIN
A Deeper Dive into EXPLAINA Deeper Dive into EXPLAIN
A Deeper Dive into EXPLAIN
 
Effective Numerical Computation in NumPy and SciPy
Effective Numerical Computation in NumPy and SciPyEffective Numerical Computation in NumPy and SciPy
Effective Numerical Computation in NumPy and SciPy
 
Profiling and optimization
Profiling and optimizationProfiling and optimization
Profiling and optimization
 
Pygrunn 2012 down the rabbit - profiling in python
Pygrunn 2012   down the rabbit - profiling in pythonPygrunn 2012   down the rabbit - profiling in python
Pygrunn 2012 down the rabbit - profiling in python
 
PyCon 2010 SQLAlchemy tutorial
PyCon 2010 SQLAlchemy tutorialPyCon 2010 SQLAlchemy tutorial
PyCon 2010 SQLAlchemy tutorial
 
Problem 1 Show the comparison of runtime of linear search and binar.pdf
Problem 1 Show the comparison of runtime of linear search and binar.pdfProblem 1 Show the comparison of runtime of linear search and binar.pdf
Problem 1 Show the comparison of runtime of linear search and binar.pdf
 
Demystifying Oak Search
Demystifying Oak SearchDemystifying Oak Search
Demystifying Oak Search
 
EVOLVE'14 | Enhance | Justin Edelson & Darin Kuntze | Demystifying Oak Search
EVOLVE'14 | Enhance | Justin Edelson & Darin Kuntze | Demystifying Oak SearchEVOLVE'14 | Enhance | Justin Edelson & Darin Kuntze | Demystifying Oak Search
EVOLVE'14 | Enhance | Justin Edelson & Darin Kuntze | Demystifying Oak Search
 
Postgres performance for humans
Postgres performance for humansPostgres performance for humans
Postgres performance for humans
 
JDD 2016 - Tomasz Borek - DB for next project? Why, Postgres, of course
JDD 2016 - Tomasz Borek - DB for next project? Why, Postgres, of course JDD 2016 - Tomasz Borek - DB for next project? Why, Postgres, of course
JDD 2016 - Tomasz Borek - DB for next project? Why, Postgres, of course
 
Down the rabbit hole, profiling in Django
Down the rabbit hole, profiling in DjangoDown the rabbit hole, profiling in Django
Down the rabbit hole, profiling in Django
 
PostgreSQL: Advanced indexing
PostgreSQL: Advanced indexingPostgreSQL: Advanced indexing
PostgreSQL: Advanced indexing
 
2018 db-rainer schuettengruber-beating-oracles_optimizer_at_its_own_game-pres...
2018 db-rainer schuettengruber-beating-oracles_optimizer_at_its_own_game-pres...2018 db-rainer schuettengruber-beating-oracles_optimizer_at_its_own_game-pres...
2018 db-rainer schuettengruber-beating-oracles_optimizer_at_its_own_game-pres...
 
7li7w devcon5
7li7w devcon57li7w devcon5
7li7w devcon5
 
Python高级编程(二)
Python高级编程(二)Python高级编程(二)
Python高级编程(二)
 
Writing Faster Python 3
Writing Faster Python 3Writing Faster Python 3
Writing Faster Python 3
 
Gotcha! Ruby things that will come back to bite you.
Gotcha! Ruby things that will come back to bite you.Gotcha! Ruby things that will come back to bite you.
Gotcha! Ruby things that will come back to bite you.
 

Plus de Louise Grandjonc

Plus de Louise Grandjonc (6)

Postgres index types
Postgres index typesPostgres index types
Postgres index types
 
Amazing SQL your django ORM can or can't do
Amazing SQL your django ORM can or can't doAmazing SQL your django ORM can or can't do
Amazing SQL your django ORM can or can't do
 
Croco talk pgconfeu
Croco talk pgconfeuCroco talk pgconfeu
Croco talk pgconfeu
 
Indexes in postgres
Indexes in postgresIndexes in postgres
Indexes in postgres
 
Pg exercices
Pg exercicesPg exercices
Pg exercices
 
Meetup pg recherche fulltext ES -> PG
Meetup pg recherche fulltext ES -> PGMeetup pg recherche fulltext ES -> PG
Meetup pg recherche fulltext ES -> PG
 

Dernier

Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsJoaquim Jorge
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?Igalia
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Enterprise Knowledge
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?Antenna Manufacturer Coco
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Servicegiselly40
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CVKhem
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxKatpro Technologies
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 

Dernier (20)

Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 

Becoming a better developer with EXPLAIN

  • 1. Becoming a better developer with explain Understanding Postgres Query planner Louise Grandjonc
  • 2. Louise Grandjonc (louise@ulule.com) Lead developer at Ulule (www.ulule.com) Python / Django developer - Postgres enthusiast @louisemeta on twitter About me
  • 3. 1. Can understanding the query plan help me? 2. What if I’m using an ORM? 3. So what does my database do when I query stuff? Today’s agenda
  • 4. Can understanding the query plan help me?
  • 5. The mystery of querying stuff You Query SELECT … FROM … id name 1 Louise 2 Alfred … … Result
  • 6. Zooming into it SQL Query SELECT … FROM … id name 1 Louise 2 Alfred … … Result Parse Planner Optimizer Execute - Generates execution plans for a query - Calculates the cost of each plan. - The best one is used to execute your query
  • 7. - Understand why is your filter / join / order slow. - Know if the new index you just added is used. - Stop guessing which index can be useful In conclusion: You will understand why your query is slow So what can I learn from the query plan
  • 8. What if I’m using an ORM ?
  • 9. But why can’t we trust it ? 1.The ORM executes queries that you might not expect 2.Your queries might not be optimised and you won’t know about it
  • 10. The story of the owls Owl RecipientRecipientJob Letters Human * 1 1* 1* id sender_id receiver_id sent_at delivered_by id first_name last_name id name employer_name feather_color favourite_food id name 10 002 owls 10 000 humans 411861 letters
  • 11. Loops (with django) SELECT id, name, employer_name, favourite_food, job_id, feather_color FROM owl WHERE employer_name = 'Hogwarts' SELECT id, name FROM job WHERE id = 1 SELECT id, name FROM job WHERE id = 1 SELECT id, name FROM job WHERE id = 1 SELECT id, name FROM job WHERE id = 1 SELECT id, name FROM job WHERE id = 1 SELECT id, name FROM job WHERE id = 1 SELECT id, name FROM job WHERE id = 1 SELECT id, name FROM job WHERE id = 1 … owls = Owl.objects.filter(employer_name=‘Hogwarts’) for owl in owls: print(owl.job) # 1 query per loop iteration Awesome right ?
  • 12. Loops (with django) Owl.objects.filter(employer_name=‘Ulule’) .select_related(‘job’) SELECT … FROM "owl" LEFT OUTER JOIN "job" ON ("owl"."job_id" = "job"."id") WHERE "owl"."employer_name" = 'Ulule' SELECT … FROM "owl" WHERE "owl"."employer_name" = 'Ulule' SELECT … FROM "job" WHERE "job"."id" IN (2) Owl.objects.filter(employer_name=‘Ulule’) .prefetch_related(‘job’)
  • 13. Where are my logs? owl_conference=# show log_directory ; log_directory --------------- pg_log owl_conference=# show data_directory ; data_directory ------------------------- /usr/local/var/postgres owl_conference=# show log_filename ; log_filename ------------------------- postgresql-%Y-%m-%d.log Terminal command $ psql -U user -d your_database_name psql interface
  • 14. Having good looking logs (and logging everything like a crazy owl) owl_conference=# SHOW config_file; config_file ----------------------------------------- /usr/local/var/postgres/postgresql.conf (1 row) In your postgresql.conf log_filename = 'postgresql-%Y-%m-%d.log' log_statement = 'all' logging_collector = on log_min_duration_statement = 0
  • 15. Finding dirty queries pg_stat_statements In your psql CREATE EXTENSION pg_stat_statements; The module must be added to your shared_preload_libraries. You have to change your postgresql.conf and restart. shared_preload_libraries = 'pg_stat_statements' pg_stat_statements.max = 10000 pg_stat_statements.track = all
  • 16. Finding the painful queries SELECT total_time, min_time, max_time, mean_time, calls, query FROM pg_stat_statements ORDER BY mean_time DESC LIMIT 100; -[ RECORD 6 ]--------------------------------------------------------- total_time | 519.491 min_time | 519.491 max_time | 519.491 mean_time | 519.491 calls | 1 query | SELECT COUNT(*) FROM letters;
  • 17. What is my database doing when I query?
  • 18. Let’s talk about EXPLAIN !
  • 19. What is EXPLAIN Gives you the execution plan chosen by the query planner that your database will use to execute your SQL statement Using ANALYZE will actually execute your query! (Don’t worry, you can ROLLBACK) EXPLAIN (ANALYZE) my super query; BEGIN; EXPLAIN ANALYZE UPDATE owl SET … WHERE …; ROLLBACK;
  • 20. So, what does it took like ? EXPLAIN ANALYZE SELECT * FROM owl WHERE employer_name=‘Ulule'; QUERY PLAN ------------------------------------- Seq Scan on owl (cost=0.00..205.01 rows=1 width=35) (actual time=1.945..1.946 rows=1 loops=1) Filter: ((employer_name)::text = 'Ulule'::text) Rows Removed by Filter: 10001 Planning time: 0.080 ms Execution time: 1.965 ms (5 rows)
  • 21. Let’s go step by step ! .. 1 Costs (cost=0.00..205.01 rows=1 width=35) Cost of retrieving all rows Number of rows returned Cost of retrieving first row Average width of a row (in bytes) (actual time=1.945..1.946 rows=1 loops=1) If you use ANALYZE Number of time your seq scan (index scan etc.) was executed
  • 22. Sequential Scan Seq Scan on owl ... Filter: ((employer_name)::text = 'Ulule'::text) Rows Removed by Filter: 10001 - Scan the entire database table. - Retrieves the rows matching your WHERE. It can be expensive ! Would an index make this query faster ?
  • 23. What is an index then? In an encyclopaedia, if you want to read about owls, you don’t read the entire book, you go to the index first! A database index contains the column value and pointers to the row that has this value. CREATE INDEX ON owl (employer_name); employer_name Hogwarts Hogwarts Hogwarts … Post office … Ulule … Pointer to Owl 1 Owl 12 Owl  23 … Owl m … Owl n …
  • 24. Index scan Index Scan using owl_employer_name_idx on owl (cost=0.29..8.30 rows=1 width=35) (actual time=0.055..0.056 rows=1 loops=1) Index Cond: ((employer_name)::text = 'Ulule'::text) Planning time: 0.191 ms Execution time: 0.109 ms The index is visited row by row in order to retrieve the data corresponding to your clause.
  • 25. Index scan or sequential scan? EXPLAIN SELECT * FROM owl WHERE employer_name = 'post office’; QUERY PLAN ------------------------------------------------- Seq Scan on owl (cost=0.00..205.03 rows=7001 width=35) Filter: ((employer_name)::text = 'post office'::text) With an index and a really common value ! 7000/10 000 owls work at the post office
  • 26. Why is it using a sequential scan? An index scan uses the order of the index, the head has to move between rows. Moving the read head of the database is 1000 times slower than reading the next physical block. Conclusion: For common values it’s quicker to read all data from the table in physical order By the way… Retrieving 7000 rows might not be a great idea :).
  • 27. Bitmap Heap Scan EXPLAIN ANALYZE SELECT * FROM owl WHERE owl.employer_name = 'Hogwarts'; QUERY PLAN ------------------------------------------------- Bitmap Heap Scan on owl (cost=23.50..128.50 rows=2000 width=35) (actual time=1.524..4.081 rows=2000 loops=1) Recheck Cond: ((employer_name)::text = 'Hogwarts'::text) Heap Blocks: exact=79 -> Bitmap Index Scan on owl_employer_name_idx1 (cost=0.00..23.00 rows=2000 width=0) (actual time=1.465..1.465 rows=2000 loops=1) Index Cond: ((employer_name)::text = 'Hogwarts'::text) Planning time: 15.642 ms Execution time: 5.309 ms (7 rows) With an index and a common value 2000 owls work at Hogwarts
  • 28. Bitmap Heap Scan - The tuple-pointer from index are ordered by physical memory - Go through the map. Limit physical jumps between rows. Recheck condition ? If the bitmap is too big - The bitmap contains the pages where there are rows - Goes through the pages, rechecks the condition
  • 29. So we have 3 types of scan 1. Sequential scan 2. Index scan 3. Bitmap heap scan And now let’s join stuff !
  • 30. And now let’s join ! Nested loops EXPLAIN ANALYZE SELECT * FROM owl JOIN job ON (job.id = owl.job_id) WHERE job.id=1; QUERY PLAN ------------------------------------------------------------------- Nested Loop (cost=0.00..296.14 rows=9003 width=56) (actual time=0.093..4.081 rows=9001 loops=1) -> Seq Scan on job (cost=0.00..1.09 rows=1 width=21) (actual time=0.064..0.064 rows=1 loops=1) Filter: (id = 1) Rows Removed by Filter: 6 -> Seq Scan on owl (cost=0.00..205.03 rows=9003 width=35) (actual time=0.015..2.658 rows=9001 loops=1) Filter: (job_id = 1) Rows Removed by Filter: 1001 Planning time: 0.188 ms Execution time: 4.757 ms
  • 31. Nested loops Python version jobs = Job.objects.all() owls = Owl.objects.all() for owl in owls: for job in jobs: if owl.job_id == job.id: owl.job = job break - Used for little tables - Complexity of O(n*m)
  • 32. Hash Join EXPLAIN ANALYZE SELECT * FROM owl JOIN job ON (job.id = owl.job_id) WHERE job.id > 1; QUERY PLAN ------------------------------------------------------------------------------------------------- Hash Join (cost=1.17..318.70 rows=10001 width=56) (actual time=0.058..3.830 rows=1000 loops=1) Hash Cond: (owl.job_id = job.id) -> Seq Scan on owl (cost=0.00..180.01 rows=10001 width=35) (actual time=0.039..2.170 rows=10002 loops=1) -> Hash (cost=1.09..1.09 rows=7 width=21) (actual time=0.010..0.010 rows=6 loops=1) Buckets: 1024 Batches: 1 Memory Usage: 9kB -> Seq Scan on job (cost=0.00..1.09 rows=7 width=21) (actual time=0.007..0.009 rows=6 loops=1) Filter: (id > 1) Rows Removed by Filter: 1 Planning time: 2.327 ms Execution time: 3.905 ms (10 rows)
  • 33. Hash Join Python version jobs = Job.objects.all() jobs_dict = {} for job in jobs: jobs_dict[job.id] = job owls = Owl.objects.all() for owl in owls: owl.job = jobs_dict[owl.job_id] - Used for small tables - the hash table has to fit in memory (You wouldn’t do a Python dictionary with 1M rows ;) ) - If the table is really small, a nested loop is used because of the complexity of creating a hash table
  • 34. Merge Join EXPLAIN ANALYZE SELECT * FROM letters JOIN human ON (receiver_id = human.id) ORDER BY letters.receiver_id; QUERY PLAN ------------------------------------------------------------- Merge Join (cost=55300.94..62558.10 rows=411861 width=51) (actual time=264.207..502.215 rows=411861 loops=1) Merge Cond: (human.id = letters.receiver_id) -> Sort (cost=894.39..919.39 rows=10000 width=23) (actual time=3.191..4.413 rows=10000 loops=1) Sort Key: human.id Sort Method: quicksort Memory: 1166kB -> Seq Scan on human (cost=0.00..230.00 rows=10000 width=23) (actual time=0.067..1.131 rows=10000 loops=1) -> Materialize (cost=54406.35..56465.66 rows=411861 width=24) (actual time=261.009..414.465 rows=411861 loops=1) -> Sort (cost=54406.35..55436.00 rows=411861 width=24) (actual time=261.008..369.777 rows=411861 loops=1) Sort Key: letters.receiver_id Sort Method: external merge Disk: 15232kB -> Seq Scan on letters (cost=0.00..7547.61 rows=411861 width=24) (actual time=0.010..37.253 rows=411861 loops=1) Planning time: 0.205 ms Execution time: 537.119 ms (13 rows)
• 35. Merge Join - 2
Used for big tables; an index can be used to avoid sorting.

Human (sorted)            Letters (sorted)
Lida    (id = 1)          Letter (receiver_id = 1)
Mattie  (id = 2)          Letter (receiver_id = 1)
Cameron (id = 3)          Letter (receiver_id = 1)
Carol   (id = 4)          Letter (receiver_id = 2)
Maxie   (id = 5)          Letter (receiver_id = 2)
Candy   (id = 6)          Letter (receiver_id = 3)
…                         …
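The nested loop and hash join both got a "Python version" above; here is a hedged sketch in the same spirit for the merge join (the two-cursor logic is an illustration of mine, the models are the deck's): both inputs are sorted on the join key, then walked in lockstep.

humans = sorted(Human.objects.all(), key=lambda h: h.id)
letters = sorted(Letters.objects.all(), key=lambda l: l.receiver_id)

result, i = [], 0
for human in humans:
    # skip letters whose key is smaller than the current human's id
    while i < len(letters) and letters[i].receiver_id < human.id:
        i += 1
    # emit every letter matching this human, without restarting the scan
    j = i
    while j < len(letters) and letters[j].receiver_id == human.id:
        result.append((letters[j], human))
        j += 1
    i = j

Each row on each side is visited once, which is why this wins on big tables once the inputs are sorted.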
• 36. So we have 3 types of joins

1. Nested loop
2. Hash join
3. Merge join

And now, ORDER BY
• 37. And now let's order stuff…

EXPLAIN ANALYZE SELECT * FROM human ORDER BY last_name;

                                   QUERY PLAN
-------------------------------------------------------------
 Sort (cost=894.39..919.39 rows=10000 width=23) (actual time=163.228..164.211 rows=10000 loops=1)
   Sort Key: last_name
   Sort Method: quicksort  Memory: 1166kB
   ->  Seq Scan on human (cost=0.00..230.00 rows=10000 width=23) (actual time=14.341..17.593 rows=10000 loops=1)
 Planning time: 0.189 ms
 Execution time: 164.702 ms
(6 rows)

The whole result set is sorted in memory, which is why this can be costly in terms of memory.
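In Python terms (a loose analogy using the deck's models), this plan amounts to loading everything and sorting it all at once:

# the entire table sits in memory while the sort runs
humans = sorted(Human.objects.all(), key=lambda h: h.last_name)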
• 38. ORDER BY LIMIT

EXPLAIN ANALYZE SELECT * FROM human ORDER BY last_name LIMIT 3;

                                   QUERY PLAN
---------------------------------------------------------------
 Limit (cost=446.10..446.12 rows=10 width=23) (actual time=11.942..11.944 rows=3 loops=1)
   ->  Sort (cost=446.10..471.10 rows=10000 width=23) (actual time=11.942..11.942 rows=3 loops=1)
         Sort Key: last_name
         Sort Method: top-N heapsort  Memory: 25kB
         ->  Seq Scan on human (cost=0.00..230.00 rows=10000 width=23) (actual time=0.074..0.947 rows=10000 loops=1)
 Planning time: 0.074 ms
 Execution time: 11.966 ms
(7 rows)

As with the quicksort plan, every row still has to be read… so why is the memory usage so much smaller?
• 39. Top-N heap sort

- A heap (a sort of tree) with a bounded size is used
- For each row:
  - If the heap is not full: add the row to the heap
  - Else:
    - If the value is smaller than the current heap values (for ASC):
      insert the row into the heap and pop the largest one out
    - Else: skip the row
• 40. Top-N heap sort
Example with LIMIT 3

Human
 id | last_name
  1 | Potter
  2 | Bailey
  3 | Acosta
  4 | Weasley
  5 | Caroll
  … | …
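Python's heapq module happens to implement the same bounded-heap idea, so a minimal sketch of the example above (the tuples are a stand-in for the Human rows): heapq.nsmallest keeps a heap of at most N entries instead of sorting all 10 000 rows, which is why the plan reports only 25kB.

import heapq

humans = [(1, 'Potter'), (2, 'Bailey'), (3, 'Acosta'), (4, 'Weasley'), (5, 'Caroll')]

# keep at most 3 rows, ordered by last_name, without sorting everything
top3 = heapq.nsmallest(3, humans, key=lambda row: row[1])
print(top3)  # [(3, 'Acosta'), (2, 'Bailey'), (5, 'Caroll')]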
• 41. Ordering with an index

CREATE INDEX ON human (last_name);

EXPLAIN ANALYZE SELECT * FROM human ORDER BY last_name LIMIT 5;

                                   QUERY PLAN
-------------------------------------------------------------
 Limit (cost=0.29..0.70 rows=5 width=23) (actual time=0.606..0.611 rows=5 loops=1)
   ->  Index Scan using human_last_name_idx on human (cost=0.29..834.27 rows=10000 width=23) (actual time=0.605..0.610 rows=5 loops=1)
 Planning time: 0.860 ms
 Execution time: 0.808 ms
(4 rows)

Simply uses the index order.
• 42. Be careful when you ORDER BY!

1. Sorting on a key without a LIMIT or an index can be heavy in terms of memory!
2. You might need an index; only EXPLAIN will tell you.
• 44. An example a bit more complex

The Ministry of Magic wants to identify suspect humans. To do that, they ask their DBM (Database Magician) for:
- the list of the letters sent by Voldemort
- for each letter, the id and date of the reply (if it exists)
- all of this with the first and last name of the receiver
• 45. Expected result

 first_name | last_name       | receiver_id | letter_id | sent_at             | answer_id | answer_sent_at
 Leticia    | O'Takatakaer    |         812 |    577732 | 2010-03-05 20:18:00 |           |
 Almeda     | Steinkingjacer  |         899 |    577735 | 2010-03-09 11:44:00 |           |
 Sherita    | Butchchterjac   |        1444 |    577744 | 2010-03-18 09:11:00 |           |
 Devon      | Jactakadurmc    |        1812 |    577760 | 2010-04-02 14:38:00 |           |
 Mariah     | Mctakachenmc    |        2000 |    577733 | 2010-03-06 18:00:00 |           |
 Lupita     | Duro'Otosan     |        2332 |    577751 | 2010-03-24 18:04:00 |           |
 Chloe      | Royo'Erfür      |        2464 |    577737 | 2010-03-10 14:46:00 |           |
 Agustin    | Erillsancht     |        2668 |    577752 | 2010-03-25 22:57:00 |           |
 Hiram      | Ibnchensteinill |        2689 |    577746 | 2010-03-19 13:55:00 |           |
 Jayme      | Erersteinvur    |        2856 |    577766 | 2010-04-08 23:03:00 |    578834 | 2010-6-16 6:26:36
• 46. The python version

letters_from_voldemort = Letters.objects.filter(sender_id=3267).select_related('receiver').order_by('sent_at')
letters_to_voldemort = Letters.objects.filter(receiver_id=3267).order_by('sent_at')

data = []
for letter in letters_from_voldemort:
    for answer in letters_to_voldemort:
        if letter.receiver_id == answer.sender_id and letter.sent_at < answer.sent_at:
            data.append([letter.receiver.first_name, letter.receiver.last_name,
                         letter.receiver.id, letter.id, letter.sent_at,
                         answer.id, answer.sent_at])
            break
    else:
        data.append([letter.receiver.first_name, letter.receiver.last_name,
                     letter.receiver.id, letter.id, letter.sent_at])

Takes about 1540 ms
• 47. Why not have fun with SQL?

SELECT human.first_name, human.last_name, receiver_id, letter_id,
       sent_at, answer_id, answer_sent_at
FROM (
    SELECT id as letter_id, receiver_id, sent_at, sender_id
    FROM letters
    WHERE sender_id=3267
    ORDER BY sent_at
) l1
LEFT JOIN LATERAL (
    SELECT id as answer_id, sent_at as answer_sent_at
    FROM letters
    WHERE sender_id = l1.receiver_id
      AND sent_at > l1.sent_at
      AND receiver_id = l1.sender_id
    LIMIT 1
) l2 ON true
JOIN human ON (human.id=receiver_id)
• 48. Oh well, it takes 48 seconds…

                                   QUERY PLAN
-------------------------------------------------------------------------------------------------------------------
 Nested Loop Left Join (cost=8579.33..434310.73 rows=40 width=47) (actual time=99.106..48459.498 rows=1067 loops=1)
   ->  Hash Join (cost=8579.33..8847.23 rows=40 width=39) (actual time=53.903..60.985 rows=1067 loops=1)
         Hash Cond: (human.id = l1.receiver_id)
         ->  Seq Scan on human (cost=0.00..230.00 rows=10000 width=23) (actual time=0.060..1.303 rows=10000 loops=1)
         ->  Hash (cost=8578.83..8578.83 rows=40 width=20) (actual time=53.828..53.828 rows=1067 loops=1)
               Buckets: 2048 (originally 1024)  Batches: 1 (originally 1)  Memory Usage: 71kB
               ->  Subquery Scan on l1 (cost=8578.33..8578.83 rows=40 width=20) (actual time=53.377..53.580 rows=1067 loops=1)
                     ->  Sort (cost=8578.33..8578.43 rows=40 width=20) (actual time=53.376..53.439 rows=1067 loops=1)
                           Sort Key: letters.sent_at
                           Sort Method: quicksort  Memory: 132kB
                           ->  Seq Scan on letters (cost=0.00..8577.26 rows=40 width=20) (actual time=0.939..53.127 rows=1067 loops=1)
                                 Filter: (sender_id = 3267)
                                 Rows Removed by Filter: 410794
   ->  Limit (cost=0.00..10636.57 rows=1 width=12) (actual time=45.356..45.356 rows=0 loops=1067)
         ->  Seq Scan on letters letters_1 (cost=0.00..10636.57 rows=1 width=12) (actual time=45.346..45.346 rows=0 loops=1067)
               Filter: ((sent_at > l1.sent_at) AND (sender_id = l1.receiver_id) AND (receiver_id = l1.sender_id))
               Rows Removed by Filter: 410896
 Planning time: 0.859 ms
 Execution time: 48460.225 ms
(19 rows)
• 49. What is my problem?

   ->  Sort (cost=8578.33..8578.43 rows=40 width=20) (actual time=53.376..53.439 rows=1067 loops=1)
         Sort Key: letters.sent_at
         Sort Method: quicksort  Memory: 132kB
         ->  Seq Scan on letters (cost=0.00..8577.26 rows=40 width=20) (actual time=0.939..53.127 rows=1067 loops=1)
               Filter: (sender_id = 3267)
               Rows Removed by Filter: 410794
   ->  Limit (cost=0.00..10636.57 rows=1 width=12) (actual time=45.356..45.356 rows=0 loops=1067)
         ->  Seq Scan on letters letters_1 (cost=0.00..10636.57 rows=1 width=12) (actual time=45.346..45.346 rows=0 loops=1067)
               Filter: ((sent_at > l1.sent_at) AND (sender_id = l1.receiver_id) AND (receiver_id = l1.sender_id))
               Rows Removed by Filter: 410896

The planner uses sequential scans that filter out almost the entire table, and the inner one runs 1067 times. We are clearly missing indexes on the columns receiver_id, sender_id and sent_at.
• 50. Let's create the following index

CREATE INDEX ON letters (sender_id, sent_at, receiver_id);

Quick reminder on multi-column indexes. The entries are tuples, kept in sorted order, each pointing to a row:

 sender_id | sent_at             | receiver_id | pointer
         1 | 2010-03-05 20:18:00 |           2 | letter (id=1)
         2 | 2010-03-09 11:44:00 |           7 | letter (id=11)
         3 | 2010-04-02 14:38:00 |           9 | letter (id=16)
         4 | 2010-03-05 20:18:00 |           3 | letter (id=2)
         … | …                   |           … | …
• 51. Multi-column indexes

The entries are sorted on the first column of the index (then the second, and so on).

So the index will be used for:
SELECT … FROM letters WHERE sender_id=…;

The same is true for:
SELECT … FROM letters WHERE sender_id=… AND sent_at = …;
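A toy Python model of why key prefixes work (an illustration only; a real B-tree is structured differently): keep the entries as sorted tuples, and a binary search on the leading column(s) becomes possible, while receiver_id alone cannot be searched this way.

import bisect

# sorted tuples standing in for the (sender_id, sent_at, receiver_id) index
index = sorted(
    (l.sender_id, l.sent_at, l.receiver_id, l.id)
    for l in Letters.objects.all()
)

def letters_from(sender_id):
    # binary search works because sender_id is the leading column
    pos = bisect.bisect_left(index, (sender_id,))
    while pos < len(index) and index[pos][0] == sender_id:
        yield index[pos][3]   # the letter id the entry points to
        pos += 1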
• 52. The order of the columns matters!

The index won't be used for:
SELECT … FROM letters WHERE receiver_id=…;

In our previous query we had:
SELECT … FROM letters WHERE sender_id=3267 ORDER BY sent_at

And then in the LATERAL JOIN:
SELECT … FROM letters
WHERE sender_id = l1.receiver_id
  AND sent_at > l1.sent_at
  AND receiver_id = l1.sender_id

The index (sender_id, sent_at, receiver_id) works fine for both.
• 53. Better, no? 180 times faster than Python

                                   QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------
 Nested Loop Left Join (cost=154.76..767.34 rows=40 width=47) (actual time=1.092..7.694 rows=1067 loops=1)
   ->  Hash Join (cost=154.34..422.24 rows=40 width=39) (actual time=1.080..3.471 rows=1067 loops=1)
         Hash Cond: (human.id = l1.receiver_id)
         ->  Seq Scan on human (cost=0.00..230.00 rows=10000 width=23) (actual time=0.076..0.797 rows=10000 loops=1)
         ->  Hash (cost=153.84..153.84 rows=40 width=20) (actual time=0.992..0.992 rows=1067 loops=1)
               Buckets: 2048 (originally 1024)  Batches: 1 (originally 1)  Memory Usage: 71kB
               ->  Subquery Scan on l1 (cost=153.34..153.84 rows=40 width=20) (actual time=0.520..0.813 rows=1067 loops=1)
                     ->  Sort (cost=153.34..153.44 rows=40 width=20) (actual time=0.520..0.630 rows=1067 loops=1)
                           Sort Key: letters.sent_at
                           Sort Method: quicksort  Memory: 132kB
                           ->  Bitmap Heap Scan on letters (cost=4.73..152.27 rows=40 width=20) (actual time=0.089..0.304 rows=1067 loops=1)
                                 Recheck Cond: (sender_id = 3267)
                                 Heap Blocks: exact=74
                                 ->  Bitmap Index Scan on letters_sender_id_sent_at_receiver_id_idx (cost=0.00..4.72 rows=40 width=0) (actual time=0.079..0.079 rows=1067 loops=1)
                                       Index Cond: (sender_id = 3267)
   ->  Limit (cost=0.42..8.61 rows=1 width=12) (actual time=0.004..0.004 rows=0 loops=1067)
         ->  Index Scan using letters_sender_id_sent_at_receiver_id_idx on letters letters_1 (cost=0.42..8.61 rows=1 width=12) (actual time=0.003..0.003 rows=0 loops=1067)
               Index Cond: ((sender_id = l1.receiver_id) AND (sent_at > l1.sent_at) AND (receiver_id = l1.sender_id))
 Planning time: 0.565 ms
 Execution time: 7.804 ms
(20 rows)
• 54. Better, no?

   ->  Bitmap Heap Scan on letters (cost=4.73..152.27 rows=40 width=20) (actual time=0.089..0.304 rows=1067 loops=1)
         Recheck Cond: (sender_id = 3267)
         Heap Blocks: exact=74
         ->  Bitmap Index Scan on letters_sender_id_sent_at_receiver_id_idx (cost=0.00..4.72 rows=40 width=0) (actual time=0.079..0.079 rows=1067 loops=1)
               Index Cond: (sender_id = 3267)
   ->  Limit (cost=0.42..8.61 rows=1 width=12) (actual time=0.004..0.004 rows=0 loops=1067)
         ->  Index Scan using letters_sender_id_sent_at_receiver_id_idx on letters letters_1 (cost=0.42..8.61 rows=1 width=12) (actual time=0.003..0.003 rows=0 loops=1067)
               Index Cond: ((sender_id = l1.receiver_id) AND (sent_at > l1.sent_at) AND (receiver_id = l1.sender_id))

The index is now used where the sequential scans were.
• 55. Focusing on this hash join…

   ->  Hash Join (cost=154.34..422.24 rows=40 width=39) (actual time=1.080..3.471 rows=1067 loops=1)
         Hash Cond: (human.id = l1.receiver_id)
         ->  Seq Scan on human (cost=0.00..230.00 rows=10000 width=23) (actual time=0.076..0.797 rows=10000 loops=1)
         ->  Hash (cost=153.84..153.84 rows=40 width=20) (actual time=0.992..0.992 rows=1067 loops=1)
               Buckets: 2048 (originally 1024)  Batches: 1 (originally 1)  Memory Usage: 71kB
               ->  Subquery Scan on l1 (cost=153.34..153.84 rows=40 width=20) (actual time=0.520..0.813 rows=1067 loops=1)

The hash table is built from the subquery (1067 rows), not from the human table (10 000 rows); each human is then looked up in it.

[Diagram: humans 1…10 000 probed against a hash table keyed on receiver_id (22, 94, 104, 114, 125, …), each key pointing to subquery rows]
• 56. Let's use a pagination…

SELECT human.first_name, human.last_name, receiver_id, letter_id,
       sent_at, answer_id, answer_sent_at
FROM (
    SELECT id as letter_id, receiver_id, sent_at, sender_id
    FROM letters
    WHERE sender_id=3267 AND sent_at > '2010-03-22'
    ORDER BY sent_at
    LIMIT 20
) l1
LEFT JOIN LATERAL (
    SELECT id as answer_id, sent_at as answer_sent_at
    FROM letters
    WHERE sender_id = l1.receiver_id
      AND sent_at > l1.sent_at
      AND receiver_id = l1.sender_id
    LIMIT 1
) l2 ON true
JOIN human ON (human.id=receiver_id)
• 57. Let's use a pagination…

                                   QUERY PLAN
----------------------------------------------------------------------------------------------------------
 Nested Loop (cost=1.13..417.82 rows=20 width=47) (actual time=0.049..0.365 rows=20 loops=1)
   ->  Nested Loop Left Join (cost=0.84..255.57 rows=20 width=28) (actual time=0.040..0.231 rows=20 loops=1)
         ->  Limit (cost=0.42..82.82 rows=20 width=20) (actual time=0.022..0.038 rows=20 loops=1)
               ->  Index Scan using letters_sender_id_sent_at_receiver_id_idx on letters (cost=0.42..165.22 rows=40 width=20) (actual time=0.020..0.035 rows=20 loops=1)
                     Index Cond: ((sender_id = 3267) AND (sent_at > '2010-03-22 00:00:00+01'::timestamp with time zone))
         ->  Limit (cost=0.42..8.61 rows=1 width=12) (actual time=0.009..0.009 rows=0 loops=20)
               ->  Index Scan using letters_sender_id_sent_at_receiver_id_idx on letters letters_1 (cost=0.42..8.61 rows=1 width=12) (actual time=0.008..0.008 rows=0 loops=20)
                     Index Cond: ((sender_id = letters.receiver_id) AND (sent_at > letters.sent_at) AND (receiver_id = letters.sender_id))
   ->  Index Scan using humain_pkey on human (cost=0.29..8.10 rows=1 width=23) (actual time=0.006..0.006 rows=1 loops=20)
         Index Cond: (id = letters.receiver_id)
 Planning time: 0.709 ms
 Execution time: 0.436 ms
(12 rows)
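For reference, the paginated first subquery could be expressed with the deck's Django models (a sketch; the date literal is the one from the SQL above):

page = (Letters.objects
        .filter(sender_id=3267, sent_at__gt='2010-03-22')
        .order_by('sent_at')[:20])

The LEFT JOIN LATERAL part has no direct ORM equivalent, which is part of why raw SQL pays off here.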
• 59. Thank you for your attention! Any questions?

Owly design: zimmoriarty (https://www.instagram.com/zimmoriarty/)
• 60. To go further - sources

https://momjian.us/main/writings/pgsql/optimizer.pdf
https://use-the-index-luke.com/sql/plans-dexecution/postgresql/operations
http://tech.novapost.fr/postgresql-application_name-django-settings.html

Slides made with the owls and paintings from https://www.instagram.com/zimmoriarty