Becoming a better developer with EXPLAIN

Louise Grandjonc
Solutions engineer at Citus Data

Becoming a better developer with EXPLAIN
Understanding the Postgres query planner

About me
Louise Grandjonc (louise@ulule.com)
Lead developer at Ulule (www.ulule.com)
Python / Django developer - Postgres enthusiast
@louisemeta on twitter
Today’s agenda
1. Can understanding the query plan help me?
2. What if I’m using an ORM?
3. So what does my database do when I query stuff?
Can understanding the query plan help me?
The mystery of querying stuff
(diagram) You -> SQL query (SELECT … FROM …) -> Result:
id | name
 1 | Louise
 2 | Alfred
 … | …
Zooming into it
(diagram) SQL query -> Parse -> Planner / Optimizer -> Execute -> Result
The planner / optimizer:
- Generates execution plans for a query
- Calculates the cost of each plan
- The best one is used to execute your query
So what can I learn from the query plan?
- Understand why your filter / join / order is slow.
- Know if the new index you just added is used.
- Stop guessing which index can be useful.
In conclusion: you will understand why your query is slow.
What if I’m using an ORM?
But why can’t we trust it?
1. The ORM executes queries that you might not expect
2. Your queries might not be optimised and you won’t know about it
The story of the owls
(data model)
- owl: id, name, employer_name, feather_color, favourite_food (each owl has one job and delivers letters)
- job: id, name
- letters: id, sender_id, receiver_id, sent_at, delivered_by
- human: id, first_name, last_name (sender and recipient of letters)
10 002 owls, 10 000 humans, 411 861 letters
Loops (with django)

owls = Owl.objects.filter(employer_name='Hogwarts')
for owl in owls:
    print(owl.job)  # 1 query per loop iteration

SELECT id, name, employer_name, favourite_food, job_id,
feather_color FROM owl WHERE employer_name = 'Hogwarts'
SELECT id, name FROM job WHERE id = 1
SELECT id, name FROM job WHERE id = 1
SELECT id, name FROM job WHERE id = 1
… (one identical query per owl in the loop)

Awesome, right?
Loops (with django)

Owl.objects.filter(employer_name='Ulule')
   .select_related('job')

SELECT … FROM "owl" LEFT OUTER JOIN "job" ON ("owl"."job_id" =
"job"."id")
WHERE "owl"."employer_name" = 'Ulule'

Owl.objects.filter(employer_name='Ulule')
   .prefetch_related('job')

SELECT … FROM "owl" WHERE "owl"."employer_name" = 'Ulule'
SELECT … FROM "job" WHERE "job"."id" IN (2)
Where are my logs?

Terminal command
$ psql -U user -d your_database_name

psql interface
owl_conference=# show log_directory ;
 log_directory
---------------
 pg_log
owl_conference=# show data_directory ;
 data_directory
-------------------------
 /usr/local/var/postgres
owl_conference=# show log_filename ;
 log_filename
-------------------------
 postgresql-%Y-%m-%d.log
Having good-looking logs
(and logging everything like a crazy owl)
owl_conference=# SHOW config_file;
config_file
-----------------------------------------
/usr/local/var/postgres/postgresql.conf
(1 row)
In your postgresql.conf
log_filename = 'postgresql-%Y-%m-%d.log'
log_statement = 'all'
logging_collector = on
log_min_duration_statement = 0
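
A small aside (not in the original slides): after editing postgresql.conf,
most of these logging settings can be applied with a simple reload;
logging_collector is the exception and needs a restart.

-- Hedged sketch: reload the configuration from psql.
-- log_statement, log_filename and log_min_duration_statement are picked
-- up on reload; logging_collector only takes effect after a restart.
SELECT pg_reload_conf();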
Finding dirty queries
pg_stat_statements
In your psql
CREATE EXTENSION pg_stat_statements;
The module must be added to your shared_preload_libraries.
You have to change your postgresql.conf and restart.
shared_preload_libraries = 'pg_stat_statements'
pg_stat_statements.max = 10000
pg_stat_statements.track = all
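
Once the extension is active, it can help to start from a clean slate
before measuring (an aside, not from the slides); the module ships a
reset function for exactly that:

-- Clear the accumulated statistics so the numbers you look at next
-- only reflect the workload you are about to run.
SELECT pg_stat_statements_reset();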
Finding the painful queries
SELECT total_time, min_time, max_time, mean_time,
calls, query
FROM pg_stat_statements
ORDER BY mean_time DESC
LIMIT 100;
-[ RECORD 6 ]---------------------------------------------------------
total_time | 519.491
min_time | 519.491
max_time | 519.491
mean_time | 519.491
calls | 1
query | SELECT COUNT(*) FROM letters;
What is my database doing when I query?
Let’s talk about EXPLAIN!
What is EXPLAIN
Gives you the execution plan chosen by the query planner that your
database will use to execute your SQL statement
Using ANALYZE will actually execute your query! (Don’t worry, you
can ROLLBACK)
EXPLAIN (ANALYZE) my super query;
BEGIN;
EXPLAIN ANALYZE UPDATE owl SET … WHERE …;
ROLLBACK;
So, what does it look like?
EXPLAIN ANALYZE SELECT * FROM owl WHERE
employer_name='Ulule';
QUERY PLAN
-------------------------------------
Seq Scan on owl (cost=0.00..205.01 rows=1 width=35)
(actual time=1.945..1.946 rows=1 loops=1)
Filter: ((employer_name)::text = 'Ulule'::text)
Rows Removed by Filter: 10001
Planning time: 0.080 ms
Execution time: 1.965 ms
(5 rows)
Let’s go step by step!
Costs
(cost=0.00..205.01 rows=1 width=35)
- cost=0.00..205.01: cost of retrieving the first row .. cost of retrieving all rows
- rows=1: number of rows returned
- width=35: average width of a row (in bytes)
If you use ANALYZE
(actual time=1.945..1.946 rows=1 loops=1)
- loops=1: number of times your seq scan (index scan etc.) was executed
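
Where does a number like 205.01 come from? As a rough sketch (my
addition, using the planner’s default settings seq_page_cost=1.0,
cpu_tuple_cost=0.01 and cpu_operator_cost=0.0025), a sequential scan
with a filter costs about pages read + per-tuple CPU + per-tuple filter
evaluation, computed from the statistics Postgres keeps in pg_class:

SELECT relpages,   -- number of 8kB pages in the table
       reltuples,  -- estimated number of rows
       relpages * current_setting('seq_page_cost')::float
       + reltuples * (current_setting('cpu_tuple_cost')::float
                      + current_setting('cpu_operator_cost')::float)
       AS estimated_seq_scan_cost
FROM pg_class
WHERE relname = 'owl';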
Sequential Scan
Seq Scan on owl ...
Filter: ((employer_name)::text = 'Ulule'::text)
Rows Removed by Filter: 10001
- Scans the entire table.
- Retrieves the rows matching your WHERE clause.
It can be expensive!
Would an index make this query faster?
What is an index then?
In an encyclopaedia, if you
want to read about owls,
you don’t read the entire
book, you go to the index
first!
A database index contains
the column value and
pointers to the row that has
this value.
CREATE INDEX ON owl (employer_name);
employer_name | pointer to
--------------+------------
Hogwarts      | Owl 1
Hogwarts      | Owl 12
Hogwarts      | Owl 23
…             | …
Post office   | Owl m
…             | …
Ulule         | Owl n
…             | …
Index scan
Index Scan using owl_employer_name_idx on owl
(cost=0.29..8.30 rows=1 width=35) (actual
time=0.055..0.056 rows=1 loops=1)
Index Cond: ((employer_name)::text =
'Ulule'::text)
Planning time: 0.191 ms
Execution time: 0.109 ms
The index is visited row by row in order to
retrieve the data corresponding to your clause.
Index scan or sequential scan?
EXPLAIN SELECT * FROM owl
WHERE employer_name = 'post office';
QUERY PLAN
-------------------------------------------------
Seq Scan on owl (cost=0.00..205.03 rows=7001 width=35)
Filter: ((employer_name)::text = 'post office'::text)
With an index and a really common value!
7 000 of the 10 000 owls work at the post office
Why is it using a sequential scan?
An index scan reads rows in index order, so the disk
head has to jump around between rows.
Moving the read head is about 1000 times slower than
reading the next physical block.
Conclusion: for common values it’s quicker to
read all the data from the table in physical order.
By the way… Retrieving 7000 rows might not be a great idea :).
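
How does the planner know the value is that common without scanning?
ANALYZE keeps per-column statistics that you can peek at yourself (an
aside, not from the slides):

-- most_common_vals / most_common_freqs are what the planner uses to
-- estimate that ~70% of the rows match 'post office'.
SELECT most_common_vals, most_common_freqs
FROM pg_stats
WHERE tablename = 'owl' AND attname = 'employer_name';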
Bitmap Heap Scan
EXPLAIN ANALYZE SELECT * FROM owl WHERE owl.employer_name = 'Hogwarts';
QUERY PLAN
-------------------------------------------------
Bitmap Heap Scan on owl (cost=23.50..128.50 rows=2000 width=35)
(actual time=1.524..4.081 rows=2000 loops=1)
Recheck Cond: ((employer_name)::text = 'Hogwarts'::text)
Heap Blocks: exact=79
-> Bitmap Index Scan on owl_employer_name_idx1 (cost=0.00..23.00
rows=2000 width=0) (actual time=1.465..1.465 rows=2000 loops=1)
Index Cond: ((employer_name)::text = 'Hogwarts'::text)
Planning time: 15.642 ms
Execution time: 5.309 ms
(7 rows)
With an index and a common value
2000 owls work at Hogwarts
Bitmap Heap Scan
- The tuple pointers from the index are ordered by physical memory
- The scan follows the bitmap, which limits physical jumps between rows.
Recheck condition? If the bitmap is too big:
- The bitmap only keeps the pages containing matching rows
- The scan goes through those pages and rechecks the condition
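
A hedged experiment (my addition): you can provoke the lossy case by
shrinking work_mem; on a table as small as owl the bitmap may well stay
exact, but on bigger tables the degradation is clearly visible.

SET work_mem = '64kB';  -- the minimum value, to starve the bitmap
EXPLAIN ANALYZE SELECT * FROM owl WHERE employer_name = 'Hogwarts';
-- When the bitmap goes lossy, "Heap Blocks" reports "exact=… lossy=…"
-- and the Recheck Cond actually has to filter rows.
RESET work_mem;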
So we have 3 types of scan
1. Sequential scan
2. Index scan
3. Bitmap heap scan
And now let’s join stuff!
Nested loops
EXPLAIN ANALYZE SELECT * FROM owl JOIN job ON (job.id = owl.job_id)
WHERE job.id=1;
QUERY PLAN
-------------------------------------------------------------------
Nested Loop (cost=0.00..296.14 rows=9003 width=56)
(actual time=0.093..4.081 rows=9001 loops=1)
-> Seq Scan on job (cost=0.00..1.09 rows=1 width=21) (actual
time=0.064..0.064 rows=1 loops=1)
Filter: (id = 1)
Rows Removed by Filter: 6
-> Seq Scan on owl (cost=0.00..205.03 rows=9003 width=35)
(actual time=0.015..2.658 rows=9001 loops=1)
Filter: (job_id = 1)
Rows Removed by Filter: 1001
Planning time: 0.188 ms
Execution time: 4.757 ms
Nested loops
Python version

jobs = Job.objects.all()
owls = Owl.objects.all()
for owl in owls:
    for job in jobs:
        if owl.job_id == job.id:
            owl.job = job
            break

- Used for little tables
- Complexity of O(n*m)
Hash Join
EXPLAIN ANALYZE SELECT * FROM owl JOIN job ON (job.id = owl.job_id) WHERE job.id > 1;
QUERY PLAN
-------------------------------------------------------------------------------------------------
Hash Join (cost=1.17..318.70 rows=10001 width=56) (actual time=0.058..3.830 rows=1000 loops=1)
Hash Cond: (owl.job_id = job.id)
-> Seq Scan on owl (cost=0.00..180.01 rows=10001 width=35) (actual time=0.039..2.170
rows=10002 loops=1)
-> Hash (cost=1.09..1.09 rows=7 width=21) (actual time=0.010..0.010 rows=6 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 9kB
-> Seq Scan on job (cost=0.00..1.09 rows=7 width=21) (actual time=0.007..0.009 rows=6
loops=1)
Filter: (id > 1)
Rows Removed by Filter: 1
Planning time: 2.327 ms
Execution time: 3.905 ms
(10 rows)
Hash Join
Python version

jobs = Job.objects.all()
jobs_dict = {}
for job in jobs:
    jobs_dict[job.id] = job
owls = Owl.objects.all()
for owl in owls:
    owl.job = jobs_dict[owl.job_id]
- Used for small tables: the hash table has to fit in memory (you
wouldn’t build a Python dictionary with 1M rows ;) )
- If the table is really small, a nested loop is used instead, because
of the overhead of building a hash table
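
You can read whether the hash fit in memory straight from the plan (a
sketch, my addition): Batches: 1 means the whole hash table fit inside
work_mem; more than one batch means the join spilled to disk.

SHOW work_mem;  -- the memory budget the hash table has to fit into
EXPLAIN ANALYZE SELECT * FROM owl JOIN job ON (job.id = owl.job_id);
-- Look for:  Buckets: 1024  Batches: 1  Memory Usage: 9kB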
Merge Join
EXPLAIN ANALYZE SELECT * FROM letters JOIN human ON (receiver_id = human.id) ORDER BY
letters.receiver_id;
QUERY PLAN
-------------------------------------------------------------
Merge Join (cost=55300.94..62558.10 rows=411861 width=51)
(actual time=264.207..502.215 rows=411861 loops=1)
Merge Cond: (human.id = letters.receiver_id)
-> Sort (cost=894.39..919.39 rows=10000 width=23) (actual time=3.191..4.413 rows=10000 loops=1)
Sort Key: human.id
Sort Method: quicksort Memory: 1166kB
-> Seq Scan on human (cost=0.00..230.00 rows=10000 width=23) (actual time=0.067..1.131
rows=10000 loops=1)
-> Materialize (cost=54406.35..56465.66 rows=411861 width=24) (actual time=261.009..414.465
rows=411861 loops=1)
-> Sort (cost=54406.35..55436.00 rows=411861 width=24) (actual time=261.008..369.777
rows=411861 loops=1)
Sort Key: letters.receiver_id
Sort Method: external merge Disk: 15232kB
-> Seq Scan on letters (cost=0.00..7547.61 rows=411861 width=24) (actual
time=0.010..37.253 rows=411861 loops=1)
Planning time: 0.205 ms
Execution time: 537.119 ms
(13 rows)
Merge Join - 2
Used for big tables; an index can be used to avoid sorting (see the
sketch after the diagram).
(diagram) Both sides are sorted on the join key and walked in lockstep:
Human (sorted):   Lida (id = 1), Mattie (id = 2), Cameron (id = 3),
                  Carol (id = 4), Maxie (id = 5), Candy (id = 6), …
Letters (sorted): Letter (receiver_id = 1), Letter (receiver_id = 1),
                  Letter (receiver_id = 1), Letter (receiver_id = 2),
                  Letter (receiver_id = 2), Letter (receiver_id = 3), …
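
A hedged sketch (my addition) of the "index avoids sorting" point:
indexing the join key of the big table lets the planner read it already
ordered and drop the external-merge Sort node; whether it actually picks
that plan depends on the statistics.

-- Hypothetical index; human.id is already ordered via its primary key.
CREATE INDEX ON letters (receiver_id);
EXPLAIN ANALYZE SELECT * FROM letters
JOIN human ON (receiver_id = human.id)
ORDER BY letters.receiver_id;
-- The Sort with "external merge Disk: 15232kB" can disappear in favour
-- of an index scan on letters.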
So we have 3 types of joins
1. Nested loop
2. Hash join
3. Merge join
And now, ORDER BY
And now let’s order stuff…
EXPLAIN ANALYZE SELECT * FROM human ORDER BY last_name;
QUERY PLAN
-------------------------------------------------------------
Sort (cost=894.39..919.39 rows=10000 width=23)
(actual time=163.228..164.211 rows=10000 loops=1)
Sort Key: last_name
Sort Method: quicksort Memory: 1166kB
-> Seq Scan on human (cost=0.00..230.00 rows=10000 width=23)
(actual time=14.341..17.593 rows=10000 loops=1)
Planning time: 0.189 ms
Execution time: 164.702 ms
(6 rows)
Everything is sorted in memory
(which is why it can be costly in terms of memory)
ORDER BY LIMIT
EXPLAIN ANALYZE SELECT * FROM human
ORDER BY last_name LIMIT 3;
QUERY PLAN
---------------------------------------------------------------
Limit (cost=446.10..446.12 rows=10 width=23)
(actual time=11.942..11.944 rows=3 loops=1)
-> Sort (cost=446.10..471.10 rows=10000 width=23)
(actual time=11.942..11.942 rows=3 loops=1)
Sort Key: last_name
Sort Method: top-N heapsort Memory: 25kB
-> Seq Scan on human (cost=0.00..230.00 rows=10000
width=23) (actual time=0.074..0.947 rows=10000 loops=1)
Planning time: 0.074 ms
Execution time: 11.966 ms
(7 rows)
Like with quicksort, all the data has to be sorted… so why is the memory usage so much smaller?
Top-N heap sort
- A heap (a sort of tree) with a limited size is used
- For each row:
  - If the heap is not full: add the row to the heap
  - Else:
    - If the value is smaller than the current values (for ASC):
      insert the row into the heap, pop the last one
    - Else: skip the row
Top-N heap sort
Example with LIMIT 3
Human
id | last_name
 1 | Potter
 2 | Bailey
 3 | Acosta
 4 | Weasley
 5 | Caroll
 … | …
Ordering with an index
EXPLAIN ANALYZE SELECT * FROM human ORDER BY last_name LIMIT 5;
QUERY PLAN
-------------------------------------------------------------
Limit (cost=0.29..0.70 rows=5 width=23)
(actual time=0.606..0.611 rows=5 loops=1)
-> Index Scan using human_last_name_idx on human
(cost=0.29..834.27 rows=10000 width=23)
(actual time=0.605..0.610 rows=5 loops=1)
Planning time: 0.860 ms
Execution time: 0.808 ms
(4 rows)
CREATE INDEX ON human (last_name);
Simply uses the index order
Be careful when you ORDER BY!
1. Sorting without a limit or an index can be heavy in terms of memory!
2. You might need an index, and only EXPLAIN will tell you
Let’s get Sirius!
An example a bit more complex
The ministry of magic wants to identify suspect humans.
To do that they ask their DBM (Database Magician) for
- The list of the letters sent by Voldemort
- For each we want the id and date of the reply (if exists)
- All of this with the first and last name of the receiver
Expected result
first_name last_name receiver_id letter_id sent_at answer_id answer_sent_at
Leticia O'Takatakaer 812 577732 2010-03-05 20:18:00
Almeda Steinkingjacer 899 577735 2010-03-09 11:44:00
Sherita Butchchterjac 1444 577744 2010-03-18 09:11:00
Devon Jactakadurmc 1812 577760 2010-04-02 14:38:00
Mariah Mctakachenmc 2000 577733 2010-03-06 18:00:00
Lupita Duro'Otosan 2332 577751 2010-03-24 18:04:00
Chloe Royo'Erfür 2464 577737 2010-03-10 14:46:00
Agustin Erillsancht 2668 577752 2010-03-25 22:57:00
Hiram Ibnchensteinill 2689 577746 2010-03-19 13:55:00
Jayme Erersteinvur 2856 577766 2010-04-08 23:03:00 578834 2010-6-16 6:26:36
The python version

letters_from_voldemort = Letters.objects.filter(sender_id=3267).select_related('receiver').order_by('sent_at')
letters_to_voldemort = Letters.objects.filter(receiver_id=3267).order_by('sent_at')
data = []
for letter in letters_from_voldemort:
    for answer in letters_to_voldemort:
        if letter.receiver_id == answer.sender_id and letter.sent_at < answer.sent_at:
            data.append([letter.receiver.first_name, letter.receiver.last_name,
                         letter.receiver.id, letter.id, letter.sent_at, answer.id, answer.sent_at])
            break
    else:  # no answer found for this letter
        data.append([letter.receiver.first_name, letter.receiver.last_name,
                     letter.receiver.id, letter.id, letter.sent_at])

Takes about 1540 ms
Why not have fun with SQL?
SELECT human.first_name, human.last_name, receiver_id, letter_id,
sent_at, answer_id, answer_sent_at
FROM (
SELECT
id as letter_id, receiver_id, sent_at, sender_id
FROM letters
WHERE
sender_id=3267
ORDER BY sent_at
) l1 LEFT JOIN LATERAL (
SELECT
id as answer_id,
sent_at as answer_sent_at
FROM letters
WHERE
sender_id = l1.receiver_id
AND sent_at > l1.sent_at
AND receiver_id = l1.sender_id
LIMIT 1
) l2 ON true JOIN human ON (human.id=receiver_id)
Oh well, it takes 48 seconds…
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------
Nested Loop Left Join (cost=8579.33..434310.73 rows=40 width=47) (actual time=99.106..48459.498 rows=1067
loops=1)
-> Hash Join (cost=8579.33..8847.23 rows=40 width=39) (actual time=53.903..60.985 rows=1067 loops=1)
Hash Cond: (human.id = l1.receiver_id)
-> Seq Scan on human (cost=0.00..230.00 rows=10000 width=23) (actual time=0.060..1.303 rows=10000
loops=1)
-> Hash (cost=8578.83..8578.83 rows=40 width=20) (actual time=53.828..53.828 rows=1067 loops=1)
Buckets: 2048 (originally 1024) Batches: 1 (originally 1) Memory Usage: 71kB
-> Subquery Scan on l1 (cost=8578.33..8578.83 rows=40 width=20) (actual time=53.377..53.580
rows=1067 loops=1)
-> Sort (cost=8578.33..8578.43 rows=40 width=20) (actual time=53.376..53.439 rows=1067
loops=1)
Sort Key: letters.sent_at
Sort Method: quicksort Memory: 132kB
-> Seq Scan on letters (cost=0.00..8577.26 rows=40 width=20) (actual
time=0.939..53.127 rows=1067 loops=1)
Filter: (sender_id = 3267)
Rows Removed by Filter: 410794
-> Limit (cost=0.00..10636.57 rows=1 width=12) (actual time=45.356..45.356 rows=0 loops=1067)
-> Seq Scan on letters letters_1 (cost=0.00..10636.57 rows=1 width=12) (actual time=45.346..45.346
rows=0 loops=1067)
Filter: ((sent_at > l1.sent_at) AND (sender_id = l1.receiver_id) AND (receiver_id = l1.sender_id))
Rows Removed by Filter: 410896
Planning time: 0.859 ms
Execution time: 48460.225 ms
(19 rows)
What is my problem?
-> Sort (cost=8578.33..8578.43 rows=40 width=20) (actual
time=53.376..53.439 rows=1067 loops=1)
Sort Key: letters.sent_at
Sort Method: quicksort Memory: 132kB
-> Seq Scan on letters (cost=0.00..8577.26 rows=40
width=20) (actual time=0.939..53.127 rows=1067 loops=1)
Filter: (sender_id = 3267)
Rows Removed by Filter: 410794
-> Limit (cost=0.00..10636.57 rows=1 width=12) (actual
time=45.356..45.356 rows=0 loops=1067)
-> Seq Scan on letters letters_1 (cost=0.00..10636.57
rows=1 width=12) (actual time=45.346..45.346 rows=0
loops=1067)
Filter: ((sent_at > l1.sent_at) AND (sender_id =
l1.receiver_id) AND (receiver_id = l1.sender_id))
Rows Removed by Filter: 410896
The query planner is using a sequential scan to filter almost our entire table…
So clearly we are missing some indexes on the columns receiver_id, sender_id and sent_at.
Let’s create the following index
CREATE INDEX ON letters (sender_id, sent_at, receiver_id);
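
Besides EXPLAIN, the statistics views can confirm over time that the new
index is really used (an aside, not from the slides):

-- idx_scan counts how often each index on letters was actually used.
SELECT indexrelname, idx_scan
FROM pg_stat_user_indexes
WHERE relname = 'letters';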
Quick reminder on multi-column indexes
(index layout) One entry per row, ordered by the first column:
sender_id | sent_at             | receiver_id | pointer to
        1 | 2010-03-05 20:18:00 |           2 | letter (id=1)
        2 | 2010-03-09 11:44:00 |           7 | letter (id=11)
        3 | 2010-04-02 14:38:00 |           9 | letter (id=16)
        4 | 2010-03-05 20:18:00 |           3 | letter (id=2)
        5 | 2015-03-05 20:18:00 |           1 | …
        6 | 2010-01-05 20:18:00 |           4 | …
        … | …                   |           … | …
Multi-column indexes
(same index layout as above)
The first column of the index is ordered
The index will be used for
SELECT … FROM letters WHERE sender_id=…;
The same is true for
SELECT … FROM letters WHERE sender_id=…
AND sent_at = …;
The order of the columns matters!
(index layout) Taken on its own, receiver_id is not ordered inside the index:
receiver_id | pointer to
          2 | letter (id=1)
          7 | letter (id=11)
          9 | letter (id=16)
          3 | letter (id=2)
          1 | …
          4 | …
          … | …
In our previous query we had
SELECT … FROM letters WHERE sender_id=3267 ORDER BY sent_at
And then in the LATERAL JOIN
SELECT … FROM letters WHERE sender_id = l1.receiver_id
AND sent_at > l1.sent_at
AND receiver_id = l1.sender_id
The index (sender_id, sent_at, receiver_id) works fine for both.
The index won’t be used for
SELECT … FROM letters WHERE receiver_id=…;
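
A quick way to verify the column-order rule (a sketch with a made-up
receiver_id value):

EXPLAIN SELECT * FROM letters WHERE receiver_id = 42;
-- Expect a Seq Scan: the (sender_id, sent_at, receiver_id) index cannot
-- help when its leading column is missing from the WHERE clause.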
Better, no? 180 times faster than Python
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------
Nested Loop Left Join (cost=154.76..767.34 rows=40 width=47) (actual time=1.092..7.694 rows=1067 loops=1)
-> Hash Join (cost=154.34..422.24 rows=40 width=39) (actual time=1.080..3.471 rows=1067 loops=1)
Hash Cond: (human.id = l1.receiver_id)
-> Seq Scan on human (cost=0.00..230.00 rows=10000 width=23) (actual time=0.076..0.797 rows=10000 loops=1)
-> Hash (cost=153.84..153.84 rows=40 width=20) (actual time=0.992..0.992 rows=1067 loops=1)
Buckets: 2048 (originally 1024) Batches: 1 (originally 1) Memory Usage: 71kB
-> Subquery Scan on l1 (cost=153.34..153.84 rows=40 width=20) (actual time=0.520..0.813 rows=1067 loops=1)
-> Sort (cost=153.34..153.44 rows=40 width=20) (actual time=0.520..0.630 rows=1067 loops=1)
Sort Key: letters.sent_at
Sort Method: quicksort Memory: 132kB
-> Bitmap Heap Scan on letters (cost=4.73..152.27 rows=40 width=20) (actual time=0.089..0.304
rows=1067 loops=1)
Recheck Cond: (sender_id = 3267)
Heap Blocks: exact=74
-> Bitmap Index Scan on letters_sender_id_sent_at_receiver_id_idx (cost=0.00..4.72
rows=40 width=0) (actual time=0.079..0.079 rows=1067 loops=1)
Index Cond: (sender_id = 3267)
-> Limit (cost=0.42..8.61 rows=1 width=12) (actual time=0.004..0.004 rows=0 loops=1067)
-> Index Scan using letters_sender_id_sent_at_receiver_id_idx on letters letters_1 (cost=0.42..8.61 rows=1
width=12) (actual time=0.003..0.003 rows=0 loops=1067)
Index Cond: ((sender_id = l1.receiver_id) AND (sent_at > l1.sent_at) AND (receiver_id = l1.sender_id))
Planning time: 0.565 ms
Execution time: 7.804 ms
(20 rows)
Better, no?
-> Bitmap Heap Scan on letters (cost=4.73..152.27 rows=40 width=20) (actual time=0.089..0.304 rows=1067 loops=1)
Recheck Cond: (sender_id = 3267)
Heap Blocks: exact=74
-> Bitmap Index Scan on letters_sender_id_sent_at_receiver_id_idx (cost=0.00..4.72 rows=40 width=0) (actual
time=0.079..0.079 rows=1067 loops=1)
Index Cond: (sender_id = 3267)
-> Limit (cost=0.42..8.61 rows=1 width=12) (actual time=0.004..0.004 rows=0 loops=1067)
-> Index Scan using letters_sender_id_sent_at_receiver_id_idx on letters letters_1 (cost=0.42..8.61 rows=1 width=12)
(actual time=0.003..0.003 rows=0 loops=1067)
Index Cond: ((sender_id = l1.receiver_id) AND (sent_at > l1.sent_at) AND (receiver_id = l1.sender_id))
The index is used where there
were sequential scans.
Focusing on this hash join…
-> Hash Join (cost=154.34..422.24 rows=40 width=39) (actual time=1.080..3.471 rows=1067 loops=1)
Hash Cond: (human.id = l1.receiver_id)
-> Seq Scan on human (cost=0.00..230.00 rows=10000 width=23) (actual time=0.076..0.797 rows=10000 loops=1)
-> Hash (cost=153.84..153.84 rows=40 width=20) (actual time=0.992..0.992 rows=1067 loops=1)
Buckets: 2048 (originally 1024) Batches: 1 (originally 1) Memory Usage: 71kB
-> Subquery Scan on l1 (cost=153.34..153.84 rows=40 width=20) (actual time=0.520..0.813 rows=1067 loops=1)
A hash is built from the subquery (1067 rows); the human table has 10 000 rows.
(diagram) Each human row is probed against the hash table built from the
subquery’s receiver_ids; a match points back to the subquery rows.
human: Human 1, Human 2, …, Human 22, …, Human 114
hash table (receiver_id): 22, 94, 104, 114, 125, …
rows: row 981, row 801, row 132, row 203, row 42, row 12, row 26, row 1012, …
Let’s use pagination…
SELECT human.first_name, human.last_name, receiver_id, letter_id,
sent_at, answer_id, answer_sent_at
FROM (
SELECT
id as letter_id, receiver_id, sent_at, sender_id
FROM letters
WHERE
sender_id=3267 AND sent_at > '2010-03-22'
ORDER BY sent_at LIMIT 20
) l1 LEFT JOIN LATERAL (
SELECT
id as answer_id,
sent_at as answer_sent_at
FROM letters
WHERE
sender_id = l1.receiver_id
AND sent_at > l1.sent_at
AND receiver_id = l1.sender_id
LIMIT 1
) l2 ON true JOIN human ON (human.id=receiver_id)
Let’s use pagination…
QUERY PLAN
----------------------------------------------------------------------------------------------------------
Nested Loop (cost=1.13..417.82 rows=20 width=47) (actual time=0.049..0.365 rows=20 loops=1)
-> Nested Loop Left Join (cost=0.84..255.57 rows=20 width=28) (actual time=0.040..0.231 rows=20
loops=1)
-> Limit (cost=0.42..82.82 rows=20 width=20) (actual time=0.022..0.038 rows=20 loops=1)
-> Index Scan using letters_sender_id_sent_at_receiver_id_idx on letters
(cost=0.42..165.22 rows=40 width=20) (actual time=0.020..0.035 rows=20 loops=1)
Index Cond: ((sender_id = 3267) AND (sent_at > '2010-03-22 00:00:00+01'::timestamp
with time zone))
-> Limit (cost=0.42..8.61 rows=1 width=12) (actual time=0.009..0.009 rows=0 loops=20)
-> Index Scan using letters_sender_id_sent_at_receiver_id_idx on letters letters_1
(cost=0.42..8.61 rows=1 width=12) (actual time=0.008..0.008 rows=0 loops=20)
Index Cond: ((sender_id = letters.receiver_id) AND (sent_at > letters.sent_at) AND
(receiver_id = letters.sender_id))
-> Index Scan using humain_pkey on human (cost=0.29..8.10 rows=1 width=23) (actual time=0.006..0.006
rows=1 loops=20)
Index Cond: (id = letters.receiver_id)
Planning time: 0.709 ms
Execution time: 0.436 ms
(12 rows)
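
The sent_at filter is what makes this keyset pagination: the next page
(a hypothetical sketch, the timestamp is made up) feeds the last sent_at
of the previous page back into the WHERE clause, so the index scan
starts exactly where the previous page stopped, instead of an OFFSET
that would rescan everything before it.

SELECT id AS letter_id, receiver_id, sent_at, sender_id
FROM letters
WHERE sender_id = 3267
  AND sent_at > '2010-04-08 23:03:00'  -- last sent_at of the previous page
ORDER BY sent_at
LIMIT 20;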
Conclusion
Thank you for your attention!
Any questions?
Owly design: zimmoriarty (https://www.instagram.com/zimmoriarty/)
To go further - sources
https://momjian.us/main/writings/pgsql/optimizer.pdf
https://use-the-index-luke.com/sql/plans-dexecution/postgresql/operations
http://tech.novapost.fr/postgresql-application_name-django-settings.html
Slides use the owls and paintings from
https://www.instagram.com/zimmoriarty