3. Today's agenda
1. How do we end up with performance problems?
2. How can we catch the performance problems without having to guess?
3. What does it change in our everyday developer job?
5. Why we should know what our ORM is doing
1. The ORM executes queries that you might not expect
2. Your queries might not be optimised, and you won't know about it
6. How can we catch the performance
problems (without having to guess)?
7. How can I see what is happening when I do stuff?
1. Django debug toolbar (to see queries and their EXPLAIN in your Django view)
Advantages: can be easily included in your Django templates
Problems: does not let you see everything (AJAX calls!); if you're working on
an API, you cannot use it!
2. Django devserver: sends all your database logs to your runserver output
Advantages: you're not missing the AJAX calls
3. Simply look at your database logs
Advantages: you can see everything, it keeps working if you ever change
project/programming language/framework/computer, and you can configure how
your logs are displayed
Problem: you may not know where your logs are…
8. Where are my logs?
Terminal command:
$ psql -U user -d your_database_name
In the psql interface:
owl_conference=# SHOW log_directory;
log_directory
---------------
pg_log
owl_conference=# SHOW data_directory;
data_directory
-------------------------
/usr/local/var/postgres
owl_conference=# SHOW log_filename;
log_filename
-------------------------
postgresql-%Y-%m-%d.log
9. Having good-looking logs
(and logging everything like a crazy owl)
owl_conference=# SHOW config_file;
config_file
-----------------------------------------
/usr/local/var/postgres/postgresql.conf
(1 row)
In your postgresql.conf
log_filename = 'postgresql-%Y-%m-%d.log'
log_statement = 'all'
logging_collector = on
log_min_duration_statement = 0
10. I've seen my logs… But…
Where are these queries executed in my code?
Let's take an example…
I have an owl DB with two tables:
10,000 owls and 7 jobs
11. Example
Query executed in the template
def index(request):
    owls = Owl.objects.filter(employer_name='Ulule')
    context = {'owls': owls}
    return render(request, 'owls/index.html', context)
SELECT … FROM "owl" WHERE "owl"."employer_name" = 'Ulule'
{% for owl in owls %}
<p> {{ owl.name }} </p>
{% endfor %}
12. Example
Query executed in the view
def index(request):
    owls = Owl.objects.filter(employer_name='Ulule')
    for owl in owls:
        # Do something
        pass
    context = {'owls': owls}
    return render(request, 'owls/index.html', context)
SELECT … FROM "owl" WHERE "owl"."employer_name" = 'Ulule'
{% for owl in owls %}
<p> {{ owl.name }} </p>
{% endfor %}
13. Yep! I've seen my logs… But…
Where are these queries executed in my code?
How to spot where your query is executed:
1. Each model has a table to store its data.
Find the model.
2. Where in my view or in my form am I
using this model to get/filter objects?
3. Where am I using these objects? Is it in my
view/form? Passed into the context and
used in templates?
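Alongside the tools above, Django itself can echo every SQL statement to the console through its django.db.backends logger (it only emits SQL when DEBUG is True). A minimal settings sketch; the handler name is an assumption, so adapt it to your existing LOGGING configuration:

```python
# settings.py -- log every SQL statement Django executes (DEBUG=True only)
LOGGING = {
    "version": 1,
    "disable_existing_loggers": False,
    "handlers": {
        "console": {"class": "logging.StreamHandler"},
    },
    "loggers": {
        # Django only emits SQL on this logger when settings.DEBUG is True
        "django.db.backends": {
            "handlers": ["console"],
            "level": "DEBUG",
        },
    },
}
```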
14. What does it change in our everyday
developer job?
(Or how to really do something when you have a problem)
15. The two most common
problems of any developer…
1. I have way too many queries… Why?
2. One of my queries is freakin' slow… Why?
16. Once upon a time… 1000 times
The danger of loops in your code, and how your templates
are making fun of you…
1. Use your context!
2. Preload stuff in the query!
• prefetch_related() - ManyToMany or ForeignKey
• select_related() - ForeignKey
17. Once upon a time… 1000 times
select_related or prefetch_related?
In Django, select_related and prefetch_related will help you reduce your
number of queries by preloading the foreign keys or many-to-many relations.
1. select_related uses a JOIN (only for foreign keys):
- Advantage: only one query
- Problem: if you are joining big tables, with a lot of columns and no index,
it can be slow… We'll talk about that next.
2. prefetch_related does a second query on your related table (for foreign keys
and many-to-many):
- Advantage: no big join
- Problem: more queries
18. Example …
owls = Owl.objects.filter(employer_name='Ulule')
for owl in owls:
    print(owl.job)  # 1 query per loop iteration
owls = Owl.objects.filter(
    employer_name='Ulule'
).select_related('job')
for owl in owls:
    print(owl.job)  # no extra queries
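The mechanism behind prefetch_related can be sketched in plain Python: fetch the parent rows, fetch all related rows in one extra query, then stitch them together in memory. The owl and job data below are made up for illustration:

```python
# Simulate what prefetch_related does: one query for the parents,
# one query for all the related rows, then join them in Python.
owls = [
    {"id": 1, "name": "Hedwige", "job_id": 1},
    {"id": 2, "name": "Errol", "job_id": 2},
]
# This stands in for the second query:
# SELECT * FROM job WHERE id IN (1, 2)
jobs = [
    {"id": 1, "title": "mail carrier"},
    {"id": 2, "title": "messenger"},
]

jobs_by_id = {job["id"]: job for job in jobs}
for owl in owls:
    owl["job"] = jobs_by_id[owl["job_id"]]  # no per-row query needed

print(owls[0]["job"]["title"])  # mail carrier
```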
20. One of my queries is super slow…
Let’s talk about EXPLAIN !
21. What is EXPLAIN?
Gives you the execution plan chosen by the query planner that your
database will use to execute your SQL statement
Using ANALYZE will actually execute your query! (Don’t worry, you
can ROLLBACK)
EXPLAIN (ANALYZE) my super query;
BEGIN;
EXPLAIN ANALYZE my super query;
ROLLBACK;
22. Mmmm… Query planner?
The magical thing that generates possible execution plans for a query and
estimates the cost of each plan.
The cheapest one is used to execute your query.
23. So, what does it look like?
Let's take a slow query…
Owl.objects.filter(employer_name='Ulule')
SELECT "owl"."id", "owl"."name",
"owl"."employer_name", "owl"."favourite_food",
"owl"."job_id", "owl"."fur_color"
FROM "owl" WHERE "owl"."employer_name" = 'Ulule'
24. And…
owl_conference=# EXPLAIN ANALYZE
SELECT * FROM owl WHERE
employer_name='Ulule';
QUERY PLAN
------------------------------------
Seq Scan on owl (cost=0.00..205.01
rows=1 width=35) (actual
time=1.945..1.946 rows=1 loops=1)
Filter: ((employer_name)::text =
'Ulule'::text)
Rows Removed by Filter: 10000
Planning time: 0.080 ms
Execution time: 1.965 ms
(5 rows)
25. Let's go step by step! .. 1
Costs
(cost=0.00..205.01 rows=1 width=35)
- 0.00: cost of retrieving the first row
- 205.01: cost of retrieving all rows
- rows: estimated number of rows returned
- width: average width of a row (in bytes)
(actual time=1.945..1.946 rows=1 loops=1)
- only shown if you use ANALYZE
- loops: number of times your seq scan (index scan, etc.) was executed
26. Let's go step by step! .. 2
Seq Scan
Seq Scan on owl ...
Filter: ((employer_name)::text = 'Ulule'::text)
Rows Removed by Filter: 10000
- Reads the entire table.
- Keeps the rows matching your WHERE clause.
It can be expensive!
So… is that why my query is slow? Do you need an index?
27. What is an index then?
In an encyclopaedia, if you want every page
where you can find the word "Owl",
you don't read the entire book,
you go to the index!
A database index contains
the column value and
pointers to each row that
has this value.
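The same idea fits in a few lines of Python: an index is just a mapping from column value to row locations, so a lookup no longer has to scan every row. The table data is made up for illustration:

```python
# A table as a list of rows, and a hand-built "index" on employer_name.
table = [
    {"name": "Hedwige", "employer_name": "Ulule"},
    {"name": "Errol", "employer_name": "post office"},
    {"name": "Hermes", "employer_name": "post office"},
]

# Build the index once: column value -> list of row positions.
index = {}
for position, row in enumerate(table):
    index.setdefault(row["employer_name"], []).append(position)

# Lookup: jump straight to the matching rows instead of scanning the table.
matches = [table[pos] for pos in index.get("Ulule", [])]
print([row["name"] for row in matches])  # ['Hedwige']
```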
28. Let's go step by step! .. 3
Index scan
QUERY PLAN
-------------------------------------------------
Index Scan using employer_name_owl on owl
…
Index Cond: ((employer_name)::text =
'Ulule'::text)
Planning time: 0.387 ms
Execution time: 0.066 ms
(4 rows)
What if there is an index on the "employer_name" column?
The index is visited entry by entry in order to
retrieve the rows matching your WHERE clause.
29. Let's go step by step! .. 4
owl_conference=# EXPLAIN SELECT * FROM "owl" WHERE
"owl"."employer_name" = 'post office';
QUERY PLAN
-------------------------------------------------
Seq Scan on owl
…
Filter: ((employer_name)::text = 'post
office'::text)
With an index and a really common value!
7000 owls work at the post office
Owl.objects.filter(employer_name='post office')
30. Let's go step by step! .. 4
Why is it using a seq scan?
An index scan follows the order of the index, so the
disk's read head has to move between rows.
Moving the read head is about 1000
times slower than reading the next physical
block.
Conclusion: for common values it's quicker to
read all the data from the table in physical order.
By the way… Retrieving 7000 rows might not be a great idea :).
31. Let's go step by step! .. 5
Bitmap Heap Scan
owl_conference=# EXPLAIN SELECT * FROM owl WHERE
owl.employer_name = 'Hogwarts';
QUERY PLAN
-------------------------------------------------
Bitmap Heap Scan on owl
…
Recheck Cond: ((employer_name)::text =
'Hogwarts'::text)
-> Bitmap Index Scan on employer_name_owl
(cost=0.00..47.28 rows=2000 width=0)
Index Cond: ((employer_name)::text =
'Hogwarts'::text)
With an index and a common value
2000 owls work at Hogwarts
Owl.objects.filter(employer_name='Hogwarts')
32. Let's go step by step! .. 5
Bitmap Heap Scan…
Index Scan: goes through your index tuple-pointers one at a time
and reads the data from the pages. Uses the index order.
Bitmap Heap Scan: orders the tuple-pointers in physical memory
order and goes through them.
Avoids little "physical jumps" between pages.
33. So we have 3 types of scan
1. Sequential scan
2. Index scan
3. Bitmap heap scan
And now let's join stuff!
34. And now let’s join stuff…
Nested loops
owl_conference=# EXPLAIN ANALYZE SELECT * FROM owl JOIN job ON
(job.id = owl.job_id) WHERE job.id=1;
QUERY PLAN
-------------------------------------------------------------
Nested Loop
…
-> Seq Scan on job …
Rows Removed by Filter: 6
-> Seq Scan on owl …
Filter: (job_id = 1)
Rows Removed by Filter: 1000
Planning time: 0.150 ms
Execution time: 3.663 ms
(9 rows)
Owl.objects.filter(job_id=1).select_related('job')
35. And now let's join stuff…
Nested loops
Used for little tables; can be slow because it is doing two nested "for" loops!
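A nested loop join can be sketched in plain Python as exactly those two nested "for" loops (the owl and job rows below are made up for illustration):

```python
# Nested loop join: for each row of the outer table, scan the inner table.
jobs = [{"id": 1, "title": "mail carrier"}, {"id": 2, "title": "messenger"}]
owls = [
    {"name": "Hedwige", "job_id": 1},
    {"name": "Hermes", "job_id": 1},
    {"name": "Errol", "job_id": 2},
]

result = []
for job in jobs:        # outer loop
    for owl in owls:    # inner loop: re-scanned for every outer row
        if owl["job_id"] == job["id"]:
            result.append((owl["name"], job["title"]))

print(result)
```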
36. And now let’s join stuff…
Hash Join
owl_conference=# EXPLAIN ANALYZE SELECT * FROM owl JOIN job ON
(job.id = owl.job_id) WHERE job.id>1;
QUERY PLAN
-------------------------------------------------------------
Hash Join …
Hash Cond: (owl.job_id = job.id)
-> Seq Scan on owl (cost=blabla)
-> Hash (cost=blabla)
Buckets: 1024 Batches: 1 Memory Usage: 9kB
-> Seq Scan on job (cost=blabla)
Filter: (id > 1)
Rows Removed by Filter: 1
Planning time: 0.235 ms
(10 rows)
Owl.objects.filter(job_id__gt=1).select_related('job')
37. And now let’s join stuff…
Hash Join
Used for smaller tables, because the hash table
(built from the smaller side) has to fit in memory
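In plain Python, a hash join is a dict built from the smaller side, then one lookup per row of the bigger side (made-up data again):

```python
# Hash join: build a hash table on the smaller side, then probe it
# once per row of the bigger side.
jobs = [{"id": 1, "title": "mail carrier"}, {"id": 2, "title": "messenger"}]
owls = [
    {"name": "Hedwige", "job_id": 1},
    {"name": "Errol", "job_id": 2},
    {"name": "Hermes", "job_id": 1},
]

# Build phase: the whole hash table must fit in memory.
job_by_id = {job["id"]: job for job in jobs}

# Probe phase: one dict lookup per owl, no nested scanning.
result = [(owl["name"], job_by_id[owl["job_id"]]["title"])
          for owl in owls if owl["job_id"] in job_by_id]

print(result)
```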
38. And now let’s join stuff…
Merge Join
owl_conference=# EXPLAIN ANALYZE SELECT * FROM owl JOIN job ON
(job.id = owl.id);
QUERY PLAN
-------------------------------------------------------------
Merge Join …
Merge Cond: (owl.id = job.id)
-> Index Scan using owl_pkey on owl
-> Sort …
Sort Key: job.id
Sort Method: quicksort Memory: 25kB
-> Seq Scan on job …
Planning time: 0.453 ms
Execution time: 0.102 ms
(10 rows)
Owl.objects.all().select_related('job')
39. And now let’s join stuff…
Merge Join
Used for big tables, an index can be
used to avoid sorting
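A merge join in plain Python: sort both inputs on the join key (or read them through an index that is already sorted), then advance two cursors in a single pass. The data is made up; job ids are assumed unique, as for a primary key:

```python
# Merge join: both inputs sorted on the join key, then merged in one pass.
owls = sorted(
    [
        {"name": "Hedwige", "job_id": 1},
        {"name": "Errol", "job_id": 2},
        {"name": "Hermes", "job_id": 1},
    ],
    key=lambda row: row["job_id"],
)
jobs = sorted(
    [{"id": 2, "title": "messenger"}, {"id": 1, "title": "mail carrier"}],
    key=lambda row: row["id"],
)

result = []
i = j = 0
while i < len(owls) and j < len(jobs):
    if owls[i]["job_id"] == jobs[j]["id"]:
        result.append((owls[i]["name"], jobs[j]["title"]))
        i += 1  # jobs[j] may match several owls, so keep j in place
    elif owls[i]["job_id"] < jobs[j]["id"]:
        i += 1
    else:
        j += 1

print(result)
```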
40. So we have 3 types of joins
1. Nested loop
2. Hash join
3. Merge join
And a last word about
ORDER BY
(last part, I swear!)
41. And now let’s order stuff…
owl_conference=# EXPLAIN ANALYZE SELECT * FROM owl ORDER BY
owl.job_id, owl.favourite_food;
QUERY PLAN
-------------------------------------------------------------
Sort …
Sort Key: job_id, favourite_food
Sort Method: quicksort Memory: 1166kB
-> Seq Scan on owl (cost=0.00..180.01 rows=10001 width=35)
(actual time=0.017..1.181 rows=10001 loops=1)
Planning time: 0.142 ms
Execution time: 8.665 ms
(6 rows)
Everything is sorted in memory (which is why it can be costly in terms of memory)
Owl.objects.order_by('job_id', 'favourite_food')
42. And now let’s order stuff…
ORDER BY LIMIT
owl_conference=# EXPLAIN ANALYZE SELECT name, employer_name
FROM owl ORDER BY owl.job_id, owl.favourite_food LIMIT 10;
QUERY PLAN
---------------------------------------------------------------
Limit (cost…) (actual time…)
-> Sort (cost…) (actual time…)
Sort Key: job_id, favourite_food
Sort Method: top-N heapsort Memory: 25kB
-> Seq Scan on owl (cost=0.00..180.01 rows=10001
width=16) (actual time=0.032..5.856 rows=10002 loops=1)
Planning time: 0.201 ms
Execution time: 15.846 ms
(7 rows)
Like with quicksort, all the data has to be sorted… So why is the memory usage so much smaller?
Owl.objects.order_by('job_id', 'favourite_food')[0:10]
43. Top-N heap sort
- A heap (a sort of tree) with a limited size is used
- For each row:
  - If the heap is not full: add the row to the heap
  - Else:
    - If the value is smaller than the current values (for ASC): insert
      the row into the heap, pop the largest one
    - Else: skip the row
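The algorithm above is what Python's heapq.nsmallest implements: it keeps a bounded heap of the N best rows seen so far instead of sorting everything, which is why the memory footprint stays tiny. The names are made up for illustration:

```python
import heapq

# Top-N heap sort: equivalent to ORDER BY name LIMIT 3 over these rows,
# without sorting the whole list in memory.
names = ["Potter", "Post Office", "Ahmann", "Weasley", "Errol", "Hedwige"]

print(heapq.nsmallest(3, names))  # ['Ahmann', 'Errol', 'Hedwige']
```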
45. Top-N heap sort
Example
Iteration 11: Post Office, nothing to do
Iteration 12: Ahmann is smaller than the other values
Inserted in the tree
Potter removed
46. And now let’s order stuff…
With an index
owl_conference=# EXPLAIN ANALYZE SELECT * FROM owl ORDER BY
owl.job_id, owl.favourite_food;
QUERY PLAN
-------------------------------------------------------------
Index Scan using owl_job_id_favourite_food on owl
(cost=0.29..544.66 rows=10001 width=35) (actual
time=0.016..2.835 rows=10001 loops=1)
Planning time: 0.098 ms
Execution time: 3.510 ms
(3 rows)
Simply uses index order
47. Be careful when you ORDER BY!
1. Sorting with a sort key without a limit or an index can be
heavy in terms of memory!
2. You might need an index; only EXPLAIN will tell
you.
49. Conclusion
- Looking at your DB logs can help you build a website
with good performance
- Always know where your queries come from
- Be careful with loops! Use prefetch_related and
select_related
- If you have a slow query, using EXPLAIN will help you find
a solution
50. Thank you for your attention!
Any questions?
Owly design: zimmoriarty (https://www.instagram.com/zimmoriarty/)
51. To go further - sources
https://momjian.us/main/writings/pgsql/optimizer.pdf
https://use-the-index-luke.com/sql/plans-dexecution/postgresql/operations
http://tech.novapost.fr/postgresql-application_name-django-settings.html