Slides for the talk given at the DjangoCon Europe 2017 in Firenze

The amazing world behind your ORM

  1. The amazing world behind your ORM
     Louise Grandjonc
  2. About me
     Louise Grandjonc (louise@ulule.com)
     Lead developer at Ulule (www.ulule.com)
     Django developer - Postgres enthusiast
     @louisemeta on Twitter
  3. Today’s agenda
     1. How do we end up with performance problems?
     2. How can we catch the performance problems without having to guess?
     3. What does it change in our everyday developer job?
  4. How do we end up with performance problems?
  5. Why we should know what our ORM is doing
     1. The ORM executes queries that you might not expect.
     2. Your queries might not be optimised, and you won’t know about it.
  6. How can we catch the performance problems (without having to guess)?
  7. How can I see what is happening when I do stuff?
     1. Django debug toolbar: shows the queries of a Django view, along with their EXPLAIN output.
        Advantages: it can easily be included in your Django templates.
        Problems: it does not show everything (AJAX calls!), and you cannot use it if you are working on an API.
     2. Django devserver: puts all the logs of your database into your runserver output.
        Advantages: you do not miss the AJAX calls.
     3. Simply look at your database logs.
        Advantages: you can see everything, nothing changes if you ever switch project, programming language, framework or computer, and you can configure how your logs look.
        Problems: you don’t know where your logs are?
  8. Where are my logs?
     Terminal command:
         $ psql -U user -d your_database_name
     psql interface:
         owl_conference=# show log_directory ;
          log_directory
         ---------------
          pg_log

         owl_conference=# show data_directory ;
              data_directory
         -------------------------
          /usr/local/var/postgres

         owl_conference=# show log_filename ;
               log_filename
         -------------------------
          postgresql-%Y-%m-%d.log
  9. Having good looking logs (and logging everything like a crazy owl)
         owl_conference=# SHOW config_file;
                       config_file
         -----------------------------------------
          /usr/local/var/postgres/postgresql.conf
         (1 row)
     In your postgresql.conf:
         log_filename = 'postgresql-%Y-%m-%d.log'
         log_statement = 'all'
         logging_collector = on
         log_min_duration_statement = 0
  10. I’ve seen my logs… But where are these queries executed in my code?
      Let’s take an example: I have an owl DB with two tables, 10 000 owls and 7 jobs.
  11. Example: query executed in the template
          def index(request):
              owls = Owl.objects.filter(employer_name='Ulule')
              context = {'owls': owls}
              return render(request, 'owls/index.html', context)

          {% for owl in owls %}
              <p> {{ owl.name }} </p>
          {% endfor %}

          SELECT … FROM "owl" WHERE "owl"."employer_name" = 'Ulule'
      The queryset is lazy: nothing iterates over it in the view, so the query only runs when the template loops over owls.
  12. Example: query executed in the view
          def index(request):
              owls = Owl.objects.filter(employer_name='Ulule')
              for owl in owls:
                  pass  # Do something
              context = {'owls': owls}
              return render(request, 'owls/index.html', context)

          {% for owl in owls %}
              <p> {{ owl.name }} </p>
          {% endfor %}

          SELECT … FROM "owl" WHERE "owl"."employer_name" = 'Ulule'
      Here the loop in the view evaluates the queryset, so the query runs in the view and the template reuses the cached results.
  13. Yep! I’ve seen my logs… But where are these queries executed in my code?
      How to spot where your query is executed:
      1. Each model has a table to store data. Find the model.
      2. Where in my view or in my form am I using this model to get/filter objects?
      3. Where am I using these objects? Is it in my view/form? Are they passed into the context and used in templates?
  14. What does it change in our everyday developer job?
      (Or how to really do something when you have a problem)
  15. The two most common problems of any developer…
      1. I have way too many queries… Why?
      2. One of my queries is freakin' slow… Why?
  16. Once upon a time… 1000 times
      The danger of loops in your code, and how your templates are making fun of you…
      1. Use your context!
      2. Preload stuff in the query!
         • prefetch_related() - ManyToMany or ForeignKey
         • select_related() - ForeignKey
  17. Once upon a time… 1000 times: select_related or prefetch_related?
      In Django, select_related and prefetch_related help you reduce the number of queries by preloading foreign keys or many-to-many relations.
      1. select_related uses a join (only for foreign keys):
         - Advantages: only one query
         - Problem: if you are joining big tables, with a lot of columns and no index, it can be slow… We’ll talk about that next.
      2. prefetch_related does a second query on the related table (for foreign keys and many-to-many):
         - Advantages: no big join
         - Problem: more queries
  18. Example…
          owls = Owl.objects.filter(employer_name='Ulule')
          for owl in owls:
              print(owl.job)  # 1 query per loop

          owls = Owl.objects.filter(employer_name='Ulule').select_related('job')
          for owl in owls:
              print(owl.job)  # no extra queries
  19. Example… Using select_related/prefetch_related
          Owl.objects.filter(employer_name='Ulule').select_related('job')

          SELECT … FROM "owl" LEFT OUTER JOIN "job" ON ("owl"."job_id" = "job"."id")
          WHERE "owl"."employer_name" = 'Ulule'

          Owl.objects.filter(employer_name='Ulule').prefetch_related('job')

          SELECT … FROM "owl" WHERE "owl"."employer_name" = 'Ulule'
          SELECT … FROM "job" WHERE "job"."id" IN (2)
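The three query patterns from slides 18 and 19 can be reproduced without Django. The sketch below is only an illustration: it uses the standard-library sqlite3 module as a stand-in for Postgres, invents a tiny owl/job data set, and counts SELECT statements with a trace callback. The helper names (run_counted, one_query_per_owl, single_join, two_queries) are hypothetical and not part of the talk.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE job (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE owl (id INTEGER PRIMARY KEY, name TEXT,
                      employer_name TEXT, job_id INTEGER REFERENCES job(id));
    INSERT INTO job VALUES (1, 'mail'), (2, 'developer');
    INSERT INTO owl VALUES (1, 'Hedwig', 'Ulule', 2),
                           (2, 'Errol', 'Ulule', 2),
                           (3, 'Pig', 'Ulule', 1);
""")

statements = []
conn.set_trace_callback(statements.append)  # records each SQL statement that runs


def run_counted(fn):
    """Run fn and return (result, number of SELECT statements it issued)."""
    statements.clear()
    result = fn()
    selects = [s for s in statements if s.lstrip().upper().startswith("SELECT")]
    return result, len(selects)


OWLS_SQL = "SELECT name, job_id FROM owl WHERE employer_name = 'Ulule' ORDER BY id"


def one_query_per_owl():
    # Like looping over owls and touching owl.job: 1 query + 1 per owl.
    owls = conn.execute(OWLS_SQL).fetchall()
    return [
        (name, conn.execute("SELECT name FROM job WHERE id = ?", (job_id,)).fetchone()[0])
        for name, job_id in owls
    ]


def single_join():
    # Like select_related('job'): everything in one joined query.
    return conn.execute(
        "SELECT owl.name, job.name FROM owl "
        "LEFT OUTER JOIN job ON owl.job_id = job.id "
        "WHERE owl.employer_name = 'Ulule' ORDER BY owl.id"
    ).fetchall()


def two_queries():
    # Like prefetch_related('job'): one query for owls, one for their jobs.
    owls = conn.execute(OWLS_SQL).fetchall()
    job_ids = sorted({job_id for _, job_id in owls})
    placeholders = ", ".join("?" * len(job_ids))
    jobs = dict(conn.execute(
        f"SELECT id, name FROM job WHERE id IN ({placeholders})", job_ids
    ).fetchall())
    return [(name, jobs[job_id]) for name, job_id in owls]


naive_result, naive_count = run_counted(one_query_per_owl)
join_result, join_count = run_counted(single_join)
prefetch_result, prefetch_count = run_counted(two_queries)
print(naive_count, join_count, prefetch_count)
```

All three functions return the same data; only the number of round trips to the database differs, which is the trade-off the slides describe.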
  20. One of my queries is super slow… Let’s talk about EXPLAIN!
  21. What is EXPLAIN
      Gives you the execution plan chosen by the query planner that your database will use to execute your SQL statement.
          EXPLAIN (ANALYZE) my super query;
      Using ANALYZE will actually execute your query! (Don’t worry, you can ROLLBACK.)
          BEGIN;
          EXPLAIN ANALYZE my super query;
          ROLLBACK;
  22. Mmmm… Query planner?
      The magical thing that generates execution plans for a query and calculates the cost of each plan. The best one (lowest estimated cost) is used to execute your query.
  23. So, what does it look like? Let’s take a slow query…
          Owl.objects.filter(employer_name='Ulule')

          SELECT "owl"."id", "owl"."name", "owl"."employer_name", "owl"."favourite_food",
                 "owl"."job_id", "owl"."fur_color"
          FROM "owl" WHERE "owl"."employer_name" = 'Ulule'
  24. And…
          owl_conference=# EXPLAIN ANALYZE SELECT * FROM owl WHERE employer_name='Ulule';
                      QUERY PLAN
          ------------------------------------
           Seq Scan on owl (cost=0.00..205.01 rows=1 width=35) (actual time=1.945..1.946 rows=1 loops=1)
             Filter: ((employer_name)::text = 'Ulule'::text)
             Rows Removed by Filter: 10000
           Planning time: 0.080 ms
           Execution time: 1.965 ms
          (5 rows)
  25. Let’s go step by step! .. 1: Costs
          (cost=0.00..205.01 rows=1 width=35)
      - cost=0.00: estimated cost of retrieving the first row
      - ..205.01: estimated cost of retrieving all rows
      - rows=1: estimated number of rows returned
      - width=35: average width of a row (in bytes)
          (actual time=1.945..1.946 rows=1 loops=1)
      - actual time, rows: measured values, shown only if you use ANALYZE
      - loops=1: number of times your seq scan (index scan etc.) was executed
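As a small illustration of how the estimate part of a plan line breaks down, here is a sketch that pulls the four numbers out with a regular expression. The PlanEstimate type and parse_estimate function are hypothetical helpers written for this example; they only handle the text format shown on the slide.

```python
import re
from typing import NamedTuple


class PlanEstimate(NamedTuple):
    startup_cost: float  # estimated cost of retrieving the first row
    total_cost: float    # estimated cost of retrieving all rows
    rows: int            # estimated number of rows returned
    width: int           # average width of a row, in bytes


_ESTIMATE_RE = re.compile(
    r"\(cost=(\d+\.\d+)\.\.(\d+\.\d+) rows=(\d+) width=(\d+)\)"
)


def parse_estimate(plan_line: str) -> PlanEstimate:
    """Extract the planner's estimates from one line of EXPLAIN text output."""
    match = _ESTIMATE_RE.search(plan_line)
    if match is None:
        raise ValueError(f"no cost estimate found in: {plan_line!r}")
    startup, total, rows, width = match.groups()
    return PlanEstimate(float(startup), float(total), int(rows), int(width))


line = ("Seq Scan on owl (cost=0.00..205.01 rows=1 width=35) "
        "(actual time=1.945..1.946 rows=1 loops=1)")
estimate = parse_estimate(line)
print(estimate)
```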
  26. Let’s go step by step! .. 2: Seq Scan
          Seq Scan on owl ...
            Filter: ((employer_name)::text = 'Ulule'::text)
            Rows Removed by Filter: 10000
      - Scans the entire table.
      - Retrieves the rows matching your WHERE.
      It can be expensive! So… is this why my query is slow? Do you need an index?
  27. What is an index then?
      In an encyclopaedia, if you want every page where you can find the word « Owl », you don’t read the entire book, you go to the index! A database index contains the column value and pointers to each row that has this value.
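The "column value plus pointers to rows" idea can be sketched in a few lines of Python. This is a toy model with invented data: a dictionary stands in for the index, whereas a real Postgres index is usually an ordered B-tree, so treat it as an analogy only.

```python
from collections import defaultdict

# Toy "table": each row is (name, employer_name); the list position is the row pointer.
rows = [
    ("Hedwig", "post office"),
    ("Errol", "post office"),
    ("Archimedes", "Ulule"),
    ("Pig", "Hogwarts"),
]


def build_index(table, column):
    """Map each value of the column to the positions of the rows holding it."""
    index = defaultdict(list)
    for position, row in enumerate(table):
        index[row[column]].append(position)
    return index


employer_index = build_index(rows, column=1)


def lookup(value):
    """Fetch matching rows through the index instead of scanning every row."""
    return [rows[position] for position in employer_index.get(value, [])]


print(lookup("Ulule"))
```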
  28. Let’s go step by step! .. 3: Index scan
      What if there is an index on the « employer_name » column?
                              QUERY PLAN
          -------------------------------------------------
           Index Scan using employer_name_owl on owl …
             Index Cond: ((employer_name)::text = 'Ulule'::text)
           Planning time: 0.387 ms
           Execution time: 0.066 ms
          (4 rows)
      The index is visited row by row in order to retrieve the data corresponding to your clause.
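To watch a planner switch from a full scan to an index lookup without a Postgres server at hand, the sketch below uses SQLite (through the standard-library sqlite3 module) as a stand-in. This is an assumption made for the example: SQLite's EXPLAIN QUERY PLAN is a different command with a different output format from Postgres's EXPLAIN, and the plan helper and the generated data are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE owl (id INTEGER PRIMARY KEY, name TEXT, employer_name TEXT)")
conn.executemany(
    "INSERT INTO owl (name, employer_name) VALUES (?, ?)",
    [(f"owl-{i}", "Ulule" if i == 0 else "post office") for i in range(1000)],
)

QUERY = "SELECT * FROM owl WHERE employer_name = 'Ulule'"


def plan(sql):
    """Return SQLite's plan description (the last column of each plan row)."""
    return " | ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))


plan_before = plan(QUERY)  # no index yet: the whole table is scanned
conn.execute("CREATE INDEX employer_name_owl ON owl (employer_name)")
plan_after = plan(QUERY)   # the planner can now search through the index

print(plan_before)
print(plan_after)
```

The exact wording of the two plan lines depends on the SQLite version, so no expected output is shown here.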
  29. Let’s go step by step! .. 4
      With an index and a really common value! 7000 owls work at the post office.
          Owl.objects.filter(employer_name='post office')

          owl_conference=# EXPLAIN SELECT * FROM "owl" WHERE "owl"."employer_name" = 'post office';
                              QUERY PLAN
          -------------------------------------------------
           Seq Scan on owl …
             Filter: ((employer_name)::text = 'post office'::text)
  30. Let’s go step by step! .. 4: Why is it using a seq scan?
      An index scan follows the order of the index, so the read head has to move between rows. Moving the disk’s read head is 1000 times slower than reading the next physical block.
      Conclusion: for common values it’s quicker to read all data from the table in physical order.
      By the way… Retrieving 7000 rows might not be a great idea :).
  31. Let’s go step by step! .. 5: Bitmap Heap Scan
      With an index and a common value: 2000 owls work at Hogwarts.
          Owl.objects.filter(employer_name='Hogwarts')

          owl_conference=# EXPLAIN SELECT * FROM owl WHERE owl.employer_name = 'Hogwarts';
                              QUERY PLAN
          -------------------------------------------------
           Bitmap Heap Scan on owl …
             Recheck Cond: ((employer_name)::text = 'Hogwarts'::text)
             -> Bitmap Index Scan on employer_name_owl (cost=0.00..47.28 rows=2000 width=0)
                  Index Cond: ((employer_name)::text = 'Hogwarts'::text)
  32. Let’s go step by step! .. 5: Bitmap Heap Scan…
      - Index scan: goes through your index tuple-pointers one at a time and reads the data from the pages. Uses the index order.
      - Bitmap Heap Scan: orders the tuple-pointers in physical order and goes through them. Avoids little « physical jumps » between pages.
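The difference between the two access patterns can be sketched with a handful of tuple pointers. The example below is a simplified model, not how Postgres is implemented: it assumes the matching index entries arrive in an order unrelated to where the rows sit on disk, and the (page, slot) pointers and helper names are invented for illustration.

```python
# Tuple pointers (page number, slot) for the matching rows, in index order.
# Assumption: index order is unrelated to the physical placement of the rows.
index_order = [(7, 1), (2, 3), (7, 4), (0, 2), (2, 0), (5, 1)]


def index_scan_pages(pointers):
    """Index scan: follow the pointers one at a time, in index order."""
    return [page for page, _slot in pointers]


def bitmap_heap_scan_pages(pointers):
    """Bitmap heap scan: collect the pages first, then read each once in physical order."""
    return sorted({page for page, _slot in pointers})


def backward_jumps(pages):
    """Count how often the next page read lies before the current one."""
    return sum(1 for current, following in zip(pages, pages[1:]) if following < current)


index_pages = index_scan_pages(index_order)
bitmap_pages = bitmap_heap_scan_pages(index_order)
print(index_pages, backward_jumps(index_pages))
print(bitmap_pages, backward_jumps(bitmap_pages))
```

In this model the index scan touches six pages and jumps backwards twice, while the bitmap variant reads four distinct pages in a single forward pass.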
  33. So we have 3 types of scan
      1. Sequential scan
      2. Index scan
      3. Bitmap heap scan
      And now let’s join stuff!
  34. And now let’s join stuff… Nested loops
          Owl.objects.filter(job_id=1).select_related('job')

          owl_conference=# EXPLAIN ANALYZE SELECT * FROM owl JOIN job ON (job.id = owl.job_id) WHERE job.id=1;
                                    QUERY PLAN
          -------------------------------------------------------------
           Nested Loop …
             -> Seq Scan on job …
                  Rows Removed by Filter: 6
             -> Seq Scan on owl …
                  Filter: (job_id = 1)
                  Rows Removed by Filter: 1000
           Planning time: 0.150 ms
           Execution time: 3.663 ms
          (9 rows)
  35. And now let’s join stuff… Nested loops
      Used for little tables; it can be slow because it is doing two nested « for » loops!
      (The illustration on this slide does not match the previous query ;))
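The two nested « for » loops can be written out directly. This is a minimal sketch on invented in-memory data, with rows as plain tuples; nested_loop_join is a hypothetical helper, not what the database executes internally.

```python
# Rows as tuples: owls are (name, job_id), jobs are (id, name).
owls = [("Hedwig", 1), ("Errol", 2), ("Pig", 1)]
jobs = [(1, "mail"), (2, "developer")]


def nested_loop_join(outer, inner):
    """Compare every outer row with every inner row; keep the matching pairs."""
    joined = []
    comparisons = 0
    for owl in outer:
        for job in inner:
            comparisons += 1
            if owl[1] == job[0]:  # owl.job_id = job.id
                joined.append((owl, job))
    return joined, comparisons


pairs, comparisons = nested_loop_join(owls, jobs)
print(pairs, comparisons)
```

The number of comparisons is len(outer) * len(inner), which is why this strategy only stays cheap when at least one side is small.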
  36. And now let’s join stuff… Hash Join
          Owl.objects.filter(job_id__gt=1).select_related('job')

          owl_conference=# EXPLAIN ANALYZE SELECT * FROM owl JOIN job ON (job.id = owl.job_id) WHERE job.id>1;
                                    QUERY PLAN
          -------------------------------------------------------------
           Hash Join …
             Hash Cond: (owl.job_id = job.id)
             -> Seq Scan on owl (cost=blabla)
             -> Hash (cost=blabla)
                  Buckets: 1024 Batches: 1 Memory Usage: 9kB
                  -> Seq Scan on job (cost=blabla)
                       Filter: (id > 1)
                       Rows Removed by Filter: 1
           Planning time: 0.235 ms
          (10 rows)
  37. And now let’s join stuff… Hash Join
      Used for smaller tables, because the hash table has to fit in memory.
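A hash join can be sketched as a build phase followed by a probe phase. The example below uses invented in-memory data and a plain dict as the hash table; hash_join is a hypothetical helper that assumes job ids are unique.

```python
# Rows as tuples: owls are (name, job_id), jobs are (id, name).
owls = [("Hedwig", 1), ("Errol", 2), ("Pig", 1)]
jobs = [(1, "mail"), (2, "developer")]


def hash_join(probe_rows, build_rows):
    """Build a hash table on the smaller input, then probe it once per row of the other."""
    # Build phase: this whole table has to fit in memory.
    # Assumption: job ids are unique, so one entry per key is enough.
    hash_table = {job[0]: job for job in build_rows}

    # Probe phase: one dictionary lookup per owl instead of an inner loop.
    joined = []
    for owl in probe_rows:
        job = hash_table.get(owl[1])
        if job is not None:
            joined.append((owl, job))
    return joined


pairs = hash_join(owls, jobs)
print(pairs)
```

Each input is read once, at the price of keeping the hash table for one side in memory.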
  38. And now let’s join stuff… Merge Join
          Owl.objects.all().select_related('job')

          owl_conference=# EXPLAIN ANALYZE SELECT * FROM owl JOIN job ON (job.id = owl.id);
                                    QUERY PLAN
          -------------------------------------------------------------
           Merge Join …
             Merge Cond: (owl.id = job.id)
             -> Index Scan using owl_pkey on owl
             -> Sort …
                  Sort Key: job.id
                  Sort Method: quicksort Memory: 25kB
                  -> Seq Scan on job …
           Planning time: 0.453 ms
           Execution time: 0.102 ms
          (10 rows)
  39. And now let’s join stuff… Merge Join
      Used for big tables; an index can be used to avoid sorting.
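A merge join walks two inputs that are sorted on the join key. The sketch below uses invented in-memory data and sorts both sides explicitly; merge_join is a hypothetical helper that assumes the keys on the job side are unique, and it joins on owl.job_id = job.id rather than the owl.id condition of the query on the slide.

```python
# Rows as tuples: owls are (name, job_id), jobs are (id, name).
owls = [("Hedwig", 1), ("Errol", 2), ("Pig", 1)]
jobs = [(2, "developer"), (1, "mail")]


def merge_join(left_rows, right_rows):
    """Join two inputs by advancing through both in key order."""
    # Both inputs must be sorted on the join key. Reading them through an
    # index that is already in key order would make these sorts unnecessary.
    left = sorted(left_rows, key=lambda owl: owl[1])
    right = sorted(right_rows, key=lambda job: job[0])

    joined = []
    j = 0
    for owl in left:
        # Skip jobs whose id is smaller than the current owl's job_id.
        while j < len(right) and right[j][0] < owl[1]:
            j += 1
        # Assumption: job ids are unique, so at most one job matches.
        if j < len(right) and right[j][0] == owl[1]:
            joined.append((owl, right[j]))
    return joined


pairs = merge_join(owls, jobs)
print(pairs)
```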
  40. So we have 3 types of joins
      1. Nested loop
      2. Hash join
      3. Merge join
      And a last word about ORDER BY (last part, I swear!)
  41. And now let’s order stuff…
          Owl.objects.order_by('job_id', 'favourite_food')

          owl_conference=# EXPLAIN ANALYZE SELECT * FROM owl ORDER BY owl.job_id, owl.favourite_food;
                                    QUERY PLAN
          -------------------------------------------------------------
           Sort …
             Sort Key: job_id, favourite_food
             Sort Method: quicksort Memory: 1166kB
             -> Seq Scan on owl (cost=0.00..180.01 rows=10001 width=35) (actual time=0.017..1.181 rows=10001 loops=1)
           Planning time: 0.142 ms
           Execution time: 8.665 ms
          (6 rows)
      Everything is sorted in memory (which is why it can be costly in terms of memory).
  42. And now let’s order stuff… ORDER BY LIMIT
          Owl.objects.order_by('job_id', 'favourite_food')[0:10]

          owl_conference=# EXPLAIN ANALYZE SELECT name, employer_name FROM owl ORDER BY owl.job_id, owl.favourite_food LIMIT 10;
                                    QUERY PLAN
          ---------------------------------------------------------------
           Limit (cost…) (actual time…)
             -> Sort (cost…) (actual time…)
                  Sort Key: name
                  Sort Method: top-N heapsort Memory: 25kB
                  -> Seq Scan on owl (cost=0.00..180.01 rows=10001 width=16) (actual time=0.032..5.856 rows=10002 loops=1)
           Planning time: 0.201 ms
           Execution time: 15.846 ms
          (7 rows)
      Like with quicksort, all the data has to be sorted… So why is the memory used so much smaller?
  43. Top-N heap sort
      - A heap (a sort of tree) is used with a limited size
      - For each row:
        - If the heap is not full: add the row to the heap
        - Else:
          - If the value is smaller than the current values (for ASC): insert the row in the heap, pop the last one
          - Else: pass
  44. Top-N heap sort: data to order with a LIMIT 10 (illustration of iterations 1, 2, 3, … 10)
  45. Top-N heap sort: example
      - Iteration 11: Post Office, nothing to do
      - Iteration 12: Ahmann is smaller than other values: inserted in tree, Potter removed
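The steps of slide 43 can be sketched with Python's heapq module. This is an illustration of the idea rather than Postgres's implementation: heapq only provides a min-heap, so a small wrapper class (the hypothetical _Reversed) flips comparisons to keep the largest retained value at the top, and the sample values are invented.

```python
import heapq


class _Reversed:
    """Wrapper that inverts ordering so heapq behaves as a max-heap."""
    __slots__ = ("value",)

    def __init__(self, value):
        self.value = value

    def __lt__(self, other):
        return self.value > other.value


def top_n(values, n):
    """Return the n smallest values in ascending order, keeping at most n in memory."""
    if n <= 0:
        return []
    heap = []  # max-heap holding the n smallest values seen so far
    for value in values:
        if len(heap) < n:
            # Heap not full: add the row.
            heapq.heappush(heap, _Reversed(value))
        elif value < heap[0].value:
            # Smaller than the largest kept value: insert it, pop that largest one.
            heapq.heapreplace(heap, _Reversed(value))
        # Else: the value cannot be in the top n, skip it.
    return sorted(item.value for item in heap)


employers = ["Potter", "Hogwarts", "Ulule", "Post Office", "Ahmann"]
smallest_three = top_n(employers, 3)
print(smallest_three)
```

The heap never holds more than n items, which is why the plan on slide 42 reports only 25kB of memory even though every row is examined. For everyday use, heapq.nsmallest gives the same result.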
  46. And now let’s order stuff… With an index
          owl_conference=# EXPLAIN ANALYZE SELECT * FROM owl ORDER BY owl.job_id, owl.favourite_food;
                                    QUERY PLAN
          -------------------------------------------------------------
           Index Scan using owl_job_id_favourite_food on owl (cost=0.29..544.66 rows=10001 width=35) (actual time=0.016..2.835 rows=10001 loops=1)
           Planning time: 0.098 ms
           Execution time: 3.510 ms
          (3 rows)
      Simply uses the index order.
  47. Be careful when you ORDER BY!
      1. Sorting on a sort key without a LIMIT or an index can be heavy in terms of memory!
      2. You might need an index; only EXPLAIN will tell you.
  48. Conclusion
  49. Conclusion
      - Looking at your DB logs could help you build a website with good performance
      - Always know where your queries come from
      - Careful about loops! Use prefetch_related and select_related
      - If you have a slow query, using EXPLAIN will help you find a solution
  50. Thank you for your attention! Any questions?
      Owly design: zimmoriarty (https://www.instagram.com/zimmoriarty/)
  51. To go further - sources
      - https://momjian.us/main/writings/pgsql/optimizer.pdf
      - https://use-the-index-luke.com/sql/plans-dexecution/postgresql/operations
      - http://tech.novapost.fr/postgresql-application_name-django-settings.html