The amazing world behind your ORM

Slides for the talk given at the DjangoCon Europe 2017 in Firenze


  1. 1. The amazing world behind your ORM Louise Grandjonc
  2. 2. About me: Louise Grandjonc (louise@ulule.com), lead developer at Ulule (www.ulule.com). Django developer, Postgres enthusiast. @louisemeta on Twitter.
  3. 3. Today's agenda
      1. How do we end up with performance problems?
      2. How can we catch the performance problems without having to guess?
      3. What does it change in our everyday developer job?
  4. 4. How do we end up with performance problems?
  5. 5. Why we should know what our ORM is doing
      1. The ORM executes queries that you might not expect.
      2. Your queries might not be optimised, and you won't know about it.
  6. 6. How can we catch the performance problems (without having to guess)?
  7. 7. How can I see what is happening when I do stuff?
      1. Django debug toolbar (to see the queries and their EXPLAIN output in your Django view)
         Advantages: can easily be included in your Django templates.
         Problems: does not let you see everything (AJAX calls!), and you cannot use it if you are working on an API.
      2. Django devserver: puts all your database logs into your runserver output
         Advantages: you don't miss the AJAX calls.
      3. Simply look at your database logs
         Advantages: you can see everything, nothing changes if you ever switch project, programming language, framework or computer, and you can configure how your logs look.
         Problems: you don't know where your logs are?
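As a complement to the three options above, Django itself logs every SQL statement on the `django.db.backends` logger at DEBUG level when `DEBUG = True`. The fragment below is a minimal sketch of a `LOGGING` setting that prints those statements to the console; the handler name `console` is an arbitrary choice for illustration, not something from the talk.

```python
import logging
import logging.config

# Sketch of a settings.py fragment (assumption: console output is enough).
LOGGING = {
    "version": 1,
    "disable_existing_loggers": False,
    "handlers": {
        # "console" is an arbitrary handler name chosen for this example
        "console": {"class": "logging.StreamHandler"},
    },
    "loggers": {
        # Django sends each executed SQL statement to this logger at DEBUG level
        "django.db.backends": {"handlers": ["console"], "level": "DEBUG"},
    },
}

# Django applies settings.LOGGING through dictConfig; calling it directly
# here only checks that the dictionary is a valid logging configuration.
logging.config.dictConfig(LOGGING)
```

Unlike the database logs, this only shows queries issued by the Django process itself, and only while `DEBUG` is on.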
  8. 8. Where are my logs?
      Terminal command:
          $ psql -U user -d your_database_name
      psql interface:
          owl_conference=# SHOW log_directory;
           log_directory
          ---------------
           pg_log
          owl_conference=# SHOW data_directory;
               data_directory
          -------------------------
           /usr/local/var/postgres
          owl_conference=# SHOW log_filename;
                log_filename
          -------------------------
           postgresql-%Y-%m-%d.log
  9. 9. Having good-looking logs (and logging everything like a crazy owl)
      Find your configuration file from psql:
          owl_conference=# SHOW config_file;
                         config_file
          -----------------------------------------
           /usr/local/var/postgres/postgresql.conf
          (1 row)
      In your postgresql.conf:
          log_filename = 'postgresql-%Y-%m-%d.log'
          log_statement = 'all'
          logging_collector = on
          log_min_duration_statement = 0
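The three settings shown on this slide combine into one file path: a relative `log_directory` is resolved against `data_directory`, and `log_filename` is a strftime pattern. A small sketch of that combination, where the helper name `log_file_path` and the date are made up for illustration:

```python
from datetime import date
from pathlib import PurePosixPath


def log_file_path(data_directory, log_directory, log_filename, day):
    """Build the path of the PostgreSQL log file written on a given day."""
    directory = PurePosixPath(log_directory)
    if not directory.is_absolute():
        # A relative log_directory is relative to the data directory
        directory = PurePosixPath(data_directory) / directory
    # log_filename uses strftime escapes such as %Y, %m and %d
    return str(directory / day.strftime(log_filename))


# Values taken from the slide; the date is an arbitrary example
path = log_file_path(
    "/usr/local/var/postgres",
    "pg_log",
    "postgresql-%Y-%m-%d.log",
    date(2017, 4, 4),
)
print(path)
```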
  10. 10. I've seen my logs… But… Where are these queries executed in my code? Let's take an example: I have an owl DB with two tables, 10 000 owls and 7 jobs.
  11. 11. Example: query executed in the template
      View:
          def index(request):
              owls = Owl.objects.filter(employer_name='Ulule')
              context = {'owls': owls}
              return render(request, 'owls/index.html', context)
      Template (the queryset is only evaluated here, when the loop iterates over it):
          {% for owl in owls %}
            <p> {{ owl.name }} </p>
          {% endfor %}
      Query:
          SELECT … FROM "owl" WHERE "owl"."employer_name" = 'Ulule'
  12. 12. Example: query executed in the view
      View (the queryset is evaluated by the for loop, before the template is rendered):
          def index(request):
              owls = Owl.objects.filter(employer_name='Ulule')
              for owl in owls:
                  pass  # Do something
              context = {'owls': owls}
              return render(request, 'owls/index.html', context)
      Template:
          {% for owl in owls %}
            <p> {{ owl.name }} </p>
          {% endfor %}
      Query:
          SELECT … FROM "owl" WHERE "owl"."employer_name" = 'Ulule'
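The two examples above differ only in where the queryset is first iterated, because a queryset is lazy: building it runs nothing, and the first iteration runs the query and caches the rows. The sketch below imitates that behaviour with the standard library's `sqlite3`. `LazyQuery` is a toy class written for this illustration, not Django's implementation, and the owl names are invented.

```python
import sqlite3


class LazyQuery:
    """Toy stand-in for a queryset: stores the SQL, runs it on first iteration."""

    def __init__(self, connection, sql, params=()):
        self.connection = connection
        self.sql = sql
        self.params = params
        self._cache = None
        self.executions = 0  # how many times the database was actually hit

    def __iter__(self):
        if self._cache is None:
            self.executions += 1
            self._cache = self.connection.execute(self.sql, self.params).fetchall()
        return iter(self._cache)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE owl (id INTEGER PRIMARY KEY, name TEXT, employer_name TEXT)")
conn.executemany(
    "INSERT INTO owl (name, employer_name) VALUES (?, ?)",
    [("Hedwig", "Ulule"), ("Errol", "post office")],  # invented sample rows
)

# Like Owl.objects.filter(...): nothing is executed yet
owls = LazyQuery(conn, "SELECT name FROM owl WHERE employer_name = ?", ("Ulule",))
executed_after_filter = owls.executions

# First loop (in the view or in the template): the query runs here
names = [name for (name,) in owls]
executed_after_first_loop = owls.executions

# Second loop over the same object: served from the cache, no new query
names_again = [name for (name,) in owls]
executed_after_second_loop = owls.executions
```

This is why the same single SELECT appears in the logs in both cases: whichever loop comes first triggers it.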
  13. 13. Yep! I've seen my logs… But… Where are these queries executed in my code? How to spot where your query is executed:
      1. Each model has a table to store its data. Find the model.
      2. Where in my view or in my form am I using this model to get or filter objects?
      3. Where am I using these objects? Is it in my view or form? Are they passed into the context and used in templates?
  14. 14. What does it change in our everyday developer job? (Or how to really do something when you have a problem)
  15. 15. The two most common problems of any developer…
      1. I have way too many queries… Why?
      2. One of my queries is freakin' slow… Why?
  16. 16. Once upon a time… 1000 times. The danger of loops in your code, and how your templates are making fun of you…
      1. Use your context!
      2. Preload stuff in the query!
         • prefetch_related() - ManyToMany or ForeignKey
         • select_related() - ForeignKey
  17. 17. Once upon a time… 1000 times: select_related or prefetch_related?
      In Django, select_related and prefetch_related help you reduce the number of queries by preloading foreign keys or many-to-many relations.
      1. select_related uses a join (only for foreign keys)
         - Advantages: only one query
         - Problem: if you are joining big tables, with a lot of columns and no index, it can be slow… We'll talk about that next.
      2. prefetch_related runs a second query on the related table (for foreign keys and many-to-many)
         - Advantages: no big join
         - Problem: more queries
  18. 18. Example …
      Without preloading:
          owls = Owl.objects.filter(employer_name='Ulule')
          for owl in owls:
              print(owl.job)  # 1 query per loop
      With select_related:
          owls = Owl.objects.filter(employer_name='Ulule').select_related('job')
          for owl in owls:
              print(owl.job)  # no extra queries
  19. 19. Example … Using select_related/prefetch_related
      select_related (one query with a join):
          Owl.objects.filter(employer_name='Ulule').select_related('job')
          SELECT … FROM "owl" LEFT OUTER JOIN "job" ON ("owl"."job_id" = "job"."id") WHERE "owl"."employer_name" = 'Ulule'
      prefetch_related (two queries, no join):
          Owl.objects.filter(employer_name='Ulule').prefetch_related('job')
          SELECT … FROM "owl" WHERE "owl"."employer_name" = 'Ulule'
          SELECT … FROM "job" WHERE "job"."id" IN (2)
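To make the difference in query count concrete, here is a sketch that replays the three access patterns as plain SQL against an in-memory SQLite database (SQLite stands in for PostgreSQL, and no Django is involved). The table contents, the `run` helper and the three strategy functions are assumptions made for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE job (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE owl (id INTEGER PRIMARY KEY, name TEXT, employer_name TEXT,
                      job_id INTEGER REFERENCES job(id));
""")
conn.executemany("INSERT INTO job (id, name) VALUES (?, ?)",
                 [(1, "mail"), (2, "crowdfunding")])
conn.executemany("INSERT INTO owl (name, employer_name, job_id) VALUES (?, 'Ulule', ?)",
                 [(f"owl-{i}", 1 + i % 2) for i in range(100)])

query_count = 0


def run(sql, params=()):
    """Execute one statement and count it."""
    global query_count
    query_count += 1
    return conn.execute(sql, params).fetchall()


def naive():
    # One query for the owls, then one query per owl for its job (N+1)
    owls = run("SELECT name, job_id FROM owl WHERE employer_name = 'Ulule'")
    return [(name, run("SELECT name FROM job WHERE id = ?", (job_id,))[0][0])
            for name, job_id in owls]


def joined():
    # What select_related does: a single query with a join
    return run("SELECT owl.name, job.name FROM owl "
               "LEFT OUTER JOIN job ON owl.job_id = job.id "
               "WHERE owl.employer_name = 'Ulule'")


def prefetched():
    # What prefetch_related does: a second query, then matching in Python
    owls = run("SELECT name, job_id FROM owl WHERE employer_name = 'Ulule'")
    job_ids = sorted({job_id for _, job_id in owls})
    placeholders = ", ".join("?" for _ in job_ids)
    jobs = dict(run(f"SELECT id, name FROM job WHERE id IN ({placeholders})", job_ids))
    return [(name, jobs[job_id]) for name, job_id in owls]


def count_queries(strategy):
    """Return (number of queries, rows) for one strategy."""
    global query_count
    query_count = 0
    rows = strategy()
    return query_count, rows


for strategy in (naive, joined, prefetched):
    print(strategy.__name__, count_queries(strategy)[0])
```

All three return the same owl and job pairs; only the number of round trips to the database changes.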
  20. 20. One of my queries is super slow… Let's talk about EXPLAIN!
  21. 21. What is EXPLAIN?
      It gives you the execution plan chosen by the query planner, i.e. the plan your database will use to execute your SQL statement.
          EXPLAIN (ANALYZE) my super query;
      Using ANALYZE will actually execute your query! (Don't worry, you can ROLLBACK.)
          BEGIN;
          EXPLAIN ANALYZE my super query;
          ROLLBACK;
  22. 22. Mmmm… Query planner? The magical thing that generates the possible execution plans for a query and estimates the cost of each plan. The cheapest one is used to execute your query.
  23. 23. So, what does it look like? Let's take a slow query…
          Owl.objects.filter(employer_name='Ulule')
          SELECT "owl"."id", "owl"."name", "owl"."employer_name", "owl"."favourite_food", "owl"."job_id", "owl"."fur_color" FROM "owl" WHERE "owl"."employer_name" = 'Ulule'
  24. 24. And…
          owl_conference=# EXPLAIN ANALYZE SELECT * FROM owl WHERE employer_name='Ulule';
                       QUERY PLAN
          ------------------------------------
           Seq Scan on owl (cost=0.00..205.01 rows=1 width=35) (actual time=1.945..1.946 rows=1 loops=1)
             Filter: ((employer_name)::text = 'Ulule'::text)
             Rows Removed by Filter: 10000
           Planning time: 0.080 ms
           Execution time: 1.965 ms
          (5 rows)
  25. 25. Let's go step by step! .. 1: Costs
          (cost=0.00..205.01 rows=1 width=35)
      - 0.00: cost of retrieving the first row
      - 205.01: cost of retrieving all rows
      - rows=1: number of rows returned (the planner's estimate)
      - width=35: average width of a row (in bytes)
          (actual time=1.945..1.946 rows=1 loops=1)
      - Shown only if you use ANALYZE
      - loops=1: number of times your seq scan (index scan, etc.) was executed
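If you want to pull these numbers out of a log or a saved plan, the fields can be extracted with a regular expression. This is a rough sketch for the single-line text format shown on the slide; the field names in the result (`startup_cost`, `total_cost` and so on) are labels chosen here, not PostgreSQL output.

```python
import re

# Matches "(cost=A..B rows=N width=W)" optionally followed by
# "(actual time=C..D rows=M loops=L)", which only appears with ANALYZE.
PLAN_LINE = re.compile(
    r"\(cost=(?P<startup_cost>\d+\.\d+)\.\.(?P<total_cost>\d+\.\d+) "
    r"rows=(?P<plan_rows>\d+) width=(?P<width>\d+)\)"
    r"(?: \(actual time=(?P<startup_ms>\d+\.\d+)\.\.(?P<total_ms>\d+\.\d+) "
    r"rows=(?P<actual_rows>\d+) loops=(?P<loops>\d+)\))?"
)


def parse_plan_line(line):
    """Return the cost fields of one plan node as numbers, or None if absent."""
    match = PLAN_LINE.search(line)
    if match is None:
        return None
    return {
        name: (float(value) if "." in value else int(value))
        for name, value in match.groupdict().items()
        if value is not None  # the "actual" group is missing without ANALYZE
    }


line = ("Seq Scan on owl (cost=0.00..205.01 rows=1 width=35) "
        "(actual time=1.945..1.946 rows=1 loops=1)")
parsed = parse_plan_line(line)
print(parsed)
```

For anything beyond a quick look, `EXPLAIN (FORMAT JSON)` gives the same information in a structured form that does not need a regex.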
  26. 26. Let's go step by step! .. 2: Seq Scan
          Seq Scan on owl ...
            Filter: ((employer_name)::text = 'Ulule'::text)
            Rows Removed by Filter: 10000
      - Scans the entire table.
      - Retrieves the rows matching your WHERE.
      It can be expensive! So… is this why my query is slow? Do you need an index?
  27. 27. What is an index then? In an encyclopaedia, if you want every page where the word "Owl" appears, you don't read the entire book, you go to the index! A database index contains the column value and pointers to each row that has this value.
  28. 28. Let's go step by step! .. 3: Index scan
      What if there is an index on the "employer_name" column?
                          QUERY PLAN
          -------------------------------------------------
           Index Scan using employer_name_owl on owl …
             Index Cond: ((employer_name)::text = 'Ulule'::text)
           Planning time: 0.387 ms
           Execution time: 0.066 ms
          (4 rows)
      The index is visited row by row in order to retrieve the data corresponding to your clause.
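You can reproduce the switch from a full scan to an index lookup without a PostgreSQL server. The sketch below uses SQLite's `EXPLAIN QUERY PLAN` from the standard library as a stand-in: the output format is SQLite's, not PostgreSQL's, and the table contents are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE owl (id INTEGER PRIMARY KEY, name TEXT, employer_name TEXT)")
conn.executemany(
    "INSERT INTO owl (name, employer_name) VALUES (?, ?)",
    [(f"owl-{i}", "Ulule" if i == 0 else "post office") for i in range(1000)],
)

QUERY = "SELECT * FROM owl WHERE employer_name = 'Ulule'"


def plan(sql):
    """Return the human-readable steps of SQLite's query plan."""
    # Each row is (id, parent, notused, detail); detail is the last column
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


plan_without_index = plan(QUERY)  # a full table scan
conn.execute("CREATE INDEX employer_name_owl ON owl (employer_name)")
plan_with_index = plan(QUERY)     # a search using the new index

print(plan_without_index)
print(plan_with_index)
```

The exact wording of the plan steps varies between SQLite versions, which is why the example prints them rather than hard-coding an expected output.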
  29. 29. Let's go step by step! .. 4
      With an index and a really common value! 7000 owls work at the post office.
          Owl.objects.filter(employer_name='post office')
          owl_conference=# EXPLAIN SELECT * FROM "owl" WHERE "owl"."employer_name" = 'post office';
                          QUERY PLAN
          -------------------------------------------------
           Seq Scan on owl …
             Filter: ((employer_name)::text = 'post office'::text)
  30. 30. Let's go step by step! .. 4
      Why is it using a seq scan? An index scan follows the order of the index, so the read head has to move between rows. Moving the disk's read head is 1000 times slower than reading the next physical block.
      Conclusion: for common values, it's quicker to read all the data from the table in physical order.
      By the way… retrieving 7000 rows might not be a great idea :).
  31. 31. Let's go step by step! .. 5: Bitmap Heap Scan
      With an index and a common value: 2000 owls work at Hogwarts.
          Owl.objects.filter(employer_name='Hogwarts')
          owl_conference=# EXPLAIN SELECT * FROM owl WHERE owl.employer_name = 'Hogwarts';
                          QUERY PLAN
          -------------------------------------------------
           Bitmap Heap Scan on owl …
             Recheck Cond: ((employer_name)::text = 'Hogwarts'::text)
             -> Bitmap Index Scan on employer_name_owl (cost=0.00..47.28 rows=2000 width=0)
                  Index Cond: ((employer_name)::text = 'Hogwarts'::text)
  32. 32. Let's go step by step! .. 5: Bitmap Heap Scan…
      Index scan: goes through your index tuple pointers one at a time and reads the data from the pages. Uses the index order.
      Bitmap heap scan: orders the tuple pointers in physical (on-disk) order and goes through them. Avoids little "physical jumps" between pages.
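The "physical jumps" argument can be illustrated with a toy model: represent each matching row as a (page, offset) pointer, then count how often two consecutive reads land on different pages. This is a deliberate simplification (PostgreSQL builds a bitmap of pages rather than sorting a list), and the page counts and random layout are assumptions for the example.

```python
import random


def page_switches(tuple_pointers):
    """Count how many consecutive reads land on a different page."""
    pages = [page for page, _ in tuple_pointers]
    return sum(1 for previous, current in zip(pages, pages[1:]) if previous != current)


rng = random.Random(42)  # fixed seed so the run is repeatable

# 2000 matching rows spread over 100 pages. In index order, rows with the
# same value can live anywhere in the table, so the pages come up at random.
index_order = [(rng.randrange(100), rng.randrange(50)) for _ in range(2000)]

# Bitmap heap scan, simplified: visit the same pointers in physical order
bitmap_order = sorted(index_order)

index_scan_switches = page_switches(index_order)
bitmap_scan_switches = page_switches(bitmap_order)
print(index_scan_switches, bitmap_scan_switches)
```

In physical order each page is visited once, so the number of page changes is bounded by the number of pages rather than by the number of rows.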
  33. 33. So we have 3 types of scan:
      1. Sequential scan
      2. Index scan
      3. Bitmap heap scan
      And now let's join stuff!
  34. 34. And now let's join stuff… Nested loops
          Owl.objects.filter(job_id=1).select_related('job')
          owl_conference=# EXPLAIN ANALYZE SELECT * FROM owl JOIN job ON (job.id = owl.job_id) WHERE job.id=1;
                                QUERY PLAN
          -------------------------------------------------------------
           Nested Loop …
             -> Seq Scan on job …
                  Rows Removed by Filter: 6
             -> Seq Scan on owl …
                  Filter: (job_id = 1)
                  Rows Removed by Filter: 1000
           Planning time: 0.150 ms
           Execution time: 3.663 ms
          (9 rows)
  35. 35. And now let's join stuff… Nested loops
      Used for little tables. Can be slow because it is doing two nested "for" loops! (This image does not match the previous query ;) )
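The two nested "for" loops can be written out directly. This is a sketch of the idea on in-memory tuples, not PostgreSQL's implementation; the sample owls and jobs are invented.

```python
# Invented sample data: owls are (id, name, job_id), jobs are (id, name)
owls = [(1, "Hedwig", 2), (2, "Errol", 1), (3, "Pigwidgeon", 2), (4, "Archimedes", 3)]
jobs = [(1, "mail"), (2, "crowdfunding")]


def nested_loop_join(outer_rows, inner_rows, matches):
    """Compare every outer row with every inner row: len(outer) * len(inner) checks."""
    joined = []
    for outer in outer_rows:
        for inner in inner_rows:
            if matches(outer, inner):
                joined.append((outer, inner))
    return joined


# JOIN job ON (job.id = owl.job_id), with job as the outer table
pairs = nested_loop_join(jobs, owls, lambda job, owl: job[0] == owl[2])
for job, owl in pairs:
    print(owl[1], "->", job[1])
```

With 4 owls and 2 jobs this makes 8 comparisons; with two large tables the product grows quickly, which is why the planner prefers this join when at least one side is small.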
  36. 36. And now let's join stuff… Hash Join
          Owl.objects.filter(job_id__gt=1).select_related('job')
          owl_conference=# EXPLAIN ANALYZE SELECT * FROM owl JOIN job ON (job.id = owl.job_id) WHERE job.id>1;
                                QUERY PLAN
          -------------------------------------------------------------
           Hash Join …
             Hash Cond: (owl.job_id = job.id)
             -> Seq Scan on owl (cost=blabla)
             -> Hash (cost=blabla)
                  Buckets: 1024 Batches: 1 Memory Usage: 9kB
                  -> Seq Scan on job (cost=blabla)
                       Filter: (id > 1)
                       Rows Removed by Filter: 1
           Planning time: 0.235 ms
          (10 rows)
  37. 37. And now let’s join stuff… Hash Join Used for smaller tables, because the hash table has to fit in memory
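A hash join builds an in-memory hash table on one input and probes it with the other, which is where the memory constraint comes from. A minimal sketch using a Python dict as the hash table, with invented sample rows:

```python
# Invented sample data: owls are (id, name, job_id), jobs are (id, name)
owls = [(1, "Hedwig", 2), (2, "Errol", 1), (3, "Pigwidgeon", 2), (4, "Archimedes", 3)]
jobs = [(1, "mail"), (2, "crowdfunding")]


def hash_join(owl_rows, job_rows):
    """Join owls to jobs on owl.job_id = job.id using a hash table on jobs."""
    # Build phase: hash the smaller table (here job); it must fit in memory
    jobs_by_id = {job[0]: job for job in job_rows}

    # Probe phase: one pass over the other table, one lookup per row
    joined = []
    for owl in owl_rows:
        job = jobs_by_id.get(owl[2])
        if job is not None:
            joined.append((owl, job))
    return joined


pairs = hash_join(owls, jobs)
for owl, job in pairs:
    print(owl[1], "->", job[1])
```

Each table is read once, so the work grows with the sum of the table sizes instead of their product.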
  38. 38. And now let's join stuff… Merge Join
          Owl.objects.all().select_related('job')
          owl_conference=# EXPLAIN ANALYZE SELECT * FROM owl JOIN job ON (job.id = owl.id);
                                QUERY PLAN
          -------------------------------------------------------------
           Merge Join …
             Merge Cond: (owl.id = job.id)
             -> Index Scan using owl_pkey on owl
             -> Sort …
                  Sort Key: job.id
                  Sort Method: quicksort Memory: 25kB
                  -> Seq Scan on job …
           Planning time: 0.453 ms
           Execution time: 0.102 ms
          (10 rows)
  39. 39. And now let’s join stuff… Merge Join Used for big tables, an index can be used to avoid sorting
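A merge join walks two inputs that are sorted on the join key and advances through both in step. The sketch below joins owls to jobs on `owl.job_id = job.id` and assumes job ids are unique (a primary key); the sample rows are invented, and the explicit `sorted` calls stand for the Sort step that an index can make unnecessary.

```python
# Invented sample data: owls are (id, name, job_id), jobs are (id, name)
owls = [(1, "Hedwig", 2), (2, "Errol", 1), (3, "Pigwidgeon", 2), (4, "Archimedes", 3)]
jobs = [(2, "crowdfunding"), (1, "mail")]


def merge_join(owl_rows, job_rows):
    """Join on owl.job_id = job.id, assuming job ids are unique."""
    # Sort step: skipped when an index already returns rows in key order
    owl_rows = sorted(owl_rows, key=lambda owl: owl[2])
    job_rows = sorted(job_rows, key=lambda job: job[0])

    joined = []
    position = 0  # current position in the sorted jobs
    for owl in owl_rows:
        # Advance the job cursor until it reaches or passes this owl's key
        while position < len(job_rows) and job_rows[position][0] < owl[2]:
            position += 1
        if position < len(job_rows) and job_rows[position][0] == owl[2]:
            joined.append((owl, job_rows[position]))
    return joined


pairs = merge_join(owls, jobs)
for owl, job in pairs:
    print(owl[1], "->", job[1])
```

The job cursor never moves backwards, so once both inputs are sorted the join itself is a single pass over each.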
  40. 40. So we have 3 types of joins:
      1. Nested loop
      2. Hash join
      3. Merge join
      And a last word about ORDER BY (last part, I swear!)
  41. 41. And now let's order stuff…
          Owl.objects.order_by('job_id', 'favourite_food')
          owl_conference=# EXPLAIN ANALYZE SELECT * FROM owl ORDER BY owl.job_id, owl.favourite_food;
                                QUERY PLAN
          -------------------------------------------------------------
           Sort …
             Sort Key: job_id, favourite_food
             Sort Method: quicksort Memory: 1166kB
             -> Seq Scan on owl (cost=0.00..180.01 rows=10001 width=35) (actual time=0.017..1.181 rows=10001 loops=1)
           Planning time: 0.142 ms
           Execution time: 8.665 ms
          (6 rows)
      Everything is sorted in memory (which is why it can be costly in terms of memory).
  42. 42. And now let's order stuff… ORDER BY LIMIT
          Owl.objects.order_by('job_id', 'favourite_food')[0:10]
          owl_conference=# EXPLAIN ANALYZE SELECT name, employer_name FROM owl ORDER BY owl.job_id, owl.favourite_food LIMIT 10;
                                QUERY PLAN
          -------------------------------------------------------------
           Limit (cost…) (actual time…)
             -> Sort (cost…) (actual time…)
                  Sort Key: name
                  Sort Method: top-N heapsort Memory: 25kB
                  -> Seq Scan on owl (cost=0.00..180.01 rows=10001 width=16) (actual time=0.032..5.856 rows=10002 loops=1)
           Planning time: 0.201 ms
           Execution time: 15.846 ms
          (7 rows)
      Like with quicksort, all the data has to be sorted… So why is the memory used so much smaller?
  43. 43. Top-N heap sort
      - A heap (a sort of tree) with a limited size is used
      - For each row:
        - If the heap is not full: add the row to the heap
        - Else:
          - If the value is smaller than the current values (for ASC): insert the row into the heap and pop the last one
          - Else: pass
  44. 44. Top-N heap sort (figure): the data to order with a LIMIT 10, the heap after iterations 1, 2 and 3, and the heap after iteration 10.
  45. 45. Top-N heap sort, example
      Iteration 11: Post Office, nothing to do.
      Iteration 12: Ahmann is smaller than the other values. It is inserted into the tree and Potter is removed.
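The algorithm on the slides above fits in a few lines of Python. This is a sketch of the idea, not PostgreSQL's code: the heap keeps the largest of the retained rows at its root, so each new row needs one comparison to decide whether it belongs in the top N. The `_Entry` wrapper exists because `heapq` is a min-heap, and the sample employer names are partly invented.

```python
import heapq


class _Entry:
    """Wrapper that reverses comparisons so heapq behaves as a max-heap."""

    __slots__ = ("key", "row")

    def __init__(self, key, row):
        self.key = key
        self.row = row

    def __lt__(self, other):
        return self.key > other.key


def top_n(rows, n, key=lambda row: row):
    """Return the n smallest rows in ascending order, keeping at most n in memory."""
    if n <= 0:
        return []
    heap = []
    for row in rows:
        entry = _Entry(key(row), row)
        if len(heap) < n:
            # Heap not full: add the row
            heapq.heappush(heap, entry)
        elif entry.key < heap[0].key:
            # Smaller than the largest retained row: insert it, drop the largest
            heapq.heapreplace(heap, entry)
        # Otherwise: nothing to do
    return [entry.row for entry in sorted(heap, key=lambda e: e.key)]


employers = ["Potter", "Ulule", "Hogwarts", "Post Office", "Ahmann", "Zoo", "Ministry"]
print(top_n(employers, 3))
```

Memory use depends on the LIMIT rather than on the table size, which matches the drop from 1166kB to 25kB in the two plans above. For everyday code, the standard library's `heapq.nsmallest` does the same job.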
  46. 46. And now let's order stuff… With an index
          owl_conference=# EXPLAIN ANALYZE SELECT * FROM owl ORDER BY owl.job_id, owl.favourite_food;
                                QUERY PLAN
          -------------------------------------------------------------
           Index Scan using owl_job_id_favourite_food on owl (cost=0.29..544.66 rows=10001 width=35) (actual time=0.016..2.835 rows=10001 loops=1)
           Planning time: 0.098 ms
           Execution time: 3.510 ms
          (3 rows)
      Simply uses the index order.
  47. 47. Be careful when you ORDER BY!
      1. Sorting without a LIMIT or an index can be heavy in terms of memory!
      2. You might need an index; only EXPLAIN will tell you.
  48. 48. Conclusion
  49. 49. Conclusion
      - Looking at your DB logs can help you build a website with good performance
      - Always know where your queries come from
      - Be careful about loops! Use prefetch_related and select_related
      - If you have a slow query, using EXPLAIN will help you find a solution
  50. 50. Thank you for your attention! Any questions? Owly design: zimmoriarty (https://www.instagram.com/zimmoriarty/)
  51. 51. To go further - sources Owly design: zimmoriarty (https://www.instagram.com/zimmoriarty/) https://momjian.us/main/writings/pgsql/optimizer.pdf https://use-the-index-luke.com/sql/plans-dexecution/postgresql/operations http://tech.novapost.fr/postgresql-application_name-django-settings.html
