The amazing world
behind your ORM
Louise Grandjonc
About me
Louise Grandjonc (louise@ulule.com)
Lead developer at Ulule (www.ulule.com)
Django developer - Postgres enthusiast
@louisemeta on Twitter
Today’s agenda
1. How do we end up with performance problems?
2. How can we catch performance problems without having to guess?
3. What does it change in our everyday developer job?
How do we end up with performance
problems?
Why we should know what our ORM is doing
1. The ORM executes queries that you might not expect.
2. Your queries might not be optimised, and you won’t know about it.
How can we catch the performance
problems (without having to guess)?
How can I see what is happening when I do stuff?
1. Django debug toolbar (to see the queries and their EXPLAIN in your Django view)
   Advantages: can easily be included in your Django templates
   Problems: does not show you everything (AJAX calls!), and if you’re working on an API, you cannot use it!
2. Django devserver: puts all your database logs into your runserver output
   Advantages: you’re not missing the AJAX calls
3. Simply look at your database logs
   Advantages: you can see everything, nothing changes if you ever switch project, programming language, framework or computer, and you can configure how your logs look
   Problems: you don’t know where your logs are?
Where are my logs?
Terminal command:
$ psql -U user -d your_database_name
psql interface:
owl_conference=# SHOW log_directory;
 log_directory
---------------
 pg_log
owl_conference=# SHOW data_directory;
 data_directory
-------------------------
 /usr/local/var/postgres
owl_conference=# SHOW log_filename;
 log_filename
-------------------------
 postgresql-%Y-%m-%d.log
A relative log_directory is resolved against data_directory, so here the logs are in /usr/local/var/postgres/pg_log.
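The three settings shown here can be combined into the path of the log file for a given day. The following is a minimal sketch, assuming a relative log_directory and a strftime-style log_filename as in the output above; the function name `current_log_path` and the example date are made up for illustration.

```python
from datetime import date
import posixpath


def current_log_path(data_directory, log_directory, log_filename, day):
    """Build the path of the log file Postgres writes for a given day."""
    # A relative log_directory is resolved against data_directory;
    # posixpath.join leaves an absolute log_directory unchanged.
    directory = posixpath.join(data_directory, log_directory)
    # log_filename uses strftime-style escapes such as %Y, %m and %d.
    return posixpath.join(directory, day.strftime(log_filename))


path = current_log_path(
    "/usr/local/var/postgres",   # SHOW data_directory;
    "pg_log",                    # SHOW log_directory;
    "postgresql-%Y-%m-%d.log",   # SHOW log_filename;
    date(2024, 5, 7),            # arbitrary example date
)
print(path)
```

You can then follow the file with a tool such as `tail -f` while you click around your site.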
Having good looking logs
(and logging everything like a crazy owl)
owl_conference=# SHOW config_file;
config_file
-----------------------------------------
/usr/local/var/postgres/postgresql.conf
(1 row)
In your postgresql.conf
log_filename = 'postgresql-%Y-%m-%d.log'
log_statement = 'all'
logging_collector = on
log_min_duration_statement = 0
I’ve seen my logs… But …
Where are these queries executed in my code?
Let’s take an example…
I have an owl DB with two tables: 10 000 owls and 7 jobs.
Example
Query executed in Template
View:
def index(request):
    owls = Owl.objects.filter(employer_name='Ulule')
    context = {'owls': owls}
    return render(request, 'owls/index.html', context)
Template:
{% for owl in owls %}
    <p> {{ owl.name }} </p>
{% endfor %}
Query, executed when the template iterates over owls:
SELECT … FROM "owl" WHERE "owl"."employer_name" = 'Ulule'
Example
Query executed in View
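View:
def index(request):
    owls = Owl.objects.filter(employer_name='Ulule')
    for owl in owls:  # the query is executed here, in the view
        pass  # Do something
    context = {'owls': owls}
    return render(request, 'owls/index.html', context)
Template:
{% for owl in owls %}
    <p> {{ owl.name }} </p>
{% endfor %}
Query, executed by the for loop in the view (the template reuses the cached results):
SELECT … FROM "owl" WHERE "owl"."employer_name" = 'Ulule'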
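The two examples differ only in where the first iteration happens, because a QuerySet is lazy: building it sends nothing to the database, the first loop runs the query, and later loops reuse the cached rows. The sketch below imitates that behaviour with SQLite and a hypothetical `LazyOwls` class (it is a stand-in, not Django code), and records every SELECT that is actually executed.

```python
import sqlite3

executed = []  # every SELECT actually sent to the database


def record(sql):
    if sql.lstrip().upper().startswith("SELECT"):
        executed.append(sql)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE owl (id INTEGER PRIMARY KEY, name TEXT, employer_name TEXT)")
conn.executemany(
    "INSERT INTO owl (name, employer_name) VALUES (?, ?)",
    [("Hedwig", "Ulule"), ("Errol", "post office")],
)
conn.commit()
conn.set_trace_callback(record)  # start recording from here


class LazyOwls:
    """Hypothetical stand-in for a QuerySet: runs its query on first iteration."""

    def __init__(self, employer_name):
        self.employer_name = employer_name
        self._cache = None

    def __iter__(self):
        if self._cache is None:  # first iteration: hit the database
            cursor = conn.execute(
                "SELECT name FROM owl WHERE employer_name = ?",
                (self.employer_name,),
            )
            self._cache = [row[0] for row in cursor]
        return iter(self._cache)


owls = LazyOwls("Ulule")           # like Owl.objects.filter(...): no query yet
queries_after_filter = len(executed)
names_in_view = list(owls)         # like the for loop in the view: runs the query
names_in_template = list(owls)     # like the template loop: served from the cache
queries_after_loops = len(executed)
print(queries_after_filter, queries_after_loops)
```

So when you look for the line of code behind a logged query, look for the first place the objects are iterated, not the place where filter() is called.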
Yep ! I’ve seen my logs… But …
Where are these queries executed in my code?
How to spot where your query is executed?
1. Each model has a table to store data.
Find the model.
2. Where in my view, or in my form am I
using this model to get/filter objects?
3. Where am I using these objects? Is it in my
view/form? Are they passed into the context and
used in templates?
What does it change in our everyday
developer job?
(Or how to really do something when you have a problem)
The two most common
problems of any developer…
1. I have way too many queries… Why ?
2. One of my queries is freakin' slow… Why?
Once upon a time… 1000 times
The danger of loops in your code, and how your templates
are making fun of you…
1. Use your context !
2. Preload stuff in the query!
• prefetch_related() - ManyToMany or ForeignKey
• select_related() - ForeignKey (or OneToOne)
Once upon a time… 1000 times
select_related or prefetch_related?
In Django, select_related and prefetch_related help you reduce the
number of queries by preloading foreign keys or many-to-many relations.
1. select_related uses a join (only for single-valued relations such as foreign keys):
- Advantages: only one query
- Problem: if you are joining big tables, with a lot of columns and no index,
it can be slow… We’ll talk about that next.
2. prefetch_related runs a second query on the related table and matches the
rows up in Python (for foreign keys and many-to-many):
- Advantages: no big join
- Problem: more queries
Example …
owls = Owl.objects.filter(employer_name='Ulule')
for owl in owls:
    print(owl.job)  # 1 query per loop

owls = (
    Owl.objects
    .filter(employer_name='Ulule')
    .select_related('job')
)
for owl in owls:
    print(owl.job)  # no extra queries
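To see the difference in numbers, here is a small sketch that replays both patterns as raw SQL against an in-memory SQLite database and counts the SELECT statements. The table contents are invented, and SQLite stands in for Postgres only so that the example runs on its own; the join is the kind of statement select_related produces.

```python
import sqlite3

selects = []  # every SELECT actually sent to the database


def record(sql):
    if sql.lstrip().upper().startswith("SELECT"):
        selects.append(sql)


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE job (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE owl (id INTEGER PRIMARY KEY, name TEXT,
                      employer_name TEXT, job_id INTEGER REFERENCES job(id));
    INSERT INTO job (id, name) VALUES (1, 'mail delivery'), (2, 'developer');
    INSERT INTO owl (name, employer_name, job_id) VALUES
        ('Hedwig', 'Ulule', 2), ('Errol', 'Ulule', 2), ('Pig', 'Ulule', 1);
""")
conn.set_trace_callback(record)

# Without preloading: 1 query for the owls, then 1 query per owl for its job.
owls = conn.execute(
    "SELECT name, job_id FROM owl WHERE employer_name = ?", ("Ulule",)
).fetchall()
for name, job_id in owls:
    conn.execute("SELECT name FROM job WHERE id = ?", (job_id,)).fetchone()
queries_in_loop = len(selects)

# With a join: a single query returns the owls and their jobs.
selects.clear()
rows = conn.execute(
    "SELECT owl.name, job.name FROM owl "
    "LEFT OUTER JOIN job ON owl.job_id = job.id "
    "WHERE owl.employer_name = ?", ("Ulule",)
).fetchall()
queries_with_join = len(selects)

print(queries_in_loop, queries_with_join)
```

With three owls the loop costs four queries; with the 10 000 owls of the example database the same code would cost 10 001.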
Example …
Using select_related/prefetch_related
With select_related (one query with a join):
Owl.objects.filter(employer_name='Ulule').select_related('job')
SELECT … FROM "owl" LEFT OUTER JOIN "job" ON ("owl"."job_id" = "job"."id")
WHERE "owl"."employer_name" = 'Ulule'
With prefetch_related (two queries):
Owl.objects.filter(employer_name='Ulule').prefetch_related('job')
SELECT … FROM "owl" WHERE "owl"."employer_name" = 'Ulule'
SELECT … FROM "job" WHERE "job"."id" IN (2)
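The two-query pattern can be reproduced by hand to show where the "join" ends up. This is a rough sketch of the idea behind prefetch_related, not Django's implementation: it uses SQLite and invented rows, fetches the owls, fetches their jobs with a single IN (...) query, and matches them up in Python.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE job (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE owl (id INTEGER PRIMARY KEY, name TEXT,
                      employer_name TEXT, job_id INTEGER REFERENCES job(id));
    INSERT INTO job (id, name) VALUES (1, 'mail delivery'), (2, 'developer');
    INSERT INTO owl (name, employer_name, job_id) VALUES
        ('Hedwig', 'Ulule', 2), ('Errol', 'Ulule', 2), ('Pig', 'post office', 1);
""")

# Query 1: the owls.
owls = conn.execute(
    "SELECT id, name, job_id FROM owl WHERE employer_name = ? ORDER BY id",
    ("Ulule",),
).fetchall()

# Query 2: every job those owls reference, fetched in one go with IN (...).
job_ids = sorted({job_id for _, _, job_id in owls})
placeholders = ", ".join("?" for _ in job_ids)
jobs = dict(conn.execute(
    f"SELECT id, name FROM job WHERE id IN ({placeholders})", job_ids
).fetchall())

# The "join" happens in Python: attach each job to its owl.
owls_with_jobs = [(name, jobs[job_id]) for _, name, job_id in owls]
print(owls_with_jobs)
```

The number of queries stays at two however many owls match, which is why this pattern also works for many-to-many relations where a single join would multiply rows.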
One of my queries is super slow…
Let’s talk about EXPLAIN !
What is EXPLAIN
Gives you the execution plan chosen by the query planner that your
database will use to execute your SQL statement
Using ANALYZE will actually execute your query! (Don’t worry, you
can ROLLBACK)
EXPLAIN (ANALYZE) my super query;
BEGIN;
EXPLAIN ANALYZE my super query;
ROLLBACK;
Mmmm… Query planner?
The magical thing that generates execution plans for a query and estimates
the cost of each plan.
The cheapest one is used to execute your query
So, what does it look like?
Let’s take a slow query…
Owl.objects.filter(employer_name='Ulule')
SELECT "owl"."id", "owl"."name",
"owl"."employer_name", "owl"."favourite_food",
"owl"."job_id", "owl"."fur_color"
FROM "owl" WHERE "owl"."employer_name" = 'Ulule'
And…
owl_conference=# EXPLAIN ANALYZE
SELECT * FROM owl WHERE
employer_name='Ulule';
QUERY PLAN
------------------------------------
Seq Scan on owl (cost=0.00..205.01
rows=1 width=35) (actual
time=1.945..1.946 rows=1 loops=1)
Filter: ((employer_name)::text =
'Ulule'::text)
Rows Removed by Filter: 10000
Planning time: 0.080 ms
Execution time: 1.965 ms
(5 rows)
Let’s go step by step ! .. 1
Costs
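(cost=0.00..205.01 rows=1 width=35)
- 0.00: estimated cost of retrieving the first row
- 205.01: estimated cost of retrieving all rows
- rows=1: estimated number of rows returned
- width=35: estimated average width of a row (in bytes)
(actual time=1.945..1.946 rows=1 loops=1)
- Only shown if you use ANALYZE
- actual time=1.945..1.946: measured time (in ms) to return the first row and all rows
- rows=1: number of rows actually returned
- loops=1: number of times your seq scan (index scan etc.) was executed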
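If you want to compare plans from a script, the estimate part of a plan line is easy to pull apart. Here is a minimal sketch using a regular expression; the function name `parse_estimates` and the dictionary keys are chosen for this example, and the pattern only covers the `(cost=… rows=… width=…)` form shown above.

```python
import re

# Matches the planner estimates of one plan node, e.g.
# (cost=0.00..205.01 rows=1 width=35)
PLAN_NODE = re.compile(
    r"\(cost=(?P<startup_cost>\d+\.\d+)\.\.(?P<total_cost>\d+\.\d+) "
    r"rows=(?P<rows>\d+) width=(?P<width>\d+)\)"
)


def parse_estimates(line):
    """Return the planner estimates found in a plan line, or None."""
    match = PLAN_NODE.search(line)
    if match is None:
        return None
    return {
        "startup_cost": float(match["startup_cost"]),  # cost before the first row
        "total_cost": float(match["total_cost"]),      # cost to retrieve all rows
        "rows": int(match["rows"]),                    # estimated row count
        "width": int(match["width"]),                  # average row width in bytes
    }


estimates = parse_estimates("Seq Scan on owl (cost=0.00..205.01 rows=1 width=35)")
print(estimates)
```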
Let’s go step by step ! .. 2
Seq Scan
Seq Scan on owl ...
Filter: ((employer_name)::text = 'Ulule'::text)
Rows Removed by Filter: 10000
- Scans the entire table.
- Retrieves the rows matching your WHERE.
It can be expensive!
So… is that why my query is slow? Do you need an index?
What is an index then?
In an encyclopaedia, if you
want every page where you
can find the word « Owl »,
you don’t read the entire
book, you go to the index!
A database index contains
the column value and
pointers to each row that
has this value.
Let’s go step by step ! .. 3
Index scan
QUERY PLAN
-------------------------------------------------
Index Scan using employer_name_owl on owl
…
Index Cond: ((employer_name)::text =
'Ulule'::text)
Planning time: 0.387 ms
Execution time: 0.066 ms
(4 rows)
What if there is an index on the « employer_name » column?
The index is traversed to find the entries matching your clause, and
each matching row is then fetched from the table.
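You can watch a planner switch from a full scan to an index lookup without a Postgres server at hand. The sketch below uses SQLite's `EXPLAIN QUERY PLAN`, which is a different database with a different output format than the Postgres plans in these slides, so treat it only as an illustration of the idea; the rows are invented and the index name is borrowed from the plan above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE owl (id INTEGER PRIMARY KEY, name TEXT, employer_name TEXT)")
conn.executemany(
    "INSERT INTO owl (name, employer_name) VALUES (?, ?)",
    [(f"owl {i}", "post office") for i in range(1000)] + [("Hedwig", "Ulule")],
)

QUERY = "SELECT * FROM owl WHERE employer_name = 'Ulule'"


def plan(sql):
    # Each row of EXPLAIN QUERY PLAN ends with a human-readable detail column.
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


plan_without_index = plan(QUERY)
conn.execute("CREATE INDEX employer_name_owl ON owl (employer_name)")
plan_with_index = plan(QUERY)

print(plan_without_index)  # a scan of the whole table
print(plan_with_index)     # a search that uses employer_name_owl
```

The exact wording of the detail column varies between SQLite versions, which is why the comments describe the plans rather than quote them.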
Let’s go step by step ! .. 4
With an index and a really common value!
7000 owls work at the post office
Owl.objects.filter(employer_name='post office')
owl_conference=# EXPLAIN SELECT * FROM "owl" WHERE
"owl"."employer_name" = 'post office';
QUERY PLAN
-------------------------------------------------
Seq Scan on owl
…
  Filter: ((employer_name)::text = 'post office'::text)
Let’s go step by step ! .. 4
Why is it using a seq scan?
An index scan follows the order of the index, so the read
head has to jump between rows scattered across the table.
Moving the read head of the disk can be around 1000
times slower than reading the next physical
block.
Conclusion: For common values it’s quicker to
read all data from the table in physical order
By the way… Retrieving 7000 rows might not be a great idea :).
Let’s go step by step ! .. 5
Bitmap Heap Scan
With an index and a common value
2000 owls work at Hogwarts
Owl.objects.filter(employer_name='Hogwarts')
owl_conference=# EXPLAIN SELECT * FROM owl WHERE
owl.employer_name = 'Hogwarts';
QUERY PLAN
-------------------------------------------------
Bitmap Heap Scan on owl
…
  Recheck Cond: ((employer_name)::text = 'Hogwarts'::text)
  -> Bitmap Index Scan on employer_name_owl (cost=0.00..47.28 rows=2000 width=0)
       Index Cond: ((employer_name)::text = 'Hogwarts'::text)
Let’s go step by step ! .. 5
Bitmap Heap Scan…
Index scan: goes through the tuple pointers of your index one at a time
and reads the data from the pages. Uses the index order.
Bitmap Heap Scan: sorts the tuple pointers into physical storage order
and then goes through them.
Avoids little « physical jumps » between pages.
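A toy model makes the benefit of reordering visible. In the sketch below each tuple pointer is a made-up (page, offset) pair, and we simply count how often the reader has to move to a different page; this is a simplification for illustration and not how Postgres measures or implements its scans.

```python
def page_switches(tuple_pointers):
    """Count how many times a reader has to move to a different page."""
    switches = 0
    current_page = None
    for page, _offset in tuple_pointers:
        if page != current_page:
            switches += 1
            current_page = page
    return switches


# (page, offset) pointers in the order the index returns them (invented data).
index_order = [(7, 1), (2, 3), (7, 2), (1, 5), (2, 1), (7, 4), (1, 2)]

# Index scan: follow the pointers in index order.
index_scan_switches = page_switches(index_order)

# Bitmap heap scan: sort the pointers by physical position first.
bitmap_scan_switches = page_switches(sorted(index_order))

print(index_scan_switches, bitmap_scan_switches)
```

In this model the index order visits a page seven times while the sorted order visits each of the three pages once.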
So we have 3 types of scan
1. Sequential scan
2. Index scan
3. Bitmap heap scan
And now let’s join stuff !
And now let’s join stuff…
Nested loops
owl_conference=# EXPLAIN ANALYZE SELECT * FROM owl JOIN job ON
(job.id = owl.job_id) WHERE job.id=1;
QUERY PLAN
-------------------------------------------------------------
Nested Loop
…
-> Seq Scan on job …
Rows Removed by Filter: 6
-> Seq Scan on owl …
Filter: (job_id = 1)
Rows Removed by Filter: 1000
Planning time: 0.150 ms
Execution time: 3.663 ms
(9 rows)
Owl.objects.filter(job_id=1).select_related('job')
And now let’s join stuff…
Nested loops
Used for little tables, can be slow because it is doing two nested « for » loops !
[Image: this illustration does not match the previous query ;)]
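The two nested « for » loops can be written out literally. This is a minimal sketch of the nested loop idea on plain Python lists, with invented owls and jobs; a real planner works on pages and can use an index for the inner side.

```python
def nested_loop_join(owls, jobs):
    """Join owls to jobs on owl.job_id = job.id with two nested for loops."""
    joined = []
    for owl in owls:                        # outer loop
        for job in jobs:                    # inner loop, restarted for every owl
            if owl["job_id"] == job["id"]:
                joined.append((owl["name"], job["name"]))
    return joined


jobs = [{"id": 1, "name": "mail delivery"}, {"id": 2, "name": "developer"}]
owls = [
    {"name": "Hedwig", "job_id": 1},
    {"name": "Errol", "job_id": 1},
    {"name": "Pig", "job_id": 2},
]
pairs = nested_loop_join(owls, jobs)
print(pairs)
```

The work grows with len(owls) × len(jobs), which is fine for 7 jobs and quickly painful for two big tables.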
And now let’s join stuff…
Hash Join
owl_conference=# EXPLAIN ANALYZE SELECT * FROM owl JOIN job ON
(job.id = owl.job_id) WHERE job.id>1;
QUERY PLAN
-------------------------------------------------------------
Hash Join …
Hash Cond: (owl.job_id = job.id)
-> Seq Scan on owl (cost=blabla)
-> Hash (cost=blabla)
Buckets: 1024 Batches: 1 Memory Usage: 9kB
-> Seq Scan on job (cost=blabla)
Filter: (id > 1)
Rows Removed by Filter: 1
Planning time: 0.235 ms
(10 rows)
Owl.objects.filter(job_id__gt=1).select_related('job')
And now let’s join stuff…
Hash Join
Used for smaller tables, because the hash table has
to fit in memory
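The same join can be sketched with a hash table: build it once on the smaller table, then probe it while reading the other table a single time. The data and function name below are made up for illustration, and a Python dict plays the role of the in-memory hash table.

```python
from collections import defaultdict


def hash_join(owls, jobs):
    """Join owls to jobs on owl.job_id = job.id using an in-memory hash table."""
    # Build phase: hash the smaller table (jobs) on the join key.
    jobs_by_id = defaultdict(list)
    for job in jobs:
        jobs_by_id[job["id"]].append(job)
    # Probe phase: one pass over owls, one lookup per owl.
    joined = []
    for owl in owls:
        for job in jobs_by_id.get(owl["job_id"], []):
            joined.append((owl["name"], job["name"]))
    return joined


jobs = [{"id": 1, "name": "mail delivery"}, {"id": 2, "name": "developer"}]
owls = [
    {"name": "Hedwig", "job_id": 1},
    {"name": "Errol", "job_id": 3},   # no matching job: dropped by the join
    {"name": "Pig", "job_id": 2},
]
pairs = hash_join(owls, jobs)
print(pairs)
```

Each table is read once, at the price of keeping the hashed table in memory.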
And now let’s join stuff…
Merge Join
owl_conference=# EXPLAIN ANALYZE SELECT * FROM owl JOIN job ON
(job.id = owl.id);
QUERY PLAN
-------------------------------------------------------------
Merge Join …
Merge Cond: (owl.id = job.id)
-> Index Scan using owl_pkey on owl
-> Sort …
Sort Key: job.id
Sort Method: quicksort Memory: 25kB
-> Seq Scan on job …
Planning time: 0.453 ms
Execution time: 0.102 ms
(10 rows)
Owl.objects.all().select_related('job')
And now let’s join stuff…
Merge Join
Used for big tables, an index can be
used to avoid sorting
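A merge join walks two sorted inputs side by side. The sketch below joins invented owls to jobs on owl.job_id = job.id and assumes job ids are unique (a simplification that keeps the loop short); the two sorted() calls are the step that an index in the right order would make unnecessary.

```python
def merge_join(owls, jobs):
    """Merge join on owl.job_id = job.id; assumes job ids are unique."""
    # Both inputs must be sorted on the join key; an index could provide this order.
    owls = sorted(owls, key=lambda owl: owl["job_id"])
    jobs = sorted(jobs, key=lambda job: job["id"])
    joined = []
    i = j = 0
    while i < len(owls) and j < len(jobs):
        owl, job = owls[i], jobs[j]
        if owl["job_id"] < job["id"]:
            i += 1          # this owl has no matching job: move to the next owl
        elif owl["job_id"] > job["id"]:
            j += 1          # this job is behind: move to the next job
        else:
            joined.append((owl["name"], job["name"]))
            i += 1          # stay on this job: the next owl may share it
    return joined


jobs = [{"id": 2, "name": "developer"}, {"id": 1, "name": "mail delivery"}]
owls = [
    {"name": "Pig", "job_id": 2},
    {"name": "Hedwig", "job_id": 1},
    {"name": "Errol", "job_id": 1},
]
pairs = merge_join(owls, jobs)
print(pairs)
```

Once the inputs are sorted, each one is read a single time from start to end, which is why this strategy scales to big tables.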
So we have 3 types of joins
1. Nested loop
2. Hash join
3. Merge join
And a last word about
ORDER BY
(last part, I swear !)
And now let’s order stuff…
owl_conference=# EXPLAIN ANALYZE SELECT * FROM owl ORDER BY
owl.job_id, owl.favourite_food;
QUERY PLAN
-------------------------------------------------------------
Sort …
Sort Key: job_id, favourite_food
Sort Method: quicksort Memory: 1166kB
-> Seq Scan on owl (cost=0.00..180.01 rows=10001 width=35)
(actual time=0.017..1.181 rows=10001 loops=1)
Planning time: 0.142 ms
Execution time: 8.665 ms
(6 rows)
Everything is sorted in memory (which is why it can be costly in terms of memory)
Owl.objects.order_by('job_id', 'favourite_food')
And now let’s order stuff…
ORDER BY LIMIT
owl_conference=# EXPLAIN ANALYZE SELECT name, employer_name
FROM owl ORDER BY owl.job_id, owl.favourite_food LIMIT 10;
QUERY PLAN
---------------------------------------------------------------
-----------------------------------------------------
Limit (cost…) (actual time…)
-> Sort (cost…) (actual time…)
Sort Key: name
Sort Method: top-N heapsort Memory: 25kB
-> Seq Scan on owl (cost=0.00..180.01 rows=10001
width=16) (actual time=0.032..5.856 rows=10002 loops=1)
Planning time: 0.201 ms
Execution time: 15.846 ms
(7 rows)
As with quicksort, every row still has to be read… So why is so much less memory used?
Owl.objects.order_by('job_id', 'favourite_food')[0:10]
Top-N heap sort
- A heap (sort of tree) is used with a limited size
- For each row
- If heap not full: add row in heap
- Else
- If value smaller than current values (for ASC): insert row
in heap, pop last
- Else pass
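The steps above can be sketched with Python's heapq module. This is an illustration of the idea, not Postgres's implementation: heapq is a min-heap, so a small hypothetical wrapper class `Reversed` inverts comparisons to keep the largest retained value at the top, ready to be popped when a smaller row arrives.

```python
import heapq


class Reversed:
    """Wrapper that inverts comparisons, turning heapq's min-heap into a max-heap."""

    def __init__(self, value):
        self.value = value

    def __lt__(self, other):
        return self.value > other.value


def top_n(rows, n):
    """Return the n smallest rows in ascending order, keeping at most n in memory."""
    if n <= 0:
        return []
    heap = []  # heap[0] is always the largest value currently kept
    for row in rows:
        if len(heap) < n:
            heapq.heappush(heap, Reversed(row))     # heap not full: add row
        elif row < heap[0].value:
            heapq.heapreplace(heap, Reversed(row))  # insert row, pop the largest
        # else: the row cannot be in the top n, skip it
    return sorted(item.value for item in heap)


employers = ["Potter", "Hogwarts", "Ulule", "Post Office", "Ahmann"]
first_two = top_n(employers, 2)
print(first_two)
```

Every row is still read once, but the heap never holds more than n rows, which matches the small memory figure in the plan.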
Top-N heap sort
[Figure: data to order with a LIMIT 10, and the heap at iterations 1, 2, 3 and 10]
Top-N heap sort
Example
Iteration 11: Post Office, nothing to do
Iteration 12: Ahmann is smaller than the other values
- Inserted in the tree
- Potter removed
And now let’s order stuff…
With an index
owl_conference=# EXPLAIN ANALYZE SELECT * FROM owl ORDER BY
owl.job_id, owl.favourite_food;
QUERY PLAN
-------------------------------------------------------------
Index Scan using owl_job_id_favourite_food on owl
(cost=0.29..544.66 rows=10001 width=35) (actual
time=0.016..2.835 rows=10001 loops=1)
Planning time: 0.098 ms
Execution time: 3.510 ms
(3 rows)
Simply uses index order
Be careful when you ORDER BY !
1. Sorting on a sort key without a LIMIT or an index can be
heavy in terms of memory!
2. You might need an index; only EXPLAIN will tell
you
Conclusion
- Looking at your DB logs could help you build a website
with good performance
- Always know where your queries come from
- Careful about loops ! Use prefetch_related and
select_related
- If you have a slow query, using EXPLAIN will help you find
a solution
Thank you for your attention !
Any questions?
Owly design: zimmoriarty (https://www.instagram.com/zimmoriarty/)
To go further - sources
https://momjian.us/main/writings/pgsql/optimizer.pdf
https://use-the-index-luke.com/sql/plans-dexecution/postgresql/operations
http://tech.novapost.fr/postgresql-application_name-django-settings.html

Contenu connexe

Tendances

Super heroes training_simulator
Super heroes training_simulatorSuper heroes training_simulator
Super heroes training_simulator
joustin12
 
Php Code Audits (PHP UK 2010)
Php Code Audits (PHP UK 2010)Php Code Audits (PHP UK 2010)
Php Code Audits (PHP UK 2010)
Damien Seguy
 
OO Perl with Moose
OO Perl with MooseOO Perl with Moose
OO Perl with Moose
Nelo Onyiah
 

Tendances (20)

Learn python - for beginners - part-2
Learn python - for beginners - part-2Learn python - for beginners - part-2
Learn python - for beginners - part-2
 
Python Tricks That You Can't Live Without
Python Tricks That You Can't Live WithoutPython Tricks That You Can't Live Without
Python Tricks That You Can't Live Without
 
java 8 Hands on Workshop
java 8 Hands on Workshopjava 8 Hands on Workshop
java 8 Hands on Workshop
 
JavaScript Design Patterns
JavaScript Design PatternsJavaScript Design Patterns
JavaScript Design Patterns
 
ES6 and BEYOND
ES6 and BEYONDES6 and BEYOND
ES6 and BEYOND
 
Intro to Python
Intro to PythonIntro to Python
Intro to Python
 
Python Part 2
Python Part 2Python Part 2
Python Part 2
 
Python Part 1
Python Part 1Python Part 1
Python Part 1
 
Test du futur avec Spock
Test du futur avec SpockTest du futur avec Spock
Test du futur avec Spock
 
Super heroes training_simulator
Super heroes training_simulatorSuper heroes training_simulator
Super heroes training_simulator
 
Django in the Office: Get Your Admin for Nothing and Your SQL for Free
Django in the Office: Get Your Admin for Nothing and Your SQL for FreeDjango in the Office: Get Your Admin for Nothing and Your SQL for Free
Django in the Office: Get Your Admin for Nothing and Your SQL for Free
 
ES6
ES6ES6
ES6
 
Php Code Audits (PHP UK 2010)
Php Code Audits (PHP UK 2010)Php Code Audits (PHP UK 2010)
Php Code Audits (PHP UK 2010)
 
Moose talk at FOSDEM 2011 (Perl devroom)
Moose talk at FOSDEM 2011 (Perl devroom)Moose talk at FOSDEM 2011 (Perl devroom)
Moose talk at FOSDEM 2011 (Perl devroom)
 
A Few of My Favorite (Python) Things
A Few of My Favorite (Python) ThingsA Few of My Favorite (Python) Things
A Few of My Favorite (Python) Things
 
Solving the Riddle of Search: Using Sphinx with Rails
Solving the Riddle of Search: Using Sphinx with RailsSolving the Riddle of Search: Using Sphinx with Rails
Solving the Riddle of Search: Using Sphinx with Rails
 
Intro
IntroIntro
Intro
 
Active Support Core Extensions (1)
Active Support Core Extensions (1)Active Support Core Extensions (1)
Active Support Core Extensions (1)
 
OO Perl with Moose
OO Perl with MooseOO Perl with Moose
OO Perl with Moose
 
Introduction To Moose
Introduction To MooseIntroduction To Moose
Introduction To Moose
 

Similaire à The amazing world behind your ORM

Similaire à The amazing world behind your ORM (20)

Becoming a better developer with EXPLAIN
Becoming a better developer with EXPLAINBecoming a better developer with EXPLAIN
Becoming a better developer with EXPLAIN
 
Postgres can do THAT?
Postgres can do THAT?Postgres can do THAT?
Postgres can do THAT?
 
The well tempered search application
The well tempered search applicationThe well tempered search application
The well tempered search application
 
SQL Injection 101 : It is not just about ' or '1'='1 - Pichaya Morimoto
SQL Injection 101 : It is not just about ' or '1'='1 - Pichaya MorimotoSQL Injection 101 : It is not just about ' or '1'='1 - Pichaya Morimoto
SQL Injection 101 : It is not just about ' or '1'='1 - Pichaya Morimoto
 
Extreme Swift
Extreme SwiftExtreme Swift
Extreme Swift
 
Practical Celery
Practical CeleryPractical Celery
Practical Celery
 
Swift, functional programming, and the future of Objective-C
Swift, functional programming, and the future of Objective-CSwift, functional programming, and the future of Objective-C
Swift, functional programming, and the future of Objective-C
 
PyCon 2010 SQLAlchemy tutorial
PyCon 2010 SQLAlchemy tutorialPyCon 2010 SQLAlchemy tutorial
PyCon 2010 SQLAlchemy tutorial
 
Tulip
TulipTulip
Tulip
 
Scala ntnu
Scala ntnuScala ntnu
Scala ntnu
 
Elixir in a nutshell - Fundamental Concepts
Elixir in a nutshell - Fundamental ConceptsElixir in a nutshell - Fundamental Concepts
Elixir in a nutshell - Fundamental Concepts
 
“Insulin” for Scala’s Syntactic Diabetes
“Insulin” for Scala’s Syntactic Diabetes“Insulin” for Scala’s Syntactic Diabetes
“Insulin” for Scala’s Syntactic Diabetes
 
How to create a high performance excel engine in java script
How to create a high performance excel engine in java scriptHow to create a high performance excel engine in java script
How to create a high performance excel engine in java script
 
Introduction to Neural Networks in Tensorflow
Introduction to Neural Networks in TensorflowIntroduction to Neural Networks in Tensorflow
Introduction to Neural Networks in Tensorflow
 
Spock: Test Well and Prosper
Spock: Test Well and ProsperSpock: Test Well and Prosper
Spock: Test Well and Prosper
 
Hive - ORIEN IT
Hive - ORIEN ITHive - ORIEN IT
Hive - ORIEN IT
 
Sub query_SQL
Sub query_SQLSub query_SQL
Sub query_SQL
 
Awesomeness of JavaScript…almost
Awesomeness of JavaScript…almostAwesomeness of JavaScript…almost
Awesomeness of JavaScript…almost
 
Find Anything In Your APEX App - Fuzzy Search with Oracle Text
Find Anything In Your APEX App - Fuzzy Search with Oracle TextFind Anything In Your APEX App - Fuzzy Search with Oracle Text
Find Anything In Your APEX App - Fuzzy Search with Oracle Text
 
7li7w devcon5
7li7w devcon57li7w devcon5
7li7w devcon5
 

Plus de Louise Grandjonc

Plus de Louise Grandjonc (6)

Postgres index types
Postgres index typesPostgres index types
Postgres index types
 
Amazing SQL your django ORM can or can't do
Amazing SQL your django ORM can or can't doAmazing SQL your django ORM can or can't do
Amazing SQL your django ORM can or can't do
 
Croco talk pgconfeu
Croco talk pgconfeuCroco talk pgconfeu
Croco talk pgconfeu
 
Indexes in postgres
Indexes in postgresIndexes in postgres
Indexes in postgres
 
Pg exercices
Pg exercicesPg exercices
Pg exercices
 
Meetup pg recherche fulltext ES -> PG
Meetup pg recherche fulltext ES -> PGMeetup pg recherche fulltext ES -> PG
Meetup pg recherche fulltext ES -> PG
 

Dernier

Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Victor Rentea
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
panagenda
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Safe Software
 

Dernier (20)

MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
 
Understanding the FAA Part 107 License ..
Understanding the FAA Part 107 License ..Understanding the FAA Part 107 License ..
Understanding the FAA Part 107 License ..
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectors
 
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot ModelMcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)
 
Platformless Horizons for Digital Adaptability
Platformless Horizons for Digital AdaptabilityPlatformless Horizons for Digital Adaptability
Platformless Horizons for Digital Adaptability
 
[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf
 
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamDEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 

The amazing world behind your ORM

  • 1. The amazing world behind your ORM Louise Grandjonc
  • 2. Louise Grandjonc (louise@ulule.com) Lead developer at Ulule (www.ulule.com) Django developer - Postgres enthusiast @louisemeta on twitter About me
  • 3. 1. How do we end up with performance problems? 2. How can we catch the performance problems without having to guess? 3. What does it change in our everyday developer job? Today’s agenda
  • 4. How do we end up with performance problems?
  • 5. 1.The ORM executes queries that you might not expect 2.Your queries might not be optimised and you won’t know about it Why we should know what our ORM is doing
  • 6. How can we catch the performance problems (without having to guess)?
  • 7. How can I see what is happening when I do stuff? 1. Django debug toolbar (to see queries and their explain in your django view) Advantages: can be easily included in your django templates Problems: Does not allow you to see everything (ajax calls !), if you’re working on an API, you cannot use it! 2. Django devserver : puts all the logs of your database into your runserver output Advantages: you’re not missing the ajax calls 3. Simply look at your database logs Advantages: you can see everything, you won’t be disturbed if you ever change project/programming languages/framework/computer, you can configure how you see your logs Problems: you don’t know where your logs are?
  • 8. Where are my logs? owl_conference=# show log_directory ; log_directory --------------- pg_log owl_conference=# show data_directory ; data_directory ------------------------- /usr/local/var/postgres owl_conference=# show log_filename ; log_filename ------------------------- postgresql-%Y-%m-%d.log Terminal command $ psql -U user -d your_database_name psql interface
  • 9. Having good looking logs (and logging everything like a crazy owl) owl_conference=# SHOW config_file; config_file ----------------------------------------- /usr/local/var/postgres/postgresql.conf (1 row) In your postgresql.conf log_filename = 'postgresql-%Y-%m-%d.log' log_statement = 'all' logging_collector = on log_min_duration_statement = 0
  • 10. I’ve seen my logs… But … Where are these queries executed in my code? Let’s take an example… I have an owl DB with two tables. 10 000 owls 7 jobs
  • 11. Example Query executed in Template def index(request): owls = Owl.objects.filter(employer_name=‘Ulule’) context = {‘owls': owls} return render(request, 'owls/index.html', context) SELECT … FROM "owl" WHERE "owl"."employer_name" = 'Ulule' {% for owl in owls %} <p> {{ owl.name }} </p> {% end for %}
  • 12. Example Query executed in View def index(request): owls = Owl.objects.filter(employer_name=‘Ulule’) for owl in owls: # Do something context = {‘owls': owls} return render(request, 'owls/index.html', context) SELECT … FROM "owl" WHERE "owl"."employer_name" = 'Ulule' {% for owl in owls %} <p> {{ owl.name }} </p> {% end for %}
  • 13. Yep ! I’ve seen my logs… But … Where are these queries executed in my code? How to spot where your query is executed? 1. Each model has a table to store data. Find the model. 2. Where in my view, or in my form am I using this model to get/filter objects? 3. Where am I using this objects? Is it in my view/form? Passed into the context and used in templates?
  • 14. What does it change in our everyday developer job? (Or how to really do something when you have a problem)
  • 15. The two most common problems of any developer… 1. I have way too many queries… Why ? 2. One of my queries is freakin' slow… Why?
  • 16. Once upon a time… 1000 times The danger of loops in your code, and how your templates are making fun of you… 1. Use your context ! 2. Preload stuff in the query! • prefetch_related() - ManyToMany or ForeignKey • select_related () - ForeignKey
  • 17. Once upon a time… 1000 times select_related or prefetch_related? In django, select_related and prefetch_related will help you lower your amount of query by preloading the foreign keys or many-to-many. 1. select_related uses a join (only for foreign keys): - Advantages: only one request - Problem: if you are joining big tables, with a lot of columns and no index, it can be slow… We’ll talk about that next. 2. prefetch_related does a second request on your join table (for foreign keys and many-to-many - Advantages: no big join - Problem: more queries
  • 18. Example … owls = Owl.objects.filter(employer_name=‘Ulule’) for owl in owls: print(owl.job) # 1 query per loop owls = Owl.objects .filter(employer_name=‘Ulule’) .select_related(‘job’) for owl in owls: print(owl.job) # no extra queries
  • 19. Example … Using select_related/prefetch_related Owl.objects.filter(employer_name=‘Ulule’) .select_related(‘job’) SELECT … FROM "owl" LEFT OUTER JOIN "job" ON ("owl"."job_id" = "job"."id") WHERE "owl"."employer_name" = 'Ulule' SELECT … FROM "owl" WHERE "owl"."employer_name" = 'Ulule' SELECT … FROM "job" WHERE "job"."id" IN (2) Owl.objects.filter(employer_name=‘Ulule’) .prefetch_related(‘job’)
  • 20. One of my query is super slow… Let’s talk about EXPLAIN !
  • 21. What is EXPLAIN Gives you the execution plan chosen by the query planner that your database will use to execute your SQL statement Using ANALYZE will actually execute your query! (Don’t worry, you can ROLLBACK) EXPLAIN (ANALYZE) my super query; BEGIN; EXPLAIN ANALYZE my super query; ROLLBACK;
  • 22. Mmmm… Query planner? The magical thing that generates execution plans for a query and calculates what is the cost of each plan. The best one is used to execute your query
  • 23. So, what does it took like ? Let’s take a slow query… Owl.objects.filter(employer_name=‘Ulule’) SELECT "owl"."id", "owl"."name", "owl"."employer_name", "owl"."favourite_food", "owl"."job_id", "owl"."fur_color" FROM "owl" WHERE "owl"."employer_name" = 'Ulule'
  • 24. And… owl_conference=# EXPLAIN ANALYZE SELECT * FROM owl WHERE employer_name=‘Ulule' QUERY PLAN ------------------------------------ Seq Scan on owl (cost=0.00..205.01 rows=1 width=35) (actual time=1.945..1.946 rows=1 loops=1) Filter: ((employer_name)::text = 'Ulule'::text) Rows Removed by Filter: 10000 Planning time: 0.080 ms Execution time: 1.965 ms (5 rows)
  • 25. Let’s go step by step ! .. 1 Costs (cost=0.00..205.01 rows=1 width=35) Cost of retrieving all rows Number of rows returned Cost of retrieving first row Average width of a row (in bytes) (actual time=1.945..1.946 rows=1 loops=1) If you use ANALYZE Number of time your seq scan (index scan etc.) was executed
  • 26. Let’s go step by step! .. 2 Seq Scan
    Seq Scan on owl ...
      Filter: ((employer_name)::text = 'Ulule'::text)
      Rows Removed by Filter: 10000
    - Scans the entire table.
    - Retrieves the rows matching your WHERE.
    It can be expensive! Do you need an index? So… is that why my query is slow?
  • 27. What is an index then? In an encyclopaedia, if you want every page where the word « Owl » appears, you don’t read the entire book, you go to the index! A database index contains the column value and pointers to each row that has this value.
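The "column value plus pointers to rows" idea can be sketched with a dictionary. This is only an illustration: a real Postgres index is typically a B-tree on disk, and the rows and helper names below (`build_index`, `seq_scan`, `index_lookup`) are hypothetical.

```python
from collections import defaultdict

# Hypothetical table: the position in the list plays the role of the row pointer.
owls = [
    {"name": "Hedwig", "employer_name": "Ulule"},
    {"name": "Errol", "employer_name": "post office"},
    {"name": "Pigwidgeon", "employer_name": "post office"},
    {"name": "Archimedes", "employer_name": "Hogwarts"},
]


def build_index(rows, column):
    """Map each column value to the positions of the rows holding it."""
    index = defaultdict(list)
    for position, row in enumerate(rows):
        index[row[column]].append(position)
    return index


def seq_scan(rows, column, value):
    """Read every row and keep the matching ones."""
    return [row for row in rows if row[column] == value]


def index_lookup(rows, index, value):
    """Jump straight to the matching rows through the index."""
    return [rows[position] for position in index.get(value, [])]


employer_index = build_index(owls, "employer_name")
```

Both lookups return the same rows; the index version simply avoids reading the rows that do not match.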
  • 28. Let’s go step by step! .. 3 Index scan. What if there is an index on the « employer_name » column?
    QUERY PLAN
    -------------------------------------------------
    Index Scan using employer_name_owl on owl …
      Index Cond: ((employer_name)::text = 'Ulule'::text)
    Planning time: 0.387 ms
    Execution time: 0.066 ms
    (4 rows)
    The index is visited row by row in order to retrieve the data corresponding to your clause.
  • 29. Let’s go step by step! .. 4 With an index and a really common value! 7000 owls work at the post office.
    Owl.objects.filter(employer_name='post office')
    owl_conference=# EXPLAIN SELECT * FROM "owl" WHERE "owl"."employer_name" = 'post office';
    QUERY PLAN
    -------------------------------------------------
    Seq Scan on owl …
      Filter: ((employer_name)::text = 'post office'::text)
  • 30. Let’s go step by step! .. 4 Why is it using a seq scan? An index scan follows the order of the index, so the disk’s read head has to move between rows. Moving the read head is 1000 times slower than reading the next physical block. Conclusion: for common values it’s quicker to read all the data from the table in physical order. By the way… retrieving 7000 rows might not be a great idea :).
  • 31. Let’s go step by step! .. 5 Bitmap Heap Scan. With an index and a common value: 2000 owls work at Hogwarts.
    Owl.objects.filter(employer_name='Hogwarts')
    owl_conference=# EXPLAIN SELECT * FROM owl WHERE owl.employer_name = 'Hogwarts';
    QUERY PLAN
    -------------------------------------------------
    Bitmap Heap Scan on owl …
      Recheck Cond: ((employer_name)::text = 'Hogwarts'::text)
      -> Bitmap Index Scan on employer_name_owl (cost=0.00..47.28 rows=2000 width=0)
           Index Cond: ((employer_name)::text = 'Hogwarts'::text)
  • 32. Let’s go step by step! .. 5 Bitmap Heap Scan…
    - Index scan: goes through the index tuple pointers one at a time and reads the data from the pages. Uses the index order.
    - Bitmap Heap Scan: sorts the tuple pointers in physical storage order and goes through them. Avoids little « physical jumps » between pages.
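The effect of visiting pages in physical order can be sketched by counting how often consecutive reads land on a different page. This is a simplified model under stated assumptions: the `(page, offset)` pointers are made up, and a real bitmap scan builds a bitmap of pages rather than sorting a Python list.

```python
def page_switches(pointers):
    """Count how often two consecutive fetches hit a different page."""
    switches = 0
    for previous, current in zip(pointers, pointers[1:]):
        if previous[0] != current[0]:
            switches += 1
    return switches


# Hypothetical tuple pointers (page, offset), in the order the index returns them.
index_order = [(3, 1), (0, 2), (3, 0), (1, 5), (0, 0), (1, 1)]

# Index scan: follow the index order as-is.
index_scan_visits = index_order

# Bitmap heap scan (simplified): reorder the pointers by physical position first.
bitmap_scan_visits = sorted(index_order)

index_scan_jumps = page_switches(index_scan_visits)
bitmap_scan_jumps = page_switches(bitmap_scan_visits)
```

With these sample pointers the index order changes page on every fetch, while the physically sorted order only changes page when it is done with the previous one.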
  • 33. So we have 3 types of scan
    1. Sequential scan
    2. Index scan
    3. Bitmap heap scan
    And now let’s join stuff!
  • 34. And now let’s join stuff… Nested loops
    Owl.objects.filter(job_id=1).select_related('job')
    owl_conference=# EXPLAIN ANALYZE SELECT * FROM owl JOIN job ON (job.id = owl.job_id) WHERE job.id=1;
    QUERY PLAN
    -------------------------------------------------------------
    Nested Loop …
      -> Seq Scan on job …
           Rows Removed by Filter: 6
      -> Seq Scan on owl …
           Filter: (job_id = 1)
           Rows Removed by Filter: 1000
    Planning time: 0.150 ms
    Execution time: 3.663 ms
    (9 rows)
  • 35. And now let’s join stuff… Nested loops. Used for small tables. It can be slow because it runs two nested « for » loops! This image does not match the previous query ;)
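The two nested « for » loops can be written out directly. This is a minimal sketch of the idea on hypothetical in-memory rows, not Postgres's actual implementation.

```python
def nested_loop_join(owls, jobs):
    """For every job, scan every owl: len(jobs) * len(owls) comparisons."""
    joined = []
    comparisons = 0
    for job in jobs:          # outer loop
        for owl in owls:      # inner loop, restarted for each outer row
            comparisons += 1
            if owl["job_id"] == job["id"]:
                joined.append((owl["name"], job["name"]))
    return joined, comparisons


# Hypothetical sample rows.
jobs = [{"id": 1, "name": "Postman"}, {"id": 2, "name": "Developer"}]
owls = [
    {"name": "Hedwig", "job_id": 2},
    {"name": "Errol", "job_id": 1},
    {"name": "Pigwidgeon", "job_id": 1},
]

pairs, comparisons = nested_loop_join(owls, jobs)
```

The number of comparisons is the product of the two table sizes, which is why this strategy only pays off when at least one side is small.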
  • 36. And now let’s join stuff… Hash Join
    Owl.objects.filter(job_id__gt=1).select_related('job')
    owl_conference=# EXPLAIN ANALYZE SELECT * FROM owl JOIN job ON (job.id = owl.job_id) WHERE job.id>1;
    QUERY PLAN
    -------------------------------------------------------------
    Hash Join …
      Hash Cond: (owl.job_id = job.id)
      -> Seq Scan on owl (cost=blabla)
      -> Hash (cost=blabla)
           Buckets: 1024 Batches: 1 Memory Usage: 9kB
           -> Seq Scan on job (cost=blabla)
                Filter: (id > 1)
                Rows Removed by Filter: 1
    Planning time: 0.235 ms
    (10 rows)
  • 37. And now let’s join stuff… Hash Join Used for smaller tables, because the hash table has to fit in memory
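A hash join works in two phases: build a hash table on one table, then probe it with the other. The sketch below uses a Python dictionary as the hash table; the rows are hypothetical and the code only illustrates the principle.

```python
from collections import defaultdict


def hash_join(owls, jobs):
    """Build a hash table on jobs, then probe it once per owl."""
    # Build phase: the whole hash table lives in memory.
    buckets = defaultdict(list)
    for job in jobs:
        buckets[job["id"]].append(job)

    # Probe phase: one lookup per owl instead of a scan of jobs.
    joined = []
    for owl in owls:
        for job in buckets.get(owl["job_id"], []):
            joined.append((owl["name"], job["name"]))
    return joined


# Hypothetical sample rows.
jobs = [{"id": 2, "name": "Developer"}, {"id": 3, "name": "Teacher"}]
owls = [
    {"name": "Hedwig", "job_id": 2},
    {"name": "Errol", "job_id": 1},
    {"name": "Archimedes", "job_id": 3},
]

pairs = hash_join(owls, jobs)
```

Each table is read only once, at the price of keeping the hash table in memory.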
  • 38. And now let’s join stuff… Merge Join
    Owl.objects.all().select_related('job')
    owl_conference=# EXPLAIN ANALYZE SELECT * FROM owl JOIN job ON (job.id = owl.id);
    QUERY PLAN
    -------------------------------------------------------------
    Merge Join …
      Merge Cond: (owl.id = job.id)
      -> Index Scan using owl_pkey on owl
      -> Sort …
           Sort Key: job.id
           Sort Method: quicksort Memory: 25kB
           -> Seq Scan on job …
    Planning time: 0.453 ms
    Execution time: 0.102 ms
    (10 rows)
  • 39. And now let’s join stuff… Merge Join. Used for big tables. An index can be used to avoid sorting.
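A merge join walks two inputs that are already sorted on the join key, advancing whichever side is behind. This is a minimal sketch on hypothetical rows; to keep it short it assumes the keys on the right side are unique (as with a primary key).

```python
def merge_join(left, right, left_key, right_key):
    """Join two lists sorted on their keys; right-side keys must be unique."""
    joined = []
    i = j = 0
    while i < len(left) and j < len(right):
        left_value = left[i][left_key]
        right_value = right[j][right_key]
        if left_value == right_value:
            joined.append((left[i]["name"], right[j]["name"]))
            i += 1  # stay on the right row: the next left row may share the key
        elif left_value < right_value:
            i += 1
        else:
            j += 1
    return joined


# Hypothetical rows, both already sorted on the join key (e.g. read via an index).
owls = [
    {"name": "Errol", "job_id": 1},
    {"name": "Pigwidgeon", "job_id": 1},
    {"name": "Hedwig", "job_id": 2},
    {"name": "Archimedes", "job_id": 4},
]
jobs = [
    {"id": 1, "name": "Postman"},
    {"id": 2, "name": "Developer"},
    {"id": 3, "name": "Teacher"},
]

pairs = merge_join(owls, jobs, "job_id", "id")
```

Each input is read once from start to end, so the expensive part is getting both sides sorted, which is where an index helps.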
  • 40. So we have 3 types of joins
    1. Nested loop
    2. Hash join
    3. Merge join
    And a last word about ORDER BY (last part, I swear!)
  • 41. And now let’s order stuff…
    Owl.objects.order_by('job_id', 'favourite_food')
    owl_conference=# EXPLAIN ANALYZE SELECT * FROM owl ORDER BY owl.job_id, owl.favourite_food;
    QUERY PLAN
    -------------------------------------------------------------
    Sort …
      Sort Key: job_id, favourite_food
      Sort Method: quicksort Memory: 1166kB
      -> Seq Scan on owl (cost=0.00..180.01 rows=10001 width=35) (actual time=0.017..1.181 rows=10001 loops=1)
    Planning time: 0.142 ms
    Execution time: 8.665 ms
    (6 rows)
    Everything is sorted in memory (which is why it can be costly in terms of memory).
  • 42. And now let’s order stuff… ORDER BY LIMIT
    Owl.objects.order_by('job_id', 'favourite_food')[0:10]
    owl_conference=# EXPLAIN ANALYZE SELECT name, employer_name FROM owl ORDER BY owl.job_id, owl.favourite_food LIMIT 10;
    QUERY PLAN
    ---------------------------------------------------------------
    Limit (cost…) (actual time…)
      -> Sort (cost…) (actual time…)
           Sort Key: name
           Sort Method: top-N heapsort Memory: 25kB
           -> Seq Scan on owl (cost=0.00..180.01 rows=10001 width=16) (actual time=0.032..5.856 rows=10002 loops=1)
    Planning time: 0.201 ms
    Execution time: 15.846 ms
    (7 rows)
    Like with quicksort, all the data has to be sorted… Why is the memory taken so much smaller?
  • 43. Top-N heap sort
    - A heap (a sort of tree) with a limited size is used
    - For each row
      - If the heap is not full: add the row to the heap
      - Else
        - If the value is smaller than the largest value currently in the heap (for ASC): insert the row into the heap, pop the last one
        - Else pass
  • 44. Top-N heap sort. [Figure: data to order with a LIMIT 10; heap contents at iterations 1, 2, 3 and at iteration 10]
  • 45. Top-N heap sort. Example. Iteration 11: Post Office, nothing to do. Iteration 12: Ahmann is smaller than the other values, so it is inserted into the tree and Potter is removed.
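The top-N heap sort steps can be sketched with Python's `heapq` module. This is an illustration of the principle for an ascending sort, not Postgres's implementation; the `Reversed` wrapper and the sample names are assumptions made here so that the largest kept value sits at the top of the heap.

```python
import heapq


class Reversed:
    """Wrap a value so that heapq (a min-heap) keeps the largest value on top."""

    __slots__ = ("value",)

    def __init__(self, value):
        self.value = value

    def __lt__(self, other):
        return self.value > other.value


def top_n(rows, limit):
    """Return the `limit` smallest rows in ascending order, keeping only `limit` rows in memory."""
    if limit <= 0:
        return []
    heap = []  # heap[0] always holds the largest value kept so far
    for row in rows:
        if len(heap) < limit:
            heapq.heappush(heap, Reversed(row))      # heap not full: add the row
        elif row < heap[0].value:
            heapq.heapreplace(heap, Reversed(row))   # insert the row, pop the largest
        # else: the row cannot be among the first `limit` rows, skip it
    return sorted(item.value for item in heap)


# Hypothetical values to order.
names = ["Potter", "Post Office", "Weasley", "Ahmann", "Malfoy", "Hagrid"]
first_three = top_n(names, 3)
```

Only `limit` rows are held at any time, which is why the memory reported for the top-N heapsort is so much smaller than for the full quicksort.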
  • 46. And now let’s order stuff… With an index
    owl_conference=# EXPLAIN ANALYZE SELECT * FROM owl ORDER BY owl.job_id, owl.favourite_food;
    QUERY PLAN
    -------------------------------------------------------------
    Index Scan using owl_job_id_favourite_food on owl (cost=0.29..544.66 rows=10001 width=35) (actual time=0.016..2.835 rows=10001 loops=1)
    Planning time: 0.098 ms
    Execution time: 3.510 ms
    (3 rows)
    Simply uses the index order.
  • 47. Be careful when you ORDER BY!
    1. Sorting on a sort key without a limit or an index can be heavy in terms of memory!
    2. You might need an index, only EXPLAIN will tell you
  • 49. Conclusion
    - Looking at your DB logs can help you build a website with good performance
    - Always know where your queries come from
    - Be careful with loops! Use prefetch_related and select_related
    - If you have a slow query, using EXPLAIN will help you find a solution
  • 50. Thank you for your attention ! Any questions? Owly design: zimmoriarty (https://www.instagram.com/zimmoriarty/)
  • 51. To go further - sources
    https://momjian.us/main/writings/pgsql/optimizer.pdf
    https://use-the-index-luke.com/sql/plans-dexecution/postgresql/operations
    http://tech.novapost.fr/postgresql-application_name-django-settings.html