PostgreSQL: Advanced features in practice

JÁN SUCHAL
22.11.2011
@RUBYSLAVA
Why PostgreSQL?

• "The world's most advanced open source database."
• Features!
  - Transactional DDL
  - Cost-based query optimizer + graphical EXPLAIN
  - Partial indexes
  - Function (expression) indexes
  - K-nearest-neighbor (KNN) search
  - Views
  - Recursive queries
  - Window functions
Transactional DDL

class CreatePostsMigration < ActiveRecord::Migration
  def change
    create_table :posts do |t|
      t.string :name, null: false
      t.text :body, null: false
      t.references :author, null: false
      t.timestamps null: false
    end

    add_index :posts, :title, unique: true
  end
end

• Where is the problem?
  - Column title does not exist!
  - Without transactional DDL (e.g. MySQL), the table is created but the index is not, leaving the migration half-applied. Oops!
  - PostgreSQL can run CREATE TABLE and CREATE INDEX inside a transaction, and Rails wraps each migration in one, so the failed add_index rolls back the create_table as well. Transactional DDL FTW!
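To make the failure mode concrete, here is a toy in-memory model of the same migration run with and without transactional DDL. The `Schema` class, `broken_migration` and `run_migration` are hypothetical and have nothing to do with ActiveRecord's or PostgreSQL's internals; the model only shows the difference between "each statement commits immediately" and "restore the pre-migration state on error", which is roughly what a rolled-back DDL transaction gives you.

```ruby
# Toy model: a hypothetical in-memory schema, NOT ActiveRecord or PostgreSQL internals.
class Schema
  attr_reader :tables, :indexes

  def initialize
    @tables = {}   # table name => array of column names
    @indexes = []  # [table, column] pairs
  end

  def create_table(name, columns)
    @tables[name] = columns
  end

  def add_index(table, column)
    raise ArgumentError, "column #{column} does not exist" unless @tables.fetch(table, []).include?(column)

    @indexes << [table, column]
  end

  # Runs the migration block. With transactional: true, a failure restores the
  # snapshot taken before the block, like a rolled-back DDL transaction.
  def migrate(transactional:)
    snapshot = [@tables.dup, @indexes.dup]
    yield self
  rescue ArgumentError
    @tables, @indexes = snapshot if transactional
    raise
  end
end

# The slide's migration: the index refers to :title, which was never created.
def broken_migration(schema)
  schema.create_table(:posts, %i[name body author_id created_at updated_at])
  schema.add_index(:posts, :title)
end

def run_migration(transactional)
  schema = Schema.new
  begin
    schema.migrate(transactional: transactional) { |s| broken_migration(s) }
  rescue ArgumentError => e
    warn "migration failed: #{e.message}"
  end
  schema
end

run_migration(false).tables.key?(:posts)  # => true  (table left behind, no index)
run_migration(true).tables.key?(:posts)   # => false (whole migration rolled back)
```

In the non-transactional run, fixing the typo and re-running the migration would then fail again on `create_table`, because the table already exists; that is the practical pain transactional DDL removes.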
Cost-based query optimizer

• What is the best plan to execute a given query?
• Cost = estimated I/O + CPU operations needed
• Sequential reads vs. random seeks
• Join order
• Join type (nested loop, hash join, merge join)
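As a rough illustration of why the best plan depends on how many rows a query matches, the sketch below compares a sequential scan with an index scan using PostgreSQL's default cost constants (`seq_page_cost` = 1.0, `random_page_cost` = 4.0, `cpu_tuple_cost` = 0.01, `cpu_index_tuple_cost` = 0.005, `cpu_operator_cost` = 0.0025). The two formulas are a deliberate simplification (one random page read per matching row, no caching or physical correlation), not the planner's actual cost functions, and the table size is made up.

```ruby
# PostgreSQL's default planner cost constants.
SEQ_PAGE_COST        = 1.0
RANDOM_PAGE_COST     = 4.0
CPU_TUPLE_COST       = 0.01
CPU_INDEX_TUPLE_COST = 0.005
CPU_OPERATOR_COST    = 0.0025

# Sequential scan: read every page in order, evaluate the WHERE clause on every row.
def seq_scan_cost(pages, rows)
  pages * SEQ_PAGE_COST + rows * (CPU_TUPLE_COST + CPU_OPERATOR_COST)
end

# Simplified index scan: assume one random heap page read per matching row.
def index_scan_cost(matching_rows)
  matching_rows * (RANDOM_PAGE_COST + CPU_INDEX_TUPLE_COST + CPU_TUPLE_COST)
end

# Hypothetical table: 1M rows stored in 10,000 pages.
pages, rows = 10_000, 1_000_000
[100, 10_000, 100_000].each do |matching|
  seq = seq_scan_cost(pages, rows)
  idx = index_scan_cost(matching)
  plan = idx < seq ? "index scan" : "seq scan"
  puts format("%7d matching rows: seq=%.1f index=%.1f -> %s", matching, seq, idx, plan)
end
```

With these numbers only the 100-row case picks the index scan: random seeks are priced four times higher than sequential page reads, so the index stops paying off long before a query touches 1% of the table.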
Graphical EXPLAIN

• pgAdmin (www.pgadmin.org)
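Partial indexes

• Conditional indexes: only rows matching a WHERE predicate are indexed
• Problem: async job/queue table, find failed jobs
  - Create an index on the failed_at column
  - 99% of the index is never used: most jobs never fail, so their failed_at is NULL
• Solution:

CREATE INDEX idx_dj_only_failed ON delayed_jobs (failed_at)
  WHERE failed_at IS NOT NULL;

  - smaller index
  - faster writes: rows with failed_at IS NULL get no index entry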
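A partial index simply leaves out the rows that fail its WHERE predicate. The toy model below uses plain Ruby arrays to stand in for the table and the B-tree, with invented job data, to show why the index stays small: only failed jobs get an entry. Keep in mind that the planner only uses such an index when the query's own WHERE clause implies the index predicate, e.g. `WHERE failed_at IS NOT NULL`.

```ruby
Job = Struct.new(:id, :failed_at)

# Hypothetical delayed_jobs table: 1,000 jobs, every 100th one has failed.
jobs = (1..1_000).map { |id| Job.new(id, (id % 100).zero? ? Time.at(id) : nil) }

# Index predicate from the slide: WHERE failed_at IS NOT NULL.
def indexed?(job)
  !job.failed_at.nil?
end

# A plain index on failed_at would hold one entry per row, NULLs included.
full_index_entries = jobs.size

# The partial index holds sorted (failed_at, id) entries only for rows matching the predicate.
partial_index = jobs.select { |j| indexed?(j) }.map { |j| [j.failed_at, j.id] }.sort

puts "full index entries:    #{full_index_entries}"   # 1000
puts "partial index entries: #{partial_index.size}"   # 10
```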
Function indexes

• Problem: suffix search
  SELECT … WHERE code LIKE '%123'
  A B-tree index only helps LIKE patterns anchored at the start.
• "Solution":
  - Add a reverse_code column, populate it, add triggers for updates, create an index on reverse_code
  - Rewrite queries as WHERE reverse_code LIKE '321%'
• PostgreSQL solution: index the expression itself

CREATE INDEX idx_reversed ON projects
  (reverse((code)::text) text_pattern_ops);

SELECT … WHERE reverse(code) LIKE reverse('%123');

  - text_pattern_ops makes the index usable for LIKE prefix matches regardless of the database locale
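Why reversing helps: a B-tree is a sorted structure, so it can answer a prefix pattern such as LIKE '321%' with a range scan, but not a suffix pattern. Reversing both the stored value and the pattern turns the suffix into a prefix. The sketch below models that with a sorted Ruby array and `bsearch_index` standing in for the index; the project codes are made up. Ruby's `String` comparison is byte-wise, which is the kind of plain character ordering `text_pattern_ops` provides.

```ruby
# Hypothetical project codes.
codes = %w[A-123 B-999 C-0123 D-124 E-5123]

# "Index" on reverse(code): (reversed value, original value) pairs kept sorted, like a B-tree.
reversed_index = codes.map { |c| [c.reverse, c] }.sort

# Suffix search for codes ending in `suffix` == prefix search on the reversed values.
def suffix_search(index, suffix)
  prefix = suffix.reverse
  # Binary search for the first key >= prefix: the start of the range scan.
  start = index.bsearch_index { |(key, _)| key >= prefix } || index.size
  # Walk forward while keys still share the prefix, then return the original codes.
  index[start..].take_while { |(key, _)| key.start_with?(prefix) }.map(&:last)
end

suffix_search(reversed_index, "123")  # => ["A-123", "C-0123", "E-5123"]
```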
K-nearest search

• Problem: fuzzy string matching
  - 900K rows
• Solution: n-gram/trigram search (pg_trgm extension)
  - johno = {"  j", " jo", "hno", "joh", "no ", "ohn"}
  - <-> returns the trigram distance (1 - similarity), and a GiST index can return rows ordered by it

CREATE EXTENSION pg_trgm;
CREATE INDEX idx_trgm_name ON subjects USING gist (name gist_trgm_ops);

SELECT name, name <-> 'Michl Brla' AS dist
  FROM subjects ORDER BY dist ASC LIMIT 10;   -- 312ms

name            | dist
"Michal Barla"  | 0.588235
"Michal Bula"   | 0.647059
"Michal Broz"   | 0.647059
"Pavel Michl"   | 0.647059
"Michal Brna"   | 0.647059
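To see where distances such as 0.588235 come from, here is an approximation of pg_trgm in Ruby. It assumes pg_trgm's documented rules (lowercase, split into words on non-alphanumeric characters, pad each word with two spaces in front and one behind, similarity = shared trigrams / distinct trigrams of both strings, `<->` = 1 - similarity) and ignores multibyte and locale details. `nearest` is a hypothetical helper that scans every name linearly, whereas the GiST index avoids that scan.

```ruby
require "set"

# Approximation of pg_trgm's trigram extraction.
def trigrams(str)
  str.downcase.scan(/[[:alnum:]]+/).each_with_object(Set.new) do |word, set|
    padded = "  #{word} "                              # two spaces before, one after
    (0..padded.length - 3).each { |i| set << padded[i, 3] }
  end
end

# Distance as returned by pg_trgm's <-> operator: 1 - shared / distinct trigrams.
def trigram_distance(a, b)
  ta = trigrams(a)
  tb = trigrams(b)
  union = (ta | tb).size
  return 1.0 if union.zero?

  1.0 - (ta & tb).size.to_f / union
end

# Brute-force equivalent of ORDER BY name <-> query LIMIT k.
def nearest(names, query, k)
  names.min_by(k) { |n| trigram_distance(n, query) }
end

trigrams("johno")  # => the six trigrams listed on the slide
nearest(["Pavel Michl", "Michal Barla", "Jan Novak"], "Michl Brla", 1)  # => ["Michal Barla"]
```

Under these rules "Michl Brla" has 11 distinct trigrams and "Michal Barla" has 13, 7 of them shared, so the distance is 1 - 7/17 = 10/17 ≈ 0.588235; "Michal Bula" shares 6 of 17 and lands at 11/17 ≈ 0.647059, matching the slide.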
Views

• Conditions on a view are pushed down into the underlying queries

CREATE VIEW edges AS
  SELECT subject_id AS source_id,
    connected_subject_id AS target_id FROM raw_connections
  UNION ALL
  SELECT connected_subject_id AS source_id,
    subject_id AS target_id FROM raw_connections;

• SELECT * FROM edges WHERE source_id = 123;
• SELECT * FROM edges WHERE source_id < 500 ORDER BY source_id LIMIT 10;
  - No materialization: 2x indexed select (one per UNION ALL branch, given indexes on subject_id and connected_subject_id) + 1x append/merge
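One way to picture what the planner does with WHERE source_id = 123 on this view: the filter is applied inside each UNION ALL branch, where it becomes a lookup on subject_id in one branch and on connected_subject_id in the other, so each side can use its own index. The Ruby sketch below mimics that with two hash "indexes" and checks the result against naively materializing the whole view first. The rows are invented, and it assumes both columns are indexed.

```ruby
Conn = Struct.new(:subject_id, :connected_subject_id)

# Hypothetical raw_connections rows (each relation stored once).
raw_connections = [Conn.new(123, 7), Conn.new(8, 123), Conn.new(5, 6)]

# Stand-ins for indexes on raw_connections(subject_id) and (connected_subject_id).
by_subject   = raw_connections.group_by(&:subject_id)
by_connected = raw_connections.group_by(&:connected_subject_id)

# Naive evaluation: materialize both halves of the view, then filter.
def edges_naive(rows, source_id)
  forward  = rows.map { |r| [r.subject_id, r.connected_subject_id] }
  backward = rows.map { |r| [r.connected_subject_id, r.subject_id] }
  (forward + backward).select { |src, _| src == source_id }
end

# Pushed-down evaluation: one index lookup per UNION ALL branch, then append.
def edges_pushed_down(by_subject, by_connected, source_id)
  forward  = by_subject.fetch(source_id, []).map { |r| [r.subject_id, r.connected_subject_id] }
  backward = by_connected.fetch(source_id, []).map { |r| [r.connected_subject_id, r.subject_id] }
  forward + backward
end

edges_pushed_down(by_subject, by_connected, 123)  # => [[123, 7], [123, 8]]
```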
Recursive queries

• Problem: find paths between two nodes in a graph

WITH RECURSIVE search_graph(source, target, distance, path) AS
(
  -- anchor: every one-edge path leaving the start node
  SELECT source_id, target_id, 1,
    ARRAY[source_id, target_id]
  FROM edges WHERE source_id = 530556
  UNION ALL
  -- recursive step: extend each path by one edge, no cycles, at most 4 edges
  SELECT sg.source, e.target_id, sg.distance + 1,
    sg.path || ARRAY[e.target_id]
  FROM search_graph sg
    JOIN edges e ON sg.target = e.source_id
    WHERE NOT e.target_id = ANY(sg.path) AND sg.distance < 4
)
SELECT * FROM search_graph WHERE target = 552506 LIMIT 100;
Recursive queries

• Graph with ~1M edges (61ms)

source; target; distance; path
530556; 552506; 2; {530556,185423,552506}
  JUDr. Robert Kaliňák -> FoodRest s.r.o. -> Ing. Ján Počiatek
530556; 552506; 2; {530556,183291,552506}
  JUDr. Robert Kaliňák -> FoRest s.r.o. -> Ing. Ján Počiatek
530556; 552506; 4; {530556,183291,552522,185423,552506}
  JUDr. Robert Kaliňák -> FoRest s.r.o. -> Lena Sisková -> FoodRest s.r.o. -> Ing. Ján Počiatek
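The recursive query above is essentially a level-by-level (breadth-first) expansion: the anchor member produces all one-edge paths from the start node, and each iteration extends the previous level's paths by one edge, skipping nodes already on the path and stopping at distance 4. Below is a Ruby sketch of the same evaluation over a small, invented adjacency list; `EDGES` and `search_graph` are hypothetical stand-ins for the `edges` view and the CTE.

```ruby
# Hypothetical undirected graph as an adjacency list (what the `edges` view provides).
EDGES = {
  1 => [2, 3], 2 => [1, 4], 3 => [1, 4], 4 => [2, 3, 5], 5 => [4]
}.freeze

# Mirrors WITH RECURSIVE search_graph(source, target, distance, path).
def search_graph(start, max_distance: 4)
  # Anchor member: every edge leaving the start node.
  level = EDGES.fetch(start, []).map { |t| { source: start, target: t, distance: 1, path: [start, t] } }
  results = level.dup
  # Recursive member: extend the previous level while distance < max_distance.
  while level.any? && level.first[:distance] < max_distance
    level = level.flat_map do |row|
      EDGES.fetch(row[:target], [])
           .reject { |t| row[:path].include?(t) }   # NOT e.target_id = ANY(path)
           .map { |t| row.merge(target: t, distance: row[:distance] + 1, path: row[:path] + [t]) }
    end
    results.concat(level)
  end
  results
end

# Equivalent of the final SELECT ... WHERE target = 5.
paths_to_5 = search_graph(1).select { |r| r[:target] == 5 }.map { |r| r[:path] }
# => [[1, 2, 4, 5], [1, 3, 4, 5]]
```

Unlike this sketch, SQL guarantees no particular row order, which is why the slide's query relies on LIMIT rather than on the first rows being the shortest paths.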
Window functions

• "Aggregate functions without grouping": the function is computed over a window of related rows, but every row is kept
  - avg, count, sum, rank, row_number, ntile…
• Problem: find the closest nodes to a given node
  - Order by the sum of path scores
  - Path score = 0.9^distance / log(1 + number of paths with the same distance and target), where log is base 10 in PostgreSQL

SELECT source, target FROM (
  SELECT source, target, path, distance,
    0.9 ^ distance / log(1 +
      COUNT(*) OVER (PARTITION BY distance, target)
    ) AS score
  FROM ( … ) AS paths
) AS scored_paths
GROUP BY source, target ORDER BY SUM(score) DESC;
Window functions

• Example: closest to Róbert Kaliňák
  - "Bussines Park Bratislava a.s."
  - "JARABINY a.s."
  - "Ing. Robert Pintér"
  - "Ing. Ján Počiatek"
  - "Bratislava trade center a.s."
  - …
• 1M edges, 41ms
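The window function is what lets each path row see how many other paths share its (distance, target) pair without collapsing those rows. The Ruby sketch below reproduces the scoring query in two steps: a `group_by` plays the role of COUNT(*) OVER (PARTITION BY distance, target), and a second grouping mimics GROUP BY source, target ORDER BY SUM(score) DESC. `Math.log10` matches PostgreSQL's one-argument `log`; the path rows are invented.

```ruby
# Hypothetical rows from the recursive path query, all starting at node 1.
paths = [
  { source: 1, target: :a, distance: 1 },
  { source: 1, target: :b, distance: 2 },
  { source: 1, target: :b, distance: 2 },
  { source: 1, target: :c, distance: 3 }
]

# COUNT(*) OVER (PARTITION BY distance, target): partition size, attached to every row.
partition_sizes = paths.group_by { |row| [row[:distance], row[:target]] }.transform_values(&:size)

# Path score = 0.9^distance / log10(1 + paths sharing this distance and target).
scored = paths.map do |row|
  n = partition_sizes[[row[:distance], row[:target]]]
  row.merge(score: 0.9**row[:distance] / Math.log10(1 + n))
end

# GROUP BY source, target ORDER BY SUM(score) DESC
ranking = scored.group_by { |row| [row[:source], row[:target]] }
                .map { |(source, target), rows| [source, target, rows.sum { |r| r[:score] }] }
                .sort_by { |_, _, total| -total }

ranking.map { |_, target, _| target }  # => [:b, :a, :c]
```

Here :b ranks first (about 3.40) because two distance-2 paths reach it, ahead of the single direct edge to :a (about 2.99) and the single distance-3 path to :c (about 2.42).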
Additional resources

• www.postgresql.org
  - Read the docs, seriously
• www.explainextended.com
  - SQL guru blog
• explain.depesz.com
  - First aid for slow queries
• www.wikivs.com/wiki/MySQL_vs_PostgreSQL
  - MySQL vs. PostgreSQL comparison
Real World Explain

• www.postgresql.org

More Related Content

What's hot

Python tutorial
Python tutorialPython tutorial
Python tutorial
Rajiv Risi
 

What's hot (20)

Postgres rules
Postgres rulesPostgres rules
Postgres rules
 
Python tutorial
Python tutorialPython tutorial
Python tutorial
 
A tour of Python
A tour of PythonA tour of Python
A tour of Python
 
The Ring programming language version 1.5.2 book - Part 44 of 181
The Ring programming language version 1.5.2 book - Part 44 of 181The Ring programming language version 1.5.2 book - Part 44 of 181
The Ring programming language version 1.5.2 book - Part 44 of 181
 
Unsung Heroes of PHP
Unsung Heroes of PHPUnsung Heroes of PHP
Unsung Heroes of PHP
 
Postgresql 9.3 overview
Postgresql 9.3 overviewPostgresql 9.3 overview
Postgresql 9.3 overview
 
Implementing a many-to-many Relationship with Slick
Implementing a many-to-many Relationship with SlickImplementing a many-to-many Relationship with Slick
Implementing a many-to-many Relationship with Slick
 
Green dao
Green daoGreen dao
Green dao
 
Xm lparsers
Xm lparsersXm lparsers
Xm lparsers
 
Ggplot2 v3
Ggplot2 v3Ggplot2 v3
Ggplot2 v3
 
Getting started with R when analysing GitHub commits
Getting started with R when analysing GitHub commitsGetting started with R when analysing GitHub commits
Getting started with R when analysing GitHub commits
 
Graph Database Query Languages
Graph Database Query LanguagesGraph Database Query Languages
Graph Database Query Languages
 
Why async and functional programming in PHP7 suck and how to get overr it?
Why async and functional programming in PHP7 suck and how to get overr it?Why async and functional programming in PHP7 suck and how to get overr it?
Why async and functional programming in PHP7 suck and how to get overr it?
 
Scala best practices
Scala best practicesScala best practices
Scala best practices
 
The Ring programming language version 1.5.3 book - Part 30 of 184
The Ring programming language version 1.5.3 book - Part 30 of 184The Ring programming language version 1.5.3 book - Part 30 of 184
The Ring programming language version 1.5.3 book - Part 30 of 184
 
Pytables
PytablesPytables
Pytables
 
JDK 8
JDK 8JDK 8
JDK 8
 
The Ring programming language version 1.8 book - Part 35 of 202
The Ring programming language version 1.8 book - Part 35 of 202The Ring programming language version 1.8 book - Part 35 of 202
The Ring programming language version 1.8 book - Part 35 of 202
 
WorkingWithSlick2.1.0
WorkingWithSlick2.1.0WorkingWithSlick2.1.0
WorkingWithSlick2.1.0
 
Programming Haskell Chapter8
Programming Haskell Chapter8Programming Haskell Chapter8
Programming Haskell Chapter8
 

Viewers also liked

Сравнение форматов и библиотек сериализации / Антон Рыжов (Qrator Labs)
Сравнение форматов и библиотек сериализации / Антон Рыжов (Qrator Labs)Сравнение форматов и библиотек сериализации / Антон Рыжов (Qrator Labs)
Сравнение форматов и библиотек сериализации / Антон Рыжов (Qrator Labs)
Ontico
 
PrescriptionPillsToHeroine_hw Copy
PrescriptionPillsToHeroine_hw CopyPrescriptionPillsToHeroine_hw Copy
PrescriptionPillsToHeroine_hw Copy
Amber Hollingsworth
 

Viewers also liked (20)

Deep dive into PostgreSQL statistics.
Deep dive into PostgreSQL statistics.Deep dive into PostgreSQL statistics.
Deep dive into PostgreSQL statistics.
 
PostgreSQL Advanced Queries
PostgreSQL Advanced QueriesPostgreSQL Advanced Queries
PostgreSQL Advanced Queries
 
Troubleshooting PostgreSQL Streaming Replication
Troubleshooting PostgreSQL Streaming ReplicationTroubleshooting PostgreSQL Streaming Replication
Troubleshooting PostgreSQL Streaming Replication
 
Сравнение форматов и библиотек сериализации / Антон Рыжов (Qrator Labs)
Сравнение форматов и библиотек сериализации / Антон Рыжов (Qrator Labs)Сравнение форматов и библиотек сериализации / Антон Рыжов (Qrator Labs)
Сравнение форматов и библиотек сериализации / Антон Рыжов (Qrator Labs)
 
Streaming replication in practice
Streaming replication in practiceStreaming replication in practice
Streaming replication in practice
 
Freedom! Employee Empowerment the $2000 Way
Freedom! Employee Empowerment the $2000 WayFreedom! Employee Empowerment the $2000 Way
Freedom! Employee Empowerment the $2000 Way
 
Mission: Launch a Digital Workplace
Mission: Launch a Digital Workplace Mission: Launch a Digital Workplace
Mission: Launch a Digital Workplace
 
Et si la RH partagée devenait une nouvelle spécialité bretonne ?
Et si la RH partagée devenait une nouvelle spécialité bretonne ?Et si la RH partagée devenait une nouvelle spécialité bretonne ?
Et si la RH partagée devenait une nouvelle spécialité bretonne ?
 
PrescriptionPillsToHeroine_hw Copy
PrescriptionPillsToHeroine_hw CopyPrescriptionPillsToHeroine_hw Copy
PrescriptionPillsToHeroine_hw Copy
 
Goed jaar voor firma Kevin Pauwels
Goed jaar voor firma Kevin PauwelsGoed jaar voor firma Kevin Pauwels
Goed jaar voor firma Kevin Pauwels
 
TRUSTLESS.AI and Trustless Computing Consortium
TRUSTLESS.AI and Trustless Computing ConsortiumTRUSTLESS.AI and Trustless Computing Consortium
TRUSTLESS.AI and Trustless Computing Consortium
 
Mesa job _fair_flyer
Mesa job _fair_flyerMesa job _fair_flyer
Mesa job _fair_flyer
 
Los Desafíos de la educación a distancia
Los Desafíos de la educación a distanciaLos Desafíos de la educación a distancia
Los Desafíos de la educación a distancia
 
The adoption and impact of OEP and OER in the Global South: Theoretical, conc...
The adoption and impact of OEP and OER in the Global South: Theoretical, conc...The adoption and impact of OEP and OER in the Global South: Theoretical, conc...
The adoption and impact of OEP and OER in the Global South: Theoretical, conc...
 
Game Studio Leadership: You Can Do It
Game Studio Leadership: You Can Do ItGame Studio Leadership: You Can Do It
Game Studio Leadership: You Can Do It
 
MOOC Aspects juridiques de la création d'entreprises innovantes - attestation
MOOC Aspects juridiques de la création d'entreprises innovantes -  attestationMOOC Aspects juridiques de la création d'entreprises innovantes -  attestation
MOOC Aspects juridiques de la création d'entreprises innovantes - attestation
 
Choosing Open (#OEGlobal) - Openness and praxis: Using OEP in HE
Choosing Open (#OEGlobal) - Openness and praxis: Using OEP in HEChoosing Open (#OEGlobal) - Openness and praxis: Using OEP in HE
Choosing Open (#OEGlobal) - Openness and praxis: Using OEP in HE
 
Marketing Week Live 2017
Marketing Week Live 2017Marketing Week Live 2017
Marketing Week Live 2017
 
コードクローン研究 ふりかえり ~ストロング・スタイルで行こう~
コードクローン研究 ふりかえり ~ストロング・スタイルで行こう~コードクローン研究 ふりかえり ~ストロング・スタイルで行こう~
コードクローン研究 ふりかえり ~ストロング・スタイルで行こう~
 
はじめての CircleCI
はじめての CircleCIはじめての CircleCI
はじめての CircleCI
 

Similar to PostgreSQL: Advanced features in practice

PerlApp2Postgresql (2)
PerlApp2Postgresql (2)PerlApp2Postgresql (2)
PerlApp2Postgresql (2)
Jerome Eteve
 

Similar to PostgreSQL: Advanced features in practice (20)

GreenDao Introduction
GreenDao IntroductionGreenDao Introduction
GreenDao Introduction
 
Importing Data into Neo4j quickly and easily - StackOverflow
Importing Data into Neo4j quickly and easily - StackOverflowImporting Data into Neo4j quickly and easily - StackOverflow
Importing Data into Neo4j quickly and easily - StackOverflow
 
PerlApp2Postgresql (2)
PerlApp2Postgresql (2)PerlApp2Postgresql (2)
PerlApp2Postgresql (2)
 
GraphConnect Europe 2016 - Importing Data - Mark Needham, Michael Hunger
GraphConnect Europe 2016 - Importing Data - Mark Needham, Michael HungerGraphConnect Europe 2016 - Importing Data - Mark Needham, Michael Hunger
GraphConnect Europe 2016 - Importing Data - Mark Needham, Michael Hunger
 
Joins and Other MongoDB 3.2 Aggregation Enhancements
Joins and Other MongoDB 3.2 Aggregation EnhancementsJoins and Other MongoDB 3.2 Aggregation Enhancements
Joins and Other MongoDB 3.2 Aggregation Enhancements
 
Graph Connect: Importing data quickly and easily
Graph Connect: Importing data quickly and easilyGraph Connect: Importing data quickly and easily
Graph Connect: Importing data quickly and easily
 
MongoDB
MongoDB MongoDB
MongoDB
 
MySQL Indexes
MySQL IndexesMySQL Indexes
MySQL Indexes
 
MongoDB Aggregation
MongoDB Aggregation MongoDB Aggregation
MongoDB Aggregation
 
MongoDB Aggregation Framework
MongoDB Aggregation FrameworkMongoDB Aggregation Framework
MongoDB Aggregation Framework
 
Mapping Graph Queries to PostgreSQL
Mapping Graph Queries to PostgreSQLMapping Graph Queries to PostgreSQL
Mapping Graph Queries to PostgreSQL
 
Beyond PHP - It's not (just) about the code
Beyond PHP - It's not (just) about the codeBeyond PHP - It's not (just) about the code
Beyond PHP - It's not (just) about the code
 
Cassandra Data Modeling
Cassandra Data ModelingCassandra Data Modeling
Cassandra Data Modeling
 
Tactical data engineering
Tactical data engineeringTactical data engineering
Tactical data engineering
 
Less08 Schema
Less08 SchemaLess08 Schema
Less08 Schema
 
Back to Basics Webinar 4: Advanced Indexing, Text and Geospatial Indexes
Back to Basics Webinar 4: Advanced Indexing, Text and Geospatial IndexesBack to Basics Webinar 4: Advanced Indexing, Text and Geospatial Indexes
Back to Basics Webinar 4: Advanced Indexing, Text and Geospatial Indexes
 
Code is not text! How graph technologies can help us to understand our code b...
Code is not text! How graph technologies can help us to understand our code b...Code is not text! How graph technologies can help us to understand our code b...
Code is not text! How graph technologies can help us to understand our code b...
 
Lecture 3.pdf
Lecture 3.pdfLecture 3.pdf
Lecture 3.pdf
 
PPT on Data Science Using Python
PPT on Data Science Using PythonPPT on Data Science Using Python
PPT on Data Science Using Python
 
The Ring programming language version 1.10 book - Part 47 of 212
The Ring programming language version 1.10 book - Part 47 of 212The Ring programming language version 1.10 book - Part 47 of 212
The Ring programming language version 1.10 book - Part 47 of 212
 

More from Jano Suchal

Rank all the (geo) things!
Rank all the (geo) things!Rank all the (geo) things!
Rank all the (geo) things!
Jano Suchal
 
Ako si vybrať programovácí jazyk alebo framework?
Ako si vybrať programovácí jazyk alebo framework?Ako si vybrať programovácí jazyk alebo framework?
Ako si vybrať programovácí jazyk alebo framework?
Jano Suchal
 
Bonetics: Mastering Puppet Workshop
Bonetics: Mastering Puppet WorkshopBonetics: Mastering Puppet Workshop
Bonetics: Mastering Puppet Workshop
Jano Suchal
 
Ako si vybrať programovací jazyk a framework?
Ako si vybrať programovací jazyk a framework?Ako si vybrať programovací jazyk a framework?
Ako si vybrať programovací jazyk a framework?
Jano Suchal
 
Garelic: Google Analytics as App Performance monitoring
Garelic: Google Analytics as App Performance monitoringGarelic: Google Analytics as App Performance monitoring
Garelic: Google Analytics as App Performance monitoring
Jano Suchal
 
Vojtech Rinik: Internship v USA - moje skúsenosti
Vojtech Rinik: Internship v USA - moje skúsenostiVojtech Rinik: Internship v USA - moje skúsenosti
Vojtech Rinik: Internship v USA - moje skúsenosti
Jano Suchal
 
Profiling and monitoring ruby & rails applications
Profiling and monitoring ruby & rails applicationsProfiling and monitoring ruby & rails applications
Profiling and monitoring ruby & rails applications
Jano Suchal
 
Aký programovací jazyk a framework si vybrať a prečo?
Aký programovací jazyk a framework si vybrať a prečo?Aký programovací jazyk a framework si vybrať a prečo?
Aký programovací jazyk a framework si vybrať a prečo?
Jano Suchal
 
Petr Joachim: Redis na Super.cz
Petr Joachim: Redis na Super.czPetr Joachim: Redis na Super.cz
Petr Joachim: Redis na Super.cz
Jano Suchal
 
Metaprogramovanie #1
Metaprogramovanie #1Metaprogramovanie #1
Metaprogramovanie #1
Jano Suchal
 

More from Jano Suchal (20)

Slovensko.Digital: Čo ďalej?
Slovensko.Digital: Čo ďalej?Slovensko.Digital: Čo ďalej?
Slovensko.Digital: Čo ďalej?
 
Datanest 3.0
Datanest 3.0Datanest 3.0
Datanest 3.0
 
Improving code quality
Improving code qualityImproving code quality
Improving code quality
 
Beyond search queries
Beyond search queriesBeyond search queries
Beyond search queries
 
Rank all the things!
Rank all the things!Rank all the things!
Rank all the things!
 
Rank all the (geo) things!
Rank all the (geo) things!Rank all the (geo) things!
Rank all the (geo) things!
 
Ako si vybrať programovácí jazyk alebo framework?
Ako si vybrať programovácí jazyk alebo framework?Ako si vybrať programovácí jazyk alebo framework?
Ako si vybrať programovácí jazyk alebo framework?
 
Bonetics: Mastering Puppet Workshop
Bonetics: Mastering Puppet WorkshopBonetics: Mastering Puppet Workshop
Bonetics: Mastering Puppet Workshop
 
Peter Mihalik: Puppet
Peter Mihalik: PuppetPeter Mihalik: Puppet
Peter Mihalik: Puppet
 
Tomáš Čorej: Configuration management & CFEngine3
Tomáš Čorej: Configuration management & CFEngine3Tomáš Čorej: Configuration management & CFEngine3
Tomáš Čorej: Configuration management & CFEngine3
 
Ako si vybrať programovací jazyk a framework?
Ako si vybrať programovací jazyk a framework?Ako si vybrať programovací jazyk a framework?
Ako si vybrať programovací jazyk a framework?
 
SQL: Query optimization in practice
SQL: Query optimization in practiceSQL: Query optimization in practice
SQL: Query optimization in practice
 
Garelic: Google Analytics as App Performance monitoring
Garelic: Google Analytics as App Performance monitoringGarelic: Google Analytics as App Performance monitoring
Garelic: Google Analytics as App Performance monitoring
 
Miroslav Šimulčík: Temporálne databázy
Miroslav Šimulčík: Temporálne databázyMiroslav Šimulčík: Temporálne databázy
Miroslav Šimulčík: Temporálne databázy
 
Vojtech Rinik: Internship v USA - moje skúsenosti
Vojtech Rinik: Internship v USA - moje skúsenostiVojtech Rinik: Internship v USA - moje skúsenosti
Vojtech Rinik: Internship v USA - moje skúsenosti
 
Profiling and monitoring ruby & rails applications
Profiling and monitoring ruby & rails applicationsProfiling and monitoring ruby & rails applications
Profiling and monitoring ruby & rails applications
 
Aký programovací jazyk a framework si vybrať a prečo?
Aký programovací jazyk a framework si vybrať a prečo?Aký programovací jazyk a framework si vybrať a prečo?
Aký programovací jazyk a framework si vybrať a prečo?
 
Čo po GAMČI?
Čo po GAMČI?Čo po GAMČI?
Čo po GAMČI?
 
Petr Joachim: Redis na Super.cz
Petr Joachim: Redis na Super.czPetr Joachim: Redis na Super.cz
Petr Joachim: Redis na Super.cz
 
Metaprogramovanie #1
Metaprogramovanie #1Metaprogramovanie #1
Metaprogramovanie #1
 

Recently uploaded

Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
Joaquim Jorge
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
giselly40
 

Recently uploaded (20)

Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 

PostgreSQL: Advanced features in practice

  • 1. PostgreSQL: Advanced features in practice JÁN SUCHAL 22.11.2011 @RUBYSLAVA
  • 2. Why PostgreSQL?  The world’s most advanced open source database.  Features!  Transactional DDL  Cost-based query optimizer + Graphical explain  Partial indexes  Function indexes  K-nearest search  Views  Recursive Queries  Window Functions
  • 3. Transactional DDL class CreatePostsMigration < ActiveRecord::Migration def change create_table :posts do |t| t.string :name, null: false t.text :body, null: false t.references :author, null: false t.timestamps null: false end add_index :posts, :title, unique: true end end  Where is the problem?
  • 4. Transactional DDL class CreatePostsMigration < ActiveRecord::Migration def change create_table :posts do |t| t.string :name, null: false Column title does not exist! t.text :body, null: false is created, index is not. Oops! Table t.references :author, null: false Transactional DDL FTW! t.timestamps null: false end add_index :posts, :title, unique: true end end  Where is the problem?
  • 5. Cost-based query optimizer  What is the best plan to execute a given query?  Cost = I/O + CPU operations needed  Sequential vs. random seek  Join order  Join type (nested loop, hash join, merge join)
  • 6. Graphical EXPLAIN  pgAdmin (www.pgadmin.org)
  • 7. Partial indexes  Conditional indexes  Problem: Async job/queue table, find failed jobs  Create index on failed_at column  99% of index is never used
  • 8. Partial indexes
      - Conditional indexes
      - Problem: async job/queue table, find failed jobs
          - Create an index on the failed_at column
          - 99% of the index is never used (most jobs never fail, so their failed_at is NULL)
      - Solution:
          CREATE INDEX idx_dj_only_failed ON delayed_jobs (failed_at)
            WHERE failed_at IS NOT NULL;
          - smaller index
          - faster updates
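A toy Ruby model of what the partial index buys: rows that fail the predicate never get an index entry, so the index stays small and writes to jobs that have not failed skip it. `Job` and `PartialIndex` are hypothetical names, and a sorted array stands in for the B-tree.

```ruby
# Hypothetical job row; failed_at is nil unless the job failed.
Job = Struct.new(:id, :failed_at)

# Toy partial index: a sorted array of [key, row_id] pairs standing in for a B-tree,
# holding entries only for rows that satisfy the predicate.
class PartialIndex
  attr_reader :entries

  def initialize(&predicate)
    @predicate = predicate
    @entries = []
  end

  # Returns true if an entry was written, false if the row was skipped.
  def insert(job)
    return false unless @predicate.call(job)

    key = [job.failed_at, job.id]
    pos = @entries.bsearch_index { |entry| (entry <=> key) >= 0 } || @entries.size
    @entries.insert(pos, key)
    true
  end
end

# 1,000 jobs, every 100th one failed (timestamps are made-up epoch seconds).
jobs = (1..1000).map { |i| Job.new(i, (i % 100).zero? ? 1_700_000_000 + i : nil) }
failed_index = PartialIndex.new { |job| !job.failed_at.nil? } # WHERE failed_at IS NOT NULL
writes = jobs.count { |job| failed_index.insert(job) }
puts "#{writes} of #{jobs.size} rows got an index entry" # => 10 of 1000 rows got an index entry
```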
  • 11. Function indexes
      - Problem: suffix search; with a leading % a plain B-tree index on code cannot be used
          SELECT … WHERE code LIKE '%123'
      - "Solution":
          - Add a reverse_code column, populate it, add triggers for updates, create an index on reverse_code
          - Reverse the queries: WHERE reverse_code LIKE '321%'
      - PostgreSQL solution: index the expression itself
          CREATE INDEX idx_reversed ON projects (reverse((code)::text) text_pattern_ops);
          SELECT … WHERE reverse(code) LIKE reverse('%123');
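The reverse() trick works because a suffix match on `code` is a prefix match on the reversed string, and a prefix match is a contiguous range in a sorted index. A minimal Ruby sketch of that idea, assuming a sorted array of reversed codes stands in for the B-tree; Ruby compares strings byte by byte, which loosely mirrors the character-by-character ordering that text_pattern_ops gives the index. The sample codes are made up.

```ruby
# Hypothetical project codes.
CODES = %w[A-123 B-555 X9123 C-1234 Q-0123 Z-999].freeze

# "Index": [reversed_code, code] pairs sorted by the reversed code.
REVERSED_INDEX = CODES.map { |code| [code.reverse, code] }.sort_by(&:first).freeze

# WHERE code LIKE '%<suffix>' rewritten as a prefix range scan on reversed codes.
def suffix_search(index, suffix)
  prefix = suffix.reverse
  # Binary search for the first reversed code >= prefix (start of the range).
  start = index.bsearch_index { |reversed, _code| reversed >= prefix } || index.size
  # Walk forward while entries still share the prefix (end of the range).
  index[start..]
    .take_while { |reversed, _code| reversed.start_with?(prefix) }
    .map(&:last)
end

p suffix_search(REVERSED_INDEX, "123") # => ["A-123", "Q-0123", "X9123"]
```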
  • 13. K-nearest search
      - Problem: fuzzy string matching over 900K rows
      - Solution: n-gram/trigram search (the pg_trgm module)
          johno = {"  j", " jo", "hno", "joh", "no ", "ohn"}
          CREATE INDEX idx_trgm_name ON subjects USING gist (name gist_trgm_ops);
          SELECT name, name <-> 'Michl Brla' AS dist
          FROM subjects
          ORDER BY dist ASC
          LIMIT 10;
      - Result (312ms); <-> returns a distance (1 - similarity), so smaller is closer:
          "Michal Barla"; 0.588235
          "Michal Bula"; 0.647059
          "Michal Broz"; 0.647059
          "Pavel Michl"; 0.647059
          "Michal Brna"; 0.647059
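A small Ruby approximation of how pg_trgm turns names into trigram sets and how `<->` ranks them. Assumptions: words are runs of alphanumeric characters, lower-cased, and padded with two leading spaces and one trailing space (the padding behind trigrams like "  j" and "no "); similarity is shared trigrams divided by all distinct trigrams, and `<->` returns 1 - similarity. The `nearest` helper is a brute-force stand-in for the GiST index scan: fine for a handful of names, but what the index lets PostgreSQL avoid on 900K rows.

```ruby
# Approximation of pg_trgm's trigram extraction: lower-case, split into
# alphanumeric words, pad each word with two leading spaces and one trailing space.
def trigrams(text)
  text.downcase.scan(/[[:alnum:]]+/).flat_map do |word|
    "  #{word} ".chars.each_cons(3).map(&:join)
  end.uniq
end

# Distance as returned by pg_trgm's <-> operator: 1 - similarity, where
# similarity = shared trigrams / distinct trigrams in either string.
def trigram_distance(a, b)
  ta = trigrams(a)
  tb = trigrams(b)
  union = (ta | tb).size
  return 1.0 if union.zero?

  1.0 - (ta & tb).size.to_f / union
end

# Brute-force k-nearest search (the GiST index avoids scanning every row).
def nearest(names, query, k)
  names.map { |name| [name, trigram_distance(name, query)] }
       .min_by(k) { |_name, dist| dist }
end

p trigrams("johno").sort
name, dist = nearest(["Michal Bula", "Michal Barla", "Pavel Michl"], "Michl Brla", 1).first
puts format("%s %.6f", name, dist)
```

Under these assumptions the closest name is "Michal Barla" at 0.588235 (7 of 17 distinct trigrams shared), the same distance the slide reports.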
  • 15. Views
      - Conditions on the view are propagated down into its underlying queries
          CREATE VIEW edges AS
            SELECT subject_id AS source_id, connected_subject_id AS target_id
            FROM raw_connections
            UNION ALL
            SELECT connected_subject_id AS source_id, subject_id AS target_id
            FROM raw_connections;

          SELECT * FROM edges WHERE source_id = 123;
          SELECT * FROM edges WHERE source_id < 500 ORDER BY source_id LIMIT 10;
      - No materialization: 2x indexed select (one per UNION ALL branch, given indexes on subject_id and connected_subject_id) + 1x append (first query) or merge of the two sorted results (second query)
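Why the view costs nothing extra: a condition on `source_id` can be applied to each branch of the UNION ALL separately, and each branch then becomes a lookup on one raw column. The Ruby sketch below models that with two hash "indexes" on raw_connections, one per column, and checks the result against naive materialization. `RawConnection` and `EdgesView` are hypothetical names, and hashes stand in for whatever indexes the real table has.

```ruby
# Hypothetical raw_connections row.
RawConnection = Struct.new(:subject_id, :connected_subject_id)

class EdgesView
  def initialize(rows)
    @rows = rows
    # One "index" per raw column, standing in for indexes on both columns.
    @by_subject   = rows.group_by(&:subject_id)
    @by_connected = rows.group_by(&:connected_subject_id)
  end

  # Naive evaluation: materialize the whole UNION ALL, then filter.
  def materialized_edges_from(source_id)
    all = @rows.map { |r| [r.subject_id, r.connected_subject_id] } +
          @rows.map { |r| [r.connected_subject_id, r.subject_id] }
    all.select { |source, _target| source == source_id }
  end

  # Pushed-down evaluation: filter each branch through its index, then append.
  def edges_from(source_id)
    forward  = @by_subject.fetch(source_id, []).map { |r| [r.subject_id, r.connected_subject_id] }
    backward = @by_connected.fetch(source_id, []).map { |r| [r.connected_subject_id, r.subject_id] }
    forward + backward
  end
end

view = EdgesView.new([RawConnection.new(1, 2), RawConnection.new(3, 1), RawConnection.new(2, 3)])
p view.edges_from(1) # => [[1, 2], [1, 3]]
```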
  • 19. Recursive queries
      - Problem: find paths between two nodes in a graph
          WITH RECURSIVE search_graph(source, target, distance, path) AS (
              SELECT source_id, target_id, 1, ARRAY[source_id, target_id]
              FROM edges
              WHERE source_id = 552506
            UNION ALL
              SELECT sg.source, e.target_id, sg.distance + 1, path || ARRAY[e.target_id]
              FROM search_graph sg
              JOIN edges e ON sg.target = e.source_id
              WHERE NOT e.target_id = ANY(path) AND distance < 4
          )
          SELECT * FROM search_graph WHERE target = 530556 LIMIT 100;
  • 23. Recursive queries
      - Graph with ~1M edges (61ms)
          source; target; distance; path
          530556; 552506; 2; {530556,185423,552506}
              JUDr. Robert Kaliňák -> FoodRest s.r.o. -> Ing. Ján Počiatek
          530556; 552506; 2; {530556,183291,552506}
              JUDr. Robert Kaliňák -> FoRest s.r.o. -> Ing. Ján Počiatek
          530556; 552506; 4; {530556,183291,552522,185423,552506}
              JUDr. Robert Kaliňák -> FoodRest s.r.o. -> Lena Sisková -> FoRest s.r.o. -> Ing. Ján Počiatek
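The recursive CTE evaluates level by level: the first SELECT seeds the working table with one-edge paths, and each pass of the second SELECT extends only the previous pass's rows by one edge, skipping nodes already on the path and stopping once distance reaches 4. A plain-Ruby sketch of that evaluation order, assuming a small made-up edge list in place of the edges view:

```ruby
# Evaluates paths the way the WITH RECURSIVE query does: seed with one-edge paths,
# then repeatedly extend only the rows produced by the previous iteration.
def search_graph(edges, start, max_distance)
  adjacency = Hash.new { |hash, node| hash[node] = [] }
  edges.each { |source, target| adjacency[source] << target }

  # Non-recursive term: SELECT ... FROM edges WHERE source_id = start
  working = adjacency[start].map do |target|
    { source: start, target: target, distance: 1, path: [start, target] }
  end
  results = working.dup

  # Recursive term: join the previous iteration with edges until it yields no rows.
  until working.empty?
    working = working.select { |row| row[:distance] < max_distance }.flat_map { |row|
      adjacency[row[:target]]
        .reject { |target| row[:path].include?(target) } # NOT e.target_id = ANY(path)
        .map { |target|
          { source: row[:source], target: target,
            distance: row[:distance] + 1, path: row[:path] + [target] }
        }
    }
    results.concat(working)
  end
  results
end

# Undirected toy graph, stored in both directions like the edges view.
pairs = [[1, 2], [2, 3], [1, 3], [3, 4]]
edges = pairs + pairs.map(&:reverse)
search_graph(edges, 1, 4).select { |row| row[:target] == 4 }.each { |row| p row[:path] }
```

This prints the two simple paths from node 1 to node 4, [1, 3, 4] and [1, 2, 3, 4]; the path check is what keeps the recursion from looping on cycles.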
  • 24. Window functions
      - "Aggregate functions without grouping": each row is kept, and the aggregate is computed over the row's partition
      - avg, count, sum, rank, row_number, ntile…
      - Problem: find the closest nodes to a given node
          - Order by the sum of path scores
          - Path score = 0.9^<distance> / log(1 + <number of paths with the same distance and target>)
          SELECT source, target
          FROM (
            SELECT source, target, path, distance,
                   0.9 ^ distance / log(1 + COUNT(*) OVER (PARTITION BY distance, target)) AS score
            FROM ( … ) AS paths
          ) AS scored_paths
          GROUP BY source, target
          ORDER BY SUM(score) DESC;
  • 29. Window functions
      - Example: closest to Róbert Kaliňák
          "Bussines Park Bratislava a.s."
          "JARABINY a.s."
          "Ing. Robert Pintér"
          "Ing. Ján Počiatek"
          "Bratislava trade center a.s."
          …
      - 1M edges, 41ms
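What the window function adds: COUNT(*) OVER (PARTITION BY distance, target) attaches the size of each (distance, target) group to every path row without collapsing the rows, so each path can be scored individually before the outer GROUP BY sums the scores. A plain-Ruby sketch of the same two steps, assuming paths arrive as hypothetical [source, target, distance] triples; `Math.log10` is used because PostgreSQL's single-argument log() is the base-10 logarithm.

```ruby
# Scores paths like the slide's query. Each path is a [source, target, distance] triple.
def closest_nodes(paths)
  # COUNT(*) OVER (PARTITION BY distance, target): size of each partition,
  # looked up per row instead of collapsing the rows.
  partition_sizes = paths.group_by { |_source, target, distance| [distance, target] }
                         .transform_values(&:size)

  # Per-path score: 0.9 ^ distance / log(1 + number of paths in the partition).
  scored = paths.map do |source, target, distance|
    [source, target, 0.9**distance / Math.log10(1 + partition_sizes[[distance, target]])]
  end

  # GROUP BY source, target ORDER BY SUM(score) DESC
  scored.group_by { |source, target, _score| [source, target] }
        .map { |(source, target), rows| [source, target, rows.sum { |row| row[2] }] }
        .sort_by { |_source, _target, total| -total }
end

# Made-up paths from node 1: one direct path to 2, three 2-hop paths to 3.
paths = [[1, 2, 1], [1, 3, 2], [1, 3, 2], [1, 3, 2]]
closest_nodes(paths).each { |source, target, total| puts format("%d -> %d: %.4f", source, target, total) }
```

Each of the three paths to node 3 scores lower than the single path to node 2, but their sum is larger, so node 3 ranks first.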
  • 30. Additional resources
      - www.postgresql.org: read the docs, seriously
      - www.explainextended.com: SQL guru blog
      - explain.depesz.com: first aid for slow queries
      - www.wikivs.com/wiki/MySQL_vs_PostgreSQL: MySQL vs. PostgreSQL comparison
  • 31. Real World Explain: www.postgresql.org