SlideShare une entreprise Scribd logo
1  sur  46
Télécharger pour lire hors ligne
JSONB in PostgreSQL
Working with JSON in PostgreSQL vs. MongoDB
Dharshan Rangegowda
Founder, ScaleGrid.io | @dharshanrg
What is JSON?
● JSON stands for Javascript object
notation.
● Open standard format RFC 7159.
● Most popular format to store and
exchange documents.
Working with JSON in PostgreSQL vs. MongoDB
Why does PostgreSQL need to care about JSON?
• Schema flexibility
• Dealing with transient or changing columns.
• Nested objects
• Might not need to deserialize to query.
• Handling objects from other systems
• E.g. Stripe transaction
Working with JSON in PostgreSQL vs. MongoDB
PostgreSQL + JSON Timeline
Working with JSON in PostgreSQL vs. MongoDB
PostgreSQL JSON Support
• Wave 1: PostgreSQL 9.2 (2012) added support for the JSON datatype
• Text field with JSON validation
• No index support
• Wave 2: PostgreSQL 9.4 (2014) added support for JSONB datatype
• Binary data structure to store JSON
• Index support
Working with JSON in PostgreSQL vs. MongoDB
PostgreSQL JSON Support
• Wave 3: PostgreSQL 12 (2019) added support for SQL/JSON standard
• JSONPath support
• Powerful query and projection engine for JSON data
• Further improvements to JSONPath in PostgreSQL 13
• JSON roadmap
Working with JSON in PostgreSQL vs. MongoDB
JSON vs. JSONB
• JSONB is what you should be using (in most cases)
• However, there are some scenarios where JSON is useful:
• JSON preserves the original formatting (a.k.a whitespace)
• JSON preserves ordering of the keys
• JSON preserves duplicate keys
• JSON is faster to ingest vs. JSONB
Working with JSON in PostgreSQL vs. MongoDB
JSONB Anti Patterns
● What is the best way to use JSONB?
○ Do we even need columns any more?
○ Why not just use <int id, jsonb data>?
● JSONB has some high-level limitations you need to
be aware of:
○ Statistics collection
○ Storage bloat
● Commonly occurring fields should be stored as
columns.
○ Use JSONB for variable or intermittent columns.
Working with JSON in PostgreSQL vs. MongoDB
JSONB Anti Patterns
● PostgreSQL collect stats on column data distribution
○ Most common values (MCV)
○ Fraction of null values
○ Histogram of distribution
● No column statistics collected for JSONB
○ Query planner doesn’t have stats to make smart decisions
○ Could make wrong choice – cross join vs hash join etc
● More details in blog post - When To Avoid JSONB
In A PostgreSQL Schema
Working with JSON in PostgreSQL vs. MongoDB
JSONB Anti Patterns
● Storage bloat
○ Keys are stored in the data (Similar to MongoDB mmapv1)
○ Use smaller key names to reduce footprint
○ Relies on TOAST compression
○ Sample table with 1M rows (11GB of data)
○ PostgreSQL - 8.7 GB
○ MongoDB Snappy – 8GB, Zlib – 5.3 GB
Working with JSON in PostgreSQL vs. MongoDB
JSONB & TOAST
● If the size of your column exceeds the
TOAST_TUPLE_THRESHOLD (2KB default) data
could be moved to out of line storage - TOAST
● TOAST also provides compression (pglz)
○ Decent Compression
○ MongoDB WiredTiger snappy/zlib is potentially better
● To access the data it needs to be De’TOASTed
○ Could result in performance overhead
Working with JSON in PostgreSQL vs. MongoDB
JSONB Data Structures
Working with JSON in PostgreSQL vs. MongoDB
Images courtesy: https://erthalion.info/2017/12/21/advanced-json-benchmarks/
BSON Data Structures
Working with JSON in PostgreSQL vs. MongoDB
JSONB Operators
Working with JSON in PostgreSQL vs. MongoDB
Operator Description
->, ->> Get JSON object field by key
@>, <@ Does the left JSONB value contain the right JSONB path/value entries at
the top level?
?, ?!, ?& Does the string exist as a top-level key within the JSON value?
@@, @@> JSONPath operators
Full list of operators can be found in the docs – JSONB op table
JSONB Functions
• PostgreSQL provides a wide variety of functions to create and process
JSON data
• Creation functions
• Processing functions
Working with JSON in PostgreSQL vs. MongoDB
MongoDB Query language
• Query language based on JSON syntax
• db.books.find( {} ) , db.books.find( { publisher: "D" } )
• Array operators
• db.books.find( { tags: ["red", "blank"] } )
• AND and OR operators
• db.books.find( { $or: [ { publisher: "A" }, { criticrating: { $lt: 30 } } ] } )
Working with JSON in PostgreSQL vs. MongoDB
MongoDB Query language
• Query nested documents
• db.books.find( { "size.uom": "in" } )
• Query an Array of objects
• db.books.find( { 'instock.qty': { $lte: 20 } } ))
• Project fields to return from query
• db.books.find( {prints: 1}, { $or: [ { publisher: "A" }, { criticrating: { $lt: 30 } } ] } )
Working with JSON in PostgreSQL vs. MongoDB
JSONB Indexes
• JSONB provides a wide array of options to index your JSON data.
• We are going to dig into three types of indexes:
• GIN
• BTREE
• HASH
Working with JSON in PostgreSQL vs. MongoDB
JSONB Indexes : GIN
• GIN stands for “Generalized
Inverted Indexes”
• GIN supports two operator classes
• jsonb_ops
• ?, ?|, ?&, @>, @@, @?
• [Index each key and value]
• jsonb_pathops
• @>, @@, @?
• [Index only the values]
Copyright © ScaleGrid.io
JSON sample data
Fictional book database (apologies to any librarians in the audience):
Working with JSON in PostgreSQL vs. MongoDB
demo=# d+ books
Table "public.books"
Column | Type | Collation | Nullable | Default | Storage | Stats target | Description
--------+------------------------+-----------+----------+---------+----------+--------------+-------------
id | integer | | not null | | plain | |
author | character varying(255) | | | | extended | |
isbn | character varying(25) | | | | extended | |
rating | integer | | | | plain | |
data | jsonb | | | | extended | |
Indexes:
"books_pkey" PRIMARY KEY, btree (id)
…..
JSON sample data
Working with JSON in PostgreSQL vs. MongoDB
demo=# select jsonb_pretty(data) from books where id = 1000021;
jsonb_pretty
------------------------------------
{ +
"tags": { +
"nk906585": { +
"ik844766": "iv364087"+
} +
}, +
"prints": [ +
{ +
"price": 100, +
"style": "hc" +
}, +
{ +
"price": 50, +
"style": "pb" +
} +
], +
"braille": false, +
"keywords": [ +
"abc", +
"kef", +
"keh" +
], +
"hardcover": true, +
"publisher": "nVxJVA8Bwx", +
"criticrating": 2 +
}
JSONB Indexes: GIN - ?
Find all books that are available in braille? Let’s create the GIN index on the ‘data’ JSONB column:
Working with JSON in PostgreSQL vs. MongoDB
CREATE INDEX datagin ON books USING gin (data);
demo=# select * from books where data ? 'braille';
id | author | isbn | rating | data
---------+-----------------+------------+--------+---------------------------------------------------------------------------------------------------------------------
---------------------------------
------------------
1000005 | XEI7xShT8bPu6H7 | 2kD5XJDZUF | 0 | {"tags": {"nk455671": {"ik937456": "iv506075"}}, "braille": true, "keywords": ["abc", "kef", "keh"], "hardcover": false,
"publisher": "zSfZIAjGGs", "
criticrating": 4}
.....
demo=# explain analyze select * from books where data ? 'braille';
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on books (cost=12.75..1005.25 rows=1000 width=158) (actual time=0.033..0.039 rows=15 loops=1)
Recheck Cond: (data ? 'braille'::text)
Heap Blocks: exact=2
-> Bitmap Index Scan on datagin (cost=0.00..12.50 rows=1000 width=0) (actual time=0.022..0.022 rows=15 loops=1)
Index Cond: (data ? 'braille'::text)
Planning Time: 0.102 ms
Execution Time: 0.067 ms
(7 rows)
JSONB Indexes: GIN - ?
What if we wanted to find books that were in braille or in hardcover?
Working with JSON in PostgreSQL vs. MongoDB
demo=# explain analyze select * from books where data ?| array['braille','hardcover'];
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on books (cost=16.75..1009.25 rows=1000 width=158) (actual time=0.029..0.035 rows=15 loops=1)
Recheck Cond: (data ?| '{braille,hardcover}'::text[])
Heap Blocks: exact=2
-> Bitmap Index Scan on datagin (cost=0.00..16.50 rows=1000 width=0) (actual time=0.023..0.023 rows=15 loops=1)
Index Cond: (data ?| '{braille,hardcover}'::text[])
Planning Time: 0.138 ms
Execution Time: 0.057 ms
(7 rows)
JSONB Indexes: GIN
GIN index supports the “existence” operators only on “top level” keys. If the key is not at the top level, then
the index will not be used.
Working with JSON in PostgreSQL vs. MongoDB
demo=# select * from books where data->'tags' ? 'nk455671';
id | author | isbn | rating | data
---------+-----------------+------------+--------+---------------------------------------------------------------------------------------------------------------------
---------------------------------
------------------
1000005 | XEI7xShT8bPu6H7 | 2kD5XJDZUF | 0 | {"tags": {"nk455671": {"ik937456": "iv506075"}}, "braille": true, "keywords": ["abc", "kef", "keh"], "hardcover": false,
"publisher": "zSfZIAjGGs", "
criticrating": 4}
685122 | GWfuvKfQ1PCe1IL | jnyhYYcF66 | 3 | {"tags": {"nk455671": {"ik615925": "iv253423"}}, "publisher": "b2NwVg7VY3", "criticrating": 0}
(2 rows)
demo=# explain analyze select * from books where data->'tags' ? 'nk455671';
QUERY PLAN
----------------------------------------------------------------------------------------------------------
Seq Scan on books (cost=0.00..38807.29 rows=1000 width=158) (actual time=0.018..270.641 rows=2 loops=1)
Filter: ((data -> 'tags'::text) ? 'nk455671'::text)
Rows Removed by Filter: 1000017
Planning Time: 0.078 ms
Execution Time: 270.728 ms
(5 rows)
JSONB Indexes: GIN
The way to check for existence in nested docs is to use “Expression indexes”. Let’s create an index on
data->tags:
Working with JSON in PostgreSQL vs. MongoDB
CREATE INDEX datatagsgin ON books USING gin (data->'tags');
demo=# select * from books where data->'tags' ? 'nk455671';
id | author | isbn | rating | data
---------+-----------------+------------+--------+-----------------------------------------------------------------------------------------------------------
1000005 | XEI7xShT8bPu6H7 | 2kD5XJDZUF | 0 | {"tags": {"nk455671": {"ik937456": "iv506075"}}, "braille": true, "keywords": ["abc", "kef", "keh"], "hardcover": false,
"publisher": "zSfZIAjGGs", "
criticrating": 4}
685122 | GWfuvKfQ1PCe1IL | jnyhYYcF66 | 3 | {"tags": {"nk455671": {"ik615925": "iv253423"}}, "publisher": "b2NwVg7VY3", "criticrating": 0}
(2 rows)
demo=# explain analyze select * from books where data->'tags' ? 'nk455671';
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on books (cost=12.75..1007.75 rows=1000 width=158) (actual time=0.031..0.035 rows=2 loops=1)
Recheck Cond: ((data ->'tags'::text) ? 'nk455671'::text)
Heap Blocks: exact=2
-> Bitmap Index Scan on datatagsgin (cost=0.00..12.50 rows=1000 width=0) (actual time=0.021..0.021 rows=2 loops=1)
Index Cond: ((data ->'tags'::text) ? 'nk455671'::text)
Planning Time: 0.098 ms
Execution Time: 0.061 ms
(7 rows)
JSONB Indexes: GIN - @>
The “path” operator can be used for multi-level queries of your JSON data. Let’s use it similar to the ?
operator.
Working with JSON in PostgreSQL vs. MongoDB
select * from books where data @> '{"braille":true}'::jsonb;
demo=# explain analyze select * from books where data @> '{"braille":true}'::jsonb;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on books (cost=16.75..1009.25 rows=1000 width=158) (actual time=0.040..0.048 rows=6 loops=1)
Recheck Cond: (data @> '{"braille": true}'::jsonb)
Rows Removed by Index Recheck: 9
Heap Blocks: exact=2
-> Bitmap Index Scan on datagin (cost=0.00..16.50 rows=1000 width=0) (actual time=0.030..0.030 rows=15 loops=1)
Index Cond: (data @> '{"braille": true}'::jsonb)
Planning Time: 0.100 ms
Execution Time: 0.076 ms
(8 rows)
JSONB Indexes: GIN - @>
The "path" operator can be used for multi level queries of your JSON data.
Working with JSON in PostgreSQL vs. MongoDB
demo=# select * from books where data @> '{"publisher":"XlekfkLOtL"}'::jsonb;
id | author | isbn | rating | data
-----+-----------------+------------+--------+-------------------------------------------------------------------------------------
346 | uD3QOvHfJdxq2ez | KiAaIRu8QE | 1 | {"tags": {"nk88": {"ik37": "iv161"}}, "publisher": "XlekfkLOtL", "criticrating": 3}
(1 row)
demo=# explain analyze select * from books where data @> '{"publisher":"XlekfkLOtL"}'::jsonb;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on books (cost=16.75..1009.25 rows=1000 width=158) (actual time=0.491..0.492 rows=1 loops=1)
Recheck Cond: (data @> '{"publisher": "XlekfkLOtL"}'::jsonb)
Heap Blocks: exact=1
-> Bitmap Index Scan on datagin (cost=0.00..16.50 rows=1000 width=0) (actual time=0.092..0.092 rows=1 loops=1)
Index Cond: (data @> '{"publisher": "XlekfkLOtL"}'::jsonb)
Planning Time: 0.090 ms
Execution Time: 0.523 ms
JSONB Indexes: GIN - @>
The JSON queries can be nested to many levels. You can also use the ># operation but GIN does not
support it.
Working with JSON in PostgreSQL vs. MongoDB
demo=# select * from books where data @> '{"tags":{"nk455671":{"ik937456":"iv506075"}}}'::jsonb;
id | author | isbn | rating | data
---------+-----------------+------------+--------+---------------------------------------------------------------------------------------------------------------------
---------------------------------
------------------
1000005 | XEI7xShT8bPu6H7 | 2kD5XJDZUF | 0 | {"tags": {"nk455671": {"ik937456": "iv506075"}}, "braille": true, "keywords": ["abc", "kef", "keh"], "hardcover": false,
"publisher": "zSfZIAjGGs", "
criticrating": 4}
(1 row)
JSONB Indexes: GIN - jsonb_pathops
GIN also supports a “pathops” option to reduce the size of the GIN index.
From the docs:
“The technical difference between a jsonb_ops and a jsonb_path_ops GIN index is that the former creates
independent index items for each key and value in the data, while the latter creates index items only for
each value in the data.”
On my small dataset of 1M books, you can see that the pathops GIN index is smaller – you should test
with your dataset to understand the savings.
Working with JSON in PostgreSQL vs. MongoDB
CREATE INDEX dataginpathops ON books USING gin (data jsonb_path_ops);
public | dataginpathops | index | sgpostgres | books | 67 MB |
public | datatagsgin | index | sgpostgres | books | 84 MB |
JSONB Indexes: GIN - jsonb_pathops
Let’s rerun our query from before with the pathops index:
Working with JSON in PostgreSQL vs. MongoDB
demo=# select * from books where data @> '{"tags":{"nk455671":{"ik937456":"iv506075"}}}'::jsonb;
id | author | isbn | rating | data
---------+-----------------+------------+--------+---------------------------------------------------------------------------------------------------------------------
---------------------------------
------------------
1000005 | XEI7xShT8bPu6H7 | 2kD5XJDZUF | 0 | {"tags": {"nk455671": {"ik937456": "iv506075"}}, "braille": true, "keywords": ["abc", "kef", "keh"], "hardcover": false,
"publisher": "zSfZIAjGGs", "
criticrating": 4}
(1 row)
demo=# explain select * from books where data @> '{"tags":{"nk455671":{"ik937456":"iv506075"}}}'::jsonb;
QUERY PLAN
-----------------------------------------------------------------------------------------
Bitmap Heap Scan on books (cost=12.75..1005.25 rows=1000 width=158)
Recheck Cond: (data @> '{"tags": {"nk455671": {"ik937456": "iv506075"}}}'::jsonb)
-> Bitmap Index Scan on dataginpathops (cost=0.00..12.50 rows=1000 width=0)
Index Cond: (data @> '{"tags": {"nk455671": {"ik937456": "iv506075"}}}'::jsonb)
(4 rows)
JSONB Indexes: GIN - jsonb_pathops
The “jsonb_pathops” option supports only the @> operator.
Smaller index but more limited scenarios.
The following queries below can no longer leverage the GIN index:
Working with JSON in PostgreSQL vs. MongoDB
select * from books where data ? 'tags'; => Sequential scan
select * from books where data @> '{"tags" :{}}'; => Sequential scan
select * from books where data @> '{"tags" :{"k7888":{}}}' => Sequential scan
JSONB Indexes: B-tree
• B-tree indexes are the most common index type in relational databases.
• If you index an entire JSONB column with a B-tree index, the only useful
operators are the comparison operators:
• =, <, <=, >, >=
• Can be used only for whole object comparisons.
• Very limited use case.
Working with JSON in PostgreSQL vs. MongoDB
JSONB Indexes: B-tree
• Use B-tree “Expression indexes”
• B-tree expression indexes can support the common comparison operators '=', '<', '>', '>=', '<=‘ (which
GIN doesn't support).
• Retrieve all books with a data->criticrating > 4.
Working with JSON in PostgreSQL vs. MongoDB
demo=# select * from books where data->'criticrating' > 4;
ERROR: operator does not exist: jsonb >= integer
LINE 1: select * from books where data->'criticrating’ > 4;
^
HINT: No operator matches the given name and argument types. You might need to add explicit type casts.
#Lets cast JSONB to integer
demo=# select * from books where (data->'criticrating')::int4 > 4;
#If you are using a version prior to pg11 you need to query as text and then cast
demo=# select * from books where (data->>'criticrating')::int4 > 4;
JSONB Indexes: B-tree
For expression indexes, the index needs to be an exact match with the query expression:
Working with JSON in PostgreSQL vs. MongoDB
demo=# CREATE INDEX criticrating ON books USING BTREE (((data->'criticrating')::int4));
CREATE INDEX
demo=# explain analyze select * from books where (data->'criticrating')::int4 = 3;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------
Index Scan using criticrating on books (cost=0.42..4626.93 rows=5000 width=158) (actual time=0.069..70.221 rows=199883 loops=1)
Index Cond: (((data -> 'criticrating'::text))::integer = 3)
Planning Time: 0.103 ms
Execution Time: 79.019 ms
(4 rows)
demo=# explain analyze select * from books where (data->'criticrating')::int4 = 3;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------
Index Scan using criticrating on books (cost=0.42..4626.93 rows=5000 width=158) (actual time=0.069..70.221 rows=199883 loops=1)
Index Cond: (((data -> 'criticrating'::text))::integer = 3)
Planning Time: 0.103 ms
Execution Time: 79.019 ms
(4 rows)
1
From above we can see that the BTREE index is being used as expected.
JSONB Indexes: HASH
• If you are only interested in the "=" operator, then Hash indexes become interesting.
• Hash indexes tend to be smaller than B-tree indexes.
Working with JSON in PostgreSQL vs. MongoDB
CREATE INDEX publisherhash ON books USING HASH ((data->'publisher'));
demo=# select * from books where data->'publisher' = 'XlekfkLOtL'
demo-# ;
id | author | isbn | rating | data
-----+-----------------+------------+--------+-------------------------------------------------------------------------------------
346 | uD3QOvHfJdxq2ez | KiAaIRu8QE | 1 | {"tags": {"nk88": {"ik37": "iv161"}}, "publisher": "XlekfkLOtL", "criticrating": 3}
(1 row)
demo=# explain analyze select * from books where data->'publisher' = 'XlekfkLOtL';
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------
Index Scan using publisherhash on books (cost=0.00..2.02 rows=1 width=158) (actual time=0.016..0.017 rows=1 loops=1)
Index Cond: ((data -> 'publisher'::text) = 'XlekfkLOtL'::text)
Planning Time: 0.080 ms
Execution Time: 0.035 ms
(4 rows)
JSONB Indexes: GIN - Trigram
• PostgreSQL supports string matching using Trigram indexes.
• Trigrams are basically words broken up into sequences of 3 letters.
• We can search for any arbitrary regex (not just left anchored).
Working with JSON in PostgreSQL vs. MongoDB
CREATE EXTENSION pg_trgm;
CREATE INDEX publisher ON books USING GIN ((data->'publisher') gin_trgm_ops);
demo=# select * from books where data->'publisher' LIKE '%I0UB%';
id | author | isbn | rating | data
----+-----------------+------------+--------+---------------------------------------------------------------------------------
4 | KiEk3xjqvTpmZeS | EYqXO9Nwmm | 0 | {"tags": {"nk3": {"ik1": "iv1"}}, "publisher": "MI0UBqZJDt", "criticrating": 1}
(1 row)
demo=# explain analyze select * from books where data->'publisher' LIKE '%I0UB%';
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on books (cost=9.78..111.28 rows=100 width=158) (actual time=0.033..0.033 rows=1 loops=1)
Recheck Cond: ((data -&gt; 'publisher'::text) ~~ '%I0UB%'::text)
Heap Blocks: exact=1
-> Bitmap Index Scan on publisher (cost=0.00..9.75 rows=100 width=0) (actual time=0.025..0.025 rows=1 loops=1)
Index Cond: ((data -&gt; 'publisher'::text) ~~ '%I0UB%'::text)
Planning Time: 0.213 ms
Execution Time: 0.058 ms
(7 rows)
JSONB Indexes: GIN - Arrays
• GIN indexes are great for indexing arrays.
• Indexing and searching the keyword array.
Working with JSON in PostgreSQL vs. MongoDB
CREATE INDEX keywords ON books USING GIN ((data->'keywords') jsonb_path_ops);
demo=# select * from books where data->'keywords' @> '["abc", "keh"]'::jsonb;
id | author | isbn | rating | data
---------+-----------------+------------+--------+---------------------------------------------------------------------------------------------------------------------
--------------
1000003 | zEG406sLKQ2IU8O | viPdlu3DZm | 4 | {"tags": {"nk263020": {"ik203820": "iv817928"}}, "keywords": ["abc", "kef", "keh"], "publisher": "7NClevxuTM",
"criticrating": 2}
1000004 | GCe9NypHYKDH4rD | so6TQDYzZ3 | 4 | {"tags": {"nk780341": {"ik397357": "iv632731"}}, "keywords": ["abc", "kef", "keh"], "publisher": "fqaJuAdjP5",
"criticrating": 2}
(2 rows)
demo=# explain analyze select * from books where data->'keywords' @> '["abc", "keh"]'::jsonb;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on books (cost=54.75..1049.75 rows=1000 width=158) (actual time=0.026..0.028 rows=2 loops=1)
Recheck Cond: ((data -> 'keywords'::text) @> '["abc", "keh"]'::jsonb)
Heap Blocks: exact=1
-> Bitmap Index Scan on keywords (cost=0.00..54.50 rows=1000 width=0) (actual time=0.014..0.014 rows=2 loops=1)
Index Cond: ((data -> 'keywords'::text) @&amp;amp;amp;amp;amp;gt; '["abc", "keh"]'::jsonb)
Planning Time: 0.131 ms
Execution Time: 0.063 ms
(7 rows)
SQL/JSON
• SQL standard added support for JSON – SQL Standard-2016 (SQL/JSON).
• SQL/JSON Data model
• JSONPath
• SQL/JSON functions
• With PG12 release, PostgreSQL has one of the best implementations of
SQL/JSON.
Working with JSON in PostgreSQL vs. MongoDB
SQL/JSON 2016
● A sequence of SQL/JSON items, each item can be (recursively) any of:
○ SQL/JSON scalar — non-null value of SQL types: Unicode character string, numeric, Boolean
or datetime.
○ SQL/JSON null, value that is distinct from any value of any SQL type (not the same as NULL).
○ SQL/JSON arrays, ordered list of zero or more SQL/JSON items — SQL/JSON element
○ SQL/JSON objects — unordered collections of zero or more SQL/JSON members.
■ (key, SQL/JSON item)
Working with JSON in PostgreSQL vs. MongoDB
JSONPath
Working with JSON in PostgreSQL vs. MongoDB
.key Returns an object member with the specified key
[*] Wildcard array element accessor that returns all array elements
.* Wildcard member accessor that returns the values of all members located at the top level of
the current object
.** Recursive wildcard member accessor that processes all levels of the JSON hierarchy of the
current object and returns all the member values, regardless of their nesting level
JSONPath allows you to specify an expression (using a syntax similar to the
property access notation in Javascript) to query or project your JSON data.
SQL/JSON Functions
● PG 12 provides several functions to use JSONPATH to query your JSON
data
○ jsonb_path_exists - Checks whether JSON path returns any item for the
specified JSON value
○ jsonb_path_match - Returns the result of JSON path predicate check for
the specified JSON value.
○ jsonb_path_query - Gets all JSON items returned by JSON path for the
specified JSON value.
Working with JSON in PostgreSQL vs. MongoDB
JSONPath
Finding books by publisher?
Working with JSON in PostgreSQL vs. MongoDB
demo=# select * from books where data @@ '$.publisher == "ktjKEZ1tvq"';
id | author | isbn | rating | data
---------+-----------------+------------+--------+---------------------------------------------------------------------------------------------------------------------
-------------
1000001 | 4RNsovI2haTgU7l | GwSoX67gLS | 2 | {"tags": {"nk542369": {"ik55240": "iv305393"}}, "keywords": ["abc", "def", "geh"], "publisher": "ktjKEZ1tvq",
"criticrating": 0}
(1 row)
demo=# explain analyze select * from books where data @@ '$.publisher == "ktjKEZ1tvq"';
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on books (cost=21.75..1014.25 rows=1000 width=158) (actual time=0.123..0.124 rows=1 loops=1)
Recheck Cond: (data @@ '($."publisher" == "ktjKEZ1tvq")'::jsonpath)
Heap Blocks: exact=1
-&amp;amp;amp;amp;gt; Bitmap Index Scan on datagin (cost=0.00..21.50 rows=1000 width=0) (actual time=0.110..0.110 rows=1 loops=1)
Index Cond: (data @@ '($."publisher" == "ktjKEZ1tvq")'::jsonpath)
Planning Time: 0.137 ms
Execution Time: 0.194 ms
(7 rows)
JSONPath
Add a JSONPath filter:
Working with JSON in PostgreSQL vs. MongoDB
select * from books where jsonb_path_exists(data,'$.publisher ?(@ == "ktjKEZ1tvq")');
Build complicated filter expressions:
select * from books where jsonb_path_exists(data, '$.prints[*] ?(@.style=="hc" && @.price == 100)');
Index support for JSONPath is very limited.
demo=# explain analyze select * from books where jsonb_path_exists(data,'$.publisher ?(@ == "ktjKEZ1tvq")');
QUERY PLAN
------------------------------------------------------------------------------------------------------------
Seq Scan on books (cost=0.00..36307.24 rows=333340 width=158) (actual time=0.019..480.268 rows=1 loops=1)
Filter: jsonb_path_exists(data, '$."publisher"?(@ == "ktjKEZ1tvq")'::jsonpath, '{}'::jsonb, false)
Rows Removed by Filter: 1000028
Planning Time: 0.095 ms
Execution Time: 480.348 ms
(5 rows)
JSONPath: Projection JSON
Select the last element of the array
Working with JSON in PostgreSQL vs. MongoDB
demo=# select jsonb_path_query(data, '$.prints[$.size()]') from books where id = 1000029;
jsonb_path_query
------------------------------
{"price": 50, "style": "pb"}
(1 row)
Select only the hardcover prints from the array
demo=# select jsonb_path_query(data, '$.prints[*] ?(@.style=="hc")') from books where id = 1000029;
jsonb_path_query
-------------------------------
{"price": 100, "style": "hc"}
(1 row)
We can also chain the filters
demo=# select jsonb_path_query(data, '$.prints[*] ?(@.style=="hc") ?(@.price ==100)') from books where id = 1000029;
jsonb_path_query
-------------------------------
{"price": 100, "style": "hc"}
(1 row)
Roadmap
● Improvements to the JSONPath implementation in PG13
● Future Roadmap
Working with JSON in PostgreSQL vs. MongoDB
Questions?
Working with JSON in PostgreSQL vs. MongoDB

Contenu connexe

Tendances

The Apache Spark File Format Ecosystem
The Apache Spark File Format EcosystemThe Apache Spark File Format Ecosystem
The Apache Spark File Format EcosystemDatabricks
 
PostgreSQL Deep Internal
PostgreSQL Deep InternalPostgreSQL Deep Internal
PostgreSQL Deep InternalEXEM
 
RocksDB detail
RocksDB detailRocksDB detail
RocksDB detailMIJIN AN
 
Data Engineer's Lunch #83: Strategies for Migration to Apache Iceberg
Data Engineer's Lunch #83: Strategies for Migration to Apache IcebergData Engineer's Lunch #83: Strategies for Migration to Apache Iceberg
Data Engineer's Lunch #83: Strategies for Migration to Apache IcebergAnant Corporation
 
Spark Summit EU talk by Ted Malaska
Spark Summit EU talk by Ted MalaskaSpark Summit EU talk by Ted Malaska
Spark Summit EU talk by Ted MalaskaSpark Summit
 
What’s the Best PostgreSQL High Availability Framework? PAF vs. repmgr vs. Pa...
What’s the Best PostgreSQL High Availability Framework? PAF vs. repmgr vs. Pa...What’s the Best PostgreSQL High Availability Framework? PAF vs. repmgr vs. Pa...
What’s the Best PostgreSQL High Availability Framework? PAF vs. repmgr vs. Pa...ScaleGrid.io
 
RocksDB compaction
RocksDB compactionRocksDB compaction
RocksDB compactionMIJIN AN
 
Building Reliable Lakehouses with Apache Flink and Delta Lake
Building Reliable Lakehouses with Apache Flink and Delta LakeBuilding Reliable Lakehouses with Apache Flink and Delta Lake
Building Reliable Lakehouses with Apache Flink and Delta LakeFlink Forward
 
PostgreSQL- An Introduction
PostgreSQL- An IntroductionPostgreSQL- An Introduction
PostgreSQL- An IntroductionSmita Prasad
 
PostgreSQL 15 and its Major Features -(Aakash M - Mydbops) - Mydbops Opensour...
PostgreSQL 15 and its Major Features -(Aakash M - Mydbops) - Mydbops Opensour...PostgreSQL 15 and its Major Features -(Aakash M - Mydbops) - Mydbops Opensour...
PostgreSQL 15 and its Major Features -(Aakash M - Mydbops) - Mydbops Opensour...Mydbops
 
Top 5 Mistakes to Avoid When Writing Apache Spark Applications
Top 5 Mistakes to Avoid When Writing Apache Spark ApplicationsTop 5 Mistakes to Avoid When Writing Apache Spark Applications
Top 5 Mistakes to Avoid When Writing Apache Spark ApplicationsCloudera, Inc.
 
Deep dive to PostgreSQL Indexes
Deep dive to PostgreSQL IndexesDeep dive to PostgreSQL Indexes
Deep dive to PostgreSQL IndexesIbrar Ahmed
 
Introduction VAUUM, Freezing, XID wraparound
Introduction VAUUM, Freezing, XID wraparoundIntroduction VAUUM, Freezing, XID wraparound
Introduction VAUUM, Freezing, XID wraparoundMasahiko Sawada
 
Linux tuning to improve PostgreSQL performance
Linux tuning to improve PostgreSQL performanceLinux tuning to improve PostgreSQL performance
Linux tuning to improve PostgreSQL performancePostgreSQL-Consulting
 
Presto Summit 2018 - 09 - Netflix Iceberg
Presto Summit 2018  - 09 - Netflix IcebergPresto Summit 2018  - 09 - Netflix Iceberg
Presto Summit 2018 - 09 - Netflix Icebergkbajda
 
Mastering PostgreSQL Administration
Mastering PostgreSQL AdministrationMastering PostgreSQL Administration
Mastering PostgreSQL AdministrationEDB
 
Spark Summit East 2015 Advanced Devops Student Slides
Spark Summit East 2015 Advanced Devops Student SlidesSpark Summit East 2015 Advanced Devops Student Slides
Spark Summit East 2015 Advanced Devops Student SlidesDatabricks
 
Deep dive into PostgreSQL statistics.
Deep dive into PostgreSQL statistics.Deep dive into PostgreSQL statistics.
Deep dive into PostgreSQL statistics.Alexey Lesovsky
 

Tendances (20)

The Apache Spark File Format Ecosystem
The Apache Spark File Format EcosystemThe Apache Spark File Format Ecosystem
The Apache Spark File Format Ecosystem
 
PostgreSQL
PostgreSQLPostgreSQL
PostgreSQL
 
PostgreSQL Deep Internal
PostgreSQL Deep InternalPostgreSQL Deep Internal
PostgreSQL Deep Internal
 
RocksDB detail
RocksDB detailRocksDB detail
RocksDB detail
 
Data Engineer's Lunch #83: Strategies for Migration to Apache Iceberg
Data Engineer's Lunch #83: Strategies for Migration to Apache IcebergData Engineer's Lunch #83: Strategies for Migration to Apache Iceberg
Data Engineer's Lunch #83: Strategies for Migration to Apache Iceberg
 
Get to know PostgreSQL!
Get to know PostgreSQL!Get to know PostgreSQL!
Get to know PostgreSQL!
 
Spark Summit EU talk by Ted Malaska
Spark Summit EU talk by Ted MalaskaSpark Summit EU talk by Ted Malaska
Spark Summit EU talk by Ted Malaska
 
What’s the Best PostgreSQL High Availability Framework? PAF vs. repmgr vs. Pa...
What’s the Best PostgreSQL High Availability Framework? PAF vs. repmgr vs. Pa...What’s the Best PostgreSQL High Availability Framework? PAF vs. repmgr vs. Pa...
What’s the Best PostgreSQL High Availability Framework? PAF vs. repmgr vs. Pa...
 
RocksDB compaction
RocksDB compactionRocksDB compaction
RocksDB compaction
 
Building Reliable Lakehouses with Apache Flink and Delta Lake
Building Reliable Lakehouses with Apache Flink and Delta LakeBuilding Reliable Lakehouses with Apache Flink and Delta Lake
Building Reliable Lakehouses with Apache Flink and Delta Lake
 
PostgreSQL- An Introduction
PostgreSQL- An IntroductionPostgreSQL- An Introduction
PostgreSQL- An Introduction
 
PostgreSQL 15 and its Major Features -(Aakash M - Mydbops) - Mydbops Opensour...
PostgreSQL 15 and its Major Features -(Aakash M - Mydbops) - Mydbops Opensour...PostgreSQL 15 and its Major Features -(Aakash M - Mydbops) - Mydbops Opensour...
PostgreSQL 15 and its Major Features -(Aakash M - Mydbops) - Mydbops Opensour...
 
Top 5 Mistakes to Avoid When Writing Apache Spark Applications
Top 5 Mistakes to Avoid When Writing Apache Spark ApplicationsTop 5 Mistakes to Avoid When Writing Apache Spark Applications
Top 5 Mistakes to Avoid When Writing Apache Spark Applications
 
Deep dive to PostgreSQL Indexes
Deep dive to PostgreSQL IndexesDeep dive to PostgreSQL Indexes
Deep dive to PostgreSQL Indexes
 
Introduction VAUUM, Freezing, XID wraparound
Introduction VAUUM, Freezing, XID wraparoundIntroduction VAUUM, Freezing, XID wraparound
Introduction VAUUM, Freezing, XID wraparound
 
Linux tuning to improve PostgreSQL performance
Linux tuning to improve PostgreSQL performanceLinux tuning to improve PostgreSQL performance
Linux tuning to improve PostgreSQL performance
 
Presto Summit 2018 - 09 - Netflix Iceberg
Presto Summit 2018  - 09 - Netflix IcebergPresto Summit 2018  - 09 - Netflix Iceberg
Presto Summit 2018 - 09 - Netflix Iceberg
 
Mastering PostgreSQL Administration
Mastering PostgreSQL AdministrationMastering PostgreSQL Administration
Mastering PostgreSQL Administration
 
Spark Summit East 2015 Advanced Devops Student Slides
Spark Summit East 2015 Advanced Devops Student SlidesSpark Summit East 2015 Advanced Devops Student Slides
Spark Summit East 2015 Advanced Devops Student Slides
 
Deep dive into PostgreSQL statistics.
Deep dive into PostgreSQL statistics.Deep dive into PostgreSQL statistics.
Deep dive into PostgreSQL statistics.
 

Similaire à Working with JSON Data in PostgreSQL vs. MongoDB

Webscale PostgreSQL - JSONB and Horizontal Scaling Strategies
Webscale PostgreSQL - JSONB and Horizontal Scaling StrategiesWebscale PostgreSQL - JSONB and Horizontal Scaling Strategies
Webscale PostgreSQL - JSONB and Horizontal Scaling StrategiesJonathan Katz
 
Postgres vs Mongo / Олег Бартунов (Postgres Professional)
Postgres vs Mongo / Олег Бартунов (Postgres Professional)Postgres vs Mongo / Олег Бартунов (Postgres Professional)
Postgres vs Mongo / Олег Бартунов (Postgres Professional)Ontico
 
PostgreSQL 9.4 JSON Types and Operators
PostgreSQL 9.4 JSON Types and OperatorsPostgreSQL 9.4 JSON Types and Operators
PostgreSQL 9.4 JSON Types and OperatorsNicholas Kiraly
 
MongoDB using Grails plugin by puneet behl
MongoDB using Grails plugin by puneet behlMongoDB using Grails plugin by puneet behl
MongoDB using Grails plugin by puneet behlTO THE NEW | Technology
 
10 Reasons to Start Your Analytics Project with PostgreSQL
10 Reasons to Start Your Analytics Project with PostgreSQL10 Reasons to Start Your Analytics Project with PostgreSQL
10 Reasons to Start Your Analytics Project with PostgreSQLSatoshi Nagayasu
 
UNIT-1 MongoDB.pptx
UNIT-1 MongoDB.pptxUNIT-1 MongoDB.pptx
UNIT-1 MongoDB.pptxDharaDarji5
 
Basics of MongoDB
Basics of MongoDB Basics of MongoDB
Basics of MongoDB Habilelabs
 
Jsquery - the jsonb query language with GIN indexing support
Jsquery - the jsonb query language with GIN indexing supportJsquery - the jsonb query language with GIN indexing support
Jsquery - the jsonb query language with GIN indexing supportAlexander Korotkov
 
2016 feb-23 pyugre-py_mongo
2016 feb-23 pyugre-py_mongo2016 feb-23 pyugre-py_mongo
2016 feb-23 pyugre-py_mongoMichael Bright
 
Using MongoDB and Python
Using MongoDB and PythonUsing MongoDB and Python
Using MongoDB and PythonMike Bright
 
Power JSON with PostgreSQL
Power JSON with PostgreSQLPower JSON with PostgreSQL
Power JSON with PostgreSQLEDB
 
Json in Postgres - the Roadmap
 Json in Postgres - the Roadmap Json in Postgres - the Roadmap
Json in Postgres - the RoadmapEDB
 
Migrating to postgresql
Migrating to postgresqlMigrating to postgresql
Migrating to postgresqlbotsplash.com
 
Back to Basics Webinar 4: Advanced Indexing, Text and Geospatial Indexes
Back to Basics Webinar 4: Advanced Indexing, Text and Geospatial IndexesBack to Basics Webinar 4: Advanced Indexing, Text and Geospatial Indexes
Back to Basics Webinar 4: Advanced Indexing, Text and Geospatial IndexesMongoDB
 

Similaire à Working with JSON Data in PostgreSQL vs. MongoDB (20)

Webscale PostgreSQL - JSONB and Horizontal Scaling Strategies
Webscale PostgreSQL - JSONB and Horizontal Scaling StrategiesWebscale PostgreSQL - JSONB and Horizontal Scaling Strategies
Webscale PostgreSQL - JSONB and Horizontal Scaling Strategies
 
Postgres vs Mongo / Олег Бартунов (Postgres Professional)
Postgres vs Mongo / Олег Бартунов (Postgres Professional)Postgres vs Mongo / Олег Бартунов (Postgres Professional)
Postgres vs Mongo / Олег Бартунов (Postgres Professional)
 
PostgreSQL 9.4 JSON Types and Operators
PostgreSQL 9.4 JSON Types and OperatorsPostgreSQL 9.4 JSON Types and Operators
PostgreSQL 9.4 JSON Types and Operators
 
MongoDB using Grails plugin by puneet behl
MongoDB using Grails plugin by puneet behlMongoDB using Grails plugin by puneet behl
MongoDB using Grails plugin by puneet behl
 
Einführung in MongoDB
Einführung in MongoDBEinführung in MongoDB
Einführung in MongoDB
 
10 Reasons to Start Your Analytics Project with PostgreSQL
10 Reasons to Start Your Analytics Project with PostgreSQL10 Reasons to Start Your Analytics Project with PostgreSQL
10 Reasons to Start Your Analytics Project with PostgreSQL
 
UNIT-1 MongoDB.pptx
UNIT-1 MongoDB.pptxUNIT-1 MongoDB.pptx
UNIT-1 MongoDB.pptx
 
Basics of MongoDB
Basics of MongoDB Basics of MongoDB
Basics of MongoDB
 
Jsquery - the jsonb query language with GIN indexing support
Jsquery - the jsonb query language with GIN indexing supportJsquery - the jsonb query language with GIN indexing support
Jsquery - the jsonb query language with GIN indexing support
 
2016 feb-23 pyugre-py_mongo
2016 feb-23 pyugre-py_mongo2016 feb-23 pyugre-py_mongo
2016 feb-23 pyugre-py_mongo
 
Using MongoDB and Python
Using MongoDB and PythonUsing MongoDB and Python
Using MongoDB and Python
 
Introduction to MongoDB
Introduction to MongoDBIntroduction to MongoDB
Introduction to MongoDB
 
Mondodb
MondodbMondodb
Mondodb
 
Power JSON with PostgreSQL
Power JSON with PostgreSQLPower JSON with PostgreSQL
Power JSON with PostgreSQL
 
Mongo db
Mongo dbMongo db
Mongo db
 
Json in Postgres - the Roadmap
 Json in Postgres - the Roadmap Json in Postgres - the Roadmap
Json in Postgres - the Roadmap
 
MongoDB_ppt.pptx
MongoDB_ppt.pptxMongoDB_ppt.pptx
MongoDB_ppt.pptx
 
Migrating to postgresql
Migrating to postgresqlMigrating to postgresql
Migrating to postgresql
 
Back to Basics Webinar 4: Advanced Indexing, Text and Geospatial Indexes
Back to Basics Webinar 4: Advanced Indexing, Text and Geospatial IndexesBack to Basics Webinar 4: Advanced Indexing, Text and Geospatial Indexes
Back to Basics Webinar 4: Advanced Indexing, Text and Geospatial Indexes
 
Mongo DB
Mongo DB Mongo DB
Mongo DB
 

Plus de ScaleGrid.io

Redis vs. MongoDB: Comparing In-Memory Databases with Percona Memory Engine
Redis vs. MongoDB: Comparing In-Memory Databases with Percona Memory EngineRedis vs. MongoDB: Comparing In-Memory Databases with Percona Memory Engine
Redis vs. MongoDB: Comparing In-Memory Databases with Percona Memory EngineScaleGrid.io
 
Introduction to Redis Data Structures: Sets
Introduction to Redis Data Structures: Sets Introduction to Redis Data Structures: Sets
Introduction to Redis Data Structures: Sets ScaleGrid.io
 
Introduction to Redis Data Structures: Sorted Sets
Introduction to Redis Data Structures: Sorted SetsIntroduction to Redis Data Structures: Sorted Sets
Introduction to Redis Data Structures: Sorted SetsScaleGrid.io
 
Cassandra vs. MongoDB
Cassandra vs. MongoDBCassandra vs. MongoDB
Cassandra vs. MongoDBScaleGrid.io
 
Introduction to Redis Data Structures: Hashes
Introduction to Redis Data Structures: HashesIntroduction to Redis Data Structures: Hashes
Introduction to Redis Data Structures: HashesScaleGrid.io
 
Introduction to Redis Data Structures
Introduction to Redis Data Structures Introduction to Redis Data Structures
Introduction to Redis Data Structures ScaleGrid.io
 

Plus de ScaleGrid.io (6)

Redis vs. MongoDB: Comparing In-Memory Databases with Percona Memory Engine
Redis vs. MongoDB: Comparing In-Memory Databases with Percona Memory EngineRedis vs. MongoDB: Comparing In-Memory Databases with Percona Memory Engine
Redis vs. MongoDB: Comparing In-Memory Databases with Percona Memory Engine
 
Introduction to Redis Data Structures: Sets
Introduction to Redis Data Structures: Sets Introduction to Redis Data Structures: Sets
Introduction to Redis Data Structures: Sets
 
Introduction to Redis Data Structures: Sorted Sets
Introduction to Redis Data Structures: Sorted SetsIntroduction to Redis Data Structures: Sorted Sets
Introduction to Redis Data Structures: Sorted Sets
 
Cassandra vs. MongoDB
Cassandra vs. MongoDBCassandra vs. MongoDB
Cassandra vs. MongoDB
 
Introduction to Redis Data Structures: Hashes
Introduction to Redis Data Structures: HashesIntroduction to Redis Data Structures: Hashes
Introduction to Redis Data Structures: Hashes
 
Introduction to Redis Data Structures
Introduction to Redis Data Structures Introduction to Redis Data Structures
Introduction to Redis Data Structures
 

Dernier

08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Enterprise Knowledge
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
Developing An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilDeveloping An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilV3cube
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 

Dernier (20)

08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Developing An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilDeveloping An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of Brazil
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 

Working with JSON Data in PostgreSQL vs. MongoDB

  • 1. JSONB in PostgreSQL Working with JSON in PostgreSQL vs. MongoDB Dharshan Rangegowda Founder, ScaleGrid.io | @dharshanrg
  • 2. What is JSON? ● JSON stands for Javascript object notation. ● Open standard format RFC 7159. ● Most popular format to store and exchange documents. Working with JSON in PostgreSQL vs. MongoDB
  • 3. Why does PostgreSQL need to care about JSON? • Schema flexibility • Dealing with transient or changing columns. • Nested objects • Might not need to deserialize to query. • Handling objects from other systems • E.g. Stripe transaction Working with JSON in PostgreSQL vs. MongoDB
  • 4. PostgreSQL + JSON Timeline Working with JSON in PostgreSQL vs. MongoDB
  • 5. PostgreSQL JSON Support • Wave 1: PostgreSQL 9.2 (2012) added support for the JSON datatype • Text field with JSON validation • No index support • Wave 2: PostgreSQL 9.4 (2014) added support for JSONB datatype • Binary data structure to store JSON • Index support Working with JSON in PostgreSQL vs. MongoDB
  • 6. PostgreSQL JSON Support • Wave 3: PostgreSQL 12 (2019) added support for SQL/JSON standard • JSONPath support • Powerful query and projection engine for JSON data • Further improvements to JSONPath in PostgreSQL 13 • JSON roadmap Working with JSON in PostgreSQL vs. MongoDB
  • 7. JSON vs. JSONB • JSONB is what you should be using (in most cases) • However, there are some scenarios where JSON is useful: • JSON preserves the original formatting (a.k.a whitespace) • JSON preserves ordering of the keys • JSON preserves duplicate keys • JSON is faster to ingest vs. JSONB Working with JSON in PostgreSQL vs. MongoDB
  • 8. JSONB Anti Patterns ● What is the best way to use JSONB? ○ Do we even need columns any more? ○ Why not just use <int id, jsonb data>? ● JSONB has some high-level limitations you need to be aware of: ○ Statistics collection ○ Storage bloat ● Commonly occurring fields should be stored as columns. ○ Use JSONB for variable or intermittent columns. Working with JSON in PostgreSQL vs. MongoDB
  • 9. JSONB Anti Patterns ● PostgreSQL collect stats on column data distribution ○ Most common values (MCV) ○ Fraction of null values ○ Histogram of distribution ● No column statistics collected for JSONB ○ Query planner doesn’t have stats to make smart decisions ○ Could make wrong choice – cross join vs hash join etc ● More details in blog post - When To Avoid JSONB In A PostgreSQL Schema Working with JSON in PostgreSQL vs. MongoDB
  • 10. JSONB Anti Patterns ● Storage bloat ○ Keys are stored in the data (Similar to MongoDB mmapv1) ○ Use smaller key names to reduce footprint ○ Relies on TOAST compression ○ Sample table with 1M rows (11GB of data) ○ PostgreSQL - 8.7 GB ○ MongoDB Snappy – 8GB, Zlib – 5.3 GB Working with JSON in PostgreSQL vs. MongoDB
  • 11. JSONB & TOAST ● If the size of your column exceeds the TOAST_TUPLE_THRESHOLD (2KB default) data could be moved to out of line storage - TOAST ● TOAST also provides compression (pglz) ○ Decent Compression ○ MongoDB WiredTiger snappy/zlib is potentially better ● To access the data it needs to be De’TOASTed ○ Could result in performance overhead Working with JSON in PostgreSQL vs. MongoDB
  • 12. JSONB Data Structures Working with JSON in PostgreSQL vs. MongoDB Images courtesy: https://erthalion.info/2017/12/21/advanced-json-benchmarks/
  • 13. BSON Data Structures Working with JSON in PostgreSQL vs. MongoDB
  • 14. JSONB Operators Working with JSON in PostgreSQL vs. MongoDB Operator Description ->, ->> Get JSON object field by key @>, <@ Does the left JSONB value contain the right JSONB path/value entries at the top level? ?, ?!, ?& Does the string exist as a top-level key within the JSON value? @@, @@> JSONPath operators Full list of operators can be found in the docs – JSONB op table
  • 15. JSONB Functions • PostgreSQL provides a wide variety of functions to create and process JSON data • Creation functions • Processing functions Working with JSON in PostgreSQL vs. MongoDB
  • 16. MongoDB Query language • Query language based on JSON syntax • db.books.find( {} ) , db.books.find( { publisher: "D" } ) • Array operators • db.books.find( { tags: ["red", "blank"] } ) • AND and OR operators • db.books.find( { $or: [ { publisher: "A" }, { criticrating: { $lt: 30 } } ] } ) Working with JSON in PostgreSQL vs. MongoDB
  • 17. MongoDB Query language • Query nested documents • db.books.find( { "size.uom": "in" } ) • Query an Array of objects • db.books.find( { 'instock.qty': { $lte: 20 } } )) • Project fields to return from query • db.books.find( {prints: 1}, { $or: [ { publisher: "A" }, { criticrating: { $lt: 30 } } ] } ) Working with JSON in PostgreSQL vs. MongoDB
  • 18. JSONB Indexes • JSONB provides a wide array of options to index your JSON data. • We are going to dig into three types of indexes: • GIN • BTREE • HASH Working with JSON in PostgreSQL vs. MongoDB
  • 19. JSONB Indexes : GIN • GIN stands for “Generalized Inverted Indexes” • GIN supports two operator classes • jsonb_ops • ?, ?|, ?&, @>, @@, @? • [Index each key and value] • jsonb_pathops • @>, @@, @? • [Index only the values] Copyright © ScaleGrid.io
  • 20. JSON sample data Fictional book database (apologies to any librarians in the audience): Working with JSON in PostgreSQL vs. MongoDB demo=# d+ books Table "public.books" Column | Type | Collation | Nullable | Default | Storage | Stats target | Description --------+------------------------+-----------+----------+---------+----------+--------------+------------- id | integer | | not null | | plain | | author | character varying(255) | | | | extended | | isbn | character varying(25) | | | | extended | | rating | integer | | | | plain | | data | jsonb | | | | extended | | Indexes: "books_pkey" PRIMARY KEY, btree (id) …..
  • 21. JSON sample data Working with JSON in PostgreSQL vs. MongoDB demo=# select jsonb_pretty(data) from books where id = 1000021; jsonb_pretty ------------------------------------ { + "tags": { + "nk906585": { + "ik844766": "iv364087"+ } + }, + "prints": [ + { + "price": 100, + "style": "hc" + }, + { + "price": 50, + "style": "pb" + } + ], + "braille": false, + "keywords": [ + "abc", + "kef", + "keh" + ], + "hardcover": true, + "publisher": "nVxJVA8Bwx", + "criticrating": 2 + }
  • 22. JSONB Indexes: GIN - ? Find all books that are available in braille? Let’s create the GIN index on the ‘data’ JSONB column: Working with JSON in PostgreSQL vs. MongoDB CREATE INDEX datagin ON books USING gin (data); demo=# select * from books where data ? 'braille'; id | author | isbn | rating | data ---------+-----------------+------------+--------+--------------------------------------------------------------------------------------------------------------------- --------------------------------- ------------------ 1000005 | XEI7xShT8bPu6H7 | 2kD5XJDZUF | 0 | {"tags": {"nk455671": {"ik937456": "iv506075"}}, "braille": true, "keywords": ["abc", "kef", "keh"], "hardcover": false, "publisher": "zSfZIAjGGs", " criticrating": 4} ..... demo=# explain analyze select * from books where data ? 'braille'; QUERY PLAN --------------------------------------------------------------------------------------------------------------------- Bitmap Heap Scan on books (cost=12.75..1005.25 rows=1000 width=158) (actual time=0.033..0.039 rows=15 loops=1) Recheck Cond: (data ? 'braille'::text) Heap Blocks: exact=2 -> Bitmap Index Scan on datagin (cost=0.00..12.50 rows=1000 width=0) (actual time=0.022..0.022 rows=15 loops=1) Index Cond: (data ? 'braille'::text) Planning Time: 0.102 ms Execution Time: 0.067 ms (7 rows)
  • 23. JSONB Indexes: GIN - ? What if we wanted to find books that were in braille or in hardcover? Working with JSON in PostgreSQL vs. MongoDB demo=# explain analyze select * from books where data ?| array['braille','hardcover']; QUERY PLAN --------------------------------------------------------------------------------------------------------------------- Bitmap Heap Scan on books (cost=16.75..1009.25 rows=1000 width=158) (actual time=0.029..0.035 rows=15 loops=1) Recheck Cond: (data ?| '{braille,hardcover}'::text[]) Heap Blocks: exact=2 -> Bitmap Index Scan on datagin (cost=0.00..16.50 rows=1000 width=0) (actual time=0.023..0.023 rows=15 loops=1) Index Cond: (data ?| '{braille,hardcover}'::text[]) Planning Time: 0.138 ms Execution Time: 0.057 ms (7 rows)
  • 24. JSONB Indexes: GIN GIN index supports the “existence” operators only on “top level” keys. If the key is not at the top level, then the index will not be used. Working with JSON in PostgreSQL vs. MongoDB demo=# select * from books where data->'tags' ? 'nk455671'; id | author | isbn | rating | data ---------+-----------------+------------+--------+--------------------------------------------------------------------------------------------------------------------- --------------------------------- ------------------ 1000005 | XEI7xShT8bPu6H7 | 2kD5XJDZUF | 0 | {"tags": {"nk455671": {"ik937456": "iv506075"}}, "braille": true, "keywords": ["abc", "kef", "keh"], "hardcover": false, "publisher": "zSfZIAjGGs", " criticrating": 4} 685122 | GWfuvKfQ1PCe1IL | jnyhYYcF66 | 3 | {"tags": {"nk455671": {"ik615925": "iv253423"}}, "publisher": "b2NwVg7VY3", "criticrating": 0} (2 rows) demo=# explain analyze select * from books where data->'tags' ? 'nk455671'; QUERY PLAN ---------------------------------------------------------------------------------------------------------- Seq Scan on books (cost=0.00..38807.29 rows=1000 width=158) (actual time=0.018..270.641 rows=2 loops=1) Filter: ((data -> 'tags'::text) ? 'nk455671'::text) Rows Removed by Filter: 1000017 Planning Time: 0.078 ms Execution Time: 270.728 ms (5 rows)
  • 25. JSONB Indexes: GIN The way to check for existence in nested docs is to use “Expression indexes”. Let’s create an index on data->tags: Working with JSON in PostgreSQL vs. MongoDB CREATE INDEX datatagsgin ON books USING gin (data->'tags'); demo=# select * from books where data->'tags' ? 'nk455671'; id | author | isbn | rating | data ---------+-----------------+------------+--------+----------------------------------------------------------------------------------------------------------- 1000005 | XEI7xShT8bPu6H7 | 2kD5XJDZUF | 0 | {"tags": {"nk455671": {"ik937456": "iv506075"}}, "braille": true, "keywords": ["abc", "kef", "keh"], "hardcover": false, "publisher": "zSfZIAjGGs", " criticrating": 4} 685122 | GWfuvKfQ1PCe1IL | jnyhYYcF66 | 3 | {"tags": {"nk455671": {"ik615925": "iv253423"}}, "publisher": "b2NwVg7VY3", "criticrating": 0} (2 rows) demo=# explain analyze select * from books where data->'tags' ? 'nk455671'; QUERY PLAN ------------------------------------------------------------------------------------------------------------------------ Bitmap Heap Scan on books (cost=12.75..1007.75 rows=1000 width=158) (actual time=0.031..0.035 rows=2 loops=1) Recheck Cond: ((data ->'tags'::text) ? 'nk455671'::text) Heap Blocks: exact=2 -> Bitmap Index Scan on datatagsgin (cost=0.00..12.50 rows=1000 width=0) (actual time=0.021..0.021 rows=2 loops=1) Index Cond: ((data ->'tags'::text) ? 'nk455671'::text) Planning Time: 0.098 ms Execution Time: 0.061 ms (7 rows)
  • 26. JSONB Indexes: GIN - @> The “path” operator can be used for multi-level queries of your JSON data. Let’s use it similar to the ? operator. Working with JSON in PostgreSQL vs. MongoDB select * from books where data @> '{"braille":true}'::jsonb; demo=# explain analyze select * from books where data @> '{"braille":true}'::jsonb; QUERY PLAN --------------------------------------------------------------------------------------------------------------------- Bitmap Heap Scan on books (cost=16.75..1009.25 rows=1000 width=158) (actual time=0.040..0.048 rows=6 loops=1) Recheck Cond: (data @> '{"braille": true}'::jsonb) Rows Removed by Index Recheck: 9 Heap Blocks: exact=2 -> Bitmap Index Scan on datagin (cost=0.00..16.50 rows=1000 width=0) (actual time=0.030..0.030 rows=15 loops=1) Index Cond: (data @> '{"braille": true}'::jsonb) Planning Time: 0.100 ms Execution Time: 0.076 ms (8 rows)
  • 27. JSONB Indexes: GIN - @> The "path" operator can be used for multi level queries of your JSON data. Working with JSON in PostgreSQL vs. MongoDB demo=# select * from books where data @> '{"publisher":"XlekfkLOtL"}'::jsonb; id | author | isbn | rating | data -----+-----------------+------------+--------+------------------------------------------------------------------------------------- 346 | uD3QOvHfJdxq2ez | KiAaIRu8QE | 1 | {"tags": {"nk88": {"ik37": "iv161"}}, "publisher": "XlekfkLOtL", "criticrating": 3} (1 row) demo=# explain analyze select * from books where data @> '{"publisher":"XlekfkLOtL"}'::jsonb; QUERY PLAN -------------------------------------------------------------------------------------------------------------------- Bitmap Heap Scan on books (cost=16.75..1009.25 rows=1000 width=158) (actual time=0.491..0.492 rows=1 loops=1) Recheck Cond: (data @> '{"publisher": "XlekfkLOtL"}'::jsonb) Heap Blocks: exact=1 -> Bitmap Index Scan on datagin (cost=0.00..16.50 rows=1000 width=0) (actual time=0.092..0.092 rows=1 loops=1) Index Cond: (data @> '{"publisher": "XlekfkLOtL"}'::jsonb) Planning Time: 0.090 ms Execution Time: 0.523 ms
  • 28. JSONB Indexes: GIN - @> The JSON queries can be nested to many levels. You can also use the ># operation but GIN does not support it. Working with JSON in PostgreSQL vs. MongoDB demo=# select * from books where data @> '{"tags":{"nk455671":{"ik937456":"iv506075"}}}'::jsonb; id | author | isbn | rating | data ---------+-----------------+------------+--------+--------------------------------------------------------------------------------------------------------------------- --------------------------------- ------------------ 1000005 | XEI7xShT8bPu6H7 | 2kD5XJDZUF | 0 | {"tags": {"nk455671": {"ik937456": "iv506075"}}, "braille": true, "keywords": ["abc", "kef", "keh"], "hardcover": false, "publisher": "zSfZIAjGGs", " criticrating": 4} (1 row)
  • 29. JSONB Indexes: GIN - jsonb_pathops GIN also supports a “pathops” option to reduce the size of the GIN index. From the docs: “The technical difference between a jsonb_ops and a jsonb_path_ops GIN index is that the former creates independent index items for each key and value in the data, while the latter creates index items only for each value in the data.” On my small dataset of 1M books, you can see that the pathops GIN index is smaller – you should test with your dataset to understand the savings. Working with JSON in PostgreSQL vs. MongoDB CREATE INDEX dataginpathops ON books USING gin (data jsonb_path_ops); public | dataginpathops | index | sgpostgres | books | 67 MB | public | datatagsgin | index | sgpostgres | books | 84 MB |
  • 30. JSONB Indexes: GIN - jsonb_pathops Let’s rerun our query from before with the pathops index: Working with JSON in PostgreSQL vs. MongoDB demo=# select * from books where data @> '{"tags":{"nk455671":{"ik937456":"iv506075"}}}'::jsonb; id | author | isbn | rating | data ---------+-----------------+------------+--------+--------------------------------------------------------------------------------------------------------------------- --------------------------------- ------------------ 1000005 | XEI7xShT8bPu6H7 | 2kD5XJDZUF | 0 | {"tags": {"nk455671": {"ik937456": "iv506075"}}, "braille": true, "keywords": ["abc", "kef", "keh"], "hardcover": false, "publisher": "zSfZIAjGGs", " criticrating": 4} (1 row) demo=# explain select * from books where data @> '{"tags":{"nk455671":{"ik937456":"iv506075"}}}'::jsonb; QUERY PLAN ----------------------------------------------------------------------------------------- Bitmap Heap Scan on books (cost=12.75..1005.25 rows=1000 width=158) Recheck Cond: (data @> '{"tags": {"nk455671": {"ik937456": "iv506075"}}}'::jsonb) -> Bitmap Index Scan on dataginpathops (cost=0.00..12.50 rows=1000 width=0) Index Cond: (data @> '{"tags": {"nk455671": {"ik937456": "iv506075"}}}'::jsonb) (4 rows)
  • 31. JSONB Indexes: GIN - jsonb_pathops The “jsonb_pathops” option supports only the @> operator. Smaller index but more limited scenarios. The following queries below can no longer leverage the GIN index: Working with JSON in PostgreSQL vs. MongoDB select * from books where data ? 'tags'; => Sequential scan select * from books where data @> '{"tags" :{}}'; => Sequential scan select * from books where data @> '{"tags" :{"k7888":{}}}' => Sequential scan
  • 32. JSONB Indexes: B-tree • B-tree indexes are the most common index type in relational databases. • If you index an entire JSONB column with a B-tree index, the only useful operators are the comparison operators: • =, <, <=, >, >= • Can be used only for whole object comparisons. • Very limited use case. Working with JSON in PostgreSQL vs. MongoDB
  • 33. JSONB Indexes: B-tree • Use B-tree “Expression indexes” • B-tree expression indexes can support the common comparison operators '=', '<', '>', '>=', '<=‘ (which GIN doesn't support). • Retrieve all books with a data->criticrating > 4. Working with JSON in PostgreSQL vs. MongoDB demo=# select * from books where data->'criticrating' > 4; ERROR: operator does not exist: jsonb >= integer LINE 1: select * from books where data->'criticrating’ > 4; ^ HINT: No operator matches the given name and argument types. You might need to add explicit type casts. #Lets cast JSONB to integer demo=# select * from books where (data->'criticrating')::int4 > 4; #If you are using a version prior to pg11 you need to query as text and then cast demo=# select * from books where (data->>'criticrating')::int4 > 4;
  • 34. JSONB Indexes: B-tree For expression indexes, the index needs to be an exact match with the query expression: Working with JSON in PostgreSQL vs. MongoDB demo=# CREATE INDEX criticrating ON books USING BTREE (((data->'criticrating')::int4)); CREATE INDEX demo=# explain analyze select * from books where (data->'criticrating')::int4 = 3; QUERY PLAN ---------------------------------------------------------------------------------------------------------------------------------- Index Scan using criticrating on books (cost=0.42..4626.93 rows=5000 width=158) (actual time=0.069..70.221 rows=199883 loops=1) Index Cond: (((data -> 'criticrating'::text))::integer = 3) Planning Time: 0.103 ms Execution Time: 79.019 ms (4 rows) demo=# explain analyze select * from books where (data->'criticrating')::int4 = 3; QUERY PLAN ---------------------------------------------------------------------------------------------------------------------------------- Index Scan using criticrating on books (cost=0.42..4626.93 rows=5000 width=158) (actual time=0.069..70.221 rows=199883 loops=1) Index Cond: (((data -> 'criticrating'::text))::integer = 3) Planning Time: 0.103 ms Execution Time: 79.019 ms (4 rows) 1 From above we can see that the BTREE index is being used as expected.
  • 35. JSONB Indexes: HASH • If you are only interested in the "=" operator, then Hash indexes become interesting. • Hash indexes tend to be smaller than B-tree indexes. Working with JSON in PostgreSQL vs. MongoDB CREATE INDEX publisherhash ON books USING HASH ((data->'publisher')); demo=# select * from books where data->'publisher' = 'XlekfkLOtL' demo-# ; id | author | isbn | rating | data -----+-----------------+------------+--------+------------------------------------------------------------------------------------- 346 | uD3QOvHfJdxq2ez | KiAaIRu8QE | 1 | {"tags": {"nk88": {"ik37": "iv161"}}, "publisher": "XlekfkLOtL", "criticrating": 3} (1 row) demo=# explain analyze select * from books where data->'publisher' = 'XlekfkLOtL'; QUERY PLAN ----------------------------------------------------------------------------------------------------------------------- Index Scan using publisherhash on books (cost=0.00..2.02 rows=1 width=158) (actual time=0.016..0.017 rows=1 loops=1) Index Cond: ((data -> 'publisher'::text) = 'XlekfkLOtL'::text) Planning Time: 0.080 ms Execution Time: 0.035 ms (4 rows)
  • 36. JSONB Indexes: GIN - Trigram • PostgreSQL supports string matching using Trigram indexes. • Trigrams are basically words broken up into sequences of 3 letters. • We can search for any arbitrary regex (not just left anchored). Working with JSON in PostgreSQL vs. MongoDB CREATE EXTENSION pg_trgm; CREATE INDEX publisher ON books USING GIN ((data->'publisher') gin_trgm_ops); demo=# select * from books where data->'publisher' LIKE '%I0UB%'; id | author | isbn | rating | data ----+-----------------+------------+--------+--------------------------------------------------------------------------------- 4 | KiEk3xjqvTpmZeS | EYqXO9Nwmm | 0 | {"tags": {"nk3": {"ik1": "iv1"}}, "publisher": "MI0UBqZJDt", "criticrating": 1} (1 row) demo=# explain analyze select * from books where data->'publisher' LIKE '%I0UB%'; QUERY PLAN -------------------------------------------------------------------------------------------------------------------- Bitmap Heap Scan on books (cost=9.78..111.28 rows=100 width=158) (actual time=0.033..0.033 rows=1 loops=1) Recheck Cond: ((data -&gt; 'publisher'::text) ~~ '%I0UB%'::text) Heap Blocks: exact=1 -> Bitmap Index Scan on publisher (cost=0.00..9.75 rows=100 width=0) (actual time=0.025..0.025 rows=1 loops=1) Index Cond: ((data -&gt; 'publisher'::text) ~~ '%I0UB%'::text) Planning Time: 0.213 ms Execution Time: 0.058 ms (7 rows)
  • 37. JSONB Indexes: GIN - Arrays • GIN indexes are great for indexing arrays. • Indexing and searching the keyword array. Working with JSON in PostgreSQL vs. MongoDB CREATE INDEX keywords ON books USING GIN ((data->'keywords') jsonb_path_ops); demo=# select * from books where data->'keywords' @> '["abc", "keh"]'::jsonb; id | author | isbn | rating | data ---------+-----------------+------------+--------+--------------------------------------------------------------------------------------------------------------------- -------------- 1000003 | zEG406sLKQ2IU8O | viPdlu3DZm | 4 | {"tags": {"nk263020": {"ik203820": "iv817928"}}, "keywords": ["abc", "kef", "keh"], "publisher": "7NClevxuTM", "criticrating": 2} 1000004 | GCe9NypHYKDH4rD | so6TQDYzZ3 | 4 | {"tags": {"nk780341": {"ik397357": "iv632731"}}, "keywords": ["abc", "kef", "keh"], "publisher": "fqaJuAdjP5", "criticrating": 2} (2 rows) demo=# explain analyze select * from books where data->'keywords' @> '["abc", "keh"]'::jsonb; QUERY PLAN --------------------------------------------------------------------------------------------------------------------- Bitmap Heap Scan on books (cost=54.75..1049.75 rows=1000 width=158) (actual time=0.026..0.028 rows=2 loops=1) Recheck Cond: ((data -> 'keywords'::text) @> '["abc", "keh"]'::jsonb) Heap Blocks: exact=1 -> Bitmap Index Scan on keywords (cost=0.00..54.50 rows=1000 width=0) (actual time=0.014..0.014 rows=2 loops=1) Index Cond: ((data -> 'keywords'::text) @&amp;amp;amp;amp;amp;gt; '["abc", "keh"]'::jsonb) Planning Time: 0.131 ms Execution Time: 0.063 ms (7 rows)
  • 38. SQL/JSON • SQL standard added support for JSON – SQL Standard-2016 (SQL/JSON). • SQL/JSON Data model • JSONPath • SQL/JSON functions • With PG12 release, PostgreSQL has one of the best implementations of SQL/JSON. Working with JSON in PostgreSQL vs. MongoDB
  • 39. SQL/JSON 2016 ● A sequence of SQL/JSON items, each item can be (recursively) any of: ○ SQL/JSON scalar — non-null value of SQL types: Unicode character string, numeric, Boolean or datetime. ○ SQL/JSON null, value that is distinct from any value of any SQL type (not the same as NULL). ○ SQL/JSON arrays, ordered list of zero or more SQL/JSON items — SQL/JSON element ○ SQL/JSON objects — unordered collections of zero or more SQL/JSON members. ■ (key, SQL/JSON item) Working with JSON in PostgreSQL vs. MongoDB
  • 40. JSONPath Working with JSON in PostgreSQL vs. MongoDB .key Returns an object member with the specified key [*] Wildcard array element accessor that returns all array elements .* Wildcard member accessor that returns the values of all members located at the top level of the current object .** Recursive wildcard member accessor that processes all levels of the JSON hierarchy of the current object and returns all the member values, regardless of their nesting level JSONPath allows you to specify an expression (using a syntax similar to the property access notation in Javascript) to query or project your JSON data.
  • 41. SQL/JSON Functions ● PG 12 provides several functions to use JSONPATH to query your JSON data ○ jsonb_path_exists - Checks whether JSON path returns any item for the specified JSON value ○ jsonb_path_match - Returns the result of JSON path predicate check for the specified JSON value. ○ jsonb_path_query - Gets all JSON items returned by JSON path for the specified JSON value. Working with JSON in PostgreSQL vs. MongoDB
  • 42. JSONPath Finding books by publisher? Working with JSON in PostgreSQL vs. MongoDB demo=# select * from books where data @@ '$.publisher == "ktjKEZ1tvq"'; id | author | isbn | rating | data ---------+-----------------+------------+--------+--------------------------------------------------------------------------------------------------------------------- ------------- 1000001 | 4RNsovI2haTgU7l | GwSoX67gLS | 2 | {"tags": {"nk542369": {"ik55240": "iv305393"}}, "keywords": ["abc", "def", "geh"], "publisher": "ktjKEZ1tvq", "criticrating": 0} (1 row) demo=# explain analyze select * from books where data @@ '$.publisher == "ktjKEZ1tvq"'; QUERY PLAN -------------------------------------------------------------------------------------------------------------------- Bitmap Heap Scan on books (cost=21.75..1014.25 rows=1000 width=158) (actual time=0.123..0.124 rows=1 loops=1) Recheck Cond: (data @@ '($."publisher" == "ktjKEZ1tvq")'::jsonpath) Heap Blocks: exact=1 -&amp;amp;amp;amp;gt; Bitmap Index Scan on datagin (cost=0.00..21.50 rows=1000 width=0) (actual time=0.110..0.110 rows=1 loops=1) Index Cond: (data @@ '($."publisher" == "ktjKEZ1tvq")'::jsonpath) Planning Time: 0.137 ms Execution Time: 0.194 ms (7 rows)
  • 43. JSONPath Add a JSONPath filter: Working with JSON in PostgreSQL vs. MongoDB select * from books where jsonb_path_exists(data,'$.publisher ?(@ == "ktjKEZ1tvq")'); Build complicated filter expressions: select * from books where jsonb_path_exists(data, '$.prints[*] ?(@.style=="hc" && @.price == 100)'); Index support for JSONPath is very limited. demo=# explain analyze select * from books where jsonb_path_exists(data,'$.publisher ?(@ == "ktjKEZ1tvq")'); QUERY PLAN ------------------------------------------------------------------------------------------------------------ Seq Scan on books (cost=0.00..36307.24 rows=333340 width=158) (actual time=0.019..480.268 rows=1 loops=1) Filter: jsonb_path_exists(data, '$."publisher"?(@ == "ktjKEZ1tvq")'::jsonpath, '{}'::jsonb, false) Rows Removed by Filter: 1000028 Planning Time: 0.095 ms Execution Time: 480.348 ms (5 rows)
  • 44. JSONPath: Projection JSON Select the last element of the array Working with JSON in PostgreSQL vs. MongoDB demo=# select jsonb_path_query(data, '$.prints[$.size()]') from books where id = 1000029; jsonb_path_query ------------------------------ {"price": 50, "style": "pb"} (1 row) Select only the hardcover prints from the array demo=# select jsonb_path_query(data, '$.prints[*] ?(@.style=="hc")') from books where id = 1000029; jsonb_path_query ------------------------------- {"price": 100, "style": "hc"} (1 row) We can also chain the filters demo=# select jsonb_path_query(data, '$.prints[*] ?(@.style=="hc") ?(@.price ==100)') from books where id = 1000029; jsonb_path_query ------------------------------- {"price": 100, "style": "hc"} (1 row)
  • 45. Roadmap ● Improvements to the JSONPath implementation in PG13 ● Future Roadmap Working with JSON in PostgreSQL vs. MongoDB
  • 46. Questions? Working with JSON in PostgreSQL vs. MongoDB