New Features
● Developer and SQL Features
● DBA and Administration
● Replication
● Performance
By Amit Kapila at India PostgreSQL UserGroup Meetup, Bangalore at InMobi.
http://technology.inmobi.com/events/india-postgresql-usergroup-meetup-bangalore
Development Status
● June 10, 2014 – branch 9.4
● June 2014 – CF1 – Completed
● August 2014 – CF2 – Completed
● October 2014 – CF3 – Completed
● December 2014 – CF4 – Completed
● February 2015 – CF5 – In Progress
Developer and SQL Features
Multi-column Subselect UPDATE
● Update more than one column with a subselect
● SQL-standard syntax
UPDATE tab SET (col1, col2) =
    (SELECT foo, bar FROM tab2)
WHERE ...
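As a concrete illustration of the syntax (the table and column names here are hypothetical):

```sql
-- Refresh each employee's salary and bonus from a staging
-- table in a single statement, using a correlated subselect.
UPDATE employees e
SET (salary, bonus) =
    (SELECT s.salary, s.bonus
     FROM staging s
     WHERE s.emp_id = e.emp_id)
WHERE EXISTS (SELECT 1 FROM staging s WHERE s.emp_id = e.emp_id);
```

Note the WHERE clause: without it, rows with no match in staging would have salary and bonus set to NULL.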
SKIP LOCKED
● Like SELECT ... NOWAIT
● Except that locked rows are skipped instead of raising an error
postgres=# SELECT * FROM a FOR UPDATE NOWAIT;
ERROR: could not obtain lock on row in relation "a"
postgres=# SELECT * FROM a FOR UPDATE SKIP LOCKED;
 a | b | c
----+----+----
 2 | 2 | 2
 3 | 3 | 3
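A common use of SKIP LOCKED is a job-queue pattern in which many workers each claim one pending row without blocking on rows held by other workers (the jobs table and its columns are hypothetical):

```sql
BEGIN;
-- Claim one pending job; rows locked by other workers are
-- skipped, so concurrent workers never block on each other.
SELECT id, payload
FROM jobs
WHERE status = 'pending'
ORDER BY id
LIMIT 1
FOR UPDATE SKIP LOCKED;
-- ... process the claimed job, mark it done ...
COMMIT;
```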
Row Level Security
● Allows controlling at row level which
rows can be retrieved by SELECT or
manipulated using INSERT | UPDATE |
DELETE
● Policies are defined for tables using
the policy commands (CREATE | ALTER |
DROP POLICY)
● Row Security needs to be enabled and
disabled by the owner on a per-table
basis using
ALTER TABLE .. ENABLE/DISABLE ROW
SECURITY.
Row Level Security
● ROW SECURITY is disabled on tables by
default and must be enabled for
policies on the table to be used.
● If no policies exist on a table with ROW
SECURITY enabled, a default-deny policy
is used and no records will be visible.
● A new role capability, BYPASSRLS, which
can only be set by the superuser, allows
such users to bypass row security via
row_security = OFF
Row Level Security
● row_security – a new parameter in
postgresql.conf that controls whether
row security policies are applied to
queries run against tables that have
row security enabled.
on – all users, except superusers and
the table owner, have the row policies
for the table applied to their queries.
force – policies are applied even for
superusers and the table owner.
off – row policies for the table are
bypassed if the user has the BYPASSRLS
attribute; an error is raised if not.
Row Level Security – How it works
● create table clients ( id serial primary key,
account_name text not null unique,
account_manager text not null
);
CREATE TABLE
create user peter;
CREATE ROLE
create user joanna;
CREATE ROLE
create user bill;
CREATE ROLE
Row Level Security – How it works
● Grant appropriate permissions
grant all on table clients to peter, joanna, bill;
GRANT
grant all on sequence clients_id_seq to peter, joanna, bill;
GRANT
● Populate the table
insert into clients (account_name, account_manager)
values ('initrode', 'peter'), ('initech', 'bill'), ('chotchkie''s',
'joanna');
INSERT 0 3
Row Level Security – How it works
● By default, all the rows are visible.
$ \c - peter
$ select * from clients;
id | account_name | account_manager
----+--------------+-----------------
1 | initrode | peter
2 | initech | bill
3 | chotchkie's | joanna
(3 rows)
Row Level Security – How it works
● Now let's create a policy and enable row level security
create policy just_own_clients on clients
for all
to public
using ( account_manager = current_user );
CREATE POLICY
alter table clients ENABLE ROW LEVEL SECURITY;
ALTER TABLE
Row Level Security – How it works
● Now, I can only see rows belonging to myself:
$ select * from clients;
id | account_name | account_manager
----+--------------+-----------------
1 | initrode | peter
(1 row)
$ \c - joanna
$ select * from clients;
id | account_name | account_manager
----+--------------+-----------------
3 | chotchkie's | joanna
(1 row)
DBA and Administration
min and max wal size
● checkpoint_segments removed!
● Instead, control min and max size
● min_wal_size (default 80MB)
● max_wal_size (default 1GB)
● Checkpoints auto-tuned to happen in between
● Moving average of previous checkpoints
● Space only consumed when actually needed
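A minimal postgresql.conf sketch (the values are illustrative, not recommendations):

```
# WAL disk usage is kept between these two bounds; checkpoint
# spacing is tuned automatically from a moving average of
# recent checkpoint activity.
min_wal_size = 512MB
max_wal_size = 2GB
```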
Foreign Table Inheritance
● Foreign tables can now be inheritance
children, or parents.
● PostgreSQL offers a way to do
partitioning by using
table inheritance and CHECK constraints
● This feature can be used for sharding
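A sketch of how this could be used for sharding with postgres_fdw; the server name, host, and table definitions below are hypothetical:

```sql
-- Parent table holds no data; each child lives on a remote shard.
CREATE TABLE measurements (ts timestamptz, val int);

CREATE EXTENSION postgres_fdw;
CREATE SERVER shard1 FOREIGN DATA WRAPPER postgres_fdw
    OPTIONS (host 'shard1.example.com', dbname 'metrics');
CREATE USER MAPPING FOR CURRENT_USER SERVER shard1;

-- A foreign table as an inheritance child of the local parent.
CREATE FOREIGN TABLE measurements_2015 ()
    INHERITS (measurements)
    SERVER shard1 OPTIONS (table_name 'measurements_2015');

-- The usual partitioning CHECK constraint, so constraint
-- exclusion can skip shards that cannot match a query.
ALTER TABLE measurements_2015
    ADD CHECK (ts >= '2015-01-01' AND ts < '2016-01-01');
```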
Commit Timestamp Tracking
● Optional tracking of commit timestamps
● track_commit_timestamp=on
● Default is off; changing the value
of this parameter requires a server
restart
● Users can retrieve timestamps for
transactions committed after the
option was enabled
● Can be used by multimaster systems for
conflict resolution
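With track_commit_timestamp = on, commit times can then be queried via pg_xact_commit_timestamp() (the accounts table here is hypothetical):

```sql
-- xmin is the id of the transaction that inserted each row.
SELECT id, pg_xact_commit_timestamp(xmin) AS committed_at
FROM accounts;

-- Or inspect the most recently committed transaction:
SELECT * FROM pg_last_committed_xact();
```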
Replication
pg_rewind
● a tool for synchronizing a PostgreSQL
cluster with another copy of the same
cluster, after the clusters' timelines
have diverged
● This is used to bring an old master
server back online after failover, as a
standby that follows the new master
● The advantage of pg_rewind over taking
a new base backup, or tools like rsync,
is that pg_rewind does not require
reading through all unchanged files in
the cluster
pg_rewind
● It is a lot faster when the database is
large and only a small portion of it
differs between the clusters
● The target server (old-master) must be
shut down cleanly before running
pg_rewind
● pg_rewind requires that the
wal_log_hints option is enabled in
postgresql.conf, or that data checksums
were enabled when the cluster was
initialized with initdb.
● full_page_writes must also be enabled.
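An invocation sketch, run on the old master after a clean shutdown; the data directory path and connection string are hypothetical:

```shell
# Rewind the old master's data directory to match the new
# master, then configure it as a standby and start it.
pg_rewind --target-pgdata=/var/lib/pgsql/data \
          --source-server='host=newmaster port=5432 dbname=postgres'
```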
Performance
BRIN
● Block Range Index
● Stores only bounds-per-block-range
● Default is 128 blocks
● Very small indexes
● All blocks in matching ranges are scanned
● Suited to scans of very large tables
BRIN
=# CREATE TABLE brin_example AS SELECT
generate_series(1,100000000) AS id;
SELECT 100000000
=# CREATE INDEX btree_index ON brin_example(id);
CREATE INDEX
Time: 239033.974 ms
=# CREATE INDEX brin_index ON brin_example USING
brin(id);
CREATE INDEX
Time: 42538.188 ms
Conclusion – BRIN index creation is much faster
BRIN – Index creation with different block ranges
=# CREATE INDEX brin_index_64 ON brin_example USING
brin(id) WITH (pages_per_range = 64);
CREATE INDEX
=# CREATE INDEX brin_index_256 ON brin_example USING
brin(id) WITH (pages_per_range = 256);
CREATE INDEX
=# CREATE INDEX brin_index_512 ON brin_example USING
brin(id) WITH (pages_per_range = 512);
CREATE INDEX
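The resulting index sizes can be compared with pg_relation_size(), which shows how small BRIN indexes are relative to the btree (index names follow the examples above):

```sql
SELECT relname, pg_size_pretty(pg_relation_size(oid)) AS size
FROM pg_class
WHERE relname IN ('btree_index', 'brin_index', 'brin_index_64',
                  'brin_index_256', 'brin_index_512')
ORDER BY pg_relation_size(oid) DESC;
```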
BRIN – How it works
● A new index access method intended to
accelerate scans of very large tables,
without the maintenance overhead of
btrees or other traditional indexes.
● They work by maintaining "summary" data
about block ranges.
BRIN – How it works
● For data types with natural 1-D sort
orders like integers, the summary info
consists of the maximum and the minimum
values of each indexed column within
each page range
● As new tuples are added into the index,
the summary information is updated if
the block range in which the tuple is
added is already summarized
● Otherwise, a subsequent VACUUM pass or
the brin_summarize_new_values()
function will create the summary
information.
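Unsummarized page ranges can also be summarized on demand (using the brin_index from the earlier example):

```sql
-- Returns the number of previously unsummarized page ranges
-- that were summarized by this call.
SELECT brin_summarize_new_values('brin_index');
```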
Read Scalability
● We see a boost in scalability for
read workloads when the data fits in
RAM. I ran a pgbench read-only
workload to compare performance
between 9.4 and HEAD (62f5e447) on an
IBM POWER-8 machine with 24 cores,
192 hardware threads, and 492GB RAM
● Data was taken for two kinds of
workloads: when all the data fits in
shared buffers (scale_factor = 300), and
when all the data can't fit in shared
buffers but can fit in RAM
(scale_factor = 1000)
Read Scalability – Data fits in shared_buffers
[Chart: TPS vs. client count (1–256) for 9.4 and HEAD;
pgbench -S -M prepared, PG9.5dev as of commit 62f5e4,
median of 3 5-minute runs, scale_factor = 300,
max_connections = 300, shared_buffers = 8GB]
Read Scalability
● In 9.4 throughput peaks at 32 clients;
now it peaks at 64 clients. We see up
to ~98% improvement, and performance
is better in all cases at client
counts of 32 and above
● The main work that led to this
improvement is commit ab5194e6
(Improve LWLock scalability)
Read Scalability
● The previous implementation had a
bottleneck around the spinlocks
acquired for LWLock acquisition and
release; the 9.5 implementation
changes LWLocks to manipulate their
state with atomic operations.
Read Scalability – Data fits in RAM
[Chart: TPS vs. client count (1–256) for 9.4 and HEAD;
pgbench -S -M prepared, PG9.5dev as of commit 62f5e4,
median of 3 5-minute runs, scale_factor = 1000,
max_connections = 300, shared_buffers = 8GB]
Read Scalability
● In this case we see a good performance
improvement (~25%) even at 32 clients,
rising to ~96% at higher client
counts. Here too, where 9.4 peaked at
32 clients, 9.5 peaks at 64 clients
and performance is better at all
higher client counts.
● The main work that led to this
improvement is commit ab5194e6
(Improve LWLock scalability)
Read Scalability
● In this case there were mainly two
bottlenecks:
● the BufFreeList LWLock had to be
acquired to find a free buffer for a
page
● to change a buffer's association in
the buffer mapping hash table, an
LWLock is acquired on the hash
partition to which the buffer belongs,
and there were just 16 such partitions
Read Scalability
● To reduce the first bottleneck, a
spinlock is now used, held just long
enough to pop the freelist or advance
the clock-sweep hand, and then
released
● To reduce the second bottleneck, the
number of buffer partitions was
increased to 128
● The crux of this improvement is that
both bottlenecks had to be resolved
together to see a major improvement in
scalability
Parallel Vacuumdb
● vacuumdb can use concurrent connections
● Add -j<n> to command line
● Speeds up important VACUUM or ANALYZE runs
● This option reduces processing time
but also increases the load on the
database server
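For example (the database name is hypothetical):

```shell
# Analyze-only run over the whole database using
# 4 concurrent connections.
vacuumdb --analyze-only -j 4 mydb
```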
Sorting Improvements
● Use abbreviated keys for faster sorting
of text
● Strings are transformed into binary
keys using strxfrm(), and the keys are
sorted instead
● A strcmp()-based comparator is used on
the keys, which only considers raw
byte ordering
● Keys are abbreviated by taking the
first few bytes of the strxfrm() blob.
Sorting Improvements
● If the abbreviated comparison is
insufficient to resolve the comparison,
we fall back on the normal comparator.
● This can be much faster than the old
way of doing sorting if the first few
bytes of the string are usually
sufficient to resolve the comparison.
Sorting Improvements
● As an example
create table stuff as select
random()::text as a, 'filler filler
filler'::text as b, g as c from
generate_series(1, 1000000) g;
SELECT 1000000
create index on stuff (a);
CREATE INDEX
● On a PPC64 machine, before this
feature the above operation took 6.3
seconds; with it, it took just 1.9
seconds – a 3x improvement. Hooray!
WAL Compression
● Optional compression of full-page
images in WAL
● wal_compression = on
● Default is off; can be set by a user
and doesn't require a restart
WAL Compression
● Smaller WAL
● Faster writes, faster replication
● Costs CPU
● Only compresses FPIs
● Still useful to gzip archives!
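It can be enabled without a restart; in 9.5 the parameter is superuser-settable, so for example:

```sql
-- Compress full-page images in WAL for this session:
SET wal_compression = on;

-- Or cluster-wide, still without a restart:
ALTER SYSTEM SET wal_compression = on;
SELECT pg_reload_conf();
```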
Index Scan Optimization
● Improved performance for index scans on ">" conditions
● Performance improvements of 5 to 30 percent can be seen
● Thanks to Magnus Hagander, who presented a talk on
PostgreSQL 9.5 features at PGConf US 2015. Some of the
slides in this presentation are from his talk. You can download
his slides from http://www.hagander.net/talks/
● Thanks to Hubert 'depesz' Lubaczewski and Michael Paquier
for writing blogs about new features in PostgreSQL 9.5. Some
of the examples used in this presentation are taken from their
blogs.