3. Why PostgreSQL?
● Open Source / Cross platform
● Reliability and Stability
● Extensible
● Designed for high volume environments
● One of the few relational databases with inherited tables
● …..
4. Here's a classic scenario.
You work on a project that stores data in a
relational database.
The application is deployed to production,
and early on performance is great:
selecting data from the database is snappy and
insert latency goes unnoticed.
What's the problem?
Over a period of days, weeks, or months the
database grows and queries slow
down.
5. Potential solutions
- A Database Administrator (DBA) will
take a look and check that the database is
tuned.
- They suggest adding certain
indexes,
- moving logging to separate disk partitions,
- adjusting database engine parameters, and
verifying that the database is healthy.
This will buy you more time and may resolve
the issues to a degree.
At a certain point you realize that the
volume of data in the database is the
bottleneck.
There are various approaches that can help you
make your application and database run faster.
Let’s take a look at two of them:
- Table partitioning
- Sharding
7. Table Partitioning
The main idea:
You take one MASTER TABLE and split it
into many smaller tables.
These smaller tables are called partitions or
child tables.
8. Table Partitioning
Master Table:
Also referred to as a Master Partition Table, this table is the template from which child tables are created. It is a normal
table, but it doesn't contain any data and requires a trigger.
Child Table:
These tables inherit their structure from the master table, and each belongs to a single master table. The child tables
contain all of the data. These tables are also referred to as Table Partitions.
Partition Function:
A partition function is a stored procedure that determines which child table should accept a new record. The
master table has a trigger that calls the partition function.
9. Implementation
Here's a summary of what should be done:
- Create a master table
- Create a partition function
- Create a table trigger
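The three steps can be sketched as follows for the inheritance-based partitioning described here. This is a minimal illustration, not the original project's schema: the hashvalue_PT table name and its hashtime column come from the performance test later in the deck, while the hashcode column, the monthly child tables, and the function and trigger names are assumptions made for the example.

```sql
-- Step 1: master table. It holds no rows itself.
-- The column list is an assumption; only hashtime appears in the slides.
CREATE TABLE hashvalue_PT (
    hashtime date NOT NULL,
    hashcode text              -- hypothetical payload column
);

-- Child tables (hypothetical names), one per month, each with a
-- non-overlapping CHECK constraint on the partitioning column.
CREATE TABLE hashvalue_PT_2008_07 (
    CHECK (hashtime >= DATE '2008-07-01' AND hashtime < DATE '2008-08-01')
) INHERITS (hashvalue_PT);

CREATE TABLE hashvalue_PT_2008_08 (
    CHECK (hashtime >= DATE '2008-08-01' AND hashtime < DATE '2008-09-01')
) INHERITS (hashvalue_PT);

-- Indexes are not inherited, so each child gets its own.
CREATE INDEX ON hashvalue_PT_2008_07 (hashtime);
CREATE INDEX ON hashvalue_PT_2008_08 (hashtime);

-- Step 2: partition function. It routes each new row to the matching child.
CREATE OR REPLACE FUNCTION hashvalue_PT_insert_trigger()
RETURNS trigger AS $$
BEGIN
    IF NEW.hashtime >= DATE '2008-07-01' AND NEW.hashtime < DATE '2008-08-01' THEN
        INSERT INTO hashvalue_PT_2008_07 VALUES (NEW.*);
    ELSIF NEW.hashtime >= DATE '2008-08-01' AND NEW.hashtime < DATE '2008-09-01' THEN
        INSERT INTO hashvalue_PT_2008_08 VALUES (NEW.*);
    ELSE
        RAISE EXCEPTION 'hashtime % is out of range; add a partition first', NEW.hashtime;
    END IF;
    RETURN NULL;  -- the row has been redirected, so nothing is stored in the master
END;
$$ LANGUAGE plpgsql;

-- Step 3: table trigger that calls the partition function on every insert.
CREATE TRIGGER insert_hashvalue_PT_trigger
    BEFORE INSERT ON hashvalue_PT
    FOR EACH ROW EXECUTE PROCEDURE hashvalue_PT_insert_trigger();
```

With this in place, the application keeps inserting into hashvalue_PT and the trigger decides where each row lands. The trade-off of a hard-coded IF chain is that the function has to be updated whenever a new month is added.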
Constraint exclusion is a query optimization technique that improves performance for partitioned
tables:
SET constraint_exclusion = partition;
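One way to see whether constraint exclusion is taking effect is to look at the query plan. The sketch below assumes that hashvalue_PT has child tables whose CHECK constraints are defined on hashtime; without such constraints the planner has nothing to exclude.

```sql
-- 'partition' applies constraint exclusion to inheritance child tables
-- (and UNION ALL subqueries) only.
SET constraint_exclusion = partition;

-- Show the plan without running the query. Child tables whose CHECK
-- constraint cannot match the WHERE clause should not appear in it.
EXPLAIN
SELECT * FROM hashvalue_PT
WHERE hashtime = DATE '2008-08-01';
```

If every child table still shows up in the plan, the usual suspects are missing CHECK constraints or a WHERE clause that does not compare the partitioning column against a constant.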
11. Performance Testing On Specified Date
-- partitioned table
SELECT * FROM hashvalue_PT
WHERE hashtime = DATE '2008-08-01';
-- non-partitioned table
SELECT * FROM hashvalue
WHERE hashtime = DATE '2008-08-01';
When both tables contain 200 million
rows, a search on a specified date is
about 144.45% faster on the partitioned
table than on the non-partitioned table.
Search on specified date
"2008-08-01"
Records Retrieved = 741825
Partitioned Table = 359.61 seconds
Non-Partitioned Table = 879.062
seconds
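The 144.45% figure follows from the two timings above if "faster" is read as the time saved relative to the partitioned run. That reading is an assumption, but the arithmetic matches the numbers on the slide:

```sql
-- Time saved relative to the partitioned-table run, as a percentage.
SELECT round((879.062 - 359.61) / 359.61 * 100, 2) AS percent_faster;  -- 144.45

-- The same comparison expressed as a plain speed-up ratio.
SELECT round(879.062 / 359.61, 2) AS speedup_ratio;                    -- 2.44
```

In other words, the non-partitioned query took roughly 2.4 times as long as the partitioned one.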
14. Sharding
Sharding is like partitioning. The
difference is that with traditional
partitioning, partitions are stored in
the same database, while with sharding
the shards (partitions) are stored on
different servers.
PostgreSQL does not provide a built-in tool for sharding. We will use Citus, which extends PostgreSQL
with sharding and replication capabilities.
15. Sharding Installation
DB Server1: 192.168.56.10 (Master)
DB Server2: 192.168.56.11 (Worker)
- pkg install pg_citus
- root@DB:~ # grep shared_preload_libraries /var/db/postgres/data96/postgresql.conf
shared_preload_libraries = 'citus' # (change requires restart)
- root@DB:~ # grep listen_addresses /var/db/postgres/data96/postgresql.conf
listen_addresses = '*' # what IP address(es) to listen on;
- echo "host all all 192.168.56.0/24 trust" >> /var/db/postgres/data96/pg_hba.conf
- service postgresql restart
- ONLY ON MASTER: root@DB:/var/db/postgres/data96 # cat pg_worker_list.conf
192.168.56.11 5432
- service postgresql reload
- postgres=# create extension citus;
CREATE EXTENSION
16. Sharding Installation
Verify that the master is ready:
postgres=# SELECT * FROM master_get_active_worker_nodes();
node_name | node_port
---------------+-----------
192.168.56.11 | 5432
(1 row)
17. Sharding Installation
Everything is fine so far, so we can create the table to be sharded
on the master.
CREATE TABLE sales
(deptno int not null,
deptname varchar(20),
total_amount int,
CONSTRAINT pk_sales PRIMARY KEY (deptno)) ;
We need to inform Citus that the data of table sales will be distributed
among the MASTER and WORKER:
SELECT master_create_distributed_table('sales', 'deptno', 'hash');
18. Sharding Installation
In our example we are going to create one shard on each worker. We will
specify:
the table name: sales
total shard count: 2
replication factor: 1 (no replication)
SELECT master_create_worker_shards('sales', 2, 1);
Sharding is done.
19. Sharding result
insert into sales (deptno,deptname,total_amount) values (1,'french_dept',10000);
insert into sales (deptno,deptname,total_amount) values (2,'german_dept',15000);
insert into sales (deptno,deptname,total_amount) values (3,'china_dept',21000);
insert into sales (deptno,deptname,total_amount) values (4,'gambia_dept',8750);
insert into sales (deptno,deptname,total_amount) values (5,'japan_dept',12010);
insert into sales (deptno,deptname,total_amount) values (6,'china_dept',35000);
insert into sales (deptno,deptname,total_amount) values (7,'nigeria_dept',10000);
insert into sales (deptno,deptname,total_amount) values (8,'senegal_dept',33000);
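After the inserts, a few queries on the master can confirm that the table is distributed and still behaves like one logical table. This is a sketch that assumes the Citus metadata table pg_dist_shard is available in the installed version; column names may differ between Citus releases.

```sql
-- List the shards Citus created for the sales table, with the range of
-- hash values that each shard covers.
SELECT logicalrelid, shardid, shardminvalue, shardmaxvalue
FROM pg_dist_shard
WHERE logicalrelid = 'sales'::regclass;

-- Queries go through the master as if sales were an ordinary table.
-- This should return 8 if all eight inserts above succeeded.
SELECT count(*) FROM sales;

-- A lookup on the distribution column (deptno) can be routed to one shard.
SELECT * FROM sales WHERE deptno = 3;
```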
21. Conclusion
Note that not all SQL commands are able to work on inheritance hierarchies. Commands that
are used for data querying, data modification, or schema modification (e.g., SELECT, UPDATE,
DELETE, most variants of ALTER TABLE, but not INSERT or ALTER TABLE ... RENAME) typically
default to including child tables and support the ONLY notation to exclude them. Commands
that do database maintenance and tuning (e.g., REINDEX, VACUUM) typically only work on
individual, physical tables and do not support recursing over inheritance hierarchies. The
respective behavior of each individual command is documented in its reference page (Reference
I, SQL Commands).
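The difference between the default behaviour and the ONLY notation can be illustrated on the partitioned table from the earlier slides. The child table name below is hypothetical and stands for whatever partitions actually exist.

```sql
-- Default: the query recurses into all child tables.
SELECT count(*) FROM hashvalue_PT;

-- ONLY: restricts the query to the master table itself, which is
-- normally empty in this partitioning scheme.
SELECT count(*) FROM ONLY hashvalue_PT;

-- Maintenance commands are issued per physical table
-- (hypothetical child table name).
VACUUM ANALYZE hashvalue_PT_2008_08;
```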
A serious limitation of the inheritance feature is that indexes (including unique constraints) and
foreign key constraints only apply to single tables, not to their inheritance children. This is true
on both the referencing and referenced sides of a foreign key constraint.
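A small experiment makes the index limitation concrete. The demo_parent and demo_child tables are hypothetical and exist only for this illustration.

```sql
-- The primary key is enforced on the parent table only.
CREATE TABLE demo_parent (id int PRIMARY KEY, note text);
CREATE TABLE demo_child () INHERITS (demo_parent);

INSERT INTO demo_parent VALUES (1, 'row in parent');
-- Accepted: the child did not inherit the primary key, so the
-- duplicate id is not detected.
INSERT INTO demo_child VALUES (1, 'row in child');

-- Querying through the parent now returns two rows with id = 1.
SELECT id, note FROM demo_parent WHERE id = 1;
```

Uniqueness across the whole hierarchy therefore has to be guaranteed by the partitioning design itself, for example by making sure that a key value can only ever belong to one child table.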
22. Conclusion
Partitioning refers to splitting what is logically one large table into smaller physical pieces. Partitioning can provide several benefits:
- Query performance can be improved dramatically in certain situations, particularly when most of the heavily accessed rows of the table are in a single
partition or a small number of partitions. The partitioning substitutes for leading columns of indexes, reducing index size and making it more likely that the
heavily-used parts of the indexes fit in memory.
- When queries or updates access a large percentage of a single partition, performance can be improved by taking advantage of sequential scan of that
partition instead of using an index and random access reads scattered across the whole table.
- Bulk loads and deletes can be accomplished by adding or removing partitions, if that requirement is planned into the partitioning design. ALTER TABLE NO
INHERIT and DROP TABLE are both far faster than a bulk operation. These commands also entirely avoid the VACUUM overhead caused by a bulk DELETE.
- Seldom-used data can be migrated to cheaper and slower storage media.
The benefits will normally be worthwhile only when a table would otherwise be very large. The exact point at which a table will benefit from partitioning
depends on the application, although a rule of thumb is that the size of the table should exceed the physical memory of the database server.
Currently, PostgreSQL supports partitioning via table inheritance. Each partition must be created as a child table of a single parent table. The parent table
itself is normally empty; it exists just to represent the entire data set. You should be familiar with inheritance (see Section 5.9) before attempting to set up
partitioning.
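The bulk-delete benefit mentioned above can be sketched with the two commands the text names. The child table name is hypothetical and stands for a partition holding data that is no longer needed.

```sql
-- Detach an old partition from the master. The data stays in the child
-- table, but queries on hashvalue_PT no longer see it.
ALTER TABLE hashvalue_PT_2008_07 NO INHERIT hashvalue_PT;

-- Once the data has been archived or is no longer needed, drop the table.
-- Unlike a bulk DELETE, this leaves no dead rows behind for VACUUM.
DROP TABLE hashvalue_PT_2008_07;
```

Detaching first and dropping later leaves a window in which the old data can still be backed up or inspected as a standalone table.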