Federated PostgreSQL
Who Am I?
● Jim Mlodgenski
  – jimm@openscg.com
  – @jim_mlodgenski
● Co-organizer of
  – NYC PUG (www.nycpug.org)
  – Philly PUG (www.phlpug.org)
● CTO, OpenSCG
  – www.openscg.com
http://nyc.pgconf.us
What is a federated database?
“A federated database system is a type of meta-database
management system (DBMS), which transparently maps
multiple autonomous database systems into a single federated
database. The constituent databases are interconnected via a
computer network and may be geographically decentralized. ...
There is no actual data integration in the constituent disparate
databases as a result of data federation.”
-Wikipedia
How does PostgreSQL do it?
● Uses Foreign Data Wrappers (FDW)
● Used with SQL/MED
  – New ANSI SQL 2003 extension
  – Management of External Data
  – Standard way of handling remote objects in SQL databases
● Wrappers are used by SQL/MED to access remote data sources
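As a rough illustration of how these SQL/MED objects relate to each other, the query below lists every foreign table together with the server and wrapper behind it. It is a sketch, not part of the original slides, and it assumes a PostgreSQL 9.1 or later database in which at least one foreign table has already been created.

```sql
-- Sketch: walk the SQL/MED catalogs from foreign table to server to wrapper.
SELECT c.relname   AS foreign_table,
       s.srvname   AS server,
       w.fdwname   AS wrapper,
       t.ftoptions AS table_options
FROM pg_foreign_table t
JOIN pg_class c                ON c.oid = t.ftrelid
JOIN pg_foreign_server s       ON s.oid = t.ftserver
JOIN pg_foreign_data_wrapper w ON w.oid = s.srvfdw
ORDER BY w.fdwname, s.srvname, c.relname;
```

Each row reflects the chain used throughout the following slides: a wrapper, a server defined on that wrapper, and a foreign table defined on that server.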
Types of Foreign Data Wrappers
● SQL
● NoSQL
● File
● Miscellaneous
● PostgreSQL
SQL Wrappers
● Oracle
● SQLite
● MySQL
● JDBC
● Informix
● ODBC
● Firebird
SQL Wrappers
CREATE SERVER oracle_server FOREIGN DATA WRAPPER
oracle_fdw OPTIONS (dbserver 'ORACLE_DBNAME');
CREATE USER MAPPING FOR CURRENT_USER
SERVER oracle_server
OPTIONS (user 'scott', password 'tiger');
CREATE FOREIGN TABLE fdw_test (
    userid   numeric,
    username text,
    email    text
)
SERVER oracle_server
OPTIONS ( schema 'scott', table 'fdw_test');
postgres=# select * from fdw_test;
 userid | username |      email
--------+----------+------------------
      1 | scott    | scott@oracle.com
(1 row)
NoSQL Wrappers
● MongoDB
● Redis
● CouchDB
● Neo4j
● MonetDB
● Tycoon
NoSQL Wrappers
CREATE SERVER mongo_server FOREIGN DATA WRAPPER
mongo_fdw OPTIONS (address '192.168.122.47', port '27017');
CREATE FOREIGN TABLE databases (
_id NAME,
name TEXT
)
SERVER mongo_server
OPTIONS (database 'mydb', collection 'pgData');
test=# select * from databases ;
           _id            |    name
--------------------------+------------
 52fd49bfba3ae4ea54afc459 | mongo
 52fd49bfba3ae4ea54afc45a | postgresql
 52fd49bfba3ae4ea54afc45b | oracle
 52fd49bfba3ae4ea54afc45c | mysql
 52fd49bfba3ae4ea54afc45d | redis
 52fd49bfba3ae4ea54afc45e | db2
(6 rows)
File Wrappers
● Delimited files
● Fixed length files
● JSON files
File Wrappers
CREATE SERVER pg_load FOREIGN DATA WRAPPER file_fdw;
CREATE FOREIGN TABLE leads (
first_name text, last_name text,
company_name text, address text,
city text, county text,
state text, zip text,
phone1 text, phone2 text,
email text, web text
) SERVER pg_load
OPTIONS ( filename '/tmp/us-500.csv', format 'csv', header 'TRUE' );
test=# select first_name || ' ' || last_name as full_name, email from leads limit 3;
     full_name     |             email
-------------------+-------------------------------
 James Butt        | jbutt@gmail.com
 Josephine Darakjy | josephine_darakjy@darakjy.org
 Art Venere        | art@venere.org
(3 rows)
Miscellaneous Wrappers
● Hadoop
● LDAP
● S3
● WWW
● PG-Strom
Hadoop Wrapper
CREATE SERVER hive_server FOREIGN DATA WRAPPER
hive_fdw OPTIONS (address '127.0.0.1', port '10000');
CREATE USER MAPPING FOR PUBLIC SERVER hive_server;
CREATE FOREIGN TABLE order_line (
    ol_w_id        integer,
    ol_d_id        integer,
    ol_o_id        integer,
    ol_number      integer,
    ol_i_id        integer,
    ol_delivery_d  timestamp,
    ol_amount      decimal(6,2),
    ol_supply_w_id integer,
    ol_quantity    decimal(2,0),
    ol_dist_info   varchar(24)
) SERVER hive_server OPTIONS (table 'order_line');
INSERT INTO item_sale_month
SELECT ol_i_id as i_id,
EXTRACT(YEAR FROM ol_delivery_d) as year,
EXTRACT(MONTH FROM ol_delivery_d) as month,
sum(ol_amount) as amount
FROM order_line
GROUP BY 1, 2, 3;
Hadoop Wrapper
● Hadoop foreign tables can also be writable
CREATE FOREIGN TABLE audit (
    audit_id bigint,
    event_d  timestamp,
    "table"  varchar,  -- quoted: table is a reserved word in PostgreSQL
    action   varchar,
    "user"   varchar   -- quoted: user is a reserved word in PostgreSQL
) SERVER hive_server
OPTIONS (table 'audit',
         flume_port '44444');
INSERT INTO audit
VALUES (nextval('audit_id_seq'), now(), 'users', 'SELECT', 'scott');
Hadoop Wrapper
● It also works with HBase tables
CREATE FOREIGN TABLE hive_hbase_table (
    key   varchar,
    value varchar
) SERVER localhive
OPTIONS (table 'hbase_table', hbase_address 'localhost',
hbase_port '9090', hbase_mapping ':key,cf:val');
INSERT INTO hive_hbase_table VALUES ('key1', 'value1');
INSERT INTO hive_hbase_table VALUES ('key2', 'value2');
UPDATE hive_hbase_table SET value = 'update' WHERE key = 'key2';
DELETE FROM hive_hbase_table WHERE key='key1';
SELECT * from hive_hbase_table;
WWW Wrapper
CREATE SERVER www_fdw_server_google_search FOREIGN DATA WRAPPER www_fdw
OPTIONS (uri 'https://ajax.googleapis.com/ajax/services/search/web?v=1.0');
CREATE USER MAPPING FOR current_user SERVER www_fdw_server_google_search;
CREATE FOREIGN TABLE www_fdw_google_search (
q text, GsearchResultClass text, unescapedUrl text, url text,
visibleUrl text, cacheUrl text, title text, titleNoFormatting text, content text
) SERVER www_fdw_server_google_search;
select url,substring(title,1,25)||'...',substring(content,1,25)||'...'
from www_fdw_google_search where q='postgresql fdw';
                             url                             |           ?column?           |           ?column?
-------------------------------------------------------------+------------------------------+------------------------------
 http://wiki.postgresql.org/wiki/Foreign_data_wrappers       | Foreign data wrappers - <... | Jan 24, 2014 <b>...</b> 1...
 http://www.postgresql.org/docs/9.3/static/postgres-fdw.html | <b>PostgreSQL</b>: Docume... | F.31.1. <b>FDW</b> Option...
 http://www.postgresql.org/docs/9.3/static/fdwhandler.html   | <b>PostgreSQL</b>: Docume... | Foreign Data Wrapper Call...
 http://www.craigkerstiens.com/2013/08/05/a-look-at-FDWs/    | A look at Foreign Data Wr... | Aug 5, 2013 <b>...</b> An...
(4 rows)
PostgreSQL Wrapper
● The most functional FDW by far
● Replaces much of the functionality of dblink
● Shipped as a contrib module
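To make the dblink comparison concrete, here is a hedged side-by-side sketch. It assumes both contrib modules are installed, and it borrows the connection details and the bird_strikes table from the slides that follow; the alias t is only illustrative.

```sql
-- Both modules ship in contrib and are enabled per database.
CREATE EXTENSION IF NOT EXISTS postgres_fdw;
CREATE EXTENSION IF NOT EXISTS dblink;

-- dblink: the remote query is an opaque string, and the result
-- columns must be declared on every call.
SELECT airport, flight_date
FROM dblink('host=localhost port=5432 dbname=test2',
            'SELECT airport, flight_date FROM bird_strikes')
     AS t(airport varchar, flight_date timestamp);

-- postgres_fdw: once the foreign table bird_strikes has been created,
-- it is queried like any local table.
SELECT airport, flight_date FROM bird_strikes;
```

With the foreign table, the column list and connection details are declared once, and the planner can decide which columns and conditions to send to the remote server.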
PostgreSQL Wrapper
CREATE SERVER postgres_server FOREIGN DATA WRAPPER postgres_fdw
OPTIONS (host 'localhost', port '5432', dbname 'test2');
CREATE USER MAPPING FOR PUBLIC SERVER postgres_server;
CREATE FOREIGN TABLE bird_strikes (
aircraft_type varchar, airport varchar, altitude varchar, aircraft_model varchar,
num_wildlife_struck varchar, impact_to_flight varchar, effect varchar,
location varchar, flight_num varchar, flight_date timestamp,
record_id int, indicated_damage varchar, freeform_en_route varchar, num_engines varchar,
airline varchar, origin_state varchar, phase_of_flight varchar, precipitation varchar,
wildlife_collected boolean, wildlife_sent_to_smithsonian boolean, remarks varchar,
reported_date timestamp, wildlife_size varchar, sky_conditions varchar, wildlife_species varchar,
when_time_hhmm varchar, time_of_day varchar, pilot_warned varchar,
cost_out_of_service varchar, cost_other varchar, cost_repair varchar, cost_total varchar,
miles_from_airport varchar, feet_above_ground varchar, num_human_fatalities integer,
num_injured integer, speed_knots varchar
) SERVER postgres_server OPTIONS (table_name 'bird_strikes');
PostgreSQL Wrapper
● Only requests columns that are needed
test=# explain verbose select airport, flight_date from bird_strikes;
                                  QUERY PLAN
-------------------------------------------------------------------------------
 Foreign Scan on public.bird_strikes  (cost=100.00..148.40 rows=1280 width=40)
   Output: airport, flight_date
   Remote SQL: SELECT airport, flight_date FROM public.bird_strikes
(3 rows)
PostgreSQL Wrapper
● Sends a WHERE clause
test=# explain verbose select airport, flight_date from
bird_strikes where flight_date > '2011-01-01';
                                  QUERY PLAN
-------------------------------------------------------------------------------
 Foreign Scan on public.bird_strikes  (cost=100.00..134.54 rows=427 width=40)
   Output: airport, flight_date
   Remote SQL: SELECT airport, flight_date FROM public.bird_strikes
     WHERE ((flight_date > '2011-01-01 00:00:00'::timestamp without time zone))
(3 rows)
PostgreSQL Wrapper
● Sends built-in immutable functions
test=# explain verbose select airport, flight_date from bird_strikes where flight_date
> '2011-01-01' and length(airport) < 10;
                                  QUERY PLAN
-------------------------------------------------------------------------------
 Foreign Scan on public.bird_strikes  (cost=100.00..135.24 rows=142 width=40)
   Output: airport, flight_date
   Remote SQL: SELECT airport, flight_date FROM public.bird_strikes
     WHERE ((flight_date > '2011-01-01 00:00:00'::timestamp without time zone))
     AND ((length(airport) < 10))
(3 rows)
PostgreSQL Wrapper
● Writable (INSERT, UPDATE, DELETE)
test=# explain verbose update bird_strikes set airport = 'Unknown' where record_id = 313339;
                                  QUERY PLAN
-------------------------------------------------------------------------------
 Update on public.bird_strikes  (cost=100.00..111.05 rows=1 width=964)
   Remote SQL: UPDATE public.bird_strikes SET airport = $2 WHERE ctid = $1
   ->  Foreign Scan on public.bird_strikes  (cost=100.00..111.05 rows=1 width=964)
         Output: aircraft_type, 'Unknown'::character varying, altitude, aircraft_model,
           num_wildlife_struck, impact_to_flight, effect, location, flight_num, flight_date,
           record_id, indicated_damage, freeform_en_route, num_engines, airline, origin_state,
           phase_of_flight, precipitation, wildlife_collected, wildlife_sent_to_smithsonian,
           remarks, reported_date, wildlife_size, sky_conditions, wildlife_species,
           when_time_hhmm, time_of_day, pilot_warned, cost_out_of_service, cost_other,
           cost_repair, cost_total, miles_from_airport, feet_above_ground,
           num_human_fatalities, num_injured, speed_knots, ctid
         Remote SQL: SELECT aircraft_type, altitude, aircraft_model, num_wildlife_struck,
           impact_to_flight, effect, location, flight_num, flight_date, record_id,
           indicated_damage, freeform_en_route, num_engines, airline, origin_state,
           phase_of_flight, precipitation, wildlife_collected, wildlife_sent_to_smithsonian,
           remarks, reported_date, wildlife_size, sky_conditions, wildlife_species,
           when_time_hhmm, time_of_day, pilot_warned, cost_out_of_service, cost_other,
           cost_repair, cost_total, miles_from_airport, feet_above_ground,
           num_human_fatalities, num_injured, speed_knots, ctid
           FROM public.bird_strikes WHERE ((record_id = 313339)) FOR UPDATE
(5 rows)
PostgreSQL Wrapper
● Writes are transactional
test=# select airport from bird_strikes where record_id = 313339;
 airport
---------
 Unknown
(1 row)
test=# BEGIN;
BEGIN
test=# update bird_strikes set airport = 'UNKNOWN' where record_id = 313339;
UPDATE 1
test=# ROLLBACK;
ROLLBACK
test=# select airport from bird_strikes where record_id = 313339;
 airport
---------
 Unknown
(1 row)
Limitations
● Aggregates are not pushed down
test=# explain verbose select count(*) from bird_strikes;
                                  QUERY PLAN
-------------------------------------------------------------------------------
 Aggregate  (cost=220.92..220.93 rows=1 width=0)
   Output: count(*)
   ->  Foreign Scan on public.bird_strikes  (cost=100.00..212.39 rows=3413 width=0)
         Output: aircraft_type, airport, altitude, aircraft_model, num_wildlife_struck,
           impact_to_flight, effect, location, flight_num, flight_date, record_id,
           indicated_damage, freeform_en_route, num_engines, airline, origin_state,
           phase_of_flight, precipitation, wildlife_collected, wildlife_sent_to_smithsonian,
           remarks, reported_date, wildlife_size, sky_conditions, wildlife_species,
           when_time_hhmm, time_of_day, pilot_warned, cost_out_of_service, cost_other,
           cost_repair, cost_total, miles_from_airport, feet_above_ground,
           num_human_fatalities, num_injured, speed_knots
         Remote SQL: SELECT NULL FROM public.bird_strikes
(5 rows)
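One possible workaround, sketched here and not taken from the slides, is to do the aggregation in a view on the remote server and expose that view as its own foreign table. The view name bird_strikes_count and the column strike_count are hypothetical; postgres_server and the test2 database come from the earlier slides.

```sql
-- On the remote database (test2): hypothetical view that counts remotely.
CREATE VIEW bird_strikes_count AS
SELECT count(*) AS strike_count FROM bird_strikes;

-- On the local database: map a foreign table onto the remote view.
CREATE FOREIGN TABLE bird_strikes_count (
    strike_count bigint
) SERVER postgres_server OPTIONS (table_name 'bird_strikes_count');

-- Only the single aggregated row crosses the network.
SELECT strike_count FROM bird_strikes_count;
```

The trade-off is that each aggregate needed this way must be defined ahead of time on the remote side.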
Limitations
● ORDER BY, GROUP BY, LIMIT not pushed down
test=# explain verbose select flight_num from bird_strikes order by flight_date limit 5;
                                  QUERY PLAN
-------------------------------------------------------------------------------
 Limit  (cost=169.66..169.67 rows=5 width=40)
   Output: flight_num, flight_date
   ->  Sort  (cost=169.66..172.86 rows=1280 width=40)
         Output: flight_num, flight_date
         Sort Key: bird_strikes.flight_date
         ->  Foreign Scan on public.bird_strikes  (cost=100.00..148.40 rows=1280 width=40)
               Output: flight_num, flight_date
               Remote SQL: SELECT flight_num, flight_date FROM public.bird_strikes
(8 rows)
Limitations
● Joins not pushed down
test=# explain verbose select s.name, b.flight_date
test-# from bird_strikes b, state_code s
test-# where b.location = s.abbreviation and flight_date > '2011-01-01';
                                  QUERY PLAN
-------------------------------------------------------------------------------
 Hash Join  (cost=239.88..349.95 rows=1986 width=40)
   Output: s.name, b.flight_date
   Hash Cond: ((s.abbreviation)::text = (b.location)::text)
   ->  Foreign Scan on public.state_code s  (cost=100.00..137.90 rows=930 width=64)
         Output: s.id, s.name, s.abbreviation, s.country, s.type, s.sort, s.status,
           s.occupied, s.notes, s.fips_state, s.assoc_press, s.standard_federal_region,
           s.census_region, s.census_region_name, s.census_division,
           s.census_devision_name, s.circuit_court
         Remote SQL: SELECT name, abbreviation FROM public.state_code
   ->  Hash  (cost=134.54..134.54 rows=427 width=40)
         Output: b.flight_date, b.location
         ->  Foreign Scan on public.bird_strikes b  (cost=100.00..134.54 rows=427 width=40)
               Output: b.flight_date, b.location
               Remote SQL: SELECT location, flight_date FROM public.bird_strikes
                 WHERE ((flight_date > '2011-01-01 00:00:00'::timestamp without time zone))
(11 rows)
Limitations (Gotcha)
● Sometimes the foreign tables don't act like tables
test=# SELECT l.*, w.lat, w.lng
FROM leads l, www_fdw_geocoder_google w
WHERE w.address = l.address || ',' || l.city || ',' || l.state;
 first_name | last_name | company_name | address | city | county | state | zip | phone1 | phone2 | email | web | lat | lng
------------+-----------+--------------+---------+------+--------+-------+-----+--------+--------+-------+-----+-----+-----
(0 rows)
Limitations (Gotcha)
                                  QUERY PLAN
-------------------------------------------------------------------------------
 Merge Join  (cost=187.47..215.47 rows=1000 width=448)
   Output: l.first_name, l.last_name, l.company_name, l.address, l.city, l.county,
     l.state, l.zip, l.phone1, l.phone2, l.email, l.web, w.lat, w.lng
   Merge Cond: ((((((l.address || ','::text) || l.city) || ','::text) || l.state)) = w.address)
   ->  Sort  (cost=37.64..38.14 rows=200 width=384)
         Output: l.first_name, l.last_name, l.company_name, l.address, l.city, l.county,
           l.state, l.zip, l.phone1, l.phone2, l.email, l.web,
           (((((l.address || ','::text) || l.city) || ','::text) || l.state))
         Sort Key: (((((l.address || ','::text) || l.city) || ','::text) || l.state))
         ->  Foreign Scan on public.leads l  (cost=0.00..30.00 rows=200 width=384)
               Output: l.first_name, l.last_name, l.company_name, l.address, l.city,
                 l.county, l.state, l.zip, l.phone1, l.phone2, l.email, l.web,
                 ((((l.address || ','::text) || l.city) || ','::text) || l.state)
               Foreign File: /tmp/us-500.csv
               Foreign File Size: 81485
   ->  Sort  (cost=149.83..152.33 rows=1000 width=96)
         Output: w.lat, w.lng, w.address
         Sort Key: w.address
         ->  Foreign Scan on public.www_fdw_geocoder_google w  (cost=0.00..100.00 rows=1000 width=96)
               Output: w.lat, w.lng, w.address
               WWW API: Request
(16 rows)
Limitations (Gotcha)
CREATE OR REPLACE FUNCTION google_geocode(
    OUT first_name text, OUT last_name text, OUT company_name text, OUT address text, OUT city text, OUT county text,
    OUT state text, OUT zip text, OUT phone1 text, OUT phone2 text, OUT email text, OUT web text, OUT lat text, OUT lng text)
RETURNS SETOF RECORD AS $$
DECLARE
    r     record;
    f_adr text;
    l_lat text;
    l_lng text;
BEGIN
    FOR r IN SELECT * FROM leads LOOP
        f_adr := r.address || ',' || r.city || ',' || r.state;
        EXECUTE 'SELECT lat, lng FROM www_fdw_geocoder_google WHERE address = $1'
            INTO l_lat, l_lng
            USING f_adr;
        SELECT r.first_name, r.last_name, r.company_name, r.address, r.city, r.county, r.state, r.zip,
               r.phone1, r.phone2, r.email, r.web, l_lat, l_lng
          INTO first_name, last_name, company_name, address, city, county, state, zip,
               phone1, phone2, email, web, lat, lng;
        RETURN NEXT;
    END LOOP;
END $$ LANGUAGE plpgsql;
Writing a new FDW
● Might not need to write one if there is an HTTP interface
● Use the Blackhole FDW as a template
  – https://bitbucket.org/adunstan/blackhole_fdw
Writing a new FDW
Datum blackhole_fdw_handler(PG_FUNCTION_ARGS){
    ...
    /* these are required */
    fdwroutine->GetForeignRelSize = blackholeGetForeignRelSize;
    fdwroutine->GetForeignPaths = blackholeGetForeignPaths;
    fdwroutine->GetForeignPlan = blackholeGetForeignPlan;
    fdwroutine->BeginForeignScan = blackholeBeginForeignScan;
    fdwroutine->IterateForeignScan = blackholeIterateForeignScan;
    fdwroutine->ReScanForeignScan = blackholeReScanForeignScan;
    fdwroutine->EndForeignScan = blackholeEndForeignScan;
    /* remainder are optional - use NULL if not required */
    /* support for insert / update / delete */
    fdwroutine->AddForeignUpdateTargets = blackholeAddForeignUpdateTargets;
    fdwroutine->PlanForeignModify = blackholePlanForeignModify;
    fdwroutine->BeginForeignModify = blackholeBeginForeignModify;
    fdwroutine->ExecForeignInsert = blackholeExecForeignInsert;
    fdwroutine->ExecForeignUpdate = blackholeExecForeignUpdate;
    fdwroutine->ExecForeignDelete = blackholeExecForeignDelete;
    fdwroutine->EndForeignModify = blackholeEndForeignModify;
    /* support for EXPLAIN */
    fdwroutine->ExplainForeignScan = blackholeExplainForeignScan;
    fdwroutine->ExplainForeignModify = blackholeExplainForeignModify;
    /* support for ANALYSE */
    fdwroutine->AnalyzeForeignTable = blackholeAnalyzeForeignTable;
    PG_RETURN_POINTER(fdwroutine);
}
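The C handler only becomes usable once it is registered in SQL. The statements below are a sketch of what an extension install script typically contains. They are not copied from the blackhole_fdw repository, and the names blackhole_server and blackhole_test are hypothetical.

```sql
-- Expose the C function; MODULE_PATHNAME is substituted only inside
-- an extension script, otherwise give the shared library path.
CREATE FUNCTION blackhole_fdw_handler()
RETURNS fdw_handler
AS 'MODULE_PATHNAME'
LANGUAGE C STRICT;

-- Tie the handler to a wrapper name usable in CREATE SERVER.
CREATE FOREIGN DATA WRAPPER blackhole_fdw
  HANDLER blackhole_fdw_handler;

-- Hypothetical objects for a quick smoke test of the new wrapper.
CREATE SERVER blackhole_server FOREIGN DATA WRAPPER blackhole_fdw;
CREATE FOREIGN TABLE blackhole_test (id integer) SERVER blackhole_server;
```

A validator function for checking OPTIONS can be attached in the same way with the VALIDATOR clause of CREATE FOREIGN DATA WRAPPER.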
Future
● Even more Wrappers
● Check Constraints on Foreign Tables
  – Allows partitioning
● Joins
  – Custom Scan API
    ● Probably will not be the way to do this, but progress being made
Questions?
jimm@openscg.com
@jim_mlodgenski

Contenu connexe

Tendances

Tendances (20)

Webscale PostgreSQL - JSONB and Horizontal Scaling Strategies
Webscale PostgreSQL - JSONB and Horizontal Scaling StrategiesWebscale PostgreSQL - JSONB and Horizontal Scaling Strategies
Webscale PostgreSQL - JSONB and Horizontal Scaling Strategies
 
PostgreSQL High Availability in a Containerized World
PostgreSQL High Availability in a Containerized WorldPostgreSQL High Availability in a Containerized World
PostgreSQL High Availability in a Containerized World
 
Camel JBang - Quarkus Insights.pdf
Camel JBang - Quarkus Insights.pdfCamel JBang - Quarkus Insights.pdf
Camel JBang - Quarkus Insights.pdf
 
SQL Transactions - What they are good for and how they work
SQL Transactions - What they are good for and how they workSQL Transactions - What they are good for and how they work
SQL Transactions - What they are good for and how they work
 
Real World Event Sourcing and CQRS
Real World Event Sourcing and CQRSReal World Event Sourcing and CQRS
Real World Event Sourcing and CQRS
 
MySQL Database Architectures - MySQL InnoDB ClusterSet 2021-11
MySQL Database Architectures - MySQL InnoDB ClusterSet 2021-11MySQL Database Architectures - MySQL InnoDB ClusterSet 2021-11
MySQL Database Architectures - MySQL InnoDB ClusterSet 2021-11
 
Lessons Learned: Troubleshooting Replication
Lessons Learned: Troubleshooting ReplicationLessons Learned: Troubleshooting Replication
Lessons Learned: Troubleshooting Replication
 
[DockerCon 2019] Hardening Docker daemon with Rootless mode
[DockerCon 2019] Hardening Docker daemon with Rootless mode[DockerCon 2019] Hardening Docker daemon with Rootless mode
[DockerCon 2019] Hardening Docker daemon with Rootless mode
 
Docker Networking Tip - Macvlan driver
Docker Networking Tip - Macvlan driverDocker Networking Tip - Macvlan driver
Docker Networking Tip - Macvlan driver
 
zeromq
zeromqzeromq
zeromq
 
Performance Tuning - Memory leaks, Thread deadlocks, JDK tools
Performance Tuning -  Memory leaks, Thread deadlocks, JDK toolsPerformance Tuning -  Memory leaks, Thread deadlocks, JDK tools
Performance Tuning - Memory leaks, Thread deadlocks, JDK tools
 
Getting Started - Ansible Galaxy NG
Getting Started - Ansible Galaxy NGGetting Started - Ansible Galaxy NG
Getting Started - Ansible Galaxy NG
 
PromQL Deep Dive - The Prometheus Query Language
PromQL Deep Dive - The Prometheus Query Language PromQL Deep Dive - The Prometheus Query Language
PromQL Deep Dive - The Prometheus Query Language
 
CQRS and Event Sourcing
CQRS and Event Sourcing CQRS and Event Sourcing
CQRS and Event Sourcing
 
Understanding docker networking
Understanding docker networkingUnderstanding docker networking
Understanding docker networking
 
Pro Postgres 9
Pro Postgres 9Pro Postgres 9
Pro Postgres 9
 
Kafka 101
Kafka 101Kafka 101
Kafka 101
 
Apache Kafka, Un système distribué de messagerie hautement performant
Apache Kafka, Un système distribué de messagerie hautement performantApache Kafka, Un système distribué de messagerie hautement performant
Apache Kafka, Un système distribué de messagerie hautement performant
 
How to test infrastructure code: automated testing for Terraform, Kubernetes,...
How to test infrastructure code: automated testing for Terraform, Kubernetes,...How to test infrastructure code: automated testing for Terraform, Kubernetes,...
How to test infrastructure code: automated testing for Terraform, Kubernetes,...
 
Docker 101 - Nov 2016
Docker 101 - Nov 2016Docker 101 - Nov 2016
Docker 101 - Nov 2016
 

Similaire à Postgresql Federation

FOSDEM 2012: MySQL synchronous replication in practice with Galera
FOSDEM 2012: MySQL synchronous replication in practice with GaleraFOSDEM 2012: MySQL synchronous replication in practice with Galera
FOSDEM 2012: MySQL synchronous replication in practice with Galera
FromDual GmbH
 

Similaire à Postgresql Federation (20)

2013 Collaborate - OAUG - Presentation
2013 Collaborate - OAUG - Presentation2013 Collaborate - OAUG - Presentation
2013 Collaborate - OAUG - Presentation
 
Leveraging Hadoop in your PostgreSQL Environment
Leveraging Hadoop in your PostgreSQL EnvironmentLeveraging Hadoop in your PostgreSQL Environment
Leveraging Hadoop in your PostgreSQL Environment
 
ProxySQL and the Tricks Up Its Sleeve - Percona Live 2022.pdf
ProxySQL and the Tricks Up Its Sleeve - Percona Live 2022.pdfProxySQL and the Tricks Up Its Sleeve - Percona Live 2022.pdf
ProxySQL and the Tricks Up Its Sleeve - Percona Live 2022.pdf
 
Codepot - Pig i Hive: szybkie wprowadzenie / Pig and Hive crash course
Codepot - Pig i Hive: szybkie wprowadzenie / Pig and Hive crash courseCodepot - Pig i Hive: szybkie wprowadzenie / Pig and Hive crash course
Codepot - Pig i Hive: szybkie wprowadzenie / Pig and Hive crash course
 
Search@airbnb
Search@airbnbSearch@airbnb
Search@airbnb
 
PostgreSQL Procedural Languages: Tips, Tricks and Gotchas
PostgreSQL Procedural Languages: Tips, Tricks and GotchasPostgreSQL Procedural Languages: Tips, Tricks and Gotchas
PostgreSQL Procedural Languages: Tips, Tricks and Gotchas
 
Drupal 8 migrate!
Drupal 8 migrate!Drupal 8 migrate!
Drupal 8 migrate!
 
SF Big Analytics 20191112: How to performance-tune Spark applications in larg...
SF Big Analytics 20191112: How to performance-tune Spark applications in larg...SF Big Analytics 20191112: How to performance-tune Spark applications in larg...
SF Big Analytics 20191112: How to performance-tune Spark applications in larg...
 
Troubleshooting PostgreSQL Streaming Replication
Troubleshooting PostgreSQL Streaming ReplicationTroubleshooting PostgreSQL Streaming Replication
Troubleshooting PostgreSQL Streaming Replication
 
Presto anatomy
Presto anatomyPresto anatomy
Presto anatomy
 
Application Monitoring using Open Source: VictoriaMetrics - ClickHouse
Application Monitoring using Open Source: VictoriaMetrics - ClickHouseApplication Monitoring using Open Source: VictoriaMetrics - ClickHouse
Application Monitoring using Open Source: VictoriaMetrics - ClickHouse
 
Application Monitoring using Open Source - VictoriaMetrics & Altinity ClickHo...
Application Monitoring using Open Source - VictoriaMetrics & Altinity ClickHo...Application Monitoring using Open Source - VictoriaMetrics & Altinity ClickHo...
Application Monitoring using Open Source - VictoriaMetrics & Altinity ClickHo...
 
MySQL Workbench for DFW Unix Users Group
MySQL Workbench for DFW Unix Users GroupMySQL Workbench for DFW Unix Users Group
MySQL Workbench for DFW Unix Users Group
 
Creating PostgreSQL-as-a-Service at Scale
Creating PostgreSQL-as-a-Service at ScaleCreating PostgreSQL-as-a-Service at Scale
Creating PostgreSQL-as-a-Service at Scale
 
JavaScript client API for Google Apps Script API primer
JavaScript client API for Google Apps Script API primerJavaScript client API for Google Apps Script API primer
JavaScript client API for Google Apps Script API primer
 
OQGraph @ SCaLE 11x 2013
OQGraph @ SCaLE 11x 2013OQGraph @ SCaLE 11x 2013
OQGraph @ SCaLE 11x 2013
 
[245] presto 내부구조 파헤치기
[245] presto 내부구조 파헤치기[245] presto 내부구조 파헤치기
[245] presto 내부구조 파헤치기
 
FOSDEM 2012: MySQL synchronous replication in practice with Galera
FOSDEM 2012: MySQL synchronous replication in practice with GaleraFOSDEM 2012: MySQL synchronous replication in practice with Galera
FOSDEM 2012: MySQL synchronous replication in practice with Galera
 
Love Your Database Railsconf 2017
Love Your Database Railsconf 2017Love Your Database Railsconf 2017
Love Your Database Railsconf 2017
 
PostgreSQL - масштабирование в моде, Valentine Gogichashvili (Zalando SE)
PostgreSQL - масштабирование в моде, Valentine Gogichashvili (Zalando SE)PostgreSQL - масштабирование в моде, Valentine Gogichashvili (Zalando SE)
PostgreSQL - масштабирование в моде, Valentine Gogichashvili (Zalando SE)
 

Plus de Jim Mlodgenski

Scaling PostreSQL with Stado
Scaling PostreSQL with StadoScaling PostreSQL with Stado
Scaling PostreSQL with Stado
Jim Mlodgenski
 

Plus de Jim Mlodgenski (10)

Strategic autovacuum
Strategic autovacuumStrategic autovacuum
Strategic autovacuum
 
Top 10 Mistakes When Migrating From Oracle to PostgreSQL
Top 10 Mistakes When Migrating From Oracle to PostgreSQLTop 10 Mistakes When Migrating From Oracle to PostgreSQL
Top 10 Mistakes When Migrating From Oracle to PostgreSQL
 
Oracle postgre sql-mirgration-top-10-mistakes
Oracle postgre sql-mirgration-top-10-mistakesOracle postgre sql-mirgration-top-10-mistakes
Oracle postgre sql-mirgration-top-10-mistakes
 
Profiling PL/pgSQL
Profiling PL/pgSQLProfiling PL/pgSQL
Profiling PL/pgSQL
 
Debugging Your PL/pgSQL Code
Debugging Your PL/pgSQL CodeDebugging Your PL/pgSQL Code
Debugging Your PL/pgSQL Code
 
An Introduction To PostgreSQL Triggers
An Introduction To PostgreSQL TriggersAn Introduction To PostgreSQL Triggers
An Introduction To PostgreSQL Triggers
 
Introduction to PostgreSQL
Introduction to PostgreSQLIntroduction to PostgreSQL
Introduction to PostgreSQL
 
Scaling PostreSQL with Stado
Scaling PostreSQL with StadoScaling PostreSQL with Stado
Scaling PostreSQL with Stado
 
Multi-Master Replication with Slony
Multi-Master Replication with SlonyMulti-Master Replication with Slony
Multi-Master Replication with Slony
 
Scaling PostgreSQL With GridSQL
Scaling PostgreSQL With GridSQLScaling PostgreSQL With GridSQL
Scaling PostgreSQL With GridSQL
 

Dernier

CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
giselly40
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
Enterprise Knowledge
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
Joaquim Jorge
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
Earley Information Science
 

Dernier (20)

The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 

Postgresql Federation

  • 2. Who Am I? ● Jim Mlodgenski – jimm@openscg.com – @jim_mlodgenski ● Co-organizer of – NYC PUG (www.nycpug.org) – Philly PUG (www.phlpug.org) ● CTO, OpenSCG – www.openscg.com
  • 4. What is a federated database? “A federated database system is a type of meta-database management system (DBMS), which transparently maps multiple autonomous database systems into a single federated database. The constituent databases are interconnected via a computer network and may be geographically decentralized. ... There is no actual data integration in the constituent disparate databases as a result of data federation.” -Wikipedia
  • 5. How does PostgreSQL do it? ● Uses Foreign Data Wrappers (FDW) ● Used with SQL/MED – Management of External Data – New ANSI SQL 2003 extension – Standard way of handling remote objects in SQL databases ● Wrappers used by SQL/MED to access remote data sources
  • 6. Types of Foreign Data Wrappers ● SQL ● NoSQL ● File ● Miscellaneous ● PostgreSQL
  • 8. SQL Wrappers CREATE SERVER oracle_server FOREIGN DATA WRAPPER oracle_fdw OPTIONS (dbserver 'ORACLE_DBNAME'); CREATE USER MAPPING FOR CURRENT_USER SERVER oracle_server OPTIONS (user 'scott', password 'tiger'); CREATE FOREIGN TABLE fdw_test ( userid numeric, username text, email text ) SERVER oracle_server OPTIONS ( schema 'scott', table 'fdw_test'); postgres=# select * from fdw_test; userid | username | email --------+----------+------------------ 1 | scott | scott@oracle.com (1 row)
  • 10. NoSQL Wrappers CREATE SERVER mongo_server FOREIGN DATA WRAPPER mongo_fdw OPTIONS (address '192.168.122.47', port '27017'); CREATE FOREIGN TABLE databases ( _id NAME, name TEXT ) SERVER mongo_server OPTIONS (database 'mydb', collection 'pgData'); test=# select * from databases ; _id | name --------------------------+----------- 52fd49bfba3ae4ea54afc459 | mongo 52fd49bfba3ae4ea54afc45a | postgresql 52fd49bfba3ae4ea54afc45b | oracle 52fd49bfba3ae4ea54afc45c | mysql 52fd49bfba3ae4ea54afc45d | redis 52fd49bfba3ae4ea54afc45e | db2 (6 rows)
  • 11. File Wrappers ● Delimited files ● Fixed length files ● JSON files
  • 12. File Wrappers CREATE SERVER pg_load FOREIGN DATA WRAPPER file_fdw; CREATE FOREIGN TABLE leads ( first_name text, last_name text, company_name text, address text, city text, county text, state text, zip text, phone1 text, phone2 text, email text, web text ) SERVER pg_load OPTIONS ( filename '/tmp/us-500.csv', format 'csv', header 'TRUE' ); test=# select first_name || ' ' || last_name as full_name, email from leads limit 3; full_name | email -------------------+------------------------------ James Butt | jbutt@gmail.com Josephine Darakjy | josephine_darakjy@darakjy.org Art Venere | art@venere.org (3 rows)
  • 14. Hadoop Wrapper CREATE SERVER hive_server FOREIGN DATA WRAPPER hive_fdw OPTIONS (address '127.0.0.1', port '10000'); CREATE USER MAPPING FOR PUBLIC SERVER hive_server; CREATE FOREIGN TABLE order_line ( ol_w_id integer, ol_d_id integer, ol_o_id integer, ol_number integer, ol_i_id integer, ol_delivery_d timestamp, ol_amount decimal(6,2), ol_supply_w_id integer, ol_quantity decimal(2,0), ol_dist_info varchar(24) ) SERVER hive_server OPTIONS (table 'order_line'); INSERT INTO item_sale_month SELECT ol_i_id as i_id, EXTRACT(YEAR FROM ol_delivery_d) as year, EXTRACT(MONTH FROM ol_delivery_d) as month, sum(ol_amount) as amount FROM order_line GROUP BY 1, 2, 3;
  • 15. Hadoop Wrapper ● Hadoop foreign tables can also be writable CREATE FOREIGN TABLE audit ( audit_id bigint, event_d timestamp, "table" varchar, action varchar, "user" varchar ) SERVER hive_server OPTIONS (table 'audit', flume_port '44444'); INSERT INTO audit VALUES (nextval('audit_id_seq'), now(), 'users', 'SELECT', 'scott');
  • 16. Hadoop Wrapper ● It also works with HBase tables CREATE FOREIGN TABLE hive_hbase_table ( key varchar, value varchar ) SERVER localhive OPTIONS (table 'hbase_table', hbase_address 'localhost', hbase_port '9090', hbase_mapping ':key,cf:val'); INSERT INTO hive_hbase_table VALUES ('key1', 'value1'); INSERT INTO hive_hbase_table VALUES ('key2', 'value2'); UPDATE hive_hbase_table SET value = 'update' WHERE key = 'key2'; DELETE FROM hive_hbase_table WHERE key='key1'; SELECT * from hive_hbase_table;
  • 17. WWW Wrapper CREATE SERVER www_fdw_server_google_search FOREIGN DATA WRAPPER www_fdw OPTIONS (uri 'https://ajax.googleapis.com/ajax/services/search/web?v=1.0'); CREATE USER MAPPING FOR current_user SERVER www_fdw_server_google_search; CREATE FOREIGN TABLE www_fdw_google_search ( q text, GsearchResultClass text, unescapedUrl text, url text, visibleUrl text, cacheUrl text, title text, titleNoFormatting text, content text ) SERVER www_fdw_server_google_search; select url,substring(title,1,25)||'...',substring(content,1,25)||'...' from www_fdw_google_search where q='postgresql fdw'; url | ?column? | ?column? -------------------------------------------------------------+------------------------------+----------------------------- http://wiki.postgresql.org/wiki/Foreign_data_wrappers | Foreign data wrappers - <... | Jan 24, 2014 <b>...</b> 1... http://www.postgresql.org/docs/9.3/static/postgres-fdw.html | <b>PostgreSQL</b>: Docume... | F.31.1. <b>FDW</b> Option... http://www.postgresql.org/docs/9.3/static/fdwhandler.html | <b>PostgreSQL</b>: Docume... | Foreign Data Wrapper Call... http://www.craigkerstiens.com/2013/08/05/a-look-at-FDWs/ | A look at Foreign Data Wr... | Aug 5, 2013 <b>...</b> An... (4 rows)
  • 18. PostgreSQL Wrapper ● The most functional FDW by far ● Replaces much of the functionality of dblink ● Shipped as a contrib module
  • 19. PostgreSQL Wrapper CREATE SERVER postgres_server FOREIGN DATA WRAPPER postgres_fdw OPTIONS (host 'localhost', port '5432', dbname 'test2'); CREATE USER MAPPING FOR PUBLIC SERVER postgres_server; CREATE FOREIGN TABLE bird_strikes ( aircraft_type varchar, airport varchar, altitude varchar, aircraft_model varchar, num_wildlife_struck varchar, impact_to_flight varchar, effect varchar, location varchar, flight_num varchar, flight_date timestamp, record_id int, indicated_damage varchar, freeform_en_route varchar, num_engines varchar, airline varchar, origin_state varchar, phase_of_flight varchar, precipitation varchar, wildlife_collected boolean, wildlife_sent_to_smithsonian boolean, remarks varchar, reported_date timestamp, wildlife_size varchar, sky_conditions varchar, wildlife_species varchar, when_time_hhmm varchar, time_of_day varchar, pilot_warned varchar, cost_out_of_service varchar, cost_other varchar, cost_repair varchar, cost_total varchar, miles_from_airport varchar, feet_above_ground varchar, num_human_fatalities integer, num_injured integer, speed_knots varchar ) SERVER postgres_server OPTIONS (table_name 'bird_strikes');
  • 20. PostgreSQL Wrapper ● Only requests columns that are needed test=# explain verbose select airport, flight_date from bird_strikes; QUERY PLAN ------------------------------------------------------------------------------ Foreign Scan on public.bird_strikes (cost=100.00..148.40 rows=1280 width=40) Output: airport, flight_date Remote SQL: SELECT airport, flight_date FROM public.bird_strikes (3 rows)
  • 21. PostgreSQL Wrapper ● Sends a WHERE clause test=# explain verbose select airport, flight_date from bird_strikes where flight_date > '2011-01-01'; QUERY PLAN ----------------------------------------------------------------- Foreign Scan on public.bird_strikes (cost=100.00..134.54 rows=427 width=40) Output: airport, flight_date Remote SQL: SELECT airport, flight_date FROM public.bird_strikes WHERE ((flight_date > '2011-01-01 00:00:00'::timestamp without time zone)) (3 rows)
  • 22. PostgreSQL Wrapper ● Sends built-in immutable functions test=# explain verbose select airport, flight_date from bird_strikes where flight_date > '2011-01-01' and length(airport) < 10; QUERY PLAN ------------------------------------------------------------------------------ Foreign Scan on public.bird_strikes (cost=100.00..135.24 rows=142 width=40) Output: airport, flight_date Remote SQL: SELECT airport, flight_date FROM public.bird_strikes WHERE ((flight_date > '2011-01-01 00:00:00'::timestamp without time zone)) AND ((length(airport) < 10)) (3 rows)
  • 23. PostgreSQL Wrapper ● Writable (INSERT, UPDATE, DELETE) test=# explain verbose update bird_strikes set airport = 'Unknown' where record_id = 313339; QUERY PLAN ------------------------------------------------------------------------------ Update on public.bird_strikes (cost=100.00..111.05 rows=1 width=964) Remote SQL: UPDATE public.bird_strikes SET airport = $2 WHERE ctid = $1 -> Foreign Scan on public.bird_strikes (cost=100.00..111.05 rows=1 width=964) Output: aircraft_type, 'Unknown'::character varying, altitude, aircraft_model, num_wildlife_struck, impact_to_flight, effect, location, flight_num, flight_date, record_id, indicated_damage, freeform_en_route, num_engines, airline, origin_state, phase_of_flight, precipitation, wildlife_collected, wildlife_sent_to_smithsonian, remarks, reported_date, wildlife_size, sky_conditions, wildlife_species, when_time_hhmm, time_of_day, pilot_warned, cost_out_of_service, cost_other, cost_repair, cost_total, miles_from_airport, feet_above_ground, num_human_fatalities, num_injured, speed_knots, ctid Remote SQL: SELECT aircraft_type, altitude, aircraft_model, num_wildlife_struck, impact_to_flight, effect, location, flight_num, flight_date, record_id, indicated_damage, freeform_en_route, num_engines, airline, origin_state, phase_of_flight, precipitation, wildlife_collected, wildlife_sent_to_smithsonian, remarks, reported_date, wildlife_size, sky_conditions, wildlife_species, when_time_hhmm, time_of_day, pilot_warned, cost_out_of_service, cost_other, cost_repair, cost_total, miles_from_airport, feet_above_ground, num_human_fatalities, num_injured, speed_knots, ctid FROM public.bird_strikes WHERE ((record_id = 313339)) FOR UPDATE (5 rows)
  • 24. PostgreSQL Wrapper ● Writes are transactional test=# select airport from bird_strikes where record_id = 313339; airport --------- Unknown (1 row) test=# BEGIN; BEGIN test=# update bird_strikes set airport = 'UNKNOWN' where record_id = 313339; UPDATE 1 test=# ROLLBACK; ROLLBACK test=# select airport from bird_strikes where record_id = 313339; airport --------- Unknown (1 row)
  • 25. Limitations ● Aggregates are not pushed down test=# explain verbose select count(*) from bird_strikes; QUERY PLAN -------------------------------------------------------------------------------------------------------- Aggregate (cost=220.92..220.93 rows=1 width=0) Output: count(*) -> Foreign Scan on public.bird_strikes (cost=100.00..212.39 rows=3413 width=0) Output: aircraft_type, airport, altitude, aircraft_model, num_wildlife_struck, impact_to_flight, effect, location, flight_num, flight_date, record_id, indicated_damage, freeform_en_route, num_engines, airline, origin_state, phase_of_flight, precipitation, wildlife_collected, wildlife_sent_to_smithsonian, remarks, reported_date, wildlife_size, sky_conditions, wildlife_species, when_time_hhmm, time_of_day, pilot_warned, cost_out_of_service, cost_other, cost_repair, cost_total, miles_from_airport, feet_above_ground, num_human_fatalities, num_injured, speed_knots Remote SQL: SELECT NULL FROM public.bird_strikes (5 rows)
  • 26. Limitations ● ORDER BY, GROUP BY, LIMIT not pushed down test=# explain verbose select flight_num from bird_strikes order by flight_date limit 5; QUERY PLAN ------------------------------------------------------------------------------------------ Limit (cost=169.66..169.67 rows=5 width=40) Output: flight_num, flight_date -> Sort (cost=169.66..172.86 rows=1280 width=40) Output: flight_num, flight_date Sort Key: bird_strikes.flight_date -> Foreign Scan on public.bird_strikes (cost=100.00..148.40 rows=1280 width=40) Output: flight_num, flight_date Remote SQL: SELECT flight_num, flight_date FROM public.bird_strikes (8 rows)
  • 27. Limitations ● Joins not pushed down test=# explain verbose select s.name, b.flight_date test-# from bird_strikes b, state_code s test-# where b.location = s.abbreviation and flight_date > '2011-01-01'; QUERY PLAN ------------------------------------------------------------------------------ Hash Join (cost=239.88..349.95 rows=1986 width=40) Output: s.name, b.flight_date Hash Cond: ((s.abbreviation)::text = (b.location)::text) -> Foreign Scan on public.state_code s (cost=100.00..137.90 rows=930 width=64) Output: s.id, s.name, s.abbreviation, s.country, s.type, s.sort, s.status, s.occupied, s.notes, s.fips_state, s.assoc_press, s.standard_federal_region, s.census_region, s.census_region_name, s.census_division, s.census_devision_name, s.circuit_court Remote SQL: SELECT name, abbreviation FROM public.state_code -> Hash (cost=134.54..134.54 rows=427 width=40) Output: b.flight_date, b.location -> Foreign Scan on public.bird_strikes b (cost=100.00..134.54 rows=427 width=40) Output: b.flight_date, b.location Remote SQL: SELECT location, flight_date FROM public.bird_strikes WHERE ((flight_date > '2011-01-01 00:00:00'::timestamp without time zone)) (11 rows)
  • 28. Limitations (Gotcha) ● Sometimes the foreign tables don't act like tables test=# SELECT l.*, w.lat, w.lng FROM leads l, www_fdw_geocoder_google w WHERE w.address = l.address || ',' || l.city || ',' || l.state; first_name | last_name | company_name | address | city | county | state | zip | phone1 | phone2 | email | web | lat | lng ------------+-----------+--------------+---------+------+--------+-------+-----+--------+--------+-------+-----+-----+----- (0 rows)
  • 29. Limitations (Gotcha) QUERY PLAN ------------------------------------------------------------------------------------------ Merge Join (cost=187.47..215.47 rows=1000 width=448) Output: l.first_name, l.last_name, l.company_name, l.address, l.city, l.county, l.state, l.zip, l.phone1, l.phone2, l.email, l.web, w.lat, w.lng Merge Cond: ((((((l.address || ','::text) || l.city) || ','::text) || l.state)) = w.address) -> Sort (cost=37.64..38.14 rows=200 width=384) Output: l.first_name, l.last_name, l.company_name, l.address, l.city, l.county, l.state, l.zip, l.phone1, l.phone2, l.email, l.web, (((((l.address || ','::text) || l.city) || ','::text) || l.state)) Sort Key: (((((l.address || ','::text) || l.city) || ','::text) || l.state)) -> Foreign Scan on public.leads l (cost=0.00..30.00 rows=200 width=384) Output: l.first_name, l.last_name, l.company_name, l.address, l.city, l.county, l.state, l.zip, l.phone1, l.phone2, l.email, l.web, ((((l.address || ','::text) || l.city) || ','::text) || l.state) Foreign File: /tmp/us-500.csv Foreign File Size: 81485 -> Sort (cost=149.83..152.33 rows=1000 width=96) Output: w.lat, w.lng, w.address Sort Key: w.address -> Foreign Scan on public.www_fdw_geocoder_google w (cost=0.00..100.00 rows=1000 width=96) Output: w.lat, w.lng, w.address WWW API: Request (16 rows)
  • 30. Limitations (Gotcha) CREATE OR REPLACE FUNCTION google_geocode( OUT first_name text, OUT last_name text, OUT company_name text, OUT address text, OUT city text, OUT county text, OUT state text, OUT zip text, OUT phone1 text, OUT phone2 text, OUT email text, OUT web text, OUT lat text, OUT lng text) RETURNS SETOF RECORD AS $$ DECLARE r record; f_adr text; l_lat text; l_lng text; BEGIN FOR r IN SELECT * FROM leads LOOP f_adr := r.address || ',' || r.city || ',' || r.state; EXECUTE 'SELECT lat, lng FROM www_fdw_geocoder_google WHERE address = $1' INTO l_lat, l_lng USING f_adr; SELECT r.first_name, r.last_name, r.company_name, r.address, r.city, r.county, r.state, r.zip, r.phone1, r.phone2, r.email, r.web, l_lat, l_lng INTO first_name, last_name, company_name, address, city, county, state, zip, phone1, phone2, email, web, lat, lng; RETURN NEXT; END LOOP; END $$ LANGUAGE plpgsql;
  • 31. Writing a new FDW ● Might not need to write one if there is an HTTP interface ● Use the Blackhole FDW as a template – https://bitbucket.org/adunstan/blackhole_fdw
  • 32. Writing a new FDW Datum blackhole_fdw_handler(PG_FUNCTION_ARGS){ ... /* these are required */ fdwroutine->GetForeignRelSize = blackholeGetForeignRelSize; fdwroutine->GetForeignPaths = blackholeGetForeignPaths; fdwroutine->GetForeignPlan = blackholeGetForeignPlan; fdwroutine->BeginForeignScan = blackholeBeginForeignScan; fdwroutine->IterateForeignScan = blackholeIterateForeignScan; fdwroutine->ReScanForeignScan = blackholeReScanForeignScan; fdwroutine->EndForeignScan = blackholeEndForeignScan; /* remainder are optional - use NULL if not required */ /* support for insert / update / delete */ fdwroutine->AddForeignUpdateTargets = blackholeAddForeignUpdateTargets; fdwroutine->PlanForeignModify = blackholePlanForeignModify; fdwroutine->BeginForeignModify = blackholeBeginForeignModify; fdwroutine->ExecForeignInsert = blackholeExecForeignInsert; fdwroutine->ExecForeignUpdate = blackholeExecForeignUpdate; fdwroutine->ExecForeignDelete = blackholeExecForeignDelete; fdwroutine->EndForeignModify = blackholeEndForeignModify; /* support for EXPLAIN */ fdwroutine->ExplainForeignScan = blackholeExplainForeignScan; fdwroutine->ExplainForeignModify = blackholeExplainForeignModify; /* support for ANALYSE */ fdwroutine->AnalyzeForeignTable = blackholeAnalyzeForeignTable; PG_RETURN_POINTER(fdwroutine); }
  • 33. Future ● Even more Wrappers ● Check Constraints on Foreign Tables – Allows partitioning ● Joins – Custom Scan API – Probably will not be the way to do this, but progress being made
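
The aggregate limitation on slide 25 can often be sidestepped by doing the aggregation in a view on the remote server and pointing a foreign table at that view, since postgres_fdw accepts a view name in table_name. The following is a minimal sketch, not something from the talk: it assumes the postgres_server definition from slide 19, permission to create a view in the remote test2 database, and a hypothetical view name bird_strikes_by_airport.

```sql
-- On the REMOTE database (test2): aggregate where the data lives.
-- The view name is hypothetical, chosen for this illustration.
CREATE VIEW bird_strikes_by_airport AS
SELECT airport, count(*) AS strikes
FROM bird_strikes
GROUP BY airport;

-- On the LOCAL database: make sure the contrib module is installed.
CREATE EXTENSION IF NOT EXISTS postgres_fdw;

-- Map a foreign table onto the remote view instead of the base table.
-- count(*) returns bigint, so the local column is declared as bigint.
CREATE FOREIGN TABLE bird_strikes_by_airport (
    airport varchar,
    strikes bigint
)
SERVER postgres_server
OPTIONS (table_name 'bird_strikes_by_airport');

-- Only one row per airport crosses the network; the ORDER BY and
-- LIMIT still run locally, but over the already aggregated result.
SELECT airport, strikes
FROM bird_strikes_by_airport
ORDER BY strikes DESC
LIMIT 5;
```

Running explain verbose on the final query should show a Remote SQL line that selects from the view, so the GROUP BY work happens on the remote side. The trade-off is that each aggregate shape needs its own remote view and matching foreign table.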