As more and more alternative data stores come into use, it becomes increasingly difficult to use and report on data scattered across them. PostgreSQL has a feature called Foreign Data Wrappers that allows external data sources to be queried from PostgreSQL as if they were standard tables. Using Foreign Data Wrappers, users can create a report that joins data residing in Oracle, Hadoop and MongoDB in a single query.
In this talk, we'll discuss how to set up a Foreign Data Wrapper for various data sources and the pros and cons of using them. We'll also discuss the growing ecosystem of Foreign Data Wrappers and a little about how to write one.
4. What is a federated database?
“A federated database system is a type of meta-database management system (DBMS), which transparently maps multiple autonomous database systems into a single federated database. The constituent databases are interconnected via a computer network and may be geographically decentralized. ... There is no actual data integration in the constituent disparate databases as a result of data federation.”
-Wikipedia
5. How does PostgreSQL do it?
● Uses Foreign Data Wrappers (FDW)
  – Wrappers used by SQL/MED to access remote data sources
● Used with SQL/MED
  – Management of External Data
  – New ANSI SQL 2003 extension
  – Standard way of handling remote objects in SQL databases
6. Types of Foreign Data Wrappers
● SQL
● NoSQL
● File
● Miscellaneous
● PostgreSQL
14. Hadoop Wrapper
CREATE SERVER hive_server FOREIGN DATA WRAPPER
hive_fdw OPTIONS (address '127.0.0.1', port '10000');
CREATE USER MAPPING
FOR PUBLIC SERVER hive_server;
CREATE FOREIGN TABLE order_line (
    ol_w_id         integer,
    ol_d_id         integer,
    ol_o_id         integer,
    ol_number       integer,
    ol_i_id         integer,
    ol_delivery_d   timestamp,
    ol_amount       decimal(6,2),
    ol_supply_w_id  integer,
    ol_quantity     decimal(2,0),
    ol_dist_info    varchar(24)
) SERVER hive_server OPTIONS (table 'order_line');
INSERT INTO item_sale_month
SELECT ol_i_id as i_id,
EXTRACT(YEAR FROM ol_delivery_d) as year,
EXTRACT(MONTH FROM ol_delivery_d) as month,
sum(ol_amount) as amount
FROM order_line
GROUP BY 1, 2, 3;
15. Hadoop Wrapper
● Hadoop foreign tables can also be writable
CREATE FOREIGN TABLE audit (
    audit_id    bigint,
    event_d     timestamp,
    "table"     varchar,   -- TABLE and USER are reserved words, so they must be quoted
    action      varchar,
    "user"      varchar
) SERVER hive_server
OPTIONS (table 'audit',
         flume_port '44444');
CREATE SEQUENCE audit_id_seq;   -- local sequence that supplies audit_id values
INSERT INTO audit
VALUES (nextval('audit_id_seq'), now(), 'users', 'SELECT', 'scott');
16. Hadoop Wrapper
● It also works with HBase tables
CREATE FOREIGN TABLE hive_hbase_table (
    key     varchar,
    value   varchar
) SERVER hive_server
OPTIONS (table 'hbase_table', hbase_address 'localhost',
         hbase_port '9090', hbase_mapping ':key,cf:val');
INSERT INTO hive_hbase_table VALUES ('key1', 'value1');
INSERT INTO hive_hbase_table VALUES ('key2', 'value2');
UPDATE hive_hbase_table SET value = 'update' WHERE key = 'key2';
DELETE FROM hive_hbase_table WHERE key='key1';
SELECT * from hive_hbase_table;
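The hbase_mapping option above appears to follow Hive's HBase column-mapping syntax (hbase.columns.mapping): assuming so, ':key,cf:val' maps the first foreign-table column to the HBase row key and the second to qualifier val in column family cf. The C sketch below parses such a string to make that mapping explicit. The HBaseColumn type and parse_hbase_mapping function are hypothetical illustrations, not code from the Hadoop wrapper or Hive.

```c
#include <assert.h>
#include <string.h>

/* One parsed entry of a (hypothetical) Hive-style HBase column mapping. */
typedef struct {
    int  is_row_key;     /* 1 if this column maps to the HBase row key */
    char family[64];     /* column family, e.g. "cf" */
    char qualifier[64];  /* column qualifier, e.g. "val" (may be empty) */
} HBaseColumn;

/*
 * Parse a mapping such as ":key,cf:val". Entry N describes column N of the
 * foreign table. Returns the number of entries, or -1 on malformed input.
 */
int parse_hbase_mapping(const char *mapping, HBaseColumn *cols, int max_cols)
{
    char buf[256];
    int n = 0;

    assert(mapping != NULL && cols != NULL);
    if (strlen(mapping) >= sizeof(buf))
        return -1;
    strcpy(buf, mapping);

    for (char *tok = strtok(buf, ","); tok != NULL; tok = strtok(NULL, ",")) {
        if (n >= max_cols)
            return -1;
        HBaseColumn *c = &cols[n];
        memset(c, 0, sizeof(*c));

        if (strcmp(tok, ":key") == 0) {
            c->is_row_key = 1;                 /* ":key" means the row key */
        } else {
            char *colon = strchr(tok, ':');
            if (colon == NULL || colon == tok)
                return -1;                     /* expect "family:qualifier" */
            size_t flen = (size_t)(colon - tok);
            if (flen >= sizeof(c->family) ||
                strlen(colon + 1) >= sizeof(c->qualifier))
                return -1;
            memcpy(c->family, tok, flen);
            c->family[flen] = '\0';
            strcpy(c->qualifier, colon + 1);
        }
        n++;
    }
    return n;
}
```

For ':key,cf:val' the function returns 2, with entry 0 flagged as the row key and entry 1 holding family "cf" and qualifier "val", matching the key and value columns declared above.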
17. WWW Wrapper
CREATE SERVER www_fdw_server_google_search FOREIGN DATA WRAPPER www_fdw
OPTIONS (uri 'https://ajax.googleapis.com/ajax/services/search/web?v=1.0');
CREATE USER MAPPING FOR current_user SERVER www_fdw_server_google_search;
CREATE FOREIGN TABLE www_fdw_google_search (
q text, GsearchResultClass text, unescapedUrl text, url text,
visibleUrl text, cacheUrl text, title text, titleNoFormatting text, content text
) SERVER www_fdw_server_google_search;
select url,substring(title,1,25)||'...',substring(content,1,25)||'...'
from www_fdw_google_search where q='postgresql fdw';
                             url                             |           ?column?           |           ?column?
-------------------------------------------------------------+------------------------------+------------------------------
 http://wiki.postgresql.org/wiki/Foreign_data_wrappers        | Foreign data wrappers - <... | Jan 24, 2014 <b>...</b> 1...
 http://www.postgresql.org/docs/9.3/static/postgres-fdw.html | <b>PostgreSQL</b>: Docume... | F.31.1. <b>FDW</b> Option...
 http://www.postgresql.org/docs/9.3/static/fdwhandler.html   | <b>PostgreSQL</b>: Docume... | Foreign Data Wrapper Call...
 http://www.craigkerstiens.com/2013/08/05/a-look-at-FDWs/    | A look at Foreign Data Wr... | Aug 5, 2013 <b>...</b> An...
(4 rows)
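In the query above, the WWW wrapper passes the q = 'postgresql fdw' condition to the web service as a request parameter rather than filtering rows locally. The C sketch below shows one plausible way to build that request URL, assuming the condition is appended as a percent-encoded query-string parameter. build_request_url and url_encode are hypothetical helpers, not www_fdw's actual code, and the wrapper's real encoding rules may differ.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Percent-encode value; RFC 3986 unreserved characters pass through. */
static int url_encode(const char *value, char *out, size_t out_size)
{
    static const char hex[] = "0123456789ABCDEF";
    size_t j = 0;

    for (const unsigned char *p = (const unsigned char *)value; *p; p++) {
        if (isalnum(*p) || *p == '-' || *p == '_' || *p == '.' || *p == '~') {
            if (j + 1 >= out_size) return -1;   /* room for char + NUL */
            out[j++] = (char)*p;
        } else {
            if (j + 3 >= out_size) return -1;   /* room for %XX + NUL */
            out[j++] = '%';
            out[j++] = hex[*p >> 4];
            out[j++] = hex[*p & 0x0F];
        }
    }
    out[j] = '\0';
    return 0;
}

/* Append "column=value" to the server uri, using '&' if it already has a query. */
int build_request_url(const char *uri, const char *column, const char *value,
                      char *out, size_t out_size)
{
    char encoded[512];

    assert(uri != NULL && column != NULL && value != NULL && out != NULL);
    if (url_encode(value, encoded, sizeof(encoded)) != 0)
        return -1;
    const char *sep = strchr(uri, '?') ? "&" : "?";
    int n = snprintf(out, out_size, "%s%s%s=%s", uri, sep, column, encoded);
    return (n < 0 || (size_t)n >= out_size) ? -1 : 0;
}
```

With the server uri from the example, the condition q = 'postgresql fdw' would become a request ending in `?v=1.0&q=postgresql%20fdw`. Because the server uri already carries a query string, the sketch chooses `&` as the separator.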
18. PostgreSQL Wrapper
● The most functional FDW by far
● Replaces much of the functionality of dblink
● Shipped as a contrib module
20. PostgreSQL Wrapper
● Only requests columns that are needed
test=# explain verbose select airport, flight_date from bird_strikes;
                                  QUERY PLAN
------------------------------------------------------------------------------
 Foreign Scan on public.bird_strikes  (cost=100.00..148.40 rows=1280 width=40)
   Output: airport, flight_date
   Remote SQL: SELECT airport, flight_date FROM public.bird_strikes
(3 rows)
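The Remote SQL line shows that only the two columns the query references are fetched. The C sketch below builds a remote SELECT from a list of needed columns to illustrate that projection step. deparse_select is a hypothetical helper, not postgres_fdw's deparser; for brevity it skips identifier quoting, which a real implementation must handle, and it emits NULL when no columns are needed so the statement stays valid.

```c
#include <assert.h>
#include <string.h>

/* Append s to out (current length *len); fail if it would not fit. */
static int append(char *out, size_t out_size, size_t *len, const char *s)
{
    size_t slen = strlen(s);
    if (*len + slen + 1 > out_size)
        return -1;
    memcpy(out + *len, s, slen + 1);   /* copies the terminating NUL too */
    *len += slen;
    return 0;
}

/*
 * Build "SELECT c1, c2, ... FROM relname" listing only the needed columns.
 * Returns 0 on success, -1 if out is too small.
 */
int deparse_select(const char *relname, const char *const *cols, int ncols,
                   char *out, size_t out_size)
{
    size_t len = 0;

    assert(relname != NULL && out != NULL && out_size > 0);
    out[0] = '\0';
    if (append(out, out_size, &len, "SELECT ") != 0)
        return -1;
    if (ncols == 0 && append(out, out_size, &len, "NULL") != 0)
        return -1;                      /* keep the target list non-empty */
    for (int i = 0; i < ncols; i++) {
        if (i > 0 && append(out, out_size, &len, ", ") != 0)
            return -1;
        if (append(out, out_size, &len, cols[i]) != 0)
            return -1;
    }
    if (append(out, out_size, &len, " FROM ") != 0)
        return -1;
    return append(out, out_size, &len, relname);
}
```

Called with the columns airport and flight_date on public.bird_strikes, it produces the same text as the Remote SQL line in the plan above.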
21. PostgreSQL Wrapper
● Sends a WHERE clause
test=# explain verbose select airport, flight_date from bird_strikes where flight_date > '2011-01-01';
                                  QUERY PLAN
-----------------------------------------------------------------
 Foreign Scan on public.bird_strikes  (cost=100.00..134.54 rows=427 width=40)
   Output: airport, flight_date
   Remote SQL: SELECT airport, flight_date FROM public.bird_strikes WHERE ((flight_date > '2011-01-01 00:00:00'::timestamp without time zone))
(3 rows)
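The pushed-down condition appears in the Remote SQL wrapped in double parentheses, and several conditions are joined with AND. The C sketch below appends already-deparsed conditions to a remote query in that textual shape. append_where is a hypothetical helper that only mirrors the format visible in the plan output; it is not postgres_fdw's code, and on failure the buffer contents are left unspecified.

```c
#include <assert.h>
#include <string.h>

/* Append s to out (current length *len); fail if it would not fit. */
static int append(char *out, size_t out_size, size_t *len, const char *s)
{
    size_t slen = strlen(s);
    if (*len + slen + 1 > out_size)
        return -1;
    memcpy(out + *len, s, slen + 1);
    *len += slen;
    return 0;
}

/*
 * Append " WHERE ((q1)) AND ((q2)) ..." to sql. Each qual is SQL text that
 * has already been judged safe to run remotely. Returns 0 or -1.
 */
int append_where(char *sql, size_t size, const char *const *quals, int nquals)
{
    size_t len;

    assert(sql != NULL && size > 0);
    len = strlen(sql);
    for (int i = 0; i < nquals; i++) {
        if (append(sql, size, &len, i == 0 ? " WHERE ((" : " AND ((") != 0)
            return -1;
        if (append(sql, size, &len, quals[i]) != 0)
            return -1;
        if (append(sql, size, &len, "))") != 0)
            return -1;
    }
    return 0;
}
```

Passing the flight_date condition alone reproduces the Remote SQL above; adding a second condition such as length(airport) < 10 yields the two-condition form that appears in the next plan.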
22. PostgreSQL Wrapper
● Sends built-in immutable functions
test=# explain verbose select airport, flight_date from bird_strikes where flight_date > '2011-01-01' and length(airport) < 10;
                                  QUERY PLAN
------------------------------------------------------------------------------
 Foreign Scan on public.bird_strikes  (cost=100.00..135.24 rows=142 width=40)
   Output: airport, flight_date
   Remote SQL: SELECT airport, flight_date FROM public.bird_strikes WHERE ((flight_date > '2011-01-01 00:00:00'::timestamp without time zone)) AND ((length(airport) < 10))
(3 rows)
30. Limitations (Gotcha)
CREATE OR REPLACE FUNCTION google_geocode(
OUT first_name text, OUT last_name text, OUT company_name text, OUT address text, OUT city text, OUT county text,
OUT state text, OUT zip text, OUT phone1 text, OUT phone2 text, OUT email text, OUT web text, OUT lat text, OUT lng text)
RETURNS SETOF RECORD AS $$
DECLARE
    r       record;
    f_adr   text;
    l_lat   text;
    l_lng   text;
BEGIN
    FOR r IN SELECT * FROM leads LOOP
        f_adr := r.address || ',' || r.city || ',' || r.state;
        EXECUTE 'SELECT lat, lng FROM www_fdw_geocoder_google WHERE address = $1'
            INTO l_lat, l_lng
            USING f_adr;
        SELECT
            r.first_name, r.last_name, r.company_name, r.address, r.city, r.county, r.state, r.zip,
            r.phone1, r.phone2, r.email, r.web, l_lat, l_lng
        INTO first_name, last_name, company_name, address, city, county, state, zip,
            phone1, phone2, email, web, lat, lng;
        RETURN NEXT;
    END LOOP;
END $$ LANGUAGE plpgsql;
31. Writing a new FDW
● You might not need to write one if the data source has an HTTP interface
● Use the Blackhole FDW as a template
  – https://bitbucket.org/adunstan/blackhole_fdw
32. Writing a new FDW
Datum
blackhole_fdw_handler(PG_FUNCTION_ARGS)
{
    ...
    /* these are required */
    fdwroutine->GetForeignRelSize = blackholeGetForeignRelSize;
    fdwroutine->GetForeignPaths = blackholeGetForeignPaths;
    fdwroutine->GetForeignPlan = blackholeGetForeignPlan;
    fdwroutine->BeginForeignScan = blackholeBeginForeignScan;
    fdwroutine->IterateForeignScan = blackholeIterateForeignScan;
    fdwroutine->ReScanForeignScan = blackholeReScanForeignScan;
    fdwroutine->EndForeignScan = blackholeEndForeignScan;

    /* remainder are optional - use NULL if not required */
    /* support for insert / update / delete */
    fdwroutine->AddForeignUpdateTargets = blackholeAddForeignUpdateTargets;
    fdwroutine->PlanForeignModify = blackholePlanForeignModify;
    fdwroutine->BeginForeignModify = blackholeBeginForeignModify;
    fdwroutine->ExecForeignInsert = blackholeExecForeignInsert;
    fdwroutine->ExecForeignUpdate = blackholeExecForeignUpdate;
    fdwroutine->ExecForeignDelete = blackholeExecForeignDelete;
    fdwroutine->EndForeignModify = blackholeEndForeignModify;

    /* support for EXPLAIN */
    fdwroutine->ExplainForeignScan = blackholeExplainForeignScan;
    fdwroutine->ExplainForeignModify = blackholeExplainForeignModify;

    /* support for ANALYSE */
    fdwroutine->AnalyzeForeignTable = blackholeAnalyzeForeignTable;

    PG_RETURN_POINTER(fdwroutine);
}
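Of the required callbacks, BeginForeignScan, IterateForeignScan and EndForeignScan drive execution: the executor calls Begin once, calls Iterate until it signals that no rows remain (in PostgreSQL, by returning an empty tuple slot), then calls End. The toy C model below shows that contract and why the Blackhole template returns nothing. ToyFdwRoutine, ScanState and run_scan are hypothetical simplifications that do not use PostgreSQL's headers or its real FdwRoutine struct, and the planner and ReScan callbacks are left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Per-scan state; a real FDW would keep connections, cursors, etc. here. */
typedef struct {
    int next_row;    /* index of the next row to return */
    int total_rows;  /* number of rows this toy source holds */
} ScanState;

/* Hypothetical stand-in for the scan part of FdwRoutine. */
typedef struct {
    void (*begin_scan)(ScanState *state);
    bool (*iterate_scan)(ScanState *state, int *row_out); /* false = done */
    void (*end_scan)(ScanState *state);
} ToyFdwRoutine;

/* Blackhole: accepts the scan but never produces a row. */
static void blackhole_begin(ScanState *s) { s->next_row = 0; s->total_rows = 0; }
static bool blackhole_iterate(ScanState *s, int *r) { (void)s; (void)r; return false; }
static void blackhole_end(ScanState *s) { (void)s; }

/* Counter: a toy source that returns the rows 0, 1 and 2. */
static void counter_begin(ScanState *s) { s->next_row = 0; s->total_rows = 3; }
static bool counter_iterate(ScanState *s, int *r)
{
    if (s->next_row >= s->total_rows)
        return false;
    *r = s->next_row++;
    return true;
}
static void counter_end(ScanState *s) { (void)s; }

const ToyFdwRoutine blackhole_routine = { blackhole_begin, blackhole_iterate, blackhole_end };
const ToyFdwRoutine counter_routine = { counter_begin, counter_iterate, counter_end };

/* Drive a scan like the executor: begin, iterate until done, end.
 * Returns the number of rows produced and stores their sum in *sum. */
int run_scan(const ToyFdwRoutine *routine, int *sum)
{
    ScanState state;
    int row, count = 0;

    *sum = 0;
    routine->begin_scan(&state);
    while (routine->iterate_scan(&state, &row)) {
        count++;
        *sum += row;
    }
    routine->end_scan(&state);
    return count;
}
```

Running the blackhole routine produces zero rows, while the counter routine produces three. Starting from a template that does nothing means each callback can be filled in and tested one at a time.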
33. Future
● Even more Wrappers
● Check Constraints on Foreign Tables
  – Allows partitioning
● Joins
  – Custom Scan API
  – Probably will not be the way to do this, but progress is being made