Presentation on Sept 21, 2013 at FOSS4G 2013 in Nottingham (UK). Stetl, Streaming ETL, is a lightweight, geospatial ETL-framework written in Python, integrating transformation tools like GDAL/OGR, XSLT and PostGIS. Stetl targets ETL cases that involve XML and GML data, like INSPIRE data harmonization, but other transformations, even non-geospatial, can also be made. Stetl applies declarative programming: a configuration file specifies an ETL chain of input/filter/output modules. Stetl uses native calls to C-level libraries like libxml2 (via lxml) for speed. See more at http://stetl.org
Watch this presentation video recording on FOSSLC: http://www.fosslc.org/drupal/content/taming-rich-gml-stetl-lightweight-python-framework-geospatial-etl
Taming Rich GML with Stetl - FOSS4G 2013 Nottingham
1. Taming Rich GML with Stetl - A lightweight Python Framework for Geospatial ETL
Just van den Broecke
FOSS4G Nottingham 2013
Sept 21, 2013
www.justobjects.nl
2. About Me
Independent Open Source Geospatial Professional
Secretary OSGeo Dutch Local Chapter
Member of the Dutch OpenGeoGroep
Just van den Broecke
just@justobjects.nl
www.justobjects.nl
48. Example: XsltFilter Python
from util import Util, etree
from filter import Filter
from packet import FORMAT

log = Util.get_log("xsltfilter")

class XsltFilter(Filter):
    # Constructor
    def __init__(self, configdict, section):
        Filter.__init__(self, configdict, section, consumes=FORMAT.etree_doc, produces=FORMAT.etree_doc)
        self.xslt_file_path = self.cfg.get('script')
        self.xslt_file = open(self.xslt_file_path, 'r')
        # Parse XSLT file only once
        self.xslt_doc = etree.parse(self.xslt_file)
        self.xslt_obj = etree.XSLT(self.xslt_doc)
        self.xslt_file.close()

    def invoke(self, packet):
        if packet.data is None:
            return packet
        return self.transform(packet)

    def transform(self, packet):
        packet.data = self.xslt_obj(packet.data)
        log.info("XSLT Transform OK")
        return packet
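The parse-once pattern used by XsltFilter can be tried outside Stetl with lxml alone. The following is a minimal sketch, not Stetl code: the stylesheet, the sample input and the city_names helper are made up for illustration; only etree.fromstring and etree.XSLT are lxml calls.

```python
from lxml import etree

# Hypothetical stylesheet: turns <cities><city name="..."/></cities>
# into <names><name>...</name></names>.
XSLT_SOURCE = b"""<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/cities">
    <names>
      <xsl:for-each select="city">
        <name><xsl:value-of select="@name"/></name>
      </xsl:for-each>
    </names>
  </xsl:template>
</xsl:stylesheet>
"""

# Compile the stylesheet once; the resulting object is callable
# and can be reused for every incoming document.
transform = etree.XSLT(etree.fromstring(XSLT_SOURCE))


def city_names(xml_bytes):
    """Apply the precompiled transform and return the city names."""
    doc = etree.fromstring(xml_bytes)
    result = transform(doc)
    return [element.text for element in result.getroot().iter("name")]


if __name__ == "__main__":
    sample = b'<cities><city name="Amsterdam"/><city name="Nottingham"/></cities>'
    print(city_names(sample))
```

Compiling once and calling the transform per document is what keeps the per-packet cost low when many documents pass through the same filter.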
49. Your Own Components
Stetl concepts
Step 1 - Define Class
class MyFilter(Filter):
    # Constructor
    def __init__(self, configdict, section):
        Filter.__init__(self, configdict, section, consumes=FORMAT.etree_doc,
                        produces=FORMAT.etree_doc)

    def invoke(self, packet):
        log.info("CALLING MyFilter OK!!!!")
        return packet
Step 2 - Config Class
[etl]
chains = input_xml_file|my_filter|output_std

[input_xml_file]
class = inputs.fileinput.XmlFileInput
file_path = input/cities.xml

# My custom component
[my_filter]
class = my.myfilter.MyFilter

[output_std]
class = outputs.standardoutput.StandardXmlOutput
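How a chains entry and the class = settings could be turned into components can be pictured with the standard library alone. This is a rough sketch under assumptions, not Stetl's actual loader: parse_chains and load_class are hypothetical helper names, and the demo resolves a standard-library class because the Stetl modules named in the config are not importable here.

```python
import configparser
import importlib

CONFIG_TEXT = """
[etl]
chains = input_xml_file|my_filter|output_std

[input_xml_file]
class = inputs.fileinput.XmlFileInput
file_path = input/cities.xml

[my_filter]
class = my.myfilter.MyFilter

[output_std]
class = outputs.standardoutput.StandardXmlOutput
"""


def parse_chains(config):
    """Split the chains value: ',' separates chains, '|' separates components."""
    raw = config.get("etl", "chains")
    return [
        [name.strip() for name in chain.split("|")]
        for chain in raw.split(",")
        if chain.strip()
    ]


def load_class(dotted_path):
    """Import 'package.module.ClassName' and return the class object."""
    module_name, _, class_name = dotted_path.rpartition(".")
    module = importlib.import_module(module_name)
    return getattr(module, class_name)


config = configparser.ConfigParser(interpolation=None)
config.read_string(CONFIG_TEXT)

chains = parse_chains(config)
# Dotted class path configured for each component of the first chain.
class_paths = [config.get(section, "class") for section in chains[0]]

if __name__ == "__main__":
    print(chains)
    print(class_paths)
    # Stand-in for a Stetl class path, resolvable without Stetl installed.
    print(load_class("collections.OrderedDict"))
```

Each section name in the chain doubles as the lookup key for that component's settings, which is why a custom filter only needs a class and a config section.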
50. Data Structures
Stetl concepts
• Components exchange Packets
• Packet contains data and status
• Data formats, e.g.:
    xml_line_stream
    etree_doc
    etree_element (feature)
    etree_element_array
    string
    any
    ...
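The Packet idea can be sketched in a few lines of plain Python. Everything below is illustrative and assumed, not Stetl's implementation: the class names, the end_of_stream status flag and the compatibility rule (formats must match unless either side is "any") are hypothetical; only the format names "string", "etree_doc" and "any" come from the list above.

```python
FORMAT_ANY = "any"


class Packet:
    """Carries data plus status from one component to the next."""

    def __init__(self, data=None):
        self.data = data
        self.end_of_stream = False  # hypothetical status flag


class Component:
    consumes = FORMAT_ANY
    produces = FORMAT_ANY

    def invoke(self, packet):
        return packet


class StringInput(Component):
    produces = "string"

    def __init__(self, text):
        self.text = text

    def invoke(self, packet):
        packet.data = self.text
        packet.end_of_stream = True  # a single packet, then done
        return packet


class UpperCaseFilter(Component):
    consumes = "string"
    produces = "string"

    def invoke(self, packet):
        if packet.data is None:
            return packet
        packet.data = packet.data.upper()
        return packet


class CollectOutput(Component):
    def __init__(self):
        self.collected = []

    def invoke(self, packet):
        self.collected.append(packet.data)
        return packet


def formats_compatible(producer, consumer):
    """Assumed rule: formats match, or either side accepts 'any'."""
    return (FORMAT_ANY in (producer.produces, consumer.consumes)
            or producer.produces == consumer.consumes)


def run_chain(components, packet):
    """Check adjacent formats, then pass one packet through every component."""
    for upstream, downstream in zip(components, components[1:]):
        if not formats_compatible(upstream, downstream):
            raise ValueError("incompatible formats: %s -> %s"
                             % (upstream.produces, downstream.consumes))
    for component in components:
        packet = component.invoke(packet)
    return packet


if __name__ == "__main__":
    output = CollectOutput()
    run_chain([StringInput("gml"), UpperCaseFilter(), output], Packet())
    print(output.collected)
```

Declaring the consumed and produced format per component lets a chain be rejected before any data flows, instead of failing halfway through a large file.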
52. Cases - The Netherlands
• INSPIRE Download Services
    publish to deegree store (WFS)
    generate GML files (for Atom Feed)
• National GML Datasets
    GML to PostGIS (Top10NL, BGT)
53. Top10NL Extract
Parameter Substitution
[etl]
chains = input_sql_pre|schema_name_filter|output_postgres,
    input_big_gml_files|xml_assembler|transformer_xslt|output_ogr2ogr,
    input_sql_post|schema_name_filter|output_postgres

# Pre SQL file inputs to be executed
[input_sql_pre]
class = inputs.fileinput.StringFileInput
file_path = sql/drop-tables.sql,sql/create-schema.sql

# Post SQL file inputs to be executed
[input_sql_post]
class = inputs.fileinput.StringFileInput
file_path = sql/delete-duplicates.sql

# Generic filter to substitute Python-format string values like {schema} in string
[schema_name_filter]
class = filters.stringfilter.StringSubstitutionFilter
# format args {schema} is schema name
format_args = schema:{schema}

[output_postgres]
class = outputs.dboutput.PostgresDbOutput
database = {database}
host = {host}
port = {port}
user = {user}
password = {password}
schema = {schema}

# The source input file(s) from dir and produce gml:featureMember elements
[input_big_gml_files]
class = inputs.fileinput.XmlElementStreamerFileInput
file_path = {gml_files}
element_tags = featureMember
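The {name} placeholders follow Python's format-string syntax, so the substitution step can be approximated with str.format before the config is parsed. A minimal sketch, assuming that approach: placeholders, render_config and all argument values are hypothetical and only illustrate the idea; this is not how Stetl is documented to do it.

```python
import configparser
import string

CONFIG_TEMPLATE = """\
[output_postgres]
class = outputs.dboutput.PostgresDbOutput
database = {database}
host = {host}
port = {port}
user = {user}
password = {password}
schema = {schema}
"""

# Made-up argument values, for illustration only.
ARGS = {
    "database": "top10nl",
    "host": "localhost",
    "port": 5432,
    "user": "etl",
    "password": "secret",
    "schema": "top10test",
}


def placeholders(template):
    """Return the sorted names of all {name} fields in the template."""
    fields = string.Formatter().parse(template)
    return sorted({name for _, name, _, _ in fields if name})


def render_config(template, args):
    """Fill in the placeholders, then parse the result as an INI config."""
    missing = [name for name in placeholders(template) if name not in args]
    if missing:
        raise KeyError("missing config arguments: " + ", ".join(missing))
    # interpolation=None keeps '%' in values (e.g. passwords) literal.
    config = configparser.ConfigParser(interpolation=None)
    config.read_string(template.format(**args))
    return config


if __name__ == "__main__":
    config = render_config(CONFIG_TEMPLATE, ARGS)
    print(config.get("output_postgres", "schema"))
```

Checking for missing arguments up front gives one clear error listing every unset parameter, rather than a failure on the first placeholder encountered.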
57. Project Status - Sept 21, 2013
• v1.0.4 installable via PyPI
• Documentation on www.stetl.org
• Real world transforms done
• Seeking feedback, support and contributors