Easy integration of geospatial 
data with the open source 
spatial ETL tool GeoKettle

1st Rendez-vous OSGeo-Quebec

Dr. Thierry Badard, CTO

Spatialytics inc.
Quebec, Canada
tbadard@spatialytics.com
http://www.spatialytics.com

http://www.spatialytics.org 

                               Saguenay, Quebec, Canada – June 16, 2010
What is GeoKettle?
   It is part of the geospatial BI software stack initially developed
    by the GeoSOA research group at Laval University in Quebec:
       GeoKettle
       GeoMondrian
       SOLAPLayers
   These tools are now developed and supported by Spatialytics:
       http://www.spatialytics.org (open source community)
       http://www.spatialytics.com (professional support, training, ...)
   OK, but … what is geospatial BI? ;-)
As you probably know …
   Business Intelligence applications are usually used to
    better understand historical, current and future aspects
    of business operations in a company.
   These applications typically offer ways to mine database-
    and spreadsheet-centric data, and to produce graphical,
    table-based and other types of analytics on
    business operations.
   They support the decision process and help users make
    better-informed decisions!
Data visualization to support decisions …
   BI applications rely on an architecture of robust components and
    applications:
       ETL tools & data warehousing
       On-line Analytical Processing (OLAP) servers and clients
       Reporting tools & dashboards
       Data mining
So, an ETL tool is …
   A type of software used to populate databases or data
    warehouses from heterogeneous data sources.
   ETL stands for:
       Extract – Extract data from the data sources
       Transform – Transform the data in order to correct errors,
        cleanse it, change its structure, make it comply with
        defined standards, etc.
       Load – Load the transformed data into the target DBMS
   An ETL tool should manage both the insertion of new data and
    the updating of existing data.
   It should be able to perform transformations from:
       An OLTP system to another OLTP system
       An OLTP system to an analytical data warehouse
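
The three ETL stages described above can be sketched in a few lines of Python. This is only an illustration built on the standard library (csv and sqlite3), not how GeoKettle works internally; the sample rows, the `site` table and the function names are all hypothetical.

```python
import csv
import io
import sqlite3

# Hypothetical source data; a real source would be a file or another DBMS.
SOURCE_CSV = """id,name,value
1, alpha ,10
2,beta,not available
3,  gamma,30
"""

def extract(text):
    """Extract: read raw rows from a CSV source."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: trim and normalise names, drop rows with an invalid value."""
    cleaned = []
    for row in rows:
        try:
            value = int(row["value"])
        except ValueError:
            continue  # cleansing: skip rows that cannot be corrected
        cleaned.append((int(row["id"]), row["name"].strip().title(), value))
    return cleaned

def load(conn, rows):
    """Load: insert new rows and update existing ones, keyed on id."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS site (id INTEGER PRIMARY KEY, name TEXT, value INTEGER)"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO site (id, name, value) VALUES (?, ?, ?)", rows
    )
    conn.commit()

conn = sqlite3.connect(":memory:")
load(conn, transform(extract(SOURCE_CSV)))
# Loading a corrected record a second time updates it instead of duplicating it.
load(conn, [(1, "Alpha Updated", 10)])
result = conn.execute("SELECT id, name, value FROM site ORDER BY id").fetchall()
```

The second call to `load` shows the "insertion of new data and updating of existing data" requirement: the row with id 1 is replaced, not duplicated.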
Why use an ETL tool?

   Automation of complex and repetitive data
    processing without writing any custom code
   Conversion between various data formats
   Migration of data from one DBMS to another
   Feeding data into various DBMSs
   Population of analytical data warehouses for
    decision support purposes
   etc.
GeoKettle
   GeoKettle is a "spatially-enabled" version of
    Pentaho Data Integration (Kettle)
   Kettle is a metadata-driven ETL tool with direct execution
    of transformations
       No intermediate code generation!
   Support for several DBMSs and file formats
       DBMS support: MySQL, PostgreSQL, Oracle, DB2, MS SQL
        Server, ... (37 in total)
       Read/write support for various data file formats: text, Excel,
        Access, DBF, XML, ...
   Numerous transformation steps
   Support for methods for updating data warehouses (DW)
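
To make "metadata-driven with direct execution" more concrete, here is a minimal Python sketch of the general idea: the transformation is stored as plain data and an engine interprets it directly, with no code generated in between. The step names, the `TRANSFORMATION` structure and the `run` function are hypothetical and do not reflect Kettle's actual step types or file format.

```python
# Hypothetical step implementations; the names are illustrative, not Kettle step types.
def filter_rows(rows, field, minimum):
    return [r for r in rows if r[field] >= minimum]

def add_constant(rows, field, value):
    return [{**r, field: value} for r in rows]

def sort_rows(rows, field):
    return sorted(rows, key=lambda r: r[field])

STEP_TYPES = {
    "filter_rows": filter_rows,
    "add_constant": add_constant,
    "sort_rows": sort_rows,
}

# The transformation is plain data (metadata). This list only mimics the idea
# of a stored transformation description; it is not Kettle's real format.
TRANSFORMATION = [
    {"type": "filter_rows", "options": {"field": "area", "minimum": 10}},
    {"type": "add_constant", "options": {"field": "source", "value": "demo"}},
    {"type": "sort_rows", "options": {"field": "name"}},
]

def run(transformation, rows):
    """Interpret the metadata directly: look up each step and apply it."""
    for step in transformation:
        rows = STEP_TYPES[step["type"]](rows, **step["options"])
    return rows

rows = [{"name": "b", "area": 12}, {"name": "a", "area": 40}, {"name": "c", "area": 3}]
output = run(TRANSFORMATION, rows)
```

Because the pipeline is data, changing the processing means editing the description, not regenerating or recompiling code.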
GeoKettle
   GeoKettle provides a true and consistent integration of the
    spatial component
       All steps provided by Kettle are able to deal with geospatial data types
       Some dedicated geospatial steps have been added
   First release in May 2008: 2.5.2-20080531
   Current stable version: 3.2.0-r188-20090706
   Released under the LGPL at http://www.geokettle.org
   Used in different organisations and countries:
       Ministries, banks, insurance companies, integrators, …
       E.g. GeoETL from Inova is in fact GeoKettle! :-)
   A growing community of users and developers
GeoKettle

   Transformations vs. jobs:
     Running in parallel vs. running sequentially

   Both can be stored in a central repository (database)
     But each transformation or job can also be saved in a simple
      XML file!
   Offers different interfaces:
     Spoon: GUI for editing transformations and jobs

     Pan: command line interface for running transformations

     Kitchen: command line interface for running jobs

     Carte: Web service for the remote execution of transformations
      and jobs
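
The difference between transformations (parallel) and jobs (sequential) can be illustrated with a small Python sketch. It assumes, as in Kettle, that the steps of a transformation run concurrently and stream rows to one another, while the entries of a job run one after another. The step functions, queues and job entries below are hypothetical and use only the standard library.

```python
import queue
import threading

DONE = object()  # sentinel marking the end of the row stream

def read_step(out_q):
    for value in [1, 2, 3, 4]:
        out_q.put(value)
    out_q.put(DONE)

def double_step(in_q, out_q):
    while True:
        row = in_q.get()
        if row is DONE:
            out_q.put(DONE)
            return
        out_q.put(row * 2)

def collect_step(in_q, results):
    while True:
        row = in_q.get()
        if row is DONE:
            return
        results.append(row)

def run_transformation():
    """All steps start together and stream rows to each other."""
    q1, q2, results = queue.Queue(), queue.Queue(), []
    threads = [
        threading.Thread(target=read_step, args=(q1,)),
        threading.Thread(target=double_step, args=(q1, q2)),
        threading.Thread(target=collect_step, args=(q2, results)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results

def run_job(entries):
    """Entries run strictly one after another; stop at the first failure."""
    log = []
    for name, entry in entries:
        ok = entry()
        log.append((name, ok))
        if not ok:
            break
    return log

rows = run_transformation()
job_log = run_job([
    ("load data", lambda: len(rows) == 4),
    ("send report", lambda: True),
])
```

In this sketch a job is the natural place to orchestrate whole transformations, since each entry only starts once the previous one has finished.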
GeoKettle - Spoon
GeoKettle
   Provides support for:
     Handling geometry data types (based on JTS)

     Accessing Geometry objects in JavaScript
      This lets users define custom transformation steps
       (“Modified JavaScript Value” step)
     Topological predicates (intersects, crosses, etc.)

     SRS definition and transformations

     Input / Output with some spatial DBMS
          Native support for Oracle, PostGIS and MySQL
          MS SQL Server 2008 and IBM DB2 can be used, but this requires
           some workarounds
     GIS file Input / Output: Shapefile, GML 3, KML 2.2 and OGR
      support (~33 vector data formats and DBMS)
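
As a rough illustration of what an SRS transformation and a topological predicate compute, here is a short Python sketch. It does not use JTS or any GeoKettle API: the reprojection is the standard spherical formula from WGS 84 longitude/latitude (EPSG:4326) to Web Mercator (EPSG:3857), chosen here only as an example, and the predicate is a simple ray-casting point-in-polygon test. Function names and sample coordinates are hypothetical.

```python
import math

EARTH_RADIUS = 6378137.0  # WGS 84 semi-major axis in metres, as used by EPSG:3857

def wgs84_to_web_mercator(lon, lat):
    """Reproject longitude/latitude in degrees (EPSG:4326) to metres (EPSG:3857)."""
    x = EARTH_RADIUS * math.radians(lon)
    y = EARTH_RADIUS * math.log(math.tan(math.pi / 4 + math.radians(lat) / 2))
    return x, y

def point_in_polygon(point, ring):
    """Ray-casting test: True when the point lies inside the closed ring."""
    px, py = point
    inside = False
    for (x1, y1), (x2, y2) in zip(ring, ring[1:]):
        # Only edges that straddle the horizontal line through the point count.
        if (y1 > py) != (y2 > py):
            x_cross = x1 + (py - y1) * (x2 - x1) / (y2 - y1)
            if px < x_cross:
                inside = not inside
    return inside

# Hypothetical square; the first vertex is repeated to close the ring.
square = [(0, 0), (10, 0), (10, 10), (0, 10), (0, 0)]
inside_point = point_in_polygon((5, 5), square)
outside_point = point_in_polygon((15, 5), square)
origin_x, origin_y = wgs84_to_web_mercator(0.0, 0.0)
```

A library such as JTS handles many more cases (points on boundaries, holes, multi-geometries), which is why a spatial ETL tool builds on it instead of hand-written tests like this one.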
GeoKettle
   GeoKettle releases are aligned with those of
    Pentaho Data Integration (Kettle).
       GeoKettle therefore benefits from all new features provided
        by PDI (Kettle).
   Kettle is natively designed to be deployed in cluster
    and web service environments.
       This makes GeoKettle well suited to being
        deployed as a service (SaaS) in cloud computing
        environments such as those provided by Amazon EC2.
       It then enables the scalable, distributed and on-demand
        processing of large and complex volumes of geospatial data
        in minutes for critical applications, without requiring a
        company to invest in an expensive IT infrastructure of
        servers, networks and software.
GeoKettle – Requirements and install

   Very simple installation procedure
   All you need is a Java Runtime Environment
     Version 5 or higher

   Just unzip the binary archive of GeoKettle ...
   And you are ready to go!
     Run spoon.sh (UNIX/Linux/Mac) or spoon.bat (Windows)

   Need help? Please visit our wiki:
      http://wiki.spatialytics.org
GeoKettle - Demo
GeoKettle
   Upcoming features:
       Cartographic preview (should be available soon)
       Data matching and conflation steps to allow geometric data
        cleansing and comparison of geospatial datasets
        (Google Summer of Code project in progress)
       Read/write support for other DBMS, GIS file formats and services
            LAS (LiDAR), ...
            Native support for MS SQL Server 2008, ...
            WFS-T, Sensor Web (TML, SensorML, SOS, ...), ...
       A “Spatial analysis” step with a GUI
       New steps: social media (Twitter, ...), OSM, generalisation, ...
       Raster support: a plugin is being developed to integrate all
        capabilities provided by the Sextante library (BeETLe)
Questions?
   Thanks for your attention and do not hesitate to ask for more demos!
   Contact:
    Dr. Thierry Badard, CTO
    Spatialytics inc.
    Quebec, Canada
    Email: tbadard@spatialytics.com  
    Web:  http://www.spatialytics.org 
              http://www.spatialytics.com  
    Twitter: tbadard & spatialytics

    http://www.geokettle.org       Twitter: geokettle
    http://www.geo-mondrian.org    Twitter: geomondrian
    http://www.solaplayers.org     Twitter: solaplayers
