SlideShare une entreprise Scribd logo
1  sur  25
Moving and Transforming Data with

Pentaho Data Integration
a.k.a. KETTLE
Welcome!
• Software engineer
• rbouman@pentaho.com
Pentaho
• Business Intelligence & Analytics
• Full stack
• Open core
–

GPLv2, Apache 2.0

–

Enterprise and OEM licenses

• Java-based
• Web front-ends
The Pentaho Stack
•
•
•
•
•
•
•
•
•
•

Data Integration / ETL
Big Data / NoSQL
Data Modeling
Reporting
OLAP / Analysis
Data Visualization
Dashboarding
Data Mining / Predictive Analysis
(Mobile) Delivery
Bursting, Scheduling, Self Service
Full Stack BI & BA

Sources

Reports

OLAP

Blending

Visualization
Data Warehouse
ETL

T

T

T

T

Models
D
D

Instant Analytics

Dashboards

D

F
D

Mining
D
Information:

A well-prepared, well-presented meal
Extraction:

Catching the right data and hauling it in
Transformation

Disgusting job of validating & cleaning data
Loading:

Store manageable units for later use
Loading:

Store manageable units for later use
Pentaho Data Integration
• Kettle
– Extract,

Transform, Load

– Blending
– Instant Analytics

• Changing input to desired output
Kettle Architecture
Data Integration Engine
Job
Engine

Call

Job

Transformation
Engine

Transformation

Tools and Utilities
Launch:
Kitchen, Carte

Launch:
Pan

Develop: Spoon

Repository
(RDBMS)
.kjb

.ktr
Jobs & Transformations
• Jobs
– Synchronous

workflow of job entries (tasks)

• Transformations
– Stepwise

parallel & asynchronous

processing of a recordstream
• Distributed
Sources and Destinations
•
•
•
•
•
•
•

RDBMS (> 40)
NoSQL / Big Data
OLAP (Mondrian, Palo, XML/A)
Web (REST, SOAP, XML, JSON .)
Files (CSV, Fixed, Excel …)
ERP (SAP, Salesforce, OpenERP)
...way Too Many To Mention™!
Transformations
•
•
•
•
•
•
•
•

String & Date manipulation
Data Validation / Business Rules
Lookup / Join
Calculation, Statistics
Cryptography
Decisions, Flow control
Scripting
...> 150, excluding plugins
Demo
• NLUUG Program Webpage
Input

Output
dim_room

ETL

dim_track

fact_talk

dim_speaker

dim_company
Demo Transformation
Business Model
• Open core
– Majority
– Give

is open source

and take

• Enterprise Edition
– Extra

features, Support, Services
Community
• Plugins
– Pentaho

Marketplace

• Code contributions
• Applications & Solutions
• Tutorials, Support
Marketplace
Community Meetups
Pentaho Software
• Community Edition Binaries
–

community.pentaho.com (release)

ci.pentaho.com (development)
• Source code
–

–

github.com/pentaho

• Enterprise Edition Evaluation
–

pentaho.com/testdrive
Resources
• Online documentation
– infocenter.pentaho.com
– wiki.pentaho.com

(manual)

(community wiki)

• Issue tracker
– jira.pentaho.com

• Community Support
– forums.pentaho.com
– Freenode

IRC: ##pentaho
Books
Thank You
Join the conversation. You can find us on:
blog.pentaho.com
@Pentaho
Facebook.com/Pentaho
Pentaho Business Analytics
25

Contenu connexe

Tendances

Tendances (20)

Introduction to Data Engineering
Introduction to Data EngineeringIntroduction to Data Engineering
Introduction to Data Engineering
 
Data analytics
Data analyticsData analytics
Data analytics
 
Power BI Architecture
Power BI ArchitecturePower BI Architecture
Power BI Architecture
 
Big data
Big dataBig data
Big data
 
Introduction To Pentaho
Introduction To PentahoIntroduction To Pentaho
Introduction To Pentaho
 
Apache Hadoop
Apache HadoopApache Hadoop
Apache Hadoop
 
Introduction to Data Analytics
Introduction to Data AnalyticsIntroduction to Data Analytics
Introduction to Data Analytics
 
Introduction to Looker Studio.pptx
Introduction to Looker Studio.pptxIntroduction to Looker Studio.pptx
Introduction to Looker Studio.pptx
 
Introduction to Data Analytics
Introduction to Data AnalyticsIntroduction to Data Analytics
Introduction to Data Analytics
 
Data Engineering Basics
Data Engineering BasicsData Engineering Basics
Data Engineering Basics
 
Oracle Analytics Cloud
Oracle Analytics CloudOracle Analytics Cloud
Oracle Analytics Cloud
 
[Webinar Deck] Google Data Studio for Mastering the Art of Data Visualizations
[Webinar Deck] Google Data Studio for Mastering the Art of Data Visualizations[Webinar Deck] Google Data Studio for Mastering the Art of Data Visualizations
[Webinar Deck] Google Data Studio for Mastering the Art of Data Visualizations
 
Azure data platform overview
Azure data platform overviewAzure data platform overview
Azure data platform overview
 
Building a data warehouse with Pentaho and Docker
Building a data warehouse with Pentaho and DockerBuilding a data warehouse with Pentaho and Docker
Building a data warehouse with Pentaho and Docker
 
Web Intelligence - Tutorial1
Web Intelligence - Tutorial1Web Intelligence - Tutorial1
Web Intelligence - Tutorial1
 
Introducción a hadoop
Introducción a hadoopIntroducción a hadoop
Introducción a hadoop
 
Data analytics - Alteryx Spotlight.pdf
Data analytics - Alteryx Spotlight.pdfData analytics - Alteryx Spotlight.pdf
Data analytics - Alteryx Spotlight.pdf
 
Data Lake Architecture
Data Lake ArchitectureData Lake Architecture
Data Lake Architecture
 
Oracle Architecture
Oracle ArchitectureOracle Architecture
Oracle Architecture
 
Informatica Powercenter Architecture
Informatica Powercenter ArchitectureInformatica Powercenter Architecture
Informatica Powercenter Architecture
 

En vedette

Pentaho bi suite overview presentation
Pentaho bi suite overview   presentationPentaho bi suite overview   presentation
Pentaho bi suite overview presentation
nvvrajesh
 
Advanced ETL2 Pentaho
Advanced ETL2  Pentaho Advanced ETL2  Pentaho
Advanced ETL2 Pentaho
Sunny U Okoro
 
Elementos ETL - Kettle Pentaho
Elementos ETL - Kettle Pentaho Elementos ETL - Kettle Pentaho
Elementos ETL - Kettle Pentaho
valex_haro
 
Load Balancing Apps in Docker Swarm with NGINX
Load Balancing Apps in Docker Swarm with NGINXLoad Balancing Apps in Docker Swarm with NGINX
Load Balancing Apps in Docker Swarm with NGINX
NGINX, Inc.
 

En vedette (20)

Pentaho bi suite overview presentation
Pentaho bi suite overview   presentationPentaho bi suite overview   presentation
Pentaho bi suite overview presentation
 
Business Intelligence and Big Data Analytics with Pentaho
Business Intelligence and Big Data Analytics with Pentaho Business Intelligence and Big Data Analytics with Pentaho
Business Intelligence and Big Data Analytics with Pentaho
 
Advanced ETL2 Pentaho
Advanced ETL2  Pentaho Advanced ETL2  Pentaho
Advanced ETL2 Pentaho
 
NGINX Plus PLATFORM For Flawless Application Delivery
NGINX Plus PLATFORM For Flawless Application DeliveryNGINX Plus PLATFORM For Flawless Application Delivery
NGINX Plus PLATFORM For Flawless Application Delivery
 
Docker Ecosystem - Part II - Compose
Docker Ecosystem - Part II - ComposeDocker Ecosystem - Part II - Compose
Docker Ecosystem - Part II - Compose
 
Building Data Integration and Transformations using Pentaho
Building Data Integration and Transformations using PentahoBuilding Data Integration and Transformations using Pentaho
Building Data Integration and Transformations using Pentaho
 
Indic threads pune12-accelerating computation in html 5
Indic threads pune12-accelerating computation in html 5Indic threads pune12-accelerating computation in html 5
Indic threads pune12-accelerating computation in html 5
 
Docker Ecosystem: Engine, Compose, Machine, Swarm, Registry
Docker Ecosystem: Engine, Compose, Machine, Swarm, RegistryDocker Ecosystem: Engine, Compose, Machine, Swarm, Registry
Docker Ecosystem: Engine, Compose, Machine, Swarm, Registry
 
Introduction to docker swarm
Introduction to docker swarmIntroduction to docker swarm
Introduction to docker swarm
 
Scaling Jenkins with Docker: Swarm, Kubernetes or Mesos?
Scaling Jenkins with Docker: Swarm, Kubernetes or Mesos?Scaling Jenkins with Docker: Swarm, Kubernetes or Mesos?
Scaling Jenkins with Docker: Swarm, Kubernetes or Mesos?
 
Continuous Development with Jenkins - Stephen Connolly at PuppetCamp Dublin '12
Continuous Development with Jenkins - Stephen Connolly at PuppetCamp Dublin '12Continuous Development with Jenkins - Stephen Connolly at PuppetCamp Dublin '12
Continuous Development with Jenkins - Stephen Connolly at PuppetCamp Dublin '12
 
Migración de datos con OpenERP-Kettle
Migración de datos con OpenERP-KettleMigración de datos con OpenERP-Kettle
Migración de datos con OpenERP-Kettle
 
Scaling Jenkins with Docker and Kubernetes
Scaling Jenkins with Docker and KubernetesScaling Jenkins with Docker and Kubernetes
Scaling Jenkins with Docker and Kubernetes
 
Clustering with Docker Swarm - Dockerops 2016 @ Cento (FE) Italy
Clustering with Docker Swarm - Dockerops 2016 @ Cento (FE) ItalyClustering with Docker Swarm - Dockerops 2016 @ Cento (FE) Italy
Clustering with Docker Swarm - Dockerops 2016 @ Cento (FE) Italy
 
Tao zhang
Tao zhangTao zhang
Tao zhang
 
Jenkins Peru Meetup Docker Ecosystem
Jenkins Peru Meetup Docker EcosystemJenkins Peru Meetup Docker Ecosystem
Jenkins Peru Meetup Docker Ecosystem
 
Elementos ETL - Kettle Pentaho
Elementos ETL - Kettle Pentaho Elementos ETL - Kettle Pentaho
Elementos ETL - Kettle Pentaho
 
Introduction to GPU Programming
Introduction to GPU ProgrammingIntroduction to GPU Programming
Introduction to GPU Programming
 
Load Balancing Apps in Docker Swarm with NGINX
Load Balancing Apps in Docker Swarm with NGINXLoad Balancing Apps in Docker Swarm with NGINX
Load Balancing Apps in Docker Swarm with NGINX
 
Docker swarm introduction
Docker swarm introductionDocker swarm introduction
Docker swarm introduction
 

Similaire à Moving and Transforming Data with Pentaho Data Integration 5.0 CE (aka Kettle)

Amit Kumar_Resume
Amit Kumar_ResumeAmit Kumar_Resume
Amit Kumar_Resume
Amit Kumar
 
Heavy Metal PowerPivot Remastered
Heavy Metal PowerPivot RemasteredHeavy Metal PowerPivot Remastered
Heavy Metal PowerPivot Remastered
Jason Himmelstein
 
SQL Server Konferenz 2014 - SSIS & HDInsight
SQL Server Konferenz 2014 - SSIS & HDInsightSQL Server Konferenz 2014 - SSIS & HDInsight
SQL Server Konferenz 2014 - SSIS & HDInsight
Tillmann Eitelberg
 

Similaire à Moving and Transforming Data with Pentaho Data Integration 5.0 CE (aka Kettle) (20)

Querona Presentation 2018
Querona Presentation 2018Querona Presentation 2018
Querona Presentation 2018
 
Optimizing your Modern Data Architecture - with Attunity, RCG Global Services...
Optimizing your Modern Data Architecture - with Attunity, RCG Global Services...Optimizing your Modern Data Architecture - with Attunity, RCG Global Services...
Optimizing your Modern Data Architecture - with Attunity, RCG Global Services...
 
SQL Server Data Discovery with PowerPivot
SQL Server Data Discovery with PowerPivotSQL Server Data Discovery with PowerPivot
SQL Server Data Discovery with PowerPivot
 
sandhya exp resume
sandhya exp resume sandhya exp resume
sandhya exp resume
 
Amit Kumar_Resume
Amit Kumar_ResumeAmit Kumar_Resume
Amit Kumar_Resume
 
seven steps to dataops @ dataops.rocks conference Oct 2019
seven steps to dataops @ dataops.rocks conference Oct 2019seven steps to dataops @ dataops.rocks conference Oct 2019
seven steps to dataops @ dataops.rocks conference Oct 2019
 
Heavy Metal PowerPivot Remastered
Heavy Metal PowerPivot RemasteredHeavy Metal PowerPivot Remastered
Heavy Metal PowerPivot Remastered
 
Building Scalable Big Data Infrastructure Using Open Source Software Presenta...
Building Scalable Big Data Infrastructure Using Open Source Software Presenta...Building Scalable Big Data Infrastructure Using Open Source Software Presenta...
Building Scalable Big Data Infrastructure Using Open Source Software Presenta...
 
rough-work.pptx
rough-work.pptxrough-work.pptx
rough-work.pptx
 
Informatica to ODI Migration – What, Why and How | Informatica to Oracle Dat...
Informatica to ODI Migration – What, Why and How |  Informatica to Oracle Dat...Informatica to ODI Migration – What, Why and How |  Informatica to Oracle Dat...
Informatica to ODI Migration – What, Why and How | Informatica to Oracle Dat...
 
Pentaho
PentahoPentaho
Pentaho
 
Summer Shorts: Big Data Integration
Summer Shorts: Big Data IntegrationSummer Shorts: Big Data Integration
Summer Shorts: Big Data Integration
 
Apache CarbonData+Spark to realize data convergence and Unified high performa...
Apache CarbonData+Spark to realize data convergence and Unified high performa...Apache CarbonData+Spark to realize data convergence and Unified high performa...
Apache CarbonData+Spark to realize data convergence and Unified high performa...
 
Analytics at the Speed of Thought: Actian Express Overview
Analytics at the Speed of Thought: Actian Express Overview Analytics at the Speed of Thought: Actian Express Overview
Analytics at the Speed of Thought: Actian Express Overview
 
Pentaho Roadmap 2011
Pentaho Roadmap 2011Pentaho Roadmap 2011
Pentaho Roadmap 2011
 
SQL Server Konferenz 2014 - SSIS & HDInsight
SQL Server Konferenz 2014 - SSIS & HDInsightSQL Server Konferenz 2014 - SSIS & HDInsight
SQL Server Konferenz 2014 - SSIS & HDInsight
 
Data Platform Architecture Principles and Evaluation Criteria
Data Platform Architecture Principles and Evaluation CriteriaData Platform Architecture Principles and Evaluation Criteria
Data Platform Architecture Principles and Evaluation Criteria
 
Resume (3)
Resume (3)Resume (3)
Resume (3)
 
Modernizing Global Shared Data Analytics Platform and our Alluxio Journey
Modernizing Global Shared Data Analytics Platform and our Alluxio JourneyModernizing Global Shared Data Analytics Platform and our Alluxio Journey
Modernizing Global Shared Data Analytics Platform and our Alluxio Journey
 
Cloudian 451-hortonworks - webinar
Cloudian 451-hortonworks - webinarCloudian 451-hortonworks - webinar
Cloudian 451-hortonworks - webinar
 

Plus de Roland Bouman

Writing MySQL User-defined Functions in JavaScript
Writing MySQL User-defined Functions in JavaScriptWriting MySQL User-defined Functions in JavaScript
Writing MySQL User-defined Functions in JavaScript
Roland Bouman
 
3. writing MySql plugins for the information schema
3. writing MySql plugins for the information schema3. writing MySql plugins for the information schema
3. writing MySql plugins for the information schema
Roland Bouman
 
2. writing MySql plugins general
2. writing MySql plugins   general2. writing MySql plugins   general
2. writing MySql plugins general
Roland Bouman
 
Common schema my sql uc 2012
Common schema   my sql uc 2012Common schema   my sql uc 2012
Common schema my sql uc 2012
Roland Bouman
 
Common schema my sql uc 2012
Common schema   my sql uc 2012Common schema   my sql uc 2012
Common schema my sql uc 2012
Roland Bouman
 
Optimizing mysql stored routines uc2010
Optimizing mysql stored routines uc2010Optimizing mysql stored routines uc2010
Optimizing mysql stored routines uc2010
Roland Bouman
 

Plus de Roland Bouman (12)

Beyond OData: Introducing the XML/A model for ui5
Beyond OData: Introducing the XML/A model for ui5Beyond OData: Introducing the XML/A model for ui5
Beyond OData: Introducing the XML/A model for ui5
 
Xmla4js
Xmla4jsXmla4js
Xmla4js
 
Xml4js pentaho
Xml4js pentahoXml4js pentaho
Xml4js pentaho
 
Writing MySQL User-defined Functions in JavaScript
Writing MySQL User-defined Functions in JavaScriptWriting MySQL User-defined Functions in JavaScript
Writing MySQL User-defined Functions in JavaScript
 
3. writing MySql plugins for the information schema
3. writing MySql plugins for the information schema3. writing MySql plugins for the information schema
3. writing MySql plugins for the information schema
 
2. writing MySql plugins general
2. writing MySql plugins   general2. writing MySql plugins   general
2. writing MySql plugins general
 
1. MySql plugins
1. MySql plugins1. MySql plugins
1. MySql plugins
 
Common schema my sql uc 2012
Common schema   my sql uc 2012Common schema   my sql uc 2012
Common schema my sql uc 2012
 
Common schema my sql uc 2012
Common schema   my sql uc 2012Common schema   my sql uc 2012
Common schema my sql uc 2012
 
Optimizing mysql stored routines uc2010
Optimizing mysql stored routines uc2010Optimizing mysql stored routines uc2010
Optimizing mysql stored routines uc2010
 
Writing MySQL UDFs
Writing MySQL UDFsWriting MySQL UDFs
Writing MySQL UDFs
 
Roland bouman modern_data_warehouse_architectures_data_vault_and_anchor_model...
Roland bouman modern_data_warehouse_architectures_data_vault_and_anchor_model...Roland bouman modern_data_warehouse_architectures_data_vault_and_anchor_model...
Roland bouman modern_data_warehouse_architectures_data_vault_and_anchor_model...
 

Dernier

CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
giselly40
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
Earley Information Science
 

Dernier (20)

Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 

Moving and Transforming Data with Pentaho Data Integration 5.0 CE (aka Kettle)