Real Use Cases - Pentaho & Big Data Ecosystem
Ricardo Pires, Partner & BI Lead, Xpand IT
A SET OF INSPIRING USE CASES
USE CASE 1: ALL TRANSACTIONS, ONE DASHBOARD
DYNAMIC VIEW ACROSS SALES
• Dashboard providing a common view across sales transactions
• Multiple roles
  • Top management
  • Brand managers
  • Channel managers
• Requires organizing data in multiple ways
• Establish dynamic hierarchies based on multiple attributes
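DYNAMIC HIERARCHIES
[Diagram: drill-down levels Holding > Brand > Channel > Shop; sales are summed (Ʃ) at each level, and an element can select its rows by an attribute criterion such as ATTR = abc]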
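The dynamic hierarchy idea (ordered levels forming a drill-down path, elements that each select their rows with a criterion, sales summed per element) can be sketched in plain Python. This is a minimal illustration, not the project's implementation, which ran on Hive and Impala; the `Sale` fields, the `Element` type and the `aggregate_paths` helper are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical sales row; real rows would carry many more attributes.
@dataclass(frozen=True)
class Sale:
    brand: str
    channel: str
    shop: str
    amount: float

# A hierarchy element: a label plus a criterion selecting the rows it aggregates
# (e.g. brand == "BrandA", or any attribute test such as ATTR == "abc").
@dataclass(frozen=True)
class Element:
    name: str
    criterion: Callable[[Sale], bool]

def aggregate_paths(sales: list[Sale], levels: list[list[Element]]) -> dict[tuple[str, ...], float]:
    """Sum sales along every drill-down path defined by the ordered levels."""
    totals: dict[tuple[str, ...], float] = {}

    def walk(depth: int, path: tuple[str, ...], rows: list[Sale]) -> None:
        if depth == len(levels):
            return
        for element in levels[depth]:
            subset = [r for r in rows if element.criterion(r)]
            key = path + (element.name,)
            totals[key] = sum((r.amount for r in subset), 0.0)
            walk(depth + 1, key, subset)  # drill down into the next level

    walk(0, (), sales)
    return totals

# Usage: a two-level Brand > Channel hierarchy over three sales.
sales = [
    Sale("BrandA", "Online", "Shop1", 100.0),
    Sale("BrandA", "Retail", "Shop2", 50.0),
    Sale("BrandB", "Online", "Shop3", 25.0),
]
levels = [
    [Element("BrandA", lambda s: s.brand == "BrandA"),
     Element("BrandB", lambda s: s.brand == "BrandB")],
    [Element("Online", lambda s: s.channel == "Online"),
     Element("Retail", lambda s: s.channel == "Retail")],
]
totals = aggregate_paths(sales, levels)
```

Because each element carries its own criterion, a new hierarchy becomes a matter of configuration rather than new aggregation code, which fits the editor's note that such hierarchies can be created and processed overnight.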
CHALLENGES
• 3 years of historical data
• 7.2 billion transactions, totalling 4.5 TB
• Wide group of users spread across the organization
• Intuitive user interface with a great user experience
• Detailed visualization
• Row-level security
• Maximum dashboard load time of 5 s
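A rough sizing check on the challenge figures, assuming decimal units (1 TB = 10^12 bytes), that the 4.5 TB covers only the transaction data, and an even spread over 3 × 365 days:

```latex
\frac{4.5 \times 10^{12}\ \text{bytes}}{7.2 \times 10^{9}\ \text{transactions}} = 625\ \text{bytes per transaction}
\qquad
\frac{7.2 \times 10^{9}\ \text{transactions}}{3 \times 365\ \text{days}} \approx 6.6 \times 10^{6}\ \text{transactions per day}
```

At this volume, scanning raw rows for every chart would be hard to fit in a 5 s budget, which is why the solution pre-summarizes the data.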
THE SOLUTION
[Architecture diagram components: Pentaho Data Integration (PDI), Hadoop (HDFS, Hive, Impala, HBase), Web Application]
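The editor's notes explain that data is pre-aggregated on Hive and stored in Impala so that each chart needs only a couple of rows, which are then filtered by security criteria. The sketch below mimics those two steps in plain Python: a summarization pass and a per-user row filter. The `Transaction` fields, the `(brand, channel, day)` grain and the `acl` mapping are hypothetical assumptions, not the project's schema.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical raw transaction (in the described solution, raw data came from Oracle).
@dataclass(frozen=True)
class Transaction:
    brand: str
    channel: str
    day: str  # ISO date string, e.g. "2016-01-31"
    amount: float

SummaryKey = tuple[str, str, str]  # (brand, channel, day)

def summarize(transactions: list[Transaction]) -> dict[SummaryKey, float]:
    """Pre-aggregate raw transactions into one row per (brand, channel, day)."""
    summary = defaultdict(float)
    for t in transactions:
        summary[(t.brand, t.channel, t.day)] += t.amount
    return dict(summary)

def chart_rows(summary: dict[SummaryKey, float], allowed_brands: set[str],
               channel: str) -> dict[SummaryKey, float]:
    """Return only the summary rows one user may see for a single chart."""
    return {
        key: value
        for key, value in summary.items()
        if key[0] in allowed_brands and key[1] == channel
    }

transactions = [
    Transaction("BrandA", "Online", "2016-01-01", 10.0),
    Transaction("BrandA", "Online", "2016-01-01", 5.0),
    Transaction("BrandB", "Online", "2016-01-01", 7.0),
    Transaction("BrandA", "Retail", "2016-01-02", 3.0),
]
summary = summarize(transactions)

# Hypothetical row-level security: user -> brands that user may see.
acl = {"brand_manager_a": {"BrandA"}, "top_manager": {"BrandA", "BrandB"}}
rows_a = chart_rows(summary, acl["brand_manager_a"], "Online")
rows_top = chart_rows(summary, acl["top_manager"], "Online")
```

Summarizing first keeps the security filter cheap: it runs over a handful of pre-aggregated rows instead of billions of raw transactions.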
KEY TAKEAWAYS
• Impala on Cloudera Hadoop can be used as an interactive database
• Hadoop's distributed nature makes it possible to implement use cases that wouldn't be viable with other technologies
• We went from 7 days of data to 3 years
• Pentaho Data Integration implements and orchestrates the whole ETL process, making it much easier
• From traditional data sources to summarized data on Hadoop
USE CASE 2: LOADING THE DATA LAKE
DATA INGESTION
• The goal of a data lake is to make data available in a centralized location
• This requires dealing with:
  • A wide set of sources
  • Disparate technologies
• In this case, loading is a repetitive batch process
METADATA-DRIVEN INGESTION
THE SMART SOLUTION
• Configure a metadata repository
• Ingestion engine based on templates
• Use Hadoop as the data repository
ARCHITECTURE
[Diagram components: any data source, Web UI, REST interface, Ingestion Engine built on PDI, Hadoop (HDFS, Hive)]
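The metadata-driven approach described in the editor's notes (a metadata repository describing sources and destinations, plus reusable templates instead of one hand-built job per source) can be sketched in Python. PDI's metadata injection is emulated here by plain functions that return step descriptions; the `SourceMetadata` fields, template names, staging paths and JDBC URL are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical metadata repository entry describing one source.
@dataclass(frozen=True)
class SourceMetadata:
    name: str
    kind: str  # selects the template, e.g. "csv" or "jdbc"
    location: str
    target_table: str
    options: tuple[tuple[str, str], ...] = ()

def csv_template(meta: SourceMetadata) -> list[str]:
    """Generic steps for any delimited file, parameterized by metadata."""
    delimiter = dict(meta.options).get("delimiter", ",")
    return [
        f"read {meta.location} with delimiter {delimiter!r}",
        f"write to HDFS staging/{meta.name}",
        f"register Hive table {meta.target_table}",
    ]

def jdbc_template(meta: SourceMetadata) -> list[str]:
    """Generic steps for any relational table, parameterized by metadata."""
    table = dict(meta.options)["table"]
    return [
        f"extract table {table} from {meta.location}",
        f"write to HDFS staging/{meta.name}",
        f"register Hive table {meta.target_table}",
    ]

# One reusable template per source kind.
TEMPLATES: dict[str, Callable[[SourceMetadata], list[str]]] = {
    "csv": csv_template,
    "jdbc": jdbc_template,
}

def build_plans(repository: list[SourceMetadata]) -> dict[str, list[str]]:
    """Instantiate the matching template for every source in the repository."""
    plans = {}
    for meta in repository:
        template = TEMPLATES.get(meta.kind)
        if template is None:
            raise ValueError(f"no template for source kind {meta.kind!r}")
        plans[meta.name] = template(meta)
    return plans

repository = [
    SourceMetadata("sales", "csv", "/landing/sales.csv", "raw.sales", (("delimiter", ";"),)),
    SourceMetadata("customers", "jdbc", "jdbc:oracle:thin:@//db-host:1521/CRM",
                   "raw.customers", (("table", "CUSTOMERS"),)),
]
plans = build_plans(repository)
```

Adding a source then means adding a repository entry rather than writing a new job, which is the reusability the talk contrasts with typical ETL.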
KEY TAKEAWAYS
• The flexibility of Pentaho Data Integration is a great match for Hadoop's semi-structured nature
• Cloudera Hadoop can easily be used to store data and make it immediately available through a SQL interface
• Patterns and well-defined workflows are essential to data governance
USE CASE 3: FOSTERING TRANSPARENCY
GOVERNMENT CHALLENGE
• Government agencies have long collected data, but that doesn't mean citizens can easily understand it
• Challenge
  • Create an intuitive UI to represent more than 100 KPIs across 308 municipalities
  • Become a standard in terms of transparency
ARCHITECTURE
[Diagram components: Public Data Service, PDI (ETL), Pentaho BA Server, Web Application]
KEY TAKEAWAYS
• Pentaho Business Analytics is a comprehensive suite
• Pentaho Server components are highly flexible and extensible, making it possible to create custom UIs such as:
  • Analytics portals
  • Embedded analytics in existing products
THANK YOU


Editor's notes

  1. Goal: show examples of what we have been doing and inspire you to use these technologies.
  2. The structure was static; history went from the last 7 days to the last 3 years.
  3. Multiple levels define the drill-down path. There are multiple elements, each with a criterion establishing the rows to aggregate. Sales are summed for each element. Multiple hierarchies like this can be created and processed overnight.
  4. Three main components: PDI + Hadoop + Web App. Data is imported from Oracle with Sqoop and processed on Hive (formulas, pre-aggregation) using configuration stored in HBase; Impala stores the end result. Data is summarized as much as possible, so each chart can be rendered from only a couple of rows, which are filtered based on security criteria.
  5. Zoom, pan and play.
  6. A single workflow/pattern across all data sources, promoting reusability as opposed to typical ETL. Create a metadata repository: it describes the sources, destinations and simple processes required to ingest the data, and can be populated through automatic profiling. Implement the ingestion process with PDI: a flexible tool with metadata injection capabilities, built on open standards that allow creating transformations on the fly. Use Hadoop as the data repository: it is file-system based and thus very flexible, and additional layers can be placed on top to access the data in multiple ways.
  7. Three main components: PDI + Hadoop + Web App. Data is imported from Oracle with Sqoop and processed on Hive (formulas, pre-aggregation) using configuration stored in HBase; Impala stores the end result.
  8. The UI should be easily understood by anyone, and be "cosy" and attractive.