Let’s Build a Service Oriented Data Pipeline!
Yasha Podeswa
Software Developer | Hootsuite
June 2016
Me!
Before: Oceanographer
Now: Software Developer at Hootsuite
This Talk
● Introduce a problem that requires a new data pipeline
● Design it in a service oriented style
● Build it on stage!
The Problem
Passive Aggressive Inc. just cancelled their subscription!
Desperate Dan in trouble!
Want to Build a Tool Like This
What We’re Starting With
● Things Users Did
● Things Organizations Did
● Crap
High Level Plan
● Extract: JSON files
● Transform: Calculate stats about organizations
● Load: DB
JSON files
Calculate stats about organizations, split into two steps:
● Clean and organize data (useful for lots of things!)
● Calculate stats per organization (shouldn’t run until dependent job done)
DB
Need a “Service” Communication and Orchestration Layer!
Let’s build it!
First App: Event Cleaning and Loading
Read logs from S3, clean and sort into different types of events, load into data warehouse
Vanilla Scala app
AWS Lambda
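As a rough illustration of the "clean and sort" step, here is a minimal sketch in plain Python (the talk's actual app is written in Scala and lives in the etl-load-events repository). It assumes each log line is a JSON object with a `type` field; that field name, the sample events, and the `clean_and_sort` function are hypothetical and not taken from the talk. Reading from S3 and loading into the warehouse are left out.

```python
import json
from collections import defaultdict


def clean_and_sort(raw_lines):
    """Parse JSON log lines, drop unusable ones, and group the rest by event type.

    The "type" field name is an assumption made for this sketch.
    """
    events_by_type = defaultdict(list)
    for line in raw_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip lines that are not valid JSON
        if not isinstance(event, dict) or not event.get("type"):
            continue  # skip records with no usable event type
        events_by_type[event["type"]].append(event)
    return dict(events_by_type)


# Hypothetical sample input: two good events, one garbage line,
# one record with no type, and one blank line.
raw_logs = [
    '{"type": "user_login", "organization_id": 1, "user_id": 7}',
    "not json at all",
    '{"type": "message_sent", "organization_id": 1, "user_id": 7}',
    '{"organization_id": 2}',
    "",
]
sorted_events = clean_and_sort(raw_logs)
```

Each key of `sorted_events` would map naturally onto one table in the data warehouse, which is what makes the cleaned output reusable by other jobs.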
Second App: Organization Stat Calculation
Read cleaned/sorted events from data warehouse, calculate stats about organizations, load stats to data warehouse
Vanilla Scala app
AWS Lambda
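The per-organization calculation could look something like the sketch below, again in plain Python rather than the Scala used in etl-organization-stats. The stats chosen (event count and distinct active users) and the `organization_id` / `user_id` field names are assumptions for illustration; the talk does not say which stats are computed.

```python
from collections import Counter, defaultdict


def organization_stats(events):
    """Return {organization_id: {"event_count": int, "active_users": int}}.

    Field names and the choice of stats are assumptions for this sketch.
    """
    event_counts = Counter()
    users_by_org = defaultdict(set)
    for event in events:
        org_id = event["organization_id"]
        event_counts[org_id] += 1
        users_by_org[org_id].add(event["user_id"])
    return {
        org_id: {
            "event_count": event_counts[org_id],
            "active_users": len(users_by_org[org_id]),
        }
        for org_id in event_counts
    }


# Hypothetical cleaned events, as they might be read back from the warehouse.
cleaned_events = [
    {"type": "user_login", "organization_id": 1, "user_id": 7},
    {"type": "message_sent", "organization_id": 1, "user_id": 7},
    {"type": "user_login", "organization_id": 1, "user_id": 8},
    {"type": "user_login", "organization_id": 2, "user_id": 9},
]
stats = organization_stats(cleaned_events)
```

Because this step only reads cleaned events, it has to wait for the first app to finish, which is the dependency the orchestration layer enforces.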
Third App: Airflow
Hook up the Lambda apps in a dependency graph
● Scheduling
● Retries
● Monitoring
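To make the dependency graph and retry ideas concrete, here is a small standard-library sketch. It is not Airflow and does not use Airflow's API; it only illustrates what an orchestrator does when one job must wait for another and a failed job is retried. The task names, the `run_pipeline` function, and the `max_retries` parameter are hypothetical. It needs Python 3.9+ for `graphlib`.

```python
from graphlib import TopologicalSorter


def run_pipeline(tasks, dependencies, max_retries=2):
    """Run tasks in dependency order, retrying each failed task up to max_retries times.

    tasks: {name: zero-argument callable}
    dependencies: {name: set of task names that must finish first}
    """
    order = list(TopologicalSorter(dependencies).static_order())
    completed = []
    for name in order:
        for attempt in range(max_retries + 1):
            try:
                tasks[name]()
                completed.append(name)
                break
            except Exception:
                if attempt == max_retries:
                    raise  # out of retries: fail the whole run
    return completed


# Hypothetical stand-ins for the two Lambda apps.
calls = {"load_events": 0, "organization_stats": 0}


def load_events():
    calls["load_events"] += 1
    if calls["load_events"] == 1:
        raise RuntimeError("transient failure")  # fails once, then succeeds


def calculate_organization_stats():
    calls["organization_stats"] += 1


tasks = {
    "load_events": load_events,
    "organization_stats": calculate_organization_stats,
}
# organization_stats must not run until load_events is done.
dependencies = {"organization_stats": {"load_events"}}
completed = run_pipeline(tasks, dependencies)
```

In this run the first job fails once, is retried, and only then does the stats job start. A real orchestrator adds scheduling and monitoring on top of the same ordering logic.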
Steal my code!
https://github.com/yashap/etl-load-events
https://github.com/yashap/etl-organization-stats
https://github.com/yashap/airflow
Questions?
