SlideShare a Scribd company logo
1 of 25
DataDevOps - Data Manifesto
Shared Responsibilities for Data
in Microservice architectures
Sebastian Herold & Dr. Arif Wider
Con Berlin, September 21-22, 2017
Sebastian Herold
- Chief Data Architect
- Data Engineering
Manager
- Data Evangelist
- @heroldus
Seite 2
Dr. Arif Wider
- Senior Consultant/Dev
- Scala/FP enthusiast
- ThoughtWorks Germany
data strategy group
- @arifwider
Seite 3
PL
S
RUS
UA
RO
CZ
D
NL
B
F CH
A
HR
I
E
BG
TR
- AutoScout24 & ImmobilienScout24
- 18 European Countries
- > 20 Mio. Unique Visitors per month
- ~100 GB new data per day
- > 100 temporary EMR clusters per day
Seite 5
Road to MicroService Architecture – How we started in 2007
BI Tool
Middle
Tier
DWH
Staging
Core DB
CRM
DataDevOps – Data Manifesto | BEDCon 2017 | Sebastian Herold & Arif Wider
2007
Web
Tier
Analyst
BI Dev
Seite 6
Road to MicroService Architecture – How things got complicated in 2011
BI Tool
Middle
Tier
DWH
Staging
Core DB
CRM
DataDevOps – Data Manifesto | BEDCon 2017 | Sebastian Herold & Arif Wider
Web
2011
API
APP
$$$
APPMySQL
Analyst
BI Dev
APPMySQL
APPMySQL
APPMySQL
Seite 7
Road to MicroService Architecture – How we sliced the monolith in 2013
BI Tool
DWH
StagingCRM
DataDevOps – Data Manifesto | BEDCon 2017 | Sebastian Herold & Arif Wider
Web
2013
API
APPMySQL
Core DB
EXP
Mongo
SEA
Elastic
Sync APP
APIAPI
API
HADOOP
REST API
Analyst
BI Dev
DE
AWS
APP
APP
APP
APPMySQL
APPMySQL
APPMySQL
Seite 8
Road to MicroService Architecture – How a central data team doesn’t scale
BI Tool
DWH
StagingCRM
DataDevOps – Data Manifesto | BEDCon 2017 | Sebastian Herold & Arif Wider
Web
2015
API
APPMySQL
Core DB
EXP
Mongo
SEA
Elastic
Sync APP
APIAPIAPI
HADOOP
REST API
APPAPP
Analyst
BI Dev
DE
Core DB APPAPPAPPAPPAPPAPPAPPAPPAPP
AWS
Seite 9
Road to MicroService Architecture – How we rearchitectured our Data Landscape
BI Tool
DWH
Central Data Lake on S3
CRM
DataDevOps – Data Manifesto | BEDCon 2017 | Sebastian Herold & Arif Wider
2017
Core DB APP
REST API
Analyst
DE
BI Dev
APPAPPAPP
Seite 10
Scout24 wants to become a truly data-driven company
Fast & easy data-driven
product development…
…supported by
Data & Analytics
DataDevOps – Data Manifesto | BEDCon 2017 | Sebastian Herold & Arif Wider
Seite 11
Scout24 wants to become a truly data-driven company
Everywhere in the company... ...without bloating up Data
DataDevOps – Data Manifesto | BEDCon 2017 | Sebastian Herold & Arif Wider
Image source: https://www.oddsemiconductorservices.com/
Seite 12
SCOUT24
DATA LANDSCAPE
MANIFESTO
ROLES, RESPONSIBILITIES, AND VALUES
FOR A DATA-DRIVEN COMPANY AT SCALE
DataDevOps – Data Manifesto | BEDCon 2017 | Sebastian Herold & Arif Wider
Seite 13
SCOUT24 DATA LANDSCAPE MANIFESTO
#1 Preamble
Data is a key asset of our
company.
SCOUT24 DATA LANDSCAPE MANIFESTO
DataDevOps – Data Manifesto | BEDCon 2017 | Sebastian Herold & Arif Wider
Seite 14
#2 Our Responsibility
We, Data & Analytics, are
responsible for providing a
solid Data Platform as well
as clear guidelines and
training how to participate
in the Data Landscape.
SCOUT24 DATA LANDSCAPE MANIFESTO
Data Platform
D’N’A
Data Landscape
DataDevOps – Data Manifesto | BEDCon 2017 | Sebastian Herold & Arif Wider
Seite 15
SCOUT24 DATA LANDSCAPE MANIFESTO
#3 Data Autonomy, Not Anarchy
Data autonomy puts data
producers & data consumers
in control of their data & of
their metrics and thereby
allows us to be data-driven
at scale, but this comes with
responsibility.
SCOUT24 DATA LANDSCAPE MANIFESTO
Data Platform
Data
Producer Consumer
D’N’A
Data Landscape
DataDevOps – Data Manifesto | BEDCon 2017 | Sebastian Herold & Arif Wider
Seite 16
SCOUT24 DATA LANDSCAPE MANIFESTO
#4 Producer’s Responsibility
Data producers are
responsible for publishing
data to the central Data
Lake, for the data's quality,
and for publishing metadata
that makes it easy to find
and consume the data.
SCOUT24 DATA LANDSCAPE MANIFESTO
Data Platform
Metadata
Data
Producer
D’N’A
Data Landscape
DataDevOps – Data Manifesto | BEDCon 2017 | Sebastian Herold & Arif Wider
Seite 17
SCOUT24 DATA LANDSCAPE MANIFESTO
#5 Consumer’s Responsibility
Data consumers are
responsible for the definition
& visualization of metrics
and for driving the imple-
mentation and maintenance
of these metrics.
SCOUT24 DATA LANDSCAPE MANIFESTO
Data Platform
Producer Consumer
D’N’A
Data Landscape
DataDevOps – Data Manifesto | BEDCon 2017 | Sebastian Herold & Arif Wider
Seite 18
SCOUT24 DATA LANDSCAPE MANIFESTO
#6 Exception: Core KPIs
We, Data & Analytics, take
the full ownership and
responsibility of the few top
company-wide core KPIs.
SCOUT24 DATA LANDSCAPE MANIFESTO
Data Platform
Producer Consumer
D’N’A
Data Landscape
Core
metric
DataDevOps – Data Manifesto | BEDCon 2017 | Sebastian Herold & Arif Wider
Seite 19
SCOUT24 DATA LANDSCAPE MANIFESTO
#7 Transparency Over Continuity
We value data transparency
over data continuity, which
means we may break metric
comparability if it is for the
cause of enabling better
insights.
SCOUT24 DATA LANDSCAPE MANIFESTO
Data Platform
Producer Consumer
D’N’A
Data Landscape
Core
metric
DataDevOps – Data Manifesto | BEDCon 2017 | Sebastian Herold & Arif Wider
Seite 20
SCOUT24 DATA LANDSCAPE MANIFESTO
The Ultimate Goal
SCOUT24 DATA LANDSCAPE MANIFESTO
Data Platform
Metadata
Data
Producer Consumer
D’N’A
Data Landscape
Core
metric
Data
products
A federal landscape of data
producers and consumers
with just enough rules to
ensure seamless co-
operation without severely
impeding autonomy.
DataDevOps – Data Manifesto | BEDCon 2017 | Sebastian Herold & Arif Wider
Seite 21
Consequences for Product
Development Teams?
DataDevOps – Data Manifesto | BEDCon 2017 | Sebastian Herold & Arif Wider
- Think about data & reporting
- Deliver your data to the lake
- Provide meta data (schema, descriptions, versions)
- Eat your own dog food: Consume your own data
for reporting -> take responsibility for data quality
Seite 22
Benefits for Product Development
Teams?
DataDevOps – Data Manifesto | BEDCon 2017 | Sebastian Herold & Arif Wider
- Independently work with data
- No dependencies to data teams
- Company data is curated and it’s easy to consume
data produced by other teams
DevOps
Seite 23
#DataDevOps
DataDevOps – Data Manifesto | BEDCon 2017 | Sebastian Herold & Arif Wider
Seite 24
Learnings and lessons
 Publish exhaustive, general, and denormalized event data
 Avoid consumer-specific tailoring of data you publish
 Consume your own data, e.g. for KPI reports
 Try out ad-hoc analytics notebooks to get better insights
 Inform data producers, if you rely on their data
 Invest in documentation and guidelines for your data
platform to keep your effort for support low
DataDevOps – Data Manifesto | BEDCon 2017 | Sebastian Herold & Arif Wider
www.scout24.com
Thanks!
Questions?
Sebastian Herold Arif Wider
We are hiring!

More Related Content

What's hot

What's hot (20)

Future of data
Future of dataFuture of data
Future of data
 
Denodo DataFest 2016: Metadata and Data: Search and Exploration
Denodo DataFest 2016: Metadata and Data: Search and ExplorationDenodo DataFest 2016: Metadata and Data: Search and Exploration
Denodo DataFest 2016: Metadata and Data: Search and Exploration
 
Enterprise ready: a look at Neo4j in production
Enterprise ready: a look at Neo4j in productionEnterprise ready: a look at Neo4j in production
Enterprise ready: a look at Neo4j in production
 
Eneco Ronald Root
Eneco Ronald RootEneco Ronald Root
Eneco Ronald Root
 
GraphTour - DXC - Digital Explorer
GraphTour - DXC - Digital ExplorerGraphTour - DXC - Digital Explorer
GraphTour - DXC - Digital Explorer
 
Design Thinking for Data Superwomen & Supermen
Design Thinking for Data Superwomen & SupermenDesign Thinking for Data Superwomen & Supermen
Design Thinking for Data Superwomen & Supermen
 
Denodo Data Innovation Award: Creating a Logical Data Fabric to Digitize City...
Denodo Data Innovation Award: Creating a Logical Data Fabric to Digitize City...Denodo Data Innovation Award: Creating a Logical Data Fabric to Digitize City...
Denodo Data Innovation Award: Creating a Logical Data Fabric to Digitize City...
 
Multi-Cloud Data Integration with Data Virtualization (APAC)
Multi-Cloud Data Integration with Data Virtualization (APAC)Multi-Cloud Data Integration with Data Virtualization (APAC)
Multi-Cloud Data Integration with Data Virtualization (APAC)
 
GDPR: the IBM journey to compliance
GDPR: the IBM journey to complianceGDPR: the IBM journey to compliance
GDPR: the IBM journey to compliance
 
Jeff Kelly, Wikibon Slides; Big Data Summit 2015
Jeff Kelly, Wikibon Slides; Big Data Summit 2015Jeff Kelly, Wikibon Slides; Big Data Summit 2015
Jeff Kelly, Wikibon Slides; Big Data Summit 2015
 
20191106 brasil it 2
20191106 brasil it 220191106 brasil it 2
20191106 brasil it 2
 
Power BI : A Detailed Discussion
Power BI : A Detailed DiscussionPower BI : A Detailed Discussion
Power BI : A Detailed Discussion
 
Hassle-Free Data Lake Governance: Automating Your Analytics with a Semantic L...
Hassle-Free Data Lake Governance: Automating Your Analytics with a Semantic L...Hassle-Free Data Lake Governance: Automating Your Analytics with a Semantic L...
Hassle-Free Data Lake Governance: Automating Your Analytics with a Semantic L...
 
DataDevOps: A Manifesto for a DevOps-like Culture Shift in Data & Analytics
DataDevOps: A Manifesto for a DevOps-like Culture Shift in Data & AnalyticsDataDevOps: A Manifesto for a DevOps-like Culture Shift in Data & Analytics
DataDevOps: A Manifesto for a DevOps-like Culture Shift in Data & Analytics
 
‘Edge’ Technologies: a new language of innovation
‘Edge’ Technologies: a new language of innovation‘Edge’ Technologies: a new language of innovation
‘Edge’ Technologies: a new language of innovation
 
What makes it worth becoming a Data Engineer?
What makes it worth becoming a Data Engineer?What makes it worth becoming a Data Engineer?
What makes it worth becoming a Data Engineer?
 
Powering Asurion's Connected Home Platform with Spark Structured Streaming, D...
Powering Asurion's Connected Home Platform with Spark Structured Streaming, D...Powering Asurion's Connected Home Platform with Spark Structured Streaming, D...
Powering Asurion's Connected Home Platform with Spark Structured Streaming, D...
 
Data Thinking Preview - Predictive Analytics World for Industry 4.0
Data Thinking Preview - Predictive Analytics World for Industry 4.0Data Thinking Preview - Predictive Analytics World for Industry 4.0
Data Thinking Preview - Predictive Analytics World for Industry 4.0
 
Pieter den Hamer Alliander
Pieter den Hamer Alliander Pieter den Hamer Alliander
Pieter den Hamer Alliander
 
Don’t Choose One Database Choose Them All!, Capgemini
Don’t Choose One Database Choose Them All!, CapgeminiDon’t Choose One Database Choose Them All!, Capgemini
Don’t Choose One Database Choose Them All!, Capgemini
 

Similar to DataDevOps - A Manifesto on Shared Data Responsibility in Times of Microservices

Turning Business Intelligence Into Actionable Insights
Turning Business Intelligence Into Actionable InsightsTurning Business Intelligence Into Actionable Insights
Turning Business Intelligence Into Actionable Insights
G3 Communications
 
By Thoughtworks | Building data as a product: The key to unlocking Data Mesh'...
By Thoughtworks | Building data as a product: The key to unlocking Data Mesh'...By Thoughtworks | Building data as a product: The key to unlocking Data Mesh'...
By Thoughtworks | Building data as a product: The key to unlocking Data Mesh'...
IngridBuenaventura
 
Roadmap to data driven advice michael goedhart 1v0
Roadmap to data driven advice michael goedhart 1v0Roadmap to data driven advice michael goedhart 1v0
Roadmap to data driven advice michael goedhart 1v0
BigDataExpo
 
Building Audi’s enterprise big data platform
Building Audi’s enterprise big data platformBuilding Audi’s enterprise big data platform
Building Audi’s enterprise big data platform
DataWorks Summit
 

Similar to DataDevOps - A Manifesto on Shared Data Responsibility in Times of Microservices (20)

Turning Business Intelligence Into Actionable Insights
Turning Business Intelligence Into Actionable InsightsTurning Business Intelligence Into Actionable Insights
Turning Business Intelligence Into Actionable Insights
 
The Scout24 Data Landscape Manifesto: Building an Opinionated Data Platform
The Scout24 Data Landscape Manifesto: Building an Opinionated Data PlatformThe Scout24 Data Landscape Manifesto: Building an Opinionated Data Platform
The Scout24 Data Landscape Manifesto: Building an Opinionated Data Platform
 
Trends for Modernizing Analytics and Data Warehousing in 2019
Trends for Modernizing Analytics and Data Warehousing in 2019Trends for Modernizing Analytics and Data Warehousing in 2019
Trends for Modernizing Analytics and Data Warehousing in 2019
 
By Thoughtworks | Building data as a product: The key to unlocking Data Mesh'...
By Thoughtworks | Building data as a product: The key to unlocking Data Mesh'...By Thoughtworks | Building data as a product: The key to unlocking Data Mesh'...
By Thoughtworks | Building data as a product: The key to unlocking Data Mesh'...
 
Big Data Enabled: How YARN Changes the Game
Big Data Enabled: How YARN Changes the GameBig Data Enabled: How YARN Changes the Game
Big Data Enabled: How YARN Changes the Game
 
The Data & Analytics Journey – Why it’s more attainable for your company than...
The Data & Analytics Journey – Why it’s more attainable for your company than...The Data & Analytics Journey – Why it’s more attainable for your company than...
The Data & Analytics Journey – Why it’s more attainable for your company than...
 
Disrupt for Digital Transformation
Disrupt for Digital Transformation Disrupt for Digital Transformation
Disrupt for Digital Transformation
 
Roadmap to data driven advice michael goedhart 1v0
Roadmap to data driven advice michael goedhart 1v0Roadmap to data driven advice michael goedhart 1v0
Roadmap to data driven advice michael goedhart 1v0
 
Converging your data landscape
Converging your data landscapeConverging your data landscape
Converging your data landscape
 
Graphs for Enterprise Architects
Graphs for Enterprise ArchitectsGraphs for Enterprise Architects
Graphs for Enterprise Architects
 
Data DevOps - Arif Wider and Sean Gustafson (ThoughtWorks Live)
Data DevOps - Arif Wider and Sean Gustafson (ThoughtWorks Live)Data DevOps - Arif Wider and Sean Gustafson (ThoughtWorks Live)
Data DevOps - Arif Wider and Sean Gustafson (ThoughtWorks Live)
 
Denodo DataFest 2017: Company Leadership from Data Leadership
Denodo DataFest 2017: Company Leadership from Data LeadershipDenodo DataFest 2017: Company Leadership from Data Leadership
Denodo DataFest 2017: Company Leadership from Data Leadership
 
When everybody wants Big Data Who gets it?
When everybody wants Big Data Who gets it?When everybody wants Big Data Who gets it?
When everybody wants Big Data Who gets it?
 
Strategies for efficient Delivery
Strategies for efficient DeliveryStrategies for efficient Delivery
Strategies for efficient Delivery
 
Strategies for efficient delivery with APIs, containers, microservices and De...
Strategies for efficient delivery with APIs, containers, microservices and De...Strategies for efficient delivery with APIs, containers, microservices and De...
Strategies for efficient delivery with APIs, containers, microservices and De...
 
Organising the Data Lake - Information Management in a Big Data World
Organising the Data Lake - Information Management in a Big Data WorldOrganising the Data Lake - Information Management in a Big Data World
Organising the Data Lake - Information Management in a Big Data World
 
Personalization Strategies Leveraging a Data Management Platform - with Bank ...
Personalization Strategies Leveraging a Data Management Platform - with Bank ...Personalization Strategies Leveraging a Data Management Platform - with Bank ...
Personalization Strategies Leveraging a Data Management Platform - with Bank ...
 
Building Audi’s enterprise big data platform
Building Audi’s enterprise big data platformBuilding Audi’s enterprise big data platform
Building Audi’s enterprise big data platform
 
Hadoop for Business Intelligence Professionals
Hadoop for Business Intelligence ProfessionalsHadoop for Business Intelligence Professionals
Hadoop for Business Intelligence Professionals
 
Big Data LDN 2017: The Logical Data Warehouse – A Modern Analytical Architect...
Big Data LDN 2017: The Logical Data Warehouse – A Modern Analytical Architect...Big Data LDN 2017: The Logical Data Warehouse – A Modern Analytical Architect...
Big Data LDN 2017: The Logical Data Warehouse – A Modern Analytical Architect...
 

More from Dr. Arif Wider

More from Dr. Arif Wider (9)

Data Mesh - It's not about technology, it's about people
Data Mesh - It's not about technology, it's about peopleData Mesh - It's not about technology, it's about people
Data Mesh - It's not about technology, it's about people
 
Data Mesh in Practice - How Europe's Leading Online Platform for Fashion Goes...
Data Mesh in Practice - How Europe's Leading Online Platform for Fashion Goes...Data Mesh in Practice - How Europe's Leading Online Platform for Fashion Goes...
Data Mesh in Practice - How Europe's Leading Online Platform for Fashion Goes...
 
Continuous Intelligence: Keeping your AI Application in Production
Continuous Intelligence: Keeping your AI Application in ProductionContinuous Intelligence: Keeping your AI Application in Production
Continuous Intelligence: Keeping your AI Application in Production
 
Continuous Intelligence: Keeping Your AI Application in Production (NDC Sydne...
Continuous Intelligence: Keeping Your AI Application in Production (NDC Sydne...Continuous Intelligence: Keeping Your AI Application in Production (NDC Sydne...
Continuous Intelligence: Keeping Your AI Application in Production (NDC Sydne...
 
Continuous Intelligence: Moving Machine Learning into Production Reliably
Continuous Intelligence: Moving Machine Learning into Production ReliablyContinuous Intelligence: Moving Machine Learning into Production Reliably
Continuous Intelligence: Moving Machine Learning into Production Reliably
 
Continuous Intelligence: Keeping your AI Application in Production
Continuous Intelligence: Keeping your AI Application in ProductionContinuous Intelligence: Keeping your AI Application in Production
Continuous Intelligence: Keeping your AI Application in Production
 
Predictive Analytics for Vehicle Price Prediction - Delivered Continuously at...
Predictive Analytics for Vehicle Price Prediction - Delivered Continuously at...Predictive Analytics for Vehicle Price Prediction - Delivered Continuously at...
Predictive Analytics for Vehicle Price Prediction - Delivered Continuously at...
 
A High-Performance Solution to Microservice UI Composition @ XConf Hamburg
A High-Performance Solution to Microservice UI Composition @ XConf HamburgA High-Performance Solution to Microservice UI Composition @ XConf Hamburg
A High-Performance Solution to Microservice UI Composition @ XConf Hamburg
 
An Unexpected Solution to Microservices UI Composition
An Unexpected Solution to Microservices UI CompositionAn Unexpected Solution to Microservices UI Composition
An Unexpected Solution to Microservices UI Composition
 

Recently uploaded

TECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service providerTECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service provider
mohitmore19
 
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
Health
 

Recently uploaded (20)

Software Quality Assurance Interview Questions
Software Quality Assurance Interview QuestionsSoftware Quality Assurance Interview Questions
Software Quality Assurance Interview Questions
 
8257 interfacing 2 in microprocessor for btech students
8257 interfacing 2 in microprocessor for btech students8257 interfacing 2 in microprocessor for btech students
8257 interfacing 2 in microprocessor for btech students
 
Microsoft AI Transformation Partner Playbook.pdf
Microsoft AI Transformation Partner Playbook.pdfMicrosoft AI Transformation Partner Playbook.pdf
Microsoft AI Transformation Partner Playbook.pdf
 
HR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.comHR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.com
 
Chinsurah Escorts ☎️8617697112 Starting From 5K to 15K High Profile Escorts ...
Chinsurah Escorts ☎️8617697112  Starting From 5K to 15K High Profile Escorts ...Chinsurah Escorts ☎️8617697112  Starting From 5K to 15K High Profile Escorts ...
Chinsurah Escorts ☎️8617697112 Starting From 5K to 15K High Profile Escorts ...
 
Exploring the Best Video Editing App.pdf
Exploring the Best Video Editing App.pdfExploring the Best Video Editing App.pdf
Exploring the Best Video Editing App.pdf
 
Optimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTVOptimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTV
 
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
 
OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...
OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...
OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...
 
TECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service providerTECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service provider
 
The Top App Development Trends Shaping the Industry in 2024-25 .pdf
The Top App Development Trends Shaping the Industry in 2024-25 .pdfThe Top App Development Trends Shaping the Industry in 2024-25 .pdf
The Top App Development Trends Shaping the Industry in 2024-25 .pdf
 
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
 
VTU technical seminar 8Th Sem on Scikit-learn
VTU technical seminar 8Th Sem on Scikit-learnVTU technical seminar 8Th Sem on Scikit-learn
VTU technical seminar 8Th Sem on Scikit-learn
 
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
 
Pharm-D Biostatistics and Research methodology
Pharm-D Biostatistics and Research methodologyPharm-D Biostatistics and Research methodology
Pharm-D Biostatistics and Research methodology
 
How To Troubleshoot Collaboration Apps for the Modern Connected Worker
How To Troubleshoot Collaboration Apps for the Modern Connected WorkerHow To Troubleshoot Collaboration Apps for the Modern Connected Worker
How To Troubleshoot Collaboration Apps for the Modern Connected Worker
 
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdfThe Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
 
%in kempton park+277-882-255-28 abortion pills for sale in kempton park
%in kempton park+277-882-255-28 abortion pills for sale in kempton park %in kempton park+277-882-255-28 abortion pills for sale in kempton park
%in kempton park+277-882-255-28 abortion pills for sale in kempton park
 
Direct Style Effect Systems - The Print[A] Example - A Comprehension Aid
Direct Style Effect Systems -The Print[A] Example- A Comprehension AidDirect Style Effect Systems -The Print[A] Example- A Comprehension Aid
Direct Style Effect Systems - The Print[A] Example - A Comprehension Aid
 
Payment Gateway Testing Simplified_ A Step-by-Step Guide for Beginners.pdf
Payment Gateway Testing Simplified_ A Step-by-Step Guide for Beginners.pdfPayment Gateway Testing Simplified_ A Step-by-Step Guide for Beginners.pdf
Payment Gateway Testing Simplified_ A Step-by-Step Guide for Beginners.pdf
 

DataDevOps - A Manifesto on Shared Data Responsibility in Times of Microservices

  • 1. DataDevOps - Data Manifesto Shared Responsibilities for Data in Microservice architectures Sebastian Herold & Dr. Arif Wider Con Berlin, September 21-22, 2017
  • 2. Sebastian Herold - Chief Data Architect - Data Engineering Manager - Data Evangelist - @heroldus Seite 2 Dr. Arif Wider - Senior Consultant/Dev - Scala/FP enthusiast - ThoughtWorks Germany data strategy group - @arifwider
  • 3. Seite 3 PL S RUS UA RO CZ D NL B F CH A HR I E BG TR - AutoScout24 & ImmobilienScout24 - 18 European Countries - > 20 Mio. Unique Visitors per month - ~100 GB new data per day - > 100 temporary EMR clusters per day
  • 4.
  • 5. Seite 5 Road to MicroService Architecture – How we started in 2007 BI Tool Middle Tier DWH Staging Core DB CRM DataDevOps – Data Manifesto | BEDCon 2017 | Sebastian Herold & Arif Wider 2007 Web Tier Analyst BI Dev
  • 6. Seite 6 Road to MicroService Architecture – How things got complicated in 2011 BI Tool Middle Tier DWH Staging Core DB CRM DataDevOps – Data Manifesto | BEDCon 2017 | Sebastian Herold & Arif Wider Web 2011 API APP $$$ APPMySQL Analyst BI Dev
  • 7. APPMySQL APPMySQL APPMySQL Seite 7 Road to MicroService Architecture – How we sliced the monolith in 2013 BI Tool DWH StagingCRM DataDevOps – Data Manifesto | BEDCon 2017 | Sebastian Herold & Arif Wider Web 2013 API APPMySQL Core DB EXP Mongo SEA Elastic Sync APP APIAPI API HADOOP REST API Analyst BI Dev DE
  • 8. AWS APP APP APP APPMySQL APPMySQL APPMySQL Seite 8 Road to MicroService Architecture – How a central data team doesn’t scale BI Tool DWH StagingCRM DataDevOps – Data Manifesto | BEDCon 2017 | Sebastian Herold & Arif Wider Web 2015 API APPMySQL Core DB EXP Mongo SEA Elastic Sync APP APIAPIAPI HADOOP REST API APPAPP Analyst BI Dev DE
  • 9. Core DB APPAPPAPPAPPAPPAPPAPPAPPAPP AWS Seite 9 Road to MicroService Architecture – How we rearchitectured our Data Landscape BI Tool DWH Central Data Lake on S3 CRM DataDevOps – Data Manifesto | BEDCon 2017 | Sebastian Herold & Arif Wider 2017 Core DB APP REST API Analyst DE BI Dev APPAPPAPP
  • 10. Seite 10 Scout24 wants to become a truly data-driven company Fast & easy data-driven product development… …supported by Data & Analytics DataDevOps – Data Manifesto | BEDCon 2017 | Sebastian Herold & Arif Wider
  • 11. Seite 11 Scout24 wants to become a truly data-driven company Everywhere in the company... ...without bloating up Data DataDevOps – Data Manifesto | BEDCon 2017 | Sebastian Herold & Arif Wider Image source: https://www.oddsemiconductorservices.com/
  • 12. Seite 12 SCOUT24 DATA LANDSCAPE MANIFESTO ROLES, RESPONSIBILITIES, AND VALUES FOR A DATA-DRIVEN COMPANY AT SCALE DataDevOps – Data Manifesto | BEDCon 2017 | Sebastian Herold & Arif Wider
  • 13. Seite 13 SCOUT24 DATA LANDSCAPE MANIFESTO #1 Preamble Data is a key asset of our company. SCOUT24 DATA LANDSCAPE MANIFESTO DataDevOps – Data Manifesto | BEDCon 2017 | Sebastian Herold & Arif Wider
  • 14. Seite 14 #2 Our Responsibility We, Data & Analytics, are responsible for providing a solid Data Platform as well as clear guidelines and training how to participate in the Data Landscape. SCOUT24 DATA LANDSCAPE MANIFESTO Data Platform D’N’A Data Landscape DataDevOps – Data Manifesto | BEDCon 2017 | Sebastian Herold & Arif Wider
  • 15. Seite 15 SCOUT24 DATA LANDSCAPE MANIFESTO #3 Data Autonomy, Not Anarchy Data autonomy puts data producers & data consumers in control of their data & of their metrics and thereby allows us to be data-driven at scale, but this comes with responsibility. SCOUT24 DATA LANDSCAPE MANIFESTO Data Platform Data Producer Consumer D’N’A Data Landscape DataDevOps – Data Manifesto | BEDCon 2017 | Sebastian Herold & Arif Wider
  • 16. Seite 16 SCOUT24 DATA LANDSCAPE MANIFESTO #4 Producer’s Responsibility Data producers are responsible for publishing data to the central Data Lake, for the data's quality, and for publishing metadata that makes it easy to find and consume the data. SCOUT24 DATA LANDSCAPE MANIFESTO Data Platform Metadata Data Producer D’N’A Data Landscape DataDevOps – Data Manifesto | BEDCon 2017 | Sebastian Herold & Arif Wider
  • 17. Seite 17 SCOUT24 DATA LANDSCAPE MANIFESTO #5 Consumer’s Responsibility Data consumers are responsible for the definition & visualization of metrics and for driving the imple- mentation and maintenance of these metrics. SCOUT24 DATA LANDSCAPE MANIFESTO Data Platform Producer Consumer D’N’A Data Landscape DataDevOps – Data Manifesto | BEDCon 2017 | Sebastian Herold & Arif Wider
  • 18. Seite 18 SCOUT24 DATA LANDSCAPE MANIFESTO #6 Exception: Core KPIs We, Data & Analytics, take the full ownership and responsibility of the few top company-wide core KPIs. SCOUT24 DATA LANDSCAPE MANIFESTO Data Platform Producer Consumer D’N’A Data Landscape Core metric DataDevOps – Data Manifesto | BEDCon 2017 | Sebastian Herold & Arif Wider
  • 19. Seite 19 SCOUT24 DATA LANDSCAPE MANIFESTO #7 Transparency Over Continuity We value data transparency over data continuity, which means we may break metric comparability if it is for the cause of enabling better insights. SCOUT24 DATA LANDSCAPE MANIFESTO Data Platform Producer Consumer D’N’A Data Landscape Core metric DataDevOps – Data Manifesto | BEDCon 2017 | Sebastian Herold & Arif Wider
  • 20. Seite 20 SCOUT24 DATA LANDSCAPE MANIFESTO The Ultimate Goal SCOUT24 DATA LANDSCAPE MANIFESTO Data Platform Metadata Data Producer Consumer D’N’A Data Landscape Core metric Data products A federal landscape of data producers and consumers with just enough rules to ensure seamless co- operation without severely impeding autonomy. DataDevOps – Data Manifesto | BEDCon 2017 | Sebastian Herold & Arif Wider
  • 21. Seite 21 Consequences for Product Development Teams? DataDevOps – Data Manifesto | BEDCon 2017 | Sebastian Herold & Arif Wider - Think about data & reporting - Deliver your data to the lake - Provide meta data (schema, descriptions, versions) - Eat your own dog food: Consume your own data for reporting -> take responsibility for data quality
  • 22. Seite 22 Benefits for Product Development Teams? DataDevOps – Data Manifesto | BEDCon 2017 | Sebastian Herold & Arif Wider - Independently work with data - No dependencies to data teams - Company data is curated and it’s easy to consume data produced by other teams
  • 23. DevOps Seite 23 #DataDevOps DataDevOps – Data Manifesto | BEDCon 2017 | Sebastian Herold & Arif Wider
  • 24. Seite 24 Learnings and lessons  Publish exhaustive, general, and denormalized event data  Avoid consumer-specific tailoring of data you publish  Consume your own data, e.g. for KPI reports  Try out ad-hoc analytics notebooks to get better insights  Inform data producers, if you rely on their data  Invest in documentation and guidelines for your data platform to keep your effort for support low DataDevOps – Data Manifesto | BEDCon 2017 | Sebastian Herold & Arif Wider

Editor's Notes

  1. Willkommen zum Vortrag DataDevOps – Data Manifesto
  2. Ich bin Sebastian Herold Ich bin Arif Wider
  3. Kleiner Hinweis: Folien werden auf English sein! Fragt, wenn ihr was nicht versteht! Scout24: Kernmarken: AutoScout24 größte Online Autoportal in Europa ImmobilienScout24: Größtes ImmobilienPortal in Deutschland Insgesamt vertreten in 18 Ländern Europa >20 Mio. Unique Visitors pro Monat Ca. 100GB neue Daten pro Tag > 100 temporäre Hadoop Cluster am Tag
  4. - Wir sind eine IT Beratungs- und Delivery Firma, aber vor allem sind eine Community von über 5000 Technologie Enthusiasten - Viele von den Entwicklern im Raum kennen sicher die Artikel von Martin Fowler und anderen ThoughtLeadern die bei TW arbeiten - Mit Scout verbindet uns eine sehr gute Partnerschaft, die jetzt schon über 3 Jahre andauert - So haben wir u.a. mit der Hilfe von Scout ein Delivery Center in Barcelona aufgebaut und damit TW Spanien gegründet, unser 42 Office in einem von 15 Ländern
  5. Aus Sicht des Data Teams! -> Sonst viel komplexer -> Architektur vereinfacht dargestellt! Wir starten in 2007 (manche Teile sind aber schon älter) 10 Jahre her! Anwendung: Saubere 3-Schicht-Architektur Web Tier Middle Tier Operative Oracle DB (Klick) Für Analysen wollte Reports ziehen (Klick) eigenes DWH, um nicht die CoreDB mit analytischen Abfragen zu belasten Zentrale DB -> Staging -> DWH -> BI Tool
  6. Mehr Systeme integrieren oft direkt in die zentrale DB -> klassische 3 Schichten-Architektur aufgehoben Systeme haben unterschiedliche Anforderungen: Mehr Schreiblast, hohe Leselast, Key-Value abfragen, > klassiches Problem damals: die One-Size-fits-All-DB skaliert nur durch Überweisen immer größerer Beträge an Oracle, was nicht immer sinnvoll ist - weitere Applikationen haben Daten für’s DWH und müssen integriert werden
  7. 2013 (vor 4 Jahren): Beginn des Chaos DB-Skalierung gelöst -> Denormalisierung: Eigene DB für Suche und Anzeige von Dateilseiten -> Synchronisierung der Daten Immer mehr kleinere Applikationen, die Daten bereitstellen Immer mehr unstrukturierte Daten, die nicht in unsere klassische relationale Welt passten Hadoop Cluster aufgebaut Nicht geeignet für einzelne Events REST-API davor, sammelt Events des gleichen Typs und packt sie in größeren Chunks in HDFS Einfaches Reporting für Applikationen JSON setzt sich für Business-Reporting durch, ganz anders als die relationale Welt Standardisierung durch einheitliche IDs Direkte Anbindung an BI Tools Immer mehr Analysten und DataScientisten arbeiten auch direkt auf Hadoop
  8. 2015: Das Chaos ist perfekt! Immer mehr Applikationen Cloud-Strategy der Firma -> alle gehen über kurz oder lang in die Cloud Ständige Wartung Mappings ins DWH Ständiges Einsammeln von Metadaten zu den Daten Wartung nahm einen Großteil der Arbeit ein Zentrales Bottleneck für Änderung am Schema Immer mehr Leute unzufrieden, konnten DWH-Daten nicht mit anderen Daten verknüpfen Wir mussten etwas ändern
  9. 2017: - (Klick) BI Developer und Data Engineering zusammengeworfen und ein Team daraus gemacht (Klick) Zentraler Data Lake in der Cloud: Als führendes System für sktrukturierte und unstrukturierte Daten, damit war ein Verknüpfen aller Daten gewährleistet Warum S3? Billig, günstiger als EC2 auf EBS Verlässlich: 7 9nen, in viele Technologien integriert ist Zugriff durch mehrere Systeme gleichzeitig Performance Nachteil gegenüber HDFS meistens gering -> ansonsten HDFS für Zwischenergebnisse (Klick) DWH nur als Cache für analytische Abfragen (Klick) Applikationen im Rechenzentrum weiterhin Hadoop Rest API (Klick) Datenbanken direkt exportiert (Klick) CRM exportiert und importiert Daten (Klick) Apps haben direkt in den Lake gestreamt z.B. über Kinesis Firehose Technischen Voraussetzungen gegeben, damit Dev Team einfach Daten in die Data Platform einbringen konnten und alle Daten verknüpft werden können Darstellung vereinfacht, natürlich war alles komplizierter und Legacy-Systeme vorhanden… Dazu kam natürlich noch ein weiteres Phänomen, von dem Arif erzählen wird
  10. - Wir haben gesehen, wie durch Microservices sich die Zahl und die Heterogenität der Datenproduzenten exponentiell erhöht hat, und wie man dem technisch… - Jetzt gab es aber parallelzur Modularisierung eine weitere Entwicklung bei Scout, nämlich jene zu mehr Datengetriebener Produktentwicklung. - Was bedeutet das… - Natürlich mit Unterstützung von data & analytics
  11. Jetzt aber überall in der Firma Auch die Anzahl der Konsumenten exponentiell erhöht! Diese Konsumenten wollen Ihre speziellen Daten mit den allgemeinen DWH Daten verknüpfen Jetzt möchte Data & Analytics natürlich gerne alle Teams dabei unterstützen, aber Scout ist natürlich nicht bereit das Data team auf die gleiche exponentielle Weise zu vergrößern wie die Datennutzung Resultat: das Data & Analytics wird noch mehr zum Bottleneck und die Frustration auf beiden Seiten ist hoch Viel dieser Frustration durch unklare Verantwortungen, bzw. Verantwortungslage von 2007 Was wir hier erkannt haben: es reicht nicht nur die technische Organisation auf eine neue Basis zu stellen, sondern auch die Weise, wie die Verantwortungen und Interaktionen mit Daten im Unternehmen geregelt sind, muss neu gedacht werden.
  12. - Um hier einen Neuanfang zu markieren, kamen wir nun auf die Idee, ein wie wir es genannt haben, Scout Data Land Manifest zu erstellen, um uns darauf in der Firma zu einigen - Wie erwähnt geht es dabei um klare Verantwortlichkeiten, daher Rollen, und vor allem gemeinsame Werte - Erstmal innerhalb von Data auf Werte einigen… - 7 Prinzipien, die wir von jeweils einer Annahme, d.h. einem Glauben in einen Wert abgeleitet haben
  13. Wir glauben, dass die Daten nicht mehr nur das Business unterstützen, sondern Kernbestandteil des Businesses sind. Nur durch Daten und Analyse können wir unseren Kunden und den Markt verstehen und damit die richtigen Produkte und Diensteistungen bereitstellen. Auch wenn das mittlerweile fast schon selbstverständlich ist, wollten wir hier einen Grundstein für ein gemeinsames Werteverständnis legen.
  14. Wir glauben daher, dass jeder in der Firma einfachen Zugang zu Daten haben muss und es einfach sein muss Daten zu veröffentlichen die von anderen genutzt weden können. Hierfür braucht man eine solide Datenplattform, d.h. einfach zu benutzende Werkzeuge und eine verlässliche Infrastruktur. Das ist unsere Kernverantwortung. Diese Datenplattfrom bildet die Grundlage für das was wir als Datenlandschaft bezeichnen, eine Art Spielplatz für Daten für den wir ausser den Werkzeugen und Technologien auch noch klare Regeln und Guidelines zur Teilnahme bereitstellen.
  15. Wir glauben, dass eine zentralisierte Datenverwaltung nicht so skaliert, wie wir uns das für eine Datengetriebene Firma wünschen. Stattdessen glauben wir, dass Datenautonomie, die einzige Möglichkeit ist ein solches Maß an Datennutzung über die ganze Firma zu verteilen. Allerdings darf Datenautonomie nicht Datenanarchie werden, weshalb es klare Spielregeln und Verantworlichkeiten geben muss. Dementsprechend geben wir Datenproduzenten und Konsumenten die Kontrolle über Ihre Daten und Metriken .
  16. Wir glauben, dass umfangreiche Datenverfügbarkeit und vor allem deren Auffindbarkeit und Nutzbarkeit von höchster Bedeutung sind, aber letztendlich von niemand anderem gewährleistet werden können als von dem Produzenten der Daten selbst. Nur der Produzent selbst hat wirklich die Kontrolle und auch den notwendigen Kontext, um überhaupt zu wissen, was die Daten bedeuten und wie es um deren Qualität steht.
  17. Auch bei den Metriken und deren Definitionen glauben wir an das Prinzip von Single Ownership. Und so wie Datenproduzenten…, glauben wir dass nur der Konsument der wirklich an einer bestimmten Metrik interessiert ist und sie verwendet, letztendlich deren Bedeutung definieren und deren Implmentierung treiben und erhalten kann. Bei allem anderen riskiert man, dass es unterschiedliche Auslegungen ein und derselben Metrik gibt.
  18. Ausnahme: We believe that a minimum level of company-wide comparability & reliability of core KPIs is crucial for leading the company into the right direction. The ELT is the owner of these core KPIs and the data group represents the ELT here in terms of metric ownership.
  19. We believe that transparency is crucial for understanding what the meaning of a metric is. If month-to-month comparability must never break, there is no way to continuously improve metrics and their transparency based on new insights.
  20. Förderalistische Bund mit Spielregeln der Zusammenarbeit, die jedem genug Freiheit geben, autonom zu agieren, aber sich nicht behindern/blockieren
  21. Was heißt das für die Produkt-Teams? (Klick) Sie müssen sich viel mehr Gedanken um Daten machen: Reporting, welche Datenbank nutze ich: allein in AWS dutzende Möglichkeiten -> vielleicht sogar selbst betreiben (Klick) Sie müssen die Business in den Data-Lake bringen (unterstützt durch Data Platform) (Klick) Sie müssen Meta-Daten bereitstellen: Schema Beschreibung Verknüpfbarkeit Versionierung (Klick) Eat your own dog food: Benutz diese Daten für dein eigenes Reporting: Umdrehen der Verantwortung: Daten-Qualität liegt beim Produzenten -> Sie müssen sich überhaupt mit dem Reporting beschäftigen -> Schlüpfe in die Rolle des Konsumenten Verstehe wie das Reporting funktioniert und was andere mit den Daten machen
  22. Was bekommen sie dafür? Kein Warten auf Data Team -> unabhängiges Arbeiten Daten von anderen Team sind einfacher nutzbar und integrierbar
  23. Folie (Arif) Fortsetzung der DevOps Idee: Neue Verantwortung erlaubt unabhängiges Arbeiten -> Full Stack Sicht auf das Problem
  24. (Arif)
  25. Fertig! (Arif)