SlideShare une entreprise Scribd logo
1  sur  36
Your Data Nerd Friends Need You!
How the world of data analytics, science and
insights is failing and how the principles from Agile,
DevOps, and Lean are the way forward. #DataOps
October 30, 2019
Copyright © 2019 by DataKitchen, Inc. All Rights Reserved.
What is this talk about?
A BIG PROBLEM
THAT CAN USE
YOUR HELP
BY HAVING
EMPATHY FOR A
GROUP OF PEOPLE
THAT ARE
SUFFERING
AND WHAT YOU
KNOW CAN HELP
THEM
Copyright © 2019 by DataKitchen, Inc. All Rights Reserved.
It’s Big (Data) …
• ‘New Oil’ amount of data increasing fast
• Buzz: Big Data, Data Science, Data Lakes,
Machine Learning, AI
• $189.1 Billion Market , Double-Digit Annual
Growth Through 2022.
• $7.5B for GitHub, $15.7B for Tableau
• 10s millions of people creating insight from data
• More than software developers.
• 1 of 25 workers full time, significant part time.
Copyright © 2019 by DataKitchen, Inc. All Rights Reserved.
It’s a big
problem
that can
use your
help
• 87% of data science projects never make it into
production.
• Data analytics investment up, yet “data driven”
organizations down 37% to 31% since 2019.
• 80% of AI projects resemble alchemy
• 60% of all data analytic projects fail
• 79% of data projects have too many errors
• … “They’re not even using version control!”
Copyright © 2019 by DataKitchen, Inc. All Rights Reserved.
Walk down the hall to
your data analytics group
and observe
• Poor quality, high errors
• Minor changes take months to
implement, manual processes
• 75 percent of the day is hijacked by
unplanned work
• Oversubscribed resources limit
overall productivity.
….. Sound familiar?
Copyright © 2019 by DataKitchen, Inc. All Rights Reserved.
A BIG PROBLEM
THAT CAN USE
YOUR HELP
BY HAVING
EMPATHY FOR A
GROUP OF PEOPLE
THAT ARE
SUFFERING
AND WHAT YOU
KNOW CAN HELP
THEM
Agenda
Copyright © 2019 by DataKitchen, Inc. All Rights Reserved.
Who Are These People?
Copyright © 2019 by DataKitchen, Inc. All Rights Reserved.
They took a
different door …
• Talk like you, look like you
• But early in their career they took the
data analytics door, not the software
door
• Complex toolchain
• 50+ tools in each category
• People love their tools
• Some code, some configure
Copyright © 2019 by DataKitchen, Inc. All Rights Reserved.
They work in Teams
Copyright © 2019 by DataKitchen, Inc. All Rights Reserved.
They work in teams together
Copyright © 2019 by DataKitchen, Inc. All Rights Reserved.
They work in teams together
Copyright © 2019 by DataKitchen, Inc. All Rights Reserved.
With a massive, fragmented toolchain
Data Sources and/or Data Lake
ETL Tools (Informatica, Talend, etc.)
Databases (Redshift, SQL Server, etc.)
Data Science Tools (Python, DataIku, etc.)
Data Catalog Tools (Alation, wiki)
Data Visualization Tools (Tableau, etc)
Copyright © 2019 by DataKitchen, Inc. All Rights Reserved.
They may work for the same boss
Chief Data Officer
Chief Analytics Officer
Copyright © 2019 by DataKitchen, Inc. All Rights Reserved.
Or not
CIO or CDO
Line of Business
Executives
CEO
Copyright © 2019 by DataKitchen, Inc. All Rights Reserved.
A many to many dev/ops relationship
Data Specific Production
Team
Do Operations
Themselves
Use IT Ops, Data
Production Team &
Themselves
“DEV” “OPS”
Copyright © 2019 by DataKitchen, Inc. All Rights Reserved.
Example: Coordination of Two Teams, Two
Locations, Two Ops, Many Tools
Home Office Team Local Office ‘Self-
Service’ Team
VP Marketing
Data
Engineer
Data
Scientist
Centralized,
Weekly
Cadence of
Changes
Data
Analyst
Distributed,
Daily/Hourly
Cadence of
Changes
Boston
New Jersey
Copyright © 2019 by DataKitchen, Inc. All Rights Reserved.
Challenges With Coordination
Home Office Team
Data
Engineer
Data
Scientist
Local Office Team
Data
Analyst
VP Marketing
Make a change in schema? Break Reports?
Add New Data SetsNot Available For All?
Change Report Calculations Inconsistencies?
New Data & Schema Update/New Report Not Working?
Copyright © 2019 by DataKitchen, Inc. All Rights Reserved.
And they source data from internal and
external system
CRM
ERP
Supply Chain
Website
Financial
HR
Open Data
Syndicated
Databases
APIs
Files
‘DevOps’ Governed
Systems
Copyright © 2019 by DataKitchen, Inc. All Rights Reserved.
They run a ‘Factory’ of Insight
CRM
ERP
Supply Chain
Website
Financial
HR
Open Data
Syndicated
Databases
APIs
Files
Access:
Python Code
Transform:
SQL Code,
ETL
Model:
R Code
Visualize:
Tableau
Workbook
Report:
Tableau
Online
Copyright © 2019 by DataKitchen, Inc. All Rights Reserved.
And need to deploy quickly from
dev to production
Data
Engineers
Data
Scientists
Data
Analysts
Diverse Team
Diverse Tools
Diverse Customers
Business
Customer
Products &
Systems
Copyright © 2019 by DataKitchen, Inc. All Rights Reserved.
And need to do both simultaneously
Don’t want break
production when I
deploy my changes
Don’t want to learn about data quality issues from my customers
Copyright © 2019 by DataKitchen, Inc. All Rights Reserved.
A BIG PROBLEM
THAT CAN USE
YOUR HELP
BY HAVING
EMPATHY FOR A
GROUP OF PEOPLE
THAT ARE
SUFFERING
AND WHAT YOU
KNOW CAN HELP
THEM
Agenda
Copyright © 2019 by DataKitchen, Inc. All Rights Reserved.
Your Data Nerd Friends
Are Suffering
• Hero culture
• Fear culture
• Insanely high error rate
• Complete lack of automated testing
• Deploy to product rates of months
• Lots of hope, heroism and fear.
• Technology Review Boards
Project Panther! It’s a subplot in
Gene’s new book for a reason
Copyright © 2019 by DataKitchen, Inc. All Rights Reserved.
Currently, Teams Have High Errors
DataKitchen/Eckerson Survey (May 2019)
DataKitchen / Eckerson Research Survey of Medium – Large Companies US And Europe
Copyright © 2019 by DataKitchen, Inc. All Rights Reserved.
Currently, Teams Struggle to Deploy
DataKitchen/Eckerson Survey (May 2019)
DataKitchen / Eckerson Research Survey of Medium – Large Companies US And Europe
Copyright © 2019 by DataKitchen, Inc. All Rights Reserved.
My Story
Copyright © 2019 by DataKitchen, Inc. All Rights Reserved.
A BIG PROBLEM
THAT CAN USE
YOUR HELP
BY HAVING
EMPATHY FOR A
GROUP OF PEOPLE
THAT ARE
SUFFERING
AND WHAT YOU
KNOW CAN HELP
THEM
Agenda
Copyright © 2019 by DataKitchen, Inc. All Rights Reserved.
DataOps is having a moment
• DataOps Manifesto 2017
• 6000 signatures
• Gartner Hype Cycle in late 2018
• Increased market adoption of DataOps
principles by leaders of data and analytic
teams in 2019
Copyright © 2019 by DataKitchen, Inc. All Rights Reserved.
DataOps – Transformative to Data Analytics
DataOps is a set of technical practices,
cultural norms, and architecture that
enable:
• Rapid experimentation and innovation
for the fastest delivery of new insights to
our customers
• Low error rates
• Collaboration across complex sets of
people, technology, and environments
• Clear measurement and monitoring of
results
Source: Gartner
“Organizations that adopt a DevOps- and DataOps-
based approach are more successful in
implementing end-to-end, reliable, robust, scalable
and repeatable solutions.”
Sumit Pal, Gartner, November 2019
People,
Process,
Organization
Technical
Environment
Copyright © 2019 by DataKitchen, Inc. All Rights Reserved.
From To
Change Fear Change Velocity
Manual Operations Automated Operations
Hope For Quality Integrated Quality
Hero Mentality Repeatable Processes
Heads Down Collaboration
Vendor Lock-In Diverse Tools
How To Succeed?
A Mindset Change to …
…to power your highly agile data culture.
Copyright © 2019 by DataKitchen, Inc. All Rights Reserved.
Education: Seven Steps to DataOps (+3)
1. Orchestrate Two Journeys
2. Add Tests And Monitoring
3. Use a Version Control System
4. Branch and Merge
5. Use Multiple Environments
6. Reuse & Containerize
7. Parameterize Your Processing
+ Three (Architecture, Metrics and Inter/Intra Team Collaboration)
Copyright © 2019 by DataKitchen, Inc. All Rights Reserved.
DataOps contains many of the same concepts as software development and
many unique to data analytics
DataOpsDevOps
DevOps vs DataOps:
Copyright © 2019 by DataKitchen, Inc. All Rights Reserved.
DevOps vs DataOps:
1. Different Process, Different People and Expectations
2. DevOps 1:1 DataOps Many:Many
• Multiple ‘Dev’ and ‘Ops’ groups
3. DataOps Views Data Analytics as ‘Factory’
• Multi-Tool Orchestration Testing, Monitoring and Statistical Process Control
4. DataOps Has Additional Development Complexities
Copyright © 2019 by DataKitchen, Inc. All Rights Reserved.
Copyright © 2019 by DataKitchen, Inc. All Rights Reserved.
Ask your data analytic teams impertinent
questions
• Are you using source control for your work?
• How many automated tests to you have in production?
• Do you have regression, functional or unit tests for you
work?
• How long does it take to deploy ETL/models/BI report
from development to production?
• Do you have automated deployment?
• How up to date is your development environment?
• How often are your business users finding errors in the
data?
Copyright © 2019 by DataKitchen, Inc. All Rights Reserved.
Learn More
• For these slides, contact me:
• cbergh at datakitchen dot io
• DataOps Manifesto:
• http://dataopsmanifesto.org
• Free DataOps Cookbook:
• https://www.datakitchen.io/dataops-cookbook-
main.html
• Excerpt from Gene’s Unicorn Project Book on
DataOps
• https://www.datakitchen.io/unicorn-project.html

Contenu connexe

Tendances

Tendances (20)

ODSC May 2019 - The DataOps Manifesto
ODSC May 2019 - The DataOps ManifestoODSC May 2019 - The DataOps Manifesto
ODSC May 2019 - The DataOps Manifesto
 
Bridged Overview by CodeData
Bridged Overview by CodeDataBridged Overview by CodeData
Bridged Overview by CodeData
 
How to add security in dataops and devops
How to add security in dataops and devopsHow to add security in dataops and devops
How to add security in dataops and devops
 
The 3 Key Barriers Keeping Companies from Deploying Data Products
The 3 Key Barriers Keeping Companies from Deploying Data Products The 3 Key Barriers Keeping Companies from Deploying Data Products
The 3 Key Barriers Keeping Companies from Deploying Data Products
 
DataOps: An Agile Method for Data-Driven Organizations
DataOps: An Agile Method for Data-Driven OrganizationsDataOps: An Agile Method for Data-Driven Organizations
DataOps: An Agile Method for Data-Driven Organizations
 
You're the New CDO, Now What?
You're the New CDO, Now What?You're the New CDO, Now What?
You're the New CDO, Now What?
 
Understanding DataOps and Its Impact on Application Quality
Understanding DataOps and Its Impact on Application QualityUnderstanding DataOps and Its Impact on Application Quality
Understanding DataOps and Its Impact on Application Quality
 
Measuring Data Quality with DataOps
Measuring Data Quality with DataOpsMeasuring Data Quality with DataOps
Measuring Data Quality with DataOps
 
Dataiku, Pitch at Data-Driven NYC, New York City, September 17th 2013
Dataiku, Pitch at Data-Driven NYC, New York City, September 17th 2013Dataiku, Pitch at Data-Driven NYC, New York City, September 17th 2013
Dataiku, Pitch at Data-Driven NYC, New York City, September 17th 2013
 
The Model Enterprise: A Blueprint for Enterprise Data Governance
The Model Enterprise: A Blueprint for Enterprise Data GovernanceThe Model Enterprise: A Blueprint for Enterprise Data Governance
The Model Enterprise: A Blueprint for Enterprise Data Governance
 
Applied Data Science Course Part 1: Concepts & your first ML model
Applied Data Science Course Part 1: Concepts & your first ML modelApplied Data Science Course Part 1: Concepts & your first ML model
Applied Data Science Course Part 1: Concepts & your first ML model
 
Operationalizing Data Analytics
Operationalizing Data AnalyticsOperationalizing Data Analytics
Operationalizing Data Analytics
 
Monitoring data quality by Jos Gheerardyn of Yields.io
Monitoring data quality by Jos Gheerardyn of Yields.ioMonitoring data quality by Jos Gheerardyn of Yields.io
Monitoring data quality by Jos Gheerardyn of Yields.io
 
2020 Big Data & Analytics Maturity Survey Results
2020 Big Data & Analytics Maturity Survey Results2020 Big Data & Analytics Maturity Survey Results
2020 Big Data & Analytics Maturity Survey Results
 
Operationalizing analytics to scale
Operationalizing analytics to scaleOperationalizing analytics to scale
Operationalizing analytics to scale
 
Dsc 2021 presentation_radovan_bacovic
Dsc 2021 presentation_radovan_bacovicDsc 2021 presentation_radovan_bacovic
Dsc 2021 presentation_radovan_bacovic
 
Data Engineering Efficiency @ Netflix - Strata 2017
Data Engineering Efficiency @ Netflix - Strata 2017Data Engineering Efficiency @ Netflix - Strata 2017
Data Engineering Efficiency @ Netflix - Strata 2017
 
Moving to the Cloud: Modernizing Data Architecture in Healthcare
Moving to the Cloud: Modernizing Data Architecture in HealthcareMoving to the Cloud: Modernizing Data Architecture in Healthcare
Moving to the Cloud: Modernizing Data Architecture in Healthcare
 
Dataiku Data Science Studio (datasheet)
Dataiku Data Science Studio (datasheet)Dataiku Data Science Studio (datasheet)
Dataiku Data Science Studio (datasheet)
 
The Future of Data Warehousing and Data Integration
The Future of Data Warehousing and Data IntegrationThe Future of Data Warehousing and Data Integration
The Future of Data Warehousing and Data Integration
 

Similaire à Your Data Nerd Friends Need You!

Fri benghiat gil-odsc-data-kitchen-data science to dataops
Fri benghiat gil-odsc-data-kitchen-data science to dataopsFri benghiat gil-odsc-data-kitchen-data science to dataops
Fri benghiat gil-odsc-data-kitchen-data science to dataops
DataKitchen
 

Similaire à Your Data Nerd Friends Need You! (20)

Big data analytics
Big data analyticsBig data analytics
Big data analytics
 
Big Data Developer Career Path: Job & Interview Preparation
Big Data Developer Career Path: Job & Interview PreparationBig Data Developer Career Path: Job & Interview Preparation
Big Data Developer Career Path: Job & Interview Preparation
 
DataOps - The Foundation for Your Agile Data Architecture
DataOps - The Foundation for Your Agile Data ArchitectureDataOps - The Foundation for Your Agile Data Architecture
DataOps - The Foundation for Your Agile Data Architecture
 
Role of Data in Digital Transformation
Role of Data in Digital TransformationRole of Data in Digital Transformation
Role of Data in Digital Transformation
 
Fri benghiat gil-odsc-data-kitchen-data science to dataops
Fri benghiat gil-odsc-data-kitchen-data science to dataopsFri benghiat gil-odsc-data-kitchen-data science to dataops
Fri benghiat gil-odsc-data-kitchen-data science to dataops
 
Data and its Role in Your Digital Transformation
Data and its Role in Your Digital TransformationData and its Role in Your Digital Transformation
Data and its Role in Your Digital Transformation
 
Big data
Big dataBig data
Big data
 
big data
big databig data
big data
 
Building a Data Platform Strata SF 2019
Building a Data Platform Strata SF 2019Building a Data Platform Strata SF 2019
Building a Data Platform Strata SF 2019
 
Data is not the new snake oil
Data is not the new snake oilData is not the new snake oil
Data is not the new snake oil
 
Big data ppt
Big data pptBig data ppt
Big data ppt
 
Designing a Successful Governed Citizen Data Science Strategy
Designing a Successful Governed Citizen Data Science StrategyDesigning a Successful Governed Citizen Data Science Strategy
Designing a Successful Governed Citizen Data Science Strategy
 
Big Data Tools: A Deep Dive into Essential Tools
Big Data Tools: A Deep Dive into Essential ToolsBig Data Tools: A Deep Dive into Essential Tools
Big Data Tools: A Deep Dive into Essential Tools
 
2014 Big Data Research by IDG Enterprise
2014 Big Data Research by IDG Enterprise2014 Big Data Research by IDG Enterprise
2014 Big Data Research by IDG Enterprise
 
Kartikey tripathi
Kartikey tripathiKartikey tripathi
Kartikey tripathi
 
10 top notch big data trends to watch out for in 2017
10 top notch big data trends to watch out for in 201710 top notch big data trends to watch out for in 2017
10 top notch big data trends to watch out for in 2017
 
Big Data at a Glance
Big Data at a GlanceBig Data at a Glance
Big Data at a Glance
 
Big data and your career final
Big data and your career finalBig data and your career final
Big data and your career final
 
Hadoop 2015: what we larned -Think Big, A Teradata Company
Hadoop 2015: what we larned -Think Big, A Teradata CompanyHadoop 2015: what we larned -Think Big, A Teradata Company
Hadoop 2015: what we larned -Think Big, A Teradata Company
 
Big Data Driven Solutions to Combat Covid' 19
Big Data Driven Solutions to Combat Covid' 19Big Data Driven Solutions to Combat Covid' 19
Big Data Driven Solutions to Combat Covid' 19
 

Dernier

Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Safe Software
 

Dernier (20)

Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024
 
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
 
Vector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptxVector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptx
 
Exploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusExploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with Milvus
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor Presentation
 
Platformless Horizons for Digital Adaptability
Platformless Horizons for Digital AdaptabilityPlatformless Horizons for Digital Adaptability
Platformless Horizons for Digital Adaptability
 
CNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In PakistanCNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In Pakistan
 
Elevate Developer Efficiency & build GenAI Application with Amazon Q​
Elevate Developer Efficiency & build GenAI Application with Amazon Q​Elevate Developer Efficiency & build GenAI Application with Amazon Q​
Elevate Developer Efficiency & build GenAI Application with Amazon Q​
 
[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectors
 

Your Data Nerd Friends Need You!

  • 1. Your Data Nerd Friends Need You! How the world of data analytics, science and insights is failing and how the principles from Agile, DevOps, and Lean are the way forward. #DataOps October 30, 2019
  • 2. Copyright © 2019 by DataKitchen, Inc. All Rights Reserved. What is this talk about? A BIG PROBLEM THAT CAN USE YOUR HELP BY HAVING EMPATHY FOR A GROUP OF PEOPLE THAT ARE SUFFERING AND WHAT YOU KNOW CAN HELP THEM
  • 3. Copyright © 2019 by DataKitchen, Inc. All Rights Reserved. It’s Big (Data) … • ‘New Oil’ amount of data increasing fast • Buzz: Big Data, Data Science, Data Lakes, Machine Learning, AI • $189.1 Billion Market , Double-Digit Annual Growth Through 2022. • $7.5B for GitHub, $15.7B for Tableau • 10s millions of people creating insight from data • More than software developers. • 1 of 25 workers full time, significant part time.
  • 4. Copyright © 2019 by DataKitchen, Inc. All Rights Reserved. It’s a big problem that can use your help • 87% of data science projects never make it into production. • Data analytics investment up, yet “data driven” organizations down 37% to 31% since 2019. • 80% of AI projects resemble alchemy • 60% of all data analytic projects fail • 79% of data projects have too many errors • … “They’re not even using version control!”
  • 5. Copyright © 2019 by DataKitchen, Inc. All Rights Reserved. Walk down the hall to your data analytics group and observe • Poor quality, high errors • Minor changes take months to implement, manual processes • 75 percent of the day is hijacked by unplanned work • Oversubscribed resources limit overall productivity. ….. Sound familiar?
  • 6. Copyright © 2019 by DataKitchen, Inc. All Rights Reserved. A BIG PROBLEM THAT CAN USE YOUR HELP BY HAVING EMPATHY FOR A GROUP OF PEOPLE THAT ARE SUFFERING AND WHAT YOU KNOW CAN HELP THEM Agenda
  • 7. Copyright © 2019 by DataKitchen, Inc. All Rights Reserved. Who Are These People?
  • 8. Copyright © 2019 by DataKitchen, Inc. All Rights Reserved. They took a different door … • Talk like you, look like you • But early in their career they took the data analytics door, not the software door • Complex toolchain • 50+ tools in each category • People love their tools • Some code, some configure
  • 9. Copyright © 2019 by DataKitchen, Inc. All Rights Reserved. They work in Teams
  • 10. Copyright © 2019 by DataKitchen, Inc. All Rights Reserved. They work in teams together
  • 11. Copyright © 2019 by DataKitchen, Inc. All Rights Reserved. They work in teams together
  • 12. Copyright © 2019 by DataKitchen, Inc. All Rights Reserved. With a massive, fragmented toolchain Data Sources and/or Data Lake ETL Tools (Informatica, Talend, etc.) Databases (Redshift, SQL Server, etc.) Data Science Tools (Python, DataIku, etc.) Data Catalog Tools (Alation, wiki) Data Visualization Tools (Tableau, etc)
  • 13. Copyright © 2019 by DataKitchen, Inc. All Rights Reserved. They may work for the same boss Chief Data Officer Chief Analytics Officer
  • 14. Copyright © 2019 by DataKitchen, Inc. All Rights Reserved. Or not CIO or CDO Line of Business Executives CEO
  • 15. Copyright © 2019 by DataKitchen, Inc. All Rights Reserved. A many to many dev/ops relationship Data Specific Production Team Do Operations Themselves Use IT Ops, Data Production Team & Themselves “DEV” “OPS”
  • 16. Copyright © 2019 by DataKitchen, Inc. All Rights Reserved. Example: Coordination of Two Teams, Two Locations, Two Ops, Many Tools Home Office Team Local Office ‘Self- Service’ Team VP Marketing Data Engineer Data Scientist Centralized, Weekly Cadence of Changes Data Analyst Distributed, Daily/Hourly Cadence of Changes Boston New Jersey
  • 17. Copyright © 2019 by DataKitchen, Inc. All Rights Reserved. Challenges With Coordination Home Office Team Data Engineer Data Scientist Local Office Team Data Analyst VP Marketing Make a change in schema? Break Reports? Add New Data SetsNot Available For All? Change Report Calculations Inconsistencies? New Data & Schema Update/New Report Not Working?
  • 18. Copyright © 2019 by DataKitchen, Inc. All Rights Reserved. And they source data from internal and external system CRM ERP Supply Chain Website Financial HR Open Data Syndicated Databases APIs Files ‘DevOps’ Governed Systems
  • 19. Copyright © 2019 by DataKitchen, Inc. All Rights Reserved. They run a ‘Factory’ of Insight CRM ERP Supply Chain Website Financial HR Open Data Syndicated Databases APIs Files Access: Python Code Transform: SQL Code, ETL Model: R Code Visualize: Tableau Workbook Report: Tableau Online
  • 20. Copyright © 2019 by DataKitchen, Inc. All Rights Reserved. And need to deploy quickly from dev to production Data Engineers Data Scientists Data Analysts Diverse Team Diverse Tools Diverse Customers Business Customer Products & Systems
  • 21. Copyright © 2019 by DataKitchen, Inc. All Rights Reserved. And need to do both simultaneously Don’t want break production when I deploy my changes Don’t want to learn about data quality issues from my customers
  • 22. Copyright © 2019 by DataKitchen, Inc. All Rights Reserved. A BIG PROBLEM THAT CAN USE YOUR HELP BY HAVING EMPATHY FOR A GROUP OF PEOPLE THAT ARE SUFFERING AND WHAT YOU KNOW CAN HELP THEM Agenda
  • 23. Copyright © 2019 by DataKitchen, Inc. All Rights Reserved. Your Data Nerd Friends Are Suffering • Hero culture • Fear culture • Insanely high error rate • Complete lack of automated testing • Deploy to product rates of months • Lots of hope, heroism and fear. • Technology Review Boards Project Panther! It’s a subplot in Gene’s new book for a reason
  • 24. Copyright © 2019 by DataKitchen, Inc. All Rights Reserved. Currently, Teams Have High Errors DataKitchen/Eckerson Survey (May 2019) DataKitchen / Eckerson Research Survey of Medium – Large Companies US And Europe
  • 25. Copyright © 2019 by DataKitchen, Inc. All Rights Reserved. Currently, Teams Struggle to Deploy DataKitchen/Eckerson Survey (May 2019) DataKitchen / Eckerson Research Survey of Medium – Large Companies US And Europe
  • 26. Copyright © 2019 by DataKitchen, Inc. All Rights Reserved. My Story
  • 27. Copyright © 2019 by DataKitchen, Inc. All Rights Reserved. A BIG PROBLEM THAT CAN USE YOUR HELP BY HAVING EMPATHY FOR A GROUP OF PEOPLE THAT ARE SUFFERING AND WHAT YOU KNOW CAN HELP THEM Agenda
  • 28. Copyright © 2019 by DataKitchen, Inc. All Rights Reserved. DataOps is having a moment • DataOps Manifesto 2017 • 6000 signatures • Gartner Hype Cycle in late 2018 • Increased market adoption of DataOps principles by leaders of data and analytic teams in 2019
  • 29. Copyright © 2019 by DataKitchen, Inc. All Rights Reserved. DataOps – Transformative to Data Analytics DataOps is a set of technical practices, cultural norms, and architecture that enable: • Rapid experimentation and innovation for the fastest delivery of new insights to our customers • Low error rates • Collaboration across complex sets of people, technology, and environments • Clear measurement and monitoring of results Source: Gartner “Organizations that adopt a DevOps- and DataOps- based approach are more successful in implementing end-to-end, reliable, robust, scalable and repeatable solutions.” Sumit Pal, Gartner, November 2019 People, Process, Organization Technical Environment
  • 30. Copyright © 2019 by DataKitchen, Inc. All Rights Reserved. From To Change Fear Change Velocity Manual Operations Automated Operations Hope For Quality Integrated Quality Hero Mentality Repeatable Processes Heads Down Collaboration Vendor Lock-In Diverse Tools How To Succeed? A Mindset Change to … …to power your highly agile data culture.
  • 31. Copyright © 2019 by DataKitchen, Inc. All Rights Reserved. Education: Seven Steps to DataOps (+3) 1. Orchestrate Two Journeys 2. Add Tests And Monitoring 3. Use a Version Control System 4. Branch and Merge 5. Use Multiple Environments 6. Reuse & Containerize 7. Parameterize Your Processing + Three (Architecture, Metrics and Inter/Intra Team Collaboration)
  • 32. Copyright © 2019 by DataKitchen, Inc. All Rights Reserved. DataOps contains many of the same concepts as software development and many unique to data analytics DataOpsDevOps DevOps vs DataOps:
  • 33. Copyright © 2019 by DataKitchen, Inc. All Rights Reserved. DevOps vs DataOps: 1. Different Process, Different People and Expectations 2. DevOps 1:1 DataOps Many:Many • Multiple ‘Dev’ and ‘Ops’ groups 3. DataOps Views Data Analytics as ‘Factory’ • Multi-Tool Orchestration Testing, Monitoring and Statistical Process Control 4. DataOps Has Additional Development Complexities
  • 34. Copyright © 2019 by DataKitchen, Inc. All Rights Reserved.
  • 35. Copyright © 2019 by DataKitchen, Inc. All Rights Reserved. Ask your data analytic teams impertinent questions • Are you using source control for your work? • How many automated tests to you have in production? • Do you have regression, functional or unit tests for you work? • How long does it take to deploy ETL/models/BI report from development to production? • Do you have automated deployment? • How up to date is your development environment? • How often are your business users finding errors in the data?
  • 36. Copyright © 2019 by DataKitchen, Inc. All Rights Reserved. Learn More • For these slides, contact me: • cbergh at datakitchen dot io • DataOps Manifesto: • http://dataopsmanifesto.org • Free DataOps Cookbook: • https://www.datakitchen.io/dataops-cookbook- main.html • Excerpt from Gene’s Unicorn Project Book on DataOps • https://www.datakitchen.io/unicorn-project.html