3. Agenda
● Introduction
● What is Modern Enterprise Data Engineering? (Why it’s more important than ever before)
● How to adapt your Data Engineering processes to rapid change
● Strategies to keep remote teams aligned and virtually connected
● The importance of using data to drive business decisions
● Why organizations need to modernize data management quickly and effectively (How to accelerate cloud migration)
4. Agile Leadership: Guiding DataOps Teams Through Rapid Change and Uncertainty
Data is more important now than ever before!!
6. Large Orgs Can Be Tempted to Put AI/ML Cart Before Data Horse
7. What is DataOps? = Modern Data Engineering Practice
DataOps is an automated, process-oriented methodology used by analytic and data teams to improve the quality and reduce the cycle time of data analytics.
8. Data Science vs. Data Engineering
There is significant overlap between data engineers and data scientists when it comes to skills and responsibilities. The main difference is one of focus.
Data engineers focus on building infrastructure and architecture for data generation, preparation, and publishing.
In contrast, data scientists focus on advanced mathematics and statistical analysis of published data.
Some traditional enterprise “data management” professionals will become data engineers.
See the detailed infographic from DataCamp.
[Figure: skill areas (Reporting and Visualization; Statistical Modeling & Machine Learning; Data Movement; Data Cleaning, Unification, Alignment; Database Performance Optimization) mapped across the roles Software Engineer, Data Engineer, and Data Scientist]
9. Reality of modern enterprise data scientists
They are constantly and idiosyncratically “fixing” the core data.
[Chart: Data Scientist Survey by Figure Eight]
10. Empowering data consumers is core to success of industry disruptors
“One of the primary goals of the platform is to enable other teams to focus on business logic, making experimentation, implementation, operation of stream processing jobs easy. By having a platform to abstract the “hard stuff”, removing complexities away from users, this would unleash broader team agility and product innovations.”
12. Traditional “Methods” for data engineering
All are necessary, but none alone is sufficient to solve the broader problems
● Standardization -- one schema to rule them all
● Aggregation -- tends to create more/bigger silos
● Federation -- always creates significant query performance challenges
● Master Data Management -- “deviations” difficult to handle and too deterministic
● Rationalize Systems -- single vendor = radical compromise
● Throw Bodies at it -- expensive, time-consuming, ineffective & inconsistent
13. Human/behavioral challenges are often the primary bottleneck
● Afraid to share data
○ Due to data quality (worry about being judged, or about having to take on the responsibility of fixing the data that consumers request)
● Hoarding data
○ A method of organizational control or job preservation
● Obscuring data complexity
○ Failure to embrace the complexity, diversity, and idiosyncrasy of data generated in a large enterprise
● Limiting access to a small number of users
○ A method of control, or a reflection of insecurity about data quality
14. Traditional companies have significant “legacy drag coefficient”
Manage data from their business systems more as “exhaust” than “asset” > “significant data debt”
Problem: Thousands of systems generating data every day that were built over decades to support business processes - idiosyncratic to that time/context.
Data is idiosyncratic to each system - creates fundamental “data disconnect” and “data decay”
Result: “Random Data Salad”
Data debt from constant change/entropy:
● Restructuring
● Leadership Changes
● Politics
● Dynamic Schema DBs - Mongo et al.
● “Data Hoarding”
● Legacy Burden
● M&A
Consequences:
1. Too much time spent on data prep vs. analysis / action
2. High failure rate of BI / analytics projects
3. Game-changing initiatives deemed “impossible” and never start
15. Modern, Open Data Engineering Ecosystem (aka DataOps)
Sources (tabular): Internal databases, Internal apps / systems, External endpoints, Internal files, External files
Ecosystem components:
● Catalog & Crawling
● Movement & Automation
● Mastering & Quality
● Storage & Compute
● Governance, Privacy & Policy
● Publishing & Versioning
● Feedback & Usage
Consumption endpoints: Analytics, Source Remediation, Source / Cloud Migration, Custom Apps
Consumers: Citizens, Analysts, Data Scientists, Developers
16. Modern Enterprise Data Engineering Principles ≈ “DataOps”
Sources: Internal Tabular Data, External Tabular Data
People, Process, Tools:
● Cloud First
● Continuous (assume data will change)
● Agile (deploy quickly and iteratively)
● Highly Automated - automate whenever possible
● Open/Best of Breed (not one platform/vendor)
● Bi-Directional (Feedback)
● Collaborative (Humans at the Core)
● Service Oriented (clear endpoints for data)
● Loosely Coupled (RESTful Interfaces, Table(s) In/Out)
● Both aggregated AND federated storage
● Both batch AND streaming
● Lineage/Provenance is essential
● Scale Out/Distributed
Consumers: Citizens, Analysts, Data Scientists, Developers
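As a minimal sketch of two of the principles on this slide (loosely coupled table(s)-in/table(s)-out steps, and lineage/provenance), the Python below passes a table through small independent steps and records which steps touched it. The names `Table`, `run_step`, `trim`, and `uppercase`, and the sample data, are hypothetical and chosen only for illustration. In a real ecosystem each step would more likely sit behind its own service endpoint than be an in-process function call.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Row = Dict[str, str]


@dataclass(frozen=True)
class Table:
    """A tabular payload plus its provenance (hypothetical structure)."""
    source: str                                        # where the data originally came from
    rows: List[Row]
    lineage: List[str] = field(default_factory=list)   # ordered names of applied steps


def run_step(name: str, fn: Callable[[List[Row]], List[Row]], table: Table) -> Table:
    """Apply a table-in/table-out step and append its name to the lineage."""
    return Table(source=table.source, rows=fn(table.rows), lineage=table.lineage + [name])


def trim(rows: List[Row]) -> List[Row]:
    """Strip surrounding whitespace from every value."""
    return [{key: value.strip() for key, value in row.items()} for row in rows]


def uppercase(rows: List[Row]) -> List[Row]:
    """Uppercase every value."""
    return [{key: value.upper() for key, value in row.items()} for row in rows]


raw = Table(source="crm_customers", rows=[{"name": " Acme "}, {"name": "Globex"}])
published = run_step("uppercase", uppercase, run_step("trim", trim, raw))

print(published.source, published.lineage, published.rows)
```

Because every step only takes and returns a table, steps can be swapped or reordered without touching the others, and the published table still says where it came from and what was done to it.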
17. Reality of Data Ecosystem/Landscape: EXTREME NOISE
19. Adapting to change - Telecommuting/WFH/Remote working is not a new concept
How many employees work from home?
Regular work-at-home has grown 173% since 2005, 11% faster than the rest of the workforce. [Global Workplace Analytics’ analysis of 2018 ACS data]
How many people could work from home?
● 56% of employees have a job where at least some of what they do could be done remotely [Global Workplace Analytics analysis of BLS data, 2017]
● 62% of employees say they could work remotely [Citrix 2019 poll]
● Studies repeatedly show desks are vacant 50-60% of the time.
[Chart: percentage of people who work at home, by industry. Global Workplace Analytics’ special analysis of 2016 ACS data]
20. Key People/Personas in the Modern Data Ecosystem
Data Suppliers (CIO): Source Owner, DBA, IT Professional
Data Engineering (CDO): Data Engineer, Curator, Steward, ELT Professional
Data Consumers (Business owners and other CxOs): Data Scientist, Data Analyst, Data Citizen, Developer
21. Key Roles in next-gen Data Ecosystem
Group | Role | Goals | Tools
Consumers | Citizen | Use data to make business decisions | Viz, CRM, Excel, PowerPoint, Word, Web Search
Consumers | Analyst | Deliver insights to the business, typically through dashboards and reports | Viz, Excel, SSDP, Web Search
Consumers | Scientist | Deliver insights to the business, typically through models and algorithms | R, Python, SAS, SSDP
Consumers | Developer | Build applications which leverage corporate data | Python, Java, JS, SQL, REST
Preparers | Engineer | Deliver and manage data pipelines | ETL, SQL
Preparers | Curator | Ensure consumers have the data they need, in the form they need it | Data mastering tools
Preparers | Steward | Use feedback from consumers to improve data broadly | Data feedback tools
Suppliers | Source Owner | Define and manage purpose, processes (data creation, consumption) & users (i.e., access) of the data source | EDW, SQL, ERWin, LDAP, SAP
23. Useful comparison between CDO and CFO
Table stakes
CHIEF FINANCIAL OFFICER
● What money do we have?
● Where did it come from?
● Where is it going and why?
Long term goal: Return on Assets
CHIEF DATA OFFICER
● What data do we have?
● Where does it come from?
● Who consumes it and why?
Long term goal: Return on Data
24. How Do I Start?
You may have already started...
● Leverage existing mastered data as ground truth
● Keep the best parts of your MDM, just enhance the Mastering capability
...But if you haven’t…
● Find a data-rich, analytically valuable problem for which fragmented data and knowledge present a challenge
...Either way, it’s essential to keep an agile...
● ...Mindset - focus on quick wins that have been beyond reach, then build
● ...Skillset - engage the data experts at their current skill level, let machines do the rest
● ...Toolset - simple, collaborative data curation, optimally in the tools they already use
26. What are Transformational Analytic Outcomes?
Question: How many customers do we have?
● Enterprises have hundreds of source systems
● Sources must be combined, consolidated, and classified
● These lists are building blocks for transformational analytics
[Figure: Before / After. 212 sources (tables), mostly SAP]
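To make the “How many customers do we have?” question concrete, here is a minimal sketch of combining and consolidating customer lists from several sources before counting. The source names, sample records, and the suffix-stripping rule in `normalize_name` are all illustrative assumptions, not the method used in the deck. Real mastering typically relies on much richer matching than exact comparison of normalized names.

```python
import re
from collections import defaultdict
from typing import Dict, List, Set

# Illustrative list of legal suffixes to ignore when comparing names (assumption).
LEGAL_SUFFIXES = {"inc", "corp", "co", "ltd", "llc", "gmbh"}


def normalize_name(name: str) -> str:
    """Lowercase, replace punctuation with spaces, and drop common legal suffixes."""
    cleaned = re.sub(r"[^a-z0-9 ]", " ", name.lower())
    tokens = [token for token in cleaned.split() if token not in LEGAL_SUFFIXES]
    return " ".join(tokens)


def consolidate(sources: Dict[str, List[str]]) -> Dict[str, Set[str]]:
    """Map each normalized customer key to the set of sources it appears in."""
    merged: Dict[str, Set[str]] = defaultdict(set)
    for source, names in sources.items():
        for name in names:
            merged[normalize_name(name)].add(source)
    return dict(merged)


# Hypothetical customer lists from three source systems.
sources = {
    "sap_emea": ["Acme Corp.", "Globex GmbH"],
    "sap_us": ["ACME Corp", "Initech, Inc."],
    "crm": ["acme", "Initech"],
}

raw_count = sum(len(names) for names in sources.values())  # naive "before" count
customers = consolidate(sources)                           # consolidated "after" view

print(raw_count, len(customers))
```

Counting rows across sources overstates the number of customers, while the consolidated view counts each customer once and keeps track of which systems hold a record for it.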
27. What are Transformational Analytic Outcomes?
Question: What is our customer distribution by sales totals?
● Analytics begin with sell more and/or spend less
● Transformational analytics aren’t new, they are broader
● Business wants speed and up-to-date information
● Data variety skews answers, creating misinformation instead of clarity
[Figure: Before / After]
29. Why now? 7 years ago, we needed data scientists!
But now that we have them, where do they get their data?
[Image: “Data Scientist: The Sexiest Job of the 21st Century”]
30. Today: we have data scientists! (and want to do cool AI stuff)
[Chart: Data Scientist Jobs, Indeed.com, % of all postings]
31. Unique Moment in Time: Enterprise Data as an Asset
Themes:
● Latent Opportunity of Enterprise “Data As An Asset”
● Enterprise Migration to the Cloud
● Generational Change in Enterprise Data Management
Factors shown in the diagram:
● Fear of disruption by “Data Natives”
● Low Hanging Analytical Fruit
● “Competing on Analytics”/Strategic Imperative
● Popularity of AI, but really of Core Data Quality
● Inability of traditional tech to scale
● Lack of innovation from old vendors
● Maturing Big Data Tech (HDFS/Lakes)
● Democratization of analytics
● Rise of the CDO
● Decades of treating data as operational exhaust
● Deeply Fragmented/Siloed Data Environments
● Inability to leverage new sources - esp. external
● “AI Cart” before the “Data Horse”
● Significant “lift & shift” opportunity
● Potential for behavioral changes
● New infra good/secure enough
Now that we’ve established Data Science as a critical component of the enterprise:
It’s time for each enterprise in the Global 2000 to build its data engineering muscles so it can “compete on analytics” over the coming decades.
32. What NOT to do
● Avoid “boil the ocean”/“waterfall” projects (measured in years/quarters)
○ Build rational long-term infra while delivering real analytic value along the way
● Single “Platform”: Don’t overestimate what a single piece of software can do
○ Focus on a thoughtfully designed ecosystem of loosely coupled, best-of-breed tools
● Single Vendor: Don’t overestimate what a single vendor can do
○ Align vendors with APIs and expectations that they MUST work together
● Don’t underestimate the effort required to make FOSS work
○ Just because Google does it doesn’t mean you can do it
● Don’t underestimate human/behavioral challenges with data
○ Most often, the reasons projects fail or stall are human/behavioral
● Avoid “Data Engineering/Science Hubris”
○ I Data - therefore I am