Productionising Machine Learning to automate the enterprise. Conference research question: How can you pinpoint which core business processes to transform with increased automation and streamline daily workflows to boost in-house efficiencies?
Data Leaders Summit Barcelona 2018
1. Case Study Interactive: Productionising machine learning to automate the enterprise: How can you pinpoint which core business processes to transform with increased automation and streamline daily workflows to boost in-house efficiencies?
2. // Harvinder Atwal // Web
{"about" : "me"}
{"Current" : "Head of Data Strategy and Advanced Analytics, MoneySuperMarket"}
{"previous" : "Insight Director, Tesco Clubcard (dunnhumby)"}
{"previous" : "Senior Manager, Customer Strategy and Insight (Lloyds Banking Group)"}
{"previous" : "Senior Operational Research Analyst (British Airways)"}
@harvindersatwal
harvindersatwal@gmail.com
3. Key numbers:
£2B: 2017 estimated total of UK savings
1993: We started life as Mortgage 2000
24.9M: Adults choose to share their data with us
24 million: Average monthly users (2017)
£323M: Revenue (2017)
989: Product Providers
4. 3 major ways Data Science can help the organisation: Product Creation, Customer Experience, Business Efficiency.
5. Intelligent Automation is a solution to improving Business Efficiency:
- Robotic Process Automation: business rules to execute tasks with existing software systems (act like a human).
- Machine Learning: predictive and prescriptive analytics-driven decision making.
- Artificial Intelligence: systems performing tasks normally requiring human intelligence (think like a human).
6. Robotic Process Automation (RPA) is process- rather than data-driven: Read Email → Open Attachment → Enter Data into ERP System.
7. "The science of getting computers to act without being explicitly programmed" – Andrew Ng
Regular programming: Input Data + Rules (Code) → f(x) → Output.
Machine Learning: Input Data + Output → Machine Learning Algorithm → Rules f(x).
Robotic Process Automation is really Software Engineering.
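To make the contrast concrete, here is a minimal sketch (not from the deck; scikit-learn used for illustration): in regular programming we hand-code the rule f(x), while in machine learning the algorithm recovers the rule from input/output examples.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Regular programming: we write the rule f(x) ourselves.
def f(x):
    return 2 * x + 1  # explicit, hand-coded rule

X = np.array([[0], [1], [2], [3]])
y = np.array([f(v[0]) for v in X])  # observed input/output pairs

# Machine learning: given inputs and outputs, the algorithm learns the rule.
model = LinearRegression().fit(X, y)
print(model.coef_, model.intercept_)  # ~[2.0] and ~1.0: the recovered rule
```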
8. Our Marketing Personalisation relies on Intelligent Automation through Prescriptive Analytics: from one version to 1,400+ customised variants of the newsletter. +19% increase in Revenue Per Send.
9. Intelligent Automation through ML:
- Anomaly detection prioritises alerts.
- Facebook use ML to review code releases.
- Automated maintenance schedules.
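A minimal sketch of the alert-prioritisation idea above, using scikit-learn's IsolationForest on made-up event data (the features are hypothetical):

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
normal_events = rng.normal(loc=[50.0, 200.0], scale=[5.0, 20.0], size=(500, 2))
odd_events = np.array([[95.0, 400.0], [5.0, 30.0]])  # unusual operational events
events = np.vstack([normal_events, odd_events])

forest = IsolationForest(random_state=0).fit(events)
scores = forest.score_samples(events)  # lower score = more anomalous

priority = np.argsort(scores)  # investigate the most anomalous alerts first
print(priority[:5])
```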
11. We use Natural Language Processing to customise content. Example copy:
"Worried about whether you can afford a personal loan? With UK interest rates at record lows, it's worth checking to see how reasonable the cost could be. Whether you need to borrow to buy something, or you want to bring your existing debts under one roof, have a look at these competitive deals we've assembled. Thanks to our Smart Search tool, you can get an idea of the loans you're likely to be accepted for before you proceed with your application."
Same message, but language tailored to the customer's Financial Attitude.
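One way such tailoring could be wired up, as an illustrative sketch only: the attitude classifier is hypothetical and the copy variants paraphrase the slide.

```python
COPY_VARIANTS = {
    "cautious": ("Worried about whether you can afford a personal loan? "
                 "It's worth checking how reasonable the cost could be."),
    "confident": ("Want to bring your existing debts under one roof? "
                  "Have a look at these competitive deals we've assembled."),
}

def personalise(customer_features, attitude_model):
    """Return the copy variant matching the customer's predicted attitude."""
    attitude = attitude_model.predict([customer_features])[0]  # e.g. "cautious"
    return COPY_VARIANTS.get(attitude, COPY_VARIANTS["cautious"])
```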
12. Associated Press use NLP tools to create articles quickly, such as business earnings reports and localized election coverage. Other applications: classifying documents according to their meaning and relevance to ongoing litigation; automated claims handling and fraud detection in insurance.
20. Alignment of data science with the rest of the organisation and its goals.
21. Your business already has a hypothesis for what creates value: it's the Corporate Strategy and Objectives (everyone is aligned behind). Actively avoid work on anything else.
22. Measurement of everything gives feedback on not just individual deliverables (fast loop) but also the organisation's hypothesis of what adds value (slow loop). Corporate strategy is broken down into many options for Agile delivery, cascading from Situational Awareness through Themes (Objectives) to Initiatives, Epics and Stories.
23. Work on the Right Things: Organisational Objectives are your Themes. Examples: improve employee retention; ensure compliance; increase reliability of operations; capitalise on physical facilities; reduce energy usage per unit of production; improve and maintain workplace safety; reduce error rates; improve customer retention; improve customer satisfaction; improve customer service; acquire new customers from innovative offerings; grow percentage of sales from new products; differentiate the product; understand customer needs; increase share of wallet; increase share of market; cross-sell more products; reliable products/services; grow revenue; increase margin; reduce costs.
24. Themes inspire Initiatives & Epics for the Backlog.
Template: Achieve [Objective] by [Target outcome]; Proposed [Hypothesis]; Resulting in [Predicted Benefit]; Partnering with [Other Teams and Partners]; and Integrating [sources and data types].
Example 1: Achieve Compliance objective by adhering to anti-money laundering (AML) regulations; applying machine learning to automatically monitor customer transaction data to identify anomalies for investigation; resulting in reduced risk of fines; partnering with Compliance and IT teams; and integrating financial transaction and customer identity data.
Example 2: Achieve Compliance objective by ensuring information security; applying natural language processing and machine learning to automatically identify sensitive content and off-policy distribution for review; resulting in reduced risk of fines and reputational risk; partnering with Compliance and IT teams; and integrating enterprise email data.
Develop multiple Initiatives: there is more than one way to achieve objectives.
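The template lends itself to a fill-in-the-blanks structure so every hypothesis is captured in a consistent, comparable format; a trivial sketch (illustrative only, all field values taken from the example above):

```python
# The slide's Initiative template as a string template.
TEMPLATE = ("Achieve {objective} by {target_outcome}; Proposed {hypothesis}; "
            "Resulting in {benefit}; Partnering with {partners}; "
            "and Integrating {data}.")

initiative = TEMPLATE.format(
    objective="Compliance objective",
    target_outcome="adhering to anti-money laundering (AML) regulations",
    hypothesis="applying machine learning to monitor customer transaction data",
    benefit="reduced risk of fines",
    partners="Compliance and IT teams",
    data="financial transaction and customer identity data",
)
print(initiative)
```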
26. We know there's more to it. Questions to ask at each stage of the lifecycle:
- Opportunity: How well do we plan and prioritise the projects we work on? How aligned are plans to the business objectives, stakeholders and customer needs identified earlier?
- Data: How integrated is our internal data? How structured and clean is our data? How secure is our data? How real-time is our data?
- Modelling: How easily can we create and validate models? Do we have access to a wide range of algorithms? Do we have monitoring in place? Are models explainable?
- Deployment: How easily can we deploy data products across touchpoints? How quickly can we create and update reporting? How well do we drive business and customer change?
- Benefits Realisation: How well do we track the benefits of our work? How effective is the feedback loop to new opportunities? How good is our ability to sell our work?
- Refresh: How well do we review/revalidate our output and processes? How often do we reuse existing processes? How easily can we launch experiments? How easily can we update existing data products?
27. Work with the right people: which stakeholders are more likely to respond to your recommendations and requests? Map them by Stakeholder Power (vertical) against Stakeholder Interest (horizontal):
- Key Player (high power, high interest): focus on this area; involve in governance; engage and consult regularly.
- Meet their needs (high power, low interest): engage and consult on interest area; increase level of interest; move to right-hand box.
- Show Consideration (low power, high interest): involve in low-risk projects; keep informed and consult; potential supporter.
- Least Important (low power, low interest): inform via general communications; increase level of interest; aim to move to right box.
28. Create an Epic Prioritisation Matrix

Objective | Epic | Stakeholders Power and Interest | Data Availability, Quality and Integration | Resources | Deployment Capability | Benefits Measurement | Expected Business Value
Ensure compliance | Applying machine learning to automatically monitor customer transaction data to identify anomalies; resulting in reducing risk of fines | H/L | H/H/M | M | L | H | 5
Ensure compliance | Applying natural language processing and machine learning to automatically identify sensitive content and off-policy distribution; resulting in reducing risk of fines and reputational risk | H/M | L/M/L | H | L | M | 3
Improve customer retention | Applying machine learning to identify customers at risk of churn; resulting in higher retention | M/H | M/H/M | M | M | H | 6
Improve customer retention | Applying machine learning to identify customer LTV for differentiated service; resulting in higher retention | M/H | M/H/M | M | M | H | 5
Improve customer service | Applying natural language processing to prioritise customer messages for resolution; resulting in higher NPS | M/H | M/H/M | M | M | H | 2
… | … | … | … | … | … | … | …
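If you want a rough numeric rank from the H/M/L cells, a hedged sketch follows; the scoring rule and weights are illustrative only (the deck estimates Expected Business Value directly, sometimes from qual/quant data).

```python
# Map H/M/L ratings to numbers and average across criteria to rank epics.
RATING = {"H": 3, "M": 2, "L": 1}

def epic_score(stakeholders, data, resources, deployment, benefits):
    """Average the H/M/L ratings across all criteria; compound cells
    like 'H/H/M' are averaged first."""
    def cell(value):
        parts = value.split("/")
        return sum(RATING[p] for p in parts) / len(parts)
    criteria = [stakeholders, data, resources, deployment, benefits]
    return sum(cell(c) for c in criteria) / len(criteria)

print(round(epic_score("H/L", "H/H/M", "M", "L", "H"), 2))  # anomaly-monitoring epic
```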
30. Get Specific: break big, coarse-grained Epics into small, specific Stories.
Epic (big, coarse-grained): Applying machine learning to automatically monitor customer transaction data to identify anomalies; resulting in reducing risk of fines.
Stories (small, specific):
- As a Data Engineer I need to create a data pipeline of transaction data for Data Scientists.
- As a Data Scientist I need to explore data to understand the quality and value for modelling.
- As a Data Scientist I need to engineer features to improve model accuracy.
- As a Data Scientist I need to train and validate a machine learning model for accurate predictions.
- As a Data Engineer I need to operationalise and deploy the machine learning model to make predictions.
- As a Data Scientist I need to monitor model performance to know when to retrain the model.
33. Treat each prioritised Epic as a hypothesis to be tested. The cycle: Objectives → think of interesting Epics (formulate hypotheses) → prioritise Epics → develop testable predictions ("If my hypothesis is correct then I expect [benefit]") → create Stories & test → gather data to test predictions → refine, alter, expand, or reject hypotheses. In practice: Pilot → Experiment → Measure → Embed.
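A hedged sketch of measuring a testable prediction, assuming hypothetical pilot and control revenue-per-send data (SciPy used for the test; the uplift and threshold are made up):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
control = rng.normal(loc=1.00, scale=0.30, size=5000)  # revenue per send, old newsletter
pilot = rng.normal(loc=1.05, scale=0.30, size=5000)    # personalised variant

t_stat, p_value = stats.ttest_ind(pilot, control, equal_var=False)
uplift = pilot.mean() / control.mean() - 1

if p_value < 0.05 and uplift > 0:
    print(f"Hypothesis supported: {uplift:+.1%} uplift (p={p_value:.3f})")
else:
    print(f"Refine, alter, expand, or reject (p={p_value:.3f})")
```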
36. Only 22% of companies are currently seeing a significant return from data science expenditures*
*Obligatory conference presentation quote from Gartner/Forrester/McKinsey consulting. Sorry.
38. Multiple challenges in the process of turning data into value on existing infrastructure:
Business Problem → Evaluate available data → Request data access from IT → Request compute resources from IT → Negotiate with IT for requested resources → Wait for resources to be provisioned → Install languages and tools → Configure connectivity, access and security → RAM/CPU availability, scaling, monitoring → Request network config change → Request to install another package → Model building → Compose PowerPoint to share results → Edit Confluence to document work → Negotiate with business stakeholder on deployment timeline → Wait for Data Engineering to implement the model → Test newly implemented model to ensure valid results → Request modifications to model due to unexpected results → Release model to production and schedule → Document release notes and deployment steps → Prepare for change management.
46. Collaboration is key:
- Shared buy-in from senior management.
- Organisational behaviour structured around the ideal data-journey model.
- Shared priorities.
- Shared trust in data.
- Shared rewards based on measured outcomes, not outputs.
47. Creation of a fast feedback loop: Plan → Test & Collect → Model → Embed → Roll Out → Feedback.
- Test & Collect: pilot test; collect data.
- Model: build model, identify segments.
- Embed: adjust model to fit organisation; re-engineer business processes to support segmented execution.
- Roll Out: train organisation.
49. Shortened Data Cycles to be Agile. Data Scientists work from Epic and Story through to Data Product, Customer, and Feedback & Iteration, guided by Strategy, on a self-serve stack:
- Data Engineering: data sources, ETL, DQM, stream processing.
- DevOps/Infrastructure: compute instance, container service, distributed compute framework, orchestration and scaling, configuration management, scheduling, resource management/monitoring/auditing.
- DB Management: cloud file storage, distributed file system, NoSQL DB, RDBMS, distributed SQL query engine.
- Workbench: data prep/exploration tools, coding workspace & language libraries, machine learning, data visualisation, interactive dashboards/web app development, version/deployment tool.
- Outputs: output files, BI tools, interactive dashboards/web apps, APIs.
- Cross-cutting: knowledge management, security/identity access control, revision control, project and data governance.
52. DataOps is an independent approach to data analytics:
- The data analytics team moves at lightning speed using highly optimised tools and processes across the whole data lifecycle.
- Agile collaboration to break down silos and work on "The Right Things" that add value.
- A Lean Manufacturing-like focus on eliminating waste & bottlenecks, improving quality, monitoring and control.
Enablers: strategic objectives, organisational alignment, iterative project management, continuous delivery, automated test and deployment, monitoring, self-serve, quality, governance, ease of use, predictability, reproducibility.
56. Trust part 1: Make the "What you do to data" people in the organisation happy: identity and access management; custom role permissions; audit trail logs; data loss prevention; encryption of data at rest; encryption of data in motion; resource monitoring; firewall rules; resource and object isolation; penetration testing; code encryption and backup; segregation of duties; authorisation protocols; data access and privacy policy; metadata management; data lineage tracking; data stewards and owners.
57. Trust part 2: Make the "What you do with data" people in the organisation happy: data quality testing; transformation testing; end-user testing; ETL integration testing; metadata testing; data completeness testing; ETL regression testing; incremental ETL testing; reference data testing; ETL performance testing.
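A hedged sketch of what one such automated data-quality test can look like, assuming a hypothetical pandas feed of customer transactions (the column names and rules are illustrative):

```python
import pandas as pd

def test_transactions(df: pd.DataFrame) -> None:
    """Fail fast if the feed breaks a basic quality contract."""
    assert not df.empty, "completeness: feed delivered no rows"
    assert df["transaction_id"].is_unique, "integrity: duplicate transaction ids"
    assert df["amount"].notna().all(), "completeness: null amounts"
    assert (df["amount"] > 0).all(), "validity: non-positive amounts"
    assert df["timestamp"].is_monotonic_increasing, "ordering: timestamps out of order"

df = pd.DataFrame({
    "transaction_id": [1, 2, 3],
    "amount": [10.0, 25.5, 3.2],
    "timestamp": pd.to_datetime(["2018-04-01", "2018-04-02", "2018-04-03"]),
})
test_transactions(df)  # raises AssertionError if any check fails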
60. Continuous Integration: Commit Code Regularly. Each stage of the machine learning pipeline (Data Cleaning, Feature Extraction, Model Train) has a master branch plus dev branches, feeding product development (e.g. app, website, marketing system, operational system, dashboard, etc.).
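One way (illustrative, not the deck's actual setup) to keep those per-stage branches integrable is to structure the pipeline as small, testable stage functions with a smoke test that runs on every commit; every function here is a hypothetical stand-in.

```python
# Structure the ML pipeline as composable stages so each dev branch
# (cleaning, features, training) can be integrated and tested independently.
def clean(raw_rows):
    return [r for r in raw_rows if r.get("amount") is not None]

def extract_features(rows):
    return [[r["amount"], len(r.get("description", ""))] for r in rows]

def train(features):
    return {"n_samples": len(features)}  # stand-in for real model training

def test_pipeline_smoke():
    """Runs on every commit: the stages still compose end to end."""
    raw = [{"amount": 10.0, "description": "coffee"}, {"amount": None}]
    model = train(extract_features(clean(raw)))
    assert model["n_samples"] == 1

test_pipeline_smoke()
```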
62. Continuous Delivery and Beyond: Accelerating Deployment. The pipeline runs Dev → Integration test → Application test → Acceptance test → Production:
- Continuous Integration: automated through integration and application tests; acceptance testing and production release are manual.
- Continuous Delivery: automated through acceptance test; production release is manual.
- Continuous Deployment: automated all the way to production.
63. Chemistry is not about tubes
DataOps is not about tools
(but the right ones help)
64. Align your spine: Needs → Values → Principles → Practices → Tools. It all starts at Needs: why does this system exist in the first place?
- Needs: We are here to SATISFY THE NEED to help customers save money and the business to execute its strategy.
- Values: We OPTIMISE for Speed, Accuracy, Experimentation/Feedback and Security. (How do you know which Principles you want to apply?)
- Principles: We LEVERAGE Agile and Lean PRINCIPLES to change the system and make sure resources work on the right thing. (How do you know that the Practices actively help the system?)
- Practices: We DO Self-Service and DataOps to continuously create VALUE for the customer and business. (How do you know it is the best possible tool?)
- Tools: "We use _____ to get our work done."
Source: Kevin Trethewey, Danie Roux, Joanne Perold
65. Avoid building your own anything or
being on the bleeding edge.
Cost of Delay is high.
66. Data Scientists need a way to manage their projects end-to-end with self-service data AND ARCHITECTURE
(Repeats the end-to-end workflow from slide 38, from Business Problem through to Prepare for change management.)
67. Modern serverless and managed infrastructure makes it easy to create data products: just bring code and data. A single unified platform reduces data fragmentation, overcomes business silos and helps enforce consistent governance.
68. Data Science Platforms add further self-serve capabilities (source: Domino Data Labs):
- Data access, prep and exploration: Jupyter, RStudio, Zeppelin, etc.
- Automation and machine learning: run experiments, track and compare results.
- Delivery and model management: publish APIs and interactive web apps; schedule reports.
- Collaboration and version control: discover, discuss and build on existing work.
- Compute environment library: customised software stack.
- Compute grid: orchestrate hardware for development and deployment.
72. #1 Align with the Organisation through Agile Collaboration
73. #2 Eliminate wasted effort: find the FASTEST, CHEAPEST path between data and consumers.
The Optimist: THE GLASS IS HALF FULL. The Pessimist: THE GLASS IS HALF EMPTY. The Lean Thinker: WHY IS THE GLASS TWICE AS BIG AS IT SHOULD BE?
77. #6 KEEP CALM AND BUILD TRUST IN DATA: put effective Data Governance, Security and Testing in place.
78. #7 Invest in tools and process to reduce bottlenecks and increase quality: managed infrastructure and serverless cloud, automation and Data Science Platforms.
80. #9 Organise around the ideal data journey (Acquire → Process → Store → Manage → Share → Use) instead of teams: fewer roles, more end-to-end ownership, less friction. The remaining roles: Data Engineering, Data Scientists, Data Analysts, Business Stakeholders.
83. The DataOps Data Science Factory: Strategy → Epic → Story → Data → Data Product → Customer, with Analytics connected to the Rest of Business. Foundations: Agile collaboration, data governance, automated testing, value measurement, version control, configuration management, self-serve infrastructure, automation, continuous integration.
86. // Harvinder Atwal // Web
var current = {
  companyName : "MoneySuperMarket",
  position : "Head of Data Strategy"
    + " and Advanced Analytics"
};
var previous1 = {
  companyName : "Dunnhumby",
  position : "Insight Director,"
    + " Tesco Clubcard"
};
var previous2 = {
  companyName : "Lloyds Banking Group",
  position : "Senior Manager"
};
var previous3 = {
  companyName : "British Airways",
  position : "Senior Operational Research Analyst"
};
{"about" : "me"}
var username = "harvindersatwal";
var linkedIn = "/in/" + username;
var twitter = "@" + username;
var email = username + "@gmail.com";
Speaker notes
Process automation can expedite back-office tasks in finance, procurement, supply chain management, accounting, customer service and human resources, including data entry, purchase order issuing, creation of online access credentials, or business processes that require “swivel-chair” access to multiple existing systems
Automated processes in the remote management of IT infrastructures can consistently investigate and solve problems for faster process throughput. RPA can improve service desk operations and the monitoring of network devices.
As in voice recognition software or automated online assistants, developments in how machines process language, retrieve information and structure basic content mean that RPA can provide answers to employees or customers in natural language rather than in software code. This technology can help to conserve resources for large call centers and for customer interaction centers.
The key to adding value is to adapt and borrow principles from Agile software development, starting with alignment of data science with the rest of the organisation and its goals.
Work only on the organisation’s biggest strategic objectives – those stakeholders have aligned behind. Objectives the business hypothesises will add the most value.
We don’t know upfront what is going to work.
Who is your ideal internal stakeholder/client?
Which ones are more likely to respond to your recommendations and requests?
What is their persona(s)
Who are your key stakeholders and what stage of the data-driven enterprise buy-in process are they at?
Who are your influential stakeholders?
Who else is talking about your team/output (and what are they saying)?
Who are your actual clients?
Does this fit with our ideal client?
Do we need to change who we serve if they are different?
Lots of ways to prioritise
Sometimes you actually have some qual/quant data to estimate expected business value. Else you have to estimate.
Star Wars is not a metaphor for Good vs Evil but Waterfall vs Agile.
Too much wastage in the process and hard to impact customers directly
DataOps is an automated, process-oriented methodology, used by big data teams, to improve the quality and reduce the cycle time of data analytics. While DataOps began as a set of best practices, it has now matured to become a new and independent approach to data analytics.[1] DataOps applies to the entire data lifecycle[2] from data preparation to reporting, and recognizes the interconnected nature of the data analytics team and information technology operations.[3] From a process and methodology perspective, DataOps applies Agile software development, DevOps[3] and the statistical process control used in lean manufacturing, to data analytics.[4]
In DataOps, development of new analytics is streamlined using Agile software development, an iterative project management methodology that replaces the traditional Waterfall sequential methodology. Studies show that software development projects complete significantly faster and with far fewer defects when Agile Development is used. The Agile methodology is particularly effective in environments where requirements are quickly evolving — a situation well known to data analytics professionals.[5]
DevOps focuses on continuous delivery by leveraging on-demand IT resources and by automating test and deployment of analytics. This merging of software development and IT operations has improved velocity, quality, predictability and scale of software engineering and deployment. Borrowing methods from DevOps, DataOps seeks to bring these same improvements to data analytics.[3]
Like lean manufacturing, DataOps utilizes statistical process control (SPC) to monitor and control the data analytics pipeline. With SPC in place, the data flowing through an operational system is constantly monitored and verified to be working. If an anomaly occurs, the data analytics team can be notified through an automated alert.[6]
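As a hedged illustration of SPC on an operational pipeline, assuming a hypothetical daily row-count metric and a stand-in alerting hook:

```python
import statistics

def control_limits(history):
    """Compute mean +/- 3 standard deviations from in-control history."""
    mean = statistics.mean(history)
    sigma = statistics.stdev(history)
    return mean - 3 * sigma, mean + 3 * sigma

history = [10_120, 10_340, 9_980, 10_210, 10_050, 10_300, 10_170]
lower, upper = control_limits(history)

todays_row_count = 7_600
if not lower <= todays_row_count <= upper:
    print(f"ALERT: {todays_row_count} outside control limits ({lower:.0f}, {upper:.0f})")
```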
DataOps seeks to provide the tools, processes, and organizational structures to cope with this significant increase in data.[7] Automation streamlines the daily demands of managing large integrated databases, freeing the data team to develop new analytics in a more efficient and effective way.[9]
DataOps embraces the need to manage many sources of data, numerous data pipelines and a wide variety of transformations.[3] DataOps seeks to increase velocity, reliability, and quality of data analytics.[10] It emphasizes communication, collaboration, integration, automation, measurement and cooperation between data scientists, analysts, data/ETL(extract, transform, load) engineers, information technology (IT), and quality assurance/governance.[11] It aims to help organizations rapidly produce insight, turn that insight into operational tools, and continuously improve analytic operations and performance.[11]
This is sometimes really hard for Data Scientists who experiment with data on laptops to accept.
Add Data and Logic Tests
Version control is the foundation upon which a lot of delivery is built.
At a minimum, reviewers of a publication and future researchers should be able to: 1) download all data and software used to generate the results; 2) run tests and review source code to verify correctness; 3) run a build process to execute the computation.
Version control makes it possible to maintain an archived version of the code used to produce a particular result. Examples include Git and Subversion.
Automated build systems document the high-level structure of a computation: which programs process which data, what outputs they produce, etc. Examples include Make and Ant.
Configuration management tools document the details of the computational environment where the result was produced, including the programming languages, libraries, and system-level software the results depend on. Examples include package managers like Conda that document a set of packages, containers like Docker that also document system software, and virtual machines that actually contain the entire environment needed to run a computation.
In an enterprise setting where multiple data scientists could be working on a single project, the first step to doing data science work that scales is implementing version control, whether that’s GitHub, GitLab, Bitbucket, or another solution. Once your team has the ability to track code changes, the next step is to create a process in which they regularly commit their code to the master branch of your repository.
During development, automated tests make programs more likely to be correct; they also tend to improve code quality. During review, they provide evidence of correctness, and for future researchers they provide what is often the most useful form of documentation. Examples include unittest and nose for Python and JUnit for Java.
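A minimal unittest sketch of such a data/logic test; the transformation under test is a made-up example:

```python
import unittest

def normalise_amount(pence: int) -> float:
    """Convert an integer amount in pence to pounds."""
    return round(pence / 100, 2)

class TestNormaliseAmount(unittest.TestCase):
    def test_whole_pounds(self):
        self.assertEqual(normalise_amount(500), 5.00)

    def test_rounding(self):
        self.assertEqual(normalise_amount(333), 3.33)

if __name__ == "__main__":
    unittest.main()
```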
You can move beyond Continuous Integration to make deployment even faster.
Traditionally, data science deployment has been a multi-step process that puts the onus on engineering: Engineers would refactor, test, and automate or schedule a data scientist’s model before slowly rolling it out, sometimes months after it was originally built.
Developers that embrace continuous delivery are pushing new application features or changes into production quickly, sometimes with the click of a button.
Increasingly, cloud and data science platforms are filling this void with features such as the ability to deploy models as APIs or schedule code runs which means that as soon as new development passes your tests it can be deployed into production with no dependencies on other teams.
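For illustration only, a minimal Flask sketch of the "deploy models as APIs" pattern; the model, route and payload schema are hypothetical, and the platforms mentioned above generate the equivalent with a click.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

def predict_churn(features):
    return 0.12  # stand-in for a real trained model's predict method

@app.route("/predict", methods=["POST"])
def predict():
    features = request.get_json()  # e.g. {"tenure": 24, "visits": 3}
    return jsonify({"churn_probability": predict_churn(features)})

if __name__ == "__main__":
    app.run(port=8080)
```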
Which brings me on to tools
Just as chemistry is not about the tubes but the process of experimentation, DataOps is not tied to a particular technology, architecture, tool, language or framework.
However, some tools are better at supporting DataOps collaboration, orchestration, agility, quality, security, access and ease of use.
When choosing tools, it's best to never start with the tools themselves.
I like to use the Spine Model by Trethewey, Roux and Perold.
So to decide on the tool you need to understand the practices you employ; to understand what practices to employ you need to define your principles; to define your principles you need to know your values; and to know your values you need to start with the needs you're trying to fulfil.
We have a set of clear DataOps Practices we want to employ so we have a clear idea of what tools will be fit for purpose.
http://spinemodel.info/explanation/introduction
But first a bit of advice. You should avoid building your own anything or being on the bleeding edge.
Any technology or tool that is really useful will end up being refined or commoditised and turned into a service. Let someone else find the bugs, be the beta tester or end up in a cul-de-sac.
The other factor to take into account is Cost of Delay.
It's nearly always ignored in business cases. On paper it may be cheaper to build your own solution. However, the months, or years, you take to do that are months, or years, you're not benefiting from the solution and are handing to your competitors. And it always takes twice as long to build your own solutions, even after you've factored in that it's going to take you twice as long as you think.
Because one of our principles is that we want to make data cycles shorter and shorter it’s important Data Scientists can self-serve not just the data but also the infrastructure, tools and packages
Modern Cloud architecture makes it very easy to create data products rapidly.
Specifically, the move from Infrastructure and Platform as a Service to Software as a Service and Serverless architecture.
That means you have no hardware or software to configure; you just bring your data and code and all the scaling and optimisation is done for you.
The other advantage is you can use the same tools for dev and production. You can also use the same data in dev and production, as in the SaaS or serverless world there's no need for separation of environments.
We’re so convinced of the benefits we’re actually moving our Data Analytics stack out of AWS onto Google Cloud Platform.
Here's an example of GCP reference architecture for big data which isn't a million miles from our architecture. There's absolutely no infrastructure to manage within the environment.
The other thing you can do is use the cloud as a centralised platform helping to break down organisational barriers and makes it easier to enforce governance rules.
Modern cloud takes care of the underlying tools but you can add further levels of abstraction and self-service to the compute infrastructure and data pipeline.
Data Science Platforms provide tools that enable teams to work faster and deploy DataOps methodology very easily from choosing the computer infrastructure and environments to run their code on, to automated version control, collaboration tools and one-click deployment to APIs and Dashboards.
The requirements for this type of platform haven’t gone unnoticed, these are just some of the vendors we looked at before settling on Domino Data Labs.
Each has their strengths and weaknesses, so which one is best depends on your use-cases.
There’s another positive side-effect of going down the DataOps route.
You require fewer roles due to self-service.
There's no need for specialist DevOps, Infrastructure Engineers, Sys Admins or DBAs.
This reduces friction, hand-offs, bottlenecks
You're left with just four key roles: Data Scientists, Data Engineers, Data Analysts (a much under-invested-in group, as everyone wants to be a Data Scientist) and the Line of Business (the stakeholders, and also those who will help integrate your Data Product into other applications).
Worrying About Artificial Intelligence when you can’t even produce a Sales report is not going to get you very far.
You need to worry about being able to action data instead in alignment with the organisation’s strategy and goals.
80% of the battle is knowing what not to work on.
You should not work on projects but products.
Products are in constant use by consumers and have direct customer and business benefit. The benefit scales according to the number of customers who use them. A data product may be a machine learning model, a segmentation, a recommendation engine, a dashboard. They may be integrated into other products. They have an owner, you get feedback that helps you improve them through iteration.
They are not one-off adhoc pieces of insight that get filed away.
Velocity is th
We need to solve all the problems with Data Science today:
Hamster Wheel Analytics – doing busywork for the organisation that makes us feel good because we're putting in a lot of effort and clients appreciate it, but is never going to move the needle.
The work we do that’s not repeatable because it was never documented
The aimless crash and burn – where we explore data to find magical insights without a clear objective or, worse, that the rest of the business has no interest in.
The Roadblock – Work we do that has no route to the customers because it is blocked by corporate silos, IT, Security, lack of infrastructure, tools or willingness to integrate into an end product and remains on a laptop.
Work we do that does make a customer impact which we can’t measure because the feedback loop was never closed.
Instead we can move to the DataOps World – What I like to call the Data Science Factory.
It starts with alignment with the rest of the business’ strategy to create options for Agile Delivery and collaboration to deliver them.
Rapid delivery of Data Products because there is the governance, trust in data, self-service and automation
A path to the end-consumer and feedback to measure value for the next iteration.