Winning the right to deploy AI: Dedication to craft, designing the right experiments, and creating leverage at scale

Winning the Right to
Deploy AI
Joshua Mabry
April 2024
INFORMS Analytics

About Me: Bain & Company | Data Science & Machine Learning Engineering
Joshua Mabry
Sr. Director
AMERS | Palo Alto
Summary
12+ years of experience in machine learning and statistical modeling, focused on experimentation and development of custom software solutions
Expertise
• Retail Marketing / Personalization / Loyalty
• Reinforcement Learning
• Experimental Design / Bayesian Modeling
• Data Engineering / MLOps
Recent Projects
• New AI venture creation for financial services company
• Personalized loyalty offer optimization for an international restaurant company
Interests
• Camping, Hiking
• Basketball
Ask me about
Deep technology and science

On a scale of 1-3, how mature are the AI capabilities in your organization?
1. Comfortable as users / consumers of GPT tools
2. Developing solutions based on APIs / existing models
3. Aligning multi-modal models

Agenda: Winning the Right to Deploy AI
1. Dedication to craft
2. Designing the right experiments
3. Creating leverage at scale
DISCLAIMER: The views presented here represent my own and not those of my employer, Bain & Company, Inc.

Enterprise leaders
see AI changing the
basis of competition
ahead
Source: 2023 Bain AI Survey
think AI will
upend
competition
in terms of
core product
differentiation
think AI is
changing rules
of the game for
customer
engagement
and business
models
70-75%
think AI will
significantly
disrupt the
cost structure
in their
industries
65-70% 60-65%
Customer Cost Product

Becoming a deep expert on AI / ML while working in industry is
fundamentally challenging…
… because it requires us as
practitioners to repeatedly cross the
Valley of Death
Source: https://www.ideatovalue.com/inno/nickskillicorn/2021/05/the-innovation-valley-of-death/
INNOVATION VALLEY OF DEATH
[Figure: level of investment vs. software maturity (Proof of Concept, Minimum Viable Product, Production). Labels: Public Funding (R&D, universities); Private Funding (for-profit companies); Valley of Death; minimum investment level to survive.]

Delivering AI use cases depends as much on “traditional” tech
and data capabilities as new ones
NEW TECHNICAL REQUIREMENTS
• Large scalability demands
• Orchestration challenges
• Privacy and safety constraints
REQUIRED SKILLS & WAYS OF WORKING
• Cross-functional collaboration
• Emphasis on experimentation and measured risk-taking
• Building the portfolio of bets needed to realize full value and stay relevant in a fast-moving ecosystem
[Figure: degree of difficulty vs. level of ambition. Existing challenges before generative AI; new challenges with generative AI.]

Working as part of a squad – What Agile often means
The ideal
Roles: Product Manager, Scrum Master, Tech Lead, Specialized functions, Engineers, Experience Designer
• Well-defined products
• Dedicated team
• Complementary skillsets across the squad
Reality (at times)
Roles: Tech Lead, Specialized functions, Engineers, Experience Designer
• Ill-defined products
• Not fully dedicated to the role
• Missing skillsets
Working as part of a squad – Building a T-shaped skillset
My progression in industry
• What has slowed me down
– NO CONFIDENCE in my capabilities
– NO NEED within my team
– NO RESOURCES within my company
• What has sped me up
– PREFERRED LEARNING STYLE
– TECHNICAL MENTORSHIP within my teams
– SERVANT LEADERSHIP in my company

Building a bench of advisors – Adding to my deep spike of
expertise
CONTEXT
• Our team was actively supporting a broad range of personalized customer marketing projects
• Recruited early-career researchers with capacity to help us
CONTRIBUTION
• Helped us design marketing algorithms at scale
• Gave a talk to about 50 clients to demystify AI
• Delivered an internal seminar series
IMPACT
• Recruited new tech talent from their research groups
• Opportunity to learn from leaders in the fields of reinforcement learning, experimentation and causal inference
Zhimei Ren: Asst. Professor, The Wharton School, University of Pennsylvania
Kevin Jamieson: Assistant Professor, Allen School of Computer Science at University of Washington
Zhengyuan Zhou: Assistant Professor, NYU Stern School of Business – Dept. of Technology

Solution needs a GUI to sell!
Rebuild
Rapid Prototyping
Premature Productization
Sources: https://www.flickr.com/photos/interactivemark/15033569833
Automation assessment
SAN
Proposed architecture: We can “go live” with active learning for crack and defe
detection with minimal disruption to established workflows
Client CO
Lessons learned developing MVP solutions – GUIs are great…and
terrible
CHURN!

Chasing Uplift
Fallback Patterns
Shiny Object Syndrome
Zoo of Complex APIs and Overfit Models
CHURN!
Sources: https://facebook.github.io/prophet/; https://github.com/VowpalWabbit/vowpal_wabbit
Lessons learned developing MVP solutions – Our love of libraries

Problem-Solution Fit Achieved!
Handoff
Fear, Uncertainty and Doubt
First-Principles Model Developed
200317 TFR steerco v7
DAL
Productizing MVT tools and addressing limitations of Group Balance solver would
require resources into August
Note: Assumes adding min / max functionality to Group Balance tool, updating MVT tooling as estimated for Tim Horton’s / Subway cases
Application Feature Timeline
ILLUSTRATIVE
Lessons learned developing MVP solutions – Needing first-principles
CHURN!

Leaders embrace and celebrate experimentation
Source: Bain “Marketing Leaders & Laggards” Study; Bain Project Experience; Lit Search
Dedicated testing platform and
testing teams enabling
2,000+ tests per year
1,000+ tests per year – every
product, UI layout or marketing
campaign thoroughly A/B tested
yielding up to 30% more impact
Global systematic Marketing
experimentation program with
~30 ongoing marketing tests per
quarter per key market globally
“Our success at Amazon is a function of how many tests we do per year, per month, per week, per day”
Jeff Bezos
“Our success comes from building a system to test many ideas fast and scale the few good ones”
Marc Randolph
“Media Mix Modeling for us is a bit misleading, particularly what is the ROI I’m going to get from this campaign. The right model to measure incrementality is ‘Test & Learn’”
Simon Peel
Global Head of Media

Meet the cast at Traditional Retail Co.
Mark the Marketer
Mark takes orders from Fred. He is excited about experimentation and ML, but he knows
it will be his head on a platter if anything goes wrong.
Donna the Data Scientist
Donna is formally trained in statistics/mathematics/physics and has a deep
understanding of ML algorithms. She can also build cloud infrastructure, having worked
at a startup.
Fred the Founder
Spends summers in France and winters in Jackson Hole. Enjoys reviewing marketing
copy in his spare time and has an intuitive feel for protecting profit margins

The start to a typical month in retail marketing
Fred: Black Friday is upon us. Snowpack is thin at the ski cabin and I want to be in the loop on EVERY marketing decision this next month!
Mark: Of course! There are many things we can try here. Let’s do all of them! We will email / text our customers new content EVERY day!
• 20% off for new customers
• Free shipping for all
• Facebook campaign for overstocked items

3 weeks later…
Mark: Hey Donna! Weekly sales are down despite throwing EVERYTHING at our customers.
Donna: That is so disappointing… Did you create holdout groups for all your marketing campaigns? Do you have a variety of offers in-field so we can back out price sensitivity?
[Chart: Weekly sales]

3 weeks later…
Mark: No, I did not create holdout groups or experiment with different offers. But Fred has a friend at Davos raving about personalization and is sure it will work. Let’s begin doing 1:1 marketing…now!
Donna: Let’s go! I have propensity models, churn models, and generative AI at the ready! But about those holdout groups…

Hitting the barriers to effective experimentation
Fred: I’m not interested in holdout groups! I’m interested in personalization! I just can’t approve any experiments that might alienate even one of my loyal customers…
Donna: Not being able to test something directly is common in science. Let me think about how we can leverage those same techniques here.
• Simulations
• Causal inference
• Logging and measurement
• Off-policy evaluation
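Off-policy evaluation, one of the techniques listed above, estimates how a new targeting policy would have performed using only logged data from the policy that actually ran. The sketch below uses inverse propensity scoring (IPS), one common estimator; the talk does not say which estimator was used. The `LoggedEvent` fields, the `always_discount` policy, and the toy log are all hypothetical, and the approach assumes the logging policy recorded the probability with which it chose each offer.

```python
from dataclasses import dataclass


@dataclass
class LoggedEvent:
    context: str       # hypothetical customer segment
    action: str        # offer that was actually sent
    propensity: float  # probability the logging policy chose that offer
    reward: float      # observed outcome, e.g. revenue


def ips_value(events, target_policy):
    """Inverse propensity scoring estimate of a target policy's mean reward.

    target_policy(context, action) returns the probability that the new
    policy would choose `action` for `context`.
    """
    total = 0.0
    for e in events:
        # Reweight each logged reward by how much more (or less) likely
        # the target policy is to take the logged action.
        weight = target_policy(e.context, e.action) / e.propensity
        total += weight * e.reward
    return total / len(events)


def always_discount(context, action):
    """Hypothetical candidate policy: send the discount to everyone."""
    return 1.0 if action == "discount" else 0.0


# Toy log from a uniformly random logging policy (propensity 0.5 each).
logs = [
    LoggedEvent("loyal", "discount", 0.5, 10.0),
    LoggedEvent("loyal", "none", 0.5, 8.0),
    LoggedEvent("new", "discount", 0.5, 6.0),
    LoggedEvent("new", "none", 0.5, 0.0),
]

estimate = ips_value(logs, always_discount)
print(f"Estimated mean reward under always_discount: {estimate:.2f}")
```

The estimate is only as good as the logged propensities, which is one reason logging and measurement appear on the same list.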

Finding the right place to start
Mark: So Donna, Fred has given me a HUGE budget to build this personalization capability. What should we do?
Donna: Mark, let’s figure out how to build better datasets, deeper insights, stronger hypotheses and then test those hypotheses in silico before we go back to Fred and ask to experiment with HIS CUSTOMERS. Once he’s feeling bought in, I think we should run a LOT of experiments.

Where to start: Reaching your known customers more effectively
VISION JOURNEY
Email 1
Purchase event
Transact Retain
Customer
touchpoint
Convert
Click events, offers, content
metadata
Facebook Campaign
Email 2
Click events, offers, content
metadata
Reach metrics, content
metadata
Measure success
of app-signup
campaigns across
email and paid
channels for known
customers
App Download
User ID
Measure
Paid digital
Web/App
In-store
E-mail
Data
Marketing Channels

Where to start – Focus tech investments first on measurement
Paid Media
Owned Media
Delivery
Layer Website Email App push Social media Search YouTube
SMS
Analytics
Layer
Intelligence
UI / Dashboards
Data Science / Machine learning
Query engine
Propensity modeling Recommendation engine
Consumption
Layer
Measurement Experimentation
Site analytics
Attribution
Experiment logging
UI / Dashboards for tests
CRM
Data
Acquisition
Layer
Sources
Branch Product Transaction Behavior ID resolution Web/app Tags
Activation
Layer
Marketing automation
Convert / Retain Attract
Dynamic Creative Optimization (DCO)
Demand-side Platform (DSP)
Customer data
Data
Management
Layer
Content
Customer Data Platform (CDP)
Website content
Digital Asset Manager (DAM)
Template management
ROADMAP MARTECH
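The measurement and experimentation layer in the stack above includes experiment logging. As a minimal sketch of what one logged exposure might contain, the record below captures who saw which variant, on which channel, and with what assignment probability. Every field name, the `log_exposure` helper, and the example IDs are hypothetical; a real schema would depend on the CDP and marketing automation tools in use.

```python
import json
from dataclasses import dataclass, asdict
from datetime import datetime, timezone


@dataclass
class ExposureRecord:
    experiment_id: str  # hypothetical experiment name
    customer_id: str    # resolved customer ID
    channel: str        # e.g. "email", "app_push"
    variant: str        # e.g. "treatment", "holdout"
    propensity: float   # probability this variant was assigned
    timestamp: str      # ISO 8601, UTC


def log_exposure(experiment_id, customer_id, channel, variant,
                 propensity, now=None):
    """Serialize one exposure as a JSON line for an append-only log."""
    now = now or datetime.now(timezone.utc)
    record = ExposureRecord(experiment_id, customer_id, channel,
                            variant, propensity, now.isoformat())
    return json.dumps(asdict(record), sort_keys=True)


line = log_exposure("bf-email-01", "c-123", "email", "holdout", 0.1,
                    now=datetime(2024, 4, 1, tzinfo=timezone.utc))
print(line)
```

Storing the assignment probability alongside the variant is what later makes holdout analysis and off-policy evaluation possible from the same log.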

We can take a “crawl-walk-run” approach to targeting and
measurement
• Crawl: Single-Channel A/B Tests
• Walk: Segment-Level
• Run: 1:1
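For the first stage, a single-channel A/B test against a holdout group, the readout can be as simple as a two-proportion z-test on conversion rates. The sketch below is one standard way to do this, not necessarily the method used in the projects described; the function name and the campaign counts are made up for illustration.

```python
import math


def two_proportion_z_test(conv_t, n_t, conv_c, n_c):
    """Compare conversion rates of a treated group and a holdout group.

    Returns (absolute lift, z statistic, two-sided p-value) using the
    pooled-variance normal approximation.
    """
    p_t = conv_t / n_t
    p_c = conv_c / n_c
    pooled = (conv_t + conv_c) / (n_t + n_c)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_t + 1 / n_c))
    z = (p_t - p_c) / se
    # Two-sided tail probability of the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return p_t - p_c, z, p_value


# Hypothetical email campaign: 9,000 customers emailed, 1,000 held out.
lift, z, p_value = two_proportion_z_test(conv_t=540, n_t=9000,
                                         conv_c=40, n_c=1000)
print(f"lift={lift:.3f}, z={z:.2f}, p={p_value:.4f}")
```

Without the holdout group there is no `conv_c` to compare against, which is the gap Donna keeps pointing to.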

Basic Building Block – Uplift models
Source: https://arxiv.org/pdf/1908.05372.pdf; https://albahnsen.com/2020/04/28/uplift-modeling/
Integration with tech stack
How uplift models typically work
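In its simplest form, an uplift estimate is the difference in conversion rate between treated and control customers within a group. The sketch below computes that difference per segment from randomized test data. It is a deliberately reduced illustration: the uplift models in the cited sources fit predictive models rather than segment averages, and the segment names and counts here are invented.

```python
from collections import defaultdict


def segment_uplift(rows):
    """Estimate uplift per segment from randomized test data.

    rows: iterable of (segment, treated, converted) tuples.
    Uplift = conversion rate if treated - conversion rate if not treated.
    """
    # counts[segment][treated] = [conversions, customers]
    counts = defaultdict(lambda: {True: [0, 0], False: [0, 0]})
    for segment, treated, converted in rows:
        cell = counts[segment][treated]
        cell[0] += int(converted)
        cell[1] += 1

    uplift = {}
    for segment, arms in counts.items():
        (conv_t, n_t), (conv_c, n_c) = arms[True], arms[False]
        if n_t == 0 or n_c == 0:
            continue  # cannot estimate uplift without both arms
        uplift[segment] = conv_t / n_t - conv_c / n_c
    return uplift


# Hypothetical test results for two segments.
rows = (
    [("lapsed", True, True)] * 30 + [("lapsed", True, False)] * 70
    + [("lapsed", False, True)] * 10 + [("lapsed", False, False)] * 90
    + [("loyal", True, True)] * 50 + [("loyal", True, False)] * 50
    + [("loyal", False, True)] * 50 + [("loyal", False, False)] * 50
)

uplift = segment_uplift(rows)
best_segment = max(uplift, key=uplift.get)
print(uplift, best_segment)
```

In this toy data the loyal segment converts more often overall but shows no uplift, so the offer is better spent on lapsed customers; that distinction between propensity and uplift is the point of the technique.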

Advanced Approach - Simulating individual customers interacting with AI
agents
Performance of different agents
RetailSynth customer choice model
• Four-stage model for simulating customer
shopping behavior in RetailSynth
• Calibrated on publicly available grocery data to
create realistic synthetic shopping transactions
• Models price sensitivity across wide range of
customer types and products
• Reinforcement learning agents were designed to
target coupons to customers
• Without extensive tuning, contextual bandit agents
outperformed deep learning agents
• Simulations captured impact on short-term
revenue and long-term retention
Source: https://arxiv.org/abs/2312.14095
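To make the idea of testing agents in simulation concrete, here is a toy epsilon-greedy contextual bandit that learns which customers should receive a coupon. It is not RetailSynth and not the agents from the paper: the two segments, purchase probabilities, margin, and coupon cost are invented, and the "simulator" is a single Bernoulli draw rather than a calibrated multi-stage choice model.

```python
import random

ACTIONS = ["coupon", "no_coupon"]

# Hypothetical ground truth, hidden from the agent: purchase probability
# by (segment, action).
PURCHASE_PROB = {
    ("price_sensitive", "coupon"): 0.60,
    ("price_sensitive", "no_coupon"): 0.20,
    ("loyal", "coupon"): 0.75,
    ("loyal", "no_coupon"): 0.70,
}
MARGIN = 10.0       # profit on a full-price purchase
COUPON_COST = 3.0   # margin given up when a coupon is redeemed


def simulate_profit(rng, segment, action):
    """Toy customer simulator: one Bernoulli purchase decision."""
    bought = rng.random() < PURCHASE_PROB[(segment, action)]
    if not bought:
        return 0.0
    return MARGIN - (COUPON_COST if action == "coupon" else 0.0)


def train_epsilon_greedy(rounds=20000, epsilon=0.1, seed=0):
    """Learn a running mean profit for every (segment, action) pair."""
    rng = random.Random(seed)
    value = {key: 0.0 for key in PURCHASE_PROB}
    pulls = {key: 0 for key in PURCHASE_PROB}
    for _ in range(rounds):
        segment = rng.choice(["price_sensitive", "loyal"])
        if rng.random() < epsilon:
            action = rng.choice(ACTIONS)  # explore
        else:
            action = max(ACTIONS, key=lambda a: value[(segment, a)])  # exploit
        reward = simulate_profit(rng, segment, action)
        key = (segment, action)
        pulls[key] += 1
        value[key] += (reward - value[key]) / pulls[key]  # incremental mean
    return value


def greedy_policy(value, segment):
    """Pick the action with the highest learned value for a segment."""
    return max(ACTIONS, key=lambda a: value[(segment, a)])


value = train_epsilon_greedy()
for segment in ("price_sensitive", "loyal"):
    print(segment, "->", greedy_policy(value, segment))
```

Under these assumed numbers the coupon pays off for price-sensitive customers (expected profit 4.2 vs. 2.0) but not for loyal ones (5.25 vs. 7.0), and the agent can discover that without a single real customer being exposed.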

After making investments in measurement and experimentation
Mark: Donna, I am so thankful you stopped me from launching yet another series of high-cost, low-ROI marketing campaigns! Our new measurement capabilities have blown Fred’s diamond-encrusted ski boots off! He is SO READY to start launching experiments at scale now!
Donna: Mark, the months spent configuring logging services and databases were well worth it! We are in an amazing place to measure the value of our soon-to-be launched generative AI content and multi-agent optimization systems! And we have the data we need to run simulations and off-policy optimization now too!

Beyond foundation models, multiple other technology elements
are required to deliver Generative AI applications
Many of you have found, or will find, yourselves contributing to the development of
AI platforms

Test of Time Award – Steve Yegge’s 2011 Google Platforms Rant
* Landing page
for platform I
helped build in
2019*
“The problem is that we are trying to predict
what people want and deliver it for them.
You can’t do that. Not really. Not reliably.
There have been precious few people in the
world, over the entire history of computing,
who have been able to do it reliably.”
Steve Yegge, Google, 2011
Source: https://courses.cs.washington.edu/courses/cse452/23wi/papers/yegge-platform-rant.html

There is high potential for tech re-use at multiple layers
In the early stages of a new tech space (AI), learning will often happen “at the edge”
Many key AI frameworks/components are open source and contributing back “builds the muscles”
Foundation Team
UC5
UC6
UC1
UC2
UC3
UC4
BU4
BU1
BU2
BU3
Innersource is a natural fit for enterprise-wide AI
LEARNING AT THE EDGE
Source: https://aws.amazon.com/blogs/devops/building-an-innersource-ecosystem-using-aws-devops-tools/
Teams across the business
Roles within teams

Enterprise ecosystem Measuring success
Innersource requires alignment across the organization
Source: https://aws.amazon.com/blogs/devops/building-an-innersource-ecosystem-using-aws-devops-tools/
INNERSOURCE FOUNDATION
Example
source code
contribution
history for
successful
innersource
library
The firm needs to follow a typical open-source software delivery process
and leverage DevOps technologies

The Main Messages on Winning the Right to Deploy AI
Rededicate yourself to your craft
T-shaped skillsets are as relevant as ever for developing analytical applications
“Learn & Test” > “Test & Learn”
Experiments carry risk, so ensure you have the requisite measurement capabilities before scaling
Scale like open-source
Learnings from AI projects are often best captured by an innersource strategy
