Abigail is Principal Analyst at uSwitch. She has a background in probability and statistics and a PhD in Queueing Theory from Imperial College London. At uSwitch, she has focused on using statistical and machine learning techniques for descriptive analytics and modelling and understanding customer behaviour. Abigail is passionate about encouraging an understanding of uncertainty in both big and small data.
3. The Attribution Problem
• Customers are coming to the site from lots of different channels: SEO,
PPC, direct, social, banner ads etc.
• Sometimes they come and browse and sometimes they buy.
• How do you work out how much each advertising stream contributes to the
purchase?
• Basic models – last click, first click, even distribution etc.
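The three basic models named above can be sketched in a few lines. This is a minimal illustration, not a production implementation: the function name `attribute`, the model labels, and the example journey are all assumptions made for the sketch.

```python
from collections import defaultdict


def attribute(journey, model):
    """Split one conversion's credit across the channels in a journey.

    journey: ordered list of channel names, earliest touch first.
    model: "last_click", "first_click" or "even".
    """
    if not journey:
        raise ValueError("journey must contain at least one touch")
    credit = defaultdict(float)
    if model == "last_click":
        # All credit goes to the final touch before the purchase.
        credit[journey[-1]] += 1.0
    elif model == "first_click":
        # All credit goes to the touch that started the journey.
        credit[journey[0]] += 1.0
    elif model == "even":
        # Every touch gets an equal share; repeated channels accumulate.
        share = 1.0 / len(journey)
        for channel in journey:
            credit[channel] += share
    else:
        raise ValueError(f"unknown model: {model}")
    return dict(credit)


# Hypothetical journey that ended in a purchase.
journey = ["SEO", "PPC", "social", "PPC"]
for model in ("last_click", "first_click", "even"):
    print(model, attribute(journey, model))
```

All three rules look only at the order of touches in converting journeys, which is why they are simple to run and also why they ignore journeys that never convert.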
5. Theoretical Solutions
• Top-Down Approach
• Gather customer journey information as a series of visits and
purchases.
• Shapley Value
• Beta Distribution
• Survival Analysis
6. Shapley Value
• Measure used in Game Theory, but can be used in investment problems of
cost distribution, measuring player impact in a football team etc.
• A coalition is created between a number of contributing actors.
• When the last actor joins the coalition, the aim is reached, but the last
actor should not receive the whole reward as it could not be achieved
without the other members.
• Some actors contribute more than others and need to be weighted
accordingly.
• A measure is used (designed by Lloyd Shapley) to fairly distribute reward
based on contribution.
• Requires details of failures as well as successes.
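As a rough sketch of the idea, the Shapley value of each channel can be computed by averaging its marginal contribution over every order in which the channels could join the coalition. The conversion rates below are invented for illustration, and the helper name `shapley_values` is an assumption. Note that the table needs a value for every subset of channels, which is where the failures (non-converting journeys) come in.

```python
from itertools import permutations


def shapley_values(players, value):
    """Exact Shapley values by averaging marginal contributions
    over every ordering in which the players could join."""
    totals = {p: 0.0 for p in players}
    orderings = list(permutations(players))
    for order in orderings:
        coalition = frozenset()
        for p in order:
            with_p = coalition | {p}
            # Marginal contribution of p when joining this coalition.
            totals[p] += value(with_p) - value(coalition)
            coalition = with_p
    return {p: t / len(orderings) for p, t in totals.items()}


# Hypothetical conversion rates for each set of channels a customer saw.
# Estimating them needs non-converting journeys as well as purchases.
conversion_rate = {
    frozenset(): 0.0,
    frozenset({"SEO"}): 0.02,
    frozenset({"PPC"}): 0.04,
    frozenset({"SEO", "PPC"}): 0.10,
}

values = shapley_values(["SEO", "PPC"], lambda s: conversion_rate[frozenset(s)])
print(values)
```

The per-channel values sum to the value of the full coalition, so the whole reward is shared out. Enumerating every ordering grows factorially with the number of channels, so this exact version is only practical for a handful of them.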
7. Beta Distribution
• Bathtub-shaped distribution.
• Wooff, D. A. & Anderson, J. M. (2013). Time-weighted multi-touch attribution and channel relevance in the customer journey to online purchase. Journal of Statistical Theory and Practice.
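To see where the bathtub shape comes from: a Beta density with both shape parameters below 1 is high near 0 and 1 and low in the middle. The sketch below turns that density into weights over the touches in a journey, so early and late touches get more credit than middle ones. The parameter values a = b = 0.5 and the midpoint positions are assumptions chosen for illustration; they are not taken from Wooff & Anderson.

```python
import math


def beta_pdf(x, a, b):
    """Density of the Beta(a, b) distribution at x, for 0 < x < 1."""
    norm = math.gamma(a + b) / (math.gamma(a) * math.gamma(b))
    return norm * x ** (a - 1) * (1 - x) ** (b - 1)


def bathtub_weights(n_touches, a=0.5, b=0.5):
    """Normalised weights for n touches, evaluated at interval midpoints.

    Midpoints are used because the density is unbounded at 0 and 1
    when a and b are below 1.
    """
    positions = [(i + 0.5) / n_touches for i in range(n_touches)]
    raw = [beta_pdf(x, a, b) for x in positions]
    total = sum(raw)
    return [w / total for w in raw]


# Hypothetical five-touch journey: first and last touches weigh most.
weights = bathtub_weights(5)
print([round(w, 3) for w in weights])
```

Making a and b unequal would tilt the bathtub, for example to favour the touches closest to the purchase.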
8. Survival Analysis
• Survival analysis models ‘time to event’
• The ‘event’ in marketing terms is a conversion.
• Using a time measurement is a useful addition.
• The ‘time’ can be defined as each marketing channel.
[Diagram: two example journeys. One runs Time_1, Time_2, Time_3 and ends in an Event; the other runs Time_1, Time_2 with no event.]
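One standard way to handle journeys like these is the Kaplan-Meier estimator, which treats a journey that has not converted yet as censored rather than as a failure. The sketch below counts 'time' in touches, as suggested above. The journey data and the function name are made up for illustration, and this is only one of several survival methods that could be applied.

```python
def kaplan_meier(journeys):
    """Kaplan-Meier estimate of P(no conversion after t touches).

    journeys: list of (touches, converted) pairs. converted=False means the
    journey was censored: still unconverted when observation stopped.
    Returns a list of (t, survival) at each touch count with a conversion.
    """
    event_times = sorted({t for t, converted in journeys if converted})
    survival = 1.0
    curve = []
    for t in event_times:
        # Journeys still being observed just before touch t.
        at_risk = sum(1 for touches, _ in journeys if touches >= t)
        # Journeys that converted exactly at touch t.
        events = sum(1 for touches, conv in journeys if touches == t and conv)
        survival *= 1.0 - events / at_risk
        curve.append((t, survival))
    return curve


# Hypothetical journeys: (number of touches observed, ended in a purchase?)
journeys = [(3, True), (2, False), (1, True), (4, False), (2, True), (3, False)]
for t, s in kaplan_meier(journeys):
    print(t, round(s, 3))
```

Here 'survival' means the customer has not yet purchased, so a curve that drops quickly indicates journeys that convert after few touches.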
9. Practical Big Data issues
• Other academic literature exists on the topic (e.g. using hidden Markov models, HMMs), but it is usually specific to
a particular (unavailable) dataset.
• It is not always easy to get hold of all the information required for your own data.
• At what point do you decide a customer is not a purchaser? 1 week, 1
month, 1 year?
• How do you correctly attach sessions to attribution methods?
10. Outsource?
• Use data provided by external sources, e.g. Google Analytics; they answer
these questions for you.
• You don’t know what their answers are.
• Only provides purchase figures, not conversion rates.
• Need GA Premium and BigQuery to approach a solution.
• A number of ‘solutions’ are provided by external sources, e.g. Google Analytics
‘Data Driven Attribution Model’ and ‘Model Explorer’.
• Issues with ‘black box’ algorithms.
11. More problems…
• Multi-device
• A customer can exist as multiple personas across numerous devices.
• Offline advertising
• How do you add a measure of the impact of ‘hidden’ channels (e.g. TV advertising)
into the attribution journey?
• Correlation/Causation – Inherent bias from data-driven result
• Does a channel perform well because it already has a high spend?
Communicating uncertainty
• Is the data big enough? How much can be attributed to randomness?
• Does every purchase have equal worth? Unlikely…
• Some customers buy higher value items, or multiple items in one session.
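One simple way to show how much of a channel's conversion rate could be down to randomness is to report an interval rather than a single figure. The sketch below uses the Wilson score interval for a proportion. The visit and conversion counts are hypothetical, and the interval treats every purchase as having equal worth, which the last bullets above warn is unlikely.

```python
import math


def wilson_interval(conversions, visits, z=1.96):
    """Wilson score interval for a conversion rate (z=1.96 is about 95%)."""
    if visits <= 0:
        raise ValueError("visits must be positive")
    p = conversions / visits
    denom = 1 + z ** 2 / visits
    centre = (p + z ** 2 / (2 * visits)) / denom
    half = z * math.sqrt(p * (1 - p) / visits + z ** 2 / (4 * visits ** 2)) / denom
    return centre - half, centre + half


# Hypothetical channel figures: same observed rate, very different certainty.
for conversions, visits in [(5, 100), (500, 10000)]:
    low, high = wilson_interval(conversions, visits)
    print(f"{conversions}/{visits}: {low:.3f} to {high:.3f}")
```

Both channels show a 5% observed rate, but the interval for the smaller one is much wider, which is the kind of uncertainty worth communicating alongside any attribution figure.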
14. Could Attribution be Data Science’s Fermat’s Last Theorem?
• Like the Attribution Problem, Fermat’s Last Theorem is easily
understood.
• “I have discovered a truly marvellous proof of this, which this
margin is too narrow to contain” – Pierre de Fermat, 1637
• Solved by Andrew Wiles in 1994.
• BUT the work done by other mathematicians who failed to
solve it in the intervening 357 years not only helped towards
the solution, but also led to separate useful mathematical
discoveries.
• They approached the problem by first proving simple cases,
n=3, n=4, n=5, then extending to all primes.
• Consider the building blocks before going for a general rule.
15. Look at the building blocks
• Each marketing channel is often made up of a diverse and fluctuating set
of materials
• Paid and organic positions and keywords
• Email content and strategy
• Need to take into account inherent details of each channel and how they
interact with each other
• Dependencies and conditionality
• Use Survival analysis to investigate time as a parameter.
• Are there purchasing differences for different channels? Which customer
journeys produce the most engaged customers?
• What is special about YOUR data? Not always easy to uniformly apply
methods to very different data sets.
16. Conclusion
• Simple methods exist that provide a top-down overview of your
attribution paths and contributors. But,
• Are they too general?
• Can you prepare your data well enough?
• How much are they being biased by the existing data, rather than
showing underlying truths?
• Or, you can start at the bottom and study the complexities and
dependencies of your customer journeys. But,
• When do you stop?
• Find the balance!
@A_Lebrecht
www.uswitch.com/tech
abigail.lebrecht@uswitch.com