Scott Burger presents on attribution modelling using multi-touch attribution and Markov chains. The presentation covers generating user journey data from Google Analytics data in BigQuery, developing Markov models in R, and visualizing the results in Tableau. Multi-touch attribution shares credit across the channels in a customer's journey rather than assigning all credit to the last touchpoint.
2. Agenda
• Intro to multi-touch attribution
• End-to-end modelling approach
• Handling big data in Google BigQuery
• Markov chain-based modelling in R
• Model outputs and insights
3. About Me
• WWU Physics 2010
• UCL Astrophysics 2012
• Data Scientist at Microsoft, Tableau
• Author of Intro to Machine Learning with R
• Race bikes for LiquidVelo
• Blog at http://svburger.com
4. Motivation for Attribution – the Customer Journey
• Goal: decide how much of our total budget to allocate to each channel
• Problem: the last-touch model gives too much credit to the final channel
• Solution: multi-touch attribution, which shares the credit across channels
Example journey: Channel A → Channel A → Channel B → Convert
• Budget (LT): Channel A 0%, Channel B 100%
• Budget (MTA): Channel A 66%, Channel B 33%
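A few lines of Python reproduce the budget split in this example. This is a minimal sketch. The slide does not name its multi-touch rule, so the sketch assumes an even (linear) split across touches, because that split matches the 66%/33% figures. The function names are hypothetical.

```python
from collections import Counter


def last_touch(journey):
    """Give all credit to the final touchpoint before conversion."""
    return {journey[-1]: 1.0}


def linear(journey):
    """Split credit evenly across touches; repeated channels accumulate credit."""
    counts = Counter(journey)
    return {channel: n / len(journey) for channel, n in counts.items()}


journey = ["Channel A", "Channel A", "Channel B"]
print(last_touch(journey))  # {'Channel B': 1.0}
print(linear(journey))      # Channel A ~0.667, Channel B ~0.333
```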
5. Modelling Process Overview
• Develop user journeys from Google Analytics data in Google BigQuery
• Establish conversion rates for unique journey paths
• Pass the paths through a Markov model in R
• Push results from R back to Google BigQuery
• Visualization in Tableau
(Diagram labels: Markov / other statistical models; Dashboarding)
6. Google BigQuery Intro
• All you need to get started is a Google account
• https://console.cloud.google.com/bigquery
7. Generating Hit-Level User Data: Table Sharding
Date Sharding
One day: select * from `project.dataset.ga_sessions_20190306`
One month: select * from `project.dataset.ga_sessions_201903*`
One year: select * from `project.dataset.ga_sessions_2019*`
All of it: select * from `project.dataset.ga_sessions_*`
8. Generating Hit-Level User Data: Table Sharding
Querying selected days:
select * from `project.dataset.ga_sessions_*`
where _TABLE_SUFFIX between '20190301' and '20190304'
Future-state bounded:
select * from `project.dataset.ga_sessions_*`
where _TABLE_SUFFIX between '20190101' and '20191231'
The end date can lie in the future. The query only reads daily tables that already exist, so it picks up each new day once its table has been exported.
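BigQuery resolves wildcard tables on the server, but the selection logic can be mimicked locally to check which daily shards a query will read. The sketch below makes three assumptions. The wildcard is a single trailing `*`. The list of existing table names is hypothetical. The text after the prefix plays the role of `_TABLE_SUFFIX`. Because the suffixes are fixed-width YYYYMMDD strings, plain string comparison orders them by date.

```python
def matching_shards(existing_tables, pattern, start=None, end=None):
    """Mimic which sharded tables a wildcard query reads.

    pattern must end with a single trailing "*" (e.g. "ga_sessions_*");
    start/end, if given, bound the matched suffix like _TABLE_SUFFIX BETWEEN.
    """
    if not pattern.endswith("*"):
        raise ValueError("pattern must end with '*'")
    prefix = pattern[:-1]
    selected = []
    for name in existing_tables:
        if not name.startswith(prefix):
            continue
        suffix = name[len(prefix):]  # analogous to _TABLE_SUFFIX
        if start is not None and suffix < start:
            continue
        if end is not None and suffix > end:
            continue
        selected.append(name)
    return selected


# Hypothetical daily shards that currently exist in the dataset
tables = ["ga_sessions_20190228", "ga_sessions_20190301",
          "ga_sessions_20190302", "ga_sessions_20190305"]

print(matching_shards(tables, "ga_sessions_201903*"))
print(matching_shards(tables, "ga_sessions_*", "20190301", "20190304"))
# A future-dated end bound simply matches whatever shards exist so far
print(matching_shards(tables, "ga_sessions_*", "20190101", "20191231"))
```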
9. Generating Hit-Level User Data: Hit Unnesting
Querying nested table data: when using UNNEST(hits), filter with
hits.type = 'PAGE' for page hits
hits.type = 'EVENT' for event hits (on-page interactions)
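The sketch below makes the effect of `UNNEST(hits)` concrete. It flattens session records that contain a nested `hits` list into one row per hit, and can optionally keep a single hit type. The field names follow the GA export schema (`fullVisitorId`, `visitId`, `hits.hitNumber`, `hits.type`). The dict-based input and the function name are assumptions for illustration. In BigQuery the equivalent is a cross join with `UNNEST(hits)` plus a filter on the hit type.

```python
def unnest_hits(sessions, hit_type=None):
    """Return one row per hit, carrying session-level fields along."""
    rows = []
    for session in sessions:
        for hit in session["hits"]:
            if hit_type is not None and hit["type"] != hit_type:
                continue  # e.g. keep only 'PAGE' or only 'EVENT' hits
            rows.append({
                "fullVisitorId": session["fullVisitorId"],
                "visitId": session["visitId"],
                "hitNumber": hit["hitNumber"],
                "type": hit["type"],
            })
    return rows


sessions = [
    {"fullVisitorId": "123", "visitId": 1,
     "hits": [{"hitNumber": 1, "type": "PAGE"},
              {"hitNumber": 2, "type": "EVENT"},
              {"hitNumber": 3, "type": "PAGE"}]},
]
page_hits = unnest_hits(sessions, "PAGE")
print(len(page_hits))  # 2
```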
10. Touchpoint Journey Path Examples
• Use the STRING_AGG() function in Google BigQuery (GBQ) to concatenate each user's touchpoints into a path string, then GROUP BY path to get unique paths
• Focus on journeys of length 2 or more
• If using impression data, be careful that impressions do not overwhelm your journeys
• Channel uniqueness per journey is another interesting area to investigate
• Data must be in this shape (path, conversions, unique
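The sketch below shows the path-building step in Python and the shape of data it produces. It assumes one journey per user that ends at that user's conversion (if any), input rows of (user_id, timestamp, channel, converted), and a " > " path separator. All of these choices, and the function name, are hypothetical rather than the deck's exact query. The output pairs each unique path with its conversion and non-conversion counts.

```python
from collections import defaultdict


def build_paths(touchpoints, min_length=2, sep=" > "):
    """Aggregate touchpoints into {path: [conversions, nulls]}.

    touchpoints: iterable of (user_id, timestamp, channel, converted) tuples.
    Journeys shorter than min_length touches are dropped.
    """
    touches = defaultdict(list)
    converted = defaultdict(bool)
    for user, ts, channel, conv in touchpoints:
        touches[user].append((ts, channel))
        converted[user] = converted[user] or conv
    paths = defaultdict(lambda: [0, 0])
    for user, user_touches in touches.items():
        if len(user_touches) < min_length:
            continue
        # Like STRING_AGG(channel, sep ORDER BY ts) per user
        path = sep.join(channel for _, channel in sorted(user_touches))
        # Then GROUP BY path: count converting vs non-converting journeys
        paths[path][0 if converted[user] else 1] += 1
    return dict(paths)


touchpoints = [
    ("u1", 1, "Display", False), ("u1", 2, "Paid Search", True),
    ("u2", 1, "Display", False), ("u2", 2, "Paid Search", False),
    ("u3", 1, "Direct", True),  # single-touch journey, filtered out
]
print(build_paths(touchpoints))  # {'Display > Paid Search': [1, 1]}
```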
11. Modelling Process Overview
• Attribution models are available in the Google Analytics UI
• Documentation for the more statistical models is lacking
• Heuristic (simple) models can be built in SQL on GA hit data; more complex models are built in R.
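Here is one heuristic model that is simple enough to express in SQL: a position-based (U-shaped) rule, sketched in Python. The 40%/20%/40% split is an assumed default. The even split for two-touch journeys is also an assumption. Adjust both to match whichever rule you actually use. The function name is hypothetical.

```python
from collections import defaultdict


def position_based(journey, first=0.4, last=0.4):
    """U-shaped heuristic: fixed shares to the first and last touches,
    with the remainder split evenly across the middle touches."""
    n = len(journey)
    credit = defaultdict(float)
    if n == 1:
        credit[journey[0]] = 1.0
    elif n == 2:
        # Assumption: with no middle touches, split the credit evenly
        credit[journey[0]] += 0.5
        credit[journey[1]] += 0.5
    else:
        credit[journey[0]] += first
        credit[journey[-1]] += last
        middle_share = (1.0 - first - last) / (n - 2)
        for channel in journey[1:-1]:
            credit[channel] += middle_share
    return dict(credit)


print(position_based(["Channel A", "Channel A", "Channel B"]))
# Channel A ~0.6 (first + middle), Channel B ~0.4 (last)
```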
13. Modelling in R
• Packages "bigrquery" and "ChannelAttribution"
• Data stored as a table in BigQuery
• Data pulled into an R instance (bigrquery)
• Markov chain calculated (ChannelAttribution) with markov_model() in R
• Data pushed back to Google BigQuery as a table
(Diagram label: VM Layer)
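The core idea behind Markov-chain attribution can be illustrated without R. The sketch below is a minimal, hand-written Python version of the removal-effect approach. It works in four steps:
• Estimate first-order transition probabilities from aggregated paths, with added start, conversion and null states.
• Compute the probability of reaching a conversion from the start.
• Measure how much that probability drops when each channel is removed (treated as a null state).
• Share conversions in proportion to those removal effects.
This is not the ChannelAttribution implementation, whose estimation details may differ. The state names and function names are hypothetical.

```python
from collections import defaultdict

START, CONV, NULL = "(start)", "(conversion)", "(null)"


def transition_probs(paths):
    """First-order transition probabilities from aggregated paths.
    paths: list of (channels, conversions, nulls), e.g. (["A", "B"], 3, 7)."""
    counts = defaultdict(lambda: defaultdict(float))
    for channels, conv, null in paths:
        states = [START] + list(channels)
        for a, b in zip(states, states[1:]):
            counts[a][b] += conv + null  # every journey on this path makes this step
        counts[states[-1]][CONV] += conv  # converting journeys end in (conversion)
        counts[states[-1]][NULL] += null  # the rest end in (null)
    return {a: {b: n / sum(nxt.values()) for b, n in nxt.items()}
            for a, nxt in counts.items()}


def conversion_prob(probs, removed=None, iters=500):
    """P(reach (conversion) from (start)); a removed channel acts like (null)."""
    p = defaultdict(float)
    p[CONV] = 1.0
    for _ in range(iters):  # fixed-point iteration on the absorbing chain
        for state, nxt in probs.items():
            if state != removed:  # a removed state keeps probability 0
                p[state] = sum(w * p[t] for t, w in nxt.items())
    return p[START]


def markov_attribution(paths):
    """Share total conversions across channels by their removal effects."""
    probs = transition_probs(paths)
    base = conversion_prob(probs)
    if base == 0:
        raise ValueError("no conversions in the data")
    channels = sorted({c for chs, _, _ in paths for c in chs})
    effects = {c: 1 - conversion_prob(probs, removed=c) / base for c in channels}
    total_conv = sum(conv for _, conv, _ in paths)
    total_effect = sum(effects.values())
    return {c: total_conv * e / total_effect for c, e in effects.items()}


paths = [(["A", "B"], 1, 0), (["A"], 0, 1)]
print(markov_attribution(paths))  # {'A': 0.5, 'B': 0.5}
```

In this example both channels are essential to the only converting path, so each receives half the conversion. The R package's markov_model() also supports higher-order chains through its order argument.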
14. Attribution Output
• R: channel credit based on various models
• GBQ: fractional credit applied to hit-level data
• Tableau: live visualizations of the actual attributed credit
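One way to push channel-level credit down to hit-level rows is to split each converting journey's single conversion across its touches. Each touch gets a share proportional to its channel weight from the model. The deck does not spell out its exact rule, so this Python sketch is a hypothetical scheme with made-up weights.

```python
def touch_credit(journey, channel_weight):
    """Split one conversion across a journey's touches, proportional to
    each touch's channel weight (e.g. the channel's Markov credit share)."""
    total = sum(channel_weight[channel] for channel in journey)
    if total == 0:
        raise ValueError("journey has no weighted channels")
    return [channel_weight[channel] / total for channel in journey]


# Hypothetical channel weights from an attribution model
weights = {"Display": 0.2, "Paid Search": 0.6}
print(touch_credit(["Display", "Display", "Paid Search"], weights))
```

Each journey's touch credits sum to one conversion, so summing them by channel across all hit-level rows gives the attributed totals that a dashboard can display.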
15. Findings and Summary
• The multi-touch Markov model showed good results for journeys of 2+ touches
• Credit shifted away from "Direct" channels toward paid channels (paid search, paid social)
• Cautionary tale about impressions: adding more types of impression channels significantly biases the results toward impressions and shifts credit to them
• Big difference between viewable and measurable impression types
• Markov chain order: slight improvement with order 2, diminishing returns beyond order 3
16. More to Explore
• Google Analytics UI demo account:
https://analytics.google.com/analytics/web/?utm_source=demoaccount&utm_medium=demoaccount&utm_campaign=demoaccount#/report-home/a54516992w87479473p92320289
• Google BigQuery:
https://console.cloud.google.com/bigquery
• More on the R package ChannelAttribution:
https://towardsdatascience.com/multi-channel-attribution-model-with-r-part-1-markov-chains-concept-fdd964017626
• Slides and more at: https://svburger.com
18. Markov Chains – How do they work?
First-Order Markov Chains:
• $P_{n,n-1} = P(w_n \mid w_{n-1})$
• The next touchpoint depends only on the previous touchpoint
(Diagram: touchpoint states x1, x2, x3)
(Diagram: touchpoint states x1, x2, x3, x4)
Second-Order Markov Chains:
• $P_{n,n-1,n-2} = P(w_n \mid w_{n-1}, w_{n-2})$
• The next touchpoint depends on the previous TWO touchpoints
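The difference between first- and second-order chains is easiest to see by estimating the conditional probabilities directly from touchpoint paths. In this Python sketch, the history is a tuple of the previous `order` touchpoints. So `order=1` estimates $P(w_n \mid w_{n-1})$ and `order=2` estimates $P(w_n \mid w_{n-1}, w_{n-2})$. The function name and example paths are hypothetical.

```python
from collections import Counter, defaultdict


def conditional_probs(paths, order=1):
    """Estimate P(next touchpoint | previous `order` touchpoints)."""
    counts = defaultdict(Counter)
    for path in paths:
        for i in range(order, len(path)):
            history = tuple(path[i - order:i])  # (w_{n-order}, ..., w_{n-1})
            counts[history][path[i]] += 1
    return {
        history: {nxt: n / sum(c.values()) for nxt, n in c.items()}
        for history, c in counts.items()
    }


paths = [["A", "B", "C"], ["A", "B", "D"], ["B", "B", "C"]]
first = conditional_probs(paths, order=1)
second = conditional_probs(paths, order=2)
print(first[("B",)])       # {'C': 0.5, 'D': 0.25, 'B': 0.25}
print(second[("A", "B")])  # {'C': 0.5, 'D': 0.5}
print(second[("B", "B")])  # {'C': 1.0}
```

Conditioning on the extra touchpoint separates contexts that the first-order chain pools together under "B". The cost is many more states to estimate from the same data.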
20. Single Lead Touchpoint Examples via Google Analytics
• In this example, a user was exposed to numerous impressions before finally clicking through to the main site
• Users typically have many impression touches in their journeys
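Long runs of impressions can dominate a journey. One common preprocessing option, which the deck does not prescribe, is to collapse consecutive repeats of the same touchpoint before building paths. Here is a minimal Python sketch, with a hypothetical function name and channel labels:

```python
from itertools import groupby


def collapse_repeats(journey):
    """Merge runs of consecutive identical touchpoints into a single touch."""
    return [channel for channel, _ in groupby(journey)]


journey = ["Display Impression"] * 3 + ["Paid Search", "Display Impression"]
print(collapse_repeats(journey))
# ['Display Impression', 'Paid Search', 'Display Impression']
```

Collapsing changes the path counts and therefore any transition estimates built on them, so it is worth comparing the attribution results with and without it.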