19-05-2016 Page 4
Who are we?
SDU is a publisher that supplies current information on law and regulations to
lawyers, tax experts, policy makers and other legal professionals
Traditional company in transition
We believe in creating content and products tailored to the wishes of our customers, because progress is different for everybody
Both offline and online content and products
Why did we want this?
• Ownership of data
• Open generic tools (no vendor lock-in)
• Ability to provide support internally and not be dependent on external parties
• Improving customer journey
• Insight into product use
• Future wish: reacting in real time to triggers in the market
• Insight into acquisition, development, retention and win-back
• Ask and answer business questions
• Integration of customer behavior in
• Integration of offline and online data
• In-depth analytical possibilities on top of Google Analytics
• Optimal mix of advertising budget
What steps did we take?
Proof in use
• Implementing Snowplow in the cloud
• Implementing Apache Spark in the cloud
• In-cloud database with all captured data
• Alignment with Google Universal Analytics
Delivering the Intelligence Platform:
Snowplow + Spark
The Delivered Intelligence Platform: Alignment with Google Universal Analytics
Intelligence platform (Snowplow / Spark)
• Unlimited external data
• Advanced reporting through tools
• Advanced machine learning options
• Customer ID + fingerprint + IP
• Full export options
Google Universal Analytics
• Limited external data
• Slice and dice in the front-end user interface
• No machine learning options
• Customer ID uploaded into a dimension
• Limited export options
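The "Customer ID + fingerprint + IP" row can be illustrated with a minimal identity-stitching sketch. It assumes event rows carrying the Snowplow columns `user_id`, `domain_userid`, `user_fingerprint` and `user_ipaddress`; the fallback order (explicit ID, then cookie, then fingerprint plus IP) and all function names are hypothetical, not the platform's actual logic.

```python
from typing import Callable, Dict, Hashable, List, Optional

Event = Dict[str, Optional[str]]


def build_map(events: List[Event],
              key: Callable[[Event], Optional[Hashable]]) -> Dict[Hashable, str]:
    """Map an identifier (cookie, fingerprint + IP, ...) to the first
    customer ID (user_id) that was observed together with it."""
    mapping: Dict[Hashable, str] = {}
    for ev in events:
        k, customer = key(ev), ev.get("user_id")
        if k is not None and customer and k not in mapping:
            mapping[k] = customer
    return mapping


def cookie_key(ev: Event) -> Optional[Hashable]:
    return ev.get("domain_userid")


def fingerprint_ip_key(ev: Event) -> Optional[Hashable]:
    fp, ip = ev.get("user_fingerprint"), ev.get("user_ipaddress")
    return (fp, ip) if fp and ip else None


def stitch(events: List[Event]) -> List[Optional[str]]:
    """Resolve a customer ID per event, or None when it stays anonymous."""
    by_cookie = build_map(events, cookie_key)
    by_fp_ip = build_map(events, fingerprint_ip_key)
    resolved: List[Optional[str]] = []
    for ev in events:
        customer = ev.get("user_id")                          # 1. explicit ID
        if not customer:
            customer = by_cookie.get(cookie_key(ev))          # 2. same cookie
        if not customer:
            customer = by_fp_ip.get(fingerprint_ip_key(ev))   # 3. fingerprint + IP
        resolved.append(customer)
    return resolved


events = [
    {"user_id": "C42", "domain_userid": "ck1", "user_fingerprint": "fp9", "user_ipaddress": "10.0.0.1"},
    {"user_id": None, "domain_userid": "ck1", "user_fingerprint": "fp9", "user_ipaddress": "10.0.0.1"},
    {"user_id": None, "domain_userid": "ck2", "user_fingerprint": "fp9", "user_ipaddress": "10.0.0.1"},
    {"user_id": None, "domain_userid": "ck3", "user_fingerprint": "fp7", "user_ipaddress": "10.0.0.2"},
]
print(stitch(events))  # ['C42', 'C42', 'C42', None]
```

Fingerprint plus IP is the weakest signal (many users can share one office network), which is why the sketch only falls back to it when neither an explicit ID nor a known cookie is available.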
Planning of the 6-week Proof of Concept (POC)
• First (generic) tags and triggers in GTM
• Second batch of tags and triggers in GTM
• Test of the Snowplow data and first EDA
• Implementation of Databricks / Spark
• Setting up the connection to Snowplow S3 and Redshift
• Start of use cases
• Finalization of use cases
• Budget calculations for future tools (not straightforward with cloud computing)
• Wrap-up of the project
What were our technical learnings / findings?
• Security certifications in AWS
• IT expertise with experience in networking and AWS
• Complex Google Analytics
• Completeness of the tracking
• Combining offline and online data
• Account structure in AWS
• Using multiple AWS accounts is good for governance, but more complex in use (whitelisting IP addresses)
• Data collection through GTM (i.e. browser side) is not 100% complete. Neither is GA.
• Implement a key in the data layer.
• You need web developers.
• Either start with a clean implementation, or plan
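Because browser-side collection through GTM misses some events, a server-side tracker is an obvious way to capture web-shop transactions reliably. The sketch below shows roughly what such a tracker sends: a GET request to the collector's `/i` endpoint with Snowplow tracker-protocol parameters for a transaction event (`e=tr`). The collector URL, app ID and tracker label are hypothetical; in production, the official Snowplow Python tracker would normally be used instead of hand-built requests.

```python
import time
import uuid
from typing import Dict, List, Optional
from urllib.parse import parse_qs, urlencode, urlsplit
from urllib.request import urlopen

# Hypothetical collector address; replace with the real Snowplow collector.
COLLECTOR = "https://collector.example.com"


def transaction_url(order_id: str, total: float, currency: str = "EUR",
                    user_id: Optional[str] = None,
                    app_id: str = "webshop") -> str:
    """Build a tracker-protocol GET request for an ecommerce transaction."""
    params = {
        "e": "tr",                             # event type: transaction
        "p": "srv",                            # platform: server-side app
        "tv": "py-sketch-0.1",                 # hypothetical tracker label
        "aid": app_id,                         # hypothetical application ID
        "eid": str(uuid.uuid4()),              # unique event ID
        "dtm": str(int(time.time() * 1000)),   # creation time (ms)
        "tr_id": order_id,                     # order ID
        "tr_tt": f"{total:.2f}",               # order total
        "tr_cu": currency,                     # currency code
    }
    if user_id:
        params["uid"] = user_id                # customer ID, if known
    return f"{COLLECTOR}/i?{urlencode(params)}"


def parse_params(url: str) -> Dict[str, List[str]]:
    """Decode the query string again, e.g. for logging or testing."""
    return parse_qs(urlsplit(url).query)


def send(url: str, timeout: float = 5.0) -> int:
    """Fire the request from the web-shop back end; returns the HTTP status."""
    with urlopen(url, timeout=timeout) as resp:
        return resp.status


url = transaction_url("ORD-1001", 99.5, user_id="C42")
print(parse_params(url)["tr_tt"])  # ['99.50']
```

Because the request originates from the web-shop back end rather than the browser, ad blockers and aborted page loads can no longer drop the transaction.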
Use Case 1: The Correlation Between Site Visits and Products Put in the Basket
• Products in the lower right are visited frequently, but are rarely added to the basket.
• Products in the upper left are not visited frequently, but are often added to the basket.
• Is the price of some products too high or too low?
• Are pages difficult to find?
• Is there a difference between our high-value customers and low-value customers?
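The quadrant reading of Use Case 1 can be reproduced with a short sketch: count visits and basket additions per product and split each axis at its median. The event names (`view`, `add_to_basket`) and the median split are assumptions; the original analysis may have used other thresholds or a correlation measure.

```python
from collections import Counter
from statistics import median
from typing import Dict, Iterable, Tuple

LABELS = {
    (True, False): "many views, few basket adds",   # lower right
    (False, True): "few views, many basket adds",   # upper left
    (True, True): "many views, many basket adds",
    (False, False): "few views, few basket adds",
}


def quadrants(events: Iterable[Tuple[str, str]]) -> Dict[str, str]:
    """Classify products by visits vs. basket adds, split at the medians.

    events: (product_id, action) pairs, action being 'view' or
    'add_to_basket' (hypothetical event names).
    """
    views: Counter = Counter()
    adds: Counter = Counter()
    for product, action in events:
        if action == "view":
            views[product] += 1
        elif action == "add_to_basket":
            adds[product] += 1
    products = set(views) | set(adds)
    if not products:
        return {}
    view_median = median(views[p] for p in products)
    add_median = median(adds[p] for p in products)
    return {p: LABELS[(views[p] > view_median, adds[p] > add_median)]
            for p in products}


sample = ([("A", "view")] * 10 + [("A", "add_to_basket")]
          + [("B", "view")] * 2 + [("B", "add_to_basket")] * 5
          + [("C", "view")] * 5 + [("C", "add_to_basket")] * 3)
print(quadrants(sample)["A"])  # many views, few basket adds
```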
Use Case 2: Most Frequently Visited Service Pages
• Top 10 service-related web pages
• The top (detailed) service page is ‘abonnement-opzeggen’ (cancel subscription)
• About three quarters of the sessions that visit this page (57% + 19% = 76%) continue to the cancellation form.
• In about a quarter of the sessions (5% + 19% = 24%) the customer uses another form, i.e. the general contact form (instead of, or in addition to, the cancellation form)
• Cancellations reach Sdu in different ways. Are the forms processed in the same way?
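Sessions visiting ‘abonnement-opzeggen’, by forms used (share of sessions):
                    Cancellation form: No   Cancellation form: Yes
Contact form: No    19%                     57%
Contact form: Yes   5%                      19%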
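The 2 × 2 table above can be derived from session-level flags. The sketch assumes each session that visited ‘abonnement-opzeggen’ has already been reduced to two booleans (contact form used, cancellation form used); how those flags are derived from page views is hypothetical, and the toy data is invented for illustration only.

```python
from collections import Counter
from typing import Dict, Iterable, Optional, Tuple

Cell = Tuple[bool, bool]  # (used contact form, used cancellation form)


def form_crosstab(sessions: Iterable[Cell]) -> Dict[Cell, float]:
    """Percentage of sessions per combination of forms used."""
    counts = Counter(sessions)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {(contact, cancel): 100.0 * counts[(contact, cancel)] / total
            for contact in (False, True) for cancel in (False, True)}


def share(table: Dict[Cell, float], *, contact: Optional[bool] = None,
          cancel: Optional[bool] = None) -> float:
    """Sum of the cells matching the given filters (None means 'any')."""
    return sum(v for (co, ca), v in table.items()
               if (contact is None or co == contact)
               and (cancel is None or ca == cancel))


# Toy data: 10 sessions that visited the cancellation page.
toy = [(False, False)] * 2 + [(False, True)] * 5 + [(True, False)] * 1 + [(True, True)] * 2
table = form_crosstab(toy)
print(share(table, cancel=True))   # 70.0
print(share(table, contact=True))  # 30.0
```

The `share` helper mirrors the row and column sums used in the bullets above (for example "cancellation form: Yes" = both cells in the right-hand column).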
Use Case 3: Search Pages
• 6 distinct clusters, of which ‘zoekers’ (searchers) is a small group with relatively high revenue
• What can we do to leverage the relatively large group of visits without revenue that occur predominantly in the evening? Are these private individuals visiting our site?
• Hypothesis: the searchers need a specific product. Further research and A/B testing are advised, specifically on search.
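The slide does not state which clustering method produced the six clusters. As an assumption, the sketch below uses plain k-means (Lloyd's algorithm) on standardized session features; the feature choice (for example revenue, share of evening visits, number of search pages viewed) is hypothetical. At Sdu's data volumes this would more likely run in Spark (for example Spark MLlib's `KMeans`) than in pure Python.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def standardize(rows: Sequence[Point]) -> List[Point]:
    """Z-score each feature so that no single unit dominates the distance."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((x - m) ** 2 for x in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return [tuple((x - m) / s for x, m, s in zip(r, means, stds)) for r in rows]


def kmeans(points: Sequence[Point], k: int, iters: int = 100,
           seed: int = 0) -> Tuple[List[int], List[Point]]:
    """Plain Lloyd's k-means; returns (cluster label per point, centroids)."""
    rng = random.Random(seed)
    centroids: List[Point] = [tuple(p) for p in rng.sample(list(points), k)]
    labels: List[int] = []
    for _ in range(iters):
        # Assignment step: nearest centroid by Euclidean distance.
        new_labels = [min(range(k), key=lambda j: math.dist(p, centroids[j]))
                      for p in points]
        if new_labels == labels:
            break  # converged: no point changed cluster
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = tuple(sum(d) / len(members) for d in zip(*members))
    return labels, centroids
```

Usage, with k = 6 as on the slide: `labels, centroids = kmeans(standardize(features), k=6)`, where `features` is a list of numeric tuples, one per session. Standardizing first matters because revenue in euros would otherwise swamp count-based features.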
How are we organized for Snowplow?
[Organization diagram not reproduced. Surviving fragments: “R”, “Google Tag …”, “Alignment with …”]
What are the next steps for Sdu?
Technical next steps
• Duplicates: create a script to deduplicate current and future records.
• Implement a server-side tracker to prevent missing web-shop transactions.
• Assess a low-cost long-term alternative to the Redshift database (AWS).
• A structural security solution for the Redshift database (whitelisting the IP address of the Databricks cluster)
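For the duplicates item, a minimal sketch: in Snowplow data, duplicate rows typically share the same `event_id`. The rule below (keep the row with the earliest `collector_tstamp`, skip IDs already loaded into the warehouse) is an assumption, as is the `already_loaded` lookup; in Sdu's setup the same logic would more likely run as SQL on Redshift or as a Spark job.

```python
from typing import Dict, Iterable, List, Optional, Set


def deduplicate(events: Iterable[dict],
                already_loaded: Optional[Set[str]] = None) -> List[dict]:
    """Keep one row per event_id: the one with the earliest collector_tstamp.

    already_loaded holds event_ids that are already in the warehouse
    (a hypothetical lookup), so that future batches skip them.
    Timestamps must be mutually comparable, e.g. datetime objects or
    ISO-8601 strings in one fixed format.
    """
    already_loaded = already_loaded or set()
    best: Dict[str, dict] = {}
    for ev in events:
        eid = ev["event_id"]
        if eid in already_loaded:
            continue  # duplicate of a record loaded in an earlier batch
        kept = best.get(eid)
        if kept is None or ev["collector_tstamp"] < kept["collector_tstamp"]:
            best[eid] = ev
    return sorted(best.values(), key=lambda e: e["collector_tstamp"])


batch = [
    {"event_id": "a", "collector_tstamp": "2016-05-19T10:00:05"},
    {"event_id": "a", "collector_tstamp": "2016-05-19T10:00:01"},
    {"event_id": "b", "collector_tstamp": "2016-05-19T10:00:03"},
    {"event_id": "c", "collector_tstamp": "2016-05-19T10:00:02"},
]
print([e["event_id"] for e in deduplicate(batch, already_loaded={"c"})])  # ['a', 'b']
```

Snowplow's documentation distinguishes natural duplicates (identical payloads) from synthetic duplicates (same `event_id`, different payload); a stricter version could also compare `event_fingerprint` before discarding a row.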
Supporting lean startup
• Determining KPIs
• Measuring product use
• Analysing data and determining the next action
• Answer business questions on customer behaviour
• Answer questions that have not been asked yet
• Track product use