Snowplow is at the core of everything we do

A presentation by Simon Rumble on Bauer Media Australia's journey implementing Snowplow, and the central role Snowplow now plays in their data strategy and products.

  1. Snowplow drives everything we do: what and why?
  2. Bauer Media: digital and print publisher; family-owned German company; 116 sites across Australia and New Zealand; tag management across all sites
  3. Just start collecting: Snowplow data collection in 2014; we didn’t really have a use case
  4. Stuff we record: page views; metadata around content; user logins; email click-throughs; ad impressions
  5. Use cases started showing up: cross-site integrated reporting; ad hoc tricky analysis; sanity checking industry audience reporting; stalking individual users; audience overlaps
  6. User behaviour; ad impressions; content metadata; trending service; recommendations; dashboards; ad hoc analysis
  7. Some things you can’t do in GA: tag-based reporting; accurate reporting of in-app Facebook, using user-agent contains FBAN
  8. We’re using Snowplow 0.9.2 from 2014-04-29! It just works. We’ve been busy building other stuff.
  9. But... Page pings is b0rken: no time spent or scroll depth. (Out-of-the-box) browser categorisation is terrible. Hourly batches are a bit higher latency than we’d like. No context shredding, but JSON queries are performant enough.
  10. Pipeline diagram (runSnowPlow.sh). Components: web page (JavaScript in page creates image beacon); S3; CloudFront; SnowCannon (Node app in Elastic Beanstalk); ETL (Elastic MapReduce); S3; Redshift tables events, events_temp and x_events. Arrow labels: "Redirects to", "Writes logs to".
  11. Tips: Redshift can get very expensive very quickly. Decent dashboarding platforms are rare, and plenty of crap ones are overpriced. Just tip everything in and worry about what you’ll do later.
  12. What’s next?
  13. Future plans: upgrade ETL to real-time, probably our own solution; time spent and scroll depth; shredding?
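
The in-app Facebook check mentioned on slide 7 ("user-agent contains FBAN") can be sketched roughly as follows. This is a minimal illustration, not the presenter's code: it assumes each event carries its raw user-agent string, and the function name `is_facebook_in_app` and the sample user-agent strings are hypothetical, made up for the example.

```python
def is_facebook_in_app(useragent):
    """Return True when the user-agent string contains the FBAN token.

    Hypothetical helper: the slide only says "user-agent contains FBAN",
    so a plain substring test is assumed here.
    """
    # Missing user agents (None or empty string) count as not in-app.
    return bool(useragent) and "FBAN" in useragent


# Made-up user-agent strings for illustration only (not real traffic).
samples = [
    "Mozilla/5.0 (iPhone; CPU iPhone OS 8_0 like Mac OS X) "
    "AppleWebKit/600.1.4 (KHTML, like Gecko) Mobile/12A365 "
    "[FBAN/FBIOS;FBAV/20.0]",
    "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/40.0 Safari/537.36",
    None,
]

# Count page views that came from the Facebook in-app browser.
in_app_views = sum(is_facebook_in_app(ua) for ua in samples)
```

In a warehouse query the same test would be a substring match on the user-agent column, applied to the collected events rather than to an in-memory list.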