2. A lot is wrong with web analytics today…
Narrowly focused
• Focus on marketing-related analytics (visits, click-throughs, conversions)
• Focus on ecommerce sites (limited number of goals, limited set of clearly defined workflows e.g. sign up to email, purchase product)
• No analytics for SaaS-based businesses, drivers of customer value, product analytics
Inflexible
• Hard to perform analyses on users / customers that span multiple visits
• Hard to examine the ways users actually engage on sites (esp. for SaaS / web apps), aggregate customer journeys
• Hard to map and segment users based on their behaviour and customer journeys
• Limited tools to pick out the root cause of differences in customer journeys
Too high level AND too low level
• Too high level: impractical or impossible to zoom in on individual customers and events
• Too low level: hard to see the wood for the trees in a sea of data / pre-defined views
Siloed
• Hard to integrate with other sources of customer data including CRM, email marketing, social marketing, customer service, financial systems, ad serving systems
• Typically separated from other business intelligence systems, with each system used to answer different types of business questions
3. …with bad consequences for businesses
Cannot answer important business questions
• Questions related to the customer base
– Who are our most valuable customers?
– How can I spot them in advance?
– What are the “sliding doors” moments in a customer’s journey that impact their future value?
– How does our customer base break down, by behaviour?
– How well do I serve each segment?
– How well do I monetize each segment?
– Where are the best opportunities for growing the value of my customer base?
• Product development questions
– How successful has each product iteration been at driving user engagement?
– Does our product work better for some customer segments than others? If so, why?
– Does our product work better at some parts of the customer journey than others? Where?
– Where should we focus product development efforts?
Hard to export web analytics data to answer questions in other systems
• Two reasons to export our data:
– So that we can answer business questions using this data in another (more appropriate) system
– So that we can use this data in other value generating ways e.g. drive product / content recommendation, service personalisation
• Sometimes impossible
– Impossible to export granular data out of Google Analytics
• Otherwise expensive
– Enterprise web analytics products charge for export based on data volumes, making export expensive for large data sets
• Hard to house exported data
– Web analytics systems generate large volumes of data, which can be costly to warehouse and query
4. SnowPlow takes a radically new approach to web analytics…
Traditional approach
1. What reports do we want to deliver?
2. What data do we collect to support those reports?
SnowPlow approach
1. What is all the available data that we could ever want?
2. What tools will empower our analysts to answer any possible business question?
5. …one that starts from the principle of having all the data
Capture all data
• All data is captured via easy-to-implement JavaScript tags
• Light-weight event tracking makes it easy to capture any type of online behaviour
• No limits on the number, type or categories of events or variables that can be assigned
• Data is stored in Amazon S3 for scalability
• Data can be enriched from other 1st and 3rd party sources. (Data can be exported and imported)
Complete data ownership
• Data capture is via 1st party cookies
• JavaScript tracking and ETL source code is open source
• All data is stored in SnowPlow users’ own Amazon S3 accounts
Powerful analytics toolset
• Latest big data and cloud computing technologies for data storage and querying
• Data is queried using Facebook-developed Apache Hive via Elastic MapReduce, making it easy to run queries against enormous data sets
• Possible to run any big data analytics toolset (e.g. Mahout, Cascalog, MicroStrategy) on SnowPlow data
6. To date, SnowPlow users can query data using Apache Hive, which is great for analysts but bad for business users
Hive is a data warehousing platform
• Built on top of Hadoop: scalable
• Developed at Facebook, but now widely used at e.g. Netflix, OpenX, The Globe and Mail
• Enables analysts to query data using SQL
SnowPlow data is stored in a single Hive table
• Each line of data represents one event (e.g. page view, add-to-basket, video play, ad view etc.)
• Each line of data includes a user_id and visit_id
Pros
• Easy for anyone with SQL knowledge to run queries
• Straightforward to aggregate data
• Straightforward to ingest new data sources to enrich the web analytics data (e.g. CRM data, media catalogues)
• Interactive UI allows for ad hoc query development sessions
• Straightforward to export aggregated data sets into other tools
• Possible to schedule jobs to populate e.g. KPI dashboard
Cons
• Command-line interface not suitable for many business people
• No in-built data visualisation capability (have to export data to a separate application)
• KPI dashboards can be driven from Hive analysis, but always require the integration of another application
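To make the single-table layout concrete, here is a minimal sketch in Python. It uses an in-memory SQLite database purely as a local stand-in for Hive, so it can be run anywhere. The table name `events`, the `event` column and the sample rows are assumptions for illustration; only `user_id` and `visit_id` are named on the slide. A similar GROUP BY query could be written against the Hive table.

```python
import sqlite3

# Hypothetical sample rows: one row per event, as in the single-table layout.
# Only user_id and visit_id are named in the slides; the rest is assumed.
SAMPLE_EVENTS = [
    ("u1", 1, "page_view"),
    ("u1", 1, "add_to_basket"),
    ("u1", 2, "page_view"),
    ("u2", 1, "page_view"),
    ("u2", 1, "video_play"),
    ("u2", 1, "page_view"),
]


def load_events(rows):
    """Create an in-memory table holding one row per event."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE events (user_id TEXT, visit_id INTEGER, event TEXT)"
    )
    conn.executemany("INSERT INTO events VALUES (?, ?, ?)", rows)
    return conn


def visits_and_events_per_user(conn):
    """Aggregate across visits: distinct visits and total events per user."""
    query = """
        SELECT user_id,
               COUNT(DISTINCT visit_id) AS visits,
               COUNT(*) AS events
        FROM events
        GROUP BY user_id
        ORDER BY user_id
    """
    return conn.execute(query).fetchall()


conn = load_events(SAMPLE_EVENTS)
result = visits_and_events_per_user(conn)
print(result)
```

Because every row carries both identifiers, an analysis that spans multiple visits by the same user is a single aggregation rather than a join across pre-built reports.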
7. Our priority now is to develop the toolset to answer business questions using all this analytics data
SnowPlow web analytics data
KPIs and standard reports
• Enable analysts to easily create and distribute KPI dashboards and reports including on customer lifetime value and cohort analysis
• Reports will vary in scope e.g. for management team, marketing teams, product development team etc.
Ad hoc analytics
• Enable analysts with more limited SQL and programming knowledge to query data e.g. pivot tables, data visualisation tools
• Statistical and machine learning tools to perform e.g. behavioural segmentations of customer base, predict likely customer lifetime value
Operational systems e.g. recommendation engines, marketing
• Use SnowPlow data in live systems e.g. in-store product recommendation…
• …or to send personalised marketing to customers to drive up customer satisfaction
Some of the analytics tools we develop will be offered as cloud-based solutions, for a monthly subscription
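Cohort analysis is one of the reports named above. As a rough sketch of what such a report involves, the Python snippet below groups users by the month of their first event and counts how many of them are active in each subsequent month. The input layout (a `user_id` plus an event date) and the sample data are assumptions for illustration, not real SnowPlow output, and the function names are hypothetical.

```python
from collections import defaultdict
from datetime import date

# Hypothetical events: (user_id, date of event). Not real SnowPlow data.
EVENTS = [
    ("u1", date(2012, 1, 5)),
    ("u1", date(2012, 2, 10)),
    ("u2", date(2012, 1, 20)),
    ("u3", date(2012, 2, 1)),
    ("u3", date(2012, 3, 15)),
    ("u1", date(2012, 3, 2)),
]


def month_index(d):
    """Map a date to a running month count so months can be subtracted."""
    return d.year * 12 + (d.month - 1)


def cohort_retention(events):
    """Return {cohort_month: {months_since_first_event: active_user_count}}."""
    # First pass: find the month in which each user was first seen.
    first_seen = {}
    for user_id, day in events:
        m = month_index(day)
        if user_id not in first_seen or m < first_seen[user_id]:
            first_seen[user_id] = m

    # Second pass: record which users are active at each offset from
    # their cohort month.
    active = defaultdict(lambda: defaultdict(set))
    for user_id, day in events:
        cohort = first_seen[user_id]
        offset = month_index(day) - cohort
        active[cohort][offset].add(user_id)

    # Convert sets of users to counts and month indices to YYYY-MM labels.
    table = {}
    for cohort, by_offset in active.items():
        label = f"{cohort // 12}-{cohort % 12 + 1:02d}"
        table[label] = {
            off: len(users) for off, users in sorted(by_offset.items())
        }
    return table


retention = cohort_retention(EVENTS)
for cohort_label in sorted(retention):
    print(cohort_label, retention[cohort_label])
```

Counting distinct users per offset, rather than events, keeps a single very active user from inflating a cohort's apparent retention.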
8. Whilst many of the tools are not yet developed, we recommend installing SnowPlow today
1. Start warehousing your web analytics data using SnowPlow today
2. Start using the already available (free, open source) tools, particularly Apache Hive, to drive insight from your user data today
3. Have a large data set ready for when our more business-friendly analytics tools become available
Download SnowPlow from GitHub: github.com/snowplow/snowplow
Contact Keplar LLP for support and consultancy: www.keplarllp.com