Why use big data tools to do web analytics? And how to do it using Snowplow and Qubole

Using big data tools to analyse web analytics data

• Why use big data tools to analyse web analytics data?
• How would you use big data tools to analyse web analytics data (with Snowplow and Qubole)?

Web event data is incredibly valuable
• It tells you how your customers actually behave (in lots of detail), and how that varies
• Between different customers
• For the same customers over time. (Seasonality, progress in customer journey)
• How behaviour drives value

• It tells you how customers engage with you via your website / webapp
• How that varies by different versions of your product
• How improvements to your product drive increased customer satisfaction and lifetime value

• It tells you how customers and prospective customers engage with your different
marketing campaigns and how that drives subsequent behaviour

Web analytics data should be essential to driving customer development, product development and marketing decisions

Deriving value from web analytics data often involves very bespoke analytics
• The web is a rich and varied space! E.g.
• Bank
• Newspaper
• Social network
• Analytics application
• Government organisation (e.g. tax office)
• Retailer
• Marketplace

• For each type of business you’d expect different:
• Types of events, with different types of associated data
• Ecosystem of customers / partners with different types of relationships
• Product development cycle (and approach to product development)
• Types of business questions / priorities to inform how the data is analysed

Web analytics tools are good at delivering the standard reports
that are common across different business types…
• Where does your traffic come from e.g.
• Sessions by marketing campaign / referrer
• Sessions by landing page

• Understanding events common across business types (page views, transactions, ‘goals’) e.g.
• Page views per session
• Page views per web page
• Conversion rate by traffic source
• Transaction value by traffic source

• Capturing contextual data common to people browsing the web
• Timestamps
• Referrer data
• Web page data (e.g. page title, URL)
• Browser data (e.g. type, plugins, language)
• Operating system (e.g. type, timezone)
• Hardware (e.g. mobile / tablet / desktop, screen resolution, colour depth)

…but not at enabling the high-value bespoke analytics
• What is the impact of different ad campaigns and creative on the way users
subsequently behave? What is the return on that ad spend?

• How do visitors use social channels (Facebook / Twitter) to interact around video
content? How can we predict which content will “go viral”?

• How do updates to our product change the “stickiness” of our service? ARPU?
Does that vary by customer segment?
That is because there are significant limitations in the way traditional web analytics programmes handle:

Data collection
• Sample-based (e.g. Google Analytics)
• Limited set of events e.g. page views, goals, transactions
• Limited set of ways of describing events (custom dim 1, custom dim 2…)

Data processing
• Data is processed ‘once’
• No validation
• No opportunity to reprocess e.g. following update to business rules
• Data is aggregated prematurely
• Only particular combinations of metrics / dimensions can be pivoted together (Google Analytics)
• Only particular types of analysis are possible on different types of dimension (e.g. sProps, eVars, conversion goals in SiteCatalyst)

Data access
• Data is either aggregated (e.g. Google Analytics), or available as a complete log file for a fee (e.g. Adobe SiteCatalyst)
• As a result, data is siloed: hard to join with other data sets

We built Snowplow to address those limitations and enable high
value, bespoke analytics on web event data

[Diagram: Data pipeline, Big data store]

Snowplow is a data pipeline:
• Captures data from websites via JavaScript tags
• Validates, cleans, and enriches the incoming data (using Hadoop)
• Loads the cleaned / enriched data into a big data store for analysis, e.g. S3, where it can be analysed using big data tools, e.g. Qubole

Understanding the technology that powers the Snowplow data
pipeline
The Snowplow data pipeline consists of five loosely coupled modules:

Trackers generate event data
• JavaScript tracker for collecting data client-side
• No-JS / pixel tracker (e.g. for email marketing)
• Server side trackers (e.g. Lua tracker). Python / Ruby / Java / Scala on roadmap
• Mobile trackers (iOS, Android on the roadmap…)
• Internet of things (e.g. Arduino tracker)

Collectors receive data and write it to a queue for processing
• CloudFront collector writes data to S3
• Clojure collector sets a 3rd party cookie and writes to S3
• Scala RT collector sets a 3rd party cookie and writes to S3 AND Kinesis

Enrichment validates and enriches the data
• Validates e.g. checks expected fields are set for each event type
• Enrichments e.g. categorising referrers (search / social), inferring location from IP
• Hadoop-based enrichment module (easy reprocessing of data)
• Kinesis-based enrichment module (real time processing) in development
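
The validation and enrichment steps described above can be sketched in a few lines of Python. This is only an illustration of the idea, not Snowplow's actual enrichment code: the event type names, the required-field rules, the referrer lookup, and the `page_referrer` and `refr_medium` field names are all assumptions made for the example.

```python
from urllib.parse import urlparse

# Hypothetical rules: which fields must be set for each event type.
REQUIRED_FIELDS = {
    "page_view": ["page_urlpath"],
    "transaction": ["tr_total"],
}

# Hypothetical lookup from referrer host to a category (search / social).
REFERRER_MEDIUMS = {
    "www.google.com": "search",
    "www.bing.com": "search",
    "www.facebook.com": "social",
    "twitter.com": "social",
}


def validate(event):
    """Return a list of problems; an empty list means the event is valid."""
    event_type = event.get("event")
    if event_type not in REQUIRED_FIELDS:
        return [f"unknown event type: {event_type!r}"]
    return [
        f"missing field: {name}"
        for name in REQUIRED_FIELDS[event_type]
        if event.get(name) in (None, "")
    ]


def enrich(event):
    """Return a copy of the event with a referrer category added."""
    enriched = dict(event)
    host = urlparse(event.get("page_referrer") or "").netloc.lower()
    if not host:
        enriched["refr_medium"] = "direct"
    else:
        enriched["refr_medium"] = REFERRER_MEDIUMS.get(host, "unknown")
    return enriched


def process(raw_events):
    """Split raw events into enriched good rows and rejected bad rows."""
    good, bad = [], []
    for event in raw_events:
        problems = validate(event)
        if problems:
            bad.append({"event": event, "errors": problems})
        else:
            good.append(enrich(event))
    return good, bad
```

Because `process` takes the raw events as input and does not modify them, it can be re-run over the same raw data after the rules change, which is the sense in which a batch enrichment step makes reprocessing easy.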

Storage – make data available for analysis
• Store data in Amazon S3 for processing using big data tools e.g. Qubole
• Also support storage in Amazon Redshift / PostgreSQL for analysis using
traditional BI tools
So what does Snowplow data look like?
• A single table
• One line of data per event
• Fat table: 98 different fields (and counting)…
Type of field | Example field(s) | Description
User ID | domain_userid, network_userid | Fields to identify the user doing the browsing. 1st and 3rd party cookie IDs, browser fingerprints, IP address and a separate field for setting a custom value are all available
Web page | page_urlpath | Fields that describe the web page the event occurred on, including document size, URL, title
Traffic source | mkt_source, refr_source | Fields that indicate the source of traffic. Snowplow includes fields that can be set via utm parameters and others based on the referrer
Event (rather than context) | event, se_action, tr_total | Fields that relate to a specific event (e.g. transaction total)
User tech setup | br_type, os_name, dvce_type, br_viewheight | Fields that describe the user’s browser / OS / device setup
… | … | …
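
To make the "one line per event, fat table" shape concrete, here is a rough Python sketch of a single event row using only the example field names listed above. Every value is invented for illustration, and the `FIELD_TYPES` mapping and `group_by_field_type` helper are hypothetical names introduced for this example.

```python
# One hypothetical event row; all values are invented for illustration.
event_row = {
    "domain_userid": "abc123",
    "network_userid": "def456",
    "page_urlpath": "/products/42",
    "mkt_source": "newsletter",
    "refr_source": "Google",
    "event": "transaction",
    "se_action": None,  # not set, because this row is not a structured event
    "tr_total": 59.99,
    "br_type": "Browser",
    "os_name": "Mac OS X",
    "dvce_type": "Computer",
    "br_viewheight": 900,
}

# Type-of-field labels for the example fields listed above.
FIELD_TYPES = {
    "domain_userid": "User ID",
    "network_userid": "User ID",
    "page_urlpath": "Web page",
    "mkt_source": "Traffic source",
    "refr_source": "Traffic source",
    "event": "Event",
    "se_action": "Event",
    "tr_total": "Event",
    "br_type": "User tech setup",
    "os_name": "User tech setup",
    "dvce_type": "User tech setup",
    "br_viewheight": "User tech setup",
}


def group_by_field_type(row):
    """Group the flat columns of one event row by their type of field."""
    grouped = {}
    for name, value in row.items():
        grouped.setdefault(FIELD_TYPES.get(name, "Other"), {})[name] = value
    return grouped


grouped = group_by_field_type(event_row)
```

The point of the flat layout is that every row carries its full context (who, where, from which source, on what device) alongside the event-specific fields, so no joins are needed before querying.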
How do you analyse Snowplow data with Qubole?
• Common approach: use Hive on Qubole (could also use Pig or other Hadoop-based jobs)
• Create the events table (incl. recovering partitions)
• Write highly bespoke queries directly against the complete events table
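
As a rough illustration of querying the complete events table directly, the sketch below uses Python's built-in sqlite3 module as a local stand-in for Hive on Qubole. The table holds only four of the Snowplow fields named earlier, the rows are invented, and the query (visitors, buyers and revenue by marketing source) is just one example of a bespoke question; real Hive table creation and partition recovery are not shown.

```python
import sqlite3

# In-memory database standing in for the big data store (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE events (
        domain_userid TEXT,
        event         TEXT,
        mkt_source    TEXT,
        tr_total      REAL
    )
    """
)

# Invented sample data: one row per event.
rows = [
    ("u1", "page_view", "newsletter", None),
    ("u1", "transaction", "newsletter", 40.0),
    ("u2", "page_view", "newsletter", None),
    ("u3", "page_view", "adwords", None),
    ("u3", "transaction", "adwords", 25.0),
    ("u3", "transaction", "adwords", 15.0),
]
conn.executemany("INSERT INTO events VALUES (?, ?, ?, ?)", rows)

# Event-level query: nothing is pre-aggregated, so any combination of
# fields can be grouped and filtered together.
QUERY = """
    SELECT
        mkt_source,
        COUNT(DISTINCT domain_userid) AS visitors,
        COUNT(DISTINCT CASE WHEN event = 'transaction'
                            THEN domain_userid END) AS buyers,
        SUM(CASE WHEN event = 'transaction' THEN tr_total ELSE 0 END) AS revenue
    FROM events
    GROUP BY mkt_source
    ORDER BY mkt_source
"""
results = conn.execute(QUERY).fetchall()
for row in results:
    print(row)
```

The same pattern (group by any context field, aggregate any event field) is what the aggregated reports in traditional tools restrict to fixed combinations.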
DEMO!
Performing more sophisticated analysis
• Unfortunately there isn’t time in this webinar to do a deeper demo…
• …however, there are resources available, in particular the Snowplow Analytics
Cookbook: http://snowplowanalytics.com/analytics/index.html
The Role of Patterns in the Era of Large Language Models
Yunyao Li80 vues

Why use big data tools to do web analytics? And how to do it using Snowplow and Qubole

  • 1. Using big data tools to analyse web analytics data
      • Why use big data tools to analyse web analytics data?
      • How would you use big data tools to analyse web analytics data (with Snowplow and Qubole)?
  • 2. Web event data is incredibly valuable
      • It tells you how your customers actually behave (in lots of detail), and how that varies
          • Between different customers
          • For the same customers over time (seasonality, progress in customer journey)
          • How behaviour drives value
      • It tells you how customers engage with you via your website / webapp
          • How that varies by different versions of your product
          • How improvements to your product drive increased customer satisfaction and lifetime value
      • It tells you how customers and prospective customers engage with your different marketing campaigns and how that drives subsequent behaviour
      • Web analytics data should be essential to driving customer development, product development and marketing decisions
  • 3. Deriving value from web analytics data often involves very bespoke analytics
      • The web is a rich and varied space! E.g.
          • Bank
          • Newspaper
          • Social network
          • Analytics application
          • Government organisation (e.g. tax office)
          • Retailer
          • Marketplace
      • For each type of business you’d expect different:
          • Types of events, with different types of associated data
          • Ecosystem of customers / partners with different types of relationships
          • Product development cycle (and approach to product development)
          • Types of business questions / priorities to inform how the data is analysed
  • 4. Web analytics tools are good at delivering the standard reports that are common across different business types…
      • Where does your traffic come from, e.g.
          • Sessions by marketing campaign / referrer
          • Sessions by landing page
      • Understanding events common across business types (page views, transactions, ‘goals’), e.g.
          • Page views per session
          • Page views per web page
          • Conversion rate by traffic source
          • Transaction value by traffic source
      • Capturing contextual data common to people browsing the web
          • Timestamps
          • Referrer data
          • Web page data (e.g. page title, URL)
          • Browser data (e.g. type, plugins, language)
          • Operating system (e.g. type, timezone)
          • Hardware (e.g. mobile / tablet / desktop, screen resolution, colour depth)
  • 5. …but not at enabling the high-value bespoke analytics
      • What is the impact of different ad campaigns and creative on the way users subsequently behave? What is the return on that ad spend?
      • How do visitors use social channels (Facebook / Twitter) to interact around video content? How can we predict which content will “go viral”?
      • How do updates to our product change the “stickiness” of our service? ARPU? Does that vary by customer segment?
  • 6. That is because there are significant limitations in the way traditional web analytics programmes handle:
      • Data collection
          • Sample-based (e.g. Google Analytics)
          • Limited set of events, e.g. page views, goals, transactions
          • Limited set of ways of describing events (custom dim 1, custom dim 2…)
      • Data processing
          • Data is processed ‘once’
          • No validation
          • No opportunity to reprocess, e.g. following an update to business rules
          • Data is aggregated prematurely
      • Data access
          • Data is either aggregated (e.g. Google Analytics), or available as a complete log file for a fee (e.g. Adobe SiteCatalyst)
          • Only particular combinations of metrics / dimensions can be pivoted together (Google Analytics)
          • Only particular types of analysis are possible on different types of dimension (e.g. sProps, eVars, conversion goals in SiteCatalyst)
          • As a result, data is siloed: hard to join with other data sets
  • 7. We built Snowplow to address those limitations and enable high value, bespoke analytics on web event data
      • (Diagram: Data pipeline, Big data store)
      • Snowplow is a data pipeline:
          • Captures data from the website via JavaScript tags
          • Validates, cleans, and enriches the incoming data (using Hadoop)
          • Loads the cleaned / enriched data into a big data store (e.g. S3), where it can be analysed using big data tools, e.g. Qubole
  • 8. Understanding the technology that powers the Snowplow data pipeline
      • The Snowplow data pipeline consists of five loosely coupled modules:
  • 9. Understanding the technology that powers the Snowplow data pipeline
      • The Snowplow data pipeline consists of five loosely coupled modules:
      • Trackers generate event data
          • JavaScript tracker for collecting data client-side
          • No-JS / pixel tracker (e.g. for email marketing)
          • Server-side trackers (e.g. Lua tracker). Python / Ruby / Java / Scala on roadmap
          • Mobile trackers (iOS, Android on the roadmap…)
          • Internet of things (e.g. Arduino tracker)
  • 10. Understanding the technology that powers the Snowplow data pipeline
      • The Snowplow data pipeline consists of five loosely coupled modules:
      • Collectors receive data and write it to a queue for processing
          • Cloudfront collector writes data to S3
          • Clojure collector sets a 3rd party cookie and writes to S3
          • Scala RT collector sets a 3rd party cookie and writes to S3 AND Kinesis
  • 11. Understanding the technology that powers the Snowplow data pipeline
      • The Snowplow data pipeline consists of five loosely coupled modules:
      • Enrichment validates and enriches the data
          • Validates, e.g. checks expected fields are set for each event type
          • Enrichments, e.g. categorising referrers (search / social), inferring location from IP
          • Hadoop-based enrichment module (easy reprocessing of data)
          • Kinesis-based enrichment module (real time processing) in development
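The validate-then-enrich step described on this slide can be illustrated with a small Python sketch. This is not Snowplow's actual enrichment code (which the slide says runs on Hadoop): the rule table `REQUIRED_FIELDS`, the lookup `REFERRER_CATEGORY`, and the field names `referrer_url` and `refr_category` are all hypothetical, chosen only to show the shape of the idea.

```python
from urllib.parse import urlparse

# Hypothetical rules: the fields each event type is expected to carry.
REQUIRED_FIELDS = {
    "page_view": ["page_urlpath"],
    "transaction": ["tr_total"],
}

# Hypothetical lookup from referrer host to a traffic category.
REFERRER_CATEGORY = {
    "www.google.com": "search",
    "www.bing.com": "search",
    "www.facebook.com": "social",
    "twitter.com": "social",
}


def validate(event):
    """Return a list of problems; an empty list means the event is valid."""
    event_type = event.get("event")
    if event_type not in REQUIRED_FIELDS:
        return [f"unknown event type: {event_type!r}"]
    return [
        f"missing field: {field}"
        for field in REQUIRED_FIELDS[event_type]
        if event.get(field) in (None, "")
    ]


def enrich(event):
    """Return a copy of the event with a referrer category added."""
    enriched = dict(event)
    host = urlparse(event.get("referrer_url") or "").netloc.lower()
    if not host:
        enriched["refr_category"] = "direct"
    else:
        enriched["refr_category"] = REFERRER_CATEGORY.get(host, "unknown")
    return enriched


def process(raw_events):
    """Split raw events into enriched good events and rejected bad events."""
    good, bad = [], []
    for event in raw_events:
        problems = validate(event)
        if problems:
            bad.append((event, problems))
        else:
            good.append(enrich(event))
    return good, bad
```

Keeping the rejected events alongside the reasons they failed, rather than dropping them, is what makes reprocessing possible once the rules are updated.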
  • 12. Understanding the technology that powers the Snowplow data pipeline
      • The Snowplow data pipeline consists of five loosely coupled modules:
      • Storage: make data available for analysis
          • Store data in Amazon S3 for processing using big data tools, e.g. Qubole
          • Also support storage in Amazon Redshift / PostgreSQL for analysis using traditional BI tools
  • 13. So what does Snowplow data look like?
      • A single table
      • One line of data per event
      • Fat table: 98 different fields (and counting)…
      • Types of field, with example field(s) and description:
          • User ID (domain_userid, network_userid): fields to identify the user doing the browsing. 1st and 3rd party cookie IDs, browser fingerprints, IP address and a separate field for setting a custom value are all available
          • Web page (page_urlpath): fields that describe the web page the event occurred on, including document size, URL, title
          • Traffic source (mkt_source, refr_source): fields that indicate the source of traffic. Snowplow includes fields that can be set via utm parameters and others based on the referrer
          • Event, rather than context (event, se_action, tr_total): fields that relate to a specific event (e.g. transaction total)
          • User tech setup (br_type, os_name, dvce_type, br_viewheight): fields that describe the user’s browser / OS / device setup
          • …
  • 14. How do you analyse Snowplow data with Qubole?
      • Common approach: use Hive on Qubole (could also use Pig or other Hadoop-based jobs)
      • Create the events table (incl. recovering partitions)
      • Write highly bespoke queries directly against the complete events table
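To make "bespoke queries directly against the complete events table" concrete, here is a minimal Python sketch that runs one such query locally. It is an illustration under assumptions, not the Hive setup from the webinar: Python's built-in `sqlite3` stands in for Hive on Qubole, the table holds only five of the 98 fields, the rows are made-up sample data, and the event values `page_view` and `transaction` are assumed. The real Hive table definition and the partition recovery step are not shown.

```python
import sqlite3

# In-memory SQLite database standing in for the Hive events table.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE events (
        domain_userid TEXT,
        event         TEXT,
        mkt_source    TEXT,
        page_urlpath  TEXT,
        tr_total      REAL
    )
    """
)

# Made-up sample data: one row per event.
rows = [
    ("u1", "page_view", "newsletter", "/home", None),
    ("u1", "transaction", "newsletter", "/checkout", 40.0),
    ("u2", "page_view", "newsletter", "/home", None),
    ("u3", "page_view", "adwords", "/home", None),
    ("u3", "page_view", "adwords", "/pricing", None),
    ("u3", "transaction", "adwords", "/checkout", 25.0),
]
conn.executemany("INSERT INTO events VALUES (?, ?, ?, ?, ?)", rows)

# Users, buyers and revenue per marketing source, computed from raw events.
QUERY = """
SELECT
    mkt_source,
    COUNT(DISTINCT domain_userid) AS users,
    COUNT(DISTINCT CASE WHEN event = 'transaction' THEN domain_userid END) AS buyers,
    SUM(CASE WHEN event = 'transaction' THEN tr_total ELSE 0 END) AS revenue
FROM events
GROUP BY mkt_source
ORDER BY mkt_source
"""

results = conn.execute(QUERY).fetchall()
for mkt_source, users, buyers, revenue in results:
    print(mkt_source, users, buyers, revenue)
```

Because the table keeps one row per event and nothing is pre-aggregated, any combination of fields can be grouped or filtered in the same way.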
  • 15. DEMO!
  • 16. Performing more sophisticated analysis
      • Unfortunately there isn’t time in this webinar to do a deeper demo…
      • …however, there are resources available, in particular the Snowplow Analytics Cookbook: http://snowplowanalytics.com/analytics/index.html