BigQuery is a powerful tool for marketers because it allows fast SQL queries across large datasets, seamless integration with Google products such as Analytics and machine learning tools, and low, pay-as-you-go pricing. The document covers two use cases: 1) integrating CRM and Analytics data to attribute marketing channels to sales outcomes, and 2) using BigQuery ML to cluster user groups, build predictive models, and activate remarketing lists. It concludes that BigQuery makes data warehousing, exploration, and basic machine learning surprisingly easy and affordable.
What we’ll cover
● Introduction to BigQuery
● How BigQuery works
● Use Case 1 - Integrate with CRM data to get meaningful insights
● Use Case 2 - Create a remarketing list using BQ ML
5. What is BigQuery?
▹ An enterprise data warehouse
▹ Super-fast SQL queries
▹ Leverages Google's infrastructure
▹ Cheap, pay-on-demand solution (easy to scale)
6. No hardware, easy to set up
▹ You can have a full data warehouse running within minutes, with virtually ZERO ongoing operational overhead
BigQuery sits somewhere in between
10. Why is it so powerful for marketers?
▹ Marketers usually have limited dev resources to maintain and configure a data warehouse
▹ Works seamlessly with Google products:
▸ data visualisation (Data Studio)
▸ machine learning (BQML*, Cloud AutoML)
▹ BIG PLUS: access to raw Google Analytics, AdWords and YouTube data* (if you have a $150k/year Google Analytics 360 subscription)
15. Functions & Operators
Supports both:
1. Legacy SQL
2. Standard SQL*
*Not every function is supported in both dialects; one prominent example is SELECT DISTINCT, which Legacy SQL lacks (instead, use GROUP BY or PARTITION BY to replace it)
https://cloud.google.com/bigquery/docs/reference/standard-sql/
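As a minimal sketch of the dialect difference just mentioned (the project, dataset, and table names are hypothetical), a deduplication that Standard SQL writes with SELECT DISTINCT has to be expressed with GROUP BY in Legacy SQL:

```sql
-- Standard SQL: DISTINCT is supported directly
SELECT DISTINCT channel
FROM `my_project.my_dataset.sessions`;

-- Legacy SQL equivalent: deduplicate with GROUP BY
-- (note the [project:dataset.table] reference style)
SELECT channel
FROM [my_project:my_dataset.sessions]
GROUP BY channel;
```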
16. GA Data in BigQuery
▹ A new table per day: "ga_sessions_YYYYMMDD"
▹ Each row represents a session
▹ Session data is nested and stored in one massive table - hit-level fields work more like arrays/subsets
Tips:
- Query by day/month/year using the table suffix
- UNNEST only the "subsets" of data you need
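The two tips above can be sketched in a single query against the daily GA export tables (the project and dataset names are hypothetical): `_TABLE_SUFFIX` restricts the scan to a date range, and `UNNEST` expands only the nested hits we actually need.

```sql
-- Query one month of GA export tables via the date suffix,
-- unnesting only the page hits
SELECT
  date,
  fullVisitorId,
  h.page.pagePath
FROM
  `my_project.my_ga_dataset.ga_sessions_*`,
  UNNEST(hits) AS h
WHERE
  _TABLE_SUFFIX BETWEEN '20180601' AND '20180630'
  AND h.type = 'PAGE';
```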
18. Integrate with CRM data to obtain meaningful insights
Marketing use case 1
19. Case Study
A recruitment company that relies heavily on offline activity to qualify the leads (applicants) that come in.
Problems:
▹ Misalignment in business metrics (P&L) - digital marketing and business metrics are not aligned
▹ Attribution - marketing channels and campaigns cannot be accurately attributed to each placement
20. Solution
▹ An end-to-end bespoke dashboard to show the right data:
▸ how marketing channels contribute to each placement (sale) and to candidate quality
▸ how offline and online activities work together
21. Prerequisites
▹ You (or your client) are on Google Analytics 360 premium, for access to raw Google Analytics data
▹ Basic Standard SQL
▹ A configured integration between Google Analytics and your CRM (Customer Relationship Management) system
22. Configure the GA & CRM integration
▹ User/client ID: https://support.google.com/analytics/answer/7584446?hl=en
▹ Use the data layer to send the CRM user ID back to GA to match it to particular sessions
23. Approach
1. Integrate GA & CRM
2. Import CRM data into BigQuery
3. Join GA session data & CRM data using a unique identifier
4. Attribute GA sessions
5. Connect with Data Studio
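The join step above might look roughly like this, assuming the CRM user ID was sent to GA as a session-level custom dimension (the table names, the dimension index, and the CRM column names are all hypothetical):

```sql
-- Join GA sessions to CRM records on the shared user ID
SELECT
  ga.fullVisitorId,
  ga.trafficSource.source,
  ga.trafficSource.medium,
  crm.lead_status,
  crm.placement_value
FROM
  `my_project.analytics.ga_sessions_*` AS ga,
  UNNEST(ga.customDimensions) AS cd
JOIN
  `my_project.crm.leads` AS crm
ON
  cd.value = crm.crm_user_id
WHERE
  cd.index = 1;  -- the custom-dimension slot holding the CRM user ID
```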
24. Create remarketing & lookalike audiences using BigQuery ML
Marketing use case 2
25. Case Study
Sales problem:
▹ Need a better way to prioritise leads: because of the offline, sales-oriented nature of the business, the lead volume per month is enormous (tens of thousands)
Marketing problems:
▹ Struggle to improve the ROI of remarketing efforts
▹ Lack of understanding of the traits of high-value converters
28. 1 - Collect & create datasets
▹ Started with supervised learning
▹ Created datasets that required a lot of data cleaning and labeling
▹ Tried to obtain as many data points as possible, incl. email, sales consultant performance, seasonality (e.g. school holidays)
29. What data is available in GA that we can use?
- Desktop_flag
- Tablet_flag
- Mobile_flag
- Total time on site
- Total count of sessions
- Sum of morning visits
- Sum of daytime visits
- Sum of time spent on product pages
- Sum of time spent on resource pages
- First-touch channel
- Last-touch channel
- Count of pages visited
- Days since the first session
- Sum of hit counts
30. 2 - Clean and categorise data
Verify validity and clean the data to make it ready to process. Look out for data quality problems like:
▹ Missing data
▹ Invalid or inconsistent data types
▹ Distribution bias or uniformity
Tips:
▹ Use the IF function to handle missing values
▹ Use CASE statements to convert and transform data into dummy variables
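A small sketch of those two tips against the GA export schema (the project and dataset names are hypothetical): IF fills missing values, and CASE turns the device category into the dummy-variable flags listed earlier.

```sql
-- Handle missing values with IF, create dummy variables with CASE
SELECT
  fullVisitorId,
  IF(totals.timeOnSite IS NULL, 0, totals.timeOnSite) AS time_on_site,
  CASE WHEN device.deviceCategory = 'desktop' THEN 1 ELSE 0 END AS desktop_flag,
  CASE WHEN device.deviceCategory = 'tablet'  THEN 1 ELSE 0 END AS tablet_flag,
  CASE WHEN device.deviceCategory = 'mobile'  THEN 1 ELSE 0 END AS mobile_flag
FROM `my_project.analytics.ga_sessions_*`;
```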
31. 3 - Cluster & categorise different groups of users
Based on our analysis and business context, we cluster the data into 4 main categories:
1. Not interested, non-converters
2. Interested, non-converters
3. Not yet interested, converters
4. Interested, converters
32. 4 - Create & train the model
Create a model for each group of users:
▹ BQ has an amazing integration with RStudio & Python (e.g. via Colab)
▹ Alternatively, we can use BigQuery ML (using standard SQL functions!):
▸ Linear regression
▸ Logistic regression (used in this case)
▸ Multiclass logistic regression for classification
https://cloud.google.com/bigquery/docs/bigqueryml-intro
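Training a BQ ML logistic regression is a single SQL statement. A minimal sketch, assuming a prepared feature table `my_dataset.training_features` with a boolean `converted` column (all names here are hypothetical, not the presenter's actual tables):

```sql
-- BQ ML requires the target column to be named "label"
CREATE OR REPLACE MODEL `my_dataset.lead_score_model`
OPTIONS(model_type = 'logistic_reg') AS
SELECT
  IF(converted, 1, 0) AS label,
  desktop_flag,
  total_time_on_site,
  session_count,
  pages_visited
FROM `my_dataset.training_features`;
```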
33. Advantages of using BQ ML
● Decreases complexity
● Increases speed
● Simple, easy to learn
● No need to export data out of BigQuery - exports may be prevented by legal restrictions (such as HIPAA guidelines)
34. 5 - Evaluate your model
▹ BQ ML lets you evaluate your model with the simple "ML.EVALUATE" function
▹ Depending on which model you have chosen, you will get different metrics
Example for logistic regression:
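The original slide showed a screenshot here; a query in that spirit looks roughly like this (model and table names are hypothetical). For a logistic regression, ML.EVALUATE returns precision, recall, accuracy, f1_score, log_loss, and roc_auc.

```sql
-- Evaluate the trained model on a held-out feature table
SELECT *
FROM ML.EVALUATE(MODEL `my_dataset.lead_score_model`,
  (SELECT
     IF(converted, 1, 0) AS label,
     desktop_flag, total_time_on_site, session_count, pages_visited
   FROM `my_dataset.eval_features`));
```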
35. 6 - Predict outcome
Example of using the ML.PREDICT function to predict an e-commerce purchase:
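The original slide showed a screenshot here; a sketch of the idea (hypothetical model and table names) is to score each visitor and keep the predicted probability of the positive class as the lead score:

```sql
-- Score visitors and extract the probability of conversion (label = 1)
SELECT
  fullVisitorId,
  predicted_label,
  p.prob AS lead_score
FROM ML.PREDICT(MODEL `my_dataset.lead_score_model`,
       (SELECT fullVisitorId, desktop_flag, total_time_on_site,
               session_count, pages_visited
        FROM `my_dataset.scoring_features`)) AS pred,
  UNNEST(pred.predicted_label_probs) AS p
WHERE p.label = 1
ORDER BY lead_score DESC;
```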
37. Marketing activation
1. Create & test lookalike audiences based on your findings, e.g. demographic features, location
2. Create a remarketing list based on the lead score
- Import the lead score & visitor ID into GA > create a custom metric for the lead score > create a remarketing audience
https://github.com/GoogleCloudPlatform/google-analytics-premium-bigquery-statistics/blob/master/README.md
38. Conclusion
1. BigQuery is a cheap, easy-to-set-up data warehousing/exploration/preparation/machine learning(!) tool
2. Data brings people together (sales & marketing) & goes a long way if used appropriately
3. Creating and using a machine learning model is easier than you think (with the right tools*)
39. LET’S BQ THIS
Any queries?
You can find me at
gabriella@inmarketingwetrust.com.au
linkedin.com/in/gabriella-wong
Speaker notes
Leverage BigQuery & Google Analytics. The purpose is to show how everyone can be involved in a data science project.
An enterprise data warehouse that solves big data problems by enabling super-fast SQL queries using the processing power of Google's infrastructure.
I'm not lying about how fast BQ runs.
In this slide we see the overall structure of BigQuery. Data tables are organized into units called datasets. Each dataset is scoped to your project, which lets you logically organize your data.
The fully Google-managed data warehouse lets you run ad-hoc SQL queries on massive volumes of data.
Though it is SQL, some syntax is different, but you get used to it after using it for a while.
GA data is not stored exactly the same way as what I showed earlier.
BQ recently started supporting real-time data as well.
The nature of the business is very sales-oriented.
Previously we could only track leads at a quantity level, not a quality level, hence we couldn't estimate how much profit/loss we were making per lead.
This has caused a lot of internal distrust towards marketing; marketing cannot really justify its spending.
Acquisition in recruitment works a little differently → a lot more channels, aka job boards.
30+% of placed applications come from consultants sourcing candidates within the CRM, so they need attribution to help them understand where these qualified applicants come from in the first place.
Perception changed quickly: after we got it up and running we immediately got a 200k budget to run campaigns.
A unique key in the CRM and Google Analytics matches a customer in your CRM with the web visitor browsing your website.
Google's recommended first approach (also the simplest!) is to simply add a hidden input field to the lead form as part of the HTML form element and send it back to GA.
Create a machine learning model in a supervised learning environment that scores leads as they come through.
(The good thing about this case is that all the work before the deployment of the model doesn't actually require the client to have Google Analytics 360, although exporting data from GA would be a pain and would probably blow the budget.
It will still require a connection between GA & the CRM.)
I won't go into details - it is mainly statistics work from here. I still haven't had the chance to try out BQ ML personally; I think it was only released recently (https://cloud.google.com/bigquery/docs/bigqueryml-intro), but the queries in the tutorial seem fairly straightforward. I attached a link to the BQ ML cookbook on the slide, so if you're interested you can go through it.
To be honest, there are a lot of different tools and methods out there to help you solve these problems and prep the data.
But since this is a use case on BQ, I will give a few tips to help process data in BigQuery.
We are able to cluster our data into 4 different groups of users, based on exploration & categorisation of the data and what makes sense in a business context:
A - have not shown any interest in the product, and will never convert
B - have shown some interest in the product (aka submitted a lead form), but will not convert
C - have not yet shown any interest in the product, but are going to convert in the future
D - have shown interest, and will convert
It's quite obvious by now which groups we were after.
I won't go into details - it is mainly statistics work from here.
I still haven't had the chance to try out BQ ML personally; I think it was only released recently, but the queries in the tutorial seem fairly straightforward, and we can use simple standard SQL to run all of these functions - I attached a link to the BQ ML cookbook on the slide, so if you're interested you can go through it.
Multiple tools are required if the data needs to be exported.
Moving and formatting large amounts of data for Python-based ML frameworks takes longer than model training in BigQuery.