One of the hottest topics in database land these days is BigQuery ML: a new way to apply machine learning to tabular data, straight on your tables, without leaving the query editor.
With BigQuery ML, you can build machine learning models and train them on massive datasets without leaving the database environment.
In this demo session, we demonstrate common marketing machine learning use cases and show how to build, train, evaluate, and predict with your own scalable machine learning models using SQL.
The audience will get first-hand experience writing CREATE MODEL SQL syntax to build machine learning models such as:
– Multiclass logistic regression for classification
– K-means clustering
– Matrix factorization
– ARIMA time series predictions
– Import TensorFlow models for prediction in BigQuery
Models are trained and accessed in BigQuery using SQL — a language data analysts know. This enables business decision making through predictive analytics across the organization without leaving the query editor.
BigdataConference Europe - BigQuery ML
1. Supercharge your data analytics with
BigQuery ML
November 2020
Márton Kodok / @martonkodok
Google Developer Expert at REEA.net
2. ● Among the top 3 Romanians on Stack Overflow (175k reputation)
● Google Developer Expert on Cloud technologies
● Crafting Web/Mobile backends at REEA.net
● BigQuery + Redis database engine expert
Slideshare: martonkodok
Twitter: @martonkodok
StackOverflow: pentium10
GitHub: pentium10
Supercharge your data analytics with BigQuery ML @martonkodok
About me
3. 1. E-commerce Workloads and data models
2. What is BigQuery? - Data warehouse in the Cloud
3. Introduction to BigQuery ML - execute ML models using SQL
4. Practical use cases
5. Predict, recommend and forecast with BigQuery ML
6. Conclusions
Agenda
4. Shop - products, tagging, features, attributes
Users profile, preferences, favorites, rating, engagement
Customers orders, re-orders, profile, associated products, survey, feedback, 360°
Analytics metrics, event data, page hits, email campaigns, A/B split tests
Upsells recommendations, price tags, strategy, discounts, vouchers
Enriched data sku, sentiment analysis, image parsing, object recognition
E-commerce Workloads and data models
6. “Where to store all this raw data?”
8. Analytics-as-a-Service - Data Warehouse in the Cloud
Familiar DB Structure (table, columns, views, struct, nested, JSON)
Decent pricing (storage: $20/TB, cold: $10/TB, queries: $5/TB) *Nov 2020
SQL 2011 + JavaScript UDF (User Defined Functions)
BigQuery ML enables users to create machine learning models using SQL queries
Scales into Petabytes on Managed Infrastructure
Integrates with Cloud SQL + Cloud Storage + Sheets + Pub/Sub connectors
What is BigQuery?
9. What is BigQuery’s Superpower?
10. 1. Load from file - either local or from GCS (max 5TB each)
2. Streaming rows - event driven approach - high throughput 1M rows/sec
3. Functions - observer-trigger based (Google Cloud Functions)
4. Join with Cloud SQL - Ability to join with MySQL, Postgres
5. Pipelines - flexibility to do ETL - FluentD, Kafka, Google Dataflow
6. Export from connected services - Firestore, Billing, AuditLogs, Stackdriver
7. Firebase - Analytics - Messaging - Crashlytics - Perf. Monitoring - Predictions
Loading Data into BigQuery
11. “ Capturing the data
12. Data Pipeline Integration at REEA.net
[Architecture diagram. Labeled components: Frontend, Standard Devices (HTTPS), Platform Services, Application Servers, Database (SQL), On-Premises Servers, Metrics / Logs / Streaming, Event Sourcing, Pipelines (FluentD), Cloud Storage archive (Load, Export, Replay), Analytics Backend (BigQuery), Report & Share, Development Team, Data Analysts, Business Analysis Tools (Tableau, QlikView, Data Studio, Internal Dashboard).]
13. “ We have our app outside of GCP.
We need to join with our SQL database.
Solution: EXTERNAL_QUERY
14. Combine on-premise with Cloud
[Architecture diagram. Labeled components: App, Load Balancing, NGINX, Compute Engine instances (10GB PD), Database Service (Master/Slave), Cloud VPN Gateway, Cloud SQL Replica in Zone 1 (us-east1-a), BigQuery (execute combined queries), Report.]
15. EXTERNAL_QUERY: run a query against a Cloud SQL database from within BigQuery
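A minimal sketch of such a federated query, assuming a Cloud SQL connection has already been registered in BigQuery. The connection ID `my-project.us.my_cloudsql_conn`, the tables `mydataset.events` and `customers`, and all column names are hypothetical and only serve as illustration.

```sql
-- Join BigQuery event data with rows fetched live from Cloud SQL.
-- The inner query string is executed by the Cloud SQL database itself
-- (in its own MySQL or PostgreSQL dialect), not by BigQuery.
SELECT
  e.customer_id,
  c.email,
  COUNT(*) AS page_hits
FROM
  `mydataset.events` AS e               -- hypothetical BigQuery table
JOIN
  EXTERNAL_QUERY(
    'my-project.us.my_cloudsql_conn',   -- hypothetical connection ID
    'SELECT id, email FROM customers'   -- hypothetical Cloud SQL table
  ) AS c
ON
  c.id = e.customer_id
GROUP BY
  e.customer_id, c.email;
```

Keeping the inner query narrow (only the columns and rows needed) limits how much data has to travel from Cloud SQL into BigQuery for the join.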
18. BigQuery ML
1. CREATE MODEL in SQL to increase development speed
2. Predict, recommend, forecast on tabular data with SQL
3. Automate common ML tasks and hyperparameter tuning by creating new models as easy as creating tables
19. ● Binary or Multiclass logistic regression for classification (labels can have up to 50 unique values)
● K-means clustering for data segmentation (unsupervised learning - does not require labeled training data)
● Recommend with Matrix factorization
● Import TensorFlow models for prediction in BigQuery
● Time series forecasting with ARIMA - the sales of an item on a given day
● Boosted Tree (XGBoost) models | Deep Neural Network (DNN) models | AutoML Tables
● and others...
Supported models in BigQuery ML
20. Conversion/Purchase prediction. MODEL: Logistic-Regression
Predict whether a user “converts” or “purchases”. It is in the company's interest that many users sign up for this membership, as it helps streamline their ad conversion and also helps with recurring revenue.
Customer Lifetime Value (LTV) prediction. MODEL: Logistic-Regression
Used by organisations to identify and prioritize significant customer segments that would be most valuable to the company.
Customer Segmentation. MODEL: K-means clustering
Dividing a client base into groups in specific ways relevant to marketing, such as interests and spending habits. Segmentation allows marketers to better customize their efforts to various audience groups.
E-commerce Use Cases
21. Create a MODEL that predicts whether a website visitor will make a transaction.
● CREATE MODEL statement
● The ML.EVALUATE function to evaluate the ML model
● The ML.PREDICT function to make predictions using the ML model
Getting started with BigQuery ML
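The three steps on this slide can be sketched as follows. This is an illustration under assumptions, not the exact demo code: it assumes the public Google Analytics sample dataset (`bigquery-public-data.google_analytics_sample`) as input and a dataset named `bqml_tutorial` that you own; the model name `sample_model` and the chosen feature columns are illustrative choices.

```sql
-- Step 1: train a logistic regression model.
-- The column aliased as "label" is the value to predict (1 = transaction).
CREATE OR REPLACE MODEL `bqml_tutorial.sample_model`
OPTIONS (model_type = 'logistic_reg') AS
SELECT
  IF(totals.transactions IS NULL, 0, 1) AS label,
  IFNULL(device.operatingSystem, '') AS os,
  device.isMobile AS is_mobile,
  IFNULL(geoNetwork.country, '') AS country,
  IFNULL(totals.pageviews, 0) AS pageviews
FROM
  `bigquery-public-data.google_analytics_sample.ga_sessions_*`
WHERE
  _TABLE_SUFFIX BETWEEN '20160801' AND '20170630';

-- Step 2: evaluate the model on sessions it has not seen during training.
SELECT *
FROM ML.EVALUATE(MODEL `bqml_tutorial.sample_model`, (
  SELECT
    IF(totals.transactions IS NULL, 0, 1) AS label,
    IFNULL(device.operatingSystem, '') AS os,
    device.isMobile AS is_mobile,
    IFNULL(geoNetwork.country, '') AS country,
    IFNULL(totals.pageviews, 0) AS pageviews
  FROM
    `bigquery-public-data.google_analytics_sample.ga_sessions_*`
  WHERE
    _TABLE_SUFFIX BETWEEN '20170701' AND '20170801'));

-- Step 3: predict purchases and total them per country.
SELECT
  country,
  SUM(predicted_label) AS total_predicted_purchases
FROM ML.PREDICT(MODEL `bqml_tutorial.sample_model`, (
  SELECT
    IFNULL(device.operatingSystem, '') AS os,
    device.isMobile AS is_mobile,
    IFNULL(geoNetwork.country, '') AS country,
    IFNULL(totals.pageviews, 0) AS pageviews
  FROM
    `bigquery-public-data.google_analytics_sample.ga_sessions_*`
  WHERE
    _TABLE_SUFFIX BETWEEN '20170701' AND '20170801'))
GROUP BY country
ORDER BY total_predicted_purchases DESC
LIMIT 10;
```

Training and evaluation use different date ranges so that the evaluation metrics reflect unseen data.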
25. Use cases:
● Customer segmentation
● Data quality
Options and defaults
● Number of clusters: default is log10(num_rows)
● Distance type - Euclidean(default), Cosine
● Supports all major SQL data types including GIS
K-means clustering
CREATE MODEL yourmodel
OPTIONS (model_type = 'kmeans')
AS SELECT..
FROM
ML.PREDICT maps rows to closest clusters
ML.CENTROIDS for cluster centroids
ML.EVALUATE
ML.TRAINING_INFO
ML.FEATURE_INFO
26. Available data:
● Encode yes/no features
(eg: has a microwave, has a kitchen, has a TV, has a bathroom)
● Can apply clustering on the encoded data
K-means clustering: Problem definition
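A minimal sketch of clustering on encoded yes/no features. The table `mydataset.listings`, its BOOL amenity columns, the `listing_id` column, and the model name `mydataset.listing_clusters` are all hypothetical; `num_clusters = 4` is an arbitrary illustrative choice.

```sql
-- Train a k-means model on yes/no amenities encoded as 0/1.
CREATE OR REPLACE MODEL `mydataset.listing_clusters`
OPTIONS (
  model_type = 'kmeans',
  num_clusters = 4,              -- arbitrary; omit to use the default
  distance_type = 'EUCLIDEAN'    -- or 'COSINE'
) AS
SELECT
  CAST(has_microwave AS INT64) AS has_microwave,
  CAST(has_kitchen   AS INT64) AS has_kitchen,
  CAST(has_tv        AS INT64) AS has_tv,
  CAST(has_bathroom  AS INT64) AS has_bathroom
FROM
  `mydataset.listings`;

-- Map every listing to its closest cluster.
-- listing_id is not a model feature; it is passed through to the output.
SELECT
  listing_id,
  centroid_id
FROM ML.PREDICT(MODEL `mydataset.listing_clusters`, (
  SELECT
    listing_id,
    CAST(has_microwave AS INT64) AS has_microwave,
    CAST(has_kitchen   AS INT64) AS has_kitchen,
    CAST(has_tv        AS INT64) AS has_tv,
    CAST(has_bathroom  AS INT64) AS has_bathroom
  FROM
    `mydataset.listings`));

-- Inspect the feature values of each cluster centroid.
SELECT *
FROM ML.CENTROIDS(MODEL `mydataset.listing_clusters`);
```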
27. Premise
We can identify oddities
(potential data quality issues)
by grouping things together
and separating outliers.
K-means clustering: Problem definition
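One way to surface such oddities, sketched under assumptions: rows that lie far from every centroid are candidates for inspection. The query assumes a k-means model such as the hypothetical `mydataset.listing_clusters` has already been trained on the hypothetical table `mydataset.listings`, and it relies on the `nearest_centroids_distance` column that ML.PREDICT returns for k-means models. The cutoff of 20 rows is arbitrary.

```sql
-- Rank rows by distance to their closest centroid; the farthest rows
-- are potential outliers or data quality issues.
WITH assigned AS (
  SELECT
    listing_id,
    centroid_id,
    -- Take the smallest distance over all centroids for this row.
    (SELECT MIN(d.distance)
     FROM UNNEST(nearest_centroids_distance) AS d) AS distance_to_centroid
  FROM ML.PREDICT(MODEL `mydataset.listing_clusters`, (
    SELECT
      listing_id,
      CAST(has_microwave AS INT64) AS has_microwave,
      CAST(has_kitchen   AS INT64) AS has_kitchen,
      CAST(has_tv        AS INT64) AS has_tv,
      CAST(has_bathroom  AS INT64) AS has_bathroom
    FROM
      `mydataset.listings`))
)
SELECT
  listing_id,
  centroid_id,
  distance_to_centroid
FROM assigned
ORDER BY distance_to_centroid DESC
LIMIT 20;   -- arbitrary number of rows to review
```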
28. Use cases:
● Product recommendation
● Marketing campaign target optimization tool
Options and defaults
● Input: User, Item, Rating
● Can use L2 regularization
● Specify training-test split (default random 80-20)
Matrix Factorization
CREATE MODEL yourmodel
OPTIONS (model_type = 'matrix_factorization')
AS SELECT..
FROM
ML.RECOMMEND for full user-item matrix
ML.EVALUATE
ML.WEIGHTS
ML.TRAINING_INFO
ML.FEATURE_INFO
29. Available data:
● User
● Item
● Rating
Problem
● Predicting ratings for user-item pairs whose value is not yet known
(zeros in our case)
Matrix Factorization: Problem definition
30. BigQuery ML - Matrix Factorization
Step 1
Create a model from a dataset.
CREATE MODEL wr_temp.purchases_mf_model
OPTIONS (model_type = 'matrix_factorization')
AS
SELECT user, item, rating FROM `wr_temp.purchases`;
Step 2
To view the rating associated with a given user-item pair, use ML.RECOMMEND with the model name. The output will return a rating for each user-item pair.
SELECT * FROM
ML.RECOMMEND(MODEL wr_temp.purchases_mf_model);
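The full user-item matrix is rarely what an application needs. As a hedged sketch, the query below keeps only the five highest predicted ratings per user. It assumes the model was trained with explicit feedback (the default), in which case ML.RECOMMEND exposes a `predicted_rating` column; the limit of five and the alias `rank_in_user` are illustrative choices.

```sql
-- Top 5 recommended items per user, ranked by predicted rating.
SELECT
  user,
  item,
  predicted_rating
FROM (
  SELECT
    user,
    item,
    predicted_rating,
    ROW_NUMBER() OVER (
      PARTITION BY user
      ORDER BY predicted_rating DESC
    ) AS rank_in_user
  FROM
    ML.RECOMMEND(MODEL wr_temp.purchases_mf_model)
)
WHERE rank_in_user <= 5
ORDER BY user, rank_in_user;
```

Note that this ranking includes items the user has already rated; joining against `wr_temp.purchases` to exclude them would be a natural next step.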
31. Use cases:
● All sorts of time series forecasting
● Marketing campaign target optimization tool
Options and defaults
● Holiday effects adjustments by Region
● Seasonal and trend decomposition
● Auto data frequency detection
Time Series forecasting with ARIMA model
CREATE MODEL yourmodel
OPTIONS (model_type = 'ARIMA')
AS SELECT..
ML.FORECAST to be used with horizon
ML.EVALUATE
ML.ARIMA_COEFFICIENTS
32. Available data:
● Past Timestamp
● Past Value
Problem
● Forecasts for next X slots (called horizon)
Time Series forecasting with ARIMA model
SELECT forecast_timestamp, forecast_value FROM
ML.FORECAST(MODEL bqml_tutorial.nyc_citibike_arima_model,
STRUCT(300 AS horizon, 0.8 AS confidence_level))
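The forecast query above needs a trained model. A minimal sketch of how `bqml_tutorial.nyc_citibike_arima_model` could be created, assuming the public table `bigquery-public-data.new_york.citibike_trips` as the source of past timestamps and values. The column aliases `date` and `num_trips` and the `holiday_region` setting are illustrative assumptions, not taken from the slides.

```sql
-- Train an ARIMA model on daily trip counts.
-- One row per day: the past timestamp and the past value.
CREATE OR REPLACE MODEL bqml_tutorial.nyc_citibike_arima_model
OPTIONS (
  model_type = 'ARIMA',
  time_series_timestamp_col = 'date',
  time_series_data_col = 'num_trips',
  holiday_region = 'US'          -- assumed region for holiday adjustments
) AS
SELECT
  EXTRACT(DATE FROM starttime) AS date,
  COUNT(*) AS num_trips
FROM
  `bigquery-public-data.new_york.citibike_trips`
GROUP BY date;

-- Inspect the fitted model candidates and their coefficients.
SELECT * FROM ML.EVALUATE(MODEL bqml_tutorial.nyc_citibike_arima_model);
SELECT * FROM ML.ARIMA_COEFFICIENTS(MODEL bqml_tutorial.nyc_citibike_arima_model);
```

With daily input data, `300 AS horizon` in ML.FORECAST asks for the next 300 days, and `0.8 AS confidence_level` controls the width of the returned prediction interval.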
33. Use cases:
● Easily add TensorFlow predictions to BigQuery
● Build unstructured data models in TensorFlow,
predict in BigQuery
Key restrictions
● Model size limit of 250MB
Import TensorFlow models for prediction
CREATE MODEL yourmodel
OPTIONS (model_type = 'tensorflow',
model_path = 'gs://')
ML.PREDICT()
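A hedged sketch of the import flow, assuming a TensorFlow SavedModel has been exported to Cloud Storage. The bucket path, the model name `mydataset.imported_tf_model`, the table `mydataset.reviews`, and the input name `input` are all hypothetical; in practice the column aliases passed to ML.PREDICT must match the input names of the SavedModel's serving signature.

```sql
-- Import a SavedModel from Cloud Storage; no training query is needed.
CREATE OR REPLACE MODEL `mydataset.imported_tf_model`
OPTIONS (
  model_type = 'tensorflow',
  model_path = 'gs://my-hypothetical-bucket/saved_model/*'  -- hypothetical path
);

-- Run predictions inside BigQuery with the imported model.
SELECT *
FROM ML.PREDICT(
  MODEL `mydataset.imported_tf_model`,
  (
    SELECT review_text AS input    -- alias must match the model's input name
    FROM `mydataset.reviews`       -- hypothetical table
  )
);
```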
DEMO
Search 'QueryIt Smart' on GitHub to learn more.
34. Google Drive - Colaboratory - Jupyter Notebook
35. New on BigQuery UI - Evaluation charts
37. Automation
● Run the process daily
● Determine hyperparameters
● Surface the results and route them somewhere for inspection and improvement
Testing
● A/B test the impact of data quality on conversion and customer NPS (net promoter score)
Improvements
● Determine and explore outliers
● Repeat, automate
Considerations
38. ● Democratizes the use of ML by empowering data analysts to build and run models using existing
business intelligence tools and spreadsheets
● Generalist team. Models are trained using SQL. There is no need to program an ML solution using
Python or Java.
● Increases the innovation and speed of model development by removing the need to export data from
the data warehouse.
● A Model serves a purpose. Easy to change/recycle.
Benefits of BigQuery ML
39. The possibilities are endless
Marketing
● Predict customer value
● Predict funnel conversion
● Personalize ads, email, webpage content
Retail
● Optimize inventory
● Forecast revenue
● Enable product recommendations
● Optimize staff promotions
Industrial and IoT
● Forecast demand for parking, traffic utilities, personnel
● Prevent equipment downtime
● Predict maintenance needs
Media/gaming
● Personalize content
● Predict game difficulty
● Predict player lifetime value
40. Thank you.
Slides available on:
slideshare.net/martonkodok
Reea.net - Integrated web solutions driven by creativity
to deliver projects.