Getting started with BigQuery

Pradeep Bhadani
Founder, Cloud Native Technologies
cntek.io
pbhadani.com
linkedin.com/in/pradeepbhadani
linkedin.com/company/cloudnativetech
22nd August 2020, Google Next OnAir Extended

About Me
IT Consultant with 9 years of experience in Big Data, Cloud & DevOps
GDE (Google Developers Expert) - Cloud
Google Cloud Authorized Trainer
HashiCorp Ambassador
Blog: pbhadani.com
Cloud Native Technologies
cntek.io

Services
● Big Data Consultancy
● Cloud & DevOps Consultancy
● Tailored Training and Workshops

Agenda
● Overview
○ What is a Data Warehouse?
○ Choosing a Data Warehouse Option?
● Introduction to BigQuery
○ What is BigQuery?
○ Why BigQuery?
○ Concepts
● Best Practices
● Interacting with BigQuery
● Demo

Data Warehouse

What is a Data Warehouse?
A data warehouse is a critical component of a Business Intelligence
solution that enables an organization to make better decisions.
A data warehouse offers:
● Scheduled & ad-hoc reporting
● Ad-hoc analysis
● Integration with visualization tools

Data Warehouse options?
Image sources: commons.wikimedia.org, iconfinder.com

Choosing a Data Warehouse?

BigQuery

What is BigQuery?
BigQuery is a fully managed, enterprise-grade data warehouse
offered on Google Cloud Platform.
cloud.google.com/bigquery

Why BigQuery?
● Serverless
● Fast SQL
● Security
● Scalable
● Data Encryption
● Managed Storage
● Flexible Pricing
● Advanced Features

Advanced Features
● BigQuery ML
● BigQuery GIS
● BigQuery Omni (private alpha)
● Data QnA (private alpha)

Architecture

Column-based Storage
[Diagram: row-based storage vs. column-based storage]
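
To make the row-based vs. column-based contrast concrete, here is a toy sketch in Python. It is only an illustration: the sample rows, the column names, and the "values read" counters are made up for this example and say nothing about how BigQuery actually lays out data on disk.

```python
# Toy illustration only; sample data and helper names are hypothetical.
rows = [
    {"user_id": 1, "country": "UK", "amount": 10.0},
    {"user_id": 2, "country": "IN", "amount": 25.5},
    {"user_id": 3, "country": "UK", "amount": 7.25},
]


def to_columns(rows):
    """Pivot a list of row dicts into a dict of column lists."""
    columns = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns


def values_read_row_store(rows):
    """A row store reads every field of every row, even the unused ones."""
    return sum(len(row) for row in rows)


def values_read_column_store(columns, wanted):
    """A column store reads only the columns the query asks for."""
    return sum(len(columns[name]) for name in wanted)


columns = to_columns(rows)

# Equivalent of: SELECT SUM(amount) FROM t
total_amount = sum(columns["amount"])
print(total_amount)                                   # 42.75
print(values_read_row_store(rows))                    # 9
print(values_read_column_store(columns, ["amount"]))  # 3
```

The aggregate touches 3 values in the column layout against 9 in the row layout, which is the intuition behind the later advice to select only the columns you need.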

Decoupled Storage & Compute
[Diagram: Storage and Compute connected by a Petabit Network]

Resources
● An Inside Look at Google BigQuery
https://cloud.google.com/files/BigQueryTechnicalWP.pdf
● Dremel
static.googleusercontent.com/media/research.google.com/en//pubs/archive/36632.pdf

Concepts

GCP Project
A GCP project is a top-level logical container that organizes Google Cloud
Platform resources such as Storage and BigQuery.

BigQuery Datasets
A dataset is a logical container that organizes BigQuery tables.
[Diagram: a GCP Project containing Dataset A and Dataset B]

BigQuery Tables
A BigQuery table contains the data and the schema that describes the data.
A table is referenced by its fully qualified name:
<project_id>.<dataset_id>.<table>
[Diagram: a GCP Project containing Dataset A and Dataset B, each holding Table 1 and Table 2]
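
The project / dataset / table hierarchy can be sketched as a small Python helper that splits and rebuilds a fully qualified table name. The helper names (TableRef, parse_table_id, format_table_id) and the sample ID are hypothetical, and the sketch assumes none of the three parts contains a dot.

```python
from typing import NamedTuple


class TableRef(NamedTuple):
    """The three levels of the hierarchy: project, dataset, table."""
    project_id: str
    dataset_id: str
    table: str


def parse_table_id(full_id: str) -> TableRef:
    """Split '<project_id>.<dataset_id>.<table>' into its parts.

    Assumption: no part contains a dot itself.
    """
    parts = full_id.split(".")
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected <project_id>.<dataset_id>.<table>, got {full_id!r}")
    return TableRef(*parts)


def format_table_id(ref: TableRef) -> str:
    """Join the parts back into a fully qualified table name."""
    return ".".join(ref)


ref = parse_table_id("my-project.sales.orders")  # hypothetical table
print(ref.dataset_id)        # sales
print(format_table_id(ref))  # my-project.sales.orders
```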

BigQuery Table Types
● Native Tables
● External Tables
● Views
[Diagram: GCP Project > BQ Dataset > BQ Tables]

Slots
A BigQuery slot is a combination of CPU, memory, and network resources.
BigQuery automatically calculates the number of slots required to execute a
query based on the query's size and complexity.

Service Limits
● Interactive queries: 100 concurrent queries
● Query execution time limit: 6 hours
● Load jobs per table per day: 1,500 (including failures)
● Maximum columns per table: 10,000
● Copy jobs per destination table per day: 1,000 (including failures)
● Number of datasets per project: No limit
● Number of tables per dataset: No limit
● Maximum number of table operations per day: 1,500
● Maximum number of partitions per partitioned table: 4,000
Please refer to cloud.google.com/bigquery/quotas for the latest service limits.

Pricing
● On-Demand
○ $5 per TB
○ First 1 TB per month is free
● Flat Rate
○ Monthly: $2,000 per 100 slots
○ Annual: $1,700 per 100 slots
Please refer to cloud.google.com/bigquery/pricing for the latest pricing.
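
As a rough worked example of the figures above, the following Python sketch estimates a monthly bill under each model. The rates are the ones quoted on this slide and may be out of date. The function names are hypothetical, and the sketch assumes that both flat-rate figures are charges per month and that slots are bought in blocks of 100.

```python
# Rates as quoted on the slide (August 2020); check the pricing page for current values.
ON_DEMAND_USD_PER_TB = 5.0
FREE_TB_PER_MONTH = 1.0
FLAT_MONTHLY_USD_PER_100_SLOTS = 2000.0
FLAT_ANNUAL_USD_PER_100_SLOTS = 1700.0  # assumed to be a per-month charge


def on_demand_cost(tb_scanned: float) -> float:
    """Monthly on-demand cost in USD after the free first TB."""
    billable_tb = max(0.0, tb_scanned - FREE_TB_PER_MONTH)
    return billable_tb * ON_DEMAND_USD_PER_TB


def flat_rate_cost(slots: int, annual: bool = False) -> float:
    """Monthly flat-rate cost in USD; assumes slots come in blocks of 100."""
    if slots <= 0 or slots % 100 != 0:
        raise ValueError("slots must be a positive multiple of 100")
    rate = FLAT_ANNUAL_USD_PER_100_SLOTS if annual else FLAT_MONTHLY_USD_PER_100_SLOTS
    return (slots // 100) * rate


print(on_demand_cost(0.5))               # 0.0 (inside the free tier)
print(on_demand_cost(51))                # 250.0
print(flat_rate_cost(500))               # 10000.0
print(flat_rate_cost(500, annual=True))  # 8500.0
```

Under these assumptions, on-demand is far cheaper for light usage, and flat rate only starts to pay off once the monthly scan volume reaches hundreds of terabytes.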

Interacting with BigQuery

Ways to interact with BigQuery
● Web UI - Cloud Console, Classic UI
● Command Line - bq
● Client Libraries - Go, Python, Java, etc.
● Third-party tools

Web UI

Command Line tool

Client Libraries
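
A minimal sketch of the client-library route, using the Python client (the google-cloud-bigquery package). It assumes the package is installed and that application default credentials and a default project are configured. The helper names build_query and run_query are hypothetical. The table is one of Google's public datasets, chosen only for illustration.

```python
def build_query(table_id, columns, limit=10):
    """Build a simple SELECT that names its columns explicitly.

    Illustration only: identifiers are not validated or escaped.
    """
    column_list = ", ".join(columns)
    return f"SELECT {column_list} FROM `{table_id}` LIMIT {limit}"


def run_query(sql):
    """Run a query and return the rows as a list of dicts.

    Assumes google-cloud-bigquery is installed and credentials are set up.
    """
    from google.cloud import bigquery  # imported lazily so the sketch loads without it

    client = bigquery.Client()
    rows = client.query(sql).result()  # waits for the query job to finish
    return [dict(row.items()) for row in rows]


sql = build_query(
    "bigquery-public-data.usa_names.usa_1910_2013",
    ["name", "number"],
)
print(sql)
# rows = run_query(sql)  # uncomment once credentials are configured
```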

Best Practices

Query Performance
● Avoid "SELECT *"
● Use partitions
● Denormalize data
● Use wildcards on tables appropriately
● Use external data sources appropriately
● Reduce the amount of data before a JOIN
● Avoid repetitive data transformation using SQL queries
● Use nested and repeated fields

Cost Optimization
● Use table expiration
● Avoid data duplication
● Avoid full table scans
● Only scan required columns
● Use the caching feature
● Use partitions
● Use clustering

Demo
Photo by Markus Spiske on Unsplash
Photo by Alex Litvin on Unsplash

Image by TeroVesalainen from Pixabay
pbhadani.com
pradeepbhadani
pradeepbhadani
bhadanipradeep
bit.ly/cntek-youtube

cntek.io
CloudNativeTech
CloudNativeTech
cntekio
bit.ly/cntek-youtube
