Google BigQuery


Introduction to Google BigQuery. Slides used at the first GDG Cloud meetup in Brussels, about big data on Google Cloud Platform. (http://www.meetup.com/GDG-Cloud-Belgium/events/228206131)


About myself
Matthias Feys
work @Datatonic:
- big data (with Google Cloud)
- machine learning
- data visualizations (Tableau/Spotfire)
Google Qualified Cloud Developer
contact:
- @FsMatt
- matthias@datatonic.com

About Datatonic
Datatonic is a team of data science experts that help corporations unleash
the power of data. They use Google Cloud Platform, data visualisation
technologies (like Tableau or Spotfire) and machine learning to build
breakthrough solutions, working either as expert advisors or by delivering
a fully managed solution with end-to-end support (A3S).
Some references:
@teamdatatonic

This talk
● What is BigQuery?
● How does it scale/work?
● How does it compare to:
- NoSQL datastores
- MapReduce
● Demo
● Pricing Model
● Best Practices

What is BigQuery?
“BigQuery is a fully-managed and cloud-based interactive query service for massive datasets.”
It’s the externalization of Dremel, one of Google’s core technologies.

What is BigQuery? (2)
BigQuery Service is available via:
● Web UI (bigquery.cloud.google.com)
● command line (bq tool, installed with the Google Cloud SDK)
● API (+ client libraries)
● external tools (Tableau, Excel, …)
● ODBC connector

Dremel Architecture
Data Model/Storage:
- Columnar Storage
- Nested/Repeated Fields
- No Index!
-> Single Full Table Scan (from disk)
Query Execution:
- Tree Architecture
- Uses tens of thousands of machines connected by Google's fast network (over 1 Petabit/s)

Columnar Storage
● Traffic minimization:
○ only read selected columns
● Higher compression ratio:
○ similar values in the same column
○ from 1:3 → 1:10
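A toy sketch of the two points above, written in Python. It is not how BigQuery is implemented: the table, its column names and the run-length encoding are assumptions chosen only to show why a columnar layout reads less data and compresses better.

```python
# Toy comparison of row-oriented and column-oriented layouts.
# The table, its column names and its values are made up for illustration.
rows = [
    {"user": "alice", "country": "BE", "amount": 12},
    {"user": "bob", "country": "BE", "amount": 7},
    {"user": "carol", "country": "NL", "amount": 30},
]


def to_columns(records):
    """Pivot a list of row dicts into one list of values per column."""
    columns = {}
    for record in records:
        for name, value in record.items():
            columns.setdefault(name, []).append(value)
    return columns


def values_read_row_store(records, selected):
    """A row store touches every field of every row, whatever is selected."""
    return sum(len(record) for record in records)


def values_read_column_store(columns, selected):
    """A column store touches only the columns named in the query."""
    return sum(len(columns[name]) for name in selected)


def run_length_encode(values):
    """Illustrative compression: collapse runs of equal neighbouring values."""
    encoded = []
    for value in values:
        if encoded and encoded[-1][0] == value:
            encoded[-1][1] += 1
        else:
            encoded.append([value, 1])
    return encoded


columns = to_columns(rows)
print(values_read_row_store(rows, ["amount"]))        # 9
print(values_read_column_store(columns, ["amount"]))  # 3
print(run_length_encode(columns["country"]))          # [['BE', 2], ['NL', 1]]
```

Selecting one column out of three touches a third of the values, and similar values sitting next to each other in a column are what make the compression step effective.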

Tree Architecture
- root server:
-> receives the query + reads table metadata
-> rewrites the query
-> sends the rewritten queries to the next level
<- returns the final query results
- intermediate servers:
-> (similar steps)
<- parallel partial aggregation
- leaf servers:
-> actually scan (parts of) the table
<- send data to intermediate servers
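The flow above can be sketched in a few lines of Python for a query of the form SELECT country, SUM(amount) ... GROUP BY country. This is a simplified illustration, not Dremel itself: the function names, the fan-out of 2 and the sample partitions are all assumptions.

```python
from collections import Counter


def leaf_scan(partition):
    """Leaf server: scan one slice of the table and aggregate it locally."""
    partial = Counter()
    for country, amount in partition:
        partial[country] += amount
    return partial


def merge(partials):
    """Intermediate or root server: combine partial aggregates from children."""
    total = Counter()
    for partial in partials:
        total.update(partial)  # Counter.update adds counts together
    return total


def run_query(partitions, fan_out=2):
    """Fan the scan out to the leaves, then merge results level by level."""
    if fan_out < 2:
        raise ValueError("fan_out must be at least 2")
    level = [leaf_scan(partition) for partition in partitions]
    while len(level) > 1:
        level = [merge(level[i:i + fan_out]) for i in range(0, len(level), fan_out)]
    return dict(level[0]) if level else {}


# Hypothetical table split into three partitions of (country, amount) rows.
partitions = [
    [("BE", 12), ("NL", 30)],
    [("BE", 7)],
    [("NL", 1), ("FR", 4)],
]
print(run_query(partitions))  # {'BE': 19, 'NL': 31, 'FR': 4}
```

Each level only ever handles partial aggregates, so the amount of data shrinks on its way up to the root.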

NoSQL Datastore vs. BigQuery?
NoSQL Datastore
● Index based (expected queries)
● Read-write
BigQuery
● Non-index based (ad hoc queries)
● Read-only (append-only)
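A small Python sketch of the difference, using an in-memory dictionary as a stand-in for an index and a plain list scan as a stand-in for BigQuery's full scan. The records and function names are invented for illustration.

```python
# Hypothetical records: odd ids belong to "BE", even ids to "NL".
records = [
    {"id": i, "country": "BE" if i % 2 else "NL", "amount": i}
    for i in range(6)
]

# Index-based access: fast, but only for the query the index was built for.
index_by_id = {record["id"]: record for record in records}


def lookup(record_id):
    """Expected query: fetch one record by its indexed key."""
    return index_by_id[record_id]


def ad_hoc_sum(predicate):
    """Ad hoc query: no index, so every record is scanned and filtered."""
    return sum(record["amount"] for record in records if predicate(record))


print(lookup(3)["country"])                           # BE
print(ad_hoc_sum(lambda r: r["country"] == "BE"))     # 9
```

The scan costs more per query but answers any question, including ones nobody planned an index for.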

MapReduce vs. BigQuery?
MapReduce
● High latency
● Flexible (complex) batch processing
● Unstructured data
BigQuery
● Low latency
● SQL-like queries
● Structured data

Pricing Model
Category      Price                     Note
Storage cost  $0.020 per GB per month
Query cost    $5 per TB                 1st TB per month is free
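As a worked example, the prices in the table can be turned into a rough monthly estimate. The Python sketch below uses only the figures from this slide, which may no longer match current pricing; the function name and the sample volumes are assumptions.

```python
# Prices as listed on the slide; check current pricing before relying on them.
STORAGE_USD_PER_GB_MONTH = 0.020
QUERY_USD_PER_TB = 5.0
FREE_QUERY_TB_PER_MONTH = 1.0


def monthly_cost(stored_gb, queried_tb):
    """Estimate one month's bill from stored volume and scanned volume."""
    storage = stored_gb * STORAGE_USD_PER_GB_MONTH
    billable_tb = max(0.0, queried_tb - FREE_QUERY_TB_PER_MONTH)
    query = billable_tb * QUERY_USD_PER_TB
    return round(storage + query, 2)


# 500 GB stored and 3 TB scanned: $10 storage + $10 for the 2 billable TB.
print(monthly_cost(500, 3))  # 20.0
```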

Denormalize / Pre-Join Where Possible
● Best performance
● Only pay for the columns you need
● Nested/repeated fields!
[Figure: three schema designs compared: Relational Database Design, Denormalized, Nested/Repeated (JSON)]
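To make the nested/repeated idea concrete, here is a minimal Python sketch that pre-joins two normalized tables into one record per order with a repeated "items" field. The orders and items tables and all field names are hypothetical.

```python
import json

# Normalized design: two tables linked by order_id, joined at query time.
orders = [
    {"order_id": 1, "customer": "alice"},
    {"order_id": 2, "customer": "bob"},
]
items = [
    {"order_id": 1, "sku": "A", "qty": 2},
    {"order_id": 1, "sku": "B", "qty": 1},
    {"order_id": 2, "sku": "A", "qty": 5},
]


def nest(orders, items):
    """Pre-join once: embed each order's items as a repeated field."""
    by_order = {}
    for item in items:
        by_order.setdefault(item["order_id"], []).append(
            {"sku": item["sku"], "qty": item["qty"]}
        )
    return [dict(order, items=by_order.get(order["order_id"], [])) for order in orders]


def total_qty(record):
    """Answer a per-order question from the nested record alone, with no join."""
    return sum(item["qty"] for item in record["items"])


nested = nest(orders, items)
print(json.dumps(nested[0]))
print(total_qty(nested[0]))  # 3
```

The join is paid once at load time instead of on every query.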

Table Sharding
● You pay for what you read
→ Read less, pay less
● Table wildcards allow for easy reading over multiple tables
https://cloud.google.com/bigquery/query-reference#tablewildcardfunctions
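A short Python sketch of why sharding reduces what is read: with one table per day, a query over a date range only has to touch the tables for those days. The "events_" prefix and the YYYYMMDD suffix are assumed naming conventions; the table wildcard functions in the reference above (for example TABLE_DATE_RANGE) perform this kind of selection inside the query itself.

```python
from datetime import date, timedelta


def daily_shards(prefix, start, end):
    """Names of the daily tables between start and end, inclusive."""
    days = (end - start).days
    return [
        prefix + (start + timedelta(days=n)).strftime("%Y%m%d")
        for n in range(days + 1)
    ]


# Four days of data are read, however many daily tables exist in total.
shards = daily_shards("events_", date(2016, 2, 27), date(2016, 3, 1))
print(shards)
# ['events_20160227', 'events_20160228', 'events_20160229', 'events_20160301']
```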

Optimize for Query vs. Storage Costs
Common Queries?
- Materialized views
(save intermediate results in tables)
with pre-aggregated data:
→ faster + cheaper queries
- Store data in multiple tables:
- table for daily data
- table for weekly data
- table for monthly data
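The trade-off can be illustrated with a small Python sketch: a daily table is rolled up once into a monthly table, so later monthly queries read far fewer rows at the price of some extra storage. The daily totals and the function name are made up for illustration.

```python
from collections import defaultdict

# Hypothetical daily table: one pre-aggregated total per day.
daily = {"2016-01-30": 10, "2016-01-31": 5, "2016-02-01": 7}


def rollup_monthly(daily_totals):
    """Build the monthly table once from the daily one."""
    monthly = defaultdict(int)
    for day, total in daily_totals.items():
        monthly[day[:7]] += total  # "YYYY-MM-DD" -> "YYYY-MM"
    return dict(monthly)


monthly = rollup_monthly(daily)
print(monthly)  # {'2016-01': 15, '2016-02': 7}
```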

Narrow the Table Scans
You only pay for the columns you read
Don’t use “SELECT *” !!!

Table Decorators
Only way to avoid doing full table scans!
Allows undeleting tables
options:
● snapshot decorator + range decorator
● relative value + absolute values
https://cloud.google.com/bigquery/table-decorators
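The two decorator kinds and the two kinds of time value can be sketched as small Python helpers that build the bracketed table references used in legacy SQL. The helper names and the mydata.events table are assumptions; times are in milliseconds, with negative values meaning "relative to now".

```python
HOUR_MS = 60 * 60 * 1000


def snapshot_decorator(table, millis):
    """Table as it was at one point in time: [table@time]."""
    return f"[{table}@{millis}]"


def range_decorator(table, start_millis, end_millis=None):
    """Data added between two points in time: [table@start-end].

    Leaving out the end means "until now".
    """
    end = "" if end_millis is None else str(end_millis)
    return f"[{table}@{start_millis}-{end}]"


# Relative values: one hour ago, and the window from 1 hour to 30 minutes ago.
print(snapshot_decorator("mydata.events", -HOUR_MS))
# [mydata.events@-3600000]
print(range_decorator("mydata.events", -HOUR_MS))
# [mydata.events@-3600000-]
print(range_decorator("mydata.events", -HOUR_MS, -HOUR_MS // 2))
# [mydata.events@-3600000--1800000]

# Absolute value: milliseconds since the Unix epoch.
print(snapshot_decorator("mydata.events", 1456790400000))
# [mydata.events@1456790400000]
```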

Query optimizations
Query Plan
https://cloud.google.com/bigquery/query-plan-explanation

Questions?
You can reach me at:
- mail: matthias@datatonic.com
- Twitter: @FsMatt
