SlideShare a Scribd company logo
1 of 24
Download to read offline
Google BigQuery
About myself
Matthias Feys
work @Datatonic:
- big data (with Google Cloud)
- machine learning
- data visualizations (Tableau/Spotfire)
Google Qualified Cloud Developer
contact:
- @FsMatt
- matthias@datatonic.com
About Datatonic
Datatonic is a team of data science experts that help corporations unleash
the power of data. They use Google Cloud Platform, data visualisation
technologies (like Tableau or Spotfire) and machine learning to build
breakthrough solutions. Either as expert advisors, or built as a fully
managed solution with support end-to-end (A3S).
Some references:
@teamdatatonic
● What is BigQuery?
● How does it scale/work?
● How does it compare to:
- NoSQL datastores
- MapReduce
● Demo
● Pricing Model
● Best Practices
This talk
What is BigQuery?
“BigQuery is a fully-managed
and cloud-based interactive
query service for
massive datasets.”
It’s the externalization
of Dremel, one of
Google’s core
technologies
What is BigQuery? (2)
BigQuery Service is available via:
● Web UI (bigquery.cloud.google.com)
● console (gcloud)
● API (+ client libraries)
● external tools (Tableau, Excel, …)
● ODBC connector
How Does it Scale?
Dremel Architecture
Data Model/Storage:
- Columnar Storage
- Nested/Repeated Fields
- No Index!
-> Single Full Table Scan (from disk)
Query Execution:
- Tree Architecture
- Using tens of thousands
machines over fast Google
network (+1Petabit/s)
Columnar Storage
● Traffic minimization:
○ only read selected
columns
● Higher Compression Ratio:
○ Similar values in the
same column
○ From 1:3 → 1:10
Tree Architecture
- root server:
->receives query + reads table metadata
->rewrites the query(s)
->sends queries to the next level
<-returns final query results
- intermediate servers:
->(similar steps)
<-parallel partial aggregation
- leaf servers:
->actually scan (parts) of the table
<-send data to intermediate servers
NoSQL Datastore vs. BigQuery?
NoSQL Datastore
● Index based
(expected queries)
● Read-write
BigQuery
● Non-index based
(ad hoc queries)
● Read-only (append-
only)
MapReduce vs. BigQuery?
MapReduce
● High latency
● Flexible (complex)
batch processing
● Unstructured data
BigQuery
● Low latency
● SQL-like queries
● Structured data
Demo’s
Demo’s
Pricing Model
Category Price Note
Storage Cost $0.020 per GB, per
month
Query Cost $5 per TB 1st TB per month
is free
Best Practices
Denormalize / Pre-Join Where Possible
● Best performance
● Only pay for the columns you need
● Nested/repeated fields!
Relational Database Design Denormalized Nested/Repeated (JSON)
Table Sharding
● You pay for what you read
→ Read less, pay less
● Table wildcards allow for easy reading over multiple tables
https://cloud.google.com/bigquery/query-reference#tablewildcardfunctions
Optimize for Query vs. Storage Costs
Common Queries?
- Materialized views
(save intermediate results in tables)
with pre-aggregated data:
→ faster + cheaper queries
- Store data in multiple tables:
- table for daily data
- table for weekly data
- table for monthly data
Narrow the Table Scans
You only pay for the columns you read
Don’t use “SELECT *” !!!
Table Decorators
Only way to avoid doing full table scans!
Allows undeleting tables
options:
● snapshot decorator + range decorator
● relative value + absolute values
https://cloud.google.com/bigquery/table-decorators
Query optimizations
Query Plan
https://cloud.google.com/bigquery/query-plan-explanation
Big Data Reference Architecture
Questions?
You can reach me at:
- mail: matthias@datatonic.com
- Twitter: @FsMatt

More Related Content

What's hot

BigQuery walk through.pptx
BigQuery walk through.pptxBigQuery walk through.pptx
BigQuery walk through.pptxVikRam S
 
You might be paying too much for BigQuery
You might be paying too much for BigQueryYou might be paying too much for BigQuery
You might be paying too much for BigQueryRyuji Tamagawa
 
BigQuery implementation
BigQuery implementationBigQuery implementation
BigQuery implementationSimon Su
 
Google BigQuery - Features & Benefits
Google BigQuery - Features & BenefitsGoogle BigQuery - Features & Benefits
Google BigQuery - Features & BenefitsAndreas Raible
 
Free Training: How to Build a Lakehouse
Free Training: How to Build a LakehouseFree Training: How to Build a Lakehouse
Free Training: How to Build a LakehouseDatabricks
 
Big Data Analytics with Google BigQuery. By Javier Ramirez. All your base Co...
Big Data Analytics with Google BigQuery.  By Javier Ramirez. All your base Co...Big Data Analytics with Google BigQuery.  By Javier Ramirez. All your base Co...
Big Data Analytics with Google BigQuery. By Javier Ramirez. All your base Co...javier ramirez
 
Pentaho Data Integration Introduction
Pentaho Data Integration IntroductionPentaho Data Integration Introduction
Pentaho Data Integration Introductionmattcasters
 
Retail Analytics and BI with Looker, BigQuery, GCP & Leigha Jarett
Retail Analytics and BI with Looker, BigQuery, GCP & Leigha JarettRetail Analytics and BI with Looker, BigQuery, GCP & Leigha Jarett
Retail Analytics and BI with Looker, BigQuery, GCP & Leigha JarettDaniel Zivkovic
 
Big data by Mithlesh sadh
Big data by Mithlesh sadhBig data by Mithlesh sadh
Big data by Mithlesh sadhMithlesh Sadh
 
Introduction to Data Engineering
Introduction to Data EngineeringIntroduction to Data Engineering
Introduction to Data EngineeringHadi Fadlallah
 
A Thorough Comparison of Delta Lake, Iceberg and Hudi
A Thorough Comparison of Delta Lake, Iceberg and HudiA Thorough Comparison of Delta Lake, Iceberg and Hudi
A Thorough Comparison of Delta Lake, Iceberg and HudiDatabricks
 
Google Cloud Dataproc - Easier, faster, more cost-effective Spark and Hadoop
Google Cloud Dataproc - Easier, faster, more cost-effective Spark and HadoopGoogle Cloud Dataproc - Easier, faster, more cost-effective Spark and Hadoop
Google Cloud Dataproc - Easier, faster, more cost-effective Spark and Hadoophuguk
 
BigQuery best practices and recommendations to reduce costs with BI Engine, S...
BigQuery best practices and recommendations to reduce costs with BI Engine, S...BigQuery best practices and recommendations to reduce costs with BI Engine, S...
BigQuery best practices and recommendations to reduce costs with BI Engine, S...Márton Kodok
 
Databricks Platform.pptx
Databricks Platform.pptxDatabricks Platform.pptx
Databricks Platform.pptxAlex Ivy
 
DW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDatabricks
 

What's hot (20)

BigQuery walk through.pptx
BigQuery walk through.pptxBigQuery walk through.pptx
BigQuery walk through.pptx
 
You might be paying too much for BigQuery
You might be paying too much for BigQueryYou might be paying too much for BigQuery
You might be paying too much for BigQuery
 
BigQuery implementation
BigQuery implementationBigQuery implementation
BigQuery implementation
 
Google BigQuery - Features & Benefits
Google BigQuery - Features & BenefitsGoogle BigQuery - Features & Benefits
Google BigQuery - Features & Benefits
 
Free Training: How to Build a Lakehouse
Free Training: How to Build a LakehouseFree Training: How to Build a Lakehouse
Free Training: How to Build a Lakehouse
 
Big Data Analytics with Google BigQuery. By Javier Ramirez. All your base Co...
Big Data Analytics with Google BigQuery.  By Javier Ramirez. All your base Co...Big Data Analytics with Google BigQuery.  By Javier Ramirez. All your base Co...
Big Data Analytics with Google BigQuery. By Javier Ramirez. All your base Co...
 
Redshift VS BigQuery
Redshift VS BigQueryRedshift VS BigQuery
Redshift VS BigQuery
 
Pentaho Data Integration Introduction
Pentaho Data Integration IntroductionPentaho Data Integration Introduction
Pentaho Data Integration Introduction
 
Data Mesh
Data MeshData Mesh
Data Mesh
 
Retail Analytics and BI with Looker, BigQuery, GCP & Leigha Jarett
Retail Analytics and BI with Looker, BigQuery, GCP & Leigha JarettRetail Analytics and BI with Looker, BigQuery, GCP & Leigha Jarett
Retail Analytics and BI with Looker, BigQuery, GCP & Leigha Jarett
 
Big data by Mithlesh sadh
Big data by Mithlesh sadhBig data by Mithlesh sadh
Big data by Mithlesh sadh
 
Introduction to Data Engineering
Introduction to Data EngineeringIntroduction to Data Engineering
Introduction to Data Engineering
 
A Thorough Comparison of Delta Lake, Iceberg and Hudi
A Thorough Comparison of Delta Lake, Iceberg and HudiA Thorough Comparison of Delta Lake, Iceberg and Hudi
A Thorough Comparison of Delta Lake, Iceberg and Hudi
 
Google Cloud Dataproc - Easier, faster, more cost-effective Spark and Hadoop
Google Cloud Dataproc - Easier, faster, more cost-effective Spark and HadoopGoogle Cloud Dataproc - Easier, faster, more cost-effective Spark and Hadoop
Google Cloud Dataproc - Easier, faster, more cost-effective Spark and Hadoop
 
BigQuery best practices and recommendations to reduce costs with BI Engine, S...
BigQuery best practices and recommendations to reduce costs with BI Engine, S...BigQuery best practices and recommendations to reduce costs with BI Engine, S...
BigQuery best practices and recommendations to reduce costs with BI Engine, S...
 
Databricks Platform.pptx
Databricks Platform.pptxDatabricks Platform.pptx
Databricks Platform.pptx
 
DW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptx
 
Google Cloud Dataflow
Google Cloud DataflowGoogle Cloud Dataflow
Google Cloud Dataflow
 
Big data ppt
Big data pptBig data ppt
Big data ppt
 
Big data analytics
Big data analyticsBig data analytics
Big data analytics
 

Similar to Google BigQuery

Exploring BigData with Google BigQuery
Exploring BigData with Google BigQueryExploring BigData with Google BigQuery
Exploring BigData with Google BigQueryDharmesh Vaya
 
SQL vs NoSQL, an experiment with MongoDB
SQL vs NoSQL, an experiment with MongoDBSQL vs NoSQL, an experiment with MongoDB
SQL vs NoSQL, an experiment with MongoDBMarco Segato
 
VoxxedDays Bucharest 2017 - Powering interactive data analysis with Google Bi...
VoxxedDays Bucharest 2017 - Powering interactive data analysis with Google Bi...VoxxedDays Bucharest 2017 - Powering interactive data analysis with Google Bi...
VoxxedDays Bucharest 2017 - Powering interactive data analysis with Google Bi...Márton Kodok
 
GDG DevFest Ukraine - Powering Interactive Data Analysis with Google BigQuery
GDG DevFest Ukraine - Powering Interactive Data Analysis with Google BigQueryGDG DevFest Ukraine - Powering Interactive Data Analysis with Google BigQuery
GDG DevFest Ukraine - Powering Interactive Data Analysis with Google BigQueryMárton Kodok
 
MongoDB .local Houston 2019: Wide Ranging Analytical Solutions on MongoDB
MongoDB .local Houston 2019: Wide Ranging Analytical Solutions on MongoDBMongoDB .local Houston 2019: Wide Ranging Analytical Solutions on MongoDB
MongoDB .local Houston 2019: Wide Ranging Analytical Solutions on MongoDBMongoDB
 
Relational databases vs Non-relational databases
Relational databases vs Non-relational databasesRelational databases vs Non-relational databases
Relational databases vs Non-relational databasesJames Serra
 
Webinar: Introducing the MongoDB Connector for BI 2.0 with Tableau
Webinar: Introducing the MongoDB Connector for BI 2.0 with TableauWebinar: Introducing the MongoDB Connector for BI 2.0 with Tableau
Webinar: Introducing the MongoDB Connector for BI 2.0 with TableauMongoDB
 
The Six pillars for Building big data analytics ecosystems
The Six pillars for Building big data analytics ecosystemsThe Six pillars for Building big data analytics ecosystems
The Six pillars for Building big data analytics ecosystemstaimur hafeez
 
Challenges of Implementing an Advanced SQL Engine on Hadoop
Challenges of Implementing an Advanced SQL Engine on HadoopChallenges of Implementing an Advanced SQL Engine on Hadoop
Challenges of Implementing an Advanced SQL Engine on HadoopDataWorks Summit
 
Workshop on Google Cloud Data Platform
Workshop on Google Cloud Data PlatformWorkshop on Google Cloud Data Platform
Workshop on Google Cloud Data PlatformGoDataDriven
 
MongoDB World 2019: Packing Up Your Data and Moving to MongoDB Atlas
MongoDB World 2019: Packing Up Your Data and Moving to MongoDB AtlasMongoDB World 2019: Packing Up Your Data and Moving to MongoDB Atlas
MongoDB World 2019: Packing Up Your Data and Moving to MongoDB AtlasMongoDB
 
NoSQL Solutions - a comparative study
NoSQL Solutions - a comparative studyNoSQL Solutions - a comparative study
NoSQL Solutions - a comparative studyGuillaume Lefranc
 
QuerySurge Slide Deck for Big Data Testing Webinar
QuerySurge Slide Deck for Big Data Testing WebinarQuerySurge Slide Deck for Big Data Testing Webinar
QuerySurge Slide Deck for Big Data Testing WebinarRTTS
 
Estimating the Total Costs of Your Cloud Analytics Platform 
Estimating the Total Costs of Your Cloud Analytics Platform Estimating the Total Costs of Your Cloud Analytics Platform 
Estimating the Total Costs of Your Cloud Analytics Platform DATAVERSITY
 
Big Query - Women Techmarkers (Ukraine - March 2014)
Big Query - Women Techmarkers (Ukraine - March 2014)Big Query - Women Techmarkers (Ukraine - March 2014)
Big Query - Women Techmarkers (Ukraine - March 2014)Ido Green
 
GCP Data Engineer cheatsheet
GCP Data Engineer cheatsheetGCP Data Engineer cheatsheet
GCP Data Engineer cheatsheetGuang Xu
 

Similar to Google BigQuery (20)

Exploring BigData with Google BigQuery
Exploring BigData with Google BigQueryExploring BigData with Google BigQuery
Exploring BigData with Google BigQuery
 
SQL vs NoSQL, an experiment with MongoDB
SQL vs NoSQL, an experiment with MongoDBSQL vs NoSQL, an experiment with MongoDB
SQL vs NoSQL, an experiment with MongoDB
 
VoxxedDays Bucharest 2017 - Powering interactive data analysis with Google Bi...
VoxxedDays Bucharest 2017 - Powering interactive data analysis with Google Bi...VoxxedDays Bucharest 2017 - Powering interactive data analysis with Google Bi...
VoxxedDays Bucharest 2017 - Powering interactive data analysis with Google Bi...
 
GDG DevFest Ukraine - Powering Interactive Data Analysis with Google BigQuery
GDG DevFest Ukraine - Powering Interactive Data Analysis with Google BigQueryGDG DevFest Ukraine - Powering Interactive Data Analysis with Google BigQuery
GDG DevFest Ukraine - Powering Interactive Data Analysis with Google BigQuery
 
MongoDB .local Houston 2019: Wide Ranging Analytical Solutions on MongoDB
MongoDB .local Houston 2019: Wide Ranging Analytical Solutions on MongoDBMongoDB .local Houston 2019: Wide Ranging Analytical Solutions on MongoDB
MongoDB .local Houston 2019: Wide Ranging Analytical Solutions on MongoDB
 
Mongodb
MongodbMongodb
Mongodb
 
Relational databases vs Non-relational databases
Relational databases vs Non-relational databasesRelational databases vs Non-relational databases
Relational databases vs Non-relational databases
 
Webinar: Introducing the MongoDB Connector for BI 2.0 with Tableau
Webinar: Introducing the MongoDB Connector for BI 2.0 with TableauWebinar: Introducing the MongoDB Connector for BI 2.0 with Tableau
Webinar: Introducing the MongoDB Connector for BI 2.0 with Tableau
 
The Six pillars for Building big data analytics ecosystems
The Six pillars for Building big data analytics ecosystemsThe Six pillars for Building big data analytics ecosystems
The Six pillars for Building big data analytics ecosystems
 
Challenges of Implementing an Advanced SQL Engine on Hadoop
Challenges of Implementing an Advanced SQL Engine on HadoopChallenges of Implementing an Advanced SQL Engine on Hadoop
Challenges of Implementing an Advanced SQL Engine on Hadoop
 
Workshop on Google Cloud Data Platform
Workshop on Google Cloud Data PlatformWorkshop on Google Cloud Data Platform
Workshop on Google Cloud Data Platform
 
MongoDB World 2019: Packing Up Your Data and Moving to MongoDB Atlas
MongoDB World 2019: Packing Up Your Data and Moving to MongoDB AtlasMongoDB World 2019: Packing Up Your Data and Moving to MongoDB Atlas
MongoDB World 2019: Packing Up Your Data and Moving to MongoDB Atlas
 
NoSQL Solutions - a comparative study
NoSQL Solutions - a comparative studyNoSQL Solutions - a comparative study
NoSQL Solutions - a comparative study
 
Gcp data engineer
Gcp data engineerGcp data engineer
Gcp data engineer
 
QuerySurge Slide Deck for Big Data Testing Webinar
QuerySurge Slide Deck for Big Data Testing WebinarQuerySurge Slide Deck for Big Data Testing Webinar
QuerySurge Slide Deck for Big Data Testing Webinar
 
Estimating the Total Costs of Your Cloud Analytics Platform 
Estimating the Total Costs of Your Cloud Analytics Platform Estimating the Total Costs of Your Cloud Analytics Platform 
Estimating the Total Costs of Your Cloud Analytics Platform 
 
A data analyst view of Bigdata
A data analyst view of Bigdata A data analyst view of Bigdata
A data analyst view of Bigdata
 
BigData Hadoop
BigData Hadoop BigData Hadoop
BigData Hadoop
 
Big Query - Women Techmarkers (Ukraine - March 2014)
Big Query - Women Techmarkers (Ukraine - March 2014)Big Query - Women Techmarkers (Ukraine - March 2014)
Big Query - Women Techmarkers (Ukraine - March 2014)
 
GCP Data Engineer cheatsheet
GCP Data Engineer cheatsheetGCP Data Engineer cheatsheet
GCP Data Engineer cheatsheet
 

Recently uploaded

AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsMemoori
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersThousandEyes
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticscarlostorres15106
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machinePadma Pradeep
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxOnBoard
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...shyamraj55
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 3652toLead Limited
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...HostedbyConfluent
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphNeo4j
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Paola De la Torre
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Allon Mureinik
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptxLBM Solutions
 
Azure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & ApplicationAzure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & ApplicationAndikSusilo4
 

Recently uploaded (20)

AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptx
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food Manufacturing
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptx
 
Azure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & ApplicationAzure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & Application
 

Google BigQuery

  • 2. About myself Matthias Feys work @Datatonic: - big data (with Google Cloud) - machine learning - data visualizations (Tableau/Spotfire) Google Qualified Cloud Developer contact: - @FsMatt - matthias@datatonic.com
  • 3. About Datatonic Datatonic is a team of data science experts that help corporations unleash the power of data. They use Google Cloud Platform, data visualisation technologies (like Tableau or Spotfire) and machine learning to build breakthrough solutions. Either as expert advisors, or built as a fully managed solution with support end-to-end (A3S). Some references: @teamdatatonic
  • 4. ● What is BigQuery? ● How does it scale/work? ● How does it compare to: - NoSQL datastores - MapReduce ● Demo ● Pricing Model ● Best Practices This talk
  • 5. What is BigQuery? “BigQuery is a fully-managed and cloud-based interactive query service for massive datasets.” It’s the externalization of Dremel, one of Google’s core technologies
  • 6. What is BigQuery? (2) BigQuery Service is available via: ● Web UI (bigquery.cloud.google.com) ● console (gcloud) ● API (+ client libraries) ● external tools (Tableau, Excel, …) ● ODBC connector
  • 7. How Does it Scale?
  • 8. Dremel Architecture Data Model/Storage: - Columnar Storage - Nested/Repeated Fields - No Index! -> Single Full Table Scan (from disk) Query Execution: - Tree Architecture - Using tens of thousands machines over fast Google network (+1Petabit/s)
  • 9. Columnar Storage ● Traffic minimization: ○ only read selected columns ● Higher Compression Ratio: ○ Similar values in the same column ○ From 1:3 → 1:10
  • 10. Tree Architecture - root server: ->receives query + reads table metadata ->rewrites the query(s) ->sends queries to the next level <-returns final query results - intermediate servers: ->(similar steps) <-parallel partial aggregation - leaf servers: ->actually scan (parts) of the table <-send data to intermediate servers
  • 11. NoSQL Datastore vs. BigQuery? NoSQL Datastore ● Index based (expected queries) ● Read-write BigQuery ● Non-index based (ad hoc queries) ● Read-only (append- only)
  • 12. MapReduce vs. BigQuery? MapReduce ● High latency ● Flexible (complex) batch processing ● Unstructured data BigQuery ● Low latency ● SQL-like queries ● Structured data
  • 15. Pricing Model Category Price Note Storage Cost $0.020 per GB, per month Query Cost $5 per TB 1st TB per month is free
  • 17. Denormalize / Pre-Join Where Possible ● Best performance ● Only pay for the columns you need ● Nested/repeated fields! Relational Database Design Denormalized Nested/Repeated (JSON)
  • 18. Table Sharding ● You pay for what you read → Read less, pay less ● Table wildcards allow for easy reading over multiple tables https://cloud.google.com/bigquery/query-reference#tablewildcardfunctions
  • 19. Optimize for Query vs. Storage Costs Common Queries? - Materialized views (save intermediate results in tables) with pre-aggregated data: → faster + cheaper queries - Store data in multiple tables: - table for daily data - table for weekly data - table for monthly data
  • 20. Narrow the Table Scans You only pay for the columns you read Don’t use “SELECT *” !!!
  • 21. Table Decorators Only way to avoid doing full table scans! Allows undeleting tables options: ● snapshot decorator + range decorator ● relative value + absolute values https://cloud.google.com/bigquery/table-decorators
  • 23. Big Data Reference Architecture
  • 24. Questions? You can reach me at: - mail: matthias@datatonic.com - Twitter: @FsMatt