SlideShare une entreprise Scribd logo
1  sur  27
Télécharger pour lire hors ligne
Google BigQuery
for the Everyday Developer
Márton Kodok
Senior Software Engineer at REEA
twitter: martonkodok stackoverflow: pentium10 github: @pentium10
IV. IT&C Innovation Conference - October 2016 - Sovata, Romania
1. Big Data Challenges
2. Infrastructure overview
3. Choosing a Strategy
4. BigQuery - Use Cases
5. Q & A
BigQuery for the Everyday Developer @martonkodok
Agenda
BigQuery for the Everyday Developer @martonkodok
The 5 V’s of Big Data
Every scientist who needs big data analytics to save millions of lives
should have that power
Legacy systems don’t provide the power.
The simple fact is that you are brilliant but your brilliant ideas require
complex analytics.
Traditional solutions are not applicable.
BigQuery for the Everyday Developer @martonkodok
Big Data movement
With more and more people realizing the value of realtime
analytics, the race to build realtime middleware has begun.
The Plan: have oversight over developments as they happen.
* More slides about Beanstalkd at slideshare.net/martonkodok
BigQuery for the Everyday Developer @martonkodok
Fast computing systems
BigQuery for the Everyday Developer @martonkodok
Simple legacy Website stack/infrastructure
BigQuery for the Everyday Developer @martonkodok
Infrastructure with Middleware
Task: Need backend/database for STORE, QUERY, EXTRACT deep analytics:
● Events (website events, app events, email events)
● Entities (user data, business entities, A/B split test data)
● Metrics (sales data, code commit history, profiler data)
● Streaming (sensor data, IoT data burst)
● 3rd party Analytics (Google Analytics, Mixpanel)
● Local generated data (log file entries, trace output, unstructured data...)
❏ Run Ad-Hoc queries (without Developer)
❏ Be realtime
BigQuery for the Everyday Developer @martonkodok
Requirements
● Terabyte scalable storage
● Real-time row ingestion
● Ask sophisticated queries
● Query-performance
● Low-maintenance
● Cost effective
● Wire them up easily
Goal: Store everything accessible by SQL immediately.
BigQuery for the Everyday Developer @martonkodok
Desired system/platform Equipment strategy
* people still required
Engines:
❏ MongoDB, Riak, Redis
❏ ELK Stack (Elastic-Logstash-Kibana)...
❏ Cassandra, Hive, Hadoop...
❏ Amazon RedShift, Google BigQuery...
In-House Hosted Managed
BigQuery for the Everyday Developer @martonkodok
Google BigQuery
● Analytics-as-a-Service - Data Warehouse in the Cloud
● Fully-Managed by Google (US or EU zone)
● Scales into Petabytes
● Ridiculously fast
● Decent pricing (queries $5/TB, storage: $20/TB) *October 2016 pricing
● 100.000 rows / sec Streaming API
● Open Interfaces (Web UI, BQ command line tool, REST, ODBC)
● Familiar DB Structure (table, views, record, nested, JSON)
● Convenience of SQL + Javascript UDF (User Defined Functions)
● Integrates with Google Sheets + Google Cloud Storage + Pub/Sub connectors
● Client libraries available in YFL (your favorite languages)
BigQuery for the Everyday Developer @martonkodok
What is BigQuery?
● ACL - row level locking (individual or group based)
● Columnar storage (max 10 000 columns in table)
● Rich SQL: JSON, IP, Math, RegExp, Window functions
● Data types: String, Integer, Float, Boolean, Timestamp,
Record, Nested, Struct, Array.
● Append-only tables prefered (DML syntax available)
● Batch load file size limits: 1TB (CSV or JSON)
● User Defined Functions in SQL or Javascript
● Date partitioned tables
BigQuery for the Everyday Developer @martonkodok
BigQuery: Convenience of SQL
* 1 Petabyte storage, 10 TB inserts, 100 TB queries => $22000
Queries Storage Ingestion
➔ 1 TB per month free
➔ 5 USD per TB
➔ only pay for the columns you use
in your query
➔ 20 USD per TB frequently accessed
data
➔ 10 USD per TB long term storage
90 days
➔ Batch load free (CSV/JSON)
➔ Exporting free
➔ Table copy free
➔ 50 USD per TB
Estimate 1
- Storage 5 TB
- Streaming Inserts 1 TB
- Queries 3 TB
Monthly total: $165
Estimate 2
- Storage 24 TB
- Streaming Inserts 1 TB
- Queries 50 TB
Monthly total: $788
BigQuery for the Everyday Developer @martonkodok
BigQuery Costs - October 2016
BigQuery for the Everyday Developer @martonkodok
+--------------------------+-----------+----------+
| order_id | INTEGER | REQUIRED |
| ... | | | Example:
| products | RECORD | REPEATED | ”products”:[p1,p2]
| products.name | STRING | NULLABLE |
| products.product_id | INTEGER | NULLABLE | ”products”:[{”name”:”p1”,
| products.attributes | STRING | REPEATED | ”product_id”:10,
| products.price | FLOAT | NULLABLE | ”attributes”:[”red”,”xl”]
| ... | | | ,”price”:9.99},
| common | RECORD | NULLABLE | {”name”:”p2”,
| common.insert_id | INTEGER | REQUIRED | ”product_id”:20,
| common.event | INTEGER | REQUIRED | ”attributes”:[“red”,”xl”]
| common.user_id | INTEGER | REQUIRED | ,”price”:9.99}]
| common.timestamp | TIMESTAMP | REQUIRED |
| common.utm | RECORD | NULLABLE |
| common.utm.source | STRING | NULLABLE |
| common.utm.medium | STRING | NULLABLE |
| common.utm.campaign | STRING | NULLABLE |
| common.utm.content | STRING | NULLABLE |
| common.utm.term | STRING | NULLABLE |
| meta | STRING | NULLABLE |
+--------------------------+-----------+----------+
Schema modelling / JSON
1. Use aggregation MIN/MAX on timestamp to find first/last and join back to the same table.
2. Use analytic functions FIRST_VALUE and LAST_VALUE when only one column is needed
SELECT LAST_VALUE(email) OVER(
PARTITION BY user_id
ORDER BY timestamp ASC) AS email_last ...
3. Using Window Functions for full row selection
SELECT email, firstname, lastname
FROM
(SELECT email, firstname, lastname
row_number() over (partition BY user_id
ORDER BY timestamp DESC) seqnum
FROM [profile_event]
)
WHERE seqnum=1
BigQuery for the Everyday Developer @martonkodok
Append only tables - Get last value
BigQuery for the Everyday Developer @martonkodok
Streaming insert time (ms) - last year
● Funnel Analysis
BigQuery for the Everyday Developer @martonkodok
Achievements
BigQuery for the Everyday Developer @martonkodok
Funnel analysis: Time on upsell pages
Example HITS chain:
● article1 -> page2 -> page3 -> page4 -> orderpage1 -> thankyoupage1
● page1 -> article2-> page3 -> orderpage2 -> ...
BigQuery for the Everyday Developer @martonkodok
Attribute credit to first article visited on purchase
● Funnel Analysis
● Email URL click heatmap
BigQuery for the Everyday Developer @martonkodok
Achievements
BigQuery for the Everyday Developer @martonkodok
Email URL clicks heat-map
● Funnel Analysis
● Email URL click heatmap
● Email Health Dashboard (SPAM, ISP deferral, content
A/B split tests, trends or low open rate campaigns)
● Advanced segmentation (all raw data stored)
● Behavioral analytics - engaged users etc...
BigQuery for the Everyday Developer @martonkodok
Achievements Continued
● no provisioning/deploy
● no running out of resources
● no more focus on large scale execution plan
● no need to re-implement tricky concepts
(time windows / join streams)
● pay only the columns we have in your queries
● run raw ad-hoc queries (either by analysts/sales or Devs)
● no more throwing away-, expiring-, aggregating old data.
BigQuery for the Everyday Developer @martonkodok
Our benefits
1. githubarchive.org: 20+ event types available since 2012
a. pull request latency
b. expressions, emotions in commit messages
c. etc...
2. httparchive.org
a. sites that use multiple popular JS frameworks
b. sites that use multiple jQuery versions
3. Export for Google Analytics
4. MLab - broadband connection performance (26B rows)
5. GSOD - samples of weather (rainfall, temp…)
6. US Natality - 1969 to 2008
7. Wikipedia edits
BigQuery for the Everyday Developer @martonkodok
BigQuery: Sample projects to try out
BigQuery for the Everyday Developer @martonkodok
HttpArchive - multiple JS frameworks
BigQuery for the Everyday Developer @martonkodok
HttpArchive - multiple jQuery versions
Questions?
Thank you.
Slides available soon on: slideshare.net/martonkodok

Contenu connexe

Tendances

bigquery.pptx
bigquery.pptxbigquery.pptx
bigquery.pptxHarissh16
 
BigQuery walk through.pptx
BigQuery walk through.pptxBigQuery walk through.pptx
BigQuery walk through.pptxVikRam S
 
Building a Data Pipeline using Apache Airflow (on AWS / GCP)
Building a Data Pipeline using Apache Airflow (on AWS / GCP)Building a Data Pipeline using Apache Airflow (on AWS / GCP)
Building a Data Pipeline using Apache Airflow (on AWS / GCP)Yohei Onishi
 
Azure Synapse Analytics Overview (r1)
Azure Synapse Analytics Overview (r1)Azure Synapse Analytics Overview (r1)
Azure Synapse Analytics Overview (r1)James Serra
 
BigQuery best practices and recommendations to reduce costs with BI Engine, S...
BigQuery best practices and recommendations to reduce costs with BI Engine, S...BigQuery best practices and recommendations to reduce costs with BI Engine, S...
BigQuery best practices and recommendations to reduce costs with BI Engine, S...Márton Kodok
 
Using ClickHouse for Experimentation
Using ClickHouse for ExperimentationUsing ClickHouse for Experimentation
Using ClickHouse for ExperimentationGleb Kanterov
 
Google Cloud and Data Pipeline Patterns
Google Cloud and Data Pipeline PatternsGoogle Cloud and Data Pipeline Patterns
Google Cloud and Data Pipeline PatternsLynn Langit
 
Introduction to Databases - query optimizations for MySQL
Introduction to Databases - query optimizations for MySQLIntroduction to Databases - query optimizations for MySQL
Introduction to Databases - query optimizations for MySQLMárton Kodok
 
Sizing MongoDB Clusters
Sizing MongoDB Clusters Sizing MongoDB Clusters
Sizing MongoDB Clusters MongoDB
 
From airflow to google cloud composer
From airflow to google cloud composerFrom airflow to google cloud composer
From airflow to google cloud composerBruce Kuo
 
A Thorough Comparison of Delta Lake, Iceberg and Hudi
A Thorough Comparison of Delta Lake, Iceberg and HudiA Thorough Comparison of Delta Lake, Iceberg and Hudi
A Thorough Comparison of Delta Lake, Iceberg and HudiDatabricks
 
How to build a streaming Lakehouse with Flink, Kafka, and Hudi
How to build a streaming Lakehouse with Flink, Kafka, and HudiHow to build a streaming Lakehouse with Flink, Kafka, and Hudi
How to build a streaming Lakehouse with Flink, Kafka, and HudiFlink Forward
 
Google cloud platform
Google cloud platformGoogle cloud platform
Google cloud platformrajdeep
 
How to Actually Tune Your Spark Jobs So They Work
How to Actually Tune Your Spark Jobs So They WorkHow to Actually Tune Your Spark Jobs So They Work
How to Actually Tune Your Spark Jobs So They WorkIlya Ganelin
 
Trino: A Ludicrously Fast Query Engine - Pulsar Summit NA 2021
Trino: A Ludicrously Fast Query Engine - Pulsar Summit NA 2021Trino: A Ludicrously Fast Query Engine - Pulsar Summit NA 2021
Trino: A Ludicrously Fast Query Engine - Pulsar Summit NA 2021StreamNative
 

Tendances (20)

Big query
Big queryBig query
Big query
 
bigquery.pptx
bigquery.pptxbigquery.pptx
bigquery.pptx
 
Google BigQuery
Google BigQueryGoogle BigQuery
Google BigQuery
 
BigQuery walk through.pptx
BigQuery walk through.pptxBigQuery walk through.pptx
BigQuery walk through.pptx
 
Apache Airflow
Apache AirflowApache Airflow
Apache Airflow
 
Google Cloud Dataflow
Google Cloud DataflowGoogle Cloud Dataflow
Google Cloud Dataflow
 
Building a Data Pipeline using Apache Airflow (on AWS / GCP)
Building a Data Pipeline using Apache Airflow (on AWS / GCP)Building a Data Pipeline using Apache Airflow (on AWS / GCP)
Building a Data Pipeline using Apache Airflow (on AWS / GCP)
 
Azure Synapse Analytics Overview (r1)
Azure Synapse Analytics Overview (r1)Azure Synapse Analytics Overview (r1)
Azure Synapse Analytics Overview (r1)
 
BigQuery best practices and recommendations to reduce costs with BI Engine, S...
BigQuery best practices and recommendations to reduce costs with BI Engine, S...BigQuery best practices and recommendations to reduce costs with BI Engine, S...
BigQuery best practices and recommendations to reduce costs with BI Engine, S...
 
Using ClickHouse for Experimentation
Using ClickHouse for ExperimentationUsing ClickHouse for Experimentation
Using ClickHouse for Experimentation
 
Google Cloud and Data Pipeline Patterns
Google Cloud and Data Pipeline PatternsGoogle Cloud and Data Pipeline Patterns
Google Cloud and Data Pipeline Patterns
 
Introduction to Databases - query optimizations for MySQL
Introduction to Databases - query optimizations for MySQLIntroduction to Databases - query optimizations for MySQL
Introduction to Databases - query optimizations for MySQL
 
Sizing MongoDB Clusters
Sizing MongoDB Clusters Sizing MongoDB Clusters
Sizing MongoDB Clusters
 
From airflow to google cloud composer
From airflow to google cloud composerFrom airflow to google cloud composer
From airflow to google cloud composer
 
A Thorough Comparison of Delta Lake, Iceberg and Hudi
A Thorough Comparison of Delta Lake, Iceberg and HudiA Thorough Comparison of Delta Lake, Iceberg and Hudi
A Thorough Comparison of Delta Lake, Iceberg and Hudi
 
How to build a streaming Lakehouse with Flink, Kafka, and Hudi
How to build a streaming Lakehouse with Flink, Kafka, and HudiHow to build a streaming Lakehouse with Flink, Kafka, and Hudi
How to build a streaming Lakehouse with Flink, Kafka, and Hudi
 
Google cloud platform
Google cloud platformGoogle cloud platform
Google cloud platform
 
How to Actually Tune Your Spark Jobs So They Work
How to Actually Tune Your Spark Jobs So They WorkHow to Actually Tune Your Spark Jobs So They Work
How to Actually Tune Your Spark Jobs So They Work
 
Presto: SQL-on-anything
Presto: SQL-on-anythingPresto: SQL-on-anything
Presto: SQL-on-anything
 
Trino: A Ludicrously Fast Query Engine - Pulsar Summit NA 2021
Trino: A Ludicrously Fast Query Engine - Pulsar Summit NA 2021Trino: A Ludicrously Fast Query Engine - Pulsar Summit NA 2021
Trino: A Ludicrously Fast Query Engine - Pulsar Summit NA 2021
 

Similaire à Google BigQuery for Everyday Developer

VoxxedDays Bucharest 2017 - Powering interactive data analysis with Google Bi...
VoxxedDays Bucharest 2017 - Powering interactive data analysis with Google Bi...VoxxedDays Bucharest 2017 - Powering interactive data analysis with Google Bi...
VoxxedDays Bucharest 2017 - Powering interactive data analysis with Google Bi...Márton Kodok
 
Complex realtime event analytics using BigQuery @Crunch Warmup
Complex realtime event analytics using BigQuery @Crunch WarmupComplex realtime event analytics using BigQuery @Crunch Warmup
Complex realtime event analytics using BigQuery @Crunch WarmupMárton Kodok
 
Voxxed Days Cluj - Powering interactive data analysis with Google BigQuery
Voxxed Days Cluj - Powering interactive data analysis with Google BigQueryVoxxed Days Cluj - Powering interactive data analysis with Google BigQuery
Voxxed Days Cluj - Powering interactive data analysis with Google BigQueryMárton Kodok
 
DevTalks Keynote Powering interactive data analysis with Google BigQuery
DevTalks Keynote Powering interactive data analysis with Google BigQueryDevTalks Keynote Powering interactive data analysis with Google BigQuery
DevTalks Keynote Powering interactive data analysis with Google BigQueryMárton Kodok
 
GDG DevFest Ukraine - Powering Interactive Data Analysis with Google BigQuery
GDG DevFest Ukraine - Powering Interactive Data Analysis with Google BigQueryGDG DevFest Ukraine - Powering Interactive Data Analysis with Google BigQuery
GDG DevFest Ukraine - Powering Interactive Data Analysis with Google BigQueryMárton Kodok
 
Supercharge your data analytics with BigQuery
Supercharge your data analytics with BigQuerySupercharge your data analytics with BigQuery
Supercharge your data analytics with BigQueryMárton Kodok
 
CodeCamp Iasi - Creating serverless data analytics system on GCP using BigQuery
CodeCamp Iasi - Creating serverless data analytics system on GCP using BigQueryCodeCamp Iasi - Creating serverless data analytics system on GCP using BigQuery
CodeCamp Iasi - Creating serverless data analytics system on GCP using BigQueryMárton Kodok
 
Making advanced analytics accessible to more companies
Making advanced analytics accessible to more companiesMaking advanced analytics accessible to more companies
Making advanced analytics accessible to more companiesMárton Kodok
 
2017 09-27 democratize data products with SQL
2017 09-27 democratize data products with SQL2017 09-27 democratize data products with SQL
2017 09-27 democratize data products with SQLYu Ishikawa
 
MongoDB Tick Data Presentation
MongoDB Tick Data PresentationMongoDB Tick Data Presentation
MongoDB Tick Data PresentationMongoDB
 
MongoDB@sfr.fr
MongoDB@sfr.frMongoDB@sfr.fr
MongoDB@sfr.frbeboutou
 
Executive Intro to BigQuery
Executive Intro to BigQueryExecutive Intro to BigQuery
Executive Intro to BigQueryWilliam M. Cohee
 
Building Data Products with BigQuery for PPC and SEO (SMX 2022)
Building Data Products with BigQuery for PPC and SEO (SMX 2022)Building Data Products with BigQuery for PPC and SEO (SMX 2022)
Building Data Products with BigQuery for PPC and SEO (SMX 2022)Christopher Gutknecht
 
IoT NY - Google Cloud Services for IoT
IoT NY - Google Cloud Services for IoTIoT NY - Google Cloud Services for IoT
IoT NY - Google Cloud Services for IoTJames Chittenden
 
Data Architecture at Vente-Exclusive.com - TOTM Exellys
Data Architecture at Vente-Exclusive.com - TOTM ExellysData Architecture at Vente-Exclusive.com - TOTM Exellys
Data Architecture at Vente-Exclusive.com - TOTM ExellysWout Scheepers
 
Exploring Google APIs with Python
Exploring Google APIs with PythonExploring Google APIs with Python
Exploring Google APIs with Pythonwesley chun
 
apidays LIVE Paris 2021 - Building an analytics API by David Wobrock, Botify
apidays LIVE Paris 2021 - Building an analytics API by David Wobrock, Botifyapidays LIVE Paris 2021 - Building an analytics API by David Wobrock, Botify
apidays LIVE Paris 2021 - Building an analytics API by David Wobrock, Botifyapidays
 
When it all GOes right
When it all GOes rightWhen it all GOes right
When it all GOes rightPavlo Golub
 
BigQuery ML - Machine learning at scale using SQL
BigQuery ML - Machine learning at scale using SQLBigQuery ML - Machine learning at scale using SQL
BigQuery ML - Machine learning at scale using SQLMárton Kodok
 

Similaire à Google BigQuery for Everyday Developer (20)

VoxxedDays Bucharest 2017 - Powering interactive data analysis with Google Bi...
VoxxedDays Bucharest 2017 - Powering interactive data analysis with Google Bi...VoxxedDays Bucharest 2017 - Powering interactive data analysis with Google Bi...
VoxxedDays Bucharest 2017 - Powering interactive data analysis with Google Bi...
 
Complex realtime event analytics using BigQuery @Crunch Warmup
Complex realtime event analytics using BigQuery @Crunch WarmupComplex realtime event analytics using BigQuery @Crunch Warmup
Complex realtime event analytics using BigQuery @Crunch Warmup
 
Voxxed Days Cluj - Powering interactive data analysis with Google BigQuery
Voxxed Days Cluj - Powering interactive data analysis with Google BigQueryVoxxed Days Cluj - Powering interactive data analysis with Google BigQuery
Voxxed Days Cluj - Powering interactive data analysis with Google BigQuery
 
DevTalks Keynote Powering interactive data analysis with Google BigQuery
DevTalks Keynote Powering interactive data analysis with Google BigQueryDevTalks Keynote Powering interactive data analysis with Google BigQuery
DevTalks Keynote Powering interactive data analysis with Google BigQuery
 
GDG DevFest Ukraine - Powering Interactive Data Analysis with Google BigQuery
GDG DevFest Ukraine - Powering Interactive Data Analysis with Google BigQueryGDG DevFest Ukraine - Powering Interactive Data Analysis with Google BigQuery
GDG DevFest Ukraine - Powering Interactive Data Analysis with Google BigQuery
 
Supercharge your data analytics with BigQuery
Supercharge your data analytics with BigQuerySupercharge your data analytics with BigQuery
Supercharge your data analytics with BigQuery
 
CodeCamp Iasi - Creating serverless data analytics system on GCP using BigQuery
CodeCamp Iasi - Creating serverless data analytics system on GCP using BigQueryCodeCamp Iasi - Creating serverless data analytics system on GCP using BigQuery
CodeCamp Iasi - Creating serverless data analytics system on GCP using BigQuery
 
Making advanced analytics accessible to more companies
Making advanced analytics accessible to more companiesMaking advanced analytics accessible to more companies
Making advanced analytics accessible to more companies
 
2017 09-27 democratize data products with SQL
2017 09-27 democratize data products with SQL2017 09-27 democratize data products with SQL
2017 09-27 democratize data products with SQL
 
MongoDB Tick Data Presentation
MongoDB Tick Data PresentationMongoDB Tick Data Presentation
MongoDB Tick Data Presentation
 
MongoDB@sfr.fr
MongoDB@sfr.frMongoDB@sfr.fr
MongoDB@sfr.fr
 
Executive Intro to BigQuery
Executive Intro to BigQueryExecutive Intro to BigQuery
Executive Intro to BigQuery
 
Building Data Products with BigQuery for PPC and SEO (SMX 2022)
Building Data Products with BigQuery for PPC and SEO (SMX 2022)Building Data Products with BigQuery for PPC and SEO (SMX 2022)
Building Data Products with BigQuery for PPC and SEO (SMX 2022)
 
IoT NY - Google Cloud Services for IoT
IoT NY - Google Cloud Services for IoTIoT NY - Google Cloud Services for IoT
IoT NY - Google Cloud Services for IoT
 
Data Architecture at Vente-Exclusive.com - TOTM Exellys
Data Architecture at Vente-Exclusive.com - TOTM ExellysData Architecture at Vente-Exclusive.com - TOTM Exellys
Data Architecture at Vente-Exclusive.com - TOTM Exellys
 
Exploring Google APIs with Python
Exploring Google APIs with PythonExploring Google APIs with Python
Exploring Google APIs with Python
 
apidays LIVE Paris 2021 - Building an analytics API by David Wobrock, Botify
apidays LIVE Paris 2021 - Building an analytics API by David Wobrock, Botifyapidays LIVE Paris 2021 - Building an analytics API by David Wobrock, Botify
apidays LIVE Paris 2021 - Building an analytics API by David Wobrock, Botify
 
When it all GOes right
When it all GOes rightWhen it all GOes right
When it all GOes right
 
Frappe Open Day - August 2018
Frappe Open Day - August 2018Frappe Open Day - August 2018
Frappe Open Day - August 2018
 
BigQuery ML - Machine learning at scale using SQL
BigQuery ML - Machine learning at scale using SQLBigQuery ML - Machine learning at scale using SQL
BigQuery ML - Machine learning at scale using SQL
 

Plus de Márton Kodok

Gen Apps on Google Cloud PaLM2 and Codey APIs in Action
Gen Apps on Google Cloud PaLM2 and Codey APIs in ActionGen Apps on Google Cloud PaLM2 and Codey APIs in Action
Gen Apps on Google Cloud PaLM2 and Codey APIs in ActionMárton Kodok
 
DevBCN Vertex AI - Pipelines for your MLOps workflows
DevBCN Vertex AI - Pipelines for your MLOps workflowsDevBCN Vertex AI - Pipelines for your MLOps workflows
DevBCN Vertex AI - Pipelines for your MLOps workflowsMárton Kodok
 
Discover BigQuery ML, build your own CREATE MODEL statement
Discover BigQuery ML, build your own CREATE MODEL statementDiscover BigQuery ML, build your own CREATE MODEL statement
Discover BigQuery ML, build your own CREATE MODEL statementMárton Kodok
 
Cloud Run - the rise of serverless and containerization
Cloud Run - the rise of serverless and containerizationCloud Run - the rise of serverless and containerization
Cloud Run - the rise of serverless and containerizationMárton Kodok
 
Vertex AI - Unified ML Platform for the entire AI workflow on Google Cloud
Vertex AI - Unified ML Platform for the entire AI workflow on Google CloudVertex AI - Unified ML Platform for the entire AI workflow on Google Cloud
Vertex AI - Unified ML Platform for the entire AI workflow on Google CloudMárton Kodok
 
Vertex AI: Pipelines for your MLOps workflows
Vertex AI: Pipelines for your MLOps workflowsVertex AI: Pipelines for your MLOps workflows
Vertex AI: Pipelines for your MLOps workflowsMárton Kodok
 
Cloud Workflows What's new in serverless orchestration and automation
Cloud Workflows What's new in serverless orchestration and automationCloud Workflows What's new in serverless orchestration and automation
Cloud Workflows What's new in serverless orchestration and automationMárton Kodok
 
Serverless orchestration and automation with Cloud Workflows
Serverless orchestration and automation with Cloud WorkflowsServerless orchestration and automation with Cloud Workflows
Serverless orchestration and automation with Cloud WorkflowsMárton Kodok
 
Serverless orchestration and automation with Cloud Workflows
Serverless orchestration and automation with Cloud WorkflowsServerless orchestration and automation with Cloud Workflows
Serverless orchestration and automation with Cloud WorkflowsMárton Kodok
 
Serverless orchestration and automation with Cloud Workflows
Serverless orchestration and automation with Cloud WorkflowsServerless orchestration and automation with Cloud Workflows
Serverless orchestration and automation with Cloud WorkflowsMárton Kodok
 
BigdataConference Europe - BigQuery ML
BigdataConference Europe - BigQuery MLBigdataConference Europe - BigQuery ML
BigdataConference Europe - BigQuery MLMárton Kodok
 
DevFest Romania 2020 Keynote: Bringing the Cloud to you.
DevFest Romania 2020 Keynote: Bringing the Cloud to you.DevFest Romania 2020 Keynote: Bringing the Cloud to you.
DevFest Romania 2020 Keynote: Bringing the Cloud to you.Márton Kodok
 
Applying BigQuery ML on e-commerce data analytics
Applying BigQuery ML on e-commerce data analyticsApplying BigQuery ML on e-commerce data analytics
Applying BigQuery ML on e-commerce data analyticsMárton Kodok
 
Vibe Koli 2019 - Utazás az egyetem padjaitól a Google Developer Expertig
Vibe Koli 2019 - Utazás az egyetem padjaitól a Google Developer ExpertigVibe Koli 2019 - Utazás az egyetem padjaitól a Google Developer Expertig
Vibe Koli 2019 - Utazás az egyetem padjaitól a Google Developer ExpertigMárton Kodok
 
BigQuery ML - Machine learning at scale using SQL
BigQuery ML - Machine learning at scale using SQLBigQuery ML - Machine learning at scale using SQL
BigQuery ML - Machine learning at scale using SQLMárton Kodok
 
Google Cloud Platform Solutions for DevOps Engineers
Google Cloud Platform Solutions  for DevOps EngineersGoogle Cloud Platform Solutions  for DevOps Engineers
Google Cloud Platform Solutions for DevOps EngineersMárton Kodok
 
GDG DevFest Romania - Architecting for the Google Cloud Platform
GDG DevFest Romania - Architecting for the Google Cloud PlatformGDG DevFest Romania - Architecting for the Google Cloud Platform
GDG DevFest Romania - Architecting for the Google Cloud PlatformMárton Kodok
 
Next18 Extended Targu Mures - Bringing the Cloud to you
Next18 Extended Targu Mures - Bringing the Cloud to youNext18 Extended Targu Mures - Bringing the Cloud to you
Next18 Extended Targu Mures - Bringing the Cloud to youMárton Kodok
 
6. DISZ - Webalkalmazások skálázhatósága a Google Cloud Platformon
6. DISZ - Webalkalmazások skálázhatósága  a Google Cloud Platformon6. DISZ - Webalkalmazások skálázhatósága  a Google Cloud Platformon
6. DISZ - Webalkalmazások skálázhatósága a Google Cloud PlatformonMárton Kodok
 
GCP - A felhőalapú architektúrák és szolgáltatások
GCP - A felhőalapú architektúrák és szolgáltatásokGCP - A felhőalapú architektúrák és szolgáltatások
GCP - A felhőalapú architektúrák és szolgáltatásokMárton Kodok
 

Plus de Márton Kodok (20)

Gen Apps on Google Cloud PaLM2 and Codey APIs in Action
Gen Apps on Google Cloud PaLM2 and Codey APIs in ActionGen Apps on Google Cloud PaLM2 and Codey APIs in Action
Gen Apps on Google Cloud PaLM2 and Codey APIs in Action
 
DevBCN Vertex AI - Pipelines for your MLOps workflows
DevBCN Vertex AI - Pipelines for your MLOps workflowsDevBCN Vertex AI - Pipelines for your MLOps workflows
DevBCN Vertex AI - Pipelines for your MLOps workflows
 
Discover BigQuery ML, build your own CREATE MODEL statement
Discover BigQuery ML, build your own CREATE MODEL statementDiscover BigQuery ML, build your own CREATE MODEL statement
Discover BigQuery ML, build your own CREATE MODEL statement
 
Cloud Run - the rise of serverless and containerization
Cloud Run - the rise of serverless and containerizationCloud Run - the rise of serverless and containerization
Cloud Run - the rise of serverless and containerization
 
Vertex AI - Unified ML Platform for the entire AI workflow on Google Cloud
Vertex AI - Unified ML Platform for the entire AI workflow on Google CloudVertex AI - Unified ML Platform for the entire AI workflow on Google Cloud
Vertex AI - Unified ML Platform for the entire AI workflow on Google Cloud
 
Vertex AI: Pipelines for your MLOps workflows
Vertex AI: Pipelines for your MLOps workflowsVertex AI: Pipelines for your MLOps workflows
Vertex AI: Pipelines for your MLOps workflows
 
Cloud Workflows What's new in serverless orchestration and automation
Cloud Workflows What's new in serverless orchestration and automationCloud Workflows What's new in serverless orchestration and automation
Cloud Workflows What's new in serverless orchestration and automation
 
Serverless orchestration and automation with Cloud Workflows
Serverless orchestration and automation with Cloud WorkflowsServerless orchestration and automation with Cloud Workflows
Serverless orchestration and automation with Cloud Workflows
 
Serverless orchestration and automation with Cloud Workflows
Serverless orchestration and automation with Cloud WorkflowsServerless orchestration and automation with Cloud Workflows
Serverless orchestration and automation with Cloud Workflows
 
Serverless orchestration and automation with Cloud Workflows
Serverless orchestration and automation with Cloud WorkflowsServerless orchestration and automation with Cloud Workflows
Serverless orchestration and automation with Cloud Workflows
 
BigdataConference Europe - BigQuery ML
BigdataConference Europe - BigQuery MLBigdataConference Europe - BigQuery ML
BigdataConference Europe - BigQuery ML
 
DevFest Romania 2020 Keynote: Bringing the Cloud to you.
DevFest Romania 2020 Keynote: Bringing the Cloud to you.DevFest Romania 2020 Keynote: Bringing the Cloud to you.
DevFest Romania 2020 Keynote: Bringing the Cloud to you.
 
Applying BigQuery ML on e-commerce data analytics
Applying BigQuery ML on e-commerce data analyticsApplying BigQuery ML on e-commerce data analytics
Applying BigQuery ML on e-commerce data analytics
 
Vibe Koli 2019 - Utazás az egyetem padjaitól a Google Developer Expertig
Vibe Koli 2019 - Utazás az egyetem padjaitól a Google Developer ExpertigVibe Koli 2019 - Utazás az egyetem padjaitól a Google Developer Expertig
Vibe Koli 2019 - Utazás az egyetem padjaitól a Google Developer Expertig
 
BigQuery ML - Machine learning at scale using SQL
BigQuery ML - Machine learning at scale using SQLBigQuery ML - Machine learning at scale using SQL
BigQuery ML - Machine learning at scale using SQL
 
Google Cloud Platform Solutions for DevOps Engineers
Google Cloud Platform Solutions  for DevOps EngineersGoogle Cloud Platform Solutions  for DevOps Engineers
Google Cloud Platform Solutions for DevOps Engineers
 
GDG DevFest Romania - Architecting for the Google Cloud Platform
GDG DevFest Romania - Architecting for the Google Cloud PlatformGDG DevFest Romania - Architecting for the Google Cloud Platform
GDG DevFest Romania - Architecting for the Google Cloud Platform
 
Next18 Extended Targu Mures - Bringing the Cloud to you
Next18 Extended Targu Mures - Bringing the Cloud to youNext18 Extended Targu Mures - Bringing the Cloud to you
Next18 Extended Targu Mures - Bringing the Cloud to you
 
6. DISZ - Webalkalmazások skálázhatósága a Google Cloud Platformon
6. DISZ - Webalkalmazások skálázhatósága  a Google Cloud Platformon6. DISZ - Webalkalmazások skálázhatósága  a Google Cloud Platformon
6. DISZ - Webalkalmazások skálázhatósága a Google Cloud Platformon
 
GCP - A felhőalapú architektúrák és szolgáltatások
GCP - A felhőalapú architektúrák és szolgáltatásokGCP - A felhőalapú architektúrák és szolgáltatások
GCP - A felhőalapú architektúrák és szolgáltatások
 

Dernier

Direct Style Effect Systems - The Print[A] Example - A Comprehension Aid
Direct Style Effect Systems -The Print[A] Example- A Comprehension AidDirect Style Effect Systems -The Print[A] Example- A Comprehension Aid
Direct Style Effect Systems - The Print[A] Example - A Comprehension AidPhilip Schwarz
 
Unlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language ModelsUnlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language Modelsaagamshah0812
 
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...panagenda
 
10 Trends Likely to Shape Enterprise Technology in 2024
10 Trends Likely to Shape Enterprise Technology in 202410 Trends Likely to Shape Enterprise Technology in 2024
10 Trends Likely to Shape Enterprise Technology in 2024Mind IT Systems
 
How to Choose the Right Laravel Development Partner in New York City_compress...
How to Choose the Right Laravel Development Partner in New York City_compress...How to Choose the Right Laravel Development Partner in New York City_compress...
How to Choose the Right Laravel Development Partner in New York City_compress...software pro Development
 
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdfThe Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdfkalichargn70th171
 
Define the academic and professional writing..pdf
Define the academic and professional writing..pdfDefine the academic and professional writing..pdf
Define the academic and professional writing..pdfPearlKirahMaeRagusta1
 
TECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service providerTECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service providermohitmore19
 
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...Health
 
A Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docxA Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docxComplianceQuest1
 
The Guide to Integrating Generative AI into Unified Continuous Testing Platfo...
The Guide to Integrating Generative AI into Unified Continuous Testing Platfo...The Guide to Integrating Generative AI into Unified Continuous Testing Platfo...
The Guide to Integrating Generative AI into Unified Continuous Testing Platfo...kalichargn70th171
 
VTU technical seminar 8Th Sem on Scikit-learn
VTU technical seminar 8Th Sem on Scikit-learnVTU technical seminar 8Th Sem on Scikit-learn
VTU technical seminar 8Th Sem on Scikit-learnAmarnathKambale
 
AI Mastery 201: Elevating Your Workflow with Advanced LLM Techniques
AI Mastery 201: Elevating Your Workflow with Advanced LLM TechniquesAI Mastery 201: Elevating Your Workflow with Advanced LLM Techniques
AI Mastery 201: Elevating Your Workflow with Advanced LLM TechniquesVictorSzoltysek
 
5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdfWave PLM
 
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time ApplicationsUnveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time ApplicationsAlberto González Trastoy
 
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...Steffen Staab
 
Optimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTVOptimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTVshikhaohhpro
 
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️Delhi Call girls
 
How To Troubleshoot Collaboration Apps for the Modern Connected Worker
How To Troubleshoot Collaboration Apps for the Modern Connected WorkerHow To Troubleshoot Collaboration Apps for the Modern Connected Worker
How To Troubleshoot Collaboration Apps for the Modern Connected WorkerThousandEyes
 

Dernier (20)

Direct Style Effect Systems - The Print[A] Example - A Comprehension Aid
Direct Style Effect Systems -The Print[A] Example- A Comprehension AidDirect Style Effect Systems -The Print[A] Example- A Comprehension Aid
Direct Style Effect Systems - The Print[A] Example - A Comprehension Aid
 
Unlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language ModelsUnlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language Models
 
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
 
10 Trends Likely to Shape Enterprise Technology in 2024
10 Trends Likely to Shape Enterprise Technology in 202410 Trends Likely to Shape Enterprise Technology in 2024
10 Trends Likely to Shape Enterprise Technology in 2024
 
How to Choose the Right Laravel Development Partner in New York City_compress...
How to Choose the Right Laravel Development Partner in New York City_compress...How to Choose the Right Laravel Development Partner in New York City_compress...
How to Choose the Right Laravel Development Partner in New York City_compress...
 
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdfThe Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
 
Define the academic and professional writing..pdf
Define the academic and professional writing..pdfDefine the academic and professional writing..pdf
Define the academic and professional writing..pdf
 
Microsoft AI Transformation Partner Playbook.pdf
Microsoft AI Transformation Partner Playbook.pdfMicrosoft AI Transformation Partner Playbook.pdf
Microsoft AI Transformation Partner Playbook.pdf
 
TECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service providerTECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service provider
 
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
 
A Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docxA Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docx
 
The Guide to Integrating Generative AI into Unified Continuous Testing Platfo...
The Guide to Integrating Generative AI into Unified Continuous Testing Platfo...The Guide to Integrating Generative AI into Unified Continuous Testing Platfo...
The Guide to Integrating Generative AI into Unified Continuous Testing Platfo...
 
VTU technical seminar 8Th Sem on Scikit-learn
VTU technical seminar 8Th Sem on Scikit-learnVTU technical seminar 8Th Sem on Scikit-learn
VTU technical seminar 8Th Sem on Scikit-learn
 
AI Mastery 201: Elevating Your Workflow with Advanced LLM Techniques
AI Mastery 201: Elevating Your Workflow with Advanced LLM TechniquesAI Mastery 201: Elevating Your Workflow with Advanced LLM Techniques
AI Mastery 201: Elevating Your Workflow with Advanced LLM Techniques
 
5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf
 
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time ApplicationsUnveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
 
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
 
Optimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTVOptimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTV
 
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
 
How To Troubleshoot Collaboration Apps for the Modern Connected Worker
How To Troubleshoot Collaboration Apps for the Modern Connected WorkerHow To Troubleshoot Collaboration Apps for the Modern Connected Worker
How To Troubleshoot Collaboration Apps for the Modern Connected Worker
 

Google BigQuery for Everyday Developer

  • 1. Google BigQuery for the Everyday Developer Márton Kodok Senior Software Engineer at REEA twitter: martonkodok stackoverflow: pentium10 github: @pentium10 IV. IT&C Innovation Conference - October 2016 - Sovata, Romania
  • 2. 1. Big Data Challenges 2. Infrastructure overview 3. Choosing a Strategy 4. BigQuery - Use Cases 5. Q & A BigQuery for the Everyday Developer @martonkodok Agenda
  • 3. BigQuery for the Everyday Developer @martonkodok The 5 V’s of Big Data
  • 4. Every scientist who needs big data analytics to save millions of lives should have that power Legacy systems don’t provide the power. The simple fact is that you are brilliant but your brilliant ideas require complex analytics. Traditional solutions are not applicable. BigQuery for the Everyday Developer @martonkodok Big Data movement
  • 5. With more and more people realizing the value of realtime analytics, the race to build realtime middleware has begun. The Plan: have oversight over developments as they happen. * More slides about Beanstalkd at slideshare.net/martonkodok BigQuery for the Everyday Developer @martonkodok Fast computing systems
  • 6. BigQuery for the Everyday Developer @martonkodok Simple legacy Website stack/infrastructure
  • 7. BigQuery for the Everyday Developer @martonkodok Infrastructure with Middleware
  • 8. Task: Need backend/database for STORE, QUERY, EXTRACT deep analytics: ● Events (website events, app events, email events) ● Entities (user data, business entities, A/B split test data) ● Metrics (sales data, code commit history, profiler data) ● Streaming (sensor data, IoT data burst) ● 3rd party Analytics (Google Analytics, Mixpanel) ● Local generated data (log file entries, trace output, unstructured data...) ❏ Run Ad-Hoc queries (without Developer) ❏ Be realtime BigQuery for the Everyday Developer @martonkodok Requirements
  • 9. ● Terabyte scalable storage ● Real-time row ingestion ● Ask sophisticated queries ● Query-performance ● Low-maintenance ● Cost effective ● Wire them up easily Goal: Store everything accessible by SQL immediately. BigQuery for the Everyday Developer @martonkodok Desired system/platform Equipment strategy * people still required Engines: ❏ MongoDB, Riak, Redis ❏ ELK Stack (Elastic-Logstash-Kibana)... ❏ Cassandra, Hive, Hadoop... ❏ Amazon RedShift, Google BigQuery... In-House Hosted Managed
  • 10. BigQuery for the Everyday Developer @martonkodok Google BigQuery
  • 11. ● Analytics-as-a-Service - Data Warehouse in the Cloud ● Fully-Managed by Google (US or EU zone) ● Scales into Petabytes ● Ridiculously fast ● Decent pricing (queries $5/TB, storage: $20/TB) *October 2016 pricing ● 100.000 rows / sec Streaming API ● Open Interfaces (Web UI, BQ command line tool, REST, ODBC) ● Familiar DB Structure (table, views, record, nested, JSON) ● Convenience of SQL + Javascript UDF (User Defined Functions) ● Integrates with Google Sheets + Google Cloud Storage + Pub/Sub connectors ● Client libraries available in YFL (your favorite languages) BigQuery for the Everyday Developer @martonkodok What is BigQuery?
  • 12. ● ACL - row level locking (individual or group based) ● Columnar storage (max 10 000 columns in table) ● Rich SQL: JSON, IP, Math, RegExp, Window functions ● Data types: String, Integer, Float, Boolean, Timestamp, Record, Nested, Struct, Array. ● Append-only tables prefered (DML syntax available) ● Batch load file size limits: 1TB (CSV or JSON) ● User Defined Functions in SQL or Javascript ● Date partitioned tables BigQuery for the Everyday Developer @martonkodok BigQuery: Convenience of SQL
  • 13. * 1 Petabyte storage, 10 TB inserts, 100 TB queries => $22000 Queries Storage Ingestion ➔ 1 TB per month free ➔ 5 USD per TB ➔ only pay for the columns you use in your query ➔ 20 USD per TB frequently accessed data ➔ 10 USD per TB long term storage 90 days ➔ Batch load free (CSV/JSON) ➔ Exporting free ➔ Table copy free ➔ 50 USD per TB Estimate 1 - Storage 5 TB - Streaming Inserts 1 TB - Queries 3 TB Monthly total: $165 Estimate 2 - Storage 24 TB - Streaming Inserts 1 TB - Queries 50 TB Monthly total: $788 BigQuery for the Everyday Developer @martonkodok BigQuery Costs - October 2016
  • 14. BigQuery for the Everyday Developer @martonkodok +--------------------------+-----------+----------+ | order_id | INTEGER | REQUIRED | | ... | | | Example: | products | RECORD | REPEATED | ”products”:[p1,p2] | products.name | STRING | NULLABLE | | products.product_id | INTEGER | NULLABLE | ”products”:[{”name”:”p1”, | products.attributes | STRING | REPEATED | ”product_id”:10, | products.price | FLOAT | NULLABLE | ”attributes”:[”red”,”xl”] | ... | | | ,”price”:9.99}, | common | RECORD | NULLABLE | {”name”:”p2”, | common.insert_id | INTEGER | REQUIRED | ”product_id”:20, | common.event | INTEGER | REQUIRED | ”attributes”:[“red”,”xl”] | common.user_id | INTEGER | REQUIRED | ,”price”:9.99}] | common.timestamp | TIMESTAMP | REQUIRED | | common.utm | RECORD | NULLABLE | | common.utm.source | STRING | NULLABLE | | common.utm.medium | STRING | NULLABLE | | common.utm.campaign | STRING | NULLABLE | | common.utm.content | STRING | NULLABLE | | common.utm.term | STRING | NULLABLE | | meta | STRING | NULLABLE | +--------------------------+-----------+----------+ Schema modelling / JSON
  • 15. 1. Use aggregation MIN/MAX on timestamp to find first/last and join back to the same table. 2. Use analytic functions FIRST_VALUE and LAST_VALUE when only one column is needed SELECT LAST_VALUE(email) OVER( PARTITION BY user_id ORDER BY timestamp ASC) AS email_last ... 3. Using Window Functions for full row selection SELECT email, firstname, lastname FROM (SELECT email, firstname, lastname row_number() over (partition BY user_id ORDER BY timestamp DESC) seqnum FROM [profile_event] ) WHERE seqnum=1 BigQuery for the Everyday Developer @martonkodok Append only tables - Get last value
  • 16. BigQuery for the Everyday Developer @martonkodok Streaming insert time (ms) - last year
  • 17. ● Funnel Analysis BigQuery for the Everyday Developer @martonkodok Achievements
  • 18. BigQuery for the Everyday Developer @martonkodok Funnel analysis: Time on upsell pages
  • 19. Example HITS chain: ● article1 -> page2 -> page3 -> page4 -> orderpage1 -> thankyoupage1 ● page1 -> article2-> page3 -> orderpage2 -> ... BigQuery for the Everyday Developer @martonkodok Attribute credit to first article visited on purchase
  • 20. ● Funnel Analysis ● Email URL click heatmap BigQuery for the Everyday Developer @martonkodok Achievements
  • 21. BigQuery for the Everyday Developer @martonkodok Email URL clicks heat-map
  • 22. ● Funnel Analysis ● Email URL click heatmap ● Email Health Dashboard (SPAM, ISP deferral, content A/B split tests, trends or low open rate campaigns) ● Advanced segmentation (all raw data stored) ● Behavioral analytics - engaged users etc... BigQuery for the Everyday Developer @martonkodok Achievements Continued
  • 23. ● no provisioning/deploy ● no running out of resources ● no more focus on large scale execution plan ● no need to re-implement tricky concepts (time windows / join streams) ● pay only the columns we have in your queries ● run raw ad-hoc queries (either by analysts/sales or Devs) ● no more throwing away-, expiring-, aggregating old data. BigQuery for the Everyday Developer @martonkodok Our benefits
  • 24. 1. githubarchive.org: 20+ event types available since 2012 a. pull request latency b. expressions, emotions in commit messages c. etc... 2. httparchive.org a. sites that use multiple popular JS frameworks b. sites that use multiple jQuery versions 3. Export for Google Analytics 4. MLab - broadband connection performance (26B rows) 5. GSOD - samples of weather (rainfall, temp…) 6. US Natality - 1969 to 2008 7. Wikipedia edits BigQuery for the Everyday Developer @martonkodok BigQuery: Sample projects to try out
  • 25. BigQuery for the Everyday Developer @martonkodok HttpArchive - multiple JS frameworks
  • 26. BigQuery for the Everyday Developer @martonkodok HttpArchive - multiple jQuery versions
  • 27. Questions? Thank you. Slides available soon on: slideshare.net/martonkodok