Powering Interactive Data Analysis with Google BigQuery
Márton Kodok / @martonkodok
Google Developer Expert at REEA
May 2017 - Bucharest, Romania
● Geek. Hiker. Do-er.
● Among the top 3 Romanians on Stack Overflow
● Google Developer Expert on Cloud technologies
● Crafting Web/Mobile backends at REEA.net
● BigQuery and database engine expert
● Active in mentoring
Twitter: @martonkodok
StackOverflow: pentium10
Slideshare: martonkodok
GitHub: pentium10
Powering Interactive Data Analysis with Google BigQuery @martonkodok
About me
Agenda
The Challenge
Powering interactive Data Analysis/Reporting system
Architecture Overview
Strategy & Tricks
Winning Solution
❏ Need backend/database to STORE, QUERY, EXTRACT data
❏ Deep analytics - large, multi-source, complex, unstructured
❏ Be real time
❏ Terabyte scale
❏ Cost effective
❏ Run Ad-Hoc reports - as the occasion requires
❏ Without Developer - interactive
❏ Minimal engineering efforts
❏ Support streaming - data is generated on a continual basis
❏ Withstand #BlackFriday
❏ Simple query language (preferably SQL / JavaScript)
The Challenge
“We can't solve problems by using the same kind of thinking we used when we created them”
- Albert Einstein
The Challenge
Legacy Business Reporting System
[Diagram components: Web, Mobile, Web Server, Database, SQL, Cached, Platform Services, CMS/Framework, Report & Share, Business Analysis, Scheduled Tasks, Batch Processing, Compute Engine, Multiple Instances]
Behind the scenes: days to insights
● Minutes to kick in
● Hours to run batch processing
● Hours to clean and aggregate
DAYS TO INSIGHTS
● Terabyte scalable storage
● Real-time row ingestion
● Ask sophisticated queries
● Query-performance
● Low-maintenance
● Cost effective
● Wire them up easily
Goal: Store everything accessible by SQL immediately.
Desired system/platform
Engines:
● MongoDB, Riak, Redis
● ELK Stack (Elasticsearch-Logstash-Kibana)
● Cassandra, Hive, Hadoop...
● Amazon Athena, Google BigQuery...
● Analytics-as-a-Service - Data Warehouse in the Cloud
● Fully-Managed by Google (US or EU zone)
● Scales into Petabytes
● Ridiculously fast
● SQL 2011 Standard + JavaScript UDF (User Defined Functions)
● Familiar DB Structure (table, views, record, nested, JSON)
● Open Interfaces (Web UI, BQ command line tool, REST, ODBC)
● Integrates with Google Sheets + Google Cloud Storage + Pub/Sub connectors
● Client libraries available in YFL (your favorite languages)
● Decent pricing (queries: $5/TB; storage: $20/TB, cold: $10/TB) *May 2017
What is BigQuery?
● Columnar storage (max 10 000 columns in table)
● Batch load file size limits: 5TB (CSV or JSON)
● User Defined Functions in SQL or JavaScript
● Rich SQL 2011: JSON, IP, Math, RegExp, Window functions
● Data types: String, Integer, Float, Boolean, Timestamp,
Record, Nested, Struct, Array.
● Append-only tables preferred (DML syntax available)
● Day partitioned tables
● ACL - row-level access control (individual or group based)
BigQuery: Convenience of SQL
* 1 Petabyte storage, 10 TB inserts, 100 TB queries => $22000
Queries
➔ 1 TB per month free
➔ 5 USD per TB
➔ only pay for the columns you use in your query
Storage
➔ 20 USD per TB for frequently accessed data
➔ 10 USD per TB for long-term storage (after 90 days)
Ingestion
➔ Batch load free (CSV/JSON)
➔ Exporting free
➔ Table copy free
➔ Streaming 50 USD per TB
Estimate 1
- Storage 5 TB
- Streaming Inserts 1 TB
- Queries 3 TB
Monthly total: $165
Estimate 2
- Storage 25 TB
- Streaming Inserts 1 TB
- Queries 50 TB
Monthly total: $788
BigQuery Costs - May 2017
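The per-TB rates on this slide can be turned into a rough monthly estimate. The sketch below is only an illustration of that arithmetic, not an official pricing formula: the function name and parameters are made up for this example, it assumes decimal terabytes, and by default it bills all storage at the active rate and ignores the free query tier. Under those assumptions Estimate 1 comes out at the quoted $165, while Estimate 2 and the petabyte example land near, but not exactly on, the quoted figures, which presumably rest on assumptions this sketch does not model.

```python
# Per-TB rates as quoted on the slide (May 2017); these are not current prices.
QUERY_USD_PER_TB = 5
ACTIVE_STORAGE_USD_PER_TB = 20
LONG_TERM_STORAGE_USD_PER_TB = 10
STREAMING_USD_PER_TB = 50
FREE_QUERY_TB_PER_MONTH = 1


def monthly_cost(storage_tb, streaming_tb, query_tb,
                 long_term_tb=0, apply_free_tier=False):
    """Rough monthly cost in USD (hypothetical helper, simplified model).

    storage_tb   -- frequently accessed storage, billed at the active rate
    streaming_tb -- data ingested through streaming inserts
    query_tb     -- data scanned by queries (only the columns referenced)
    long_term_tb -- storage billed at the long-term rate
    """
    billable_query_tb = query_tb
    if apply_free_tier:
        # Subtract the free monthly query allowance, never going below zero.
        billable_query_tb = max(query_tb - FREE_QUERY_TB_PER_MONTH, 0)
    return (storage_tb * ACTIVE_STORAGE_USD_PER_TB
            + long_term_tb * LONG_TERM_STORAGE_USD_PER_TB
            + streaming_tb * STREAMING_USD_PER_TB
            + billable_query_tb * QUERY_USD_PER_TB)


if __name__ == "__main__":
    print(monthly_cost(5, 1, 3))        # Estimate 1 inputs
    print(monthly_cost(25, 1, 50))      # Estimate 2 inputs
    print(monthly_cost(1000, 10, 100))  # 1 PB storage, 10 TB inserts, 100 TB queries
```

Batch loads, exports and table copies are left out because the slide lists them as free.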
Architecting for The Cloud
[Diagram components: BigQuery, On-Premises Servers, Pipelines, ETL, Engine, Event Sourcing, Frontend, Platform Services, Metrics / Logs / Streaming]
Access to Insights without Developer support
[Diagram components: Analytics Backend, BigQuery, On-Premises Servers, Pipelines, ETL, Engine, Event Sourcing, Frontend, Platform Services, Metrics / Logs / Streaming, Development Team, Data Analysts, Report & Share, Business Analysis, Tools (Tableau, QlikView, Data Studio, Internal Dashboard), Database, SQL]
Data Pipeline Integration
[Diagram components: Analytics Backend, BigQuery, On-Premises Servers, Pipelines ETL, Database, SQL, Standard Devices, HTTPS, Ingest, Events, Monitoring, Logging, FluentD, Cloud Storage, Report & Share, Business Analysis, Firebase, archive, Load, Export, Replay, Application Servers, Servers]
<filter frontend.user.*>
  @type record_transformer
  enable_ruby
  remove_keys host
  <record>
    # Add a "bq" field carrying an insert ID, the host, and the event time in epoch seconds
    bq {"insert_id":"${uid}","host":"${host}","created":"${time.to_i}"}
  </record>
</filter>
<match frontend.user.*>
  # Copy every matching event to both stores below
  @type copy
  <store>
    # Local file archive, sliced per tag and per day
    @type forest
    subtype file
    <template>
      path /tank/storage/${tag}.*.log
      time_slice_format %Y%m%d
      time_slice_wait 10m
    </template>
  </store>
  <store>
    # Streaming inserts into BigQuery
    @type bigquery
    method insert
    ...
  </store>
</match>

# ... bigquery <store> section, continued ...
auth_method json_key
json_key /etc/td-agent/keys/key-31da042be48c.json
project project_id
dataset dataset_name
time_field timestamp
time_slice_format %Y%m%d
# Write into the day partition that matches the time slice
table user$%{time_slice}
ignore_unknown_values
schema_path /etc/td-agent/schema/user_login.json
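To make the two BigQuery-specific ideas in the configuration above more concrete, here is a minimal Python sketch of (a) the `table$YYYYMMDD` partition decorator and (b) a streaming-insert request body in the shape used by the `tabledata.insertAll` REST method, where each row carries an `insertId` next to its `json` payload. The helper names, the sample event and the choice of UTC for the day boundary are assumptions made for illustration; this is not how the Fluentd plugin is implemented internally.

```python
from datetime import datetime, timezone


def partition_table(table, epoch_seconds):
    """Return 'table$YYYYMMDD' for the UTC day containing epoch_seconds."""
    day = datetime.fromtimestamp(epoch_seconds, tz=timezone.utc).strftime("%Y%m%d")
    return f"{table}${day}"


def insert_all_body(events, ignore_unknown_values=True):
    """Build a tabledata.insertAll-style request body from event dicts.

    Each event is assumed (hypothetically) to carry a 'bq' object with an
    'insert_id', similar to what the record_transformer filter adds.
    """
    rows = []
    for event in events:
        meta = event["bq"]
        # Everything except the helper metadata becomes the row payload.
        payload = {key: value for key, value in event.items() if key != "bq"}
        rows.append({"insertId": meta["insert_id"], "json": payload})
    return {"ignoreUnknownValues": ignore_unknown_values, "rows": rows}


if __name__ == "__main__":
    # Hypothetical login event; field names are illustrative only.
    event = {
        "user_id": 42,
        "action": "login",
        "bq": {"insert_id": "abc-123", "host": "web-1", "created": "1495065600"},
    }
    print(partition_table("user", int(event["bq"]["created"])))
    print(insert_all_body([event]))
```

Sending a stable `insertId` with each row lets BigQuery make a best-effort attempt to drop duplicates when the same row is retried.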
● On data that is difficult to process/analyze using traditional databases
● On exploring unstructured data
● Not a replacement for traditional DBs, but it complements the system
● Applying JavaScript UDFs on columnar storage to resolve complex tasks
(e.g. JS for natural language processing)
● On streams (form wizard ...)
● On IoT streams
● Major strength is handling large datasets
Where to use BigQuery?
Go to the BigQuery web UI.
https://bigquery.cloud.google.com/
Query a public dataset
Romanian stations that record the most days of snow
Mentions of RO politicians since ‘16 Nov in GDELT articles
● Funnel Analysis
Achievements
Funnel analysis: Time on upsell pages
Example HITS chain:
● article1 -> page2 -> page3 -> page4 -> orderpage1 -> thankyoupage1
● page1 -> article2 -> page3 -> orderpage2 -> ...
Attribute credit to first article visited on purchase
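The attribution idea can be sketched in a few lines of Python. This is only an illustration of the logic, not the query used in the talk: it assumes each session is an ordered list of page names, that article pages start with `article`, and that reaching a page starting with `thankyoupage` marks a purchase, following the naming in the example hit chains.

```python
from collections import Counter


def first_article_credit(hits):
    """Return the first article seen before a purchase, or None.

    hits -- ordered page names for one session (hypothetical naming:
            'article*' for articles, 'thankyoupage*' for completed orders).
    """
    first_article = None
    for page in hits:
        if first_article is None and page.startswith("article"):
            first_article = page
        if page.startswith("thankyoupage"):
            # Purchase reached: credit goes to the first article, if any.
            return first_article
    return None  # the session never reached a thank-you page


def credit_per_article(sessions):
    """Count purchases attributed to each first-visited article."""
    credits = Counter()
    for hits in sessions:
        article = first_article_credit(hits)
        if article is not None:
            credits[article] += 1
    return credits


if __name__ == "__main__":
    sessions = [
        ["article1", "page2", "page3", "page4", "orderpage1", "thankyoupage1"],
        ["page1", "article2", "page3", "orderpage2"],  # no purchase in the shown part
    ]
    print(credit_per_article(sessions))
```

In BigQuery the same logic would typically be expressed in SQL over the raw hits table; the Python version is used here only to make the rule explicit.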
● Funnel Analysis
● Email URL click heatmap
Achievements
Email URL clicks heat-map
● Funnel Analysis
● Email URL click heatmap
● Email Health Dashboard (SPAM, ISP deferral, content
A/B split tests, trends or low open rate campaigns)
● Advanced segmentation (all raw data stored)
● Behavioral analytics - engaged users etc...
Achievements Continued
● no provisioning/deploy
● no running out of resources
● no more focus on large-scale execution plans
● no need to re-implement tricky concepts
(time windows / join streams)
● pay only for the columns we use in our queries
● run raw ad-hoc queries (by analysts, sales, or devs)
● no more throwing away, expiring, or aggregating old data
Our benefits
● No manual sharding
● No capacity guessing
● No idle resources
● No maintenance windows
● No manual scaling
● No file management
BigQuery: Serverless Data Warehouse
BigQuery: Sample projects to try out 1
BigQuery: Sample projects to try out 2
HttpArchive - multiple JS frameworks
HttpArchive - multiple jQuery versions
Easily Build Custom Reports and Dashboards
Questions?
Thank you.
Slides available on: slideshare.net/martonkodok

I TAKE Unconference 2017 - Powering interactive data analysis with Google BigQuery

  • 1. Powering Interactive Data Analysis with Google BigQuery Márton Kodok / @martonkodok Google Developer Expert at REEA May 2017 - Bucharest, Romania
  • 2. ● Geek. Hiker. Do-er. ● Among the Top3 romanians on Stackoverflow ● Google Developer Expert on Cloud technologies ● Crafting Web/Mobile backends at REEA.net ● BigQuery and database engine expert ● Active in mentoring Twitter: @martonkodok StackOverflow: pentium10 Slideshare: martonkodok GitHub: pentium10 Powering Interactive Data Analysis with Google BigQuery @martonkodok About me
  • 3. Powering Interactive Data Analysis with Google BigQuery @martonkodok Agenda The Challenge Powering interactive Data Analysis/Reporting system Architecture Overview Strategy & Tricks Winning Solution
  • 4. ❏ Need backend/database to STORE, QUERY, EXTRACT data ❏ Deep analytics - large, multi-source, complex, unstructured ❏ Be real time ❏ Terabyte scale ❏ Cost effective ❏ Run Ad-Hoc reports - as the occasion requires ❏ Without Developer - interactive ❏ Minimal engineering efforts ❏ Support streaming - data is generated on a continual basis ❏ Withstand #BlackFriday ❏ Simple Query language (prefered SQL / Javascript) Powering Interactive Data Analysis with Google BigQuery @martonkodok The Challenge
  • 5. “We can't solve problems by using the same kind of thinking we used when we created them” -Albert Einstein Powering Interactive Data Analysis with Google BigQuery @martonkodok The Challenge
  • 6. Powering Interactive Data Analysis with Google BigQuery @martonkodok Legacy Business Reporting System Web Mobile Web Server Database SQL Cached Platform Services CMS/Framework Report & Share Business Analysis Scheduled Tasks Batch Processing Compute Engine Multiple Instances
  • 7. Powering Interactive Data Analysis with Google BigQuery @martonkodok Web Mobile Web Server Database SQL Cached Platform Services CMS/Framework Report & Share Business Analysis Scheduled Tasks Batch Processing Compute Engine Multiple Instances BehindtheScenes: DaysToInsights
  • 8. Powering Interactive Data Analysis with Google BigQuery @martonkodok Legacy Business Reporting System Web Mobile Web Server Database SQL Cached Platform Services CMS/Framework Report & Share Business Analysis Scheduled Tasks Batch Processing Compute Engine Multiple Instances Minutes to kick in Hours to Run Batch Processing Hours to Clean and Aggregate DAYS TO INSIGHTS
  • 9. Desired system/platform ● Terabyte scalable storage ● Real-time row ingestion ● Ask sophisticated queries ● Query performance ● Low maintenance ● Cost effective ● Wire them up easily Goal: Store everything, accessible by SQL immediately. Engines: ● MongoDB, Riak, Redis ● ELK Stack (Elasticsearch-Logstash-Kibana) ● Cassandra, Hive, Hadoop... ● Amazon Athena, Google BigQuery...
  • 10. Powering Interactive Data Analysis with Google BigQuery @martonkodok
  • 11. What is BigQuery? ● Analytics-as-a-Service - Data Warehouse in the Cloud ● Fully managed by Google (US or EU zone) ● Scales into petabytes ● Ridiculously fast ● SQL 2011 Standard + JavaScript UDF (User Defined Functions) ● Familiar DB structure (tables, views, records, nested, JSON) ● Open interfaces (Web UI, BQ command line tool, REST, ODBC) ● Integrates with Google Sheets + Google Cloud Storage + Pub/Sub connectors ● Client libraries available in YFL (your favorite languages) ● Decent pricing (queries: $5/TB, storage: $20/TB, cold: $10/TB) *May 2017
  • 12. BigQuery: Convenience of SQL ● Columnar storage (max 10 000 columns per table) ● Batch load file size limit: 5 TB (CSV or JSON) ● User Defined Functions in SQL or JavaScript ● Rich SQL 2011: JSON, IP, Math, RegExp, Window functions ● Data types: String, Integer, Float, Boolean, Timestamp, Record, Nested, Struct, Array ● Append-only tables preferred (DML syntax available) ● Day-partitioned tables ● ACL - row level locking (individual or group based)
  • 13. BigQuery Costs - May 2017
      Queries: ➔ 1 TB per month free ➔ 5 USD per TB ➔ only pay for the columns you use in your query
      Storage: ➔ 20 USD per TB for frequently accessed data ➔ 10 USD per TB for long-term storage (90 days)
      Ingestion: ➔ Batch load free (CSV/JSON) ➔ Exporting free ➔ Table copy free ➔ Streaming 50 USD per TB
      Estimate 1: Storage 5 TB, Streaming Inserts 1 TB, Queries 3 TB. Monthly total: $165
      Estimate 2: Storage 25 TB, Streaming Inserts 1 TB, Queries 50 TB. Monthly total: $788
      * 1 Petabyte storage, 10 TB inserts, 100 TB queries => $22000
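The per-TB rates on this slide can be turned into a rough monthly estimate. The sketch below is only back-of-the-envelope arithmetic using the May 2017 rates quoted above; the function name `monthly_cost` and its parameters are made up for illustration, and the result will not exactly match the slide's totals ($165 and $788), which presumably came from a pricing calculator with its own rounding and unit conversions.

```python
# Rates quoted on the slide (May 2017), in USD per TB. Current prices may differ.
QUERY_USD_PER_TB = 5.0
ACTIVE_STORAGE_USD_PER_TB = 20.0
LONG_TERM_STORAGE_USD_PER_TB = 10.0
STREAMING_USD_PER_TB = 50.0
FREE_QUERY_TB_PER_MONTH = 1.0


def monthly_cost(storage_tb, streaming_tb, query_tb,
                 long_term_tb=0.0, apply_free_tier=True):
    """Rough monthly bill in USD; batch loads, exports and table copies are free."""
    billable_query_tb = query_tb
    if apply_free_tier:
        # The first TB of queries each month is not billed.
        billable_query_tb = max(query_tb - FREE_QUERY_TB_PER_MONTH, 0.0)
    return (storage_tb * ACTIVE_STORAGE_USD_PER_TB
            + long_term_tb * LONG_TERM_STORAGE_USD_PER_TB
            + streaming_tb * STREAMING_USD_PER_TB
            + billable_query_tb * QUERY_USD_PER_TB)


# The two scenarios from the slide.
estimate_1 = monthly_cost(storage_tb=5, streaming_tb=1, query_tb=3)
estimate_2 = monthly_cost(storage_tb=25, streaming_tb=1, query_tb=50)
print(estimate_1)  # 160.0 (165.0 if the free query TB is ignored)
print(estimate_2)  # 795.0 (800.0 if the free query TB is ignored)
```

Storage dominates both scenarios, which is why moving rarely touched tables to the long-term rate has more effect on the bill than trimming queries.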
  • 14. Architecting for The Cloud [architecture diagram: BigQuery, On-Premises Servers, Pipelines, ETL Engine, Event Sourcing, Frontend, Platform Services, Metrics / Logs / Streaming]
  • 15. Access to Insights without Developer support [architecture diagram: Analytics Backend, BigQuery, On-Premises Servers, Pipelines, ETL Engine, Event Sourcing, Frontend, Platform Services, Metrics / Logs / Streaming, Development Team, Data Analysts, Report & Share, Business Analysis, Tools (Tableau, QlikView, Data Studio, Internal Dashboard), Database SQL]
  • 16. Data Pipeline Integration [architecture diagram: Analytics Backend, BigQuery, On-Premises Servers, Pipelines, ETL, Database SQL, Standard Devices, HTTPS, Ingest Events, Monitoring, Logging, FluentD, Cloud Storage, Report & Share, Business Analysis, Firebase, archive, Load, Export, Replay, Application Servers]
  • 17. [FluentD configuration]
      <filter frontend.user.*>
        @type record_transformer
        enable_ruby
        remove_keys host
        <record>
          bq {"insert_id":"${uid}","host":"${host}","created":"${time.to_i}"}
        </record>
      </filter>
      <match frontend.user.*>
        @type copy
        <store>
          @type forest
          subtype file
          <template>
            path /tank/storage/${tag}.*.log
            time_slice_format %Y%m%d
            time_slice_wait 10m
          </template>
        </store>
        <store>
          @type bigquery
          method insert
          ...
        </store>
      </match>

      ….bigquery section continued….
      auth_method json_key
      json_key /etc/td-agent/keys/key-31da042be48c.json
      project project_id
      dataset dataset_name
      time_field timestamp
      time_slice_format %Y%m%d
      table user$%{time_slice}
      ignore_unknown_values
      schema_path /etc/td-agent/schema/user_login.json
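To make the two BigQuery-specific pieces of this configuration concrete, the sketch below mirrors them in plain Python: the `bq` record built by the filter (`insert_id`, `host`, `created`) and the day-partition target produced by `table user$%{time_slice}` with `time_slice_format %Y%m%d`. It does not call FluentD or BigQuery; the helper names `build_row` and `partition_table` and the sample values are hypothetical. The `insert_id` field presumably serves as the de-duplication key for streaming inserts, and `created` is kept as an integer here although the config emits it as a string.

```python
import json
from datetime import datetime, timezone


def partition_table(base_table, event_time):
    """Return the day-partition decorator form, e.g. 'user$20170515'."""
    return f"{base_table}${event_time.strftime('%Y%m%d')}"


def build_row(uid, host, event_time):
    """Mirror the `bq` record from the filter: insert_id, host, created (epoch seconds)."""
    return {
        "insert_id": uid,
        "host": host,
        "created": int(event_time.timestamp()),
    }


# Hypothetical event values, chosen only for illustration.
event_time = datetime(2017, 5, 15, 12, 0, 0, tzinfo=timezone.utc)
row = build_row("abc-123", "web-01", event_time)
table = partition_table("user", event_time)

print(table)            # user$20170515
print(json.dumps(row))  # {"insert_id": "abc-123", "host": "web-01", "created": 1494849600}
```

Writing each event into the partition for its own day keeps later queries cheap, since a report for one day only has to scan that day's partition.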
  • 18. Where to use BigQuery? ● On data that is difficult to process/analyze using traditional databases ● On exploring unstructured data ● Not a replacement for traditional DBs, but it complements the system ● Applying JavaScript UDFs on columnar storage to resolve complex tasks (e.g. JS for natural language processing) ● On streams (form wizard ...) ● On IoT streams ● Major strength is handling large datasets
  • 19. Query a public dataset: Go to the BigQuery web UI. https://bigquery.cloud.google.com/
  • 20. Romanian stations that record the most days of snow
  • 21. Mentions of RO politicians since Nov '16 in GDELT articles
  • 22. Achievements ● Funnel Analysis
  • 23. Funnel analysis: Time on upsell pages
  • 24. Attribute credit to first article visited on purchase. Example HITS chain: ● article1 -> page2 -> page3 -> page4 -> orderpage1 -> thankyoupage1 ● page1 -> article2 -> page3 -> orderpage2 -> ...
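The attribution rule on this slide can be sketched in a few lines of Python. This is an illustrative sketch, not the query used in the talk: the helper name `first_article_credit`, and the conventions that article pages start with `article` and that a `thankyoupage` hit marks a completed purchase, are assumptions read off the example chain.

```python
def first_article_credit(hits):
    """Return the first article in a hit chain that led to a purchase, else None."""
    # Assumption: a 'thankyoupage...' hit means the order was completed.
    purchased = any(hit.startswith("thankyoupage") for hit in hits)
    if not purchased:
        return None
    # Credit goes to the earliest article in the chain, wherever it appears.
    for hit in hits:
        if hit.startswith("article"):
            return hit
    return None


# First chain from the slide: the visit starts on an article and ends in a purchase.
chain = ["article1", "page2", "page3", "page4", "orderpage1", "thankyoupage1"]
credited = first_article_credit(chain)
print(credited)  # article1

# Hypothetical chain without a purchase: nothing is credited.
print(first_article_credit(["page1", "article2", "page3"]))  # None
```

In BigQuery itself the same logic would typically be written as a window function over hits ordered by time within a session, which is where keeping every raw hit pays off.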
  • 25. Achievements ● Funnel Analysis ● Email URL click heatmap
  • 26. Email URL clicks heat-map
  • 27. Achievements Continued ● Funnel Analysis ● Email URL click heatmap ● Email Health Dashboard (SPAM, ISP deferral, content A/B split tests, trends or low open rate campaigns) ● Advanced segmentation (all raw data stored) ● Behavioral analytics - engaged users etc...
  • 28. Our benefits ● no provisioning/deploy ● no running out of resources ● no more focus on large scale execution plan ● no need to re-implement tricky concepts (time windows / join streams) ● pay only for the columns used in our queries ● run raw ad-hoc queries (either by analysts/sales or devs) ● no more throwing away, expiring, or aggregating old data
  • 29. BigQuery: Serverless Data Warehouse ● No manual sharding ● No capacity guessing ● No idle resources ● No maintenance windows ● No manual scaling ● No file mgmt
  • 30. BigQuery: Sample projects to try out 1
  • 31. BigQuery: Sample projects to try out 2
  • 32. HttpArchive - multiple JS frameworks
  • 33. HttpArchive - multiple jQuery versions
  • 34. Easily Build Custom Reports and Dashboards
  • 35. Questions? Thank you. Slides available on: slideshare.net/martonkodok