SlideShare a Scribd company logo
1 of 18
Download to read offline
how
BigQuerybroke my heart
Gabe Hamilton
Reporting Solutions Smackdown
We are evaluating replacements for SQL Server for our
Reporting & Business Intelligence backend.
Many TBs of data.
Closer to SQL the less report migration we need to do.
We like saving money.
Solutions we've been testing
Redshift
BigQuery
CouchDB
MongoDB
Cassandra
TerraData
Oracle
Plus various changes to our design
Some of these are necessary for certain technologies.
Denormalization
Sharding strategies
Nested data
Tune our existing Star Schema and Tables
BigQuery is
A massively parallel datastore
Columnar
Queries are SQL Select statements
Uses a Tree structure to distribute across
nodes
How many nodes?
10,000 nodes!
And what price?
3.5 cents /GBResource
Pricing
Query cost is per GB in the columns processed
Interactive Queries $0.035
Batch Queries $0.02
Storage $0.12 (per GB/month)
Which is great for our big queries
A gnarly query that looks at 200GB of data costs $7.50 in
BigQuery.
If that takes 2 hours to run on a $60/hr cluster of a
competing technology...
It's a little more complicated because in theory several of
those queries could run simultaneously on the competing
tech.
Still, that's 4 X cheaper plus the speed improvement.
Example: Github data from past year
3.5 GB Table
SELECT type, count(*) as num FROM [publicdata:samples.github_timeline]
group by type order by num desc;
Query complete (1.1s elapsed, 75.0 MB processed)
Event Type num
PushEvent 2,686,723
CreateEvent 964,830
WatchEvent 581,029
IssueCommentEvent 507,724
GistEvent 366,643
IssuesEvent 305,479
ForkEvent 180,712
PullRequestEvent 173,204
FollowEvent 156,427
GollumEvent 104,808
Cost $0.0026
or 5 for a penny
It was love at first type.
But Then...
Reality
Uploaded our test dataset
Which is 250GB
Docs are good, tools are good.
Hurdle 1: only one join per query.
Ok, rewrite as ugly nested selects...
Result
Round 2
No problem, I had seen that joins were
somewhat experimental.
Try the denormalized version of the data.
SELECT ProductId, StoreId, ProductSizeId, InventoryDate,
avg(InventoryQuantity) as InventoryQuantity
FROM BigDataTest.denorm
GROUP EACH BY ProductId, StoreId, ProductSizeId, InventoryDate
1st error message helpfully says, try GROUP EACH BY
Final Result
It's not you, it's me
The documentation had some semi-useful information:
Because the system is interactive, queries that produce a large number of
groups might fail. The use of the TOP function instead of GROUP BY might
solve the problem.
However, the BigQuery TOP function only operates on one column.
At this point I had jumped through enough hoops. I posted
on Stack Overflow, the official support channel according to
the docs, and have gotten no response.
Epilogue
Simplifying my query down to two grouping
columns did cause it to run with a limit
statement.
SELECT ProductId, StoreId,
avg(InventoryQuantity) as InventoryQuantity
FROM BigDataTest.denorm
GROUP each BY ProductId, StoreId
Limit 1000
Query complete (4.5s elapsed, 28.1 GB processed)
Without a limit it gives Error: Response too large to return.
Perhaps there is still hope for me and BigQuery...
Me
Like this talk?
@gabehamilton
My twitter feed is just technical stuff.
or slideshare.net/gabehamilton

More Related Content

What's hot

Exploring BigData with Google BigQuery
Exploring BigData with Google BigQueryExploring BigData with Google BigQuery
Exploring BigData with Google BigQueryDharmesh Vaya
 
Big Query Basics
Big Query BasicsBig Query Basics
Big Query BasicsIdo Green
 
Crunching Data with Google BigQuery. JORDAN TIGANI at Big Data Spain 2012
Crunching Data with Google BigQuery. JORDAN TIGANI at Big Data Spain 2012Crunching Data with Google BigQuery. JORDAN TIGANI at Big Data Spain 2012
Crunching Data with Google BigQuery. JORDAN TIGANI at Big Data Spain 2012Big Data Spain
 
TDC2016SP - Trilha BigData
TDC2016SP - Trilha BigDataTDC2016SP - Trilha BigData
TDC2016SP - Trilha BigDatatdc-globalcode
 
Google Cloud Platform at Vente-Exclusive.com
Google Cloud Platform at Vente-Exclusive.comGoogle Cloud Platform at Vente-Exclusive.com
Google Cloud Platform at Vente-Exclusive.comAlex Van Boxel
 
30 days of google cloud event
30 days of google cloud event30 days of google cloud event
30 days of google cloud eventPreetyKhatkar
 
Google BigQuery 101 & What’s New
Google BigQuery 101 & What’s NewGoogle BigQuery 101 & What’s New
Google BigQuery 101 & What’s NewDoiT International
 
How Google Does Big Data - DevNexus 2014
How Google Does Big Data - DevNexus 2014How Google Does Big Data - DevNexus 2014
How Google Does Big Data - DevNexus 2014James Chittenden
 
Google Dremel. Concept and Implementations.
Google Dremel. Concept and Implementations.Google Dremel. Concept and Implementations.
Google Dremel. Concept and Implementations.Vicente Orjales
 
Streaming 4 billion Messages per day. Lessons Learned.
Streaming 4 billion Messages per day. Lessons Learned.Streaming 4 billion Messages per day. Lessons Learned.
Streaming 4 billion Messages per day. Lessons Learned.Angelos Petheriotis
 
VoxxedDays Bucharest 2017 - Powering interactive data analysis with Google Bi...
VoxxedDays Bucharest 2017 - Powering interactive data analysis with Google Bi...VoxxedDays Bucharest 2017 - Powering interactive data analysis with Google Bi...
VoxxedDays Bucharest 2017 - Powering interactive data analysis with Google Bi...Márton Kodok
 
How to Design a Modern Data Warehouse in BigQuery
How to Design a Modern Data Warehouse in BigQueryHow to Design a Modern Data Warehouse in BigQuery
How to Design a Modern Data Warehouse in BigQueryDan Sullivan, Ph.D.
 
Google BigQuery is the future of Analytics! (Google Developer Conference)
Google BigQuery is the future of Analytics! (Google Developer Conference)Google BigQuery is the future of Analytics! (Google Developer Conference)
Google BigQuery is the future of Analytics! (Google Developer Conference)Rasel Rana
 
An indepth look at Google BigQuery Architecture by Felipe Hoffa of Google
An indepth look at Google BigQuery Architecture by Felipe Hoffa of GoogleAn indepth look at Google BigQuery Architecture by Felipe Hoffa of Google
An indepth look at Google BigQuery Architecture by Felipe Hoffa of GoogleData Con LA
 
Augmenting Mongo DB with treasure data
Augmenting Mongo DB with treasure dataAugmenting Mongo DB with treasure data
Augmenting Mongo DB with treasure dataTreasure Data, Inc.
 
GDD Brazil 2010 - Google Storage, Bigquery and Prediction APIs
GDD Brazil 2010 - Google Storage, Bigquery and Prediction APIsGDD Brazil 2010 - Google Storage, Bigquery and Prediction APIs
GDD Brazil 2010 - Google Storage, Bigquery and Prediction APIsPatrick Chanezon
 
2017 09-27 democratize data products with SQL
2017 09-27 democratize data products with SQL2017 09-27 democratize data products with SQL
2017 09-27 democratize data products with SQLYu Ishikawa
 
Scaling to Infinity - Open Source meets Big Data
Scaling to Infinity - Open Source meets Big DataScaling to Infinity - Open Source meets Big Data
Scaling to Infinity - Open Source meets Big DataTreasure Data, Inc.
 

What's hot (20)

Exploring BigData with Google BigQuery
Exploring BigData with Google BigQueryExploring BigData with Google BigQuery
Exploring BigData with Google BigQuery
 
Big Query Basics
Big Query BasicsBig Query Basics
Big Query Basics
 
Crunching Data with Google BigQuery. JORDAN TIGANI at Big Data Spain 2012
Crunching Data with Google BigQuery. JORDAN TIGANI at Big Data Spain 2012Crunching Data with Google BigQuery. JORDAN TIGANI at Big Data Spain 2012
Crunching Data with Google BigQuery. JORDAN TIGANI at Big Data Spain 2012
 
TDC2016SP - Trilha BigData
TDC2016SP - Trilha BigDataTDC2016SP - Trilha BigData
TDC2016SP - Trilha BigData
 
Google Big Query UDFs
Google Big Query UDFsGoogle Big Query UDFs
Google Big Query UDFs
 
Google Cloud Platform at Vente-Exclusive.com
Google Cloud Platform at Vente-Exclusive.comGoogle Cloud Platform at Vente-Exclusive.com
Google Cloud Platform at Vente-Exclusive.com
 
30 days of google cloud event
30 days of google cloud event30 days of google cloud event
30 days of google cloud event
 
Google BigQuery 101 & What’s New
Google BigQuery 101 & What’s NewGoogle BigQuery 101 & What’s New
Google BigQuery 101 & What’s New
 
How Google Does Big Data - DevNexus 2014
How Google Does Big Data - DevNexus 2014How Google Does Big Data - DevNexus 2014
How Google Does Big Data - DevNexus 2014
 
Google Dremel. Concept and Implementations.
Google Dremel. Concept and Implementations.Google Dremel. Concept and Implementations.
Google Dremel. Concept and Implementations.
 
Streaming 4 billion Messages per day. Lessons Learned.
Streaming 4 billion Messages per day. Lessons Learned.Streaming 4 billion Messages per day. Lessons Learned.
Streaming 4 billion Messages per day. Lessons Learned.
 
VoxxedDays Bucharest 2017 - Powering interactive data analysis with Google Bi...
VoxxedDays Bucharest 2017 - Powering interactive data analysis with Google Bi...VoxxedDays Bucharest 2017 - Powering interactive data analysis with Google Bi...
VoxxedDays Bucharest 2017 - Powering interactive data analysis with Google Bi...
 
How to Design a Modern Data Warehouse in BigQuery
How to Design a Modern Data Warehouse in BigQueryHow to Design a Modern Data Warehouse in BigQuery
How to Design a Modern Data Warehouse in BigQuery
 
Google and big query
Google and big queryGoogle and big query
Google and big query
 
Google BigQuery is the future of Analytics! (Google Developer Conference)
Google BigQuery is the future of Analytics! (Google Developer Conference)Google BigQuery is the future of Analytics! (Google Developer Conference)
Google BigQuery is the future of Analytics! (Google Developer Conference)
 
An indepth look at Google BigQuery Architecture by Felipe Hoffa of Google
An indepth look at Google BigQuery Architecture by Felipe Hoffa of GoogleAn indepth look at Google BigQuery Architecture by Felipe Hoffa of Google
An indepth look at Google BigQuery Architecture by Felipe Hoffa of Google
 
Augmenting Mongo DB with treasure data
Augmenting Mongo DB with treasure dataAugmenting Mongo DB with treasure data
Augmenting Mongo DB with treasure data
 
GDD Brazil 2010 - Google Storage, Bigquery and Prediction APIs
GDD Brazil 2010 - Google Storage, Bigquery and Prediction APIsGDD Brazil 2010 - Google Storage, Bigquery and Prediction APIs
GDD Brazil 2010 - Google Storage, Bigquery and Prediction APIs
 
2017 09-27 democratize data products with SQL
2017 09-27 democratize data products with SQL2017 09-27 democratize data products with SQL
2017 09-27 democratize data products with SQL
 
Scaling to Infinity - Open Source meets Big Data
Scaling to Infinity - Open Source meets Big DataScaling to Infinity - Open Source meets Big Data
Scaling to Infinity - Open Source meets Big Data
 

Similar to How BigQuery broke my heart

LatJUG. Google App Engine
LatJUG. Google App EngineLatJUG. Google App Engine
LatJUG. Google App Enginedenis Udod
 
http://www.hfadeel.com/Blog/?p=151
http://www.hfadeel.com/Blog/?p=151http://www.hfadeel.com/Blog/?p=151
http://www.hfadeel.com/Blog/?p=151xlight
 
Architectural anti-patterns for data handling
Architectural anti-patterns for data handlingArchitectural anti-patterns for data handling
Architectural anti-patterns for data handlingGleicon Moraes
 
Voldemort & Hadoop @ Linkedin, Hadoop User Group Jan 2010
Voldemort & Hadoop @ Linkedin, Hadoop User Group Jan 2010Voldemort & Hadoop @ Linkedin, Hadoop User Group Jan 2010
Voldemort & Hadoop @ Linkedin, Hadoop User Group Jan 2010Bhupesh Bansal
 
Hadoop and Voldemort @ LinkedIn
Hadoop and Voldemort @ LinkedInHadoop and Voldemort @ LinkedIn
Hadoop and Voldemort @ LinkedInHadoop User Group
 
Architectural anti patterns_for_data_handling
Architectural anti patterns_for_data_handlingArchitectural anti patterns_for_data_handling
Architectural anti patterns_for_data_handlingGleicon Moraes
 
Apache Kylin Streaming
Apache Kylin Streaming Apache Kylin Streaming
Apache Kylin Streaming hongbin ma
 
Stacktrace Berlin RC.2
Stacktrace Berlin RC.2Stacktrace Berlin RC.2
Stacktrace Berlin RC.2Oliver Seemann
 
Big Data Analytics: Finding diamonds in the rough with Azure
Big Data Analytics: Finding diamonds in the rough with AzureBig Data Analytics: Finding diamonds in the rough with Azure
Big Data Analytics: Finding diamonds in the rough with AzureChristos Charmatzis
 
AWS Presentation at JasperWorld APAC
AWS Presentation at JasperWorld APACAWS Presentation at JasperWorld APAC
AWS Presentation at JasperWorld APACAmazon Web Services
 
MPTStore: A Fast, Scalable, and Stable Resource Index
MPTStore: A Fast, Scalable, and Stable Resource IndexMPTStore: A Fast, Scalable, and Stable Resource Index
MPTStore: A Fast, Scalable, and Stable Resource IndexChris Wilper
 
GOTO Aarhus 2014: Making Enterprise Data Available in Real Time with elastics...
GOTO Aarhus 2014: Making Enterprise Data Available in Real Time with elastics...GOTO Aarhus 2014: Making Enterprise Data Available in Real Time with elastics...
GOTO Aarhus 2014: Making Enterprise Data Available in Real Time with elastics...Yann Cluchey
 
Getting Started with Amazon Redshift
Getting Started with Amazon RedshiftGetting Started with Amazon Redshift
Getting Started with Amazon RedshiftAmazon Web Services
 
MinneBar 2013 - Scaling with Cassandra
MinneBar 2013 - Scaling with CassandraMinneBar 2013 - Scaling with Cassandra
MinneBar 2013 - Scaling with CassandraJeff Smoley
 
Scaling Your Web Application
Scaling Your Web ApplicationScaling Your Web Application
Scaling Your Web ApplicationKetan Deshmukh
 
Recommender Systems from A to Z – Real-Time Deployment
Recommender Systems from A to Z – Real-Time DeploymentRecommender Systems from A to Z – Real-Time Deployment
Recommender Systems from A to Z – Real-Time DeploymentCrossing Minds
 

Similar to How BigQuery broke my heart (20)

disertation
disertationdisertation
disertation
 
LatJUG. Google App Engine
LatJUG. Google App EngineLatJUG. Google App Engine
LatJUG. Google App Engine
 
http://www.hfadeel.com/Blog/?p=151
http://www.hfadeel.com/Blog/?p=151http://www.hfadeel.com/Blog/?p=151
http://www.hfadeel.com/Blog/?p=151
 
Final deck
Final deckFinal deck
Final deck
 
Architectural anti-patterns for data handling
Architectural anti-patterns for data handlingArchitectural anti-patterns for data handling
Architectural anti-patterns for data handling
 
Voldemort & Hadoop @ Linkedin, Hadoop User Group Jan 2010
Voldemort & Hadoop @ Linkedin, Hadoop User Group Jan 2010Voldemort & Hadoop @ Linkedin, Hadoop User Group Jan 2010
Voldemort & Hadoop @ Linkedin, Hadoop User Group Jan 2010
 
Hadoop and Voldemort @ LinkedIn
Hadoop and Voldemort @ LinkedInHadoop and Voldemort @ LinkedIn
Hadoop and Voldemort @ LinkedIn
 
Architectural anti patterns_for_data_handling
Architectural anti patterns_for_data_handlingArchitectural anti patterns_for_data_handling
Architectural anti patterns_for_data_handling
 
Apache Kylin Streaming
Apache Kylin Streaming Apache Kylin Streaming
Apache Kylin Streaming
 
Stacktrace Berlin RC.2
Stacktrace Berlin RC.2Stacktrace Berlin RC.2
Stacktrace Berlin RC.2
 
Big Data Analytics: Finding diamonds in the rough with Azure
Big Data Analytics: Finding diamonds in the rough with AzureBig Data Analytics: Finding diamonds in the rough with Azure
Big Data Analytics: Finding diamonds in the rough with Azure
 
AWS Presentation at JasperWorld APAC
AWS Presentation at JasperWorld APACAWS Presentation at JasperWorld APAC
AWS Presentation at JasperWorld APAC
 
MPTStore: A Fast, Scalable, and Stable Resource Index
MPTStore: A Fast, Scalable, and Stable Resource IndexMPTStore: A Fast, Scalable, and Stable Resource Index
MPTStore: A Fast, Scalable, and Stable Resource Index
 
GOTO Aarhus 2014: Making Enterprise Data Available in Real Time with elastics...
GOTO Aarhus 2014: Making Enterprise Data Available in Real Time with elastics...GOTO Aarhus 2014: Making Enterprise Data Available in Real Time with elastics...
GOTO Aarhus 2014: Making Enterprise Data Available in Real Time with elastics...
 
NoSql Introduction
NoSql IntroductionNoSql Introduction
NoSql Introduction
 
SEO for Large Websites
SEO for Large WebsitesSEO for Large Websites
SEO for Large Websites
 
Getting Started with Amazon Redshift
Getting Started with Amazon RedshiftGetting Started with Amazon Redshift
Getting Started with Amazon Redshift
 
MinneBar 2013 - Scaling with Cassandra
MinneBar 2013 - Scaling with CassandraMinneBar 2013 - Scaling with Cassandra
MinneBar 2013 - Scaling with Cassandra
 
Scaling Your Web Application
Scaling Your Web ApplicationScaling Your Web Application
Scaling Your Web Application
 
Recommender Systems from A to Z – Real-Time Deployment
Recommender Systems from A to Z – Real-Time DeploymentRecommender Systems from A to Z – Real-Time Deployment
Recommender Systems from A to Z – Real-Time Deployment
 

More from Gabriel Hamilton

Javascript Smart Contracts on NEAR
Javascript Smart Contracts on NEARJavascript Smart Contracts on NEAR
Javascript Smart Contracts on NEARGabriel Hamilton
 
Natural language processing: feature extraction
Natural language processing: feature extractionNatural language processing: feature extraction
Natural language processing: feature extractionGabriel Hamilton
 
Software engineering for CEOs
Software engineering for CEOsSoftware engineering for CEOs
Software engineering for CEOsGabriel Hamilton
 
Adaptive software engineering
Adaptive software engineeringAdaptive software engineering
Adaptive software engineeringGabriel Hamilton
 
The TensorFlow dance craze
The TensorFlow dance crazeThe TensorFlow dance craze
The TensorFlow dance crazeGabriel Hamilton
 
Software engineering for CEOs ch1
Software engineering for CEOs ch1Software engineering for CEOs ch1
Software engineering for CEOs ch1Gabriel Hamilton
 
DOSUG Intro to google prediction api
DOSUG Intro to google prediction apiDOSUG Intro to google prediction api
DOSUG Intro to google prediction apiGabriel Hamilton
 
How to present lots of information on a screen
How to present lots of information on a screenHow to present lots of information on a screen
How to present lots of information on a screenGabriel Hamilton
 
Intro to Google Prediction API
Intro to Google Prediction APIIntro to Google Prediction API
Intro to Google Prediction APIGabriel Hamilton
 
Dojo: Beautiful Web Apps, Fast
Dojo: Beautiful Web Apps, FastDojo: Beautiful Web Apps, Fast
Dojo: Beautiful Web Apps, FastGabriel Hamilton
 
Dojo: Getting Started Today
Dojo: Getting Started TodayDojo: Getting Started Today
Dojo: Getting Started TodayGabriel Hamilton
 

More from Gabriel Hamilton (15)

Javascript Smart Contracts on NEAR
Javascript Smart Contracts on NEARJavascript Smart Contracts on NEAR
Javascript Smart Contracts on NEAR
 
Smart Contracts
Smart ContractsSmart Contracts
Smart Contracts
 
Web Next
Web NextWeb Next
Web Next
 
Beyond Agile Software
Beyond Agile SoftwareBeyond Agile Software
Beyond Agile Software
 
Natural language processing: feature extraction
Natural language processing: feature extractionNatural language processing: feature extraction
Natural language processing: feature extraction
 
Software engineering for CEOs
Software engineering for CEOsSoftware engineering for CEOs
Software engineering for CEOs
 
Adaptive software engineering
Adaptive software engineeringAdaptive software engineering
Adaptive software engineering
 
The TensorFlow dance craze
The TensorFlow dance crazeThe TensorFlow dance craze
The TensorFlow dance craze
 
DataFlow & Beam
DataFlow & BeamDataFlow & Beam
DataFlow & Beam
 
Software engineering for CEOs ch1
Software engineering for CEOs ch1Software engineering for CEOs ch1
Software engineering for CEOs ch1
 
DOSUG Intro to google prediction api
DOSUG Intro to google prediction apiDOSUG Intro to google prediction api
DOSUG Intro to google prediction api
 
How to present lots of information on a screen
How to present lots of information on a screenHow to present lots of information on a screen
How to present lots of information on a screen
 
Intro to Google Prediction API
Intro to Google Prediction APIIntro to Google Prediction API
Intro to Google Prediction API
 
Dojo: Beautiful Web Apps, Fast
Dojo: Beautiful Web Apps, FastDojo: Beautiful Web Apps, Fast
Dojo: Beautiful Web Apps, Fast
 
Dojo: Getting Started Today
Dojo: Getting Started TodayDojo: Getting Started Today
Dojo: Getting Started Today
 

Recently uploaded

Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024The Digital Insurer
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machinePadma Pradeep
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clashcharlottematthew16
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyAlfredo García Lavilla
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxhariprasad279825
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...Fwdays
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
The Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfThe Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfSeasiaInfotech2
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsMemoori
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 

Recently uploaded (20)

Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clash
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptx
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
The Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfThe Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdf
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 

How BigQuery broke my heart

  • 2. Reporting Solutions Smackdown We are evaluating replacements for SQL Server for our Reporting & Business Intelligence backend. Many TBs of data. Closer to SQL the less report migration we need to do. We like saving money.
  • 3. Solutions we've been testing Redshift BigQuery CouchDB MongoDB Cassandra TerraData Oracle
  • 4. Plus various changes to our design Some of these are necessary for certain technologies. Denormalization Sharding strategies Nested data Tune our existing Star Schema and Tables
  • 5. BigQuery is A massively parallel datastore Columnar Queries are SQL Select statements Uses a Tree structure to distribute across nodes
  • 7. And what price? 3.5 cents /GBResource Pricing Query cost is per GB in the columns processed Interactive Queries $0.035 Batch Queries $0.02 Storage $0.12 (per GB/month)
  • 8. Which is great for our big queries A gnarly query that looks at 200GB of data costs $7.50 in BigQuery. If that takes 2 hours to run on a $60/hr cluster of a competing technology... It's a little more complicated because in theory several of those queries could run simultaneously on the competing tech. Still, that's 4 X cheaper plus the speed improvement.
  • 9. Example: Github data from past year 3.5 GB Table SELECT type, count(*) as num FROM [publicdata:samples.github_timeline] group by type order by num desc; Query complete (1.1s elapsed, 75.0 MB processed) Event Type num PushEvent 2,686,723 CreateEvent 964,830 WatchEvent 581,029 IssueCommentEvent 507,724 GistEvent 366,643 IssuesEvent 305,479 ForkEvent 180,712 PullRequestEvent 173,204 FollowEvent 156,427 GollumEvent 104,808 Cost $0.0026 or 5 for a penny
  • 10. It was love at first type.
  • 12. Uploaded our test dataset Which is 250GB Docs are good, tools are good. Hurdle 1: only one join per query. Ok, rewrite as ugly nested selects...
  • 14. Round 2 No problem, I had seen that joins were somewhat experimental. Try the denormalized version of the data. SELECT ProductId, StoreId, ProductSizeId, InventoryDate, avg(InventoryQuantity) as InventoryQuantity FROM BigDataTest.denorm GROUP EACH BY ProductId, StoreId, ProductSizeId, InventoryDate 1st error message helpfully says, try GROUP EACH BY
  • 16. It's not you, it's me The documentation had some semi-useful information: Because the system is interactive, queries that produce a large number of groups might fail. The use of the TOP function instead of GROUP BY might solve the problem. However, the BigQuery TOP function only operates on one column. At this point I had jumped through enough hoops. I posted on Stack Overflow, the official support channel according to the docs, and have gotten no response.
  • 17. Epilogue Simplifying my query down to two grouping columns did cause it to run with a limit statement. SELECT ProductId, StoreId, avg(InventoryQuantity) as InventoryQuantity FROM BigDataTest.denorm GROUP each BY ProductId, StoreId Limit 1000 Query complete (4.5s elapsed, 28.1 GB processed) Without a limit it gives Error: Response too large to return. Perhaps there is still hope for me and BigQuery...
  • 18. Me Like this talk? @gabehamilton My twitter feed is just technical stuff. or slideshare.net/gabehamilton