SlideShare une entreprise Scribd logo
1  sur  19
Télécharger pour lire hors ligne
Google Dremel
Vicente Orjales - vorjales@gmail.com
http://www.linkedin.com/in/vorjales
Index
● Introduction.
● Google and Big Data. History.
● Columnar representation of Nested Data.
● Query Execution.
● Google Dremel vs. Map-Reduce.
● Implementations: Google BigQuery and Apache Drill
● Case Studies.
● Demo - BigQuery.
● Appendix (References and Screenshots): SLIDES 16-26
Introduction.What is Dremel?
Google Dremel is a system developed by Google for interactively querying
large data sets.
● Near real time interactive analysis (instead batch processing). SQL-
like query language.
● Nested data (most of unstructured data supported) with a column
storage representation.
● Tree architecture (as in web search): multi-level execution trees for
query processing.
It is widely used inside Google and available for the public as Google BigQuery.
There is also an open source implementation named Apache Drill (formerly
Open Dremel) still in beta phase.
Google and Big Data. History.
● 2003: Google File System. Scalable distributed file system for
data intensive applications built over cheap commodity hardware.
● 2004: Map-Reduce. Programming model for large data sets.
Automatically parallelizes jobs in large cluster of commodity
machines. Inspiration for Hadoop.
● 2006: Bigtable. Distributed storage system for structured data.
Inspiration for NoSQL databases
● 2010: Percolator. Adds transactions and locks at table and row
level.
● 2010: Pregel. Scalable Graph Computing. Mining data from social
graphs.
● 2010: Dremel. Ner real time solution with SQL-like language.
Model for Google BigQuery.
Columnar Representation of Nested Data
All values of a nested field such as A.B.C are stored contiguously. Therefore,
A.B.C can be retrieved without reading A.E, A.B.D, etc.
Columnar storage proved has been successfully used for flat relational data,
but making it work for the nested data within Dremel was one the challenges
in the Dremel design: how to preserve all structural information and be able
to reconstruct records from an arbitrary subset of fields.
Columnar Representation of Nested Data
Columnar Representation of Nested Data
record (r=0) has repeated
Language (r=0) has repeated
Query execution
Multi-level serving tree for executing queries:
● Root Server: receive incoming queries, read metadata from tables, route
query to next level.
● Leaf Server: Communication and access to the data layer for retrieving the
results.
● Each server has an internal execution tree that corresponds with a physical
query execution plan.
Query execution - Example: count()
Google Dremel vs Map-Reduce.
M-R
Batch Processing
Not query language. May
be added with extra
modules (e.g.: Hive in
Hadoop)
Data organization depends
on the source
Hadoop implementation:
Considerable admin effort
Dremel
Interactive query
SQL-like query language
Nested data. Column
oriented data organization.
BigQuery implementation:
Admin transparent to the
user.
Google Dremel vs Map-Reduce.
● Emerged at different times and with different motivations: MR focused on
distributed processing for large data sets, while Gremel driver was ad hoc.
queries in near real time.
● Some authors argue that MR approach is not enough for the new big data
requirements companies are confronting today (real time, complex joins,
ACID...).
● While web companies has moved forward with their own products (e.g.
Google with Dremel or Pregel), most of "conventional" entreprises has
decided to buy instead build, selecting and investing in Hadoop based
products in most of cases (Cloudera, Hortonworks, EMC).
The market tendency is to keep building Hadoop based solutions and enrich
them with extra layers providing better real time support and scalability, while
web companies keep innovating with new products. In this scenario Dremel
and MR may be complementary instead of alternative approachs.
Dremel Implementations
● Google BigQuery
○ It is the externalization of the Dremel product, making it available to
the public as part of the Google Cloud.
○ Data upload: directly to BigQuery or through Google Cloud Storage
(better performance for big large data sets).
○ Data manipulation: web interface / Language API / REST API
● Apache Drill (formerly Open Dremel)
○ Open Source implementation of Google Dremel.
○ Apache License (Apache Incubator). http://incubator.apache.org/drill/
○ Still Beta (Alpha?) Phase
BigQuery - Case studies
● Top advertiser network in Brazil.
● Challenge: process millions of records generated daily
(impressions, click-through, views) to be sure system is targeting
ads effectively.
● Solution: BigQuery to gain real time insights into more than 3
billion ads in 350K blogs and webs.
● Travel agency with bus ticketing in India.
● Challenge: Analyze booking and inventory from systems of
hundreds of bus operators serving more than 10,000 routes.
Hadoop takes much time and requires specialized staff.
● Solution: BigQuery allows to quickly detect improvement
opportunities (e.g.: routes with available seats)
● its customer has 16 bungalow villages in Europe.
● Challenge: Forecast number of vacationers and determine
optimal pricing for accommodations. Limitations of current
systems. Several architectures has failed (MS, SAS, BO, Excel).
● Solution: Their own application based in BigQuery performs
analysis in seconds instead hours and without extra people.
Source: https://developers.google.com/bigquery/case_studies/
BigQuery - Demo - Conclusions
● No any installation or configuration of infrastructure is needed. Everything
is in the Google Cloud.
● Interactive system.
● The amount of processed data depends on the selected columns (column
oriented storage)
● ...and the billing depends on the amount of data we've processed!!
Appendix
Demo Screens
BigQuery - Demo - Project Creation
1. Select existing
project or create a
new one
2. Choose the API
we plan to use, e.g.:
BigQuery API
https://code.google.com/apis/console/
3. Go to BigQuery
section
First of all, we have to create
a new project into the
Google API Console and add
the BigQuery API to that
project.
BigQuery - Demo - Web Console
Data sets
https://bigquery.cloud.google.com/
Query results /
Data set records
Query executed over
the data sets
Query completion
measures
● BigQuery offers a web console to upload
any data set and execute any query.
● Also test data sets are available to start
using the system immediately.
BigQuery - Demo - Loading Data
● It will not allow to create data sets neither upload information until we activate the billing for the
project.
● Two ways for uploading data:
○ Coming directly from our machine
○ Or we can use Google Cloud Storage as intermediary (better performance for big data sets).
○ Data uploaded as CSV or JSON (flat or nested) files.
● Simple example: Upload list of stations from our machine
○ Create a new dataset named "hubway" and create a new table "stations"
○ Table schema: id:integer, name:string, des:string, install:boolean, locked:boolean, temporary:
boolean, lat:float, lng:float
List of steps to
upload data from
local disk
Data source for the example: http://hubwaydatachallenge.
org/submission/leaderboard/
END

Contenu connexe

Tendances

Tendances (20)

The Basics of MongoDB
The Basics of MongoDBThe Basics of MongoDB
The Basics of MongoDB
 
Grokking Techtalk #40: Consistency and Availability tradeoff in database cluster
Grokking Techtalk #40: Consistency and Availability tradeoff in database clusterGrokking Techtalk #40: Consistency and Availability tradeoff in database cluster
Grokking Techtalk #40: Consistency and Availability tradeoff in database cluster
 
Building End-to-End Delta Pipelines on GCP
Building End-to-End Delta Pipelines on GCPBuilding End-to-End Delta Pipelines on GCP
Building End-to-End Delta Pipelines on GCP
 
Presto Summit 2018 - 09 - Netflix Iceberg
Presto Summit 2018  - 09 - Netflix IcebergPresto Summit 2018  - 09 - Netflix Iceberg
Presto Summit 2018 - 09 - Netflix Iceberg
 
Bigquery 101
Bigquery 101Bigquery 101
Bigquery 101
 
Trino: A Ludicrously Fast Query Engine - Pulsar Summit NA 2021
Trino: A Ludicrously Fast Query Engine - Pulsar Summit NA 2021Trino: A Ludicrously Fast Query Engine - Pulsar Summit NA 2021
Trino: A Ludicrously Fast Query Engine - Pulsar Summit NA 2021
 
2019.06.27 Intro to Ceph
2019.06.27 Intro to Ceph2019.06.27 Intro to Ceph
2019.06.27 Intro to Ceph
 
Introduction to MongoDB
Introduction to MongoDBIntroduction to MongoDB
Introduction to MongoDB
 
bigquery.pptx
bigquery.pptxbigquery.pptx
bigquery.pptx
 
BigQuery walk through.pptx
BigQuery walk through.pptxBigQuery walk through.pptx
BigQuery walk through.pptx
 
Deep Dive on ClickHouse Sharding and Replication-2202-09-22.pdf
Deep Dive on ClickHouse Sharding and Replication-2202-09-22.pdfDeep Dive on ClickHouse Sharding and Replication-2202-09-22.pdf
Deep Dive on ClickHouse Sharding and Replication-2202-09-22.pdf
 
ClickHouse tips and tricks. Webinar slides. By Robert Hodges, Altinity CEO
ClickHouse tips and tricks. Webinar slides. By Robert Hodges, Altinity CEOClickHouse tips and tricks. Webinar slides. By Robert Hodges, Altinity CEO
ClickHouse tips and tricks. Webinar slides. By Robert Hodges, Altinity CEO
 
Prometheus Overview
Prometheus OverviewPrometheus Overview
Prometheus Overview
 
MongoDB
MongoDBMongoDB
MongoDB
 
Apache Flume
Apache FlumeApache Flume
Apache Flume
 
Ceph Introduction 2017
Ceph Introduction 2017  Ceph Introduction 2017
Ceph Introduction 2017
 
Apache Iceberg: An Architectural Look Under the Covers
Apache Iceberg: An Architectural Look Under the CoversApache Iceberg: An Architectural Look Under the Covers
Apache Iceberg: An Architectural Look Under the Covers
 
ClickHouse Data Warehouse 101: The First Billion Rows, by Alexander Zaitsev a...
ClickHouse Data Warehouse 101: The First Billion Rows, by Alexander Zaitsev a...ClickHouse Data Warehouse 101: The First Billion Rows, by Alexander Zaitsev a...
ClickHouse Data Warehouse 101: The First Billion Rows, by Alexander Zaitsev a...
 
Altinity Quickstart for ClickHouse
Altinity Quickstart for ClickHouseAltinity Quickstart for ClickHouse
Altinity Quickstart for ClickHouse
 
CockroachDB
CockroachDBCockroachDB
CockroachDB
 

Similaire à Google Dremel. Concept and Implementations.

project--2 nd review_2
project--2 nd review_2project--2 nd review_2
project--2 nd review_2
Aswini Ashu
 
project--2 nd review_2
project--2 nd review_2project--2 nd review_2
project--2 nd review_2
aswini pilli
 
[SSA] 04.sql on hadoop(2014.02.05)
[SSA] 04.sql on hadoop(2014.02.05)[SSA] 04.sql on hadoop(2014.02.05)
[SSA] 04.sql on hadoop(2014.02.05)
Steve Min
 

Similaire à Google Dremel. Concept and Implementations. (20)

Google Cloud lightning talk @MHacks
Google Cloud lightning talk @MHacksGoogle Cloud lightning talk @MHacks
Google Cloud lightning talk @MHacks
 
Exploring BigData with Google BigQuery
Exploring BigData with Google BigQueryExploring BigData with Google BigQuery
Exploring BigData with Google BigQuery
 
GCP Slide.pptx
GCP Slide.pptxGCP Slide.pptx
GCP Slide.pptx
 
Machine learning at scale with Google Cloud Platform
Machine learning at scale with Google Cloud PlatformMachine learning at scale with Google Cloud Platform
Machine learning at scale with Google Cloud Platform
 
Integrating Google Cloud Dataproc with Alluxio for faster performance in the ...
Integrating Google Cloud Dataproc with Alluxio for faster performance in the ...Integrating Google Cloud Dataproc with Alluxio for faster performance in the ...
Integrating Google Cloud Dataproc with Alluxio for faster performance in the ...
 
Big Data & Hadoop
Big Data & HadoopBig Data & Hadoop
Big Data & Hadoop
 
Google App Engine 7 9-14
Google App Engine 7 9-14Google App Engine 7 9-14
Google App Engine 7 9-14
 
Google Cloud @ Hackathons (2020)
Google Cloud @ Hackathons (2020)Google Cloud @ Hackathons (2020)
Google Cloud @ Hackathons (2020)
 
A fresh look at Google’s Cloud by Mandy Waite
A fresh look at Google’s Cloud by Mandy Waite A fresh look at Google’s Cloud by Mandy Waite
A fresh look at Google’s Cloud by Mandy Waite
 
Unlocking Self-Service Big Data Analytics on AWS
Unlocking Self-Service Big Data Analytics on AWSUnlocking Self-Service Big Data Analytics on AWS
Unlocking Self-Service Big Data Analytics on AWS
 
Hd insight overview
Hd insight overviewHd insight overview
Hd insight overview
 
project--2 nd review_2
project--2 nd review_2project--2 nd review_2
project--2 nd review_2
 
project--2 nd review_2
project--2 nd review_2project--2 nd review_2
project--2 nd review_2
 
Executive Intro to BigQuery
Executive Intro to BigQueryExecutive Intro to BigQuery
Executive Intro to BigQuery
 
Google Cloud Platform for Data Science teams
Google Cloud Platform for Data Science teamsGoogle Cloud Platform for Data Science teams
Google Cloud Platform for Data Science teams
 
[SSA] 04.sql on hadoop(2014.02.05)
[SSA] 04.sql on hadoop(2014.02.05)[SSA] 04.sql on hadoop(2014.02.05)
[SSA] 04.sql on hadoop(2014.02.05)
 
Fundamental question and answer in cloud computing quiz by animesh chaturvedi
Fundamental question and answer in cloud computing quiz by animesh chaturvediFundamental question and answer in cloud computing quiz by animesh chaturvedi
Fundamental question and answer in cloud computing quiz by animesh chaturvedi
 
Scale with a smile with Google Cloud Platform At DevConTLV (June 2014)
Scale with a smile with Google Cloud Platform At DevConTLV (June 2014)Scale with a smile with Google Cloud Platform At DevConTLV (June 2014)
Scale with a smile with Google Cloud Platform At DevConTLV (June 2014)
 
Google not all clouds are created equal - sap sapphire 2014 (1)
Google not all clouds are created equal - sap sapphire 2014 (1)Google not all clouds are created equal - sap sapphire 2014 (1)
Google not all clouds are created equal - sap sapphire 2014 (1)
 
Gluent Extending Enterprise Applications with Hadoop
Gluent Extending Enterprise Applications with HadoopGluent Extending Enterprise Applications with Hadoop
Gluent Extending Enterprise Applications with Hadoop
 

Dernier

+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
?#DUbAI#??##{{(☎️+971_581248768%)**%*]'#abortion pills for sale in dubai@
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
WSO2
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
panagenda
 

Dernier (20)

Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfRising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
 
Elevate Developer Efficiency & build GenAI Application with Amazon Q​
Elevate Developer Efficiency & build GenAI Application with Amazon Q​Elevate Developer Efficiency & build GenAI Application with Amazon Q​
Elevate Developer Efficiency & build GenAI Application with Amazon Q​
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor Presentation
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
Exploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusExploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with Milvus
 
Understanding the FAA Part 107 License ..
Understanding the FAA Part 107 License ..Understanding the FAA Part 107 License ..
Understanding the FAA Part 107 License ..
 
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot ModelMcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
Six Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal OntologySix Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal Ontology
 

Google Dremel. Concept and Implementations.

  • 1. Google Dremel Vicente Orjales - vorjales@gmail.com http://www.linkedin.com/in/vorjales
  • 2. Index ● Introduction. ● Google and Big Data. History. ● Columnar representation of Nested Data. ● Query Execution. ● Google Dremel vs. Map-Reduce. ● Implementations: Google BigQuery and Apache Drill ● Case Studies. ● Demo - BigQuery. ● Appendix (References and Screenshots): SLIDES 16-26
  • 3. Introduction.What is Dremel? Google Dremel is a system developed by Google for interactively querying large data sets. ● Near real time interactive analysis (instead batch processing). SQL- like query language. ● Nested data (most of unstructured data supported) with a column storage representation. ● Tree architecture (as in web search): multi-level execution trees for query processing. It is widely used inside Google and available for the public as Google BigQuery. There is also an open source implementation named Apache Drill (formerly Open Dremel) still in beta phase.
  • 4. Google and Big Data. History. ● 2003: Google File System. Scalable distributed file system for data intensive applications built over cheap commodity hardware. ● 2004: Map-Reduce. Programming model for large data sets. Automatically parallelizes jobs in large cluster of commodity machines. Inspiration for Hadoop. ● 2006: Bigtable. Distributed storage system for structured data. Inspiration for NoSQL databases ● 2010: Percolator. Adds transactions and locks at table and row level. ● 2010: Pregel. Scalable Graph Computing. Mining data from social graphs. ● 2010: Dremel. Ner real time solution with SQL-like language. Model for Google BigQuery.
  • 5. Columnar Representation of Nested Data All values of a nested field such as A.B.C are stored contiguously. Therefore, A.B.C can be retrieved without reading A.E, A.B.D, etc. Columnar storage proved has been successfully used for flat relational data, but making it work for the nested data within Dremel was one the challenges in the Dremel design: how to preserve all structural information and be able to reconstruct records from an arbitrary subset of fields.
  • 7. Columnar Representation of Nested Data record (r=0) has repeated Language (r=0) has repeated
  • 8. Query execution Multi-level serving tree for executing queries: ● Root Server: receive incoming queries, read metadata from tables, route query to next level. ● Leaf Server: Communication and access to the data layer for retrieving the results. ● Each server has an internal execution tree that corresponds with a physical query execution plan.
  • 9. Query execution - Example: count()
  • 10. Google Dremel vs Map-Reduce. M-R Batch Processing Not query language. May be added with extra modules (e.g.: Hive in Hadoop) Data organization depends on the source Hadoop implementation: Considerable admin effort Dremel Interactive query SQL-like query language Nested data. Column oriented data organization. BigQuery implementation: Admin transparent to the user.
  • 11. Google Dremel vs Map-Reduce. ● Emerged at different times and with different motivations: MR focused on distributed processing for large data sets, while Gremel driver was ad hoc. queries in near real time. ● Some authors argue that MR approach is not enough for the new big data requirements companies are confronting today (real time, complex joins, ACID...). ● While web companies has moved forward with their own products (e.g. Google with Dremel or Pregel), most of "conventional" entreprises has decided to buy instead build, selecting and investing in Hadoop based products in most of cases (Cloudera, Hortonworks, EMC). The market tendency is to keep building Hadoop based solutions and enrich them with extra layers providing better real time support and scalability, while web companies keep innovating with new products. In this scenario Dremel and MR may be complementary instead of alternative approachs.
  • 12. Dremel Implementations ● Google BigQuery ○ It is the externalization of the Dremel product, making it available to the public as part of the Google Cloud. ○ Data upload: directly to BigQuery or through Google Cloud Storage (better performance for big large data sets). ○ Data manipulation: web interface / Language API / REST API ● Apache Drill (formerly Open Dremel) ○ Open Source implementation of Google Dremel. ○ Apache License (Apache Incubator). http://incubator.apache.org/drill/ ○ Still Beta (Alpha?) Phase
  • 13. BigQuery - Case studies ● Top advertiser network in Brazil. ● Challenge: process millions of records generated daily (impressions, click-through, views) to be sure system is targeting ads effectively. ● Solution: BigQuery to gain real time insights into more than 3 billion ads in 350K blogs and webs. ● Travel agency with bus ticketing in India. ● Challenge: Analyze booking and inventory from systems of hundreds of bus operators serving more than 10,000 routes. Hadoop takes much time and requires specialized staff. ● Solution: BigQuery allows to quickly detect improvement opportunities (e.g.: routes with available seats) ● its customer has 16 bungalow villages in Europe. ● Challenge: Forecast number of vacationers and determine optimal pricing for accommodations. Limitations of current systems. Several architectures has failed (MS, SAS, BO, Excel). ● Solution: Their own application based in BigQuery performs analysis in seconds instead hours and without extra people. Source: https://developers.google.com/bigquery/case_studies/
  • 14. BigQuery - Demo - Conclusions ● No any installation or configuration of infrastructure is needed. Everything is in the Google Cloud. ● Interactive system. ● The amount of processed data depends on the selected columns (column oriented storage) ● ...and the billing depends on the amount of data we've processed!!
  • 16. BigQuery - Demo - Project Creation 1. Select existing project or create a new one 2. Choose the API we plan to use, e.g.: BigQuery API https://code.google.com/apis/console/ 3. Go to BigQuery section First of all, we have to create a new project into the Google API Console and add the BigQuery API to that project.
  • 17. BigQuery - Demo - Web Console Data sets https://bigquery.cloud.google.com/ Query results / Data set records Query executed over the data sets Query completion measures ● BigQuery offers a web console to upload any data set and execute any query. ● Also test data sets are available to start using the system immediately.
  • 18. BigQuery - Demo - Loading Data ● It will not allow to create data sets neither upload information until we activate the billing for the project. ● Two ways for uploading data: ○ Coming directly from our machine ○ Or we can use Google Cloud Storage as intermediary (better performance for big data sets). ○ Data uploaded as CSV or JSON (flat or nested) files. ● Simple example: Upload list of stations from our machine ○ Create a new dataset named "hubway" and create a new table "stations" ○ Table schema: id:integer, name:string, des:string, install:boolean, locked:boolean, temporary: boolean, lat:float, lng:float List of steps to upload data from local disk Data source for the example: http://hubwaydatachallenge. org/submission/leaderboard/
  • 19. END