SlideShare une entreprise Scribd logo
1  sur  86
Télécharger pour lire hors ligne
How Google Does Big Data
Solving Big Data Problems at Google Scale
James Chittenden
Cloud Platform Engineer

Google confidential │ Do not distribute
Agenda
1

Quick Google Cloud Platform Overview

2

Google & Big Data

3

Running Hadoop on GCE & Leveraging BigQuery

4

Q&A

Google confidential │ Do not distribute
Agenda
1

Quick Google Cloud Platform Overview

2

Google & Big Data

3

Running Hadoop on GCE & Leveraging BigQuery

4

Q&A

Google confidential │ Do not distribute
Enterprise

Google confidential | Do not distribute
Enterprise

Google confidential | Do not distribute
Enterprise

Google confidential | Do not distribute
Google has been running some
of the world’s largest distributed
systems with unique and
stringent requirements.

Images by Connie Zhou

Google confidential │ Do not distribute
"This is what makes Google Google:
its physical network, its thousands of
fiber miles, and those many thousands
of servers that, in aggregate, add up to
the mother of all clouds."
- Wired 'Google Throws Open Doors To Its Top Secret Data Center',

Wired, October

2012

Google confidential | Do not distribute
For the past 15 years, Google
has been building out the
world’s fastest, most powerful,
highest quality cloud
infrastructure on the planet.

Images by Connie Zhou

Google confidential │ Do not distribute
Cloud Platform is built on the same
infrastructure that powers Google

Images by Connie Zhou

Google confidential | Do not distribute
Google's Global OpenFlow Network
Google Innovations in Software
MapReduce

2002

2004

GFS

Cloud Platform

Dremel

2006

Big Table

2008

Colossus

2010

2012

Spanner

Google confidential | Do not distribute
Investing In Our Cloud
$2.9B in additional data center investments worldwide

Cloud Platform

Google confidential | Do not distribute
Compute

Storage

App Services

Compute Engine

Cloud Storage

BigQuery

App Engine

Cloud SQL

Cloud Endpoints

Cloud Datastore

Cloud Platform

Google confidential | Do not distribute
Agenda
1

Quick Google Cloud Platform Overview

2

Google & Big Data

3

Running Hadoop on GCE & Leveraging BigQuery

4

Q&A

Google confidential │ Do not distribute
Complex Big Data Landscape
Big Data Landscape
The Flood of BIG DATA

5.3

US $14.3

191

BILLION EMAILS
SENT EVERY DAY 2

BILLION SPENT IN
U.S. ON MOBILE ADS IN 2013 4

US $109.7

TRILLION ONLINE AD
IMPRESSIONS IN 2012 1

500+

TERABYTES OF DATA
UPLOADED DAILY 5

BILLION ESTIMATED
FOR 2014 3

MOBILE
AD

EMAIL
ONLINE AD
IMPRESSIONS

FACEBOOK
ONLINE
AD SPEND
How does Google process
Big Data?

Google confidential │ Do not distribute
Big Data processing example - application logs

User
requests

Application
servers

Note: diagram is dramatically simplified
Interop NYC - Oct 2, 2012
Big Data processing example - application logs

App persistence

BigTable / GFS

App logging

User
requests

Application
servers

Logs cluster

Note: diagram is dramatically simplified
Interop NYC - Oct 2, 2012
Big Data processing example - application logs
MapReduce

App persistence

BigTable / GFS
Dremel
MapReduce
App logging

User
requests

Application
servers

Logs cluster

Note: diagram is dramatically simplified
Interop NYC - Oct 2, 2012
Big Data processing example - application logs
MapReduce

App persistence

SQL

BigTable / GFS
Dremel
MapReduce

Analysts
& reporting

App logging

User
requests

Application
servers

Logs cluster

Note: diagram is dramatically simplified
Interop NYC - Oct 2, 2012
Big Data processing example - application logs
MapReduce

App persistence

SQL

BigTable / GFS
Dremel
MapReduce

Analysts
& reporting

App logging

User
requests

Application
servers

Logs cluster

Note: diagram is dramatically simplified
Interop NYC - Oct 2, 2012
How does this apply
to my business?

Google confidential │ Do not distribute
Social Data

Internal Data

WHAT DO CUSTOMERS
SAY ABOUT MY
BUSINESS?

HOW CAN I COMBINE ALL
OF MY TRANSACTIONAL
DATA WITH OTHER
DATA?

Google Data
Sensor Data
HOW CAN I CAPTURE
MORE CUSTOMER
INTERACTIONS?

01001010110010001001000100
100110001110011v10011000100110001
01001010110010001001000100010001
01100011100010 01100011100001110
10011001000011001100101 01

HOW CAN GOOGLE
DATA HELP ME WITH
MULTI-CHANNEL
MODELING?
Sample use cases - BigQuery usage today
Display ads analytics
(global top-5 media agency)

Ads network reporting
(3rd party mobile ads)

Fleet reservations
(online travel operations)

Mobile app statistics
(online reading vendor)

Revenue optimization
(holiday/travel properties)

Analyze global campaign performance for F500 clients
1 client = 20GB/day of DoubleClick impressions logs

Deliver x-platform performance analytics dashboards
1B events/day x 100s of ads customers

Monitor customer demand vs supply shortfalls
10,000 routes x 1000s customers = millions of daily events

Usage analysis on 60M installs; 10M active users
2B API requests/day, 20GB log data/day

Correlate marketing effectiveness vs global reservations
10MBs / day from multiple data warehouses

Interop NYC - Oct 2, 2012
Today we see the same trends happening in industry
Opportunities
Single place to capture data
Combine data from different sources

Challenges
The data is large
Up to 100-200 TB
High rate of growth
Up to 100-200 GB / day

Detect patterns and correlations
Easily share data insights with org
Distributed decision making

Non-relational
Unstructured or semi-structured
Multiple sources
SaaS, Mobile devices, legacy

Interop NYC - Oct 2, 2012
Processing Data: MapReduce
●

2004: Google releases the MapReduce* framework paper

●

2011: Big Data "analysis" using Hadoop and other Google inspired technology
Visualization

1
Google App Engine
1

Data Collection

2

ETL

2

Raw Data Storage

4

6

Auto-scale
web services

Data
Ingestion

4
Google Compute Engine

3

Web Apps

Scalable VMs

BI tools

Aggregation

Collection
Transformation

Data processing
Cleansing

3
Google Cloud Storage
5

Analytics Storage

6

TBs of Data

Visualization

Raw Data Storage
Big Query Staging

Cloud Storage
Hadoop Connector

5
Google Big Query
Raw Data Storage

Interactive
Dashboards + apps

Ad-hoc queries
REST API

Google
Spreadsheets
1

Visualization
Google Compute Engine

1

2

Scalable VMs

6

Data Collection
Collection
Transformation

ETL

Data processing
Cleansing

BI tools
3

Raw Data Storage
2

4

4

Aggregation

5

Analytics Storage

6

Visualization

Interactive
Dashboards + apps

Google
Spreadsheets
1

Visualization
Google Cloud Storage

1

2

TBs of Data

6

Data Collection
Raw Data Storage
Big Query Staging

ETL

BI tools
3

Huge Storage Capacity

Raw Data Storage
3

4

Aggregation

5

Analytics Storage

6

Visualization

Consistent, fast
speeds
Multiple storage
types

Interactive
Dashboards + apps

Google
Spreadsheets
1

Visualization
Google BigQuery

1

2

Ad-hoc queries

6

Data Collection
AdHoc Queries
REST API

ETL

Visualization
Partners

BI tools
3

Raw Data Storage
5

4

Aggregation

5

Analytics Storage

6

Visualization

Interactive
Dashboards + apps

Google
Spreadsheets
Agenda
1

Quick Google Cloud Platform Overview

2

Google & Big Data

3

Running Hadoop on GCE & Leveraging BigQuery

4

Q&A

Google confidential │ Do not distribute
Google Compute Engine

+

Google confidential │ Do not distribute
Demo
Hadoop on Google Compute Engine

Google confidential | Do not distribute
BigData at Google
Running Hadoop on Google Compute Engine

1. Set up Hadoop
./compute_cluster_for_hadoop.py setup google-platform-demo
hadoop-demo-jameschi

Google confidential | Do not distribute
BigData at Google
Running Hadoop on Google Compute Engine

2. Start Cluster
./compute_cluster_for_hadoop.py start google-platform-demo
hadoop-demo-jameschi 500

Google confidential | Do not distribute
BigData at Google
Running Hadoop on Google Compute Engine

3. Start MapReduce
./compute_cluster_for_hadoop.py mapreduce google-platformdemo hadoop-demo-jameschi 
--input gs://hadoop-demo-jameschi-input 
--output gs://hadoop-demo-jameschi-output 
--mapper sample/shortest-to-longest-mapper.pl 
--reducer sample/shortest-to-longest-reducer.pl 
--mapper-count 5 
--reducer-count 1

Google confidential | Do not distribute
BigData at Google
Running Hadoop on Google Compute Engine

4. Shutdown Cluster
./compute_cluster_for_hadoop.py shutdown google-platformdemo

Google confidential | Do not distribute
Google Compute Engine Costs
Total Costs for Running Hadoop:

5 node Hadoop cluster x 20 minutes = $0.21
5,000 node Hadoop cluster x 20 min = $8.67

Google confidential | Do not distribute
How Google Approaches Analytics
●

MapReduce based analysis can be slow for ad-hoc queries

●

Managing data centers and tuning software takes time & money

●

Analytics tools should be services
How Google Approaches Analytics
Dremel
Ad-hoc query system for terabyte datasets
●
●
●

Query execution tree
Column Oriented records
...and a lot of nodes
Big Data processing example - application logs
MapReduce

App persistence

SQL

BigTable / GFS
Dremel
MapReduce

Analysts
& reporting

App logging

User
requests

Application
servers

Logs cluster

Note: diagram is dramatically simplified
Interop NYC - Oct 2, 2012
What is BigQuery?
Google BigQuery: An Overview

BigQuery: a fully-managed data analytics service in the cloud.
Unlimited storage. Interactive analysis on multi-terabyte datasets.
Adwords
DCLK
Analytics

Other BI
Tools

Google
Spreadsheets

Scalable Storage
SQL

API

Corporate data
3rd party data
Store all your data
in the cloud

App Engine
App

Co-Workers

Analyze interactively
Mash it up

Securely Share/
distribute the results
Google confidential | Do not distribute
Cloud-based data analytics pipeline
Application
level code

Log data

BI tools

Logstore

Backends +
MapReduce
BigQuery
Structured data

Datastore
SQL

API

Interactive
Dashboards + apps

Hadoop

Unstructured data

Google Spreadsheets
Cloud Storage

Store

Extract &
Transform

Analyze
interactively

Serve

Custom logic & 3rd party
libraries

Interop NYC - Oct 2, 2012
Origins of BigQuery

+

Google confidential │ Do not distribute
Origins of BigQuery
Google BigQuery: An Overview

How can a business
analyst find Top 20
Apps in matter of
seconds?
Google confidential | Do not distribute
Origins of BigQuery
Google BigQuery: An Overview

How can a business analyst find Top 20 Apps in
matter of seconds?

SELECT
top(appId, 20) AS app,
count(*) AS count
FROM installlog.2012;
ORDER BY
count DESC

Result in ~20 seconds!
Google confidential | Do not distribute
Origins of BigQuery
Google BigQuery: An Overview

How do you find slow
running servers from
billions of log entries in seconds?

Google confidential | Do not distribute
Origins of BigQuery
Google BigQuery: An Overview

How do you find slow running servers from
billions of log entries - in seconds?
SELECT
count(*) AS count, source_machine AS
machine
FROM product.product_log.live
WHERE
elapsed_time > 4000
GROUP BY
source_machine
ORDER BY
count DESC

Result in ~20 seconds!
Google confidential | Do not distribute
Origins of BigQuery
Google BigQuery: An Overview

Google BigQuery

Google confidential | Do not distribute
BigQuery -- Sweetspots
Google BigQuery: An Overview

Use cases/applications where
Interactive analysis on large data sets is a requirement
Data mashups at scale and ease is important
Elasticity of environment is important
Minimal administration is required
Empowering Analysts or Data scientist with self service capabilities

Google confidential | Do not distribute
Demo
Ad-hoc Analysis with BigQuery

Google confidential | Do not distribute
Why is BigQuery so Special?

+

Google confidential │ Do not distribute
Why BigQuery is Special?
Google BigQuery: An Overview

0

ZERO Administration for Performance and Scale

No complex data architecture required.
●
●

Simple denormalized data structure
Data can be loaded in csv or JSON format

Supports Open Standards
●
●
●

SQL like language
Standard BI/ETL tools supported
REST API Support for integrating analytics programmatically
Google confidential | Do not distribute
Why BigQuery is Special?
Google BigQuery: An Overview

Nested Field Support
●
●

Analyze details for headers (items of the order, clicks of the session in
single SQL statement)
Ingest Web data (Typically in JSON format) directly into BigQuery

Query Result Caching at Scale
●
●

Caching of query results at Scale (24 hours best effort)
Option to re-generate results / update cache

Real-time Data Ingest and Reporting
●
●

Data ingest at 100 rows/second (with peak 1000 rows/second) into
BigQuery
Support for real-time to near real-time query/reporting
Google confidential | Do not distribute
Why BigQuery is Special?
Google BigQuery: An Overview

And many more...
SQL like Query Language and REST based APIS
High performance queries on strings (use of REGEXP and pattern matching)
Data window functions and many aggregation capabilities
Table Wildcards
Table decorators for time based queries..

Integration with Google Stack
Google Analytics Premium (hit level data) - BigQuery
Google App Engine Datastore to BigQuery - Trusted Tester
Google confidential | Do not distribute
What BigQuery is Not?

+

Google confidential │ Do not distribute
Why BigQuery is Not?
Google BigQuery: An Overview

Relational Database Management System (RDBMS)
●
●

Data is stored in BigQuery in Columnar fashion
BigQuery tables are immutable (no updates or deletes)*

Not a replacement for existing Data warehouse en-mass
●
●

Specifically designed for large data sets
Queries requiring massive processing

On-premise solution or Appliance
●
●

It is available only in Google Cloud in a fully hosted fashion
Data needs to be in Google Cloud (BigQuery) for analysis

* alternate ways to support these capabilities
Google confidential | Do not distribute
Columnar Storage and Tree Architecture of Dremel
Why Dremel can be so drastically fast as the examples show?
The answer can be found in two core technologies which gives Dremel this
unprecedented performance:
● Columnar Storage. Data is stored in a columnar storage fashion which
makes possible to achieve very high compression ratio and scan throughput.
● Tree Architecture is used for dispatching queries and aggregating results
across thousands of machines in a few seconds.

Google confidential | Do not distribute
Dremel: Key to Run Business at “Google Speed”
Google has been using Dremel in production since 2006 and
has been continuously evolving it for the last 6 years.
Examples of applications include:
●
●
●
●
●
●
●
●
●
●
●

Analysis of crawled web documents
Tracking install data for applications in the Android Market
Crash reporting for Google products
OCR results from Google Books
Spam analysis
Debugging of map tiles on Google Maps
Tablet migrations in managed Bigtable instances
Results of tests run on Google’s distributed build system
Disk I/O statistics for hundreds of thousands of disks
Resource monitoring for jobs run in Google’s data centers
Symbols and dependencies in Google’s codebase
Google confidential | Do not distribute
BigQuery Sample Use Cases

Google confidential | Do not distribute
Google confidential | Do not distribute
Sensor Data Analytics

Google confidential | Do not distribute
Sensor Data Analytics

Google confidential | Do not distribute
Sensor Data Analytics

Google confidential | Do not distribute
Sensor Data Analytics

Google confidential | Do not distribute
Sensor Data Analytics

Google confidential | Do not distribute
Sensor Data Analytics

Google confidential | Do not distribute
Sensor Data Analytics
Data Sensing Lab
Sensors

Generate

Endpoints/App Engine

Ingest/Process

Datastore

Store

Compute Engine/
BigQuery

Compute

Visualize
Analyze
Google confidential | Do not distribute
Mobile and Social Gaming User Analysis

Notice trend change

Slice user data, identify
segments

Compare segments
vs general population
Google confidential | Do not distribute
BigQuery versus MapReduce
● BigQuery is designed as an interactive data analysis tool for
large datasets
● MapReduce is designed as a programming framework to
batch process large datasets

Google confidential | Do not distribute
BigQuery versus MapReduce

Google confidential | Do not distribute
BigQuery versus MapReduce

Google confidential | Do not distribute
BigQuery: Key Differentiators
Agile : BigData Analytic Hosted Environment
● Fully managed cloud service
● Interactive querying on Terabytes of data
● Always on
Cost Effective: Low TCO / TCD
● No CapEx as no hardware/software to own
● No administration or management overhead
● Pay for only what you use.
Non disruptive, fits right into IT landscape
● Expanding ISV ecosystem (BI/ETL)
● Use of standards (SQL like, REST API etc...)
● Integrated with other Google Services (GCE, GAE, GCE etc...)
Google confidential | Do not distribute
BigQuery versus MapReduce
Secure and Reliable
● Uptime SLA (99.9% uptime)
● OAUTH 2.0 Support, User/Group ACL
● Uses Google's core network/data center backbone
Innovative
● Innovation momentum: 2 release since launch
● New capabilities/concepts: e.g. Queryable Nested fields
● e.g. Use of RegExp without performance loss

Google confidential | Do not distribute
Recent Manufacturer POC

Local
Processing

BIME
Dashboards

Fusion
Tables

AppEngine

Cloud Storage

GCE

Cloud Storage

Big Query

CSV
Recent Manufacturer POC
●

●

Achieved a 200,000 rows per
second ingestion rate, and it took
about 10 minutes from when the
file is first dropped into GCS to
when you can query the data in
BigQuery.
Demonstrated ingesting 120
million rows in 10 minutes into
BigQuery.
Recent Manufacturer POC
●

●

Leveraged Google Fusion Tables
to easily collaborate on datasets
with business and partners.
Also discovered manufacturer
had machines that consumed
250+ billion gallons of fuel as of
July 30th.
Recent Manufacturer POC
●

●

Spun up BIME Dashboard to
interactively connect to BigQuery
dataset.
With dashboards connected to BQ
(BIME, Tableau, QlikView, etc),
we can drill into billions of rows
near real-time data with ease.
BigQuery: Ever Expanding Ecosystem of Partners

Google confidential | Do not distribute
Agenda
1

Quick Google Cloud Platform Overview

2

Google & Big Data

3

Running Hadoop on GCE & Leveraging BigQuery

4

Q&A

Google confidential │ Do not distribute
Thank You
cloud.google.com
Images by Connie Zhou

Google confidential │ Do not distribute
cloud.google.com

Google confidential │ Do not distribute
Google confidential | Do not distribute

Contenu connexe

Tendances

An indepth look at Google BigQuery Architecture by Felipe Hoffa of Google
An indepth look at Google BigQuery Architecture by Felipe Hoffa of GoogleAn indepth look at Google BigQuery Architecture by Felipe Hoffa of Google
An indepth look at Google BigQuery Architecture by Felipe Hoffa of GoogleData Con LA
 
GDD Brazil 2010 - Google Storage, Bigquery and Prediction APIs
GDD Brazil 2010 - Google Storage, Bigquery and Prediction APIsGDD Brazil 2010 - Google Storage, Bigquery and Prediction APIs
GDD Brazil 2010 - Google Storage, Bigquery and Prediction APIsPatrick Chanezon
 
Google BigQuery for Everyday Developer
Google BigQuery for Everyday DeveloperGoogle BigQuery for Everyday Developer
Google BigQuery for Everyday DeveloperMárton Kodok
 
Google BigQuery - Features & Benefits
Google BigQuery - Features & BenefitsGoogle BigQuery - Features & Benefits
Google BigQuery - Features & BenefitsAndreas Raible
 
Exploring BigData with Google BigQuery
Exploring BigData with Google BigQueryExploring BigData with Google BigQuery
Exploring BigData with Google BigQueryDharmesh Vaya
 
Google Cloud Platform & rockPlace Big Data Event-Mar.31.2016
Google Cloud Platform & rockPlace Big Data Event-Mar.31.2016Google Cloud Platform & rockPlace Big Data Event-Mar.31.2016
Google Cloud Platform & rockPlace Big Data Event-Mar.31.2016Chris Jang
 
Google Cloud Platform at Vente-Exclusive.com
Google Cloud Platform at Vente-Exclusive.comGoogle Cloud Platform at Vente-Exclusive.com
Google Cloud Platform at Vente-Exclusive.comAlex Van Boxel
 
Connecta Event: Big Query och dataanalys med Google Cloud Platform
Connecta Event: Big Query och dataanalys med Google Cloud PlatformConnecta Event: Big Query och dataanalys med Google Cloud Platform
Connecta Event: Big Query och dataanalys med Google Cloud PlatformConnectaDigital
 
Big Data Analytics with Google BigQuery. By Javier Ramirez. All your base Co...
Big Data Analytics with Google BigQuery.  By Javier Ramirez. All your base Co...Big Data Analytics with Google BigQuery.  By Javier Ramirez. All your base Co...
Big Data Analytics with Google BigQuery. By Javier Ramirez. All your base Co...javier ramirez
 
Big Data Best Practices on GCP
Big Data Best Practices on GCPBig Data Best Practices on GCP
Big Data Best Practices on GCPAllCloud
 
An overview of BigQuery
An overview of BigQuery An overview of BigQuery
An overview of BigQuery GirdhareeSaran
 
On Performance Under Hotspots in Hadoop versus Bigdata Replay Platforms
On Performance Under Hotspots in Hadoop versus Bigdata Replay PlatformsOn Performance Under Hotspots in Hadoop versus Bigdata Replay Platforms
On Performance Under Hotspots in Hadoop versus Bigdata Replay PlatformsTokyo University of Science
 
Visualising and Linking Open Data from Multiple Sources
Visualising and Linking Open Data from Multiple SourcesVisualising and Linking Open Data from Multiple Sources
Visualising and Linking Open Data from Multiple SourcesData Driven Innovation
 
VoxxedDays Bucharest 2017 - Powering interactive data analysis with Google Bi...
VoxxedDays Bucharest 2017 - Powering interactive data analysis with Google Bi...VoxxedDays Bucharest 2017 - Powering interactive data analysis with Google Bi...
VoxxedDays Bucharest 2017 - Powering interactive data analysis with Google Bi...Márton Kodok
 
TDC2016SP - Trilha BigData
TDC2016SP - Trilha BigDataTDC2016SP - Trilha BigData
TDC2016SP - Trilha BigDatatdc-globalcode
 
BigQuery for the Big Data win
BigQuery for the Big Data winBigQuery for the Big Data win
BigQuery for the Big Data winKen Taylor
 

Tendances (20)

Redshift VS BigQuery
Redshift VS BigQueryRedshift VS BigQuery
Redshift VS BigQuery
 
An indepth look at Google BigQuery Architecture by Felipe Hoffa of Google
An indepth look at Google BigQuery Architecture by Felipe Hoffa of GoogleAn indepth look at Google BigQuery Architecture by Felipe Hoffa of Google
An indepth look at Google BigQuery Architecture by Felipe Hoffa of Google
 
GDD Brazil 2010 - Google Storage, Bigquery and Prediction APIs
GDD Brazil 2010 - Google Storage, Bigquery and Prediction APIsGDD Brazil 2010 - Google Storage, Bigquery and Prediction APIs
GDD Brazil 2010 - Google Storage, Bigquery and Prediction APIs
 
Google BigQuery for Everyday Developer
Google BigQuery for Everyday DeveloperGoogle BigQuery for Everyday Developer
Google BigQuery for Everyday Developer
 
Google BigQuery - Features & Benefits
Google BigQuery - Features & BenefitsGoogle BigQuery - Features & Benefits
Google BigQuery - Features & Benefits
 
BigQuery for Beginners
BigQuery for BeginnersBigQuery for Beginners
BigQuery for Beginners
 
Exploring BigData with Google BigQuery
Exploring BigData with Google BigQueryExploring BigData with Google BigQuery
Exploring BigData with Google BigQuery
 
Google Cloud Platform & rockPlace Big Data Event-Mar.31.2016
Google Cloud Platform & rockPlace Big Data Event-Mar.31.2016Google Cloud Platform & rockPlace Big Data Event-Mar.31.2016
Google Cloud Platform & rockPlace Big Data Event-Mar.31.2016
 
Google Cloud Platform at Vente-Exclusive.com
Google Cloud Platform at Vente-Exclusive.comGoogle Cloud Platform at Vente-Exclusive.com
Google Cloud Platform at Vente-Exclusive.com
 
Connecta Event: Big Query och dataanalys med Google Cloud Platform
Connecta Event: Big Query och dataanalys med Google Cloud PlatformConnecta Event: Big Query och dataanalys med Google Cloud Platform
Connecta Event: Big Query och dataanalys med Google Cloud Platform
 
Big Data Analytics with Google BigQuery. By Javier Ramirez. All your base Co...
Big Data Analytics with Google BigQuery.  By Javier Ramirez. All your base Co...Big Data Analytics with Google BigQuery.  By Javier Ramirez. All your base Co...
Big Data Analytics with Google BigQuery. By Javier Ramirez. All your base Co...
 
Big Data Best Practices on GCP
Big Data Best Practices on GCPBig Data Best Practices on GCP
Big Data Best Practices on GCP
 
An overview of BigQuery
An overview of BigQuery An overview of BigQuery
An overview of BigQuery
 
On Performance Under Hotspots in Hadoop versus Bigdata Replay Platforms
On Performance Under Hotspots in Hadoop versus Bigdata Replay PlatformsOn Performance Under Hotspots in Hadoop versus Bigdata Replay Platforms
On Performance Under Hotspots in Hadoop versus Bigdata Replay Platforms
 
Visualising and Linking Open Data from Multiple Sources
Visualising and Linking Open Data from Multiple SourcesVisualising and Linking Open Data from Multiple Sources
Visualising and Linking Open Data from Multiple Sources
 
Make your data talk
Make your data talkMake your data talk
Make your data talk
 
VoxxedDays Bucharest 2017 - Powering interactive data analysis with Google Bi...
VoxxedDays Bucharest 2017 - Powering interactive data analysis with Google Bi...VoxxedDays Bucharest 2017 - Powering interactive data analysis with Google Bi...
VoxxedDays Bucharest 2017 - Powering interactive data analysis with Google Bi...
 
Google Bigtable
Google BigtableGoogle Bigtable
Google Bigtable
 
TDC2016SP - Trilha BigData
TDC2016SP - Trilha BigDataTDC2016SP - Trilha BigData
TDC2016SP - Trilha BigData
 
BigQuery for the Big Data win
BigQuery for the Big Data winBigQuery for the Big Data win
BigQuery for the Big Data win
 

En vedette

node.js on Google Compute Engine
node.js on Google Compute Enginenode.js on Google Compute Engine
node.js on Google Compute EngineArun Nagarajan
 
Introduction to Web Analytics @ ECS Brussels
Introduction to Web Analytics @ ECS BrusselsIntroduction to Web Analytics @ ECS Brussels
Introduction to Web Analytics @ ECS BrusselsDavid Hachez
 
Intro to Big Data using Hadoop
Intro to Big Data using Hadoop Intro to Big Data using Hadoop
Intro to Big Data using Hadoop Sergejus Barinovas
 
Big data in action
Big data in actionBig data in action
Big data in actionTu Pham
 
Distributed batch processing with Hadoop
Distributed batch processing with HadoopDistributed batch processing with Hadoop
Distributed batch processing with HadoopFerran Galí Reniu
 
Scaling your logging infrastructure using syslog-ng
Scaling your logging infrastructure using syslog-ngScaling your logging infrastructure using syslog-ng
Scaling your logging infrastructure using syslog-ngPeter Czanik
 
RightScale News November 2013: Launch of Cloud Analytics
RightScale News November 2013: Launch of Cloud AnalyticsRightScale News November 2013: Launch of Cloud Analytics
RightScale News November 2013: Launch of Cloud AnalyticsRightScale
 
PCM_Cloud_Flyer
PCM_Cloud_FlyerPCM_Cloud_Flyer
PCM_Cloud_FlyerIgor Belic
 
Big Data and Analytics: The IBM Perspective
Big Data and Analytics: The IBM PerspectiveBig Data and Analytics: The IBM Perspective
Big Data and Analytics: The IBM PerspectiveThe_IPA
 
RightScale Webinar: Considerations For Choosing Cloud Providers
RightScale Webinar:   Considerations For Choosing Cloud ProvidersRightScale Webinar:   Considerations For Choosing Cloud Providers
RightScale Webinar: Considerations For Choosing Cloud ProvidersRightScale
 
YARN - Hadoop's Resource Manager
YARN - Hadoop's Resource ManagerYARN - Hadoop's Resource Manager
YARN - Hadoop's Resource ManagerVertiCloud Inc
 
Microsoft Big Data @ SQLUG 2013
Microsoft Big Data @ SQLUG 2013Microsoft Big Data @ SQLUG 2013
Microsoft Big Data @ SQLUG 2013Nathan Bijnens
 
Mapreduce in Search
Mapreduce in SearchMapreduce in Search
Mapreduce in SearchAmund Tveit
 
NetTask DE Reseller Models Introduction Cloud Services
NetTask DE Reseller Models Introduction Cloud ServicesNetTask DE Reseller Models Introduction Cloud Services
NetTask DE Reseller Models Introduction Cloud ServicesNetTask GmbH
 
Splunk corporate overview German 2012
Splunk corporate overview German 2012Splunk corporate overview German 2012
Splunk corporate overview German 2012jenny_splunk
 
WT16 - Cloud Services Portfolio
WT16 - Cloud Services Portfolio WT16 - Cloud Services Portfolio
WT16 - Cloud Services Portfolio Cloud_Services
 
Ibm big data-platform
Ibm big data-platformIbm big data-platform
Ibm big data-platformIBM Sverige
 

En vedette (20)

node.js on Google Compute Engine
node.js on Google Compute Enginenode.js on Google Compute Engine
node.js on Google Compute Engine
 
Introduction to Web Analytics @ ECS Brussels
Introduction to Web Analytics @ ECS BrusselsIntroduction to Web Analytics @ ECS Brussels
Introduction to Web Analytics @ ECS Brussels
 
Intro to Big Data using Hadoop
Intro to Big Data using Hadoop Intro to Big Data using Hadoop
Intro to Big Data using Hadoop
 
Big data in action
Big data in actionBig data in action
Big data in action
 
Waldo-gcp
Waldo-gcpWaldo-gcp
Waldo-gcp
 
Distributed batch processing with Hadoop
Distributed batch processing with HadoopDistributed batch processing with Hadoop
Distributed batch processing with Hadoop
 
Scaling your logging infrastructure using syslog-ng
Scaling your logging infrastructure using syslog-ngScaling your logging infrastructure using syslog-ng
Scaling your logging infrastructure using syslog-ng
 
RightScale News November 2013: Launch of Cloud Analytics
RightScale News November 2013: Launch of Cloud AnalyticsRightScale News November 2013: Launch of Cloud Analytics
RightScale News November 2013: Launch of Cloud Analytics
 
PCM_Cloud_Flyer
PCM_Cloud_FlyerPCM_Cloud_Flyer
PCM_Cloud_Flyer
 
Google’s cloud strategy
Google’s cloud strategyGoogle’s cloud strategy
Google’s cloud strategy
 
Big Data and Analytics: The IBM Perspective
Big Data and Analytics: The IBM PerspectiveBig Data and Analytics: The IBM Perspective
Big Data and Analytics: The IBM Perspective
 
RightScale Webinar: Considerations For Choosing Cloud Providers
RightScale Webinar:   Considerations For Choosing Cloud ProvidersRightScale Webinar:   Considerations For Choosing Cloud Providers
RightScale Webinar: Considerations For Choosing Cloud Providers
 
YARN - Hadoop's Resource Manager
YARN - Hadoop's Resource ManagerYARN - Hadoop's Resource Manager
YARN - Hadoop's Resource Manager
 
Microsoft Big Data @ SQLUG 2013
Microsoft Big Data @ SQLUG 2013Microsoft Big Data @ SQLUG 2013
Microsoft Big Data @ SQLUG 2013
 
Google mesa
Google mesaGoogle mesa
Google mesa
 
Mapreduce in Search
Mapreduce in SearchMapreduce in Search
Mapreduce in Search
 
NetTask DE Reseller Models Introduction Cloud Services
NetTask DE Reseller Models Introduction Cloud ServicesNetTask DE Reseller Models Introduction Cloud Services
NetTask DE Reseller Models Introduction Cloud Services
 
Splunk corporate overview German 2012
Splunk corporate overview German 2012Splunk corporate overview German 2012
Splunk corporate overview German 2012
 
WT16 - Cloud Services Portfolio
WT16 - Cloud Services Portfolio WT16 - Cloud Services Portfolio
WT16 - Cloud Services Portfolio
 
Ibm big data-platform
Ibm big data-platformIbm big data-platform
Ibm big data-platform
 

Similaire à How Google Does Big Data - DevNexus 2014

Google Developers Summit Tokyo - Google Cloud Platform で知る Google クラウドの「Googl...
Google Developers Summit Tokyo - Google Cloud Platform で知る Google クラウドの「Googl...Google Developers Summit Tokyo - Google Cloud Platform で知る Google クラウドの「Googl...
Google Developers Summit Tokyo - Google Cloud Platform で知る Google クラウドの「Googl...Google Cloud Platform - Japan
 
BigData Meets the Federal Data Center
BigData Meets the Federal Data CenterBigData Meets the Federal Data Center
BigData Meets the Federal Data CenterAbe Usher
 
Social Data Week - London - Google Session
Social Data Week - London - Google SessionSocial Data Week - London - Google Session
Social Data Week - London - Google SessionTableau Software
 
IoT NY - Google Cloud Services for IoT
IoT NY - Google Cloud Services for IoTIoT NY - Google Cloud Services for IoT
IoT NY - Google Cloud Services for IoTJames Chittenden
 
Deep dive into Google Cloud for Big Data
Deep dive into Google Cloud for Big DataDeep dive into Google Cloud for Big Data
Deep dive into Google Cloud for Big DataTu Le Dinh
 
Big data on google cloud
Big data on google cloudBig data on google cloud
Big data on google cloudTu Pham
 
Bogdan botea, dmitry nefedkin no fiddle, efficient development on the googl...
Bogdan botea, dmitry nefedkin   no fiddle, efficient development on the googl...Bogdan botea, dmitry nefedkin   no fiddle, efficient development on the googl...
Bogdan botea, dmitry nefedkin no fiddle, efficient development on the googl...Codecamp Romania
 
Eric Andersen Keynote
Eric Andersen KeynoteEric Andersen Keynote
Eric Andersen KeynoteData Con LA
 
re:Introduce Big Data and Hadoop Eco-system.
re:Introduce Big Data and Hadoop Eco-system.re:Introduce Big Data and Hadoop Eco-system.
re:Introduce Big Data and Hadoop Eco-system.Shakir Ali
 
re:Introduce Big Data and Hadoop Eco-system.
re:Introduce Big Data and Hadoop Eco-system.re:Introduce Big Data and Hadoop Eco-system.
re:Introduce Big Data and Hadoop Eco-system.Shakir Ali
 
Case Study - Gordon Foods Delivers Fresh Data to the Cloud
Case Study - Gordon Foods Delivers Fresh Data to the CloudCase Study - Gordon Foods Delivers Fresh Data to the Cloud
Case Study - Gordon Foods Delivers Fresh Data to the CloudDATAVERSITY
 
Dataiku - google cloud platform roadshow - october 2013
Dataiku  - google cloud platform roadshow - october 2013Dataiku  - google cloud platform roadshow - october 2013
Dataiku - google cloud platform roadshow - october 2013Dataiku
 
Google Dremel. Concept and Implementations.
Google Dremel. Concept and Implementations.Google Dremel. Concept and Implementations.
Google Dremel. Concept and Implementations.Vicente Orjales
 
Google not all clouds are created equal - sap sapphire 2014 (1)
Google not all clouds are created equal - sap sapphire 2014 (1)Google not all clouds are created equal - sap sapphire 2014 (1)
Google not all clouds are created equal - sap sapphire 2014 (1)David Torres
 
Google Cloud Dataproc - Easier, faster, more cost-effective Spark and Hadoop
Google Cloud Dataproc - Easier, faster, more cost-effective Spark and HadoopGoogle Cloud Dataproc - Easier, faster, more cost-effective Spark and Hadoop
Google Cloud Dataproc - Easier, faster, more cost-effective Spark and Hadoophuguk
 
Making Big Data Analytics with Hadoop fast & easy (webinar slides)
Making Big Data Analytics with Hadoop fast & easy (webinar slides)Making Big Data Analytics with Hadoop fast & easy (webinar slides)
Making Big Data Analytics with Hadoop fast & easy (webinar slides)Yellowfin
 
Big Data Ecosystem at InMobi, Nasscom ATC 2013 Noida
Big Data Ecosystem at InMobi, Nasscom ATC 2013 NoidaBig Data Ecosystem at InMobi, Nasscom ATC 2013 Noida
Big Data Ecosystem at InMobi, Nasscom ATC 2013 NoidaSharad Agarwal
 
Google на конференции Big Data Russia
Google на конференции Big Data RussiaGoogle на конференции Big Data Russia
Google на конференции Big Data Russiarusbase.vc
 

Similaire à How Google Does Big Data - DevNexus 2014 (20)

Google Developers Summit Tokyo - Google Cloud Platform で知る Google クラウドの「Googl...
Google Developers Summit Tokyo - Google Cloud Platform で知る Google クラウドの「Googl...Google Developers Summit Tokyo - Google Cloud Platform で知る Google クラウドの「Googl...
Google Developers Summit Tokyo - Google Cloud Platform で知る Google クラウドの「Googl...
 
BigData Meets the Federal Data Center
BigData Meets the Federal Data CenterBigData Meets the Federal Data Center
BigData Meets the Federal Data Center
 
Social Data Week - London - Google Session
Social Data Week - London - Google SessionSocial Data Week - London - Google Session
Social Data Week - London - Google Session
 
IoT NY - Google Cloud Services for IoT
IoT NY - Google Cloud Services for IoTIoT NY - Google Cloud Services for IoT
IoT NY - Google Cloud Services for IoT
 
Deep dive into Google Cloud for Big Data
Deep dive into Google Cloud for Big DataDeep dive into Google Cloud for Big Data
Deep dive into Google Cloud for Big Data
 
Big data on google cloud
Big data on google cloudBig data on google cloud
Big data on google cloud
 
Bogdan botea, dmitry nefedkin no fiddle, efficient development on the googl...
Bogdan botea, dmitry nefedkin   no fiddle, efficient development on the googl...Bogdan botea, dmitry nefedkin   no fiddle, efficient development on the googl...
Bogdan botea, dmitry nefedkin no fiddle, efficient development on the googl...
 
Modern Thinking área digital MSKM 21/09/2017
Modern Thinking área digital MSKM 21/09/2017Modern Thinking área digital MSKM 21/09/2017
Modern Thinking área digital MSKM 21/09/2017
 
Eric Andersen Keynote
Eric Andersen KeynoteEric Andersen Keynote
Eric Andersen Keynote
 
re:Introduce Big Data and Hadoop Eco-system.
re:Introduce Big Data and Hadoop Eco-system.re:Introduce Big Data and Hadoop Eco-system.
re:Introduce Big Data and Hadoop Eco-system.
 
re:Introduce Big Data and Hadoop Eco-system.
re:Introduce Big Data and Hadoop Eco-system.re:Introduce Big Data and Hadoop Eco-system.
re:Introduce Big Data and Hadoop Eco-system.
 
Case Study - Gordon Foods Delivers Fresh Data to the Cloud
Case Study - Gordon Foods Delivers Fresh Data to the CloudCase Study - Gordon Foods Delivers Fresh Data to the Cloud
Case Study - Gordon Foods Delivers Fresh Data to the Cloud
 
Dataiku - google cloud platform roadshow - october 2013
Dataiku  - google cloud platform roadshow - october 2013Dataiku  - google cloud platform roadshow - october 2013
Dataiku - google cloud platform roadshow - october 2013
 
Big Data
Big DataBig Data
Big Data
 
Google Dremel. Concept and Implementations.
Google Dremel. Concept and Implementations.Google Dremel. Concept and Implementations.
Google Dremel. Concept and Implementations.
 
Google not all clouds are created equal - sap sapphire 2014 (1)
Google not all clouds are created equal - sap sapphire 2014 (1)Google not all clouds are created equal - sap sapphire 2014 (1)
Google not all clouds are created equal - sap sapphire 2014 (1)
 
Google Cloud Dataproc - Easier, faster, more cost-effective Spark and Hadoop
Google Cloud Dataproc - Easier, faster, more cost-effective Spark and HadoopGoogle Cloud Dataproc - Easier, faster, more cost-effective Spark and Hadoop
Google Cloud Dataproc - Easier, faster, more cost-effective Spark and Hadoop
 
Making Big Data Analytics with Hadoop fast & easy (webinar slides)
Making Big Data Analytics with Hadoop fast & easy (webinar slides)Making Big Data Analytics with Hadoop fast & easy (webinar slides)
Making Big Data Analytics with Hadoop fast & easy (webinar slides)
 
Big Data Ecosystem at InMobi, Nasscom ATC 2013 Noida
Big Data Ecosystem at InMobi, Nasscom ATC 2013 NoidaBig Data Ecosystem at InMobi, Nasscom ATC 2013 Noida
Big Data Ecosystem at InMobi, Nasscom ATC 2013 Noida
 
Google на конференции Big Data Russia
Google на конференции Big Data RussiaGoogle на конференции Big Data Russia
Google на конференции Big Data Russia
 

Dernier

Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024The Digital Insurer
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEarley Information Science
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
Developing An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilDeveloping An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilV3cube
 
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsTop 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsRoshan Dwivedi
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Enterprise Knowledge
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 

Dernier (20)

Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Developing An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilDeveloping An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of Brazil
 
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsTop 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 

How Google Does Big Data - DevNexus 2014

  • 1. How Google Does Big Data Solving Big Data Problems at Google Scale James Chittenden Cloud Platform Engineer Google confidential │ Do not distribute
  • 2. Agenda 1 Quick Google Cloud Platform Overview 2 Google & Big Data 3 Running Hadoop on GCE & Leveraging BigQuery 4 Q&A Google confidential │ Do not distribute
  • 3. Agenda 1 Quick Google Cloud Platform Overview 2 Google & Big Data 3 Running Hadoop on GCE & Leveraging BigQuery 4 Q&A Google confidential │ Do not distribute
  • 7. Google has been running some of the world’s largest distributed systems with unique and stringent requirements. Images by Connie Zhou Google confidential │ Do not distribute
  • 8. "This is what makes Google Google: its physical network, its thousands of fiber miles, and those many thousands of servers that, in aggregate, add up to the mother of all clouds." - Wired 'Google Throws Open Doors To Its Top Secret Data Center', Wired, October 2012 Google confidential | Do not distribute
  • 9. For the past 15 years, Google has been building out the world’s fastest, most powerful, highest quality cloud infrastructure on the planet. Images by Connie Zhou Google confidential │ Do not distribute
  • 10. Cloud Platform is built on the same infrastructure that powers Google Images by Connie Zhou Google confidential | Do not distribute
  • 12. Google Innovations in Software MapReduce 2002 2004 GFS Cloud Platform Dremel 2006 Big Table 2008 Colossus 2010 2012 Spanner Google confidential | Do not distribute
  • 13. Investing In Our Cloud $2.9B in additional data center investments worldwide Cloud Platform Google confidential | Do not distribute
  • 14. Compute Storage App Services Compute Engine Cloud Storage BigQuery App Engine Cloud SQL Cloud Endpoints Cloud Datastore Cloud Platform Google confidential | Do not distribute
  • 15. Agenda 1 Quick Google Cloud Platform Overview 2 Google & Big Data 3 Running Hadoop on GCE & Leveraging BigQuery 4 Q&A Google confidential │ Do not distribute
  • 16. Complex Big Data Landscape Big Data Landscape
  • 17. The Flood of BIG DATA 5.3 US $14.3 191 BILLION EMAILS SENT EVERY DAY 2 BILLION SPENT IN U.S. ON MOBILE ADS IN 2013 4 US $109.7 TRILLION ONLINE AD IMPRESSIONS IN 2012 1 500+ TERABYTES OF DATA UPLOADED DAILY 5 BILLION ESTIMATED FOR 2014 3 MOBILE AD EMAIL ONLINE AD IMPRESSIONS FACEBOOK ONLINE AD SPEND
  • 18. How does Google process Big Data? Google confidential │ Do not distribute
  • 19. Big Data processing example - application logs User requests Application servers Note: diagram is dramatically simplified Interop NYC - Oct 2, 2012
  • 20. Big Data processing example - application logs App persistence BigTable / GFS App logging User requests Application servers Logs cluster Note: diagram is dramatically simplified Interop NYC - Oct 2, 2012
  • 21. Big Data processing example - application logs MapReduce App persistence BigTable / GFS Dremel MapReduce App logging User requests Application servers Logs cluster Note: diagram is dramatically simplified Interop NYC - Oct 2, 2012
  • 22. Big Data processing example - application logs MapReduce App persistence SQL BigTable / GFS Dremel MapReduce Analysts & reporting App logging User requests Application servers Logs cluster Note: diagram is dramatically simplified Interop NYC - Oct 2, 2012
  • 23. Big Data processing example - application logs MapReduce App persistence SQL BigTable / GFS Dremel MapReduce Analysts & reporting App logging User requests Application servers Logs cluster Note: diagram is dramatically simplified Interop NYC - Oct 2, 2012
  • 24. How does this apply to my business? Google confidential │ Do not distribute
  • 25. Social Data Internal Data WHAT DO CUSTOMERS SAY ABOUT MY BUSINESS? HOW CAN I COMBINE ALL OF MY TRANSACTIONAL DATA WITH OTHER DATA? Google Data Sensor Data HOW CAN I CAPTURE MORE CUSTOMER INTERACTIONS? 01001010110010001001000100 100110001110011v10011000100110001 01001010110010001001000100010001 01100011100010 01100011100001110 10011001000011001100101 01 HOW CAN GOOGLE DATA HELP ME WITH MULTI-CHANNEL MODELING?
  • 26. Sample use cases - BigQuery usage today Display ads analytics (global top-5 media agency) Ads network reporting (3rd party mobile ads) Fleet reservations (online travel operations) Mobile app statistics (online reading vendor) Revenue optimization (holiday/travel properties) Analyze global campaign performance for F500 clients 1 client = 20GB/day of DoubleClick impressions logs Deliver x-platform performance analytics dashboards 1B events/day x 100s of ads customers Monitor customer demand vs supply shortfalls 10,000 routes x 1000s customers = millions of daily events Usage analysis on 60M installs; 10M active users 2B API requests/day, 20GB log data/day Correlate marketing effectiveness vs global reservations 10MBs / day from multiple data warehouses Interop NYC - Oct 2, 2012
  • 27. Today we see the same trends happening in industry Opportunities Single place to capture data Combine data from different sources Challenges The data is large Up to 100-200 TB High rate of growth Up to 100-200 GB / day Detect patterns and correlations Easily share data insights with org Distributed decision making Non-relational Unstructured or semi-structured Multiple sources SaaS, Mobile devices, legacy Interop NYC - Oct 2, 2012
  • 28. Processing Data: MapReduce ● 2004: Google releases the MapReduce* framework paper ● 2011: Big Data "analysis" using Hadoop and other Google inspired technology
  • 29. Visualization 1 Google App Engine 1 Data Collection 2 ETL 2 Raw Data Storage 4 6 Auto-scale web services Data Ingestion 4 Google Compute Engine 3 Web Apps Scalable VMs BI tools Aggregation Collection Transformation Data processing Cleansing 3 Google Cloud Storage 5 Analytics Storage 6 TBs of Data Visualization Raw Data Storage Big Query Staging Cloud Storage Hadoop Connector 5 Google Big Query Raw Data Storage Interactive Dashboards + apps Ad-hoc queries REST API Google Spreadsheets
  • 30. 1 Visualization Google Compute Engine 1 2 Scalable VMs 6 Data Collection Collection Transformation ETL Data processing Cleansing BI tools 3 Raw Data Storage 2 4 4 Aggregation 5 Analytics Storage 6 Visualization Interactive Dashboards + apps Google Spreadsheets
  • 31. 1 Visualization Google Cloud Storage 1 2 TBs of Data 6 Data Collection Raw Data Storage Big Query Staging ETL BI tools 3 Huge Storage Capacity Raw Data Storage 3 4 Aggregation 5 Analytics Storage 6 Visualization Consistent, fast speeds Multiple storage types Interactive Dashboards + apps Google Spreadsheets
  • 32. 1 Visualization Google BigQuery 1 2 Ad-hoc queries 6 Data Collection AdHoc Queries REST API ETL Visualization Partners BI tools 3 Raw Data Storage 5 4 Aggregation 5 Analytics Storage 6 Visualization Interactive Dashboards + apps Google Spreadsheets
  • 33. Agenda 1 Quick Google Cloud Platform Overview 2 Google & Big Data 3 Running Hadoop on GCE & Leveraging BigQuery 4 Q&A Google confidential │ Do not distribute
  • 34. Google Compute Engine + Google confidential │ Do not distribute
  • 35. Demo Hadoop on Google Compute Engine Google confidential | Do not distribute
  • 36. BigData at Google Running Hadoop on Google Compute Engine 1. Set up Hadoop ./compute_cluster_for_hadoop.py setup google-platform-demo hadoop-demo-jameschi Google confidential | Do not distribute
  • 37. BigData at Google Running Hadoop on Google Compute Engine 2. Start Cluster ./compute_cluster_for_hadoop.py start google-platform-demo hadoop-demo-jameschi 500 Google confidential | Do not distribute
  • 38. BigData at Google Running Hadoop on Google Compute Engine 3. Start MapReduce ./compute_cluster_for_hadoop.py mapreduce google-platformdemo hadoop-demo-jameschi --input gs://hadoop-demo-jameschi-input --output gs://hadoop-demo-jameschi-output --mapper sample/shortest-to-longest-mapper.pl --reducer sample/shortest-to-longest-reducer.pl --mapper-count 5 --reducer-count 1 Google confidential | Do not distribute
  • 39. BigData at Google Running Hadoop on Google Compute Engine 4. Shutdown Cluster ./compute_cluster_for_hadoop.py shutdown google-platformdemo Google confidential | Do not distribute
  • 40. Google Compute Engine Costs Total Costs for Running Hadoop: 5 node Hadoop cluster x 20 minutes = $0.21 5,000 node Hadoop cluster x 20 min = $8.67 Google confidential | Do not distribute
  • 41. How Google Approaches Analytics ● MapReduce based analysis can be slow for ad-hoc queries ● Managing data centers and tuning software takes time & money ● Analytics tools should be services
  • 42. How Google Approaches Analytics Dremel Ad-hoc query system for terabyte datasets ● ● ● Query execution tree Column Oriented records ...and a lot of nodes
  • 43. Big Data processing example - application logs MapReduce App persistence SQL BigTable / GFS Dremel MapReduce Analysts & reporting App logging User requests Application servers Logs cluster Note: diagram is dramatically simplified Interop NYC - Oct 2, 2012
  • 44. What is BigQuery? Google BigQuery: An Overview BigQuery: a fully-managed data analytics service in the cloud. Unlimited storage. Interactive analysis on multi-terabyte datasets. Adwords DCLK Analytics Other BI Tools Google Spreadsheets Scalable Storage SQL API Corporate data 3rd party data Store all your data in the cloud App Engine App Co-Workers Analyze interactively Mash it up Securely Share/ distribute the results Google confidential | Do not distribute
  • 45. Cloud-based data analytics pipeline Application level code Log data BI tools Logstore Backends + MapReduce BigQuery Structured data Datastore SQL API Interactive Dashboards + apps Hadoop Unstructured data Google Spreadsheets Cloud Storage Store Extract & Transform Analyze interactively Serve Custom logic & 3rd party libraries Interop NYC - Oct 2, 2012
  • 46. Origins of BigQuery + Google confidential │ Do not distribute
  • 47. Origins of BigQuery Google BigQuery: An Overview How can a business analyst find Top 20 Apps in matter of seconds? Google confidential | Do not distribute
  • 48. Origins of BigQuery Google BigQuery: An Overview How can a business analyst find Top 20 Apps in matter of seconds? SELECT top(appId, 20) AS app, count(*) AS count FROM installlog.2012; ORDER BY count DESC Result in ~20 seconds! Google confidential | Do not distribute
  • 49. Origins of BigQuery Google BigQuery: An Overview How do you find slow running servers from billions of log entries in seconds? Google confidential | Do not distribute
  • 50. Origins of BigQuery Google BigQuery: An Overview How do you find slow running servers from billions of log entries - in seconds? SELECT count(*) AS count, source_machine AS machine FROM product.product_log.live WHERE elapsed_time > 4000 GROUP BY source_machine ORDER BY count DESC Result in ~20 seconds! Google confidential | Do not distribute
  • 51. Origins of BigQuery Google BigQuery: An Overview Google BigQuery Google confidential | Do not distribute
  • 52. BigQuery -- Sweetspots Google BigQuery: An Overview Use cases/applications where Interactive analysis on large data sets is a requirement Data mashups at scale and ease is important Elasticity of environment is important Minimal administration is required Empowering Analysts or Data scientist with self service capabilities Google confidential | Do not distribute
  • 53. Demo Ad-hoc Analysis with BigQuery Google confidential | Do not distribute
  • 54. Why is BigQuery so Special? + Google confidential │ Do not distribute
  • 55. Why BigQuery is Special? Google BigQuery: An Overview 0 ZERO Administration for Performance and Scale No complex data architecture required. ● ● Simple denormalized data structure Data can be loaded in csv or JSON format Supports Open Standards ● ● ● SQL like language Standard BI/ETL tools supported REST API Support for integrating analytics programmatically Google confidential | Do not distribute
  • 56. Why BigQuery is Special? Google BigQuery: An Overview Nested Field Support ● ● Analyze details for headers (items of the order, clicks of the session in single SQL statement) Ingest Web data (Typically in JSON format) directly into BigQuery Query Result Caching at Scale ● ● Caching of query results at Scale (24 hours best effort) Option to re-generate results / update cache Real-time Data Ingest and Reporting ● ● Data ingest at 100 rows/second (with peak 1000 rows/second) into BigQuery Support for real-time to near real-time query/reporting Google confidential | Do not distribute
  • 57. Why BigQuery is Special? Google BigQuery: An Overview And many more... SQL like Query Language and REST based APIS High performance queries on strings (use of REGEXP and pattern matching) Data window functions and many aggregation capabilities Table Wildcards Table decorators for time based queries.. Integration with Google Stack Google Analytics Premium (hit level data) - BigQuery Google App Engine Datastore to BigQuery - Trusted Tester Google confidential | Do not distribute
  • 58. What BigQuery is Not? + Google confidential │ Do not distribute
  • 59. Why BigQuery is Not? Google BigQuery: An Overview Relational Database Management System (RDBMS) ● ● Data is stored in BigQuery in Columnar fashion BigQuery tables are immutable (no updates or deletes)* Not a replacement for existing Data warehouse en-mass ● ● Specifically designed for large data sets Queries requiring massive processing On-premise solution or Appliance ● ● It is available only in Google Cloud in a fully hosted fashion Data needs to be in Google Cloud (BigQuery) for analysis * alternate ways to support these capabilities Google confidential | Do not distribute
  • 60. Columnar Storage and Tree Architecture of Dremel Why Dremel can be so drastically fast as the examples show? The answer can be found in two core technologies which gives Dremel this unprecedented performance: ● Columnar Storage. Data is stored in a columnar storage fashion which makes possible to achieve very high compression ratio and scan throughput. ● Tree Architecture is used for dispatching queries and aggregating results across thousands of machines in a few seconds. Google confidential | Do not distribute
  • 61. Dremel: Key to Run Business at “Google Speed” Google has been using Dremel in production since 2006 and has been continuously evolving it for the last 6 years. Examples of applications include: ● ● ● ● ● ● ● ● ● ● ● Analysis of crawled web documents Tracking install data for applications in the Android Market Crash reporting for Google products OCR results from Google Books Spam analysis Debugging of map tiles on Google Maps Tablet migrations in managed Bigtable instances Results of tests run on Google’s distributed build system Disk I/O statistics for hundreds of thousands of disks Resource monitoring for jobs run in Google’s data centers Symbols and dependencies in Google’s codebase Google confidential | Do not distribute
  • 62. BigQuery Sample Use Cases Google confidential | Do not distribute
  • 63. Google confidential | Do not distribute
  • 64. Sensor Data Analytics Google confidential | Do not distribute
  • 65. Sensor Data Analytics Google confidential | Do not distribute
  • 66. Sensor Data Analytics Google confidential | Do not distribute
  • 67. Sensor Data Analytics Google confidential | Do not distribute
  • 68. Sensor Data Analytics Google confidential | Do not distribute
  • 69. Sensor Data Analytics Google confidential | Do not distribute
  • 70. Sensor Data Analytics Data Sensing Lab Sensors Generate Endpoints/App Engine Ingest/Process Datastore Store Compute Engine/ BigQuery Compute Visualize Analyze Google confidential | Do not distribute
  • 71. Mobile and Social Gaming User Analysis Notice trend change Slice user data, identify segments Compare segments vs general population Google confidential | Do not distribute
  • 72. BigQuery versus MapReduce ● BigQuery is designed as an interactive data analysis tool for large datasets ● MapReduce is designed as a programming framework to batch process large datasets Google confidential | Do not distribute
  • 73. BigQuery versus MapReduce Google confidential | Do not distribute
  • 74. BigQuery versus MapReduce Google confidential | Do not distribute
  • 75. BigQuery: Key Differentiators Agile : BigData Analytic Hosted Environment ● Fully managed cloud service ● Interactive querying on Terabytes of data ● Always on Cost Effective: Low TCO / TCD ● No CapEx as no hardware/software to own ● No administration or management overhead ● Pay for only what you use. Non disruptive, fits right into IT landscape ● Expanding ISV ecosystem (BI/ETL) ● Use of standards (SQL like, REST API etc...) ● Integrated with other Google Services (GCE, GAE, GCE etc...) Google confidential | Do not distribute
  • 76. BigQuery versus MapReduce Secure and Reliable ● Uptime SLA (99.9% uptime) ● OAUTH 2.0 Support, User/Group ACL ● Uses Google's core network/data center backbone Innovative ● Innovation momentum: 2 release since launch ● New capabilities/concepts: e.g. Queryable Nested fields ● e.g. Use of RegExp without performance loss Google confidential | Do not distribute
  • 78. Recent Manufacturer POC ● ● Achieved a 200,000 rows per second ingestion rate, and it took about 10 minutes from when the file is first dropped into GCS to when you can query the data in BigQuery. Demonstrated ingesting 120 million rows in 10 minutes into BigQuery.
  • 79. Recent Manufacturer POC ● ● Leveraged Google Fusion Tables to easily collaborate on datasets with business and partners. Also discovered manufacturer had machines that consumed 250+ billion gallons of fuel as of July 30th.
  • 80. Recent Manufacturer POC ● ● Spun up BIME Dashboard to interactively connect to BigQuery dataset. With dashboards connected to BQ (BIME, Tableau, QlikView, etc), we can drill into billions of rows near real-time data with ease.
  • 81. BigQuery: Ever Expanding Ecosystem of Partners Google confidential | Do not distribute
  • 82. Agenda 1 Quick Google Cloud Platform Overview 2 Google & Big Data 3 Running Hadoop on GCE & Leveraging BigQuery 4 Q&A Google confidential │ Do not distribute
  • 84. cloud.google.com Images by Connie Zhou Google confidential │ Do not distribute
  • 86. Google confidential | Do not distribute