Big Data Analytics with Google Cloud Platform
Speaker
William Vambenepe
Google Cloud Platform
Lead Product Manager for Big Data

Twitter: @vambenepe
Google ran into “Big Data” problems in the process of building its business...
Big data at Google scale
● How much video is uploaded to YouTube every minute? 100 hours
● How many active Gmail users are there? 500M+
● How large is Google's web search index? 100PB+ (over 100,000 TBs)
● How long does it take for Google to respond to a search query? 0.25 seconds
...now that the technology is available, its usage is spreading to all industries.
Conventional Methods for Data & Analytics

Focused on optimizing core business processes and capturing details about high interactions/transactions
Conventional Methods for Data & Analytics
Stopped collecting data at the nearest boundary of the business process:
● Not instrumented for tracking of product usage by customer
● Instrumentation for tracking supplier performance based on supplier-specific situation

Business decision makers fill data gaps with gut feel...
Rise of Big Data enabled by Tech Innovation

It is possible to instrument data capture at any level of business process...
...and to store and process it easily and efficiently.
Big Data enabled by Tech Innovation

Highly scalable, performant and cost-effective computational capabilities in the cloud make it possible to store vast amounts of data from many sources
For the past 15 years, Google has been building out the world’s fastest, most powerful, highest-quality cloud infrastructure.
Cloud Platform is built on the same infrastructure that powers Google.
Google's Platform
"[Google's] ability to build, organize, and operate a huge network of servers and fiber-optic cables with an efficiency and speed that rocks physics on its heels. This is what makes Google Google: its physical network, its thousands of fiber miles, and those many thousands of servers that, in aggregate, add up to the mother of all clouds."
- Wired, 'Google Throws Open Doors To Its Top Secret Data Center', October 2012

Images by Connie Zhou
Investing In Our Cloud

$2.9B in additional data center investments worldwide
Google Innovations in Software

[Timeline, 2002 to 2012: GFS, MapReduce, Big Table, Dremel, Colossus, Spanner]
Google Cloud Platform
● Compute: Compute Engine, App Engine
● Storage: Cloud Storage, Cloud SQL, Cloud Datastore, Persistent Disk
● App Services: BigQuery, Cloud Endpoints, Caching, Queues
Google Compute Engine and Google Cloud Storage
Google Compute Engine
Compute Engine: Virtual Machine Hosting
● Compute: Launch virtual machines on demand
● Network: Connect your VMs together to form powerful clusters
● Storage: Store on persistent disk, local disk or Cloud Storage
● Tooling: Control your VMs via REST API or command line

Scale, Performance, Value
Google Compute Engine
“Google Compute Engine is not just fast. It’s Google fast. In fact, it’s a class of fast that enables new service architectures entirely.”
- Sebastian Stadil, Scalr
Google Compute Engine
MapR on GCE breaks the TeraSort, then MinuteSort records

● sorted 15 billion 100-byte records (1.5TB) in
59 seconds
● used 2,103 instances (n1-standard-4-d)
● 8,412 cores
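
The figures on this slide can be cross-checked with a little arithmetic. The sketch below only uses the numbers quoted above; the derived throughput values are back-of-the-envelope averages (assuming decimal units and an even spread of work across instances), not measurements reported in the deck.

```python
# Figures quoted on the slide.
RECORDS = 15_000_000_000   # 15 billion records
RECORD_BYTES = 100         # bytes per record
ELAPSED_SECONDS = 59
INSTANCES = 2_103
CORES = 8_412

total_bytes = RECORDS * RECORD_BYTES
total_tb = total_bytes / 1e12                      # decimal terabytes
cores_per_instance = CORES / INSTANCES

# Derived averages (assumption: work is spread evenly over the run).
cluster_gb_per_s = total_bytes / ELAPSED_SECONDS / 1e9
per_instance_mb_per_s = total_bytes / ELAPSED_SECONDS / INSTANCES / 1e6

print(f"data sorted:        {total_tb:.1f} TB")
print(f"cores per instance: {cores_per_instance:.0f}")
print(f"cluster throughput: {cluster_gb_per_s:.1f} GB/s")
print(f"per-instance rate:  {per_instance_mb_per_s:.1f} MB/s")
```

The numbers are consistent with each other: 15 billion 100-byte records is 1.5 TB, and 8,412 cores over 2,103 instances is 4 cores each. The average works out to roughly 25 GB/s across the cluster, or about 12 MB/s per instance.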
Computing in the Cloud with GCE
Cancer Research
“Google Compute Engine is just part of what we see as a whole new way for scientists around the world to work more effectively in the cloud. This is a paradigm shift that Google services can help bring about.”
- Hector Rovira, senior software engineer at the Institute for Systems Biology

High Performance Computing
“And we have been very satisfied with the way Google Compute Engine performs. One of the great things with Google Compute Engine in our experience is that the performance is really reliable, so there is not a lot of variability in the performance we see.”
- Joe Masters Emison, Founder and VP R&D, BuildFax

Google Cloud Storage
Google Cloud Storage lets you store and access data on Google's reliable infrastructure.

SPEED
● Global Network: lowest latency for rapid access
● Data Center Efficiency: maximal service at critical need

RELIABILITY
● World-Class Reliability: 99.9% SLA
● Availability of your Data: read-your-write consistency

SCALABILITY
● Unlimited Objects: there is no limit to the number of objects
● Big Object Size: up to 5 terabytes per object

USES
● Content Delivery & File Sharing
● Active Archiving
● Application Storage
● Computation
Specialized Big Data services of Google Cloud Platform
Big Data example: app logs

[Diagram: User requests, Application servers]
Big Data example: app logs

[Diagram: User requests, Application servers, App persistence, BigTable / GFS, App logging, Logs cluster]
Note: diagram is dramatically simplified
Big Data example: app logs
[Diagram: User requests, Application servers, App persistence, BigTable / GFS, App logging, Logs cluster, MapReduce, Dremel]
Note: diagram is dramatically simplified
Big Data example: app logs
[Diagram: User requests, Application servers, App persistence, BigTable / GFS, App logging, Logs cluster, MapReduce, Dremel, SQL, Analysts & reporting]
Note: diagram is dramatically simplified
Google Innovations in Software

[Timeline, 2002 to 2012: GFS, MapReduce, Big Table, Dremel, Colossus, Spanner]
Google Cloud Platform Services: BigQuery
Google BigQuery
● Query billions of rows in seconds, with no index
● Uses a SQL-style query syntax
● Bulk and stream data ingestion
● As a service:
○ zero setup/management
○ accessed via:
■ RESTful API
■ console
■ partner tools
○ Pay as you go (per query)
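
To give a feel for the "SQL-style query syntax" mentioned above, here is a minimal sketch of an aggregate query over application logs. It runs against Python's built-in `sqlite3` module as a local stand-in, not against BigQuery itself; the `app_logs` table, its columns, and the sample rows are all invented for illustration, and BigQuery's own dialect may differ in details.

```python
import sqlite3

# Local stand-in for a log table; schema and rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE app_logs (ts TEXT, endpoint TEXT, status INTEGER, latency_ms REAL)"
)
rows = [
    ("2013-01-01T00:00:00", "/search", 200, 120.0),
    ("2013-01-01T00:00:01", "/search", 200, 80.0),
    ("2013-01-01T00:00:02", "/search", 500, 400.0),
    ("2013-01-01T00:00:03", "/login", 200, 50.0),
    ("2013-01-01T00:00:04", "/login", 503, 150.0),
]
conn.executemany("INSERT INTO app_logs VALUES (?, ?, ?, ?)", rows)

# No index is defined: the query is answered by a full scan, which is the
# access pattern the slide describes (at a vastly larger scale).
QUERY = """
SELECT endpoint,
       COUNT(*) AS requests,
       SUM(CASE WHEN status >= 500 THEN 1 ELSE 0 END) AS server_errors,
       AVG(latency_ms) AS avg_latency_ms
FROM app_logs
GROUP BY endpoint
ORDER BY requests DESC, endpoint
"""
results = conn.execute(QUERY).fetchall()
for endpoint, requests, server_errors, avg_latency_ms in results:
    print(endpoint, requests, server_errors, avg_latency_ms)
```

The point of the sketch is the shape of the question (group, count, average over raw log rows), which is the kind of ad-hoc analysis the service is positioned for.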
How Google Approaches Analytics
Dremel
Ad-hoc query system for terabyte datasets
● Query execution tree
● Column-oriented records
● ...and a lot of nodes
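
The three ideas above can be illustrated with a toy model. This is a simplified sketch, not Dremel's actual implementation: the `ColumnShard` layout, the function names, and the sample data are assumptions made for the example. Each leaf scans only the columns a query needs and returns partial aggregates, and the levels above merge those partials until one result remains at the root.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# group key -> (row count, sum of values)
Partial = Dict[str, Tuple[int, float]]


@dataclass
class ColumnShard:
    """One shard of a table, stored column by column (hypothetical layout)."""
    columns: Dict[str, list]


def leaf_scan(shard: ColumnShard, group_col: str, value_col: str) -> Partial:
    """Leaf node: read only the two needed columns and pre-aggregate."""
    partial: Partial = {}
    for key, value in zip(shard.columns[group_col], shard.columns[value_col]):
        count, total = partial.get(key, (0, 0.0))
        partial[key] = (count + 1, total + value)
    return partial


def merge(partials: List[Partial]) -> Partial:
    """Intermediate or root node: combine partial aggregates from children."""
    merged: Partial = {}
    for partial in partials:
        for key, (count, total) in partial.items():
            c, t = merged.get(key, (0, 0.0))
            merged[key] = (c + count, t + total)
    return merged


def run_tree(shards: List[ColumnShard], group_col: str, value_col: str,
             fanout: int = 2) -> Partial:
    """Evaluate the query bottom-up through a tree with the given fanout."""
    if fanout < 2:
        raise ValueError("fanout must be at least 2")
    level = [leaf_scan(s, group_col, value_col) for s in shards]
    while len(level) > 1:
        level = [merge(level[i:i + fanout]) for i in range(0, len(level), fanout)]
    return level[0] if level else {}


shards = [
    ColumnShard({"endpoint": ["/search", "/login"], "latency_ms": [100.0, 50.0],
                 "user_agent": ["a", "b"]}),
    ColumnShard({"endpoint": ["/search"], "latency_ms": [300.0],
                 "user_agent": ["c"]}),
    ColumnShard({"endpoint": ["/login", "/search"], "latency_ms": [150.0, 200.0],
                 "user_agent": ["d", "e"]}),
]
result = run_tree(shards, "endpoint", "latency_ms")
for endpoint, (count, total) in sorted(result.items()):
    print(endpoint, count, total / count)
```

Note that the `user_agent` column is never touched by the query: with a column-oriented layout, unused columns cost nothing to skip, and adding more leaf nodes spreads the scan over more machines.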
Google BigQuery
BigQuery: a fully-managed data analytics service in the cloud.
Unlimited storage. Interactive analysis on multi-terabyte datasets.

[Diagram]
● Store all your data in the cloud: Adwords, DCLK, Analytics, corporate data, 3rd party data (scalable storage)
● Analyze interactively, mash it up: SQL, API
● Securely share/distribute the results: other BI tools, Google Spreadsheets, App Engine app, co-workers
Big Data Processing Pipeline
[Diagram of pipeline stages: Store; Extract & Transform; Analyze interactively; Serve]
Components shown: Application level code, Log data, Logstore, Backends + MapReduce, Structured data, Datastore, Hadoop, Unstructured data, Cloud Storage, Custom logic & 3rd party libraries, BigQuery (SQL, API), BI tools, Interactive dashboards + apps, Google Spreadsheets
Sample BigQuery use cases
● Display ads analytics (global top-5 media agency): Analyze global campaign performance for F500 clients. 1 client = 20GB/day of DoubleClick impressions logs
● Ads network reporting (3rd party mobile ads): Deliver x-platform performance analytics dashboards. 1B events/day x 100s of ads customers
● Fleet reservations (online travel operations): Monitor customer demand vs supply shortfalls. 10,000 routes x 1000s customers = millions of daily events
● Mobile app statistics (online reading vendor): Usage analysis on 60M installs; 10M active users. 2B API requests/day, 20GB log data/day
● Revenue optimization (holiday/travel properties): Correlate marketing effectiveness vs global reservations. 10MBs / day from multiple data warehouses
https://cloud.google.com/resources/starterpack/

Big Data Analytics with Google Cloud Platform

  • 1. Big Data Analytics with Google Cloud Platform
  • 2. Speaker William Vambenepe Google Cloud Platform Lead Product Manager for Big Data Twitter: @vambenepe
  • 3. Google ran into “Big Data” problems in the process of building its business...
  • 4. Big data at Google scale How much video is uploaded to YouTube every minute? How many active Gmail users are there? How large is Google's web search index? How long does it take for Google to respond to a search query? 100 hours 500M+ 100PB+ (over 100,000 TBs) 0.25 seconds
  • 5. ...now that the technology is available, its usage is spreading to all industries.
  • 6. Conventional Methods for Data & Analytics Focused on optimizing core business processes and capture details about high interactions/transactions
  • 7. Conventional Methods for Data & Analytics Stopped collecting data at the nearest boundary of business process: ● ● Not instrumented for tracking of product usage by customer Instrumentation for tracking supplier performance based on supplier specific situation Businesses decision makers fill data gaps with gut feel...
  • 8. Rise of Big Data enabled by Tech Innovation It is Possible to instrument data capture at any level of business process...
  • 9. ...and to store and process it easily and efficiently.
  • 10. Big Data enabled by Tech Innovation Highly scalable, performant and cost effective computational capabilities in the cloud makes it possible to store vast amounts of data from many sources
  • 11. For the past 15 years, Google has been building out the world’s fastest, most powerful, highest quality cloud infrastructure on the planet.
  • 12. Cloud Platform is built on the same infrastructure that powers Google.
  • 13. Google's Platform "[Google's] ability to build, organize, and operate a huge network of servers and fiber-optic cables with an efficiency and speed that rocks physics on its heels. This is what makes Google Google: its physical network, its thousands of fiber miles, and those many thousands of servers that, in aggregate, add up to the mother of all clouds." - Wired, 'Google Throws Open Doors To Its Top Secret Data Center', October 2012. Images by Connie Zhou
  • 14. Investing In Our Cloud $2.9B in additional data center investments worldwide
  • 15. Google Innovations in Software (timeline): GFS (2002), MapReduce (2004), Big Table (2006), Dremel (2008), Colossus (2010), Spanner (2012)
  • 16. Google Cloud Platform
      ● Compute: Compute Engine, App Engine
      ● Storage: Cloud Storage, Cloud SQL, Cloud Datastore, Persistent Disk
      ● App Services: BigQuery, Cloud Endpoints, Caching, Queues
  • 17. Google Compute Engine and Google Cloud Storage
  • 18. Google Compute Engine: Virtual Machine Hosting
      ● Compute: Launch Virtual Machines on demand
      ● Network: Connect your VMs together to form powerful clusters
      ● Storage: Store on persistent disk, local disk or Cloud Storage
      ● Tooling: Control your VMs via REST API or command line
      Scale, Performance, Value
  • 19. Google Compute Engine “Google Compute Engine is not just fast. It’s Google fast. In fact, it’s a class of fast that enables new service architectures entirely.” - Sebastian Stadil, Scalr
  • 20. Google Compute Engine MapR on GCE breaks the TeraSort, then MinuteSort records ● sorted 15 billion 100-byte records (1.5TB) in 59 seconds ● used 2,103 instances (n1-standard-4-d) ● 8,412 cores
  • 21. Computing in the Cloud with GCE
      ● Cancer Research: “Google Compute Engine is just part of what we see as a whole new way for scientists around the world to work more effectively in the cloud. This is a paradigm shift that Google services can help bring about.” - Hector Rovira, senior software engineer at the Institute for Systems Biology
      ● High Performance Computing: “And we have been very satisfied with the way Google Compute Engine performs. One of the great things with Google Compute Engine in our experience is that the performance is really reliable, so there is not a lot of variability in the performance we see.” - Joe Masters Emison, Founder and VP R&D, BuildFax
  • 22. Google Cloud Storage Google Cloud Storage lets you store and access data on Google's reliable infrastructure
      ● SPEED: Global Network (lowest latency for rapid access); Data Center Efficiency (maximal service at critical need)
      ● RELIABILITY: World-Class Reliability (99.9% SLA); Availability of your Data (Read-Your-Write-Consistency)
      ● SCALABILITY: Unlimited Objects (there is no limit to # objects); Big Object Size (up to 5 Terabytes per object)
      ● USES: Content Delivery & File Sharing, Active Archiving, Application Storage, Computation
  • 23. Specialized Big Data services of Google Cloud Platform
  • 24. Big Data example: app logs User requests Application servers
  • 25. Big Data example: app logs App persistence BigTable / GFS App logging User requests Application servers Logs cluster Note: diagram is dramatically simplified
  • 26. Big Data example: app logs MapReduce App persistence BigTable / GFS Dremel MapReduce App logging User requests Application servers Logs cluster Note: diagram is dramatically simplified
  • 27. Big Data example: app logs MapReduce App persistence SQL BigTable / GFS Dremel MapReduce Analysts & reporting App logging User requests Application servers Logs cluster Note: diagram is dramatically simplified
  • 28. Big Data example: app logs MapReduce App persistence SQL BigTable / GFS Dremel MapReduce Analysts & reporting App logging User requests Application servers Logs cluster Note: diagram is dramatically simplified
  • 29. Google Innovations in Software (timeline): GFS (2002), MapReduce (2004), Big Table (2006), Dremel (2008), Colossus (2010), Spanner (2012)
  • 31. Google BigQuery
      ● Query billions of rows in seconds, with no index
      ● Uses a SQL-style query syntax
      ● Bulk and stream data ingestion
      ● As a service:
          ○ zero setup/management
          ○ accessed by a RESTful API, console, or partner tools
          ○ Pay as you go (per query)
  • 32. How Google Approaches Analytics Dremel: Ad-hoc query system for terabyte datasets ● Query execution tree ● Column-oriented records ● ...and a lot of nodes
  • 33. Google BigQuery BigQuery: a fully-managed data analytics service in the cloud. Unlimited storage. Interactive analysis on multi-terabyte datasets. [Diagram labels: AdWords, DCLK, Analytics, Other, Corporate data, 3rd party data; Scalable Storage; SQL API; BI Tools, Google Spreadsheets, App Engine App, Co-Workers. Steps: Store all your data in the cloud; Analyze interactively; Mash it up; Securely share/distribute the results]
  • 34. Big Data Processing Pipeline [Diagram. Stages: Store; Extract & Transform; Analyze interactively; Serve. Labels: Application level code, Log data, Structured data, Unstructured data; Logstore, Datastore, Cloud Storage; Backends + MapReduce, Hadoop, Custom logic & 3rd party libraries; BigQuery, SQL API; BI tools, Google Spreadsheets, Interactive Dashboards + apps]
  • 35. Big Data example: app logs MapReduce App persistence SQL BigTable / GFS Dremel MapReduce Analysts & reporting App logging User requests Application servers Logs cluster Note: diagram is dramatically simplified
  • 36. Sample BigQuery use cases
      ● Display ads analytics (global top-5 media agency): Analyze global campaign performance for F500 clients; 1 client = 20GB/day of DoubleClick impressions logs
      ● Ads network reporting (3rd party mobile ads): Deliver x-platform performance analytics dashboards; 1B events/day x 100s of ads customers
      ● Fleet reservations (online travel operations): Monitor customer demand vs supply shortfalls; 10,000 routes x 1000s customers = millions of daily events
      ● Mobile app statistics (online reading vendor): Usage analysis on 60M installs; 10M active users; 2B API requests/day, 20GB log data/day
      ● Revenue optimization (holiday/travel properties): Correlate marketing effectiveness vs global reservations; 10MBs / day from multiple data warehouses
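
Slide 32 names three ingredients of Dremel: a query execution tree, column-oriented records, and a lot of nodes. The toy Python sketch below illustrates the first two ideas on an in-memory list of log records. It is not Dremel or BigQuery code; the sample rows, field names, and function names are all invented for illustration, and the "nodes" are ordinary function calls rather than separate machines.

```python
from collections import Counter

# Hypothetical app-log records, invented for this sketch.
ROWS = [
    {"status": 200, "latency_ms": 12, "path": "/home"},
    {"status": 500, "latency_ms": 340, "path": "/api"},
    {"status": 200, "latency_ms": 18, "path": "/api"},
    {"status": 404, "latency_ms": 7, "path": "/old"},
    {"status": 200, "latency_ms": 25, "path": "/home"},
    {"status": 500, "latency_ms": 410, "path": "/api"},
]
FIELDS = ("status", "latency_ms", "path")


def to_columns(rows):
    """Pivot row records into one list per field (column-oriented layout)."""
    return {name: [row[name] for row in rows] for name in FIELDS}


def split_shards(rows, n):
    """Deal rows round-robin into n shards, each stored column-wise."""
    return [to_columns(rows[i::n]) for i in range(n)]


def leaf_count_by_status(shard):
    """Leaf node: scan only the 'status' column; other columns are untouched."""
    return Counter(shard["status"])


def merge(partials):
    """Intermediate or root node: combine partial counts from child nodes."""
    total = Counter()
    for partial in partials:
        total += partial
    return total


def query_count_by_status(shards, fanout=2):
    """Evaluate COUNT(*) GROUP BY status by merging results up a tree."""
    level = [leaf_count_by_status(shard) for shard in shards]
    while len(level) > 1:
        level = [merge(level[i:i + fanout]) for i in range(0, len(level), fanout)]
    return dict(level[0])


shards = split_shards(ROWS, 3)
result = query_count_by_status(shards)
print(result)
```

Because each shard keeps one list per field, the leaf step reads only the `status` column, and the answer does not depend on how many shards or tree levels are used. That independence is what lets a real system spread the same query across many nodes.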