SlideShare une entreprise Scribd logo
1  sur  27
Data @ Google
Use the same tools Google uses to manage its Big Data
Eric Anderson - Product Manager, @ericmander
2
Street nameStreet number
Street View
Sign
Business facade
Sign
Business name
Traffic light
Traffic signStreet number
3
Managed Data Services - Focus on Insight vs Infrastructure
PB+ Scale, No-Ops, Batch & Streaming of Data
Insights/
Analytics
Resource
Provisioning
Performance
Tuning
Monitoring
Reliability
Deployment &
Configuration
Handling
Growing Scale
Utilization
Improvements
Insights/
Analytics
4
15 Years of Tackling Big Data Problems
Google
Papers
20082002 2004 2006 2010 2012 2014 2015
GFS
Map
Reduce
Open
Source
2005
Google
Cloud
Product
s
BigTable Spanner
2016
Millwheel TensorflowDataflowFlume JavaDremel
5
15 Years of Tackling Big Data Problems
Google
Papers
20082002 2004 2006 2010 2012 2014 2015
GFS
Map
Reduce
Open
Source
2005
Google
Cloud
Product
s
BigTable Millwheel TensorflowSpanner
2016
DataflowFlume JavaDremel
6
“ ”
Google is living a few years in the future and
sending the rest of us messages.
Doug Cutting
Hadoop Co-Creator
7
15 Years of Tackling Big Data Problems
Google
Papers
20082002 2004 2006 2010 2012 2014 2015
GFS
Map
Reduce
Flume Java
Open
Source
2005
Google
Cloud
Product
s
BigQuery Pub/Sub Dataflow Bigtable
BigTable Dremel Spanner
ML
2016
Millwheel TensorflowDataflow
8
Google
Analytics
Premium
Cloud
Pub/Sub
BigQuery Storage
(tables)
Cloud Bigtable
(NoSQL)
Cloud Storage
(files)
Cloud Dataflow
BigQuery Analytics
Capture Store Analyze
Google
Stackdriver
Process
Stream
Use
Data
Scientists
Business
Analysts
Cloud Dataproc
Cloud
Datalab
Real-time analytics
Real-time
dashboard
Real-time
alerts
Cloud ML
Batch
Firebase
Storage
Transfer
Service
Cloud
Dataflow
Serverless platform, auto-optimized usage, across the entire data lifecycle
9
Google
Analytics
Premium
Cloud
Pub/Sub
BigQuery Storage
(tables)
Cloud Bigtable
(NoSQL)
Cloud Storage
(files)
Cloud Dataflow
BigQuery Analytics
Capture Store Analyze
Google
Stackdriver
Process
Stream
Use
Data
Scientists
Business
Analysts
Cloud Dataproc
Cloud
Datalab
Real-time analytics
Real-time
dashboard
Real-time
alerts
Cloud ML
Batch
Firebase
Storage
Transfer
Service
Cloud
Dataflow
Serverless platform, auto-optimized usage, across the entire data lifecycle
10
Cloud Pub/Sub
Fast, reliable, event delivery. Serverless, autoscaling, pay for what you use.
11
Out of order data
The hard part of stream processing
12
Cloud Dataflow
Unified batch/stream pipelines.
13
Beam=Batch+Stream
Apache Beam (incubating)
Cloud Dataflow
Based on Apache Beam. Pipelines are portable to your favorite runtime.
14
Compute and Storage
Unbounded
Bounded
Resource Management
Resource Auto-scaler
Dynamic Work
Rebalancer
Work Scheduler
Monitoring
Log Collection
Graph Optimization
Auto-Healing
Intelligent WatermarkingS
O
U
R
C
E
S
I
N
K
Cloud Dataflow
Serverless, autoscaling, pay for what you use.
15
“
”
We are very excited about the productivity
benefits offered by Cloud Dataflow and
Cloud Pub/Sub. It took half a day to rewrite
something that had previously taken over
six months to build using Spark.
Paul Clarke
Director of Technology, Ocado
“From traditional batch processing to rock-solid event delivery to the nearly
magical abilities of BigQuery, building on Google’s data infrastructure
provides us with a significant advantage where it matters the most.”
Nicholas Harteau
VP of Engineering and Infrastructure
16
17
Google
Analytics
Premium
Cloud
Pub/Sub
BigQuery Storage
(tables)
Cloud Bigtable
(NoSQL)
Cloud Storage
(files)
Cloud Dataflow
BigQuery Analytics
Capture Store Analyze
Google
Stackdriver
Process
Stream
Use
Data
Scientists
Business
Analysts
Cloud Dataproc
Cloud
Datalab
Real-time analytics
Real-time
dashboard
Real-time
alerts
Cloud ML
Batch
Firebase
Storage
Transfer
Service
Cloud
Dataflow
Serverless platform, auto-optimized usage, across the entire data lifecycle
18
Cloud Dataproc
Ready-to use Spark and Hadoop clusters in 90 seconds
Integrated with Cloud
Storage, Cloud Logging,
Cloud Monitoring, and more.
While active, Dataproc
clusters are billed minute-by-
minute.
Dataproc clusters can make
use of low-cost preemptible
Compute Engine VMs.
Minute-by-Minute Billing Preemptible VMsNative Spark and Hadoop Cloud Integrated
Run Spark and Hadoop
applications out of the box
without modification.
19
Cloud Dataproc
Demonstrably more cost-effective
Source: Michael Li & Ariel
M'ndange-Pfupfu on O’Reilly:
https://www.oreilly.com/ideas/
spark-comparison-aws-vs-gcp
20
Google
Analytics
Premium
Cloud
Pub/Sub
BigQuery Storage
(tables)
Cloud Bigtable
(NoSQL)
Cloud Storage
(files)
Cloud Dataflow
BigQuery Analytics
Capture Store Analyze
Google
Stackdriver
Process
Stream
Use
Data
Scientists
Business
Analysts
Cloud Dataproc
Cloud
Datalab
Real-time analytics
Real-time
dashboard
Real-time
alerts
Cloud ML
Batch
Firebase
Storage
Transfer
Service
Cloud
Dataflow
Serverless platform, auto-optimized usage, across the entire data lifecycle
21
Google Cloud Bigtable
Google confidential │ Do not distribute
Overview:
Data to process: Data in the Consolidated Audit Trail (CAT).
A data repository of all equities and options orders, quotes,
and events
Challenges:
How to process the CAT and organize 100 billion market
events into an “order lifecycle” in a 4 hour window
Store 6 years (~30PB) of data
Cloud Bigtable to process and run queries
and tolerate volume increases
6 BILLION
MARKET EVENTS
WRITTEN PER HOUR
1.7 GIGs
PER SECOND
PER HOUR
6 TBs
10 BN
WRITTEN
PER HOUR BURSTS
1.7 GIGABYTES
PER SECOND
10 TERABYTES
PER HOUR
23
Google
Analytics
Premium
Cloud
Pub/Sub
BigQuery Storage
(tables)
Cloud Bigtable
(NoSQL)
Cloud Storage
(files)
Cloud Dataflow
BigQuery Analytics
Capture Store Analyze
Google
Stackdriver
Process
Stream
Use
Data
Scientists
Business
Analysts
Cloud Dataproc
Cloud
Datalab
Real-time analytics
Real-time
dashboard
Real-time
alerts
Cloud ML
Batch
Firebase
Storage
Transfer
Service
Cloud
Dataflow
Serverless platform, auto-optimized usage, across the entire data lifecycle
24
– Mattias P Johansson, Software Engineer, Spotify
“With Google Cloud Platform, we benefitted by having a
virtual supercomputer on demand, without having to deal
with all the usual space, power, cooling and networking
issues.
Just a few years ago, we would have needed to use the
largest supercomputers on the planet to do what we’re
now able to do with Google”
– Mark Johnson, CEO, Descartes Labs
“Right at the start of the partnership we were able
to reduce time to insight from 96 hours to 30
minutes by using BigQuery.”
– Gary Sanders, Head of Digital Analytics, Lloyds Banking Group
“Everyone involved unanimously picked GCP. It came
down to this: we believe the core technology is better.”
– Peter Bakkam, Platform Lead, Quizlet
Do you feel this way about your Data Warehouse?
25
BigQuery
Now with full support for Enterprise Data Warehouse
SQL
Flat-rate Pricing
Standard
SQL
ODBC
Connector
DML - Beta
Identity Access and
Management
26
BETA
BETA
GAGA
Use your own data to train models
Cloud Datalab
Cloud Machine Learning
Cloud Storage Google BigQuery Develop/Model/Test
Cloud Dataflow
GA
Train
Predict
Features:
● Fully Managed NewSQL database service with relational
semantics and global consistency
● Global replication options for low-latency reads across the
globe
● Consistent transactions across millions of rows
Use Cases:
● Large application workloads with very high write volumes or
large datasets (3 TB+)
● Geographically distributed control planes with global
consistency guarantees
Pricing:
● Pay per hour of node compute for throughput
● Pay per GB/month of data stored
Cloud Spanner
Google Cloud Platform

Contenu connexe

Tendances

Journey to Cloud Analytics
Journey to Cloud Analytics Journey to Cloud Analytics
Journey to Cloud Analytics Datavail
 
Talend winter 2017 overview webinar
Talend winter 2017 overview webinarTalend winter 2017 overview webinar
Talend winter 2017 overview webinarJean-Michel Franco
 
[Webinar] Measure Twice, Build Once: Real-Time Predictive Analytics
[Webinar] Measure Twice, Build Once: Real-Time Predictive Analytics[Webinar] Measure Twice, Build Once: Real-Time Predictive Analytics
[Webinar] Measure Twice, Build Once: Real-Time Predictive AnalyticsInfochimps, a CSC Big Data Business
 
Accelerating Innovation with Unified Analytics with Ali Ghodsi
Accelerating Innovation with Unified Analytics with Ali GhodsiAccelerating Innovation with Unified Analytics with Ali Ghodsi
Accelerating Innovation with Unified Analytics with Ali GhodsiDatabricks
 
IBM Big Data in the Cloud
IBM Big Data in the CloudIBM Big Data in the Cloud
IBM Big Data in the CloudRob Thomas
 
Digital Government: Data + Government Isn't Enough | Wrangle Conference 2017
Digital Government: Data + Government Isn't Enough | Wrangle Conference 2017Digital Government: Data + Government Isn't Enough | Wrangle Conference 2017
Digital Government: Data + Government Isn't Enough | Wrangle Conference 2017Cloudera, Inc.
 
Exploring the Wider World of Big Data- Vasalis Kapsalis
Exploring the Wider World of Big Data- Vasalis KapsalisExploring the Wider World of Big Data- Vasalis Kapsalis
Exploring the Wider World of Big Data- Vasalis KapsalisNetAppUK
 
AWS Initiate Day Dublin 2019 – Big Data Meets AI
AWS Initiate Day Dublin 2019 – Big Data Meets AIAWS Initiate Day Dublin 2019 – Big Data Meets AI
AWS Initiate Day Dublin 2019 – Big Data Meets AIAmazon Web Services
 
Data Driven Possibilities with Qlik
Data Driven Possibilities with QlikData Driven Possibilities with Qlik
Data Driven Possibilities with QlikMischa van Werkhoven
 
Modernizing Architecture for a Complete Data Strategy
Modernizing Architecture for a Complete Data StrategyModernizing Architecture for a Complete Data Strategy
Modernizing Architecture for a Complete Data StrategyCloudera, Inc.
 
Big Data Analytics on the Cloud
Big Data Analytics on the CloudBig Data Analytics on the Cloud
Big Data Analytics on the CloudCaserta
 
#DataOnCloud New York Event
#DataOnCloud New York Event#DataOnCloud New York Event
#DataOnCloud New York EventHARMAN Services
 
Webinar - Bringing Game Changing Insights with Graph Databases
Webinar - Bringing Game Changing Insights with Graph DatabasesWebinar - Bringing Game Changing Insights with Graph Databases
Webinar - Bringing Game Changing Insights with Graph DatabasesDataStax
 
Big Data in Action : Operations, Analytics and more
Big Data in Action : Operations, Analytics and moreBig Data in Action : Operations, Analytics and more
Big Data in Action : Operations, Analytics and moreSoftweb Solutions
 
Deep Learning Image Processing Applications in the Enterprise
Deep Learning Image Processing Applications in the EnterpriseDeep Learning Image Processing Applications in the Enterprise
Deep Learning Image Processing Applications in the EnterpriseGanesan Narayanasamy
 
The Scout24 Data Platform (A Technical Deep Dive)
The Scout24 Data Platform (A Technical Deep Dive)The Scout24 Data Platform (A Technical Deep Dive)
The Scout24 Data Platform (A Technical Deep Dive)RaffaelDzikowski
 
Why Data Lake should be the foundation of Enterprise Data Architecture
Why Data Lake should be the foundation of Enterprise Data ArchitectureWhy Data Lake should be the foundation of Enterprise Data Architecture
Why Data Lake should be the foundation of Enterprise Data ArchitectureAgilisium Consulting
 
The Importance of DataOps in a Multi-Cloud World
The Importance of DataOps in a Multi-Cloud WorldThe Importance of DataOps in a Multi-Cloud World
The Importance of DataOps in a Multi-Cloud WorldDATAVERSITY
 
Moving to the Cloud: Modernizing Data Architecture in Healthcare
Moving to the Cloud: Modernizing Data Architecture in HealthcareMoving to the Cloud: Modernizing Data Architecture in Healthcare
Moving to the Cloud: Modernizing Data Architecture in HealthcarePerficient, Inc.
 
Overcoming DataOps hurdles for ML in Production
Overcoming DataOps hurdles for ML in ProductionOvercoming DataOps hurdles for ML in Production
Overcoming DataOps hurdles for ML in ProductionSandeep Uttamchandani
 

Tendances (20)

Journey to Cloud Analytics
Journey to Cloud Analytics Journey to Cloud Analytics
Journey to Cloud Analytics
 
Talend winter 2017 overview webinar
Talend winter 2017 overview webinarTalend winter 2017 overview webinar
Talend winter 2017 overview webinar
 
[Webinar] Measure Twice, Build Once: Real-Time Predictive Analytics
[Webinar] Measure Twice, Build Once: Real-Time Predictive Analytics[Webinar] Measure Twice, Build Once: Real-Time Predictive Analytics
[Webinar] Measure Twice, Build Once: Real-Time Predictive Analytics
 
Accelerating Innovation with Unified Analytics with Ali Ghodsi
Accelerating Innovation with Unified Analytics with Ali GhodsiAccelerating Innovation with Unified Analytics with Ali Ghodsi
Accelerating Innovation with Unified Analytics with Ali Ghodsi
 
IBM Big Data in the Cloud
IBM Big Data in the CloudIBM Big Data in the Cloud
IBM Big Data in the Cloud
 
Digital Government: Data + Government Isn't Enough | Wrangle Conference 2017
Digital Government: Data + Government Isn't Enough | Wrangle Conference 2017Digital Government: Data + Government Isn't Enough | Wrangle Conference 2017
Digital Government: Data + Government Isn't Enough | Wrangle Conference 2017
 
Exploring the Wider World of Big Data- Vasalis Kapsalis
Exploring the Wider World of Big Data- Vasalis KapsalisExploring the Wider World of Big Data- Vasalis Kapsalis
Exploring the Wider World of Big Data- Vasalis Kapsalis
 
AWS Initiate Day Dublin 2019 – Big Data Meets AI
AWS Initiate Day Dublin 2019 – Big Data Meets AIAWS Initiate Day Dublin 2019 – Big Data Meets AI
AWS Initiate Day Dublin 2019 – Big Data Meets AI
 
Data Driven Possibilities with Qlik
Data Driven Possibilities with QlikData Driven Possibilities with Qlik
Data Driven Possibilities with Qlik
 
Modernizing Architecture for a Complete Data Strategy
Modernizing Architecture for a Complete Data StrategyModernizing Architecture for a Complete Data Strategy
Modernizing Architecture for a Complete Data Strategy
 
Big Data Analytics on the Cloud
Big Data Analytics on the CloudBig Data Analytics on the Cloud
Big Data Analytics on the Cloud
 
#DataOnCloud New York Event
#DataOnCloud New York Event#DataOnCloud New York Event
#DataOnCloud New York Event
 
Webinar - Bringing Game Changing Insights with Graph Databases
Webinar - Bringing Game Changing Insights with Graph DatabasesWebinar - Bringing Game Changing Insights with Graph Databases
Webinar - Bringing Game Changing Insights with Graph Databases
 
Big Data in Action : Operations, Analytics and more
Big Data in Action : Operations, Analytics and moreBig Data in Action : Operations, Analytics and more
Big Data in Action : Operations, Analytics and more
 
Deep Learning Image Processing Applications in the Enterprise
Deep Learning Image Processing Applications in the EnterpriseDeep Learning Image Processing Applications in the Enterprise
Deep Learning Image Processing Applications in the Enterprise
 
The Scout24 Data Platform (A Technical Deep Dive)
The Scout24 Data Platform (A Technical Deep Dive)The Scout24 Data Platform (A Technical Deep Dive)
The Scout24 Data Platform (A Technical Deep Dive)
 
Why Data Lake should be the foundation of Enterprise Data Architecture
Why Data Lake should be the foundation of Enterprise Data ArchitectureWhy Data Lake should be the foundation of Enterprise Data Architecture
Why Data Lake should be the foundation of Enterprise Data Architecture
 
The Importance of DataOps in a Multi-Cloud World
The Importance of DataOps in a Multi-Cloud WorldThe Importance of DataOps in a Multi-Cloud World
The Importance of DataOps in a Multi-Cloud World
 
Moving to the Cloud: Modernizing Data Architecture in Healthcare
Moving to the Cloud: Modernizing Data Architecture in HealthcareMoving to the Cloud: Modernizing Data Architecture in Healthcare
Moving to the Cloud: Modernizing Data Architecture in Healthcare
 
Overcoming DataOps hurdles for ML in Production
Overcoming DataOps hurdles for ML in ProductionOvercoming DataOps hurdles for ML in Production
Overcoming DataOps hurdles for ML in Production
 

Similaire à Eric Andersen Keynote

Slides: Success Stories for Data-to-Cloud
Slides: Success Stories for Data-to-CloudSlides: Success Stories for Data-to-Cloud
Slides: Success Stories for Data-to-CloudDATAVERSITY
 
Google Cloud Data Platform - Why Google for Data Analysis?
Google Cloud Data Platform - Why Google for Data Analysis?Google Cloud Data Platform - Why Google for Data Analysis?
Google Cloud Data Platform - Why Google for Data Analysis?Andreas Raible
 
Case Study - Gordon Foods Delivers Fresh Data to the Cloud
Case Study - Gordon Foods Delivers Fresh Data to the CloudCase Study - Gordon Foods Delivers Fresh Data to the Cloud
Case Study - Gordon Foods Delivers Fresh Data to the CloudDATAVERSITY
 
Google Cloud Platform & rockPlace Big Data Event-Mar.31.2016
Google Cloud Platform & rockPlace Big Data Event-Mar.31.2016Google Cloud Platform & rockPlace Big Data Event-Mar.31.2016
Google Cloud Platform & rockPlace Big Data Event-Mar.31.2016Chris Jang
 
Google Cloud Dataproc - Easier, faster, more cost-effective Spark and Hadoop
Google Cloud Dataproc - Easier, faster, more cost-effective Spark and HadoopGoogle Cloud Dataproc - Easier, faster, more cost-effective Spark and Hadoop
Google Cloud Dataproc - Easier, faster, more cost-effective Spark and Hadoophuguk
 
Big data in action
Big data in actionBig data in action
Big data in actionTu Pham
 
MongoDB World 2016: Lunch & Learn: Google Cloud for the Enterprise
MongoDB World 2016: Lunch & Learn: Google Cloud for the EnterpriseMongoDB World 2016: Lunch & Learn: Google Cloud for the Enterprise
MongoDB World 2016: Lunch & Learn: Google Cloud for the EnterpriseMongoDB
 
Integrating Google Cloud Dataproc with Alluxio for faster performance in the ...
Integrating Google Cloud Dataproc with Alluxio for faster performance in the ...Integrating Google Cloud Dataproc with Alluxio for faster performance in the ...
Integrating Google Cloud Dataproc with Alluxio for faster performance in the ...Alluxio, Inc.
 
Connecta Event: Big Query och dataanalys med Google Cloud Platform
Connecta Event: Big Query och dataanalys med Google Cloud PlatformConnecta Event: Big Query och dataanalys med Google Cloud Platform
Connecta Event: Big Query och dataanalys med Google Cloud PlatformConnectaDigital
 
Hybrid data lake on google cloud with alluxio and dataproc
Hybrid data lake on google cloud  with alluxio and dataprocHybrid data lake on google cloud  with alluxio and dataproc
Hybrid data lake on google cloud with alluxio and dataprocAlluxio, Inc.
 
Critical Breakthroughs and Challenges in Big Data and Analytics
Critical Breakthroughs and Challenges in Big Data and AnalyticsCritical Breakthroughs and Challenges in Big Data and Analytics
Critical Breakthroughs and Challenges in Big Data and AnalyticsData Driven Innovation
 
Bridge to Cloud: Using Apache Kafka to Migrate to GCP
Bridge to Cloud: Using Apache Kafka to Migrate to GCPBridge to Cloud: Using Apache Kafka to Migrate to GCP
Bridge to Cloud: Using Apache Kafka to Migrate to GCPconfluent
 
Webinar | From Zero to 1 Million with Google Cloud Platform and DataStax
Webinar | From Zero to 1 Million with Google Cloud Platform and DataStaxWebinar | From Zero to 1 Million with Google Cloud Platform and DataStax
Webinar | From Zero to 1 Million with Google Cloud Platform and DataStaxDataStax
 
Dimension Data Saugatuk Webinar
Dimension Data Saugatuk WebinarDimension Data Saugatuk Webinar
Dimension Data Saugatuk WebinarKeao Caindec
 
IoT NY - Google Cloud Services for IoT
IoT NY - Google Cloud Services for IoTIoT NY - Google Cloud Services for IoT
IoT NY - Google Cloud Services for IoTJames Chittenden
 
Deep dive into Google Cloud for Big Data
Deep dive into Google Cloud for Big DataDeep dive into Google Cloud for Big Data
Deep dive into Google Cloud for Big DataTu Le Dinh
 
Big data on google cloud
Big data on google cloudBig data on google cloud
Big data on google cloudTu Pham
 
Workshop on Google Cloud Data Platform
Workshop on Google Cloud Data PlatformWorkshop on Google Cloud Data Platform
Workshop on Google Cloud Data PlatformGoDataDriven
 
Bogdan botea, dmitry nefedkin no fiddle, efficient development on the googl...
Bogdan botea, dmitry nefedkin   no fiddle, efficient development on the googl...Bogdan botea, dmitry nefedkin   no fiddle, efficient development on the googl...
Bogdan botea, dmitry nefedkin no fiddle, efficient development on the googl...Codecamp Romania
 

Similaire à Eric Andersen Keynote (20)

Slides: Success Stories for Data-to-Cloud
Slides: Success Stories for Data-to-CloudSlides: Success Stories for Data-to-Cloud
Slides: Success Stories for Data-to-Cloud
 
Google Cloud Data Platform - Why Google for Data Analysis?
Google Cloud Data Platform - Why Google for Data Analysis?Google Cloud Data Platform - Why Google for Data Analysis?
Google Cloud Data Platform - Why Google for Data Analysis?
 
Case Study - Gordon Foods Delivers Fresh Data to the Cloud
Case Study - Gordon Foods Delivers Fresh Data to the CloudCase Study - Gordon Foods Delivers Fresh Data to the Cloud
Case Study - Gordon Foods Delivers Fresh Data to the Cloud
 
Google Cloud Platform & rockPlace Big Data Event-Mar.31.2016
Google Cloud Platform & rockPlace Big Data Event-Mar.31.2016Google Cloud Platform & rockPlace Big Data Event-Mar.31.2016
Google Cloud Platform & rockPlace Big Data Event-Mar.31.2016
 
Google Cloud Dataproc - Easier, faster, more cost-effective Spark and Hadoop
Google Cloud Dataproc - Easier, faster, more cost-effective Spark and HadoopGoogle Cloud Dataproc - Easier, faster, more cost-effective Spark and Hadoop
Google Cloud Dataproc - Easier, faster, more cost-effective Spark and Hadoop
 
Big data in action
Big data in actionBig data in action
Big data in action
 
MongoDB World 2016: Lunch & Learn: Google Cloud for the Enterprise
MongoDB World 2016: Lunch & Learn: Google Cloud for the EnterpriseMongoDB World 2016: Lunch & Learn: Google Cloud for the Enterprise
MongoDB World 2016: Lunch & Learn: Google Cloud for the Enterprise
 
Integrating Google Cloud Dataproc with Alluxio for faster performance in the ...
Integrating Google Cloud Dataproc with Alluxio for faster performance in the ...Integrating Google Cloud Dataproc with Alluxio for faster performance in the ...
Integrating Google Cloud Dataproc with Alluxio for faster performance in the ...
 
Modern Thinking área digital MSKM 21/09/2017
Modern Thinking área digital MSKM 21/09/2017Modern Thinking área digital MSKM 21/09/2017
Modern Thinking área digital MSKM 21/09/2017
 
Connecta Event: Big Query och dataanalys med Google Cloud Platform
Connecta Event: Big Query och dataanalys med Google Cloud PlatformConnecta Event: Big Query och dataanalys med Google Cloud Platform
Connecta Event: Big Query och dataanalys med Google Cloud Platform
 
Hybrid data lake on google cloud with alluxio and dataproc
Hybrid data lake on google cloud  with alluxio and dataprocHybrid data lake on google cloud  with alluxio and dataproc
Hybrid data lake on google cloud with alluxio and dataproc
 
Critical Breakthroughs and Challenges in Big Data and Analytics
Critical Breakthroughs and Challenges in Big Data and AnalyticsCritical Breakthroughs and Challenges in Big Data and Analytics
Critical Breakthroughs and Challenges in Big Data and Analytics
 
Bridge to Cloud: Using Apache Kafka to Migrate to GCP
Bridge to Cloud: Using Apache Kafka to Migrate to GCPBridge to Cloud: Using Apache Kafka to Migrate to GCP
Bridge to Cloud: Using Apache Kafka to Migrate to GCP
 
Webinar | From Zero to 1 Million with Google Cloud Platform and DataStax
Webinar | From Zero to 1 Million with Google Cloud Platform and DataStaxWebinar | From Zero to 1 Million with Google Cloud Platform and DataStax
Webinar | From Zero to 1 Million with Google Cloud Platform and DataStax
 
Dimension Data Saugatuk Webinar
Dimension Data Saugatuk WebinarDimension Data Saugatuk Webinar
Dimension Data Saugatuk Webinar
 
IoT NY - Google Cloud Services for IoT
IoT NY - Google Cloud Services for IoTIoT NY - Google Cloud Services for IoT
IoT NY - Google Cloud Services for IoT
 
Deep dive into Google Cloud for Big Data
Deep dive into Google Cloud for Big DataDeep dive into Google Cloud for Big Data
Deep dive into Google Cloud for Big Data
 
Big data on google cloud
Big data on google cloudBig data on google cloud
Big data on google cloud
 
Workshop on Google Cloud Data Platform
Workshop on Google Cloud Data PlatformWorkshop on Google Cloud Data Platform
Workshop on Google Cloud Data Platform
 
Bogdan botea, dmitry nefedkin no fiddle, efficient development on the googl...
Bogdan botea, dmitry nefedkin   no fiddle, efficient development on the googl...Bogdan botea, dmitry nefedkin   no fiddle, efficient development on the googl...
Bogdan botea, dmitry nefedkin no fiddle, efficient development on the googl...
 

Plus de Data Con LA

Data Con LA 2022 Keynotes
Data Con LA 2022 KeynotesData Con LA 2022 Keynotes
Data Con LA 2022 KeynotesData Con LA
 
Data Con LA 2022 Keynotes
Data Con LA 2022 KeynotesData Con LA 2022 Keynotes
Data Con LA 2022 KeynotesData Con LA
 
Data Con LA 2022 Keynote
Data Con LA 2022 KeynoteData Con LA 2022 Keynote
Data Con LA 2022 KeynoteData Con LA
 
Data Con LA 2022 - Startup Showcase
Data Con LA 2022 - Startup ShowcaseData Con LA 2022 - Startup Showcase
Data Con LA 2022 - Startup ShowcaseData Con LA
 
Data Con LA 2022 Keynote
Data Con LA 2022 KeynoteData Con LA 2022 Keynote
Data Con LA 2022 KeynoteData Con LA
 
Data Con LA 2022 - Using Google trends data to build product recommendations
Data Con LA 2022 - Using Google trends data to build product recommendationsData Con LA 2022 - Using Google trends data to build product recommendations
Data Con LA 2022 - Using Google trends data to build product recommendationsData Con LA
 
Data Con LA 2022 - AI Ethics
Data Con LA 2022 - AI EthicsData Con LA 2022 - AI Ethics
Data Con LA 2022 - AI EthicsData Con LA
 
Data Con LA 2022 - Improving disaster response with machine learning
Data Con LA 2022 - Improving disaster response with machine learningData Con LA 2022 - Improving disaster response with machine learning
Data Con LA 2022 - Improving disaster response with machine learningData Con LA
 
Data Con LA 2022 - What's new with MongoDB 6.0 and Atlas
Data Con LA 2022 - What's new with MongoDB 6.0 and AtlasData Con LA 2022 - What's new with MongoDB 6.0 and Atlas
Data Con LA 2022 - What's new with MongoDB 6.0 and AtlasData Con LA
 
Data Con LA 2022 - Real world consumer segmentation
Data Con LA 2022 - Real world consumer segmentationData Con LA 2022 - Real world consumer segmentation
Data Con LA 2022 - Real world consumer segmentationData Con LA
 
Data Con LA 2022 - Modernizing Analytics & AI for today's needs: Intuit Turbo...
Data Con LA 2022 - Modernizing Analytics & AI for today's needs: Intuit Turbo...Data Con LA 2022 - Modernizing Analytics & AI for today's needs: Intuit Turbo...
Data Con LA 2022 - Modernizing Analytics & AI for today's needs: Intuit Turbo...Data Con LA
 
Data Con LA 2022 - Moving Data at Scale to AWS
Data Con LA 2022 - Moving Data at Scale to AWSData Con LA 2022 - Moving Data at Scale to AWS
Data Con LA 2022 - Moving Data at Scale to AWSData Con LA
 
Data Con LA 2022 - Collaborative Data Exploration using Conversational AI
Data Con LA 2022 - Collaborative Data Exploration using Conversational AIData Con LA 2022 - Collaborative Data Exploration using Conversational AI
Data Con LA 2022 - Collaborative Data Exploration using Conversational AIData Con LA
 
Data Con LA 2022 - Why Database Modernization Makes Your Data Decisions More ...
Data Con LA 2022 - Why Database Modernization Makes Your Data Decisions More ...Data Con LA 2022 - Why Database Modernization Makes Your Data Decisions More ...
Data Con LA 2022 - Why Database Modernization Makes Your Data Decisions More ...Data Con LA
 
Data Con LA 2022 - Intro to Data Science
Data Con LA 2022 - Intro to Data ScienceData Con LA 2022 - Intro to Data Science
Data Con LA 2022 - Intro to Data ScienceData Con LA
 
Data Con LA 2022 - How are NFTs and DeFi Changing Entertainment
Data Con LA 2022 - How are NFTs and DeFi Changing EntertainmentData Con LA 2022 - How are NFTs and DeFi Changing Entertainment
Data Con LA 2022 - How are NFTs and DeFi Changing EntertainmentData Con LA
 
Data Con LA 2022 - Why Data Quality vigilance requires an End-to-End, Automat...
Data Con LA 2022 - Why Data Quality vigilance requires an End-to-End, Automat...Data Con LA 2022 - Why Data Quality vigilance requires an End-to-End, Automat...
Data Con LA 2022 - Why Data Quality vigilance requires an End-to-End, Automat...Data Con LA
 
Data Con LA 2022-Perfect Viral Ad prediction of Superbowl 2022 using Tease, T...
Data Con LA 2022-Perfect Viral Ad prediction of Superbowl 2022 using Tease, T...Data Con LA 2022-Perfect Viral Ad prediction of Superbowl 2022 using Tease, T...
Data Con LA 2022-Perfect Viral Ad prediction of Superbowl 2022 using Tease, T...Data Con LA
 
Data Con LA 2022- Embedding medical journeys with machine learning to improve...
Data Con LA 2022- Embedding medical journeys with machine learning to improve...Data Con LA 2022- Embedding medical journeys with machine learning to improve...
Data Con LA 2022- Embedding medical journeys with machine learning to improve...Data Con LA
 
Data Con LA 2022 - Data Streaming with Kafka
Data Con LA 2022 - Data Streaming with KafkaData Con LA 2022 - Data Streaming with Kafka
Data Con LA 2022 - Data Streaming with KafkaData Con LA
 

Plus de Data Con LA (20)

Data Con LA 2022 Keynotes
Data Con LA 2022 KeynotesData Con LA 2022 Keynotes
Data Con LA 2022 Keynotes
 
Data Con LA 2022 Keynotes
Data Con LA 2022 KeynotesData Con LA 2022 Keynotes
Data Con LA 2022 Keynotes
 
Data Con LA 2022 Keynote
Data Con LA 2022 KeynoteData Con LA 2022 Keynote
Data Con LA 2022 Keynote
 
Data Con LA 2022 - Startup Showcase
Data Con LA 2022 - Startup ShowcaseData Con LA 2022 - Startup Showcase
Data Con LA 2022 - Startup Showcase
 
Data Con LA 2022 Keynote
Data Con LA 2022 KeynoteData Con LA 2022 Keynote
Data Con LA 2022 Keynote
 
Data Con LA 2022 - Using Google trends data to build product recommendations
Data Con LA 2022 - Using Google trends data to build product recommendationsData Con LA 2022 - Using Google trends data to build product recommendations
Data Con LA 2022 - Using Google trends data to build product recommendations
 
Data Con LA 2022 - AI Ethics
Data Con LA 2022 - AI EthicsData Con LA 2022 - AI Ethics
Data Con LA 2022 - AI Ethics
 
Data Con LA 2022 - Improving disaster response with machine learning
Data Con LA 2022 - Improving disaster response with machine learningData Con LA 2022 - Improving disaster response with machine learning
Data Con LA 2022 - Improving disaster response with machine learning
 
Data Con LA 2022 - What's new with MongoDB 6.0 and Atlas
Data Con LA 2022 - What's new with MongoDB 6.0 and AtlasData Con LA 2022 - What's new with MongoDB 6.0 and Atlas
Data Con LA 2022 - What's new with MongoDB 6.0 and Atlas
 
Data Con LA 2022 - Real world consumer segmentation
Data Con LA 2022 - Real world consumer segmentationData Con LA 2022 - Real world consumer segmentation
Data Con LA 2022 - Real world consumer segmentation
 
Data Con LA 2022 - Modernizing Analytics & AI for today's needs: Intuit Turbo...
Data Con LA 2022 - Modernizing Analytics & AI for today's needs: Intuit Turbo...Data Con LA 2022 - Modernizing Analytics & AI for today's needs: Intuit Turbo...
Data Con LA 2022 - Modernizing Analytics & AI for today's needs: Intuit Turbo...
 
Data Con LA 2022 - Moving Data at Scale to AWS
Data Con LA 2022 - Moving Data at Scale to AWSData Con LA 2022 - Moving Data at Scale to AWS
Data Con LA 2022 - Moving Data at Scale to AWS
 
Data Con LA 2022 - Collaborative Data Exploration using Conversational AI
Data Con LA 2022 - Collaborative Data Exploration using Conversational AIData Con LA 2022 - Collaborative Data Exploration using Conversational AI
Data Con LA 2022 - Collaborative Data Exploration using Conversational AI
 
Data Con LA 2022 - Why Database Modernization Makes Your Data Decisions More ...
Data Con LA 2022 - Why Database Modernization Makes Your Data Decisions More ...Data Con LA 2022 - Why Database Modernization Makes Your Data Decisions More ...
Data Con LA 2022 - Why Database Modernization Makes Your Data Decisions More ...
 
Data Con LA 2022 - Intro to Data Science
Data Con LA 2022 - Intro to Data ScienceData Con LA 2022 - Intro to Data Science
Data Con LA 2022 - Intro to Data Science
 
Data Con LA 2022 - How are NFTs and DeFi Changing Entertainment
Data Con LA 2022 - How are NFTs and DeFi Changing EntertainmentData Con LA 2022 - How are NFTs and DeFi Changing Entertainment
Data Con LA 2022 - How are NFTs and DeFi Changing Entertainment
 
Data Con LA 2022 - Why Data Quality vigilance requires an End-to-End, Automat...
Data Con LA 2022 - Why Data Quality vigilance requires an End-to-End, Automat...Data Con LA 2022 - Why Data Quality vigilance requires an End-to-End, Automat...
Data Con LA 2022 - Why Data Quality vigilance requires an End-to-End, Automat...
 
Data Con LA 2022-Perfect Viral Ad prediction of Superbowl 2022 using Tease, T...
Data Con LA 2022-Perfect Viral Ad prediction of Superbowl 2022 using Tease, T...Data Con LA 2022-Perfect Viral Ad prediction of Superbowl 2022 using Tease, T...
Data Con LA 2022-Perfect Viral Ad prediction of Superbowl 2022 using Tease, T...
 
Data Con LA 2022- Embedding medical journeys with machine learning to improve...
Data Con LA 2022- Embedding medical journeys with machine learning to improve...Data Con LA 2022- Embedding medical journeys with machine learning to improve...
Data Con LA 2022- Embedding medical journeys with machine learning to improve...
 
Data Con LA 2022 - Data Streaming with Kafka
Data Con LA 2022 - Data Streaming with KafkaData Con LA 2022 - Data Streaming with Kafka
Data Con LA 2022 - Data Streaming with Kafka
 

Dernier

DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
Sample pptx for embedding into website for demo
Sample pptx for embedding into website for demoSample pptx for embedding into website for demo
Sample pptx for embedding into website for demoHarshalMandlekar2
 
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...Wes McKinney
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxLoriGlavin3
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity PlanDatabarracks
 
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better StrongerModern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better Strongerpanagenda
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxLoriGlavin3
 
A Framework for Development in the AI Age
A Framework for Development in the AI AgeA Framework for Development in the AI Age
A Framework for Development in the AI AgeCprime
 
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...Scott Andery
 
Connecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfConnecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfNeo4j
 
Testing tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesTesting tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesKari Kakkonen
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersRaghuram Pandurangan
 
Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Hiroshi SHIBATA
 
Manual 508 Accessibility Compliance Audit
Manual 508 Accessibility Compliance AuditManual 508 Accessibility Compliance Audit
Manual 508 Accessibility Compliance AuditSkynet Technologies
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteDianaGray10
 
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Mark Goldstein
 
Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Farhan Tariq
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxLoriGlavin3
 
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality AssuranceInflectra
 
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesHow to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesThousandEyes
 

Dernier (20)

DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
Sample pptx for embedding into website for demo
Sample pptx for embedding into website for demoSample pptx for embedding into website for demo
Sample pptx for embedding into website for demo
 
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptx
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better StrongerModern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
 
A Framework for Development in the AI Age
A Framework for Development in the AI AgeA Framework for Development in the AI Age
A Framework for Development in the AI Age
 
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
 
Connecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfConnecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdf
 
Testing tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesTesting tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examples
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information Developers
 
Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024
 
Manual 508 Accessibility Compliance Audit
Manual 508 Accessibility Compliance AuditManual 508 Accessibility Compliance Audit
Manual 508 Accessibility Compliance Audit
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
 
Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
 
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
 
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesHow to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
 

Eric Andersen Keynote

  • 1. Data @ Google Use the same tools Google uses to manage its Big Data Eric Anderson - Product Manager, @ericmander
  • 2. 2 Street nameStreet number Street View Sign Business facade Sign Business name Traffic light Traffic signStreet number
  • 3. 3 Managed Data Services - Focus on Insight vs Infrastructure PB+ Scale, No-Ops, Batch & Streaming of Data Insights/ Analytics Resource Provisioning Performance Tuning Monitoring Reliability Deployment & Configuration Handling Growing Scale Utilization Improvements Insights/ Analytics
  • 4. 4 15 Years of Tackling Big Data Problems Google Papers 20082002 2004 2006 2010 2012 2014 2015 GFS Map Reduce Open Source 2005 Google Cloud Product s BigTable Spanner 2016 Millwheel TensorflowDataflowFlume JavaDremel
  • 5. 5 15 Years of Tackling Big Data Problems Google Papers 20082002 2004 2006 2010 2012 2014 2015 GFS Map Reduce Open Source 2005 Google Cloud Product s BigTable Millwheel TensorflowSpanner 2016 DataflowFlume JavaDremel
  • 6. 6 “ ” Google is living a few years in the future and sending the rest of us messages. Doug Cutting Hadoop Co-Creator
  • 7. 7 15 Years of Tackling Big Data Problems Google Papers 20082002 2004 2006 2010 2012 2014 2015 GFS Map Reduce Flume Java Open Source 2005 Google Cloud Product s BigQuery Pub/Sub Dataflow Bigtable BigTable Dremel Spanner ML 2016 Millwheel TensorflowDataflow
  • 8. 8 Google Analytics Premium Cloud Pub/Sub BigQuery Storage (tables) Cloud Bigtable (NoSQL) Cloud Storage (files) Cloud Dataflow BigQuery Analytics Capture Store Analyze Google Stackdriver Process Stream Use Data Scientists Business Analysts Cloud Dataproc Cloud Datalab Real-time analytics Real-time dashboard Real-time alerts Cloud ML Batch Firebase Storage Transfer Service Cloud Dataflow Serverless platform, auto-optimized usage, across the entire data lifecycle
  • 9. 9 Google Analytics Premium Cloud Pub/Sub BigQuery Storage (tables) Cloud Bigtable (NoSQL) Cloud Storage (files) Cloud Dataflow BigQuery Analytics Capture Store Analyze Google Stackdriver Process Stream Use Data Scientists Business Analysts Cloud Dataproc Cloud Datalab Real-time analytics Real-time dashboard Real-time alerts Cloud ML Batch Firebase Storage Transfer Service Cloud Dataflow Serverless platform, auto-optimized usage, across the entire data lifecycle
  • 10. 10 Cloud Pub/Sub Fast, reliable, event delivery. Serverless, autoscaling, pay for what you use.
  • 11. 11 Out of order data The hard part of stream processing
  • 13. 13 Beam=Batch+Stream Apache Beam (incubating) Cloud Dataflow Based on Apache Beam. Pipelines are portable to your favorite runtime.
  • 14. 14 Compute and Storage Unbounded Bounded Resource Management Resource Auto-scaler Dynamic Work Rebalancer Work Scheduler Monitoring Log Collection Graph Optimization Auto-Healing Intelligent WatermarkingS O U R C E S I N K Cloud Dataflow Serverless, autoscaling, pay for what you use.
  • 15. 15 “ ” We are very excited about the productivity benefits offered by Cloud Dataflow and Cloud Pub/Sub. It took half a day to rewrite something that had previously taken over six months to build using Spark. Paul Clarke Director of Technology, Ocado
  • 16. “From traditional batch processing to rock-solid event delivery to the nearly magical abilities of BigQuery, building on Google’s data infrastructure provides us with a significant advantage where it matters the most.” Nicholas Harteau VP of Engineering and Infrastructure 16
  • 17. 17 Google Analytics Premium Cloud Pub/Sub BigQuery Storage (tables) Cloud Bigtable (NoSQL) Cloud Storage (files) Cloud Dataflow BigQuery Analytics Capture Store Analyze Google Stackdriver Process Stream Use Data Scientists Business Analysts Cloud Dataproc Cloud Datalab Real-time analytics Real-time dashboard Real-time alerts Cloud ML Batch Firebase Storage Transfer Service Cloud Dataflow Serverless platform, auto-optimized usage, across the entire data lifecycle
  • 18. 18 Cloud Dataproc Ready-to use Spark and Hadoop clusters in 90 seconds Integrated with Cloud Storage, Cloud Logging, Cloud Monitoring, and more. While active, Dataproc clusters are billed minute-by- minute. Dataproc clusters can make use of low-cost preemptible Compute Engine VMs. Minute-by-Minute Billing Preemptible VMsNative Spark and Hadoop Cloud Integrated Run Spark and Hadoop applications out of the box without modification.
  • 19. 19 Cloud Dataproc Demonstrably more cost-effective Source: Michael Li & Ariel M'ndange-Pfupfu on O’Reilly: https://www.oreilly.com/ideas/ spark-comparison-aws-vs-gcp
  • 20. 20 Google Analytics Premium Cloud Pub/Sub BigQuery Storage (tables) Cloud Bigtable (NoSQL) Cloud Storage (files) Cloud Dataflow BigQuery Analytics Capture Store Analyze Google Stackdriver Process Stream Use Data Scientists Business Analysts Cloud Dataproc Cloud Datalab Real-time analytics Real-time dashboard Real-time alerts Cloud ML Batch Firebase Storage Transfer Service Cloud Dataflow Serverless platform, auto-optimized usage, across the entire data lifecycle
  • 22. Google confidential │ Do not distribute Overview: Data to process: Data in the Consolidated Audit Trail (CAT). A data repository of all equities and options orders, quotes, and events Challenges: How to process the CAT and organize 100 billion market events into an “order lifecycle” in a 4 hour window Store 6 years (~30PB) of data Cloud Bigtable to process and run queries and tolerate volume increases 6 BILLION MARKET EVENTS WRITTEN PER HOUR 1.7 GIGs PER SECOND PER HOUR 6 TBs 10 BN WRITTEN PER HOUR BURSTS 1.7 GIGABYTES PER SECOND 10 TERABYTES PER HOUR
  • 23. 23 Google Analytics Premium Cloud Pub/Sub BigQuery Storage (tables) Cloud Bigtable (NoSQL) Cloud Storage (files) Cloud Dataflow BigQuery Analytics Capture Store Analyze Google Stackdriver Process Stream Use Data Scientists Business Analysts Cloud Dataproc Cloud Datalab Real-time analytics Real-time dashboard Real-time alerts Cloud ML Batch Firebase Storage Transfer Service Cloud Dataflow Serverless platform, auto-optimized usage, across the entire data lifecycle
  • 24. 24 – Mattias P Johansson, Software Engineer, Spotify “With Google Cloud Platform, we benefitted by having a virtual supercomputer on demand, without having to deal with all the usual space, power, cooling and networking issues. Just a few years ago, we would have needed to use the largest supercomputers on the planet to do what we’re now able to do with Google” – Mark Johnson, CEO, Descartes Labs “Right at the start of the partnership we were able to reduce time to insight from 96 hours to 30 minutes by using BigQuery.” – Gary Sanders, Head of Digital Analytics, Lloyds Banking Group “Everyone involved unanimously picked GCP. It came down to this: we believe the core technology is better.” – Peter Bakkam, Platform Lead, Quizlet Do you feel this way about your Data Warehouse?
  • 25. 25 BigQuery Now with full support for Enterprise Data Warehouse SQL Flat-rate Pricing Standard SQL ODBC Connector DML - Beta Identity Access and Management
  • 26. 26 BETA BETA GAGA Use your own data to train models Cloud Datalab Cloud Machine Learning Cloud Storage Google BigQuery Develop/Model/Test Cloud Dataflow GA Train Predict
  • 27. Features: ● Fully Managed NewSQL database service with relational semantics and global consistency ● Global replication options for low-latency reads across the globe ● Consistent transactions across millions of rows Use Cases: ● Large application workloads with very high write volumes or large datasets (3 TB+) ● Geographically distributed control planes with global consistency guarantees Pricing: ● Pay per hour of node compute for throughput ● Pay per GB/month of data stored Cloud Spanner Google Cloud Platform

Notes de l'éditeur

  1. a lot of info in pictures 23 billion words in Wikipedia 40 billion textual lines in StreetView Make the point that we didn’t realize how valuable the pictures were originally and later we revisited and extracted all this additional value. Willing to make a bet all the audience have similarly valuable data.
  2. What really tipped the scales towards Google for spotify was their experience with Google’s data platform and tools. “Good infrastructure isn’t just about keeping things up and running, it’s about making all of our teams more efficient and more effective, and Google’s data stack does that for us in spades. From traditional batch processing with Dataproc, to rock-solid event delivery with Pub/Sub to the nearly magical abilities of BigQuery, building on Google’s data infrastructure provides us with a significant advantage where it matters the most. We have a large and complex backend, so this is a large and complex project that will take us some time to complete. We’re looking forward to sharing our experiences with you as we go, so watch our engineering blog for more information on what we learn, build and break along the way. We’re pretty excited about our Googley future and hope you’ll find it interesting too.