SlideShare une entreprise Scribd logo
1  sur  51
What does it take
to make Google
work at scale?
Malte Schwarzkopf [@ms705]
Cambridge Systems at Scale
http://camsas.org – @CamSysAtScale
CamSa
S
CamSa
S
2http://camsas.org @CamSysAtScale
CamSa
S
3http://camsas.org @CamSysAtScale
CamSa
S
4http://camsas.org @CamSysAtScale
What happens here?
CamSa
S
5http://camsas.org @CamSysAtScale
What happens in those 139ms?
Internet
Google
datacenter
Your
computer
Front-end web
server
G+ idx
places
idx
web idx
? ?
?
? ?
?
CamSa
S
6http://camsas.org @CamSysAtScale
What we’ll chat about
1.Datacenter hardware
2.Scalable software stacks
a.Google
b.Facebook
3.Differences compared to HPC
4.Current research directions
Please ask questions - any
time! :)
CamSa
S
7http://camsas.org @CamSysAtScale
CamSa
S
8http://camsas.org @CamSysAtScale
CamSa
S
9http://camsas.org @CamSysAtScale
CamSa
S
10http://camsas.org @CamSysAtScale
CamSa
S
11http://camsas.org @CamSysAtScale
10,000+ machines
“Machine”
no chassis!
DC battery
~6 disk drives
mostly custom-made
Network
ToR switch
multi-path core
e.g., 3-stage fat tree or
CamSa
S
12http://camsas.org @CamSysAtScale
How’s this different to HPC?
Emphasis on commodity hardware
No expensive interconnect
Mid-range machines
Energy/performance/cost trade-off essential
Massive automation
Very small number of on-site staff
Automated software bootstrap
Fault tolerant design
Each component can fail
Software must be aware and compensate
CamSa
S
13http://camsas.org @CamSysAtScale
The software side
Machine MachineMachine
Linux kernel
(customized)
Linux kernel
(customized)
Linux kernel
(customized)
Transparent distributed systems
Containers
Containers
We’ll look at what goes here!
CamSa
S
14http://camsas.org @CamSysAtScale
The Google Stack
Figure from M. Schwarzkopf, “Operating system support for warehouse-scale
computing”, PhD thesis, University of Cambridge, 2015 (to appear).
CamSa
S
15http://camsas.org @CamSysAtScale
The Google Stack
Figure from M. Schwarzkopf, “Operating system support for warehouse-scale
computing”, PhD thesis, University of Cambridge, 2015 (to appear).
CamSa
S
16http://camsas.org @CamSysAtScale
GFS/Colossus
Bulk data block storage system
Optimized for large files (GB-size)
Supports small files, but not common case
Read, write, append modes
Colossus = GFSv2, adds some improvements
e.g., Reed-Solomon-based erasure coding
better support for latency-sensitive applications
sharded meta-data layer, rather than single master
CamSa
S
17http://camsas.org @CamSysAtScale
Application
GFS/Colossus: architecture
Chunkserver
GFS library
Chunkserver
ChunkserverGFS Master
Namespace
Metadata
CamSa
S
18http://camsas.org @CamSysAtScale
GFS/Colossus: writes
Application
GFS Master
Chunkserver
GFS library
Chunkserver
Chunkserver
Namespace
Metadata
Chain replication
CamSa
S
19http://camsas.org @CamSysAtScale
The Google Stack
Figure from M. Schwarzkopf, “Operating system support for warehouse-scale
computing”, PhD thesis, University of Cambridge, 2015 (to appear).
Details & Bibliography: http://malteschwarzkopf.de/research/assets/google-stack.pdf
CamSa
S
20http://camsas.org @CamSysAtScale
BigTable (2006)
• ‘Three-dimensional‘ key-value store:
– <row key, column key, timestamp> → value
• Effectively a distributed, sorted, sparse map
row
(key: string)
column
(key: string)
timestamp
(key: int64)
cell
<“larry.page“, “websearch“, 133746428> → “cat pictures“
CamSa
S
21http://camsas.org @CamSysAtScale
BigTable (2006)
• Distributed tablets (~1 GB) hold subsets of map
• On top of Colossus, which handles replication and
fault tolerance: only one (active) server per tablet!
• Reads & writes within a row are transactional
– Independently of the number of columns touched
– But: no cross-row transactions possible
– Turns out users find this hard to deal with
CamSa
S
22http://camsas.org @CamSysAtScale
BigTable (2006)
• META0 tablet is “root“ for name resolution
– Like a coordinator/leader
– Nested META1 tablets improve meta-data lookup
• Elect master (META0 tablet server) using
Chubby, and to maintain list of tablet servers &
schemas
– 5-way replicated Paxos consensus on data in Chubby
• Built atop GFS; building block for MegaStore
– “Next gen” with transactions: Spanner
CamSa
S
23http://camsas.org @CamSysAtScale
The Google Stack
Figure from M. Schwarzkopf, “Operating system support for warehouse-scale
computing”, PhD thesis, University of Cambridge, 2015 (to appear).
CamSa
S
24http://camsas.org @CamSysAtScale
Spanner (2012)
• BigTable insufficient for some consistency needs
• Often have transactions across >1 data centres
– May buy app on Play Store while travelling in the U.S.
– Hit U.S. server, but customer billing data is in U.K.
– Or may need to update several replicas for fault tolerance
• Wide-area consistency is hard
– due to long delays and clock skew
– no global, universal notion of time
– NTP not accurate enough, PTP doesn’t work (jittery links)
CamSa
S
25http://camsas.org @CamSysAtScale
Spanner (2012)
• Spanner offers transactional consistency: full RDBMS
power, ACID properties, at global scale!
• Secret sauce: hardware-assisted clock sync
– Using GPS and atomic clocks in data centres
• Use global timestamps and Paxos to reach consensus
– Still have a period of uncertainty for write TX: wait it out!
– Each timestamp is an interval:
Definitely in
the future
tt.latest
Definitely in
the past
tt.earliest
tabs
CamSa
S
26http://camsas.org @CamSysAtScale
The Google Stack
Figure from M. Schwarzkopf, “Operating system support for warehouse-scale
computing”, PhD thesis, University of Cambridge, 2015 (to appear).
Details & Bibliography: http://malteschwarzkopf.de/research/assets/google-stack.pdf
CamSa
S
27http://camsas.org @CamSysAtScale
MapReduce (2004)
• Parallel programming framework for scale
– Run a program on 100’s to 10,000’s machines
• Framework takes care of:
– Parallelization, distribution, load-balancing, scaling up
(or down) & fault-tolerance
• Accessible: programmer provides two methods ;-)
– map(key, value) → list of <key’, value’> pairs
– reduce(key’, value’) → result
– Inspired by functional programming
CamSa
S
28http://camsas.org @CamSysAtScale
MapReduce
Input
Map
Reduce
Output
Shuffle
X: 5 X: 3 Y: 1 Y: 7
Y: 8X: 8
Results: X: 8, Y: 8
Perform Map() query against local data
matching input specification
Aggregate gathered results for each
intermediate key using Reduce()
End user can query results via
distributed key/value store
Slide originally due to S. Hand’s distributed systems lecture course at Cambridge:
http://www.cl.cam.ac.uk/teaching/1112/ConcDisSys/DistributedSystems-1B-H4.pdf
CamSa
S
29http://camsas.org @CamSysAtScale
MapReduce: Pros & Cons
• Extremely simple, and:
– Can auto-parallelize (since operations on every
element in input are independent)
– Can auto-distribute (since rely on underlying
Colossus/BigTable distributed storage)
– Gets fault-tolerance (since tasks are idempotent, i.e.
can just re-execute if a machine crashes)
• Doesn’t really use any sophisticated distributed
systems algorithms (except storage replication)
• However, not a panacea:
– Limited to batch jobs, and computations which are
expressible as a map() followed by a reduce()
CamSa
S
30http://camsas.org @CamSysAtScale
The Google Stack
Figure from M. Schwarzkopf, “Operating system support for warehouse-scale
computing”, PhD thesis, University of Cambridge, 2015 (to appear).
Details & Bibliography: http://malteschwarzkopf.de/research/assets/google-stack.pdf
CamSa
S
31http://camsas.org @CamSysAtScale
Dremel (2010)
Column-oriented store
For quick, interactive queries
...
Row-oriented storage
Column-oriented storage
CamSa
S
32http://camsas.org @CamSysAtScale
Dremel (2010)
Stores protocol buffers
Google’s universal serialization format
Nested messages → nested columns
Repeated fields → repeated records
Efficient encoding
Many sparse records: don’t store NULL fields
Record re-assembly
Need to put results back together into records
Use a Finite State Machine (FSM) defined by
protocol buffer structure
CamSa
S
33http://camsas.org @CamSysAtScale
The Google Stack
Figure from M. Schwarzkopf, “Operating system support for warehouse-scale
computing”, PhD thesis, University of Cambridge, 2015 (to appear).
Details & Bibliography: http://malteschwarzkopf.de/research/assets/google-stack.pdf
CamSa
S
34http://camsas.org @CamSysAtScale
Borg/Omega
Cluster manager and scheduler
Tracks machine and task liveness
Decides where to run what
Consolidates workloads onto machines
Efficiency gain, cost savings
Need fewer clusters
CamSa
S
35http://camsas.org @CamSysAtScale
BorgMaster
Borg/Omega
BorgMasterBorgMaster
Link shard
Borglet Borglet Borglet
Paxos-replicated
persistent store
Scheduler
Figure reproduced after A. Verma et al., “Large-scale cluster management at
Google with Borg”, Proceedings of EuroSys 2015.
CamSa
S
36http://camsas.org @CamSysAtScale
Borg/Omega: workloads
Jobs/tasks: counts
CPU/RAM: resource seconds [i.e. resource * job runtime in sec.]
Cluster A
Medium size
Medium utilization
Cluster B
Large size
Medium utilization
Cluster C
Medium (12k mach.)
High utilization
Public trace
Figure from M. Schwarzkopf et al., “Omega: flexible, scalable schedulers for large
compute clusters”, Proceedings of EuroSys 2013.
CamSa
S
37http://camsas.org @CamSysAtScale
Borg/Omega: workloads
FractionofjobsrunningforlessthanX
Figure from M. Schwarzkopf et al., “Omega: flexible, scalable schedulers for large
compute clusters”, Proceedings of EuroSys 2013.
Service jobs run for much longer than batch jobs:
long-term user-facing services vs. one-off analytics.
CamSa
S
38http://camsas.org @CamSysAtScale
Borg/Omega: workloads
Fractionofinter-arrivalgapslessthanX
Figure from M. Schwarzkopf et al., “Omega: flexible, scalable schedulers for large
compute clusters”, Proceedings of EuroSys 2013.
Batch jobs arrive more frequently than service jobs:
more numerous, shorter duration, fail more.
CamSa
S
39http://camsas.org @CamSysAtScale
Borg/Omega: workloads
Batch jobs have a longer-tailed CPI distribution:
lower scheduling priority in kernel scheduler.
Figures from M. Schwarzkopf, “Operating system support for warehouse-scale
computing”, PhD thesis, University of Cambridge, 2015 (to appear).
CamSa
S
40http://camsas.org @CamSysAtScale
Borg/Omega: workloads
Service workloads access memory more frequently:
larger working sets, less I/O.
Figures from M. Schwarzkopf, “Operating system support for warehouse-scale
computing”, PhD thesis, University of Cambridge, 2015 (to appear).
CamSa
S
41http://camsas.org @CamSysAtScale
How’s this different to HPC?
Generally avoid tight synchronization
No barrier-syncs!
Asynchronous replication, eventual consistency
Much more data-intensive
Lower compute-to-data ratio
Aggressive resource sharing for utilization
CamSa
S
42http://camsas.org @CamSysAtScale
The facebook Stack
Figure from M. Schwarzkopf, “Operating system support for warehouse-scale
computing”, PhD thesis, University of Cambridge, 2015 (to appear).
Details & Bibliography: http://malteschwarzkopf.de/research/assets/facebook-stack.pdf
CamSa
S
43http://camsas.org @CamSysAtScale
The facebook Stack
Figure from M. Schwarzkopf, “Operating system support for warehouse-scale
computing”, PhD thesis, University of Cambridge, 2015 (to appear).
Details & Bibliography: http://malteschwarzkopf.de/research/assets/facebook-stack.pdf
CamSa
S
44http://camsas.org @CamSysAtScale
The facebook Stack
Figure from M. Schwarzkopf, “Operating system support for warehouse-scale
computing”, PhD thesis, University of Cambridge, 2015 (to appear).
Details & Bibliography: http://malteschwarzkopf.de/research/assets/facebook-stack.pdf
CamSa
S
45http://camsas.org @CamSysAtScale
Haystack & f4
Blob stores, hold photos, videos
not: status updates, messages, like counts
Items have a level of hotness
How many users are currently accessing this?
Baseline “cold” storage: MySQL
Want to cache close to users
Reduces network traffic
Reduces latency
But cache capacity is limited!
Replicate for performance, not resilience
CamSa
S
46http://camsas.org @CamSysAtScale
What about
other companies’ stacks?
CamSa
S
47http://camsas.org @CamSysAtScale
How about other companies?
Very similar stacks.
Microsoft, Yahoo, Twitter all similar in principle.
Typical set-up:
Front-end serving systems and fast back-ends.
Batch data processing systems.
Multi-tier structured/unstructured storage hierarchy.
Coordination system and cluster scheduler.
Minor differences owed to business focus
e.g., Amazon focused on inventory/shopping cart.
CamSa
S
48http://camsas.org @CamSysAtScale
Open source software
Lots of open reimplementations!
MapReduce → Hadoop, Spark, Metis
GFS → HDFS
BigTable → HBase, Cassandra
Borg/Omega → Mesos, Firmament
But also some releases from companies…
Presto (Facebook)
Kubernetes (Google)
CamSa
S
49http://camsas.org @CamSysAtScale
Current research
Many hot topics!
This distributed infrastructure is still new.
Make many-core machines go further
Increase utilization, schedule better.
Deal with co-location interference.
Rapid insights on huge datasets
Fast, incremental stream processing, e.g. Naiad.
Approximate analytics, e.g. BlinkDB.
Distributed algorithms
New consensus systems, e.g. RAFT.
CamSa
S
50http://camsas.org @CamSysAtScale
Conclusions
1.Running at huge (10k+ machines) scale
requires different software stacks.
2.Pretty interesting systems
a.Do read the papers!
b.(Partial) Bibliography:
http://malteschwarzkopf.de/research/assets/google-stack.pdf
http://malteschwarzkopf.de/research/assets/facebook-stack.pdf
1.Lots of activity in this area!
a.Open source software
b.Research initiatives
Cambridge
Systems at Scale
@CamSysAtScale
camsas.org

Contenu connexe

Tendances

Impala presentation ahad rana
Impala presentation ahad ranaImpala presentation ahad rana
Impala presentation ahad ranaData Con LA
 
Qubole Overview at the Fifth Elephant Conference
Qubole Overview at the Fifth Elephant ConferenceQubole Overview at the Fifth Elephant Conference
Qubole Overview at the Fifth Elephant ConferenceJoydeep Sen Sarma
 
Hadoop and Spark for the SAS Developer
Hadoop and Spark for the SAS DeveloperHadoop and Spark for the SAS Developer
Hadoop and Spark for the SAS DeveloperDataWorks Summit
 
Steve Watt, Chief Architect, Hadoop and Big Data, Red Hat - 21st BDL meetup
Steve Watt, Chief Architect, Hadoop and Big Data, Red Hat - 21st BDL meetupSteve Watt, Chief Architect, Hadoop and Big Data, Red Hat - 21st BDL meetup
Steve Watt, Chief Architect, Hadoop and Big Data, Red Hat - 21st BDL meetupbigdatalondon
 
Xldb2011 tue 0940_facebook_realtimeanalytics
Xldb2011 tue 0940_facebook_realtimeanalyticsXldb2011 tue 0940_facebook_realtimeanalytics
Xldb2011 tue 0940_facebook_realtimeanalyticsliqiang xu
 
Yahoo! Hadoop User Group - May Meetup - HBase and Pig: The Hadoop ecosystem a...
Yahoo! Hadoop User Group - May Meetup - HBase and Pig: The Hadoop ecosystem a...Yahoo! Hadoop User Group - May Meetup - HBase and Pig: The Hadoop ecosystem a...
Yahoo! Hadoop User Group - May Meetup - HBase and Pig: The Hadoop ecosystem a...Hadoop User Group
 
Building a Scalable Web Crawler with Hadoop
Building a Scalable Web Crawler with HadoopBuilding a Scalable Web Crawler with Hadoop
Building a Scalable Web Crawler with HadoopHadoop User Group
 
Transformation Processing Smackdown; Spark vs Hive vs Pig
Transformation Processing Smackdown; Spark vs Hive vs PigTransformation Processing Smackdown; Spark vs Hive vs Pig
Transformation Processing Smackdown; Spark vs Hive vs PigLester Martin
 
HBaseCon 2015: HBase Operations in a Flurry
HBaseCon 2015: HBase Operations in a FlurryHBaseCon 2015: HBase Operations in a Flurry
HBaseCon 2015: HBase Operations in a FlurryHBaseCon
 
How Adobe Does 2 Million Records Per Second Using Apache Spark!
How Adobe Does 2 Million Records Per Second Using Apache Spark!How Adobe Does 2 Million Records Per Second Using Apache Spark!
How Adobe Does 2 Million Records Per Second Using Apache Spark!Databricks
 
Adaptive Query Execution: Speeding Up Spark SQL at Runtime
Adaptive Query Execution: Speeding Up Spark SQL at RuntimeAdaptive Query Execution: Speeding Up Spark SQL at Runtime
Adaptive Query Execution: Speeding Up Spark SQL at RuntimeDatabricks
 
Datacenter Computing with Apache Mesos - BigData DC
Datacenter Computing with Apache Mesos - BigData DCDatacenter Computing with Apache Mesos - BigData DC
Datacenter Computing with Apache Mesos - BigData DCPaco Nathan
 
Prediction as a service with ensemble model in SparkML and Python ScikitLearn
Prediction as a service with ensemble model in SparkML and Python ScikitLearnPrediction as a service with ensemble model in SparkML and Python ScikitLearn
Prediction as a service with ensemble model in SparkML and Python ScikitLearnJosef A. Habdank
 
The Meta of Hadoop - COMAD 2012
The Meta of Hadoop - COMAD 2012The Meta of Hadoop - COMAD 2012
The Meta of Hadoop - COMAD 2012Joydeep Sen Sarma
 
HBaseConEast2016: Splice machine open source rdbms
HBaseConEast2016: Splice machine open source rdbmsHBaseConEast2016: Splice machine open source rdbms
HBaseConEast2016: Splice machine open source rdbmsMichael Stack
 
HBaseCon2017 Community-Driven Graphs with JanusGraph
HBaseCon2017 Community-Driven Graphs with JanusGraphHBaseCon2017 Community-Driven Graphs with JanusGraph
HBaseCon2017 Community-Driven Graphs with JanusGraphHBaseCon
 

Tendances (20)

Impala presentation ahad rana
Impala presentation ahad ranaImpala presentation ahad rana
Impala presentation ahad rana
 
Qubole Overview at the Fifth Elephant Conference
Qubole Overview at the Fifth Elephant ConferenceQubole Overview at the Fifth Elephant Conference
Qubole Overview at the Fifth Elephant Conference
 
Cloud Optimized Big Data
Cloud Optimized Big DataCloud Optimized Big Data
Cloud Optimized Big Data
 
Hadoop and Spark for the SAS Developer
Hadoop and Spark for the SAS DeveloperHadoop and Spark for the SAS Developer
Hadoop and Spark for the SAS Developer
 
Steve Watt, Chief Architect, Hadoop and Big Data, Red Hat - 21st BDL meetup
Steve Watt, Chief Architect, Hadoop and Big Data, Red Hat - 21st BDL meetupSteve Watt, Chief Architect, Hadoop and Big Data, Red Hat - 21st BDL meetup
Steve Watt, Chief Architect, Hadoop and Big Data, Red Hat - 21st BDL meetup
 
Xldb2011 tue 0940_facebook_realtimeanalytics
Xldb2011 tue 0940_facebook_realtimeanalyticsXldb2011 tue 0940_facebook_realtimeanalytics
Xldb2011 tue 0940_facebook_realtimeanalytics
 
Yahoo! Hadoop User Group - May Meetup - HBase and Pig: The Hadoop ecosystem a...
Yahoo! Hadoop User Group - May Meetup - HBase and Pig: The Hadoop ecosystem a...Yahoo! Hadoop User Group - May Meetup - HBase and Pig: The Hadoop ecosystem a...
Yahoo! Hadoop User Group - May Meetup - HBase and Pig: The Hadoop ecosystem a...
 
Building a Scalable Web Crawler with Hadoop
Building a Scalable Web Crawler with HadoopBuilding a Scalable Web Crawler with Hadoop
Building a Scalable Web Crawler with Hadoop
 
Transformation Processing Smackdown; Spark vs Hive vs Pig
Transformation Processing Smackdown; Spark vs Hive vs PigTransformation Processing Smackdown; Spark vs Hive vs Pig
Transformation Processing Smackdown; Spark vs Hive vs Pig
 
HBaseCon 2015: HBase Operations in a Flurry
HBaseCon 2015: HBase Operations in a FlurryHBaseCon 2015: HBase Operations in a Flurry
HBaseCon 2015: HBase Operations in a Flurry
 
How Adobe Does 2 Million Records Per Second Using Apache Spark!
How Adobe Does 2 Million Records Per Second Using Apache Spark!How Adobe Does 2 Million Records Per Second Using Apache Spark!
How Adobe Does 2 Million Records Per Second Using Apache Spark!
 
Adaptive Query Execution: Speeding Up Spark SQL at Runtime
Adaptive Query Execution: Speeding Up Spark SQL at RuntimeAdaptive Query Execution: Speeding Up Spark SQL at Runtime
Adaptive Query Execution: Speeding Up Spark SQL at Runtime
 
Voldemort
VoldemortVoldemort
Voldemort
 
March 2011 HUG: Scaling Hadoop
March 2011 HUG: Scaling HadoopMarch 2011 HUG: Scaling Hadoop
March 2011 HUG: Scaling Hadoop
 
Datacenter Computing with Apache Mesos - BigData DC
Datacenter Computing with Apache Mesos - BigData DCDatacenter Computing with Apache Mesos - BigData DC
Datacenter Computing with Apache Mesos - BigData DC
 
Prediction as a service with ensemble model in SparkML and Python ScikitLearn
Prediction as a service with ensemble model in SparkML and Python ScikitLearnPrediction as a service with ensemble model in SparkML and Python ScikitLearn
Prediction as a service with ensemble model in SparkML and Python ScikitLearn
 
The Meta of Hadoop - COMAD 2012
The Meta of Hadoop - COMAD 2012The Meta of Hadoop - COMAD 2012
The Meta of Hadoop - COMAD 2012
 
HBaseConEast2016: Splice machine open source rdbms
HBaseConEast2016: Splice machine open source rdbmsHBaseConEast2016: Splice machine open source rdbms
HBaseConEast2016: Splice machine open source rdbms
 
Polyalgebra
PolyalgebraPolyalgebra
Polyalgebra
 
HBaseCon2017 Community-Driven Graphs with JanusGraph
HBaseCon2017 Community-Driven Graphs with JanusGraphHBaseCon2017 Community-Driven Graphs with JanusGraph
HBaseCon2017 Community-Driven Graphs with JanusGraph
 

Similaire à What does it take to make google work at scale

Typesafe & William Hill: Cassandra, Spark, and Kafka - The New Streaming Data...
Typesafe & William Hill: Cassandra, Spark, and Kafka - The New Streaming Data...Typesafe & William Hill: Cassandra, Spark, and Kafka - The New Streaming Data...
Typesafe & William Hill: Cassandra, Spark, and Kafka - The New Streaming Data...DataStax Academy
 
Ga4 gh meeting at the the sanger institute
Ga4 gh meeting at the the sanger instituteGa4 gh meeting at the the sanger institute
Ga4 gh meeting at the the sanger instituteMatt Massie
 
Big Data on EC2: Mashing Technology in the Cloud
Big Data on EC2: Mashing Technology in the CloudBig Data on EC2: Mashing Technology in the Cloud
Big Data on EC2: Mashing Technology in the CloudGeorge Ang
 
Cassandra - A decentralized storage system
Cassandra - A decentralized storage systemCassandra - A decentralized storage system
Cassandra - A decentralized storage systemArunit Gupta
 
Introduction to Apache Kudu
Introduction to Apache KuduIntroduction to Apache Kudu
Introduction to Apache KuduJeff Holoman
 
SMACK Stack 1.1
SMACK Stack 1.1SMACK Stack 1.1
SMACK Stack 1.1Joe Stein
 
Big Data Streams Architectures. Why? What? How?
Big Data Streams Architectures. Why? What? How?Big Data Streams Architectures. Why? What? How?
Big Data Streams Architectures. Why? What? How?Anton Nazaruk
 
GCP Data Engineer cheatsheet
GCP Data Engineer cheatsheetGCP Data Engineer cheatsheet
GCP Data Engineer cheatsheetGuang Xu
 
Chicago Kafka Meetup
Chicago Kafka MeetupChicago Kafka Meetup
Chicago Kafka MeetupCliff Gilmore
 
The Next Generation of Data Processing and Open Source
The Next Generation of Data Processing and Open SourceThe Next Generation of Data Processing and Open Source
The Next Generation of Data Processing and Open SourceDataWorks Summit/Hadoop Summit
 
Introduction To Spark - Durham LUG 20150916
Introduction To Spark - Durham LUG 20150916Introduction To Spark - Durham LUG 20150916
Introduction To Spark - Durham LUG 20150916Ian Pointer
 
Interactively Querying Large-scale Datasets on Amazon S3
Interactively Querying Large-scale Datasets on Amazon S3Interactively Querying Large-scale Datasets on Amazon S3
Interactively Querying Large-scale Datasets on Amazon S3Amazon Web Services
 
Cloud Lambda Architecture Patterns
Cloud Lambda Architecture PatternsCloud Lambda Architecture Patterns
Cloud Lambda Architecture PatternsAsis Mohanty
 
Loading Data into Redshift: Data Analytics Week at the SF Loft
Loading Data into Redshift: Data Analytics Week at the SF LoftLoading Data into Redshift: Data Analytics Week at the SF Loft
Loading Data into Redshift: Data Analytics Week at the SF LoftAmazon Web Services
 
From PoCs to Production
From PoCs to ProductionFrom PoCs to Production
From PoCs to ProductionDataStax
 
Intro to SnappyData Webinar
Intro to SnappyData WebinarIntro to SnappyData Webinar
Intro to SnappyData WebinarSnappyData
 
Introduciton to Apache Cassandra for Java Developers (JavaOne)
Introduciton to Apache Cassandra for Java Developers (JavaOne)Introduciton to Apache Cassandra for Java Developers (JavaOne)
Introduciton to Apache Cassandra for Java Developers (JavaOne)zznate
 

Similaire à What does it take to make google work at scale (20)

Typesafe & William Hill: Cassandra, Spark, and Kafka - The New Streaming Data...
Typesafe & William Hill: Cassandra, Spark, and Kafka - The New Streaming Data...Typesafe & William Hill: Cassandra, Spark, and Kafka - The New Streaming Data...
Typesafe & William Hill: Cassandra, Spark, and Kafka - The New Streaming Data...
 
Ga4 gh meeting at the the sanger institute
Ga4 gh meeting at the the sanger instituteGa4 gh meeting at the the sanger institute
Ga4 gh meeting at the the sanger institute
 
Big Data on EC2: Mashing Technology in the Cloud
Big Data on EC2: Mashing Technology in the CloudBig Data on EC2: Mashing Technology in the Cloud
Big Data on EC2: Mashing Technology in the Cloud
 
Gcp data engineer
Gcp data engineerGcp data engineer
Gcp data engineer
 
Cassandra - A decentralized storage system
Cassandra - A decentralized storage systemCassandra - A decentralized storage system
Cassandra - A decentralized storage system
 
Introduction to Apache Kudu
Introduction to Apache KuduIntroduction to Apache Kudu
Introduction to Apache Kudu
 
SMACK Stack 1.1
SMACK Stack 1.1SMACK Stack 1.1
SMACK Stack 1.1
 
Big Data Streams Architectures. Why? What? How?
Big Data Streams Architectures. Why? What? How?Big Data Streams Architectures. Why? What? How?
Big Data Streams Architectures. Why? What? How?
 
GCP Data Engineer cheatsheet
GCP Data Engineer cheatsheetGCP Data Engineer cheatsheet
GCP Data Engineer cheatsheet
 
Loading Data into Redshift
Loading Data into RedshiftLoading Data into Redshift
Loading Data into Redshift
 
Chicago Kafka Meetup
Chicago Kafka MeetupChicago Kafka Meetup
Chicago Kafka Meetup
 
The Next Generation of Data Processing and Open Source
The Next Generation of Data Processing and Open SourceThe Next Generation of Data Processing and Open Source
The Next Generation of Data Processing and Open Source
 
Introduction To Spark - Durham LUG 20150916
Introduction To Spark - Durham LUG 20150916Introduction To Spark - Durham LUG 20150916
Introduction To Spark - Durham LUG 20150916
 
Interactively Querying Large-scale Datasets on Amazon S3
Interactively Querying Large-scale Datasets on Amazon S3Interactively Querying Large-scale Datasets on Amazon S3
Interactively Querying Large-scale Datasets on Amazon S3
 
Cloud Lambda Architecture Patterns
Cloud Lambda Architecture PatternsCloud Lambda Architecture Patterns
Cloud Lambda Architecture Patterns
 
Loading Data into Redshift: Data Analytics Week at the SF Loft
Loading Data into Redshift: Data Analytics Week at the SF LoftLoading Data into Redshift: Data Analytics Week at the SF Loft
Loading Data into Redshift: Data Analytics Week at the SF Loft
 
From PoCs to Production
From PoCs to ProductionFrom PoCs to Production
From PoCs to Production
 
Intro to SnappyData Webinar
Intro to SnappyData WebinarIntro to SnappyData Webinar
Intro to SnappyData Webinar
 
Loading Data into Redshift
Loading Data into RedshiftLoading Data into Redshift
Loading Data into Redshift
 
Introduciton to Apache Cassandra for Java Developers (JavaOne)
Introduciton to Apache Cassandra for Java Developers (JavaOne)Introduciton to Apache Cassandra for Java Developers (JavaOne)
Introduciton to Apache Cassandra for Java Developers (JavaOne)
 

Plus de xlight

淘宝无线电子商务数据报告
淘宝无线电子商务数据报告淘宝无线电子商务数据报告
淘宝无线电子商务数据报告xlight
 
New zealand bloom filter
New zealand bloom filterNew zealand bloom filter
New zealand bloom filterxlight
 
Product manager-chrissyuan v1.0
Product manager-chrissyuan v1.0Product manager-chrissyuan v1.0
Product manager-chrissyuan v1.0xlight
 
Oracle ha
Oracle haOracle ha
Oracle haxlight
 
Oracle 高可用概述
Oracle 高可用概述Oracle 高可用概述
Oracle 高可用概述xlight
 
Stats partitioned table
Stats partitioned tableStats partitioned table
Stats partitioned tablexlight
 
Optimizing Drupal Performance Zend Acquia Whitepaper Feb2010
Optimizing Drupal Performance Zend Acquia Whitepaper Feb2010Optimizing Drupal Performance Zend Acquia Whitepaper Feb2010
Optimizing Drupal Performance Zend Acquia Whitepaper Feb2010xlight
 
C/C++与Lua混合编程
C/C++与Lua混合编程C/C++与Lua混合编程
C/C++与Lua混合编程xlight
 
Google: The Chubby Lock Service for Loosely-Coupled Distributed Systems
Google: The Chubby Lock Service for Loosely-Coupled Distributed SystemsGoogle: The Chubby Lock Service for Loosely-Coupled Distributed Systems
Google: The Chubby Lock Service for Loosely-Coupled Distributed Systemsxlight
 
Google: The Chubby Lock Service for Loosely-Coupled Distributed Systems
Google: The Chubby Lock Service for Loosely-Coupled Distributed SystemsGoogle: The Chubby Lock Service for Loosely-Coupled Distributed Systems
Google: The Chubby Lock Service for Loosely-Coupled Distributed Systemsxlight
 
High Availability MySQL with DRBD and Heartbeat MTV Japan Mobile Service
High Availability MySQL with DRBD and Heartbeat MTV Japan Mobile ServiceHigh Availability MySQL with DRBD and Heartbeat MTV Japan Mobile Service
High Availability MySQL with DRBD and Heartbeat MTV Japan Mobile Servicexlight
 
PgSQL vs MySQL
PgSQL vs MySQLPgSQL vs MySQL
PgSQL vs MySQLxlight
 
SpeedGeeks
SpeedGeeksSpeedGeeks
SpeedGeeksxlight
 
GOOGLE: Designs, Lessons and Advice from Building Large Distributed Systems
GOOGLE: Designs, Lessons and Advice from Building Large   Distributed Systems GOOGLE: Designs, Lessons and Advice from Building Large   Distributed Systems
GOOGLE: Designs, Lessons and Advice from Building Large Distributed Systems xlight
 
sector-sphere
sector-spheresector-sphere
sector-spherexlight
 
Fixing Twitter Improving The Performance And Scalability Of The Worlds Most ...
Fixing Twitter  Improving The Performance And Scalability Of The Worlds Most ...Fixing Twitter  Improving The Performance And Scalability Of The Worlds Most ...
Fixing Twitter Improving The Performance And Scalability Of The Worlds Most ...xlight
 
Gfarm Fs Tatebe Tip2004
Gfarm Fs Tatebe Tip2004Gfarm Fs Tatebe Tip2004
Gfarm Fs Tatebe Tip2004xlight
 
Make Your web Work
Make Your web WorkMake Your web Work
Make Your web Workxlight
 
Capacity Management from Flickr
Capacity Management from FlickrCapacity Management from Flickr
Capacity Management from Flickrxlight
 

Plus de xlight (20)

淘宝无线电子商务数据报告
淘宝无线电子商务数据报告淘宝无线电子商务数据报告
淘宝无线电子商务数据报告
 
New zealand bloom filter
New zealand bloom filterNew zealand bloom filter
New zealand bloom filter
 
Product manager-chrissyuan v1.0
Product manager-chrissyuan v1.0Product manager-chrissyuan v1.0
Product manager-chrissyuan v1.0
 
Oracle ha
Oracle haOracle ha
Oracle ha
 
Oracle 高可用概述
Oracle 高可用概述Oracle 高可用概述
Oracle 高可用概述
 
Stats partitioned table
Stats partitioned tableStats partitioned table
Stats partitioned table
 
Optimizing Drupal Performance Zend Acquia Whitepaper Feb2010
Optimizing Drupal Performance Zend Acquia Whitepaper Feb2010Optimizing Drupal Performance Zend Acquia Whitepaper Feb2010
Optimizing Drupal Performance Zend Acquia Whitepaper Feb2010
 
C/C++与Lua混合编程
C/C++与Lua混合编程C/C++与Lua混合编程
C/C++与Lua混合编程
 
Google: The Chubby Lock Service for Loosely-Coupled Distributed Systems
Google: The Chubby Lock Service for Loosely-Coupled Distributed SystemsGoogle: The Chubby Lock Service for Loosely-Coupled Distributed Systems
Google: The Chubby Lock Service for Loosely-Coupled Distributed Systems
 
Google: The Chubby Lock Service for Loosely-Coupled Distributed Systems
Google: The Chubby Lock Service for Loosely-Coupled Distributed SystemsGoogle: The Chubby Lock Service for Loosely-Coupled Distributed Systems
Google: The Chubby Lock Service for Loosely-Coupled Distributed Systems
 
High Availability MySQL with DRBD and Heartbeat MTV Japan Mobile Service
High Availability MySQL with DRBD and Heartbeat MTV Japan Mobile ServiceHigh Availability MySQL with DRBD and Heartbeat MTV Japan Mobile Service
High Availability MySQL with DRBD and Heartbeat MTV Japan Mobile Service
 
PgSQL vs MySQL
PgSQL vs MySQLPgSQL vs MySQL
PgSQL vs MySQL
 
SpeedGeeks
SpeedGeeksSpeedGeeks
SpeedGeeks
 
GOOGLE: Designs, Lessons and Advice from Building Large Distributed Systems
GOOGLE: Designs, Lessons and Advice from Building Large   Distributed Systems GOOGLE: Designs, Lessons and Advice from Building Large   Distributed Systems
GOOGLE: Designs, Lessons and Advice from Building Large Distributed Systems
 
UDT
UDTUDT
UDT
 
sector-sphere
sector-spheresector-sphere
sector-sphere
 
Fixing Twitter Improving The Performance And Scalability Of The Worlds Most ...
Fixing Twitter  Improving The Performance And Scalability Of The Worlds Most ...Fixing Twitter  Improving The Performance And Scalability Of The Worlds Most ...
Fixing Twitter Improving The Performance And Scalability Of The Worlds Most ...
 
Gfarm Fs Tatebe Tip2004
Gfarm Fs Tatebe Tip2004Gfarm Fs Tatebe Tip2004
Gfarm Fs Tatebe Tip2004
 
Make Your web Work
Make Your web WorkMake Your web Work
Make Your web Work
 
Capacity Management from Flickr
Capacity Management from FlickrCapacity Management from Flickr
Capacity Management from Flickr
 

Dernier

IVE Industry Focused Event - Defence Sector 2024
IVE Industry Focused Event - Defence Sector 2024IVE Industry Focused Event - Defence Sector 2024
IVE Industry Focused Event - Defence Sector 2024Mark Billinghurst
 
Architect Hassan Khalil Portfolio for 2024
Architect Hassan Khalil Portfolio for 2024Architect Hassan Khalil Portfolio for 2024
Architect Hassan Khalil Portfolio for 2024hassan khalil
 
Study on Air-Water & Water-Water Heat Exchange in a Finned Tube Exchanger
Study on Air-Water & Water-Water Heat Exchange in a Finned Tube ExchangerStudy on Air-Water & Water-Water Heat Exchange in a Finned Tube Exchanger
Study on Air-Water & Water-Water Heat Exchange in a Finned Tube ExchangerAnamika Sarkar
 
TechTAC® CFD Report Summary: A Comparison of Two Types of Tubing Anchor Catchers
TechTAC® CFD Report Summary: A Comparison of Two Types of Tubing Anchor CatchersTechTAC® CFD Report Summary: A Comparison of Two Types of Tubing Anchor Catchers
TechTAC® CFD Report Summary: A Comparison of Two Types of Tubing Anchor Catcherssdickerson1
 
Introduction to Machine Learning Unit-3 for II MECH
Introduction to Machine Learning Unit-3 for II MECHIntroduction to Machine Learning Unit-3 for II MECH
Introduction to Machine Learning Unit-3 for II MECHC Sai Kiran
 
Vishratwadi & Ghorpadi Bridge Tender documents
Vishratwadi & Ghorpadi Bridge Tender documentsVishratwadi & Ghorpadi Bridge Tender documents
Vishratwadi & Ghorpadi Bridge Tender documentsSachinPawar510423
 
Call Girls Narol 7397865700 Independent Call Girls
Call Girls Narol 7397865700 Independent Call GirlsCall Girls Narol 7397865700 Independent Call Girls
Call Girls Narol 7397865700 Independent Call Girlsssuser7cb4ff
 
Concrete Mix Design - IS 10262-2019 - .pptx
Concrete Mix Design - IS 10262-2019 - .pptxConcrete Mix Design - IS 10262-2019 - .pptx
Concrete Mix Design - IS 10262-2019 - .pptxKartikeyaDwivedi3
 
Introduction-To-Agricultural-Surveillance-Rover.pptx
Introduction-To-Agricultural-Surveillance-Rover.pptxIntroduction-To-Agricultural-Surveillance-Rover.pptx
Introduction-To-Agricultural-Surveillance-Rover.pptxk795866
 
Gurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort service
Gurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort serviceGurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort service
Gurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort servicejennyeacort
 
US Department of Education FAFSA Week of Action
US Department of Education FAFSA Week of ActionUS Department of Education FAFSA Week of Action
US Department of Education FAFSA Week of ActionMebane Rash
 
CCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdf
CCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdfCCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdf
CCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdfAsst.prof M.Gokilavani
 
Sachpazis Costas: Geotechnical Engineering: A student's Perspective Introduction
Sachpazis Costas: Geotechnical Engineering: A student's Perspective IntroductionSachpazis Costas: Geotechnical Engineering: A student's Perspective Introduction
Sachpazis Costas: Geotechnical Engineering: A student's Perspective IntroductionDr.Costas Sachpazis
 
INFLUENCE OF NANOSILICA ON THE PROPERTIES OF CONCRETE
INFLUENCE OF NANOSILICA ON THE PROPERTIES OF CONCRETEINFLUENCE OF NANOSILICA ON THE PROPERTIES OF CONCRETE
INFLUENCE OF NANOSILICA ON THE PROPERTIES OF CONCRETEroselinkalist12
 
Transport layer issues and challenges - Guide
Transport layer issues and challenges - GuideTransport layer issues and challenges - Guide
Transport layer issues and challenges - GuideGOPINATHS437943
 
An experimental study in using natural admixture as an alternative for chemic...
An experimental study in using natural admixture as an alternative for chemic...An experimental study in using natural admixture as an alternative for chemic...
An experimental study in using natural admixture as an alternative for chemic...Chandu841456
 
Oxy acetylene welding presentation note.
Oxy acetylene welding presentation note.Oxy acetylene welding presentation note.
Oxy acetylene welding presentation note.eptoze12
 
Past, Present and Future of Generative AI
Past, Present and Future of Generative AIPast, Present and Future of Generative AI
Past, Present and Future of Generative AIabhishek36461
 
CCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdf
CCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdfCCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdf
CCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdfAsst.prof M.Gokilavani
 

Dernier (20)

IVE Industry Focused Event - Defence Sector 2024
IVE Industry Focused Event - Defence Sector 2024IVE Industry Focused Event - Defence Sector 2024
IVE Industry Focused Event - Defence Sector 2024
 
Architect Hassan Khalil Portfolio for 2024
Architect Hassan Khalil Portfolio for 2024Architect Hassan Khalil Portfolio for 2024
Architect Hassan Khalil Portfolio for 2024
 
Study on Air-Water & Water-Water Heat Exchange in a Finned Tube Exchanger
Study on Air-Water & Water-Water Heat Exchange in a Finned Tube ExchangerStudy on Air-Water & Water-Water Heat Exchange in a Finned Tube Exchanger
Study on Air-Water & Water-Water Heat Exchange in a Finned Tube Exchanger
 
TechTAC® CFD Report Summary: A Comparison of Two Types of Tubing Anchor Catchers
TechTAC® CFD Report Summary: A Comparison of Two Types of Tubing Anchor CatchersTechTAC® CFD Report Summary: A Comparison of Two Types of Tubing Anchor Catchers
TechTAC® CFD Report Summary: A Comparison of Two Types of Tubing Anchor Catchers
 
Introduction to Machine Learning Unit-3 for II MECH
Introduction to Machine Learning Unit-3 for II MECHIntroduction to Machine Learning Unit-3 for II MECH
Introduction to Machine Learning Unit-3 for II MECH
 
Vishratwadi & Ghorpadi Bridge Tender documents
Vishratwadi & Ghorpadi Bridge Tender documentsVishratwadi & Ghorpadi Bridge Tender documents
Vishratwadi & Ghorpadi Bridge Tender documents
 
Call Girls Narol 7397865700 Independent Call Girls
Call Girls Narol 7397865700 Independent Call GirlsCall Girls Narol 7397865700 Independent Call Girls
Call Girls Narol 7397865700 Independent Call Girls
 
Concrete Mix Design - IS 10262-2019 - .pptx
Concrete Mix Design - IS 10262-2019 - .pptxConcrete Mix Design - IS 10262-2019 - .pptx
Concrete Mix Design - IS 10262-2019 - .pptx
 
Introduction-To-Agricultural-Surveillance-Rover.pptx
Introduction-To-Agricultural-Surveillance-Rover.pptxIntroduction-To-Agricultural-Surveillance-Rover.pptx
Introduction-To-Agricultural-Surveillance-Rover.pptx
 
Gurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort service
Gurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort serviceGurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort service
Gurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort service
 
US Department of Education FAFSA Week of Action
US Department of Education FAFSA Week of ActionUS Department of Education FAFSA Week of Action
US Department of Education FAFSA Week of Action
 
CCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdf
CCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdfCCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdf
CCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdf
 
Sachpazis Costas: Geotechnical Engineering: A student's Perspective Introduction
Sachpazis Costas: Geotechnical Engineering: A student's Perspective IntroductionSachpazis Costas: Geotechnical Engineering: A student's Perspective Introduction
Sachpazis Costas: Geotechnical Engineering: A student's Perspective Introduction
 
INFLUENCE OF NANOSILICA ON THE PROPERTIES OF CONCRETE
INFLUENCE OF NANOSILICA ON THE PROPERTIES OF CONCRETEINFLUENCE OF NANOSILICA ON THE PROPERTIES OF CONCRETE
INFLUENCE OF NANOSILICA ON THE PROPERTIES OF CONCRETE
 
Transport layer issues and challenges - Guide
Transport layer issues and challenges - GuideTransport layer issues and challenges - Guide
Transport layer issues and challenges - Guide
 
An experimental study in using natural admixture as an alternative for chemic...
An experimental study in using natural admixture as an alternative for chemic...An experimental study in using natural admixture as an alternative for chemic...
An experimental study in using natural admixture as an alternative for chemic...
 
Oxy acetylene welding presentation note.
Oxy acetylene welding presentation note.Oxy acetylene welding presentation note.
Oxy acetylene welding presentation note.
 
Past, Present and Future of Generative AI
Past, Present and Future of Generative AIPast, Present and Future of Generative AI
Past, Present and Future of Generative AI
 
9953056974 Call Girls In South Ex, Escorts (Delhi) NCR.pdf
9953056974 Call Girls In South Ex, Escorts (Delhi) NCR.pdf9953056974 Call Girls In South Ex, Escorts (Delhi) NCR.pdf
9953056974 Call Girls In South Ex, Escorts (Delhi) NCR.pdf
 
CCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdf
CCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdfCCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdf
CCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdf
 

What does it take to make google work at scale