Data & Infrastructure at Airbnb
Brenden Matthews
Watch the video with slide synchronization on InfoQ.com!
http://www.infoq.com/presentations/airbnb-data-infrastructure

Presented at QCon San Francisco
www.qconsf.com
Alternative Titles

● Datacentres of the future
● Building HA infrastructure
● Building automated HA infrastructure
● Data & Infrastructure
A Quick Survey
● Google Borg
● Google MapReduce
● Apache Hadoop
● Apache Mesos
○ Chronos
○ Marathon
○ Storm
○ Apache Aurora (incubator)
Apache Mesos
Distributed computing platform
Or, a distributed operating system
Apache Mesos
● Master/slave architecture
● One master is elected from among the masters
● Most of the state is contained in the slaves themselves
● Master doesn’t do much:
○ Manages resources
○ Acts as a go-between for slaves and frameworks
[Diagram: 2 masters, 4 slaves, 3 ZooKeeper nodes]
Apache Mesos: Components
● libprocess
○ Components communicate using async messaging
○ Messages are immutable; internals easily parallelized
● Master
○ Offers slave resources to frameworks
○ Launches tasks on slaves for accepted offers
○ Forwards status messages between tasks and frameworks
○ Task reconciliation for frameworks
● Slave
○ Monitors individual tasks, reports status to master
○ Performs resource monitoring on tasks
○ Ensures tasks don’t exceed resource limits (cgroups)
● Framework (i.e., your application)
○ Receives resource offers from master
○ Launches tasks for acceptable offers
Apache Mesos: Slave Detail
● Slaves are configured with a resource policy
● Slaves execute tasks, which are submitted by frameworks
● Task resource limits are enforced with cgroups
● Tasks that exceed memory limit will be killed (OOM’d)
● Resources:
○ CPU, mem, ports (‘standard’)
○ network, and user defined parameters
● Recovery: slaves can be restarted without killing tasks (cool!)

Framework   CPU   Memory   Share
Chronos     1     1        3%
Storm       5     5        15%
Marathon    16    30       50%
*           32    60       100%
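
The Share column above is roughly what you get if each framework's share is taken as the larger of its CPU fraction and its memory fraction, which is the idea behind dominant resource fairness. The following is a minimal sketch of that arithmetic, assuming the `*` row (32 CPUs, 60 units of memory) is the total available on the slave. The class and method names are illustrative and are not part of Mesos.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class DominantShare {
    /** Largest fraction of any single resource that a framework holds. */
    static double dominantShare(double cpus, double mem, double totalCpus, double totalMem) {
        return Math.max(cpus / totalCpus, mem / totalMem);
    }

    public static void main(String[] args) {
        // Totals taken from the "*" row; treating it as the total is an assumption.
        double totalCpus = 32;
        double totalMem = 60;

        // Framework name -> {cpus, memory}, copied from the table.
        Map<String, double[]> reserved = new LinkedHashMap<>();
        reserved.put("Chronos", new double[] {1, 1});
        reserved.put("Storm", new double[] {5, 5});
        reserved.put("Marathon", new double[] {16, 30});

        for (Map.Entry<String, double[]> e : reserved.entrySet()) {
            double share = dominantShare(e.getValue()[0], e.getValue()[1], totalCpus, totalMem);
            System.out.printf("%-8s %.1f%%%n", e.getKey(), share * 100);
        }
    }
}
```

For the three rows this gives about 3.1%, 15.6% and 50.0%, close to the 3%, 15% and 50% shown on the slide.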
Apache Mesos: Framework Detail
● Frameworks are applications that run on Mesos
● The framework runs as a separate process, either on its own or as
a Mesos task itself (more on this later)
● Frameworks must decide whether resource offers are sufficient
before launching a task
● Once tasks are launched, frameworks must wait for status updates
and monitor the state of tasks
● Task state can be reconciled with the Mesos master
● Framework state may be stored using the Mesos State API (a key-value store)
Apache Mesos: Framework Detail
A sample resource offer

    --id: 201310221926-2276627466-5050-24060-52872
    framework_id: 201310152336-200446986-5050-29272-0000
    slave_id: 201310182038-2276627466-5050-2945-0
    hostname: i-babc911a
    resources:
      ports:
        range:
          begin: 31002
          end: 32000
        role: *
      cpus:
        value: 16
        role: marathon
      mem:
        value: 30720
        role: marathon
    slave_load_hint: 0.53

Type     Value           Role
CPUs     16              Marathon
Memory   30GiB           Marathon
Ports    [31002,32000]   *
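
To make the "is this offer sufficient?" decision concrete, here is a minimal sketch that models the sample offer with plain Java maps instead of the Mesos protobuf types. `OfferCheck`, `isSufficient`, `portCount` and the task requirements are hypothetical names and values chosen for illustration; only the offered amounts come from the slide.

```java
import java.util.Map;

class OfferCheck {
    /** Number of ports in an inclusive range such as [31002, 32000]. */
    static long portCount(long begin, long end) {
        return end - begin + 1;
    }

    /** True when every scalar resource the task needs is covered by the offer. */
    static boolean isSufficient(Map<String, Double> offered, Map<String, Double> needed) {
        for (Map.Entry<String, Double> need : needed.entrySet()) {
            // A resource missing from the offer counts as zero.
            double available = offered.getOrDefault(need.getKey(), 0.0);
            if (available < need.getValue()) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Amounts copied from the sample offer; mem is in MB (30720 MB = 30 GiB).
        Map<String, Double> offered = Map.of("cpus", 16.0, "mem", 30720.0);
        // Hypothetical requirements for a single task.
        Map<String, Double> needed = Map.of("cpus", 2.0, "mem", 4096.0);

        System.out.println("sufficient: " + isSufficient(offered, needed));
        System.out.println("ports offered: " + portCount(31002, 32000));
    }
}
```

A real scheduler would also have to take the role attached to each resource into account, which this sketch leaves out.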
Apache Mesos: Framework Detail
Resource offer handling sample in Java

    public void resourceOffers(SchedulerDriver schedulerDriver,
                               List<Offer> offers) {
      final boolean sufficient = computeSlots();

      for (Offer offer : offers) {
        if (!sufficient) {
          schedulerDriver.declineOffer(offer.getId());
          continue;
        }

        // Launch TaskTrackers to satisfy the slot requirements.
        // Pull out the cpus, memory, disk, and 2 ports from the offer.
        for (Resource resource : offer.getResourcesList()) {
          if (resource.getName().equals("cpus")
              && resource.getType() == Value.Type.SCALAR) {
            cpus = resource.getScalar().getValue();
            cpuRole = resource.getRole();
          } else if (resource.getName().equals("mem")
              && resource.getType() == Value.Type.SCALAR) {
            mem = resource.getScalar().getValue();
            memRole = resource.getRole();
          } else if (resource.getName().equals("disk")
              && resource.getType() == Value.Type.SCALAR) {
            //...
          }
        }

        schedulerDriver.launchTasks(offer.getId(),
            Arrays.asList(info));
      }
    }
Apache Mesos: Framework Detail
● Writing frameworks is not for everyone! (it’s a bit tricky)
● Frameworks like Marathon and Apache Aurora make it possible to
write applications atop Mesos without having to worry about Mesos
● The Mesos framework ecosystem is alive and well!
● A quadfecta of frameworks cover most use cases:
○ Hadoop - batch processing
○ Storm - stream processing
○ Chronos - task scheduling
○ Marathon or Aurora - long running services
Frameworks: Hadoop
● Hadoop on Mesos behaves like any other
Hadoop (except, perhaps, YARN)
● Code lives at https://github.com/mesos/hadoop
Frameworks: Storm
● Storm is a distributed stream processing
framework
● ‘doing for realtime processing what Hadoop
did for batch processing’ — Nathan Marz
● Storm runs on Mesos at Twitter, but does
not ship with a Mesos scheduler
● Code lives at https://github.com/brndnmtthws/storm
Frameworks: Chronos
● Chronos is a task scheduler that runs on
Mesos
● Could be thought of as ‘distributed cron on
Mesos’
● Code lives at https://github.com/airbnb/chronos
Frameworks: Apache Aurora
● Aurora is a service framework developed at
Twitter - a significant portion of Twitter’s
infrastructure runs atop Aurora
● Aurora was announced as an Apache
Incubator project on Oct 1st, 2013
● Code lives at https://github.com/twitter/aurora
Frameworks: Marathon
● Marathon is a framework for running
services on Mesos, similar to Aurora
● Marathon can be thought of as a meta
framework (more on this later)
● Project was created by many of the folks
behind Chronos
● Code lives at https://github.com/mesosphere/marathon
Marathon
Marathon as a Meta-Framework
● Marathon is designed to run tasks and
guarantee they stay running
● Why not run Marathon on top of itself in
addition to other frameworks?
● Frameworks like Hadoop and Chronos can
be run atop Marathon today
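
"Guarantee they stay running" can be pictured as a reconcile step: compare the number of instances you want with the tasks that are currently up, and launch the difference. The sketch below shows only that counting step. It is an illustration of the idea, not Marathon's actual implementation, and `KeepRunning` and `tasksToLaunch` are hypothetical names. The state strings follow the Mesos task state names.

```java
import java.util.List;

class KeepRunning {
    /** How many new tasks to launch so that `desired` instances are up. */
    static int tasksToLaunch(int desired, List<String> taskStates) {
        int alive = 0;
        for (String state : taskStates) {
            // Count tasks that are running or on their way up.
            if (state.equals("TASK_RUNNING")
                    || state.equals("TASK_STAGING")
                    || state.equals("TASK_STARTING")) {
                alive++;
            }
        }
        // Never return a negative number when there are more tasks than desired.
        return Math.max(0, desired - alive);
    }

    public static void main(String[] args) {
        // One task still running, two that have died.
        List<String> states = List.of("TASK_RUNNING", "TASK_LOST", "TASK_FAILED");
        System.out.println(tasksToLaunch(3, states)); // prints 2
    }
}
```

Because a framework such as Chronos or Hadoop is itself just a long-running process, the same loop that restarts an ordinary service can restart the framework.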
Let’s talk about what this means
High Availability
● Slaves execute tasks, and the slaves
themselves are independent of each other
● You may run frameworks as tasks on slaves
● A high availability cluster might consist of
having 1 or more Mesos masters, in addition
to frameworks, running as Mesos tasks
High Availability
Typical Mesos cluster
● 2 masters, 1 elected
● 2 instances of framework A, 1 elected
[Diagram: 2 masters, 3 slaves each running three tasks (T), 2 instances of Framework A]
High Availability
HA Mesos cluster w/ Marathon
● 3 masters, 1 elected
● 3 instances of framework A, 1 elected
[Diagram: 3 masters, 3 slaves each running three tasks (T), 3 instances of Framework A]
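
The "1 elected" in these slides relies on leader election through ZooKeeper. A common ZooKeeper recipe is for each candidate to register under an increasing sequence number, with the lowest number acting as leader; when that candidate disappears, the next lowest takes over. The sketch below imitates that recipe in memory, without a ZooKeeper client. It is an assumption about the general approach rather than a description of Mesos internals, and `Election` and its methods are hypothetical names.

```java
import java.util.Map;
import java.util.Optional;
import java.util.TreeMap;

class Election {
    // Sequence number -> candidate name, kept sorted by sequence number.
    private final TreeMap<Long, String> candidates = new TreeMap<>();
    private long nextSequence = 0;

    /** Registers a candidate and returns the sequence number it was given. */
    long join(String name) {
        long seq = nextSequence++;
        candidates.put(seq, name);
        return seq;
    }

    /** Removes a candidate, for example when its session is lost. */
    void leave(long seq) {
        candidates.remove(seq);
    }

    /** The candidate with the lowest sequence number, if there is one. */
    Optional<String> leader() {
        Map.Entry<Long, String> first = candidates.firstEntry();
        return first == null ? Optional.empty() : Optional.of(first.getValue());
    }

    public static void main(String[] args) {
        Election election = new Election();
        long first = election.join("master-1");
        election.join("master-2");
        election.join("master-3");
        System.out.println(election.leader().orElse("none")); // master-1
        election.leave(first);                                // the leader goes away
        System.out.println(election.leader().orElse("none")); // master-2
    }
}
```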
High Availability
● Split cluster across datacentres
○ us-east-1a
○ us-east-1b
○ us-east-1e

● Replication factor of 3 with rack awareness
reduces sleepless nights
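
One way to picture a replication factor of 3 with rack (or zone) awareness is to insist that no two replicas land in the same zone, so that losing a whole zone still leaves two copies. The following is a minimal sketch of that placement rule. The host names are made up for illustration, and `ZonePlacement` is a hypothetical helper, not part of any of the tools mentioned here.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class ZonePlacement {
    /** Picks up to `replicas` hosts, no two of them in the same zone. */
    static List<String> place(Map<String, String> hostToZone, int replicas) {
        // Keep the first host seen in each zone, preserving iteration order.
        Map<String, String> firstHostPerZone = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : hostToZone.entrySet()) {
            firstHostPerZone.putIfAbsent(e.getValue(), e.getKey());
        }
        List<String> chosen = new ArrayList<>(firstHostPerZone.values());
        return new ArrayList<>(chosen.subList(0, Math.min(replicas, chosen.size())));
    }

    public static void main(String[] args) {
        // Hypothetical hosts spread over the three zones named on the slide.
        Map<String, String> hosts = new LinkedHashMap<>();
        hosts.put("host-1", "us-east-1a");
        hosts.put("host-2", "us-east-1a");
        hosts.put("host-3", "us-east-1b");
        hosts.put("host-4", "us-east-1e");
        System.out.println(place(hosts, 3)); // [host-1, host-3, host-4]
    }
}
```

If fewer zones than replicas are available, the sketch returns fewer hosts rather than doubling up in one zone; whether that is the right trade-off depends on the service.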
Automated Infrastructure
● Every machine is exactly the same! (except
masters)
● Maintenance becomes as simple as
starting and stopping slaves
● Application experts have greater control over
deployment, without the need for worrying
about resources
Seeing is believing
Airpad
● A small ruby library for deploying
applications (i.e., services) on Mesos with
Marathon
● Depends upon SmartStack, Airbnb’s service
discovery tool
Airpad
● Things we run (experimentally) with Airpad
○ Kafka
○ Cassandra
○ Presto
○ Chronos
○ Marathon
○ Hadoop JobTracker
○ Other internal tools
Airpad Demonstration
Other Lessons I’ve Learned
● Figure out how to manage state early on
○ Depend upon replicated services (Cassandra, Kafka,
HDFS)
○ Use replicated storage (S3, HDFS)
○ Create backups and restore processes

● Better to over-provision than under-provision
○ It’s easier to scale up than scale down