Messaging
“Just pick something”
A little about myself
● Sean Kelly
○ Also known as Stabby
● I went from .NET to Ruby to Go
○ But my favorite language is SQL
● Core maintainer of Tapjoy's:
○ Chore - https://github.com/Tapjoy/chore
○ Dynamiq - https://github.com/Tapjoy/dynamiq
● I love IPAs
Speaking of Tapjoy...
We do…
● 1.8 Billion Requests per minute
○ And almost as many messages per day
● ~170 Million Jobs per day
● All on ~750 EC2 instances and private
servers
● A stocked double-kegerator
○ Right now: Pinner IPA / Cisco Summer of Lager
What is “Messaging”
Like, jobs and stuff?
Messaging is...
● A way to share important events, without
needing to know who's listening
● A way to handle processing events and
information at a larger scale
● Not all that unlike “background jobs”
○ Jobs: “I’ll do this later”
○ Messaging: “Other people will do this later”
How does this fit into my
app?
It sure sounds cool
Messaging and You
Let’s say you’ve got a great new app, and for a
while things are fine
Monolith
1.0
Messaging and You
Eventually, you need to push work out of band
Monolith
1.2
Jobs
Messaging and You
Now you have several services, and they all
need to share info
Monolith
1.5
Jobs
One-off which
becomes a
core part of
your business
Jobs
Failed
attempt at
Micro
Service
Reporting
System
Sure, but how can you
actually use Messaging?
Those weren’t even very good drawings
They didn’t have lines or anything
What types of Messaging are there?
● 1:1, traditional “Queueing”
○ Basic push / pull model of doing work
○ Common with asynchronous job processing
○ RabbitMQ, ActiveMQ, SQS, Disque, Dynamiq, NSQ
● Fanout
○ Broadcast style publishing, all listeners get a copy
○ Ex: A game pushing out notifications of an update
○ Most technologies with 1:1 queues support this in some way
What types of Messaging are there?
● Routing
○ Intelligent fanout, routes to listeners based on message
metadata
○ Newsgroups: Subscribe to food.charcuterie.*, get bresaola
○ RabbitMQ does this pretty well
● Streaming
○ Persistent connection, constant source of raw bytes
○ Twitter's Firehose is one example
○ Kafka is a current popular choice
○ Really popular with the Scala / Spark crowd
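The queueing, fanout and routing styles above can be sketched with a toy in-memory broker. This is an illustration only, not how any real broker is implemented: `ToyBroker`, `topic_matches` and the queue and topic names are hypothetical, and the wildcard rule (`*` stands for exactly one dot-separated word) is borrowed from AMQP-style topic exchanges such as RabbitMQ's.

```python
from collections import defaultdict, deque


def topic_matches(pattern: str, routing_key: str) -> bool:
    """Return True if routing_key matches pattern; '*' stands for exactly one word."""
    pattern_words = pattern.split(".")
    key_words = routing_key.split(".")
    if len(pattern_words) != len(key_words):
        return False
    return all(p == "*" or p == k for p, k in zip(pattern_words, key_words))


class ToyBroker:
    """Hypothetical in-memory broker, for illustration only."""

    def __init__(self):
        self.queues = defaultdict(deque)  # 1:1 queues, keyed by queue name
        self.subscriptions = []           # (pattern, inbox) pairs

    # 1:1 queueing: each message is handed to exactly one puller.
    def enqueue(self, queue_name, message):
        self.queues[queue_name].append(message)

    def pull(self, queue_name):
        queue = self.queues[queue_name]
        return queue.popleft() if queue else None

    # Fanout and routing: every subscriber whose pattern matches gets a copy.
    def subscribe(self, pattern):
        inbox = []
        self.subscriptions.append((pattern, inbox))
        return inbox

    def publish(self, routing_key, message):
        for pattern, inbox in self.subscriptions:
            if topic_matches(pattern, routing_key):
                inbox.append(message)


broker = ToyBroker()
charcuterie = broker.subscribe("food.charcuterie.*")
cheese = broker.subscribe("food.cheese.*")
broker.publish("food.charcuterie.bresaola", "new bresaola in stock")
print(charcuterie)  # ['new bresaola in stock']
print(cheese)       # []
```

Plain fanout is the special case where every subscriber uses a pattern that matches every key.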
OK, so my Apps and
Services need to talk
Can’t I just stick it all in a shared database
and be done with it?
No
You certainly cannot
Some things, maybe
But not everything, it just doesn’t scale that
way
Why not just stick it all in a DB?
● You can share some of your data this way
○ Depends on the use case, type of information
○ This is outside the scope of this talk
● Databases are not designed for delivering
messages
○ Any “queue” tables will be ridiculously contended
○ No atomic “pull” options
So, what does Tapjoy do?
You guys must have solved this, right?
At Tapjoy, we use...
● RabbitMQ
○ Moves analytics events to reporting endpoints by way of a complex filesystem / S3 approach
○ Single node with sharded queues
○ Rabbit HA cannot handle our scale
● SNS / SQS
○ SNS in some newer projects, mostly for fanout
○ SQS for all traditional background jobs
● Kinesis
○ Pilot integration for a new analytics pipeline
○ Being supplanted with Kafka
● Kafka
○ New analytics pipeline
○ Used to distribute metrics to both the new endpoint and the existing one for
verification
● Dynamiq
○ In-house, open-source SNS/SQS-alike built on top of Riak 2.0
○ Currently used to circumvent a complicated and slow legacy messaging service
But I’m not really here to
talk about Tapjoy
Not entirely
I’m more interested in you
So, what do I pick?
There are so many choices, and they all
seem like they’d work
I’m not really here to tell
you what to pick, either!
I’d rather talk to you about how to pick, and
how you integrate your choices
Distributed Systems are all about tradeoffs
Ask: What are my actual needs?
● Planning for 2 years down the road is smart
○ But solutions right now get shit done
○ Include a cost projection with scale estimates
● Build a prototype (or two)
○ Try to iterate quickly
○ Understand how you’d use whatever you choose
○ Don’t be afraid to move on
○ Look at multiple client libraries
■ Look for: good docs, active repos, idiomatic code
Ask: What is my latency tolerance?
● Publishing Messages
○ How much time can your app tolerate for publishing?
○ What does publish latency look like during an issue?
○ Consider the worst-case scenario when planning
● Consuming Messages
○ Can you run multiple consumers without impacting
the service?
● End to End
○ How fast is the whole experience, round trip?
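One way to put numbers on the publishing questions above is to time real publish calls in a prototype and look at the tail, not the average. The sketch below is a minimal version of that idea: `fake_publish` is a hypothetical stand-in for whatever client call you are evaluating, and the nearest-rank percentile is just one reasonable definition.

```python
import math
import time


def percentile(samples, pct):
    """Nearest-rank percentile: smallest sample with at least pct% of samples at or below it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def timed_publish(publish, message):
    """Call publish(message) and return how long it took, in milliseconds."""
    start = time.perf_counter()
    publish(message)
    return (time.perf_counter() - start) * 1000


def fake_publish(message):
    # Hypothetical stand-in for a real client call.
    time.sleep(0.001)


latencies_ms = [timed_publish(fake_publish, f"event-{i}") for i in range(50)]
print(f"p50={percentile(latencies_ms, 50):.2f}ms "
      f"p99={percentile(latencies_ms, 99):.2f}ms "
      f"max={max(latencies_ms):.2f}ms")
```

Running the same measurement while the broker is degraded (a node down, a slow network) gives a rough picture of the worst case to plan around.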
Ask: What level of durability?
● Client
○ Batched VS Unbatched / Streaming
○ Acknowledged writes
● Server
○ Messages held in memory VS disk
○ Messages highly-available?
○ Recover from network partitions safely?
○ At-Most-Once VS At-Least-Once
■ Exactly-Once is something of a myth
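If you settle for at-least-once delivery, the consumer has to cope with duplicates. A common approach is to make processing idempotent by remembering which message IDs have already been handled. The sketch below assumes every message carries a unique ID; `IdempotentConsumer` is a hypothetical name, and the in-memory set stands in for whatever durable store a real system would use.

```python
class IdempotentConsumer:
    """Processes each message ID at most once, even if it is delivered repeatedly."""

    def __init__(self, handler):
        self.handler = handler
        self.seen_ids = set()  # assumption: a real system would use a durable, bounded store

    def receive(self, message_id, body):
        """Return True if the message was processed, False if it was a duplicate."""
        if message_id in self.seen_ids:
            return False
        self.handler(body)
        # Record the ID only after the handler succeeds, so a crash mid-handler
        # leads to a retry (at-least-once) rather than a silently dropped message.
        self.seen_ids.add(message_id)
        return True


processed = []
consumer = IdempotentConsumer(processed.append)

# Simulated at-least-once delivery: message "m1" arrives twice.
deliveries = [("m1", "credit user 42"), ("m2", "send receipt"), ("m1", "credit user 42")]
results = [consumer.receive(message_id, body) for message_id, body in deliveries]

print(results)    # [True, True, False]
print(processed)  # ['credit user 42', 'send receipt']
```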
Ask: What about throughput?
● How many producing clients do you have?
● How many messages per second will they submit?
○ Does message size impact performance?
● What size should the cluster be?
○ Super cluster VS specialized clusters
● How many consumers does it take to keep pace?
○ With room to grow
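A back-of-the-envelope calculation is often enough to answer the consumer question. The sketch below uses the ~170 million jobs per day figure from earlier in the talk; the 50 ms of processing time per job and the 50% headroom are assumptions picked purely for illustration.

```python
import math


def consumers_needed(messages_per_second, seconds_per_message, headroom=0.5):
    """Estimate consumers required to keep pace, with spare capacity.

    headroom=0.5 means provisioning for 50% more traffic than the current rate.
    """
    target_rate = messages_per_second * (1 + headroom)  # rate to provision for
    # Each consumer is busy for seconds_per_message per message, so the number
    # of consumers is the total busy time demanded per second, rounded up.
    return math.ceil(target_rate * seconds_per_message)


# Figure from the talk: ~170 million jobs per day.
jobs_per_second = 170_000_000 / 86_400
print(round(jobs_per_second))  # 1968

# Assumption for illustration: each job takes 50 ms of consumer time.
print(consumers_needed(jobs_per_second, 0.050))  # 148
```

Real traffic is bursty, so sizing against the peak rate instead of the daily average will usually give a more honest number.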
Ask: What does failure mean?
● What does a message publishing error
mean?
● What does a delay in the processing pipeline
mean?
● What does a “lost” or failed message mean?
● What does a total failure of the messaging
system mean?
Ask: What behavior do I want?
Is it…
● CA?
○ Not distributed, will be difficult to scale past 1 box
○ Traditional RDBMS systems are typically CA
● CP?
○ Good if you need strongly consistent data
○ Partitions can cause data unavailability
● AP?
○ Good if you need complete availability
○ Eventual consistency can often be “good enough”
Okay, so I lied a little bit
I’ll give you one recommendation
Do you have...
● Relatively small (< 256 KB) message sizes?
● Not so strict (~50ms) latency requirements?
● Throughput on the order of 100 million messages or less per
month?
● A tolerance or capability to handle the
occasional duplicate message?
● No concern around being locked into a
vendor-specific technology?
Go use SNS and SQS
immediately
Leave here now and just do it
It’s easy, it’s cheap (at that scale), and you
don’t need to maintain it
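The checklist behind this recommendation can be written down as a small function, which is handy when comparing several workloads. This is only one reading of the slide: the `Workload` fields are hypothetical, and treating "not so strict (~50ms)" as "the app can wait at least 50 ms on the messaging layer" is an interpretation, not something the talk spells out.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    max_message_kb: float
    latency_budget_ms: float   # how long the app can wait on the messaging layer
    messages_per_month: int
    tolerates_duplicates: bool
    accepts_vendor_lock_in: bool


def unmet_criteria(w: Workload) -> list:
    """Return the checklist items the workload fails; an empty list means 'go use SNS and SQS'."""
    failures = []
    if w.max_message_kb >= 256:
        failures.append("messages too large")
    if w.latency_budget_ms < 50:
        failures.append("latency budget too strict")
    if w.messages_per_month > 100_000_000:
        failures.append("throughput too high")
    if not w.tolerates_duplicates:
        failures.append("cannot handle duplicates")
    if not w.accepts_vendor_lock_in:
        failures.append("vendor lock-in is a concern")
    return failures


easy = Workload(64, 200, 20_000_000, True, True)
strict = Workload(64, 5, 20_000_000, True, False)
print(unmet_criteria(easy))    # []
print(unmet_criteria(strict))  # ['latency budget too strict', 'vendor lock-in is a concern']
```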
Ok, so I picked
“something”
Anything else to know?
You don’t have to choose just 1
● It’s a falsehood that you need 1 perfect
technology
○ Each has strengths, weaknesses, and ideal use
cases
● Don’t be afraid to use something else
○ If you’re lucky, your app lives long enough to see
many different infrastructure needs
Avoid direct implementations
● Wrap the notion of Publishing in an interface
○ Most technologies look surprisingly similar to publish
○ You can wrap this in a simple interface, and switch
implementations as needed
● Consuming is usually unique per technology
○ Just write a new one
○ Trying to interface this part is probably more trouble
than it’s worth
○ Play to the unique strengths of the technology
Interfacing your Messaging choices
● Sending messages is often as simple as a name and a chunk of
data
○ Define a simple interface for pushing arbitrary data towards a
named endpoint
○ A name and a string of JSON is usually enough to get going
○ At Tapjoy, we use our Chore library to handle abstracting
message publishing from messaging technologies
● Destinations are independent from messages
○ You could need to switch sending messages to a new
technology
○ You could have 2 or more different systems depending on the
information in a given message
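A minimal sketch of the interface idea above, assuming a name plus a JSON string is all a publisher needs. None of this is Chore's actual API: `Publisher`, `InMemoryPublisher` and `MessageRouter` are hypothetical names, and the in-memory backend stands in for a real client such as SQS or RabbitMQ.

```python
import json
from typing import Protocol


class Publisher(Protocol):
    def publish(self, name: str, payload: str) -> None:
        """Send a JSON string towards a named endpoint."""
        ...


class InMemoryPublisher:
    """Hypothetical backend that keeps messages in a list; stands in for a real client."""

    def __init__(self):
        self.sent = []

    def publish(self, name: str, payload: str) -> None:
        self.sent.append((name, payload))


class MessageRouter:
    """Maps message names to one or more publishers, so destinations can change
    without touching the code that emits messages."""

    def __init__(self, routes: dict[str, list[Publisher]]):
        self.routes = routes

    def emit(self, name: str, data: dict) -> None:
        payload = json.dumps(data, sort_keys=True)
        for publisher in self.routes.get(name, []):
            publisher.publish(name, payload)


old_system = InMemoryPublisher()
new_system = InMemoryPublisher()
router = MessageRouter({
    "user.signed_up": [old_system, new_system],  # dual-publish during a migration
    "ad.clicked": [new_system],
})
router.emit("user.signed_up", {"user_id": 42})
router.emit("ad.clicked", {"ad_id": 7})
print(len(old_system.sent), len(new_system.sent))  # 1 2
```

Because the routing table is plain data, moving a message type to a new technology comes down to editing one entry; the calling code only ever sees `emit`.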
How do I change messages safely?
● Wrap messages in a simple envelope
○ Keep metadata about the message distinct from metadata
about the event it describes
● Define schemas for message bodies
○ Schemas give you a catalogue of message definitions, and the
ability to version them
○ At Tapjoy, we use our TOLL to build endpoint-agnostic clients
based on schemas, and register them to use Chore publishers.
● Consumers need older schemas
○ Lets them reason about how to handle older messages
○ Keep a backlog of N older versions, drop support for > N
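The envelope and versioning points above might look something like the following. The envelope field names, the `user.signed_up` schema and its two versions are all made up for illustration; the talk does not describe the actual formats used at Tapjoy.

```python
import json
import uuid
from datetime import datetime, timezone


def wrap(schema: str, version: int, body: dict) -> str:
    """Wrap an event body in an envelope carrying message-level metadata."""
    envelope = {
        "message_id": str(uuid.uuid4()),                         # about the message
        "published_at": datetime.now(timezone.utc).isoformat(),  # about the message
        "schema": schema,
        "version": version,
        "body": body,                                            # about the event
    }
    return json.dumps(envelope)


# Hypothetical "user.signed_up" schema: v1 had a single "name" field,
# v2 splits it into "first_name" and "last_name".
def handle_v1(body):
    first, _, last = body["name"].partition(" ")
    return {"first_name": first, "last_name": last}


def handle_v2(body):
    return {"first_name": body["first_name"], "last_name": body["last_name"]}


# The consumer keeps handlers for the current version plus N older ones (here N = 1).
HANDLERS = {("user.signed_up", 1): handle_v1, ("user.signed_up", 2): handle_v2}


def consume(raw: str) -> dict:
    envelope = json.loads(raw)
    key = (envelope["schema"], envelope["version"])
    if key not in HANDLERS:
        raise ValueError(f"unsupported schema version: {key}")
    return HANDLERS[key](envelope["body"])


old = wrap("user.signed_up", 1, {"name": "Sean Kelly"})
new = wrap("user.signed_up", 2, {"first_name": "Sean", "last_name": "Kelly"})
print(consume(old) == consume(new))  # True
```

Dropping support for a version is then a matter of deleting its entry from `HANDLERS`, and anything still publishing that version fails loudly instead of being misread.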
In Conclusion
Keep in mind
● Distributed Systems - all about tradeoffs
○ Never trade “P”
● Understand your needs
○ Latency, Throughput, Availability, Durability
● Understand how it fits into your architecture
● Interfaces are your friend
○ They can give you a lot of flexibility
Keep in mind
● Use schemas and versioning to support changes to
messages themselves
● Just pick something
○ Build a prototype, or two (or three)
○ Your second try will probably go better
○ SNS/SQS is a decent choice, if latency isn’t a
concern
● Tapjoy is a great place to work on these kinds of
problems at huge scale
Messaging
“Just pick something”
Sean Kelly
@StabbyCutyou
