Eric will be presenting on SimpleReach's use of message architectures and why they are an important part of a distributed system stack. They are often overlooked because the prevailing sentiment is that the storage and processing engines are the most important parts of the system. But without the highways, the data won’t be able to get to its destination.
2. Message Architectures in Distributed Systems Eric Lubow @elubow
Overview
• SimpleReach
• Why is messaging important
• Goals
• Explanations
• Questions
Personal Vanity
• CTO of SimpleReach
• Co-author of Practical Cassandra
• Skydiver, Mixed Martial Artist, Motorcyclist, Dog dad, NY Giants fan
• IronMatt Foundation for Pediatric Brain Tumors (ironmatt.org)
SimpleReach
• Millions of URLs per day
• Over 3.75 billion page views per month
• 7 billion events per day (~80k events/second)
• Auto-scale 175-190 machines depending on traffic
• Built a predictive measurement algorithm for the social web
Why is Messaging Important?
• Most discussions of large-scale systems only talk about storage
• Direct high volumes of data around your infrastructure
• Control flow of data through your infrastructure
• Decouple important systems
• Scalability, Elasticity, Deliverability, and Redundancy
• Buffering and Asynchronous communication
The database is NOT a transport layer
Goals
• Consistent interfaces between systems
• Allow access to many toolsets
• Minimize downtime/Minimize cost of downtime
• High availability
• Clients should have minimal architecture knowledge
• Horizontal Scaling
• Controlled Data Flow Patterns
• Enrichment/In-stream Modification Schemes
• Monitoring and Instrumentation
Messaging Systems
• RabbitMQ
• ZeroMQ
• Kafka
• Amazon SQS
• NSQ
• ActiveMQ
• Resque
• Custom
What Did SimpleReach Choose?
Message Architectures in Distributed Systems Eric Lubow @elubow #ddtx14
NSQ
• Distributed, decentralized topology
• Guaranteed at-least-once delivery
• Multicast-style message routing
• Simple to configure and deploy
• Allows maintenance windows with no downtime
• Ephemeral channels for testing
• Channel sampling
github.com/bitly/nsq
Topics and Channels
• … separate hosts
• a topic is a distinct stream of messages (a single nsqd instance can have multiple topics)
• a channel is an independent queue for a topic (a topic can have multiple channels)
• consumers discover producers by querying nsqlookupd (a discovery service for topics)
• topics and channels are created at runtime (just start publishing/subscribing)
[Diagram: an nsqd instance with topics and channels feeding consumers; labels include “event”, “metrics”, “enrichment”, and “writer”]
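To make the topic/channel semantics concrete, here is a toy in-memory model in Go: every channel on a topic receives its own copy of each message (multicast), while consumers on the same channel share that channel's messages (load balancing). This is an illustration only; the `Topic` and `Channel` types and the round-robin distribution are assumptions, not how nsqd is implemented (nsqd hands each message to a ready consumer on the channel). The channel names reuse the diagram's labels; the consumer names are hypothetical.

```go
package main

import "fmt"

// Topic is a toy model of an NSQ topic: a named stream that copies every
// published message to each of its channels.
type Topic struct {
	channels map[string]*Channel
}

// Channel models an NSQ channel: an independent queue whose messages are
// split across its consumers (round-robin here, for simplicity).
type Channel struct {
	consumers []string
	next      int
	delivered map[string][]string // consumer name -> messages received
}

func NewTopic() *Topic { return &Topic{channels: make(map[string]*Channel)} }

// Subscribe creates the channel on first use, mirroring NSQ's
// "created at runtime" behavior, and adds a consumer to it.
func (t *Topic) Subscribe(channel, consumer string) {
	ch, ok := t.channels[channel]
	if !ok {
		ch = &Channel{delivered: make(map[string][]string)}
		t.channels[channel] = ch
	}
	ch.consumers = append(ch.consumers, consumer)
}

// Publish copies msg to every channel; within a channel exactly one
// consumer receives it.
func (t *Topic) Publish(msg string) {
	for _, ch := range t.channels {
		if len(ch.consumers) == 0 {
			// Simplification: this toy drops the message, whereas a real
			// channel would queue it until a consumer connects.
			continue
		}
		c := ch.consumers[ch.next%len(ch.consumers)]
		ch.next++
		ch.delivered[c] = append(ch.delivered[c], msg)
	}
}

func main() {
	events := NewTopic()
	events.Subscribe("enrichment", "A1")
	events.Subscribe("enrichment", "A2")
	events.Subscribe("writer", "B1")
	for _, m := range []string{"m1", "m2", "m3", "m4"} {
		events.Publish(m)
	}
	fmt.Println(events.channels["enrichment"].delivered) // A1 and A2 split the stream
	fmt.Println(events.channels["writer"].delivered)     // B1 sees every message
}
```

Adding a second consumer to a channel scales that stage horizontally, while adding a new channel gives a new downstream system its own full copy of the stream without touching existing consumers.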
Everyone Speaks The Same Language
http:// + {“content-type”: “application/json”}
Goals
• Consistent interfaces between systems
NSQ Tools
• nsqadmin provides a web interface to administer and introspect an NSQ cluster at runtime (and to empty, pause, or delete topics/channels)
• nsq_to_http - utility that helps transport an aggregated stream over HTTP
• nsq_to_file - utility that safely persists an aggregated stream to disk
• nsq_stat - iostat-like utility for a topic/channel
• nsq_tail - tail-like utility for a topic/channel
Goals
• Consistent interfaces between systems
• Allow access to many toolsets
How Does It Work?
[Diagram: three nsqd nodes (each with an NSQ API), two nsqlookupd instances, and two consumers, connected by steps labeled PUBLISH, REGISTER, DISCOVER, and SUBSCRIBE]
Goals
• Consistent interfaces between systems
• Allow access to many toolsets
• Minimize downtime/Minimize cost of downtime
• High availability
Simple Deployment & Automation
• Chef cookbook - github.com/simplereach/chef-nsq
• Written in Go
• Easily distributable binaries
• Deploy lookup (nsqlookupd) nodes
• nsqd instances installed locally
Goals
• Consistent interfaces between systems
• Allow access to many toolsets
• Minimize downtime/Minimize cost of downtime
• High availability
• Clients should have minimal architecture knowledge
Runtime Discovery
[Diagram: a consumer and two nsqlookupd instances, communicating via HTTP requests]
➊ The consumer regularly polls nsqlookupd for topic producers
➋ It connects to all producers
Goals
• Consistent interfaces between systems
• Allow access to many toolsets
• Minimize downtime/Minimize cost of downtime
• High availability
• Clients should have minimal architecture knowledge
• Horizontal Scaling
Path of a Packet
[Diagram: Internet, EC, SC, Fire Hose, API, InternalAPI, Queue, and Consumers, plus the data stores Solr, C* (Cassandra), Mongo, Redis, and Vertica]
Controlled Data Flow
[Diagram: Social Event, Collector, NSQ, and an NSQ Broadcast feeding Batch & Write and Calculate Score/Write steps for Social Data, Processed Data, and Raw Data]
Broadcast Importance for Polyglottany
[Diagram: an Aggregator and a Calculator publish to NSQ, which broadcasts to a Mongo Writer, Redis Writer, Cassandra Writer, Solr Writer, and Vertica Writer]
Goals
• Consistent interfaces between systems
• Allow access to many toolsets
• Minimize downtime/Minimize cost of downtime
• High availability
• Clients should have minimal architecture knowledge
• Horizontal Scaling
• Controlled Data Flow
What Is Enrichment?
A mechanism to add value to a message to enhance processing in your system
How Do We Enrich?
[Diagram: a Raw Event becomes an Enriched Event, which NSQ broadcasts to Consumer A, Consumer B, and Consumer C]
Goals
• Consistent interfaces between systems
• Allow access to many toolsets
• Minimize downtime/Minimize cost of downtime
• High availability
• Clients should have minimal architecture knowledge
• Horizontal Scaling
• Controlled Data Flow
• Enrichment
Monitoring / Instrumentation
• Built-in statsd support
• Statsd metrics flow to Graphite, which nsqadmin can read to draw its graphs
• nsqadmin comes with graphs for message processing stats
• Nagios plugins are available for monitoring topic/channel depth
• Average end-to-end latency calculations are done on a per-channel basis
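Depth alerting of the kind those Nagios plugins perform comes down to comparing a topic or channel's depth against warning and critical thresholds and exiting with the Nagios status codes (0 OK, 1 WARNING, 2 CRITICAL). A minimal Go sketch, assuming the depth has already been read (for example, from nsqd's `/stats` HTTP endpoint, not shown); `CheckDepth` and the thresholds are hypothetical and not taken from any specific plugin.

```go
package main

import (
	"fmt"
	"os"
)

// Nagios plugin exit codes.
const (
	OK       = 0
	Warning  = 1
	Critical = 2
)

// CheckDepth classifies a topic/channel backlog against thresholds and
// returns a Nagios exit code plus a one-line status message.
func CheckDepth(name string, depth, warn, crit int64) (int, string) {
	switch {
	case depth >= crit:
		return Critical, fmt.Sprintf("CRITICAL - %s depth %d >= %d", name, depth, crit)
	case depth >= warn:
		return Warning, fmt.Sprintf("WARNING - %s depth %d >= %d", name, depth, warn)
	default:
		return OK, fmt.Sprintf("OK - %s depth %d", name, depth)
	}
}

func main() {
	// The depth would normally come from nsqd's stats; it is hard-coded here.
	code, msg := CheckDepth("event/writer", 1200, 10000, 50000)
	fmt.Println(msg)
	os.Exit(code)
}
```

A growing depth is usually the first visible sign that a consumer has stalled or fallen behind, which is why it is the natural metric to alert on.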
Goals
• Consistent interfaces between systems
• Allow access to many toolsets
• Minimize downtime/Minimize cost of downtime
• High availability
• Clients should have minimal architecture knowledge
• Horizontal Scaling
• Controlled Data Flow
• Enrichment
• Monitoring and Instrumentation
Summary
• Large systems are more than just storage
• Abstraction
• Highly Available
• Controlled Data Flow Patterns
• Monitoring & Automation
Questions are guaranteed in life.
Answers aren’t.
Eric Lubow
@elubow
elubow@simplereach.com
Cassandra Day, New York
Thank you.