1. About us:
Name: Tarjei Romtveit
Born: 1983-02-23
Current Title: Data Management Director and Software
Developer at Integrasco A/S
Title period: June 2006 – Present
Education: M.Sc., ICT, System Development, University of Agder
(2010)
Specialties:
Java, Lean Production, RabbitMQ, Spring Framework, Web
Services, Hibernate, Maven, Python, MySQL, Scala, XQuery, XPath,
Linux (Rights, Scripting, Security, Apache and MySQL)
2. About us:
Name: Enok K. Eskeland
Born: 1986-06-18
Current Title: Software Developer at Integrasco A/S
Title period: June 2011 – Present
Education: B.Sc., ICT, System Development, University of Agder
(2011)
Specialties:
Java, Maven, Python, MySQL, XQuery, XPath
3. Scaling together with social media: RabbitMQ
A scalability story
Tarjei Romtveit & Enok Eskeland
13. What was wrong…
[Diagram: a Storage Cloud on top of six parallel pipelines, one per source: CRM, Forum, Blogs, YouTube, Twitter, Facebook. Each pipeline has its own Storage Agent, Storage Service and Buffer / Stage.]
14. • Start patching the old solution
• Build from scratch
• Start looking for external solutions
15. …. so what should we look for?
• Each pipeline – a queue
• SM – producer
• Storage Agent – consumer
• tweet/post/discussion – a message
[Diagram: Facebook, ?, Storage Agent]
23. Additional features
Language support:
• Java Spring client
• Lots of clients: C#, Erlang, Java, PHP, Python, Ruby, Perl, C++, Lisp, Haskell
Supported platforms:
Solaris, BSD, Linux, MacOSX, TRU64, Windows NT/2000/XP/Vista/Windows 7, Windows Server 2003/2008, Windows 95, 98, VxWorks
24. … we made our client from scratch
• Configuration
• Failover
• Publisher Confirms
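The failover item could look roughly like the sketch below. It is not the client from the talk: `connect_with_failover`, `AllBrokersDown`, `fake_connect` and the host names are hypothetical, and the real connection call is passed in as a plain function so the idea can be shown without a broker.

```python
import time


class AllBrokersDown(Exception):
    """Raised when no broker in the list accepted a connection."""


def connect_with_failover(hosts, connect, retries_per_host=1, delay=0.0):
    """Try each host in order; return (host, connection) for the first success.

    `connect` is any callable that takes a host string and returns a
    connection object, raising ConnectionError when the host is unreachable.
    """
    last_error = None
    for host in hosts:
        for _ in range(retries_per_host):
            try:
                return host, connect(host)
            except ConnectionError as exc:
                last_error = exc
                time.sleep(delay)  # back off before the next attempt
    raise AllBrokersDown(f"could not reach any of {hosts}") from last_error


def fake_connect(host):
    # Stand-in for a real client library call (assumption for this sketch):
    # we pretend that rabbit1 is down and every other node is up.
    if host == "rabbit1":
        raise ConnectionError("rabbit1 unreachable")
    return f"connection-to-{host}"


host, connection = connect_with_failover(["rabbit1", "rabbit2"], fake_connect)
print(host, connection)
```

A real client also has to re-declare queues and resume consumers after reconnecting, which is where most of the difficulty sits.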
31. DEMO 2 : How we first started out
• git://github.com/esk/rabbitmq-example-clients.git
32. Extra features: Clustering
• Easy to setup
– rabbitmqctl cluster rabbit@rabbit1 rabbit@rabbit2
• DISC node OR RAM node
• Replicates the queues and messages
• NB! No sync protocol
• Enables mirrored queues
33. DEMO 3: Clustering and mirrored queues
• http://www.rabbitmq.com/clustering.html
34. Extra features: Publisher confirms
• Solution for guaranteed producer – broker delivery
• Not part of AMQP (a RabbitMQ extension)
• Asynchronous – faster than transactional
• Not supported in Spring client
• Requires extra handling in the client
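The "extra handling in the client" is mostly bookkeeping: remember every message until the broker confirms it. A minimal sketch, assuming the broker numbers published messages from 1 on a channel in confirm mode and may acknowledge several at once with a `multiple` flag. `ConfirmTracker` and its method names are hypothetical and no network code is included.

```python
class ConfirmTracker:
    """Keeps published messages until the broker acks or nacks them."""

    def __init__(self):
        self._next_seq = 1   # sequence number of the next published message
        self._pending = {}   # sequence number -> message awaiting confirmation

    def published(self, message):
        """Record a message right after publishing it; return its sequence number."""
        seq = self._next_seq
        self._pending[seq] = message
        self._next_seq += 1
        return seq

    def _take(self, delivery_tag, multiple):
        # With multiple=True the broker confirms everything up to delivery_tag.
        if multiple:
            seqs = [s for s in self._pending if s <= delivery_tag]
        else:
            seqs = [delivery_tag] if delivery_tag in self._pending else []
        return [self._pending.pop(s) for s in sorted(seqs)]

    def on_ack(self, delivery_tag, multiple=False):
        """Broker took responsibility: the returned messages can be forgotten."""
        return self._take(delivery_tag, multiple)

    def on_nack(self, delivery_tag, multiple=False):
        """Broker refused: the returned messages should be published again."""
        return self._take(delivery_tag, multiple)

    def unconfirmed(self):
        """Messages to republish if the connection drops right now."""
        return [self._pending[s] for s in sorted(self._pending)]


tracker = ConfirmTracker()
for body in ["tweet-a", "tweet-b", "tweet-c"]:
    tracker.published(body)
confirmed = tracker.on_ack(2, multiple=True)
print(confirmed, tracker.unconfirmed())
```

Because confirms arrive asynchronously, the publisher never blocks per message, which is why this is faster than transactions.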
36. Additional Components:
• SMS and e-mail alert process
– Management REST API
– Surveillance of incoming/outgoing
• Central distribution of configuration
– KISS: HTTP
– Considering using ZooKeeper
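The alert process can be sketched as a pure function over the queue list returned by the management REST API. This is an illustration, not the process from the talk: it assumes the JSON has already been fetched and parsed into dictionaries with `name`, `messages` and `consumers` fields, and the function name, threshold and queue names are hypothetical.

```python
def queues_needing_alert(queues, max_messages=10_000):
    """Return (queue name, reason) pairs for queues that look unhealthy.

    `queues` is a list of dicts as parsed from the management API's queue
    listing; `max_messages` is a made-up backlog threshold.
    """
    alerts = []
    for queue in queues:
        name = queue.get("name", "<unnamed>")
        backlog = queue.get("messages", 0)
        consumers = queue.get("consumers", 0)
        if consumers == 0 and backlog > 0:
            # Incoming traffic but nothing going out.
            alerts.append((name, "no consumers"))
        elif backlog > max_messages:
            # Consumers exist but cannot keep up.
            alerts.append((name, "backlog too high"))
    return alerts


sample = [
    {"name": "twitter", "messages": 25_000, "consumers": 4},
    {"name": "forum", "messages": 12, "consumers": 0},
    {"name": "blogs", "messages": 3, "consumers": 2},
]
alerts = queues_needing_alert(sample)
print(alerts)
```

The result would then be handed to whatever sends the SMS or e-mail.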
37. Main experience
• Do not trust persistence/durability entirely
• There is no sync protocol in clusters
• Minimize the broker interaction in client
• Failover and connection pooling are hard
• Use the mailing list
38. So what did we accomplish?
• Stabilized and scaled the staging component
• Enabled us to focus on core processes
• 50 % less maintenance
Editor's notes
- Built as a story
- Please comment and ask during the presentation
- Integrasco: a social media analytics firm that specializes in mobile telecom
- Store and index data in a social media search system
- 22.6796185 kg

Humankind has a tendency to jump on the most popular wagon:
- Restaurants
- Queues
- Stocks
- Sheep mentality

If you recall this autumn:
- Steve Jobs: 5 October
- iPhone 4S: October 4, 2011 / October 14, 2011
- Gaddafi: 20 October 2011

It never rains but it pours:
- BlackBerry outage: Monday 10th October - Monday 17th October. RIM acknowledges ongoing email and messaging problem for customers in Europe, Middle East and Africa
- 5-21… 30 million tweets = Normal 12 million
Legacy Java system built and rebuilt continuously over many years:
- MySQL
- Hibernate
- SOAP (CXF)

Supported by a cutting-edge HW architecture. HW from HP.

- Twitter is US heavy. Most active: 23:00 CET -> can vary from 100 – 25 000 m/m
- Related to fluctuations in traffic
- 30 – 100 million

- Simple and good timer utils
- Have a log
- Warn the management

- Dependency between components was too high
- Unpredictable memory/CPU consumption
- Clients struggled with SOAP
- Difficult to scale up jetty, tomcat etc
- Difficult to restart the components
To me it looks like a message queue solution.

Do not repeat yourself -> Do not repeat what others do better:
- Kafka – LinkedIn
- Kestrel – Twitter, Facebook (Scala)
- Digg – RabbitMQ
- Pinterest – RabbitMQ

- Erlang
- ~ 15 000 – 20 000 lines of code

- WebSocket is under development
- JavaScript for running on node.js
- Not mainframes
- Why we selected the
AMQP 0-9-1 is a programmable protocol in the sense that AMQP entities and routing schemes are defined by applications themselves, not a broker administrator.
‘A low-level interface. It typically refers to programming interfaces (APIs) in a network directly above the physical layer that are used strictly for transport or interconnection. It often refers to protocols that invoke functions such as CORBA, DCOM, RMI and SOAP. It may also refer to database and other such interfaces.’
Exchanges:
- Direct exchange: (empty string) and amq.direct (DEFAULT)
- Fanout exchange: amq.fanout
- Topic exchange: amq.topic
- Headers exchange: amq.match (and amq.headers in RabbitMQ)
Attributes:
- Durability (exchanges survive broker restart)
- Auto-delete (exchange is deleted when all queues have finished using it)
- Additional arguments that are broker dependent
Publish/Subscribe:
- Massively multi-player online (MMO) games can use it for leaderboard updates or other global events
- Sport news sites can use fanout exchanges for distributing score updates to mobile clients in near real-time
- Distributed systems can broadcast various state and configuration updates
- Group chats can distribute messages between participants using a fanout exchange (although AMQP does not have a built-in concept of presence, so XMPP may be a better choice)
Topic:
- Background task processing done by multiple workers, each capable of handling a specific set of tasks
- Stock price updates (and updates on other kinds of financial data)
- News updates that involve categorization or tagging (for example, only for a particular sport or team)
- Orchestration of services of different kinds in the cloud
- Distributed architecture/OS-specific software builds or packaging where each builder can handle only one architecture or OS
- Distributing data relevant to a specific geographic location, for example, points of sale
Publish/Subscribe - All receive
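How the exchange types pick queues can be sketched without a broker. The code below is a simplified model, not RabbitMQ's implementation: it assumes the usual topic rules (`*` matches exactly one dot-separated word, `#` matches zero or more words), and the queue names and binding keys are made up.

```python
def topic_matches(pattern, routing_key):
    """True if a topic binding pattern matches a routing key."""
    p = pattern.split(".")
    k = routing_key.split(".")

    def match(i, j):
        if i == len(p):
            return j == len(k)
        if p[i] == "#":
            # '#' may absorb zero or more of the remaining words.
            return any(match(i + 1, jj) for jj in range(j, len(k) + 1))
        if j == len(k):
            return False
        return (p[i] == "*" or p[i] == k[j]) and match(i + 1, j + 1)

    return match(0, 0)


def route(exchange_type, bindings, routing_key):
    """Return the sorted queue names that would receive the message.

    `bindings` is a list of (queue name, binding key) pairs.
    """
    if exchange_type == "fanout":
        hit = [queue for queue, _ in bindings]  # all receive
    elif exchange_type == "direct":
        hit = [queue for queue, key in bindings if key == routing_key]
    elif exchange_type == "topic":
        hit = [queue for queue, key in bindings if topic_matches(key, routing_key)]
    else:
        raise ValueError(f"unsupported exchange type: {exchange_type}")
    return sorted(set(hit))


# Hypothetical bindings for illustration only.
bindings = [
    ("twitter_store", "social.twitter.*"),
    ("all_posts", "social.*.post"),
    ("audit", "#"),
]
print(route("topic", bindings, "social.twitter.post"))
print(route("topic", bindings, "social.forum.post"))
print(route("fanout", bindings, "ignored"))
```

With a fanout exchange the routing key is ignored, which is the "all receive" case from the notes.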
Messages should not be duplicated.
Producers:
- Diverse
- Crawlers
- Input streams
- Content agnostic
Consumers:
- Parallel processes
- Parse and organize the content
- Same codebase

- Redelivery of failed tasks
- Message durability
- Queue durability
- Fair dispatch: PrefetchCount = X
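The effect of fair dispatch with a prefetch count can be shown with a small in-memory model. This is a simplification under stated assumptions, not broker code: `PrefetchDispatcher` is a hypothetical class, consumers are plain names, and a consumer receives a new message only while it holds fewer than `prefetch_count` unacknowledged ones.

```python
from collections import deque


class PrefetchDispatcher:
    """Toy queue that never gives a consumer more than prefetch_count unacked messages."""

    def __init__(self, consumers, prefetch_count):
        self.prefetch_count = prefetch_count
        self.queue = deque()                        # messages waiting in the queue
        self.unacked = {c: [] for c in consumers}   # consumer -> messages in flight

    def publish(self, message):
        self.queue.append(message)
        self._dispatch()

    def ack(self, consumer, message):
        # An ack frees a slot, so the consumer can be handed more work.
        self.unacked[consumer].remove(message)
        self._dispatch()

    def _dispatch(self):
        progress = True
        while self.queue and progress:
            progress = False
            for inflight in self.unacked.values():
                if self.queue and len(inflight) < self.prefetch_count:
                    inflight.append(self.queue.popleft())
                    progress = True


dispatcher = PrefetchDispatcher(["fast", "slow"], prefetch_count=1)
for body in ["m1", "m2", "m3", "m4"]:
    dispatcher.publish(body)

# The fast consumer finishes two messages while the slow one is still busy with m2.
dispatcher.ack("fast", "m1")
dispatcher.ack("fast", "m3")
print(dispatcher.unacked, list(dispatcher.queue))
```

Without the limit the messages would simply be split evenly, and m3 or m4 could wait behind the slow consumer.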