Designing for Scale
    Knut Nesheim @knutin
   Paolo Negri @hungryblank
About this talk

2 developers and Erlang
           vs.
  1 million daily users
Social Games
Flash client (game)   HTTP API
Social Games
Flash client


               • Game actions need to be
                 persisted and validated

               • 1 API call every 2 secs
Social Games
                            HTTP API



• @ 1 000 000 daily users
• 5000 HTTP reqs/sec
• more than 90% writes
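
The load figures on these slides can be cross-checked with a little arithmetic. The sketch below is only a back-of-the-envelope estimate: the variable names are illustrative, and the derived numbers (concurrent players, minimum write rate) are inferred here rather than quoted from the talk.

```python
# Back-of-the-envelope check of the load figures on the slides.
REQS_PER_SEC = 5000        # HTTP requests per second (from the slide)
SECS_BETWEEN_CALLS = 2     # one API call every 2 seconds per active client
WRITE_PERCENT = 90         # "more than 90% writes", used as a lower bound

# Each active client contributes 1/2 request per second, so the number of
# clients playing at the same moment is the request rate times the interval.
concurrent_players = REQS_PER_SEC * SECS_BETWEEN_CALLS

# Lower bound on the write rate the persistence layer has to absorb.
min_writes_per_sec = REQS_PER_SEC * WRITE_PERCENT // 100

print(concurrent_players)   # 10000
print(min_writes_per_sec)   # 4500
```

In other words, 5000 reqs/sec corresponds to roughly 10 000 clients playing simultaneously, with at least 4500 writes/sec among those requests.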
The hard nut


http://www.flickr.com/photos/mukluk/315409445/
Users we expect
[Chart: “Monster World” daily active users (DAU), July - December 2010; y-axis from 0 to 1,000,000]
Users we have
[Chart: new game daily active users (DAU), March - June 2011; y-axis from 0 to 50]
What to do?


1 Simulate users
Simulating users

• Must not be too synthetic (like
  apachebench)
• Must look like a meaningful game session
• Users must come online at a given rate and
  play
Tsung


         •    Multi protocol (HTTP, XMPP) benchmarking tool

         •    Able to test non trivial call sequences

         •    Can actually simulate a scripted gaming session




http://tsung.erlang-projects.org/
Tsung - configuration
       Fixed content: URL path and request body. Dynamic parameters: the %%...%% substitutions.
       <request subst="true">
       <http url="http://server.wooga.com/users/%%ts_user_server:get_unique_id%%/resources/column/5/row/14?%%_routing_key%%"
       method="POST" contents='{"parameter1":"value1"}'>
       </http>
       </request>



Tsung - configuration
         • Not something you fancy writing
         • We’re in development, calls change and we
              constantly add new calls
         • A session might contain hundreds of
              requests
         • All the calls must refer to a consistent game
              state

Tsung - configuration
         • From our ruby test code
         user.resources(:column => 5, :row => 14)

         • Same as
          <request subst="true">
          <http url="http://server.wooga.com/users/%%ts_user_server:get_unique_id%%/resources/column/5/row/14?%%_routing_key%%"
          method="POST" contents='{"parameter1":"value1"}'>
          </http>
          </request>
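
One way to avoid writing this XML by hand is to generate it from a short description of each call. The slides derive it from Ruby test code; the sketch below shows the same idea in Python as an illustration only. The helper names `tsung_request` and `resources_call` are hypothetical, and the URL and body are the ones shown on the slide.

```python
import xml.etree.ElementTree as ET


def tsung_request(path, body, base_url="http://server.wooga.com"):
    """Build one Tsung <request> element for a POST call.

    `path` may contain Tsung %%...%% substitutions; subst="true" asks
    Tsung to expand them when the request is sent.
    """
    request = ET.Element("request", {"subst": "true"})
    ET.SubElement(request, "http", {
        "url": base_url + path,
        "method": "POST",
        "contents": body,
    })
    return request


def resources_call(column, row):
    """Hypothetical helper mirroring user.resources(:column => 5, :row => 14)."""
    path = ("/users/%%ts_user_server:get_unique_id%%"
            f"/resources/column/{column}/row/{row}?%%_routing_key%%")
    return tsung_request(path, '{"parameter1":"value1"}')


# Serialize the element; ElementTree escapes the quotes in the JSON body.
xml_text = ET.tostring(resources_call(column=5, row=14), encoding="unicode")
print(xml_text)
```

With one such helper per API call, a session of hundreds of requests becomes a list of function calls, and a change to a call only has to be made in one place.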
Tsung - configuration

         • Session: a group of requests
         • Arrival phase: sessions arrive in phases, each with a duration
           and a specific arrival rate
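
To get a feel for what a set of arrival phases produces, the numbers can be worked out before running anything. This is a rough sketch, not Tsung code: `ArrivalPhase` and both functions are hypothetical names, the phase values are made up, and the concurrency estimate uses Little's law (sessions in flight = arrival rate x session length), which assumes a steady arrival rate.

```python
from dataclasses import dataclass


@dataclass
class ArrivalPhase:
    duration_s: int       # how long the phase lasts, in seconds
    arrival_rate: float   # new sessions started per second


def sessions_started(phases):
    """Total number of sessions launched over all phases."""
    return sum(p.duration_s * p.arrival_rate for p in phases)


def steady_state_concurrency(arrival_rate, session_length_s):
    """Little's law: sessions in flight = arrival rate x session length."""
    return arrival_rate * session_length_s


# Made-up ramp: warm up gently, then hold a higher rate.
phases = [ArrivalPhase(duration_s=60, arrival_rate=10),
          ArrivalPhase(duration_s=300, arrival_rate=50)]

print(sessions_started(phases))            # 15600
print(steady_state_concurrency(50, 120))   # 6000
```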


Tsung - setup
[Diagram: the benchmarking cluster (a tsung master driving tsung workers over ssh) sends HTTP reqs to the application's app servers]

Tsung

         • Generates ~ 2500 reqs/sec on AWS
              m1.large
         • Flexible but hard to extend
         • Code base rather obscure

What to do?


2 Collect metrics
Tsung-metrics

         • Tsung collects measurements and provides
              reports
         • But these measurements include Tsung's own
              network/CPU congestion
         • Tsung machines aren’t a good point of view

HAproxy
[Diagram: haproxy sits between the benchmarking cluster (tsung master and workers) and the app servers; all HTTP reqs pass through it]
HAproxy

  “The Reliable, High Performance TCP/HTTP Load Balancer”
• Placed in front of http servers
• Load balancing
• Fail over
HAproxy - syslog


• Easy to set up
• Efficient (UDP)
• Provides 5 timings for each request
HAproxy
• Time to receive request from client
• Time spent in HAproxy queue
• Time to connect to the server
• Time to receive response headers from server
• Total session duration time
HAproxy - syslog

• Application URLs directly identify the server call
• Application URLs are easy to parse
• Processing the HAproxy syslog gives per-call
  metrics
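
As an illustration of how a log line can be turned into a per-call metric, here is a minimal parsing sketch. It assumes HAproxy's HTTP log format, where the five timers appear as one slash-separated field (Tq/Tw/Tc/Tr/Tt) followed by the status code and, later, the quoted request line. The sample line, the regular expression and the `:n` placeholder convention are all made up for this example; the talk's actual parser is not shown in the slides.

```python
import re

# Frontend, backend/server, the five timers, the status, then the request.
LOG_RE = re.compile(
    r'\] (?P<frontend>\S+) (?P<backend>[^/\s]+)/(?P<server>\S+) '
    r'(?P<tq>-?\d+)/(?P<tw>-?\d+)/(?P<tc>-?\d+)/(?P<tr>-?\d+)/\+?(?P<tt>-?\d+) '
    r'(?P<status>-?\d+) .*"(?P<method>[A-Z]+) (?P<url>\S+)'
)


def call_type(url):
    """Drop the query string and replace numeric path segments with :n."""
    path = url.split("?", 1)[0]
    return re.sub(r"/\d+", "/:n", path)


def parse_line(line):
    """Return the call type and timers (ms) for one log line, or None."""
    m = LOG_RE.search(line)
    if m is None:
        return None
    timers = {k: int(m.group(k)) for k in ("tq", "tw", "tc", "tr", "tt")}
    return {"call": m.group("method") + " " + call_type(m.group("url")),
            "server": m.group("server"),
            "status": int(m.group("status")),
            **timers}


# Made-up sample line in the assumed format.
SAMPLE = ('Mar  1 12:14:14 lb haproxy[14389]: 10.0.1.2:33317 '
          '[01/Mar/2011:12:14:14.655] http-in game/app1 10/0/30/69/109 '
          '200 2750 - - ---- 1/1/1/1/0 0/0 '
          '"POST /users/42/resources/column/5/row/14?key HTTP/1.1"')

record = parse_line(SAMPLE)
print(record["call"])   # POST /users/:n/resources/column/:n/row/:n
print(record["tr"])     # 69
```

Grouping the records by `call` then gives one latency series per call type, ready for aggregation.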
What to do?


3 Understand metrics
Reading/aggregating
       metrics

• Python to parse/normalize syslog
• R language to analyze/visualize data
• R language console to interactively explore
  benchmarking results
R is a free software environment for
 statistical computing and graphics.
What you get

• Aggregate performance levels (throughput,
  latency)
• Detailed performance per call type
• Statistical analysis (outliers, trends,
  regression, correlation, frequency, standard
  deviation)
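
The talk does this analysis in R. Purely as an illustration of the kind of summary meant here, the following Python sketch computes a few of the listed statistics for one call type. The latency values are invented, and the "more than two standard deviations from the mean" outlier rule is just one simple convention, not necessarily the one the authors used.

```python
import statistics


def summarize(latencies_ms):
    """Basic summary of one latency series, with a naive outlier rule."""
    mean = statistics.mean(latencies_ms)
    stdev = statistics.stdev(latencies_ms)   # sample standard deviation
    outliers = [x for x in latencies_ms if abs(x - mean) > 2 * stdev]
    return {
        "mean": mean,
        "median": statistics.median(latencies_ms),
        "stdev": stdev,
        "outliers": outliers,
    }


# Invented latencies (ms) for a single call type; one request is very slow.
summary = summarize([10, 12, 11, 13, 12, 11, 10, 12, 11, 250])
print(summary["median"])     # 11.5
print(summary["outliers"])   # [250]
```

The gap between the median and the mean in this sample shows why aggregate averages alone can hide a single slow call.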
What you get
What to do?


4 Go deeper
Digging into the data

• From the HAproxy log analysis, one call
  emerged as exceptionally slow
• Using eprof we were able to determine
  that most of the time was spent in a Redis
  query fetching many keys (MGET)
Tracing erldis query
• More than 60% of runtime is spent
  manipulating the socket
• gen_tcp:recv/2 is the culprit
• But why is it called so many times?
Understanding the redis protocol
C: LRANGE mylist 0 2
S: *2
S: $5
S: Hello
S: $5
S: World

The same reply as a single binary:
<<"*2\r\n$5\r\nHello\r\n$5\r\nWorld\r\n">>
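
To make the framing concrete, here is a small sketch that parses a complete multi-bulk reply like the one above. It is written in Python for illustration and is not the erldis or eredis parser; null values ($-1) and other reply types are deliberately left out. Note that a reply carrying N values spans 1 + 2N protocol lines, so a parser that asks the socket for one line at a time needs at least that many reads.

```python
CRLF = b"\r\n"


def read_line(data, pos):
    """Return the line starting at pos (without CRLF) and the next position."""
    end = data.index(CRLF, pos)
    return data[pos:end], end + len(CRLF)


def parse_multi_bulk(data):
    """Parse one complete multi-bulk reply into a list of byte strings."""
    header, pos = read_line(data, 0)
    if not header.startswith(b"*"):
        raise ValueError("not a multi-bulk reply")
    count = int(header[1:])
    values = []
    for _ in range(count):
        size_line, pos = read_line(data, pos)
        if not size_line.startswith(b"$"):
            raise ValueError("expected a bulk length line")
        size = int(size_line[1:])
        # Read exactly `size` bytes so values may themselves contain CRLF.
        values.append(data[pos:pos + size])
        pos += size + len(CRLF)
    return values


reply = b"*2\r\n$5\r\nHello\r\n$5\r\nWorld\r\n"
print(parse_multi_bulk(reply))   # [b'Hello', b'World']
```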
Understanding erldis
• recv_value/2 is used in the protocol parser
  to get the next data to parse
A different approach
• Two ways to use gen_tcp: active or passive
• In passive, use gen_tcp:recv to explicitly ask
  for data, blocking
• In active, gen_tcp will send the controlling
  process a message when there is data
• Hybrid: active once
A different approach

• Are active sockets faster?
• A proof of concept showed active sockets
  to be faster
• Change erldis or write a new driver?
A different approach

• Radical change => new driver
• Keep Erldis queuing approach
• Think about error handling from the start
• Use active sockets
A different approach
• Active socket, parse partial replies
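
With an active socket the data arrives in chunks of whatever size the network delivers, so the parser has to cope with replies that are cut off mid-way. The sketch below illustrates that idea in Python: it buffers chunks and returns replies only once they are complete. `ReplyParser` is a hypothetical name, only multi-bulk replies are handled, and this is not eredis's actual parser.

```python
CRLF = b"\r\n"


class ReplyParser:
    """Accumulates socket chunks and returns replies once they are complete."""

    def __init__(self):
        self.buffer = b""

    def feed(self, chunk):
        """Add a chunk; return the list of replies completed by it."""
        self.buffer += chunk
        replies = []
        while True:
            parsed = self._try_parse(self.buffer)
            if parsed is None:            # partial reply: wait for more data
                return replies
            values, consumed = parsed
            replies.append(values)
            self.buffer = self.buffer[consumed:]

    @staticmethod
    def _try_parse(data):
        """Return (values, bytes_consumed), or None if data is incomplete."""
        end = data.find(CRLF)
        if end == -1:
            return None
        if not data.startswith(b"*"):
            raise ValueError("only multi-bulk replies are handled here")
        count = int(data[1:end])
        pos = end + len(CRLF)
        values = []
        for _ in range(count):
            end = data.find(CRLF, pos)
            if end == -1:
                return None
            if data[pos:pos + 1] != b"$":
                raise ValueError("expected a bulk length line")
            size = int(data[pos + 1:end])
            start = end + len(CRLF)
            stop = start + size
            if len(data) < stop + len(CRLF):
                return None
            values.append(data[start:stop])
            pos = stop + len(CRLF)
        return values, pos


parser = ReplyParser()
print(parser.feed(b"*2\r\n$5\r\nHel"))             # []
print(parser.feed(b"lo\r\n$5\r\nWorld\r\n"))       # [[b'Hello', b'World']]
```

Each call to `feed` plays the role of one message delivered by the active socket; nothing blocks while a reply is still incomplete.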
Circuit breaker
• eredis has a simple circuit breaker for when
  Redis is down/unreachable
• eredis returns immediately to clients if
  connection is down
• Reconnecting is done outside request/
  response handling
• Robust handling of errors
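
The behaviour described above can be sketched in a few lines. This is a generic illustration in Python, not eredis's implementation: `CircuitBreaker`, `ConnectionDown` and `maintain` are hypothetical names, and a real driver would run the reconnect step from its own process or timer.

```python
import time


class ConnectionDown(Exception):
    """Raised immediately while the backend is unreachable."""


class CircuitBreaker:
    def __init__(self, connect, retry_interval_s=1.0, clock=time.monotonic):
        self._connect = connect              # callable returning a connection
        self._retry_interval_s = retry_interval_s
        self._clock = clock
        self._conn = None
        self._next_retry_at = 0.0

    def call(self, request):
        """Run request(conn); fail fast while the connection is down."""
        if self._conn is None:
            raise ConnectionDown("backend unreachable")
        try:
            return request(self._conn)
        except OSError:
            # Drop the connection so later callers return immediately.
            self._conn = None
            self._next_retry_at = self._clock() + self._retry_interval_s
            raise ConnectionDown("connection lost") from None

    def maintain(self):
        """Reconnect attempt, meant to run outside request handling."""
        if self._conn is None and self._clock() >= self._next_retry_at:
            try:
                self._conn = self._connect()
            except OSError:
                self._next_retry_at = self._clock() + self._retry_interval_s
```

Callers only ever see a fast success or a fast `ConnectionDown`; the cost of reconnecting is never paid inside a request.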
Benchmarking eredis

• Redis driver critical for our application
• Must perform well
• Must be stable
• How do we test this?
Basho bench

• Basho produces the Riak KV store
• Basho built a tool to test KV servers
• Basho bench
• We used Basho bench to test eredis
Basho bench
• Create callback module
Basho bench
• Configuration term-file
Basho bench output
eredis is open source




https://github.com/wooga/eredis
What to do?


5 Measure internals
Measure internals

The HAproxy point of view is valid, but how do
we measure the internals of our application
while we are live, without the overhead of
tracing?
Think Basho bench

• Basho bench can benchmark a redis driver
• Redis is very fast, 100K ops/sec
• Basho bench overhead is acceptable
• The code is very simple
Cherry pick ideas from
    Basho Bench
• Creates a histogram of timings on the fly,
  reducing the number of data points
• Dumps to disk every N seconds
• Allows statistical tools to work on already
  aggregated data
• Near real-time, from event to stats in N+5
  seconds
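
The on-the-fly histogram idea can be illustrated with fixed-width buckets: each timing only increments a counter, and a periodic flush hands the aggregated counts to whatever writes them to disk. This is a minimal Python sketch under those assumptions; `LatencyHistogram`, the 5 ms bucket width and the sample timings are made up, and Basho bench's own histogram implementation is not shown in the slides.

```python
from collections import Counter


class LatencyHistogram:
    """Counts timings per fixed-width bucket instead of storing each one."""

    def __init__(self, bucket_ms=5):
        self.bucket_ms = bucket_ms
        self.counts = Counter()

    def record(self, latency_ms):
        # Map the timing to the lower bound of its bucket.
        bucket = int(latency_ms // self.bucket_ms) * self.bucket_ms
        self.counts[bucket] += 1

    def flush(self):
        """Return sorted (bucket, count) pairs and start a new window."""
        snapshot = sorted(self.counts.items())
        self.counts.clear()
        return snapshot


hist = LatencyHistogram(bucket_ms=5)
for timing in (1.2, 4.9, 7.0, 120):
    hist.record(timing)

# Called every N seconds by a timer; the caller writes the result to disk.
print(hist.flush())   # [(0, 2), (5, 1), (120, 1)]
```

However many requests are recorded in a window, the flushed output stays as small as the number of occupied buckets.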
Homegrown stats
• Measures latency from the edges of our
  system (excludes HTTP handling)
• And at interesting points inside the system
• Statistical analysis using R
• Correlate with HAproxy data
• Produces graphs and data specific to our
  application
Homegrown stats
Recap

  Measure:
• From an external point of view (HAproxy)
• At the edge of the system (excluding
  HTTP handling)
• Internals in the single process (eprof)
Recap
  Analyze:
• Aggregated measures
• Statistical properties of measures
 • standard deviation
 • distribution
 • trends
Thanks!

http://www.wooga.com/jobs

knut.nesheim@wooga.com       @knutin
paolo.negri@wooga.com    @hungryblank

 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 

Erlang Factory 2011 London

  • 1. Designing for Scale Knut Nesheim @knutin Paolo Negri @hungryblank
  • 2. About this talk 2 developers and Erlang vs. 1 million daily users
  • 3. Social Games Flash client (game) HTTP API
  • 4. Social Games Flash client • Game actions need to be persisted and validated • 1 API call every 2 secs
  • 5. Social Games HTTP API • @ 1 000 000 daily users • 5000 HTTP reqs/sec • more than 90% writes
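The load figures on slides 4 and 5 can be cross-checked with a little arithmetic. The sketch below assumes that 5000 reqs/sec is the peak rate and that every online client issues exactly one call every 2 seconds; neither assumption is stated on the slides.

```python
# Back-of-the-envelope check of the figures on slides 4 and 5.
SECONDS_BETWEEN_CALLS = 2    # from the slides: 1 API call every 2 secs
PEAK_REQS_PER_SEC = 5000     # from the slides; assumed here to be the peak
DAILY_USERS = 1_000_000      # from the slides


def concurrent_users(reqs_per_sec, seconds_between_calls):
    """Clients that must be online at once to produce the given request rate."""
    return reqs_per_sec * seconds_between_calls


peak_concurrent = concurrent_users(PEAK_REQS_PER_SEC, SECONDS_BETWEEN_CALLS)
share_online = peak_concurrent / DAILY_USERS

print(f"{peak_concurrent} concurrent users, {share_online:.1%} of daily users")
```

Under these assumptions, only a small fraction of the daily users has to be online at the same moment to reach the stated request rate.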
  • 7. Users we expect [Chart: “Monster World” daily active users (DAU), July - December 2010; y-axis from 0 to 1,000,000]
  • 8. Users we have [Chart: new game daily active users (DAU), March - June 2011; y-axis ticks 0 and 50]
  • 9. What to do? 1 Simulate users
  • 10. Simulating users • Must not be too synthetic (like apachebench) • Must look like a meaningful game session • Users must come online at a given rate and play
  • 11. Tsung • Multi protocol (HTTP, XMPP) benchmarking tool • Able to test non trivial call sequences • Can actually simulate a scripted gaming session http://tsung.erlang-projects.org/
  • 12. Tsung - configuration (slide labels: fixed content, dynamic parameter) <request subst="true"> <http url="http://server.wooga.com/users/%%ts_user_server:get_unique_id%%/resources/column/5/row/14?%%_routing_key%%" method="POST" contents='{"parameter1":"value1"}'> </http> </request>
  • 13. Tsung - configuration • Not something you fancy writing • We’re in development, calls change and we constantly add new calls • A session might contain hundreds of requests • All the calls must refer to a consistent game state
  • 14. Tsung - configuration • From our Ruby test code: user.resources(:column => 5, :row => 14) • Same as: <request subst="true"> <http url="http://server.wooga.com/users/%%ts_user_server:get_unique_id%%/resources/column/5/row/14?%%_routing_key%%" method="POST" contents='{"parameter1":"value1"}'> </http> </request>
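Slide 14 implies that the Tsung XML was generated from test code rather than written by hand. The authors did this from Ruby and do not show their generator; the following is only a Python sketch of the idea. The function name `tsung_request` and its arguments are hypothetical, while the URL shape and the `%%...%%` substitutions are copied from the slide.

```python
import json
import xml.etree.ElementTree as ET

BASE_URL = "http://server.wooga.com"  # host as shown on the slide


def tsung_request(resource, params, body):
    """Render one API call as a Tsung <request> element (hypothetical helper)."""
    path = "/".join(f"{key}/{value}" for key, value in params)
    url = (
        f"{BASE_URL}/users/%%ts_user_server:get_unique_id%%/"
        f"{resource}/{path}?%%_routing_key%%"
    )
    # subst="true" tells Tsung to expand the %%...%% placeholders at run time.
    request = ET.Element("request", subst="true")
    ET.SubElement(request, "http", url=url, method="POST",
                  contents=json.dumps(body))
    return request


element = tsung_request("resources", [("column", 5), ("row", 14)],
                        {"parameter1": "value1"})
xml_text = ET.tostring(element, encoding="unicode")
print(xml_text)
```

Generating the requests this way keeps the benchmark script in step with the API as calls change during development.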
  • 15. Tsung - configuration • Session (requests): a session is a group of requests • Arrival phase (duration, arrival rate): sessions arrive in phases with a specific arrival rate
  • 16. Tsung - setup [Diagram: benchmarking cluster (tsung master and tsung worker, linked by ssh) sending HTTP reqs to the application (three app servers)]
  • 17. Tsung • Generates ~2500 reqs/sec on an AWS m1.large • Flexible but hard to extend • Code base rather obscure
  • 18. What to do? 2 Collect metrics
  • 19. Tsung-metrics • Tsung collects measures and provides reports • But these measures include Tsung's own network/CPU congestion • Tsung machines aren’t a good point of view
  • 20. HAproxy [Diagram: benchmarking cluster (tsung master and tsung worker, linked by ssh) sending HTTP reqs through haproxy to the application (three app servers)]
  • 21. HAproxy “The Reliable, High Performance TCP/ HTTP Load Balancer” • Placed in front of http servers • Load balancing • Fail over
  • 22. HAproxy - syslog • Easy to set up • Efficient (UDP) • Provides 5 timings for each request
  • 23. HAproxy • Time to receive request from client
  • 24. HAproxy • Time spent in HAproxy queue
  • 25. HAproxy • Time to connect to the server
  • 26. HAproxy • Time to receive response headers from server
  • 27. HAproxy • Total session duration time
  • 28. HAproxy - syslog • Application URLs directly identify the server call • Application URLs are easy to parse • Processing the HAproxy syslog gives per-call metrics
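The five timings on slides 23 to 27 appear in HAProxy's HTTP log format as a single `Tq/Tw/Tc/Tr/Tt` field, in that order. The authors' parser is not shown, so the following is only a sketch of how per-call metrics might be extracted. The sample log lines, the host names in them, and the rule that replaces numeric path segments with `:n` are assumptions made for illustration.

```python
import re
from collections import defaultdict

# Tq/Tw/Tc/Tr/Tt, then the HTTP status code, then the quoted request line.
LOG_LINE = re.compile(
    r"(?P<tq>-?\d+)/(?P<tw>-?\d+)/(?P<tc>-?\d+)/(?P<tr>-?\d+)/\+?(?P<tt>\d+) "
    r'(?P<status>\d{3}) .*"(?P<method>[A-Z]+) (?P<path>\S+)'
)
TIMERS = ("tq", "tw", "tc", "tr", "tt")


def normalize(path):
    """Drop the query string and replace numeric path segments with ':n'."""
    path = path.split("?", 1)[0]
    return re.sub(r"/\d+(?=/|$)", "/:n", path)


def parse_line(line):
    """Return (call, timings in ms) or None if the line does not match."""
    match = LOG_LINE.search(line)
    if match is None:
        return None
    timings = {name: int(match.group(name)) for name in TIMERS}
    call = f"{match.group('method')} {normalize(match.group('path'))}"
    return call, timings


def response_times_per_call(lines):
    """Group Tr (time to receive response headers) by normalized call."""
    by_call = defaultdict(list)
    for line in lines:
        parsed = parse_line(line)
        if parsed is not None:
            call, timings = parsed
            by_call[call].append(timings["tr"])
    return dict(by_call)


# Hypothetical log lines in HAProxy's HTTP log layout.
SAMPLE = [
    'Jun  9 12:00:01 lb haproxy[1234]: 10.0.0.7:51234 [09/Jun/2011:12:00:01.100] '
    'http-in app/app1 2/0/1/35/40 200 512 - - ---- 3/3/2/1/0 0/0 '
    '"POST /users/42/resources/column/5/row/14?k1 HTTP/1.1"',
    'Jun  9 12:00:02 lb haproxy[1234]: 10.0.0.8:51300 [09/Jun/2011:12:00:02.250] '
    'http-in app/app2 1/0/1/55/58 200 512 - - ---- 3/3/2/1/0 0/0 '
    '"POST /users/77/resources/column/2/row/9?k2 HTTP/1.1"',
]
per_call = response_times_per_call(SAMPLE)
```

Normalizing the URL is what turns individual requests into a call type, so that timings from different users can be aggregated together.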
  • 29. What to do? 3 Understand metrics
  • 30. Reading/aggregating metrics • Python to parse/normalize syslog • R language to analyze/visualize data • R language console to interactively explore benchmarking results
  • 31. R is a free software environment for statistical computing and graphics.
  • 32. What you get • Aggregate performance levels (throughput, latency) • Detailed performance per call type • Statistical analysis (outliers, trends, regression, correlation, frequency, standard deviation)
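The talk used R for this analysis. As a rough Python stand-in, the sketch below computes a few of the statistics named on slide 32 for one call type. The sample latencies and the "mean plus two standard deviations" outlier rule are assumptions chosen for illustration, not the authors' method.

```python
import statistics


def summarize(latencies_ms, threshold=2.0):
    """Summary statistics plus values above mean + threshold * stdev."""
    mean = statistics.mean(latencies_ms)
    stdev = statistics.stdev(latencies_ms)  # sample standard deviation
    cutoff = mean + threshold * stdev
    return {
        "mean": mean,
        "median": statistics.median(latencies_ms),
        "stdev": stdev,
        "outliers": [value for value in latencies_ms if value > cutoff],
    }


# Hypothetical response times (ms) for a single call type.
samples = [10, 12, 11, 13, 12, 11, 10, 12, 13, 11, 95]
summary = summarize(samples)
print(summary)
```

A single slow request pulls the mean well away from the median, which is why the slides stress distribution and outliers rather than averages alone.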
  • 34. What to do? 4 go deeper
  • 35. Digging into the data • From HAproxy log analysis, one call emerged as exceptionally slow • Using eprof we were able to determine that most of the time was spent in a Redis query fetching many keys (MGET)
  • 36. Tracing erldis query • More than 60% of runtime is spent manipulating the socket • gen_tcp:recv/2 is the culprit • But why is it called so many times?
  • 37. Understanding the redis protocol C: LRANGE mylist 0 2 S: *2 S: $5 S: Hello S: $5 S: World • As an Erlang binary, the reply is <<"*2\r\n$5\r\nHello\r\n$5\r\nWorld\r\n">>
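The reply on slide 37 is a multi-bulk reply: `*2` announces two elements, and each element is a length line such as `$5` followed by that many bytes. A minimal Python sketch of a parser for a complete reply is given below. It illustrates the wire format only and is not the erldis or eredis parser; the function name `parse_reply` is hypothetical.

```python
def parse_reply(data):
    """Parse one complete Redis reply; returns (value, remaining bytes)."""
    line, _, rest = data.partition(b"\r\n")
    kind, payload = line[:1], line[1:]
    if kind == b"*":                      # multi-bulk: "*<count>"
        items = []
        for _ in range(int(payload)):
            item, rest = parse_reply(rest)
            items.append(item)
        return items, rest
    if kind == b"$":                      # bulk: "$<length>", then the bytes
        length = int(payload)
        if length == -1:                  # "$-1" is the null bulk reply
            return None, rest
        return rest[:length], rest[length + 2:]   # skip the trailing \r\n
    if kind == b":":                      # integer reply
        return int(payload), rest
    if kind == b"+":                      # status reply
        return payload.decode(), rest
    raise ValueError(f"unsupported reply type: {kind!r}")


reply, leftover = parse_reply(b"*2\r\n$5\r\nHello\r\n$5\r\nWorld\r\n")
print(reply)
```

Every element arrives as two protocol lines, so a parser that asks the socket for one piece at a time makes many small reads for a reply with many elements.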
  • 38. Understanding erldis • recv_value/2 is used in the protocol parser to get the next data to parse
  • 39. A different approach • Two ways to use gen_tcp: active or passive • In passive mode, call gen_tcp:recv to explicitly ask for data; the call blocks • In active mode, gen_tcp sends the controlling process a message when there is data • Hybrid: active once
  • 40. A different approach • Are active sockets faster? • A proof of concept showed active sockets to be faster • Change erldis or write a new driver?
  • 41. A different approach • Radical change => new driver • Keep Erldis queuing approach • Think about error handling from the start • Use active sockets
  • 42. A different approach • Active socket, parse partial replies
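With an active socket the driver does not choose how much data it gets: each message may hold half a reply or several replies. One way to cope, sketched below in Python, is to buffer incoming chunks and parse only once a whole reply is available. This sketch re-parses the buffer from the start on every chunk for simplicity. The slides do not show how eredis handles partial replies, and the names `ReplyBuffer`, `feed` and `Incomplete` are hypothetical.

```python
class Incomplete(Exception):
    """Raised when the buffer does not yet hold a whole reply."""


def try_parse(buf):
    """Parse one bulk or multi-bulk reply, or raise Incomplete."""
    end = buf.find(b"\r\n")
    if end < 0:
        raise Incomplete
    kind, payload, rest = buf[:1], buf[1:end], buf[end + 2:]
    if kind == b"$":
        length = int(payload)
        if len(rest) < length + 2:        # body or trailing \r\n still missing
            raise Incomplete
        return rest[:length], rest[length + 2:]
    if kind == b"*":
        items = []
        for _ in range(int(payload)):
            item, rest = try_parse(rest)
            items.append(item)
        return items, rest
    raise ValueError(f"unsupported reply type: {kind!r}")


class ReplyBuffer:
    """Accumulates chunks as they arrive and yields complete replies."""

    def __init__(self):
        self.buf = b""

    def feed(self, chunk):
        self.buf += chunk
        replies = []
        while self.buf:
            try:
                reply, self.buf = try_parse(self.buf)
            except Incomplete:
                break                     # keep the bytes, wait for more data
            replies.append(reply)
        return replies


buffer = ReplyBuffer()
first = buffer.feed(b"*2\r\n$5\r\nHel")            # reply cut mid-element
second = buffer.feed(b"lo\r\n$5\r\nWorld\r\n")     # the rest arrives later
```

The first call returns nothing and keeps the bytes; the second returns the completed reply.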
  • 43. Circuit breaker • eredis has a simple circuit breaker for when Redis is down/unreachable • eredis returns immediately to clients if connection is down • Reconnecting is done outside request/ response handling • Robust handling of errors
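The behaviour described on slide 43 can be sketched in Python as follows. This is an illustration of the pattern, not eredis code: the class and method names, the retry interval, and the `("error", "no_connection")` return value are all assumptions.

```python
import time


class CircuitBreaker:
    """Fails fast while disconnected; reconnects outside the request path."""

    def __init__(self, connect, retry_interval=1.0, clock=time.monotonic):
        self._connect = connect           # callable: returns a connection or raises OSError
        self._retry_interval = retry_interval
        self._clock = clock
        self._conn = None
        self._next_retry = 0.0

    def request(self, command):
        """Return immediately with an error if the connection is down."""
        if self._conn is None:
            return ("error", "no_connection")
        try:
            return ("ok", self._conn.send(command))
        except OSError:
            self._conn = None             # trip the breaker
            return ("error", "no_connection")

    def maintain(self):
        """Attempt a reconnect; meant to be driven by a timer, not by callers."""
        if self._conn is not None or self._clock() < self._next_retry:
            return
        try:
            self._conn = self._connect()
        except OSError:
            self._next_retry = self._clock() + self._retry_interval
```

Because `request` never attempts to connect, a Redis outage costs callers an immediate error instead of a blocked request, and the retry rate is bounded by `retry_interval` regardless of traffic.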
  • 44. Benchmarking eredis • Redis driver critical for our application • Must perform well • Must be stable • How do we test this?
  • 45. Basho bench • Basho produces the Riak KV store • Basho built a tool to test KV servers: Basho bench • We used Basho bench to test eredis
  • 46. Basho bench • Create callback module
  • 49. eredis is open source https://github.com/wooga/eredis
  • 50. What to do? 5 measure internals
  • 51. Measure internals The HAproxy point of view is valid, but how do we measure the internals of our application while we are live, without the overhead of tracing?
  • 52. Think Basho bench • Basho bench can benchmark a redis driver • Redis is very fast, 100K ops/sec • Basho bench overhead is acceptable • The code is very simple
  • 53. Cherry pick ideas from Basho Bench • Creates a histogram of timings on the fly, reducing the number of data points • Dumps to disk every N seconds • Allows statistical tools to work on already aggregated data • Near real-time, from event to stats in N+5 seconds
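The on-the-fly histogram from slide 53 can be sketched in Python. Each timing increments a bucket counter instead of being stored, and a periodic flush hands the aggregated counts to whatever writes them to disk. The class name, the fixed bucket width and the flush interface are assumptions; Basho bench's own implementation is not shown in the slides.

```python
from collections import Counter


class LatencyHistogram:
    """Aggregates timings into fixed-width buckets as they arrive."""

    def __init__(self, bucket_ms=1):
        self.bucket_ms = bucket_ms
        self.counts = Counter()

    def record(self, elapsed_ms):
        # Store only the lower edge of the bucket, not the raw data point.
        bucket = int(elapsed_ms // self.bucket_ms) * self.bucket_ms
        self.counts[bucket] += 1

    def flush(self):
        """Return {bucket lower edge: count} and reset for the next window."""
        snapshot = dict(sorted(self.counts.items()))
        self.counts.clear()
        return snapshot


histogram = LatencyHistogram(bucket_ms=5)
for elapsed in (1.2, 3.9, 7.0, 12.5):
    histogram.record(elapsed)
window = histogram.flush()
print(window)
```

In a live system a timer would call `flush` every N seconds and append the snapshot to a file, so the statistical tools only ever read already aggregated data.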
  • 54. Homegrown stats • Measures latency from the edges of our system (excludes HTTP handling) • And at interesting points inside the system • Statistical analysis using R • Correlate with HAproxy data • Produces graphs and data specific to our application
  • 56. Recap Measure: • From an external point of view (HAproxy) • At the edge of the system (excluding HTTP handling) • Internals in the single process (eprof)
  • 57. Recap Analyze: • Aggregated measures • Statistical properties of measures • standard deviation • distribution • trends
  • 58. Thanks! http://www.wooga.com/jobs knut.nesheim@wooga.com @knutin paolo.negri@wooga.com @hungryblank