Flipkart Website Architecture

      Mistakes & Learnings

          Siddhartha Reddy
          Architect, Flipkart
June 2007
November 2007
December 2012
www.flipkart.com
• Started in 2007
• Current Architecture from mid 2010
• Evolution of the architecture presented as…

       Issue [1]     RCA [2]     Actions     Learnings

•   [1] Issue: Website is “slow”
•   [2] RCA = Root Cause Analysis
Surviving & reacting to the environment

INFANCY (2007 – MID-2010)
Website is “slow”!
RCA
• Why?
  – MySQL queries taking too long
• Why?
  – Too many queries
  – Many slow queries
  – Queries locking tables
• Why?
  – Capacity
• Hmm…
Fixing it
• Get beefier servers (the obvious)
• Separate master_db, slave_db
  – Writes go to master_db
  – Reads from slave_db
  – Critical reads from master_db
   Before:  Reads, Writes -> MySQL
   After:   Writes -> MySQL Master -> (Replication) -> MySQL Slave <- Reads
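
The read/write split above can be sketched as a small router. This is a minimal illustration, not Flipkart's code: `FakeDb` and `QueryRouter` are hypothetical names, and only `master_db` and `slave_db` come from the slides. It assumes that "critical" reads are those that cannot tolerate a slave that lags behind the master.

```python
from dataclasses import dataclass, field


@dataclass
class FakeDb:
    """Hypothetical stand-in for a database connection; records its queries."""
    name: str
    queries: list = field(default_factory=list)

    def execute(self, sql):
        self.queries.append(sql)
        return self.name  # return the name so callers can see who served it


class QueryRouter:
    """Sends writes and critical reads to master_db, other reads to slave_db."""

    def __init__(self, master_db, slave_db):
        self.master_db = master_db
        self.slave_db = slave_db

    def write(self, sql):
        return self.master_db.execute(sql)

    def read(self, sql, critical=False):
        # Assumption: a replicated slave can be slightly stale, so reads that
        # must see the latest write are served by the master.
        target = self.master_db if critical else self.slave_db
        return target.execute(sql)
```

Keeping the routing decision in one place means application code only has to say whether a read is critical.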
Learning from it
• Scale out database reads by distributing load
  across systems
• Isolate database writes from reads
  – Writes are (usually) more critical
Website is “slow”!
    (Again)
RCA
• Why?
  – MySQL queries taking too long (on slave_db)
• Why?
  – Too many queries
  – Many slow queries
• Why?
  – Queries from analytics / reporting and other
    backend jobs
• Urm…
Fixing it
• Analytics / reporting DB (archival_db)
    – Use MyISAM — optimized for reads
    – Additional indexes for quicker reporting
   Before:  Website Writes -> MySQL Master -> (Replication) -> MySQL Slave <- Website Reads, Analytics Reads
   After:   Website Writes -> MySQL Master -> (Replication) -> MySQL Slave 1 <- Website Reads
                              MySQL Master -> (Replication) -> MySQL Slave 2 <- Analytics Reads
Learning from it
• Isolate the databases used for serving
  website traffic from those used for
  analytics/reporting
• Isolate systems being used by production
  website from those being used for background
  processing
Learning the basics

BABY (2010 – 2011)
Website is “slow”!
RCA
• Why?
• How?
  – Instrumentation
RCA - 1
• Why?
     – Logging a lot
     – PHP processes blocking on writing logs
   Request1 -> Process1 --+
   Request2 -> Process2 --+--> Log file
   Request3 -> Process3 --+

   One process is Writing to the log file; the other processes are Waiting.
RCA - 2
• Why?
  – Service Oriented Architecture (SOA)
  – Too many calls to remote services per request
     • Creating fresh connection for each call
     • All the calls are made in serial order


   Receive request -> Connect to Service1 -> Request Service1 -> Connect to Service2 -> Request Service2 -> Send response
RCA - 3
• Why?
  – Configurability
  – Fetch a lot of “config” from database for serving
    each request
   Receive request -> Fetch Config1 -> Fetch Config2 -> Fetch Config3 -> Fetch Config4 -> Send response
RCA – 1,2,3
• Why?
  – Logging a lot
  – SOA
  – Configurability
• Why?
  – PHP’s process model
• Argh!
Fixing it
• fk-w3-agent
  – Simple Java “middleware” daemon
  – Deployed on each web server
  – PHP communicates to it through local socket
  – Hosts pluggable “handlers”
fk-w3-agent: LoggingHandler

   Before:  Request1 -> Process1, Request2 -> Process2, Request3 -> Process3
            every process writes directly to Log file
   After:   Request1 -> Process1, Request2 -> Process2, Request3 -> Process3
            every process sends to fk-w3-agent -> (Async / buffered) -> Log file
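
The "async / buffered" idea can be sketched as follows. The slides describe fk-w3-agent as a Java daemon and do not show its internals, so this Python version is only an illustration of the technique: `BufferedLogger` and its parameters are hypothetical. Callers enqueue a line and return immediately, while one background thread writes batches to the sink.

```python
import queue
import threading


class BufferedLogger:
    """Accepts log lines without blocking; a background thread writes them out."""

    _STOP = object()  # sentinel that tells the writer thread to finish

    def __init__(self, write, batch_size=100):
        self._write = write            # callable taking one string, e.g. file.write
        self._batch_size = batch_size
        self._queue = queue.Queue()
        self._thread = threading.Thread(target=self._drain, daemon=True)
        self._thread.start()

    def log(self, line):
        # Only enqueues, so the request handler never waits on the log file.
        self._queue.put(line)

    def close(self):
        # Ask the writer thread to flush what is buffered and stop.
        self._queue.put(self._STOP)
        self._thread.join()

    def _drain(self):
        batch = []
        while True:
            item = self._queue.get()
            if item is self._STOP:
                break
            batch.append(item)
            # Write when the batch is full or nothing else is waiting.
            if len(batch) >= self._batch_size or self._queue.empty():
                self._write("".join(line + "\n" for line in batch))
                batch = []
        if batch:
            self._write("".join(line + "\n" for line in batch))
```

Because only one thread ever touches the sink, request handlers no longer contend with each other for the log file.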
fk-w3-agent: ServiceHandler(s)
   Before:  Receive request -> Connect to Service1 -> Request Service1 -> Connect to Service2 -> Request Service2 -> Send response
   After:   Receive request -> Call fk-w3-agent -> Send response
            fk-w3-agent -> Service1, Service2
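
The slides do not say how the ServiceHandler talks to the services. One plausible reading, given that the diagnosed problem was serial calls, is that the agent fans the calls out concurrently. The sketch below shows that idea only; `call_services` is a hypothetical helper, and connection reuse (the other diagnosed problem) is not modelled here.

```python
from concurrent.futures import ThreadPoolExecutor


def call_services(calls, pool):
    """Run every service call concurrently and collect the results.

    calls: dict mapping a service name to a zero-argument callable.
    pool:  a long-lived ThreadPoolExecutor shared across requests.
    """
    # Submit everything first so the calls overlap instead of running serially.
    futures = {name: pool.submit(fn) for name, fn in calls.items()}
    # Total wait is roughly the slowest call, not the sum of all calls.
    return {name: future.result() for name, future in futures.items()}
```

A caller would pass something like `{"search": fetch_search, "reviews": fetch_reviews}`, where both functions are placeholders for real service clients.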
fk-w3-agent: ConfigHandler
   Before:  Receive request -> Fetch Config1 -> Fetch Config2 -> Fetch Config3 -> Fetch Config4 -> Send response
            every fetch goes to Database
   After:   Receive request -> Fetch all config from fk-w3-agent -> Send response
            fk-w3-agent -> (Poll and cache) -> Database
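
"Poll and cache" can be sketched like this. `ConfigCache`, `load_all`, and the polling interval are hypothetical names chosen for illustration; the slides only state that the agent polls the database and serves config from its cache.

```python
import threading


class ConfigCache:
    """Keeps an in-memory snapshot of all config, refreshed by polling."""

    def __init__(self, load_all):
        self._load_all = load_all      # callable returning a dict of all config
        self._lock = threading.Lock()
        self._snapshot = {}
        self.refresh()                 # load once so the first request is served

    def refresh(self):
        fresh = dict(self._load_all())  # one database round trip per poll
        with self._lock:
            self._snapshot = fresh

    def get_all(self):
        # Served from memory: no database access on the request path.
        with self._lock:
            return dict(self._snapshot)

    def start_polling(self, interval_seconds, stop_event):
        def loop():
            # Event.wait returns False on timeout, True once stop_event is set.
            while not stop_event.wait(interval_seconds):
                self.refresh()
        thread = threading.Thread(target=loop, daemon=True)
        thread.start()
        return thread
```

The trade-off is staleness: a config change becomes visible only after the next poll.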
Learning from it
• PHP — good for frontend and templating
  – Gives a lot of agility
  – Limiting process model
     • Hurdle for high performance
• Java — stability and performance
• Horses for courses
Website is “slow”!
    (Again)
RCA
• Why?
  – PHP processes taking up too much time
  – PHP processes taking up too much CPU
• Why?
  – Product info deserialization taking up time/CPU
  – View construction taking up time/CPU
Fixing it
• Caching!
• Cache fully constructed pages
  – For a few minutes
  – Only for highly trafficked pages (Homepage)
• Cache PHP serialized Product objects
  – ~20 million objects
  – Memcache
• Yeah! But…
  – Add caching => add complexity
Caching: Complications (1)
• “Caching fully constructed pages”
• But parts of pages still need to be dynamic
     • Example: Logged-in user’s name
• Impossible to do effective bucket testing
     • Or at least makes it prohibitively complex
Caching: Complications (2)
• “Caching PHP serialized Product objects”
• Without caching:
              getProductInfo() -> Fetch from CMS

• With caching, cache hit:
              getProductInfo() -> Fetch from Cache

• With caching, cache miss:
              getProductInfo() -> Fetch from Cache -> Fetch from CMS -> Set in Cache
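
The three flows above correspond to the following lookup. This is a sketch: `get_product_info` mirrors the slide's `getProductInfo()`, while the dict-based cache and the `fetch_from_cms` callable are stand-ins for Memcache and the CMS client.

```python
def get_product_info(product_id, cache, fetch_from_cms):
    """Return product info, consulting the cache before the CMS."""
    product = cache.get(product_id)
    if product is not None:
        return product                     # cache hit: one lookup
    product = fetch_from_cms(product_id)   # cache miss: fall back to the CMS
    cache[product_id] = product            # set in cache for the next caller
    return product
```

Note that a miss costs more than the uncached path (cache lookup, CMS fetch, cache write), which is part of the added complexity the slide points to.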
Caching: Complications (3)
• TTL: ∞ (i.e. no invalidation)
• Pro-actively repopulate products in the cache
  – Receive “notifications” about product updates
     • Notification Server — pushes notifications raised by
       CMS
• Use a persistent, distributed cache
  – Memcache => Membase, Couchbase
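
With no TTL, entries only change when an update notification arrives. A minimal sketch of that handler, assuming a notification carries just a product id (the slides do not describe the payload); `ProductCache` and its method names are hypothetical.

```python
class ProductCache:
    """Cache with no expiry; entries are refreshed when the CMS reports a change."""

    def __init__(self, fetch_from_cms):
        self._fetch_from_cms = fetch_from_cms
        self._entries = {}  # no TTL: entries are never evicted by age

    def get(self, product_id):
        return self._entries.get(product_id)

    def handle_update_notification(self, product_id):
        # Repopulate proactively, so readers do not take a miss after a change.
        self._entries[product_id] = self._fetch_from_cms(product_id)
```

A lost notification leaves a stale entry forever under this scheme, which is one reason the slide pairs it with a persistent, distributed cache.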
Learning from it
• Caching is a powerful tool for performance
  optimization
• Caching adds complexities
  – Reduced by keeping cache close to data source
  – Think deeply about TTL, invalidation
• Use caching to go from “acceptable
  performance” to “awesome performance”
  – Don’t rely on it to get to “acceptable
    performance”
Growing up

KID (2012)
Website is “slow”!
RCA
• Why?
  – Search-service is slow (or Reviews-service is slow
    or Recommendations-service is slow)
• But why is the rest of the website slow?
  – Requests to the slow service are blocking
    processing threads
• Eh?!
Let’s do some math
• Let’s say
   – Mean (or median) response time: 100 ms
   – 8-core server
   – All requests are CPU bound
• Throughput: 80 requests per second (rps)
• Let’s also say
   – 95th Percentile response time: 1000 ms
       • Call them “bad requests”
• 4 bad requests in a second
   – Throughput down to 44 rps
• 8 bad requests in a second?
   – Throughput down to 8 rps
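
The arithmetic on this slide can be reproduced with a small model. It assumes, as the slide does, that each request occupies one core for its whole duration, and that a bad request ties up its core for the full second. `throughput_rps` is a hypothetical helper, and the model is only meaningful while the bad requests per second do not exceed the core count.

```python
def throughput_rps(cores, normal_ms, bad_ms=1000, bad_per_second=0):
    """Requests completed per second when some requests are slow."""
    # Core-seconds consumed by bad requests within one second.
    busy_core_seconds = bad_per_second * bad_ms / 1000
    # Whatever capacity is left serves normal requests.
    free_core_seconds = max(cores - busy_core_seconds, 0)
    normal_completed = free_core_seconds * 1000 / normal_ms
    return normal_completed + bad_per_second


print(throughput_rps(8, 100))                    # 80.0
print(throughput_rps(8, 100, bad_per_second=4))  # 44.0
print(throughput_rps(8, 100, bad_per_second=8))  # 8.0
```

Four slow requests, 5% of the normal load, remove almost half of the capacity, which is why a single slow service drags down the whole site.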
Fixing it
• Aggressive timeouts for all service calls
  – Isolate impact of a slow service
     • only to pages that depend on it
• Very aggressive timeouts for non-critical
  services
  – Example: Recommendations
     • On a Product page, Search results page etc.
     • Not on My Recommendations page
• Load non-critical parts of pages through AJAX
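
A timeout with a fallback might look like the sketch below. `call_with_timeout` and the timeout values are hypothetical. In a real client the timeout would normally be set on the connection or socket as well, because the approach shown here stops waiting but does not stop the work already running in the worker thread.

```python
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout


def call_with_timeout(pool, fn, timeout_seconds, fallback):
    """Run fn on the pool; give up and return fallback if it takes too long."""
    future = pool.submit(fn)
    try:
        return future.result(timeout=timeout_seconds)
    except FutureTimeout:
        # cancel() only helps if the call has not started yet; a running call
        # keeps its worker thread until it finishes on its own.
        future.cancel()
        return fallback
```

For a non-critical service such as recommendations on a product page, the fallback can simply be an empty list and the page renders without that section.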
Learning from it
• Isolate the impact of poorly performing
  services / systems
• Isolate the required from the good-to-have
Website is “slow”!
    (Again)
RCA
• Why?
  – Load average of web servers has spiked
• Why?
  – Requests per second has spiked
     • From 1000 rps to 1500 rps
• Why?
  – Large number of notifications of product
    information updates
Fixing it
• Separate cluster for receiving product info
  update notifications from the cluster that
  serves users
• Admission control: Don’t let a system receive
  more requests than it can handle
  – Throttling
• Batch the notifications
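
Admission control and batching can be sketched together. The slides name the techniques but not the mechanism, so the in-flight limit below is just one simple form of throttling; `AdmissionController` and `batch_notifications` are hypothetical names.

```python
import threading


class AdmissionController:
    """Rejects work once a fixed number of requests are already in flight."""

    def __init__(self, max_in_flight):
        self._max_in_flight = max_in_flight
        self._in_flight = 0
        self._lock = threading.Lock()

    def try_admit(self):
        with self._lock:
            if self._in_flight >= self._max_in_flight:
                return False  # caller should back off and retry later
            self._in_flight += 1
            return True

    def release(self):
        # Call once for every admitted request when it finishes.
        with self._lock:
            if self._in_flight > 0:
                self._in_flight -= 1


def batch_notifications(notifications, batch_size):
    """Group notifications so one request carries many updates."""
    return [
        notifications[i:i + batch_size]
        for i in range(0, len(notifications), batch_size)
    ]
```

Batching reduces the number of requests the receiving cluster sees, and the admission check bounds the damage when a client still sends too many.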
Learning from it
• Isolate the systems serving internal requests
  from those serving production traffic
• Admission control to ensure that a system is
  isolated from the over-enthusiasm of a client
• Look at the granularity at which we’re working
Increasing complexity

TEENAGER
THANK YOU
Mistake?
• Sub-optimal decision
  – Not all information/scenarios considered
  – Insufficient information
  – Built for a different scenario
• Due to focus on “functional” aspects
• A mistake is a mistake
  – … in retrospect

Contenu connexe

Tendances

Building a Real-Time Analytics Application with Apache Pulsar and Apache Pinot
Building a Real-Time Analytics Application with  Apache Pulsar and Apache PinotBuilding a Real-Time Analytics Application with  Apache Pulsar and Apache Pinot
Building a Real-Time Analytics Application with Apache Pulsar and Apache PinotAltinity Ltd
 
GraphQL vs REST
GraphQL vs RESTGraphQL vs REST
GraphQL vs RESTGreeceJS
 
Stream Processing with Kafka in Uber, Danny Yuan
Stream Processing with Kafka in Uber, Danny Yuan Stream Processing with Kafka in Uber, Danny Yuan
Stream Processing with Kafka in Uber, Danny Yuan confluent
 
Building Robust ETL Pipelines with Apache Spark
Building Robust ETL Pipelines with Apache SparkBuilding Robust ETL Pipelines with Apache Spark
Building Robust ETL Pipelines with Apache SparkDatabricks
 
Apache Kafka - Event Sourcing, Monitoring, Librdkafka, Scaling & Partitioning
Apache Kafka - Event Sourcing, Monitoring, Librdkafka, Scaling & PartitioningApache Kafka - Event Sourcing, Monitoring, Librdkafka, Scaling & Partitioning
Apache Kafka - Event Sourcing, Monitoring, Librdkafka, Scaling & PartitioningGuido Schmutz
 
Running Apache Spark on Kubernetes: Best Practices and Pitfalls
Running Apache Spark on Kubernetes: Best Practices and PitfallsRunning Apache Spark on Kubernetes: Best Practices and Pitfalls
Running Apache Spark on Kubernetes: Best Practices and PitfallsDatabricks
 
Lightning-fast Analytics for Workday transactional data
Lightning-fast Analytics for Workday transactional dataLightning-fast Analytics for Workday transactional data
Lightning-fast Analytics for Workday transactional dataPavel Hardak
 
Presto, Zeppelin을 이용한 초간단 BI 구축 사례
Presto, Zeppelin을 이용한 초간단 BI 구축 사례Presto, Zeppelin을 이용한 초간단 BI 구축 사례
Presto, Zeppelin을 이용한 초간단 BI 구축 사례Hyoungjun Kim
 
Introduction to KSQL: Streaming SQL for Apache Kafka®
Introduction to KSQL: Streaming SQL for Apache Kafka®Introduction to KSQL: Streaming SQL for Apache Kafka®
Introduction to KSQL: Streaming SQL for Apache Kafka®confluent
 
A Deep Dive into Kafka Controller
A Deep Dive into Kafka ControllerA Deep Dive into Kafka Controller
A Deep Dive into Kafka Controllerconfluent
 
Apache Kafka in the Airline, Aviation and Travel Industry
Apache Kafka in the Airline, Aviation and Travel IndustryApache Kafka in the Airline, Aviation and Travel Industry
Apache Kafka in the Airline, Aviation and Travel IndustryKai Wähner
 
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiEvent-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiDataWorks Summit
 
Best Practices for Streaming IoT Data with MQTT and Apache Kafka
Best Practices for Streaming IoT Data with MQTT and Apache KafkaBest Practices for Streaming IoT Data with MQTT and Apache Kafka
Best Practices for Streaming IoT Data with MQTT and Apache KafkaKai Wähner
 
Integrating Spark and Solr-(Timothy Potter, Lucidworks)
Integrating Spark and Solr-(Timothy Potter, Lucidworks)Integrating Spark and Solr-(Timothy Potter, Lucidworks)
Integrating Spark and Solr-(Timothy Potter, Lucidworks)Spark Summit
 
Kafka Tutorial: Advanced Producers
Kafka Tutorial: Advanced ProducersKafka Tutorial: Advanced Producers
Kafka Tutorial: Advanced ProducersJean-Paul Azar
 
Flexible and Real-Time Stream Processing with Apache Flink
Flexible and Real-Time Stream Processing with Apache FlinkFlexible and Real-Time Stream Processing with Apache Flink
Flexible and Real-Time Stream Processing with Apache FlinkDataWorks Summit
 

Tendances (20)

Flink vs. Spark
Flink vs. SparkFlink vs. Spark
Flink vs. Spark
 
Building a Real-Time Analytics Application with Apache Pulsar and Apache Pinot
Building a Real-Time Analytics Application with  Apache Pulsar and Apache PinotBuilding a Real-Time Analytics Application with  Apache Pulsar and Apache Pinot
Building a Real-Time Analytics Application with Apache Pulsar and Apache Pinot
 
GraphQL vs REST
GraphQL vs RESTGraphQL vs REST
GraphQL vs REST
 
Rest in flask
Rest in flaskRest in flask
Rest in flask
 
Stream Processing with Kafka in Uber, Danny Yuan
Stream Processing with Kafka in Uber, Danny Yuan Stream Processing with Kafka in Uber, Danny Yuan
Stream Processing with Kafka in Uber, Danny Yuan
 
Building Robust ETL Pipelines with Apache Spark
Building Robust ETL Pipelines with Apache SparkBuilding Robust ETL Pipelines with Apache Spark
Building Robust ETL Pipelines with Apache Spark
 
Apache Kafka - Event Sourcing, Monitoring, Librdkafka, Scaling & Partitioning
Apache Kafka - Event Sourcing, Monitoring, Librdkafka, Scaling & PartitioningApache Kafka - Event Sourcing, Monitoring, Librdkafka, Scaling & Partitioning
Apache Kafka - Event Sourcing, Monitoring, Librdkafka, Scaling & Partitioning
 
Running Apache Spark on Kubernetes: Best Practices and Pitfalls
Running Apache Spark on Kubernetes: Best Practices and PitfallsRunning Apache Spark on Kubernetes: Best Practices and Pitfalls
Running Apache Spark on Kubernetes: Best Practices and Pitfalls
 
Lightning-fast Analytics for Workday transactional data
Lightning-fast Analytics for Workday transactional dataLightning-fast Analytics for Workday transactional data
Lightning-fast Analytics for Workday transactional data
 
Presto, Zeppelin을 이용한 초간단 BI 구축 사례
Presto, Zeppelin을 이용한 초간단 BI 구축 사례Presto, Zeppelin을 이용한 초간단 BI 구축 사례
Presto, Zeppelin을 이용한 초간단 BI 구축 사례
 
Kafka PPT.pptx
Kafka PPT.pptxKafka PPT.pptx
Kafka PPT.pptx
 
Introduction to KSQL: Streaming SQL for Apache Kafka®
Introduction to KSQL: Streaming SQL for Apache Kafka®Introduction to KSQL: Streaming SQL for Apache Kafka®
Introduction to KSQL: Streaming SQL for Apache Kafka®
 
A Deep Dive into Kafka Controller
A Deep Dive into Kafka ControllerA Deep Dive into Kafka Controller
A Deep Dive into Kafka Controller
 
Apache Kafka in the Airline, Aviation and Travel Industry
Apache Kafka in the Airline, Aviation and Travel IndustryApache Kafka in the Airline, Aviation and Travel Industry
Apache Kafka in the Airline, Aviation and Travel Industry
 
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiEvent-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
 
Best Practices for Streaming IoT Data with MQTT and Apache Kafka
Best Practices for Streaming IoT Data with MQTT and Apache KafkaBest Practices for Streaming IoT Data with MQTT and Apache Kafka
Best Practices for Streaming IoT Data with MQTT and Apache Kafka
 
Integrating Spark and Solr-(Timothy Potter, Lucidworks)
Integrating Spark and Solr-(Timothy Potter, Lucidworks)Integrating Spark and Solr-(Timothy Potter, Lucidworks)
Integrating Spark and Solr-(Timothy Potter, Lucidworks)
 
Kafka Tutorial: Advanced Producers
Kafka Tutorial: Advanced ProducersKafka Tutorial: Advanced Producers
Kafka Tutorial: Advanced Producers
 
Building Netty Servers
Building Netty ServersBuilding Netty Servers
Building Netty Servers
 
Flexible and Real-Time Stream Processing with Apache Flink
Flexible and Real-Time Stream Processing with Apache FlinkFlexible and Real-Time Stream Processing with Apache Flink
Flexible and Real-Time Stream Processing with Apache Flink
 

En vedette

Building a Scalable Architecture for web apps
Building a Scalable Architecture for web appsBuilding a Scalable Architecture for web apps
Building a Scalable Architecture for web appsDirecti Group
 
Slash n: Tech Talk Track 2 – Distributed Transactions in SOA - Yogi Kulkarni,...
Slash n: Tech Talk Track 2 – Distributed Transactions in SOA - Yogi Kulkarni,...Slash n: Tech Talk Track 2 – Distributed Transactions in SOA - Yogi Kulkarni,...
Slash n: Tech Talk Track 2 – Distributed Transactions in SOA - Yogi Kulkarni,...slashn
 
Fungus on White Bread
Fungus on White BreadFungus on White Bread
Fungus on White BreadGaurav Lochan
 
Continuous deployment-at-flipkart
Continuous deployment-at-flipkartContinuous deployment-at-flipkart
Continuous deployment-at-flipkartPankaj Kaushal
 
facebook architecture for 600M users
facebook architecture for 600M usersfacebook architecture for 600M users
facebook architecture for 600M usersJongyoon Choi
 
Architecture of a Modern Web App
Architecture of a Modern Web AppArchitecture of a Modern Web App
Architecture of a Modern Web Appscothis
 
Slash n: Tech Talk Track 1 – Experimentation Platform - Ashok Banerjee
Slash n: Tech Talk Track 1 – Experimentation Platform - Ashok BanerjeeSlash n: Tech Talk Track 1 – Experimentation Platform - Ashok Banerjee
Slash n: Tech Talk Track 1 – Experimentation Platform - Ashok Banerjeeslashn
 
Slash n: Technical Session 2 - Messaging as a Platform - Shashwat Agarwal, V...
Slash n: Technical Session 2 - Messaging as a Platform - Shashwat Agarwal,  V...Slash n: Technical Session 2 - Messaging as a Platform - Shashwat Agarwal,  V...
Slash n: Technical Session 2 - Messaging as a Platform - Shashwat Agarwal, V...slashn
 
Slash n: Technical Session 6 - Keeping a commercial site secure – A case stud...
Slash n: Technical Session 6 - Keeping a commercial site secure – A case stud...Slash n: Technical Session 6 - Keeping a commercial site secure – A case stud...
Slash n: Technical Session 6 - Keeping a commercial site secure – A case stud...slashn
 
Slash n: Technical Session 7 - Fraudsters are smart, Frank is smarter - Vivek...
Slash n: Technical Session 7 - Fraudsters are smart, Frank is smarter - Vivek...Slash n: Technical Session 7 - Fraudsters are smart, Frank is smarter - Vivek...
Slash n: Technical Session 7 - Fraudsters are smart, Frank is smarter - Vivek...slashn
 
Driving User Growth Through Online Marketing
Driving User Growth Through Online MarketingDriving User Growth Through Online Marketing
Driving User Growth Through Online Marketingslashn
 
Introduction to NoSQL db and mongoDB
Introduction to NoSQL db and mongoDBIntroduction to NoSQL db and mongoDB
Introduction to NoSQL db and mongoDBbackslash451
 
Slash n: Technical Session 8 - Making Time - minute by minute - Janmejay Singh
Slash n: Technical Session 8 - Making Time - minute by minute - Janmejay SinghSlash n: Technical Session 8 - Making Time - minute by minute - Janmejay Singh
Slash n: Technical Session 8 - Making Time - minute by minute - Janmejay Singhslashn
 
Soa design pattern
Soa design patternSoa design pattern
Soa design patternLap Doan
 
INFORMATION SYSTEM FOR MANAGER CONCEPTS RELATED TO FLIPKART.COM
INFORMATION SYSTEM FOR MANAGER CONCEPTS RELATED TO FLIPKART.COMINFORMATION SYSTEM FOR MANAGER CONCEPTS RELATED TO FLIPKART.COM
INFORMATION SYSTEM FOR MANAGER CONCEPTS RELATED TO FLIPKART.COMMilan49
 
FlipkartFLIPKART USE IT AND INFORMATION SYSTEM
FlipkartFLIPKART USE IT AND INFORMATION SYSTEMFlipkartFLIPKART USE IT AND INFORMATION SYSTEM
FlipkartFLIPKART USE IT AND INFORMATION SYSTEMtigerjayadev
 
High Scalability by Example – How can Web-Architecture scale like Facebook, T...
High Scalability by Example – How can Web-Architecture scale like Facebook, T...High Scalability by Example – How can Web-Architecture scale like Facebook, T...
High Scalability by Example – How can Web-Architecture scale like Facebook, T...Robert Mederer
 
Aadhaar at 5th_elephant_v3
Aadhaar at 5th_elephant_v3Aadhaar at 5th_elephant_v3
Aadhaar at 5th_elephant_v3Regunath B
 

En vedette (20)

How Flipkart scales PHP
How Flipkart scales PHPHow Flipkart scales PHP
How Flipkart scales PHP
 
Building a Scalable Architecture for web apps
Building a Scalable Architecture for web appsBuilding a Scalable Architecture for web apps
Building a Scalable Architecture for web apps
 
Slash n: Tech Talk Track 2 – Distributed Transactions in SOA - Yogi Kulkarni,...
Slash n: Tech Talk Track 2 – Distributed Transactions in SOA - Yogi Kulkarni,...Slash n: Tech Talk Track 2 – Distributed Transactions in SOA - Yogi Kulkarni,...
Slash n: Tech Talk Track 2 – Distributed Transactions in SOA - Yogi Kulkarni,...
 
Fungus on White Bread
Fungus on White BreadFungus on White Bread
Fungus on White Bread
 
Continuous deployment-at-flipkart
Continuous deployment-at-flipkartContinuous deployment-at-flipkart
Continuous deployment-at-flipkart
 
facebook architecture for 600M users
facebook architecture for 600M usersfacebook architecture for 600M users
facebook architecture for 600M users
 
Flipkart
FlipkartFlipkart
Flipkart
 
Architecture of a Modern Web App
Architecture of a Modern Web AppArchitecture of a Modern Web App
Architecture of a Modern Web App
 
Slash n: Tech Talk Track 1 – Experimentation Platform - Ashok Banerjee
Slash n: Tech Talk Track 1 – Experimentation Platform - Ashok BanerjeeSlash n: Tech Talk Track 1 – Experimentation Platform - Ashok Banerjee
Slash n: Tech Talk Track 1 – Experimentation Platform - Ashok Banerjee
 
Slash n: Technical Session 2 - Messaging as a Platform - Shashwat Agarwal, V...
Slash n: Technical Session 2 - Messaging as a Platform - Shashwat Agarwal,  V...Slash n: Technical Session 2 - Messaging as a Platform - Shashwat Agarwal,  V...
Slash n: Technical Session 2 - Messaging as a Platform - Shashwat Agarwal, V...
 
Slash n: Technical Session 6 - Keeping a commercial site secure – A case stud...
Slash n: Technical Session 6 - Keeping a commercial site secure – A case stud...Slash n: Technical Session 6 - Keeping a commercial site secure – A case stud...
Slash n: Technical Session 6 - Keeping a commercial site secure – A case stud...
 
Slash n: Technical Session 7 - Fraudsters are smart, Frank is smarter - Vivek...
Slash n: Technical Session 7 - Fraudsters are smart, Frank is smarter - Vivek...Slash n: Technical Session 7 - Fraudsters are smart, Frank is smarter - Vivek...
Slash n: Technical Session 7 - Fraudsters are smart, Frank is smarter - Vivek...
 
Driving User Growth Through Online Marketing
Driving User Growth Through Online MarketingDriving User Growth Through Online Marketing
Driving User Growth Through Online Marketing
 
Introduction to NoSQL db and mongoDB
Introduction to NoSQL db and mongoDBIntroduction to NoSQL db and mongoDB
Introduction to NoSQL db and mongoDB
 
Slash n: Technical Session 8 - Making Time - minute by minute - Janmejay Singh
Slash n: Technical Session 8 - Making Time - minute by minute - Janmejay SinghSlash n: Technical Session 8 - Making Time - minute by minute - Janmejay Singh
Slash n: Technical Session 8 - Making Time - minute by minute - Janmejay Singh
 
Soa design pattern
Soa design patternSoa design pattern
Soa design pattern
 
INFORMATION SYSTEM FOR MANAGER CONCEPTS RELATED TO FLIPKART.COM
INFORMATION SYSTEM FOR MANAGER CONCEPTS RELATED TO FLIPKART.COMINFORMATION SYSTEM FOR MANAGER CONCEPTS RELATED TO FLIPKART.COM
INFORMATION SYSTEM FOR MANAGER CONCEPTS RELATED TO FLIPKART.COM
 
FlipkartFLIPKART USE IT AND INFORMATION SYSTEM
FlipkartFLIPKART USE IT AND INFORMATION SYSTEMFlipkartFLIPKART USE IT AND INFORMATION SYSTEM
FlipkartFLIPKART USE IT AND INFORMATION SYSTEM
 
High Scalability by Example – How can Web-Architecture scale like Facebook, T...
High Scalability by Example – How can Web-Architecture scale like Facebook, T...High Scalability by Example – How can Web-Architecture scale like Facebook, T...
High Scalability by Example – How can Web-Architecture scale like Facebook, T...
 
Aadhaar at 5th_elephant_v3
Aadhaar at 5th_elephant_v3Aadhaar at 5th_elephant_v3
Aadhaar at 5th_elephant_v3
 

Similaire à Slash n: Tech Talk Track 2 – Website Architecture-Mistakes & Learnings - Siddhartha Reddy

Streaming in Practice - Putting Apache Kafka in Production
Streaming in Practice - Putting Apache Kafka in ProductionStreaming in Practice - Putting Apache Kafka in Production
Streaming in Practice - Putting Apache Kafka in Productionconfluent
 
Architectures with Windows Azure
Architectures with Windows AzureArchitectures with Windows Azure
Architectures with Windows AzureDamir Dobric
 
Always On - Wydajność i bezpieczeństwo naszych danych - High Availability SQL...
Always On - Wydajność i bezpieczeństwo naszych danych - High Availability SQL...Always On - Wydajność i bezpieczeństwo naszych danych - High Availability SQL...
Always On - Wydajność i bezpieczeństwo naszych danych - High Availability SQL...SQLExpert.pl
 
Perfomance tuning on Go 2.0
Perfomance tuning on Go 2.0Perfomance tuning on Go 2.0
Perfomance tuning on Go 2.0Yogi Kulkarni
 
Apache Performance Tuning: Scaling Out
Apache Performance Tuning: Scaling OutApache Performance Tuning: Scaling Out
Apache Performance Tuning: Scaling OutSander Temme
 
ApacheCon BigData - What it takes to process a trillion events a day?
ApacheCon BigData - What it takes to process a trillion events a day?ApacheCon BigData - What it takes to process a trillion events a day?
ApacheCon BigData - What it takes to process a trillion events a day?Jagadish Venkatraman
 
Understanding Kafka Produce and Fetch api calls for high throughtput applicat...
Understanding Kafka Produce and Fetch api calls for high throughtput applicat...Understanding Kafka Produce and Fetch api calls for high throughtput applicat...
Understanding Kafka Produce and Fetch api calls for high throughtput applicat...HostedbyConfluent
 
Solr Compute Cloud - An Elastic SolrCloud Infrastructure
Solr Compute Cloud - An Elastic SolrCloud Infrastructure Solr Compute Cloud - An Elastic SolrCloud Infrastructure
Solr Compute Cloud - An Elastic SolrCloud Infrastructure Nitin S
 
Solr Lucene Conference 2014 - Nitin Presentation
Solr Lucene Conference 2014 - Nitin PresentationSolr Lucene Conference 2014 - Nitin Presentation
Solr Lucene Conference 2014 - Nitin PresentationNitin Sharma
 
Sql server 2012 - always on deep dive - bob duffy
Sql server 2012 - always on deep dive - bob duffySql server 2012 - always on deep dive - bob duffy
Sql server 2012 - always on deep dive - bob duffyAnuradha
 
Kafka replication apachecon_2013
Kafka replication apachecon_2013Kafka replication apachecon_2013
Kafka replication apachecon_2013Jun Rao
 
Solr Compute Cloud – An Elastic Solr Infrastructure: Presented by Nitin Sharm...
Solr Compute Cloud – An Elastic Solr Infrastructure: Presented by Nitin Sharm...Solr Compute Cloud – An Elastic Solr Infrastructure: Presented by Nitin Sharm...
Solr Compute Cloud – An Elastic Solr Infrastructure: Presented by Nitin Sharm...Lucidworks
 
Infinispan from POC to Production
Infinispan from POC to ProductionInfinispan from POC to Production
Infinispan from POC to ProductionJBUG London
 
Infinispan from POC to Production
Infinispan from POC to ProductionInfinispan from POC to Production
Infinispan from POC to ProductionC2B2 Consulting
 
Solr Lucene Revolution 2014 - Solr Compute Cloud - Nitin
Solr Lucene Revolution 2014 - Solr Compute Cloud - NitinSolr Lucene Revolution 2014 - Solr Compute Cloud - Nitin
Solr Lucene Revolution 2014 - Solr Compute Cloud - Nitinbloomreacheng
 
Scaling habits of ASP.NET
Scaling habits of ASP.NETScaling habits of ASP.NET
Scaling habits of ASP.NETDavid Giard
 
Stephan Ewen - Experiences running Flink at Very Large Scale
Stephan Ewen -  Experiences running Flink at Very Large ScaleStephan Ewen -  Experiences running Flink at Very Large Scale
Stephan Ewen - Experiences running Flink at Very Large ScaleVerverica
 
NoSQL afternoon in Japan Kumofs & MessagePack
NoSQL afternoon in Japan Kumofs & MessagePackNoSQL afternoon in Japan Kumofs & MessagePack
NoSQL afternoon in Japan Kumofs & MessagePackSadayuki Furuhashi
 

Similaire à Slash n: Tech Talk Track 2 – Website Architecture-Mistakes & Learnings - Siddhartha Reddy (20)

Streaming in Practice - Putting Apache Kafka in Production
Streaming in Practice - Putting Apache Kafka in ProductionStreaming in Practice - Putting Apache Kafka in Production
Streaming in Practice - Putting Apache Kafka in Production
 
Architectures with Windows Azure
Architectures with Windows AzureArchitectures with Windows Azure
Architectures with Windows Azure
 
Always On - Wydajność i bezpieczeństwo naszych danych - High Availability SQL...

Slash n: Tech Talk Track 2 – Website Architecture-Mistakes & Learnings - Siddhartha Reddy

  • 1. Flipkart Website Architecture Mistakes & Learnings Siddhartha Reddy Architect, Flipkart
  • 5. www.flipkart.com • Started in 2007 • Current Architecture from mid 2010 • Evolution of the architecture presented as… Issue[1] RCA[2] Actions Learnings • [1] Issue: Website is “slow” • [2] RCA = Root Cause Analysis
  • 6. Surviving & reacting to the environment INFANCY (2007 – MID-2010)
  • 8. RCA • Why? – MySQL queries taking too long • Why? – Too many queries – Many slow queries – Queries locking tables • Why? – Capacity • Hmm…
  • 9. Fixing it • Get beefier servers (the obvious) • Separate master_db, slave_db – Writes go to master_db – Reads from slave_db – Critical reads from master_db [Diagram: before, a single MySQL server handles both reads and writes; after, writes go to the MySQL master, which replicates to a MySQL slave that serves reads]
  • 10. Learning from it • Scale out database reads by distributing load across systems • Isolate database writes from reads – Writes are (usually) more critical
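The read/write split from slides 9 and 10 can be sketched as a tiny routing function. This is a minimal illustration, not Flipkart's code: `pick_db` and its `critical` flag are hypothetical names, and only `master_db` and `slave_db` come from the slides.

```python
def pick_db(is_write: bool, critical: bool = False) -> str:
    """Return the name of the database a query should be sent to.

    Hypothetical helper: writes and critical reads go to master_db;
    every other read goes to slave_db, which may lag behind the master.
    """
    if is_write or critical:
        return "master_db"
    return "slave_db"
```

A read that must see the latest write (a "critical read" in the slide's terms) would call `pick_db(False, critical=True)` and be served by the master.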
  • 12. RCA • Why? – MySQL queries taking too long (on slave_db) • Why? – Too many queries – Many slow queries • Why? – Queries from analytics / reporting and other backend jobs • Urm…
  • 13. Fixing it • Analytics / reporting DB (archival_db) – Use MyISAM (optimized for reads) – Additional indexes for quicker reporting [Diagram: before, website writes go to the MySQL master and website reads to a single replicated MySQL slave; after, the master replicates to Slave 1 for website reads and to Slave 2 for analytics reads]
  • 14. Learning from it • Isolate the databases being used for serving website traffic from those being used for analytics/reporting • Isolate systems being used by production website from those being used for background processing
  • 15. Learning the basics BABY (2010 – 2011)
  • 17. RCA • Why? • How? – Instrumentation
  • 18. RCA - 1 • Why? – Logging a lot – PHP processes blocking on writing logs [Diagram: Request1, Request2 and Request3 are handled by Process1, Process2 and Process3, which share one log file; one process is writing while the other two wait]
  • 19. RCA - 2 • Why? – Service Oriented Architecture (SOA) – Too many calls to remote services per request • Creating fresh connection for each call • All the calls are made in serial order [Diagram: Receive request → Connect to Service1 → Request Service1 → Connect to Service2 → Request Service2 → Send response]
  • 20. RCA - 3 • Why? – Configurability – Fetch a lot of “config” from database for serving each request [Diagram: Receive request → Fetch Config1 → Fetch Config2 → Fetch Config3 → Fetch Config4 → Send response]
  • 21. RCA – 1,2,3 • Why? – Logging a lot – SOA – Configurability • Why? – PHP’s process model • Argh!
  • 22. Fixing it • fk-w3-agent – Simple Java “middleware” daemon – Deployed on each web server – PHP communicates to it through local socket – Hosts pluggable “handlers”
  • 23. fk-w3-agent: LoggingHandler [Diagram: before, Process1, Process2 and Process3 write to the log file directly; after, they hand log lines to fk-w3-agent, which writes to the log file asynchronously with buffering]
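The "async / buffered" idea behind the LoggingHandler on slide 23 can be sketched as follows. This is only an in-process Python illustration under stated assumptions: the real fk-w3-agent is described as a Java daemon reached over a local socket, and the `BufferedLogger` class and its `sink` callable are hypothetical names, not part of it.

```python
import queue
import threading


class BufferedLogger:
    """Request code enqueues lines; a single background thread writes them."""

    def __init__(self, sink):
        self._queue = queue.Queue()
        self._sink = sink  # hypothetical: any callable that accepts one log line
        self._worker = threading.Thread(target=self._drain, daemon=True)
        self._worker.start()

    def log(self, line: str) -> None:
        # Returns immediately; the caller never waits on the log file.
        self._queue.put(line)

    def _drain(self) -> None:
        while True:
            line = self._queue.get()
            if line is None:  # sentinel: stop the worker
                break
            self._sink(line)

    def close(self) -> None:
        # Everything queued before the sentinel is written first.
        self._queue.put(None)
        self._worker.join()
```

Because only one thread touches the sink, request handlers no longer contend for the log file; the trade-off is that lines still in the queue are lost if the process dies before they are written.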
  • 24. fk-w3-agent: ServiceHandler(s) [Diagram: before, Receive request → Connect to Service1 → Request Service1 → Connect to Service2 → Request Service2 → Send response; after, Receive request → Call fk-w3-agent → Send response, with fk-w3-agent talking to Service1 and Service2]
  • 25. fk-w3-agent: ConfigHandler [Diagram: before, Receive request → Fetch Config1 through Config4 from the database → Send response; after, Receive request → Fetch all config from fk-w3-agent → Send response, with fk-w3-agent polling the database and caching the result]
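The "poll and cache" behaviour of the ConfigHandler on slide 25 can be approximated with a small in-memory cache. This is a sketch under assumptions: `ConfigCache`, `load_all` and `refresh_seconds` are hypothetical names, and the cache refreshes lazily on read rather than polling from a background thread as the agent presumably does.

```python
import time


class ConfigCache:
    """Serve config from memory; reload from the source at most once per interval."""

    def __init__(self, load_all, refresh_seconds=30.0, clock=time.monotonic):
        self._load_all = load_all  # hypothetical: returns a dict of all config
        self._refresh_seconds = refresh_seconds
        self._clock = clock
        self._values = load_all()
        self._loaded_at = clock()

    def get(self, key, default=None):
        # Reload only when the cached copy is older than the interval.
        if self._clock() - self._loaded_at >= self._refresh_seconds:
            self._values = self._load_all()
            self._loaded_at = self._clock()
        return self._values.get(key, default)
```

With this shape a request that needs four config values costs zero database round trips in the common case, instead of four.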
  • 26. Learning from it • PHP — good for frontend and templating – Gives a lot of agility – Limiting process model • Hurdle for high performance • Java — stability and performance • Horses for courses
  • 28. RCA • Why? – PHP processes taking up too much time – PHP processes taking up too much CPU • Why? – Product info deserialization taking up time/CPU – View construction taking up time/CPU
  • 29. Fixing it • Caching! • Cache fully constructed pages – For a few minutes – Only for highly trafficked pages (Homepage) • Cache PHP serialized Product objects – ~20 million objects – Memcache • Yeah! But… – Add caching => add complexity
  • 30. Caching: Complications (1) • “Caching fully constructed pages” • But parts of pages still need to be dynamic • Example: Logged-in user’s name • Impossible to do effective bucket testing • Or at least makes it prohibitively complex
  • 31. Caching: Complications (2) • “Caching PHP serialized Product objects” • Without caching: getProductInfo() → Fetch from CMS • With caching, cache hit: getProductInfo() → Fetch from Cache • With caching, cache miss: getProductInfo() → Fetch from Cache → Fetch from CMS → Set in Cache
  • 32. Caching: Complications (3) • TTL: ∞ (i.e. no invalidation) • Pro-actively repopulate products in the cache – Receive “notifications” about product updates • Notification Server — pushes notifications raised by CMS • Use a persistent, distributed cache – Memcache => Membase, Couchbase
  • 33. Learning from it • Caching is a powerful tool for performance optimization • Caching adds complexities – Reduced by keeping cache close to data source – Think deeply about TTL, invalidation • Use caching to go from “acceptable performance” to “awesome performance” – Don’t rely on it to get to “acceptable performance”
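The cache flows from slide 31 and the notification-driven repopulation from slide 32 can be sketched together. This is an illustrative assumption, not the production design: `ProductCache`, `fetch_from_cms` and `on_product_updated` are hypothetical names, and a plain dict stands in for Memcache or Membase.

```python
class ProductCache:
    """Cache-aside reads plus proactive repopulation on update notifications."""

    def __init__(self, fetch_from_cms):
        self._fetch_from_cms = fetch_from_cms  # hypothetical CMS lookup
        self._cache = {}  # stand-in for a distributed cache

    def get_product_info(self, product_id):
        # Cache hit: serve from the cache.
        if product_id in self._cache:
            return self._cache[product_id]
        # Cache miss: fetch from the CMS, then set in the cache.
        info = self._fetch_from_cms(product_id)
        self._cache[product_id] = info
        return info

    def on_product_updated(self, product_id):
        # No TTL: entries change only when the CMS says the product changed.
        self._cache[product_id] = self._fetch_from_cms(product_id)
```

The design choice to note is that correctness now depends on notifications being delivered: an entry stays stale until `on_product_updated` runs for it.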
  • 36. RCA • Why? – Search-service is slow (or Reviews-service is slow or Recommendations-service is slow) • But why is rest of website slow? – Requests to the slow service are blocking processing threads • Eh?!
  • 37. Let’s do some math • Let’s say – Mean (or median) response time: 100 ms – 8-core server – All requests are CPU bound • Throughput: 80 requests per second (rps) • Let’s also say – 95th Percentile response time: 1000 ms • Call them “bad requests” • 4 bad requests in a second – Throughput down to 44 rps • 8 bad requests in a second? – Throughput down to 8 rps
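The arithmetic on slide 37 can be reproduced with a short function. It is a sketch of the slide's simplified model only (CPU-bound requests, one request per core at a time); the function name and parameters are hypothetical.

```python
def throughput_rps(cores, normal_ms, bad_per_second, bad_ms=1000):
    """Requests completed per second under the slide's simplifying assumptions."""
    # Bad requests that can actually finish in one second on these cores.
    bad_done = min(bad_per_second, cores * 1000 / bad_ms)
    # Each bad request keeps a core busy for bad_ms out of every 1000 ms.
    busy_cores = bad_done * bad_ms / 1000
    free_cores = cores - busy_cores
    # The remaining cores serve normal requests back to back.
    normal_done = free_cores * (1000 / normal_ms)
    return normal_done + bad_done
```

With 8 cores and 100 ms requests this gives 80 rps with no bad requests, 44 rps with 4 bad requests per second (4 free cores × 10 rps, plus the 4 bad requests), and 8 rps with 8 bad requests per second, matching the slide.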
  • 38. Fixing it • Aggressive timeouts for all service calls – Isolate impact of a slow service • only to pages that depend on it • Very aggressive timeouts for non-critical services – Example: Recommendations • On a Product page, Search results page etc. • Not on My Recommendations page • Load non-critical parts of pages through AJAX
  • 39. Learning from it • Isolate the impact of poorly performing services / systems • Isolate the required from the good-to-have
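The aggressive timeouts from slide 38 can be sketched with a thread pool and a fallback value. This is an assumption-laden illustration: `call_with_timeout`, `slow_recommendations` and the pool size are hypothetical, and the slides do not say how the timeouts were actually implemented.

```python
import concurrent.futures
import time

# Shared worker pool for outbound service calls (size is an arbitrary choice).
_pool = concurrent.futures.ThreadPoolExecutor(max_workers=8)


def call_with_timeout(service_call, timeout_seconds, fallback):
    """Run service_call, but give up and return fallback after timeout_seconds."""
    future = _pool.submit(service_call)
    try:
        return future.result(timeout=timeout_seconds)
    except concurrent.futures.TimeoutError:
        # The page renders without this section. The slow call keeps
        # running in its worker thread and its result is discarded.
        return fallback


def slow_recommendations():
    """Hypothetical stand-in for a slow, non-critical service."""
    time.sleep(0.3)
    return ["item-1", "item-2"]
```

A product page could call `call_with_timeout(slow_recommendations, 0.05, [])` and render an empty recommendations block instead of stalling. Note that this only frees the caller: the worker thread stays busy until the slow call returns, so a real client would also set timeouts on the underlying connection.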
  • 41. RCA • Why? – Load average of web servers has spiked • Why? – Requests per second has spiked • From 1000 rps to 1500 rps • Why? – Large number of notifications of product information updates
  • 42. Fixing it • Separate cluster for receiving product info update notifications from the cluster that serves users • Admission control: Don’t let a system receive more requests than it can handle – Throttling • Batch the notifications
  • 43. Learning from it • Isolate the systems serving internal requests from those serving production traffic • Admission control to ensure that a system is isolated from the over-enthusiasm of a client • Look at the granularity at which we’re working
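The admission control and throttling mentioned on slides 42 and 43 can be illustrated with a token bucket. The slides only say "throttling"; the token bucket is a technique swapped in here for illustration, and `TokenBucket`, `rate` and `capacity` are hypothetical names.

```python
import time


class TokenBucket:
    """Admit at most `rate` requests per second, with bursts up to `capacity`."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self._rate = rate
        self._capacity = capacity
        self._clock = clock
        self._tokens = float(capacity)
        self._last = clock()

    def try_admit(self) -> bool:
        now = self._clock()
        # Refill in proportion to elapsed time, never above capacity.
        elapsed = now - self._last
        self._tokens = min(self._capacity, self._tokens + elapsed * self._rate)
        self._last = now
        if self._tokens >= 1:
            self._tokens -= 1
            return True
        return False  # the caller should reject or queue the request
```

A notification endpoint guarded this way turns away excess product-update notifications instead of letting them push the web servers past what they can handle.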
  • 47. Mistake? • Sub-optimal decision – Not all information/scenarios considered – Insufficient information – Built for a different scenario • Due to focus on “functional” aspects • A mistake is a mistake – … in retrospect

Editor's notes

  1. “This has basically given us lots of opportunities to make mistakes. And make mistakes we did.”
  2. Website Architecture diagram goes here
  3. No