Thank you for attending the ArchSummit Global Architect Summit!
Official conference website and materials download: www.archsummit.com
Scaling Pinterest




                          Marty Weiner                                   Evrhet Milam
                          Krypton                                        Batcave




Friday, August 10, 2012
Pinterest is . . .
          An online pinboard to organize and share what inspires you.



  Scaling Pinterest

Relationships




                      Marty Weiner
                      Grayskull, Eternia




                                           Yashh Nelapati
                                           Gotham City
Page Views / Day
             ·   RackSpace
             ·   1 small Web Engine
             ·   1 small MySQL DB
             ·   1 Engineer
(Chart; x-axis: Mar 2010, Jan 2011, Jan 2012, May 2012)




Page Views / Day
             ·   Amazon EC2 + S3 + CloudFront
             ·   1 NGinX, 4 Web Engines
             ·   1 MySQL DB + 1 Read Slave
             ·   1 Task Queue + 2 Task Processors
             ·   1 MongoDB
             ·   2 Engineers
(Chart; x-axis: Mar 2010, Jan 2011, Jan 2012, May 2012)




Page Views / Day
             ·   Amazon EC2 + S3 + CloudFront
             ·   2 NGinX, 16 Web Engines + 2 API Engines
             ·   5 Functionally Sharded MySQL DB + 9 read slaves
             ·   4 Cassandra Nodes
             ·   15 Membase Nodes (3 separate clusters)
             ·   8 Memcache Nodes
             ·   10 Redis Nodes
             ·   3 Task Routers + 4 Task Processors
             ·   4 Elastic Search Nodes
             ·   3 Mongo Clusters
             ·   3 Engineers
(Chart; x-axis: Mar 2010, Jan 2011, Jan 2012, May 2012)
Lesson Learned #1
                      It will fail. Keep it simple.




Page Views / Day
             ·   Amazon EC2 + S3 + Akamai, ELB
             ·   90 Web Engines + 50 API Engines
             ·   66 MySQL DBs (m1.xlarge) + 1 slave each
             ·   59 Redis Instances
             ·   51 Memcache Instances
             ·   1 Redis Task Manager + 25 Task Processors
             ·   Sharded Solr
             ·   6 Engineers
(Chart; x-axis: Mar 2010, Jan 2011, Jan 2012, May 2012)



Page Views / Day
             ·   Amazon EC2 + S3 + Edge Cast, ELB
             ·   135 Web Engines + 75 API Engines
             ·   80 MySQL DBs (m1.xlarge) + 1 slave each
             ·   110 Redis Instances
             ·   60 Memcache Instances
             ·   2 Redis Task Managers + 60 Task Processors
             ·   Sharded Solr
             ·   25 Engineers
(Chart; x-axis: Mar 2010, Jan 2011, Jan 2012, May 2012)



Why Amazon EC2/S3?
             · Very good reliability, reporting, and support
             · Very good peripherals, such as managed
                  cache, DB, load balancing, DNS, map
                  reduce, and more...
             · New instances ready in seconds
             · Con: Limited choice
             · Pro: Limited choice

Why MySQL?
             ·    Extremely mature
             ·    Well known and well liked
             ·    Rarely catastrophic loss of data
             ·    Response time to request rate increases linearly
             ·    Very good software support - XtraBackup, Innotop,
                  Maatkit
             · Solid active community
             · Very good support from Percona
             · Free

Why Memcache?
             ·    Extremely mature
             ·    Very good performance
             ·    Well known and well liked
             ·    Never crashes, and few failure modes
             ·    Free




Why Redis?
             ·    Variety of convenient data structures
             ·    Has persistence and replication
             ·    Well known and well liked
             ·    Consistently good performance
             ·    Few failure modes
             ·    Free




Data structures -- list, set, sorted set, pubsub, string. All support atomic operations. All
support pipelining.
Clustering vs Sharding





clustered = automatically managed nodes with autoscaling, sharding = engineer assigns how
data is distributed, splits databases, and writes code to redistribute data (if that option is
supported)
Clustering
             ·   Data distributed automatically
             ·   Data can move
             ·   Rebalances to distribute capacity
             ·   Nodes communicate with each other
Clustered nodes gossip with each other to determine if rebalancing is needed.
Sharding
             ·   Data distributed manually
             ·   Data does not move
             ·   Split data to distribute load
             ·   Nodes are not aware of each other
Why Clustering?
             ·    Examples: Cassandra, MemBase, HBase, Riak
             ·    Automatically scale your datastore
             ·    Easy to set up
             ·    Spatially distribute and colocate your data
             ·    High availability
             ·    Load balancing
             ·    No single point of failure

What could possibly go wrong?




                                                          source: thereifixedit.com




Why Not Clustering?
                  ·     Still fairly young
                  ·     Fundamentally complicated
                  ·     Less community support
                  ·     Fewer engineers with working knowledge
                  ·     Difficult and scary upgrade mechanisms
                  ·     And, yes, there is a single point of failure. A
                        BIG one.


I lied: there is a major single point of failure, the cluster management system. What if it fails? When will it fail? How will you, a non-DB developer, fix it? Clustering is a pipe dream. Maybe
someday it will work, the same day there's a perfect makefile system, version control, and bug management.
Clustering Single Point of Failure


(Diagram: Cluster Management Algorithm)





Joke (without names) about how, when the cluster manager fails, load balancing may stop one
night and keep you up until 6am, or the cluster manager sprays bad data to all nodes and the
CEO of the cluster company contacts you to tell you your data is gone and ask whether he can
buy you a pizza.
Cluster Manager
             · Same complex code replicated over all nodes
             · Failure modes:
               · Data rebalance breaks
               · Data corruption across all nodes
               · Improper balancing that cannot be fixed (easily)
               · Data authority failure

Lesson Learned #2
                      Clustering is scary.




Why Sharding?
             ·    Can split your databases to add more capacity
             ·    Spatially distribute and colocate your data
             ·    High availability
             ·    Load balancing
             ·    Algorithm for placing data is very simple
             ·    ID generation is simplistic



ID management is a single point of failure too, but it is much simpler and easier to test and debug.
When to shard?
             · Sharding makes schema design harder

             · Solidify site design and backend architecture
             · Remove all joins and complex queries, add
                  cache
             · Functionally shard as much as possible
             · Still growing? Shard.


Sharding early means a faster transition but unnecessary complexity; sharding later means a
slower transition.
Our Transition
             1 DB + Foreign Keys + Joins
             1 DB + Denormalized + Cache
             1 DB + Read slaves + Cache
             Several functionally sharded DBs + Read slaves + Cache
             ID sharded DBs + Backup slaves + Cache


Watch out for...
             ·    Cannot perform most JOINS
             ·    No transaction capabilities
             ·    Extra effort to maintain unique constraints
             ·    Schema changes require more planning
             ·    Reports require running same query on all
                  shards
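The last point can be sketched as a fan-out: the same query runs against every shard and the application combines the results. This is a minimal sketch, assuming in-memory SQLite databases as stand-ins for the MySQL shards; the `pins` table layout and the helper names `make_shard` and `count_pins_everywhere` are hypothetical, not taken from the talk.

```python
import sqlite3


def make_shard(pin_count):
    """Create an in-memory SQLite DB standing in for one MySQL shard (assumption)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE pins (id INTEGER PRIMARY KEY, body TEXT)")
    conn.executemany(
        "INSERT INTO pins (body) VALUES (?)",
        [("pin-%d" % i,) for i in range(pin_count)],
    )
    return conn


def count_pins_everywhere(shards):
    """Run the same COUNT query on every shard and sum the results in the app."""
    total = 0
    for conn in shards:
        (count,) = conn.execute("SELECT COUNT(*) FROM pins").fetchone()
        total += count
    return total


# Three toy shards holding 3, 5, and 2 pins.
shards = [make_shard(n) for n in (3, 5, 2)]
total_pins = count_pins_everywhere(shards)
print(total_pins)
```

Because no shard knows about the others, any aggregate (counts, sums, top-N) has to be merged in application code like this.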


How we sharded




Sharded Server Topology




                      db00001        db00513               db03073              db03585
                      db00002        db00514               db03074              db03586
                        .......        .......               .......              .......
                      db00512        db01024               db03584              db04096


                      Initially, 8 physical servers, each with 512 DBs
High Availability




                      db00001          db00513            db03073              db03585
                      db00002          db00514            db03074              db03586
                        .......          .......            .......              .......
                      db00512          db01024            db03584              db04096


                                   Multi Master replication
Increased load on DB?

                         db00001                                   db00001
                         db00002                                   db00002
                           .......                                   .......
                         db00512                                   db00256

                                                                   db00257
                                                                   db00258
                                                                     .......
                                                                   db00512

                 To increase capacity, a server is replicated and the new replica
                 becomes responsible for some DBs
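The split above can be sketched as an update to a server-to-range lookup: once the replica holds a full copy, the old server keeps the lower half of its range and the replica takes the upper half. This is a minimal sketch; the function `split_server` and the server name `sharddb009a` are hypothetical, and the even split is an assumption (the slide only says the replica takes "some DBs").

```python
def split_server(lookup, old_name, new_name):
    """Hand the upper half of old_name's shard range to new_name.

    Returns a new lookup dict. The data is assumed to already exist on the
    replica, so only the routing changes and no rows are moved.
    """
    low, high = lookup[old_name]
    mid = (low + high) // 2
    updated = dict(lookup)
    updated[old_name] = (low, mid)
    updated[new_name] = (mid + 1, high)
    return updated


lookup = {"sharddb001a": (1, 512)}
lookup = split_server(lookup, "sharddb001a", "sharddb009a")
print(lookup)
```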
ID Structure
                                             64 bits
                      Shard ID        Type               Local ID
         · A lookup data structure maps each physical server to a
              shard ID range (cached by each app server process)
         · Shard ID denotes which shard
         · Type denotes object type (e.g., pins)
         · Local ID denotes position in table
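A minimal sketch of packing and unpacking such an ID. The deck states room for 65536 shards, which implies a 16-bit shard field; the 10-bit type field and 36-bit local ID field used here are assumptions for illustration (the slide gives no widths), as are the function names and the `PIN_TYPE` code.

```python
SHARD_BITS = 16   # 65536 shards, per the deck
TYPE_BITS = 10    # assumption: width not given on the slide
LOCAL_BITS = 36   # assumption: width not given on the slide

TYPE_SHIFT = LOCAL_BITS
SHARD_SHIFT = LOCAL_BITS + TYPE_BITS


def _check(name, value, bits):
    """Reject values that do not fit in the given number of bits."""
    if not 0 <= value < (1 << bits):
        raise ValueError("%s out of range: %r" % (name, value))


def compose_id(shard_id, type_id, local_id):
    """Pack shard ID, type, and local ID into one integer that fits in 64 bits."""
    _check("shard_id", shard_id, SHARD_BITS)
    _check("type_id", type_id, TYPE_BITS)
    _check("local_id", local_id, LOCAL_BITS)
    return (shard_id << SHARD_SHIFT) | (type_id << TYPE_SHIFT) | local_id


def decompose_id(full_id):
    """Recover (shard_id, type_id, local_id) from a packed ID."""
    shard_id = (full_id >> SHARD_SHIFT) & ((1 << SHARD_BITS) - 1)
    type_id = (full_id >> TYPE_SHIFT) & ((1 << TYPE_BITS) - 1)
    local_id = full_id & ((1 << LOCAL_BITS) - 1)
    return shard_id, type_id, local_id


PIN_TYPE = 1  # hypothetical type code
pin_id = compose_id(1025, PIN_TYPE, 7)
print(pin_id, decompose_id(pin_id))
```

Because the shard ID is embedded in the object ID, routing a request needs no extra lookup beyond the cached server-to-range structure.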
Lookup Structure

                                    {"sharddb001a": (   1,  512),
                                     "sharddb002b": ( 513, 1024),
                                     "sharddb003a": (1025, 1536),
                                      ...
                                     "sharddb008b": (3585, 4096)}

                      sharddb003a  >  DB01025  >  users

                      Tables in DB01025: users, user_has_boards, boards

                      Rows in users (local ID, body):
                                                        1   ser-data
                                                        2   ser-data
                                                        3   ser-data
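The lookup can be sketched as a range scan over that dictionary. This is a minimal sketch: the function names are hypothetical, the entries the slide elides with "..." are left out rather than guessed, and the `db%05d` naming is inferred from the database names shown on the topology slide.

```python
LOOKUP = {
    "sharddb001a": (1, 512),
    "sharddb002b": (513, 1024),
    "sharddb003a": (1025, 1536),
    # ... entries elided on the slide are not reconstructed here ...
    "sharddb008b": (3585, 4096),
}


def server_for_shard(shard_id, lookup=LOOKUP):
    """Return the physical server whose range contains shard_id."""
    for server, (low, high) in lookup.items():
        if low <= shard_id <= high:
            return server
    raise KeyError("no server configured for shard %d" % shard_id)


def database_name(shard_id):
    """Build the per-shard database name, e.g. shard 1025 -> db01025."""
    return "db%05d" % shard_id


print(server_for_shard(1025), database_name(1025))
```

With only a handful of servers a linear scan is enough; each app server process can keep this dictionary cached in memory, as the ID Structure slide notes.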



ID Structure
         · New users are randomly distributed across
              shards
         · Boards, pins, etc. try to be collocated with user
         · Local IDs are assigned by auto-increment
         · Enough ID space for 65536 shards, but only
              first 4096 opened initially. Can expand
              horizontally.
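These placement rules can be sketched as follows. This is a minimal sketch: the function and class names are hypothetical, and the in-memory counter only stands in for a MySQL AUTO_INCREMENT column.

```python
import random

OPEN_SHARDS = range(1, 4097)  # first 4096 shards opened, per the deck


def pick_shard_for_new_user(rng=random):
    """New users land on a uniformly random open shard."""
    return rng.choice(OPEN_SHARDS)


def shard_for_new_board(owner_shard_id):
    """Boards (and pins, etc.) try to live on the same shard as their owner."""
    return owner_shard_id


class LocalIdCounter:
    """Stand-in for a per-table AUTO_INCREMENT column on one shard."""

    def __init__(self):
        self._next = 1

    def allocate(self):
        value = self._next
        self._next += 1
        return value


user_shard = pick_shard_for_new_user()
board_shard = shard_for_new_board(user_shard)
boards_counter = LocalIdCounter()
print(user_shard, board_shard, boards_counter.allocate())
```

Opening more shards later only means widening `OPEN_SHARDS`; existing IDs keep their shard, so no data has to move.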


Objects and Mappings
             · Object tables (e.g., pin, board, user, comment)
               · Local ID → MySQL blob (JSON / Serialized thrift)
             · Mapping tables (e.g., user has boards, pin has likes)
               · Full ID → Full ID (+ timestamp)
               · Naming schema is noun_verb_noun
             ·    Queries are PK or index lookups (no joins)
             ·    Data DOES NOT MOVE
             ·    All tables exist on all shards
             ·    No schema changes required (index = new table)
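The two table shapes can be sketched like this. It is a minimal sketch, assuming an in-memory SQLite database as a stand-in for one MySQL shard and JSON as the blob format; the column names, the shard and type constants, and the 10/36-bit field widths are assumptions for illustration.

```python
import json
import sqlite3
import time

SHARD_ID = 1025   # hypothetical shard holding this database
BOARD_TYPE = 2    # hypothetical type code for boards
LOCAL_BITS = 36   # assumed width of the local ID field
TYPE_BITS = 10    # assumed width of the type field


def full_id(local_id):
    """Combine shard, type, and local ID into a full board ID."""
    return (SHARD_ID << (LOCAL_BITS + TYPE_BITS)) | (BOARD_TYPE << LOCAL_BITS) | local_id


def local_id_of(full):
    """Extract the local ID (row position) from a full ID."""
    return full & ((1 << LOCAL_BITS) - 1)


conn = sqlite3.connect(":memory:")
# Object table: local ID -> serialized body.
conn.execute("CREATE TABLE boards (id INTEGER PRIMARY KEY AUTOINCREMENT, body BLOB)")
# Mapping table, named noun_verb_noun: full ID -> full ID (+ timestamp).
conn.execute(
    "CREATE TABLE user_has_boards ("
    " user_id INTEGER, board_id INTEGER, created_at INTEGER,"
    " PRIMARY KEY (user_id, board_id))"
)


def add_board(user_id, fields):
    """Insert the board object, then record the user -> board mapping."""
    body = json.dumps(fields).encode("utf-8")
    cur = conn.execute("INSERT INTO boards (body) VALUES (?)", (body,))
    board_id = full_id(cur.lastrowid)
    conn.execute(
        "INSERT INTO user_has_boards VALUES (?, ?, ?)",
        (user_id, board_id, int(time.time())),
    )
    return board_id


def boards_for_user(user_id):
    """Two primary-key/index lookups instead of a join."""
    rows = conn.execute(
        "SELECT board_id FROM user_has_boards WHERE user_id=? ORDER BY board_id",
        (user_id,),
    ).fetchall()
    bodies = []
    for (board_id,) in rows:
        (body,) = conn.execute(
            "SELECT body FROM boards WHERE id=?", (local_id_of(board_id),)
        ).fetchone()
        bodies.append(json.loads(body))
    return bodies


first_board = add_board(42, {"name": "Recipes"})
add_board(42, {"name": "Travel"})
print(boards_for_user(42))
```

Adding a field to a board only changes the serialized body, which is why the slide can claim that no schema changes are required.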
Loading a Page
    · Rendering user profile
         SELECT body FROM users WHERE id=<local_user_id>
         SELECT board_id FROM user_has_boards WHERE user_id=<user_id>
         SELECT body FROM boards WHERE id IN (<board_ids>)
         SELECT pin_id FROM board_has_pins WHERE board_id=<board_id>
         SELECT body FROM pins WHERE id IN (<pin_ids>)
    · Most of these calls will be a cache hit
    · Omitting offset/limits and mapping sequence id sort
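The cache-then-database pattern behind "most of these calls will be a cache hit" can be sketched as below. It is a minimal sketch covering only the first three queries, assuming an in-memory SQLite database as a stand-in for a MySQL shard and a plain dict as a stand-in for memcache; the cache keys and helper names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, body TEXT);
    CREATE TABLE user_has_boards (user_id INTEGER, board_id INTEGER);
    CREATE TABLE boards (id INTEGER PRIMARY KEY, body TEXT);
    INSERT INTO users VALUES (1, 'user-1');
    INSERT INTO boards VALUES (10, 'board-10');
    INSERT INTO boards VALUES (11, 'board-11');
    INSERT INTO user_has_boards VALUES (1, 10);
    INSERT INTO user_has_boards VALUES (1, 11);
""")

cache = {}        # stand-in for memcache (assumption)
db_queries = []   # records which queries actually reached the database


def cached_query(key, sql, params=()):
    """Return cached rows if present; otherwise query the DB and fill the cache."""
    if key in cache:
        return cache[key]
    db_queries.append(sql)
    rows = conn.execute(sql, params).fetchall()
    cache[key] = rows
    return rows


def render_profile(user_id):
    """Fetch the user body, the user's board IDs, then the board bodies."""
    user = cached_query(
        ("user", user_id), "SELECT body FROM users WHERE id=?", (user_id,)
    )
    board_rows = cached_query(
        ("user_has_boards", user_id),
        "SELECT board_id FROM user_has_boards WHERE user_id=? ORDER BY board_id",
        (user_id,),
    )
    board_ids = [row[0] for row in board_rows]
    placeholders = ",".join("?" * len(board_ids))
    boards = cached_query(
        ("boards", tuple(board_ids)),
        "SELECT body FROM boards WHERE id IN (%s) ORDER BY id" % placeholders,
        board_ids,
    )
    return user[0][0], [row[0] for row in boards]


first = render_profile(1)
queries_after_first = len(db_queries)
second = render_profile(1)   # served entirely from the cache
print(first, queries_after_first, len(db_queries))
```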

Scripting
         · Must get old data into your shiny new shard
         · 500M pins, 1.6B follower rows, etc
         · Build a scripting farm
           · Spawn more workers and complete the task
                  faster
         · Pyres - based on Github’s Resque queue
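The "spawn more workers" idea can be sketched with a worker pool that processes batches in parallel. This is a minimal sketch using Python's standard `concurrent.futures` as a stand-in for a queue such as Pyres; it does not use the Pyres API, and `migrate_batch` and `run_farm` are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor


def migrate_batch(batch):
    """Hypothetical job: copy one batch of rows into the new shards.

    Here it only pretends to copy and reports how many rows it handled.
    """
    return len(batch)


def run_farm(rows, batch_size, workers):
    """Split rows into batches and process them on a pool of workers."""
    batches = [rows[i:i + batch_size] for i in range(0, len(rows), batch_size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(migrate_batch, batches))


migrated = run_farm(list(range(1000)), batch_size=100, workers=4)
print(migrated)
```

Raising `workers` shortens the wall-clock time as long as the source and destination databases can absorb the extra load.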



Future
         · Sharded MySQL is here to stay
         · Auto-sharding on top of MySQL becoming
              viable
         · Clustering may become hardened in 5 to 10
              years




In The Works
                · Service Based Architecture
                  · Connection limits
                  · Isolation of functionality
                  · Isolation of access (security)
                · Scaling the Team




Connection limits + Isolation of functionality = service oriented architecture
Lesson Learned #3
                          Keep it fun.




We are Hiring!
                                 jobs@pinterest.com




Questions?

                      marty@pinterest.com              yashh@pinterest.com

                                     evrhet@pinterest.com


Hangzhou · October 25-27, 2012
Official conference website: www.qconhangzhou.com

Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 

Pinterest ArchSummit, August 2012: Scaling Pinterest

  • 1. Thank you for attending the ArchSummit Global Architect Summit! Official conference website and materials download: www.archsummit.com
  • 2. Scaling Pinterest · Marty Weiner (Krypton) · Evrhet Milam (Batcave) · Friday, August 10, 2012. Speaker notes (TODO): title page names; pass on page title consistency; fill out numbers; put "always test in production" pin in screenshot of website.
  • 3. Pinterest is . . . An online pinboard to organize and share what inspires you. (Slide footer: Scaling Pinterest)
  • 7. Relationships · Marty Weiner (Grayskull, Eternia) · Yashh Nelapati (Gotham City)
  • 8. Page Views / Day (chart; x-axis: Mar 2010, Jan 2011, Jan 2012, May 2012) · RackSpace · 1 small Web Engine · 1 small MySQL DB · 1 Engineer. Speaker note: 4-5 mts.
  • 9. Page Views / Day (chart; x-axis: Mar 2010, Jan 2011, Jan 2012, May 2012) · Amazon EC2 + S3 + CloudFront · 1 Nginx, 4 Web Engines · 1 MySQL DB + 1 Read Slave · 1 Task Queue + 2 Task Processors · 1 MongoDB · 2 Engineers. Speaker note: TODO: show total somewhere.
  • 10. Page Views / Day (chart; x-axis: Mar 2010, Jan 2011, Jan 2012, May 2012) · Amazon EC2 + S3 + CloudFront · 2 Nginx, 16 Web Engines + 2 API Engines · 5 Functionally Sharded MySQL DBs + 9 read slaves · 4 Cassandra Nodes · 15 Membase Nodes (3 separate clusters) · 8 Memcache Nodes · 10 Redis Nodes · 3 Task Routers + 4 Task Processors · 4 Elastic Search Nodes · 3 Mongo Clusters · 3 Engineers
  • 11. Lesson Learned #1: It will fail. Keep it simple.
  • 12. Page Views / Day (chart; x-axis: Mar 2010, Jan 2011, Jan 2012, May 2012) · Amazon EC2 + S3 + Akamai, ELB · 90 Web Engines + 50 API Engines · 66 MySQL DBs (m1.xlarge) + 1 slave each · 59 Redis Instances · 51 Memcache Instances · 1 Redis Task Manager + 25 Task Processors · Sharded Solr · 6 Engineers
  • 13. Page Views / Day (chart; x-axis: Mar 2010, Jan 2011, Jan 2012, May 2012) · Amazon EC2 + S3 + EdgeCast, ELB · 135 Web Engines + 75 API Engines · 80 MySQL DBs (m1.xlarge) + 1 slave each · 110 Redis Instances · 60 Memcache Instances · 2 Redis Task Managers + 60 Task Processors · Sharded Solr · 25 Engineers
  • 14. Why Amazon EC2/S3? · Very good reliability, reporting, and support · Very good peripherals, such as managed cache, DB, load balancing, DNS, map reduce, and more · New instances ready in seconds · Con: Limited choice · Pro: Limited choice
  • 15. Why MySQL? · Extremely mature · Well known and well liked · Rarely catastrophic loss of data · Response time to request rate increases linearly · Very good software support: XtraBackup, Innotop, Maatkit · Solid active community · Very good support from Percona · Free. Speaker note: TODO: animate in a money bag.
  • 16. Why Memcache? · Extremely mature · Very good performance · Well known and well liked · Never crashes, and few failure modes · Free
  • 17. Why Redis? · Variety of convenient data structures · Has persistence and replication · Well known and well liked · Consistently good performance · Few failure modes · Free. Speaker note: data structures are list, set, sorted set, pubsub, and string. All support atomic operations. All support pipelining.
  • 18. Clustering vs. Sharding. Speaker note: clustered = automatically managed nodes with autoscaling; sharded = an engineer decides how data is distributed, splits databases, and writes code to redistribute data (if that option is supported).
  • 19. Clustering (first column of the Clustering vs. Sharding comparison) · Data distributed automatically · Data can move · Rebalances to distribute capacity · Nodes communicate with each other. Speaker note: clustered nodes gossip with each other to determine whether rebalancing is needed.
  • 20. Sharding (second column of the Clustering vs. Sharding comparison) · Data distributed manually · Data does not move · Split data to distribute load · Nodes are not aware of each other
  • 21. Why Clustering? · Examples: Cassandra, Membase, HBase, Riak · Automatically scale your datastore · Easy to set up · Spatially distribute and colocate your data · High availability · Load balancing · No single point of failure. Speaker note: What could go wrong?
  • 22. What could possibly go wrong? (Image source: thereifixedit.com)
  • 23. Why Not Clustering? · Still fairly young · Fundamentally complicated · Less community support · Fewer engineers with working knowledge · Difficult and scary upgrade mechanisms · And, yes, there is a single point of failure. A BIG one. Speaker note: I lied, there is a major single point of failure: the cluster management system. What if it fails? When will it fail? How will you, a non-DB developer, fix it? Clustering is a pipe dream. Maybe someday it will work, on the same day there is a perfect makefile system, version control, and bug management.
  • 24. Clustering Single Point of Failure: the Cluster Management Algorithm. Speaker note: joke (without names) about how, when the cluster manager fails, load balancing may stop one night and keep you up until 6am, or the cluster manager sprays bad data to all nodes and the CEO of the cluster company contacts you to tell you your data is gone and to ask whether he can buy you a pizza.
  • 25. Cluster Manager · Same complex code replicated over all nodes · Failure modes: data rebalance breaks; data corruption across all nodes; improper balancing that cannot be fixed (easily); data authority failure
  • 26. Lesson Learned #2: Clustering is scary. Speaker note: swap speakers after this slide.
  • 27. Why Sharding? · Can split your databases to add more capacity · Spatially distribute and colocate your data · High availability · Load balancing · Algorithm for placing data is very simple · ID generation is simplistic. Speaker note: ID management is a single point of failure too, but it is much simpler and easier to test and debug.
  • 28. When to shard? · Sharding makes schema design harder · Solidify site design and backend architecture · Remove all joins and complex queries, add cache · Functionally shard as much as possible · Still growing? Shard. Speaker note: maybe a pictograph here showing that sharding early means a faster transition but unnecessary complexity, and sharding later means a slower transition.
  • 29. Our Transition · 1 DB + Foreign Keys + Joins · 1 DB + Denormalized + Cache · 1 DB + Read slaves + Cache · Several functionally sharded DBs + Read slaves + Cache · ID sharded DBs + Backup slaves + Cache. Speaker note: another possible splitting point.
  • 30. Watch out for... · Cannot perform most JOINs · No transaction capabilities · Extra effort to maintain unique constraints · Schema changes require more planning · Reports require running the same query on all shards
  • 31. How we sharded
  • 32. Sharded Server Topology · Initially, 8 physical servers, each with 512 DBs · Diagram columns (one per server shown): db00001, db00002, ..., db00512 | db00513, db00514, ..., db01024 | db03072, db03073, ..., db03583 | db03584, db03585, ..., db04096
  • 33. High Availability · Multi-master replication · Diagram columns (one per server shown): db00001, db00002, ..., db00512 | db00513, db00514, ..., db01024 | db03072, db03073, ..., db03583 | db03584, db03585, ..., db04096
  • 34. Increased load on DB? · To increase capacity, a server is replicated and the new replica becomes responsible for some DBs · Diagram, before: one server holding db00001, db00002, ..., db00512 · Diagram, after: one server holding db00001, db00002, ..., db00256 and a second holding db00257, db00258, ..., db00512
  • 35. ID Structure · 64 bits, laid out as Shard ID | Type | Local ID · A lookup data structure maps each physical server to a shard ID range (cached by each app server process) · Shard ID denotes which shard · Type denotes object type (e.g., pins) · Local ID denotes position in table
  • 36. Lookup Structure · {"sharddb001a": (1, 512), "sharddb002b": (513, 1024), "sharddb003a": (1025, 1536), ... "sharddb008b": (3585, 4096)} · Example: server sharddb003a holds DB01025, which contains the tables users, user_has_boards, and boards · The users table is shown with rows (1, ser-data), (2, ser-data), (3, ser-data)
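
The lookup structure on this slide can be sketched as a range search. This is only an illustration under stated assumptions: the function names (`build_index`, `host_for_shard`, `split_server`, `database_name`) and the host name `sharddb009a` are hypothetical, and the hosts elided by "..." on the slide are left out rather than guessed. `split_server` models the capacity increase from the "Increased load on DB?" slide, where a replica takes over part of a server's shard range.

```python
from bisect import bisect_right

# Shard-ID ranges per physical server, copied from the slide. The hosts
# hidden behind "..." on the slide are intentionally not filled in.
SHARD_RANGES = {
    "sharddb001a": (1, 512),
    "sharddb002b": (513, 1024),
    "sharddb003a": (1025, 1536),
    "sharddb008b": (3585, 4096),
}


def build_index(ranges):
    """Sort ranges by first shard ID so lookups can use binary search."""
    rows = sorted((lo, hi, host) for host, (lo, hi) in ranges.items())
    starts = [lo for lo, _, _ in rows]
    return starts, rows


def host_for_shard(shard_id, index):
    """Return the host whose range contains shard_id, or raise KeyError."""
    starts, rows = index
    pos = bisect_right(starts, shard_id) - 1
    if pos >= 0:
        lo, hi, host = rows[pos]
        if lo <= shard_id <= hi:
            return host
    raise KeyError(f"no server configured for shard {shard_id}")


def split_server(ranges, host, new_host, first_moved):
    """Return a new mapping in which new_host owns shards first_moved..hi."""
    lo, hi = ranges[host]
    if not lo < first_moved <= hi:
        raise ValueError("split point must fall inside the host's range")
    updated = dict(ranges)
    updated[host] = (lo, first_moved - 1)
    updated[new_host] = (first_moved, hi)
    return updated


def database_name(shard_id):
    """Zero-padded database name in the style shown on the slide (DB01025)."""
    return f"DB{shard_id:05d}"


index = build_index(SHARD_RANGES)
print(host_for_shard(1025, index), database_name(1025))

# Hypothetical split: a replica named sharddb009a takes shards 257..512.
after_split = split_server(SHARD_RANGES, "sharddb001a", "sharddb009a", 257)
print(host_for_shard(300, build_index(after_split)))
```

Only the mapping changes when a server is split; the shard IDs embedded in object IDs stay the same, which is consistent with the slides' point that data does not move between shards.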
  • 37. ID Structure · New users are randomly distributed across shards · Boards, pins, etc. try to be colocated with the user · Local IDs are assigned by auto-increment · Enough ID space for 65536 shards, but only the first 4096 opened initially. Can expand horizontally.
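
A 64-bit ID of this shape can be packed and unpacked with shifts and masks. The sketch below is not taken from the slides beyond two facts: the ID is 64 bits, and there is room for 65536 shards, which implies a 16-bit shard field. The 10-bit type field, the 36-bit local ID field, the two unused high bits, and the `PIN_TYPE` code are assumptions chosen for illustration.

```python
SHARD_BITS = 16  # from the slides: enough ID space for 65536 shards
TYPE_BITS = 10   # assumption: field width is not given in the slides
LOCAL_BITS = 36  # assumption: field width is not given in the slides
# 16 + 10 + 36 = 62, so the top two bits of the 64-bit ID stay zero here.

PIN_TYPE = 1     # hypothetical type code for pins


def make_id(shard_id, type_id, local_id):
    """Pack (shard, type, local) into one integer, checking field widths."""
    fields = (
        ("shard_id", shard_id, SHARD_BITS),
        ("type_id", type_id, TYPE_BITS),
        ("local_id", local_id, LOCAL_BITS),
    )
    for name, value, bits in fields:
        if not 0 <= value < (1 << bits):
            raise ValueError(f"{name}={value} does not fit in {bits} bits")
    return (shard_id << (TYPE_BITS + LOCAL_BITS)) | (type_id << LOCAL_BITS) | local_id


def split_id(full_id):
    """Unpack an ID produced by make_id into (shard_id, type_id, local_id)."""
    local_id = full_id & ((1 << LOCAL_BITS) - 1)
    type_id = (full_id >> LOCAL_BITS) & ((1 << TYPE_BITS) - 1)
    shard_id = (full_id >> (TYPE_BITS + LOCAL_BITS)) & ((1 << SHARD_BITS) - 1)
    return shard_id, type_id, local_id


pin_id = make_id(shard_id=1025, type_id=PIN_TYPE, local_id=7)
print(pin_id, split_id(pin_id))
```

Because the shard ID is embedded in every ID, an app server can route a request from the ID alone, without a central lookup per object.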
  • 38. Objects and Mappings · Object tables (e.g., pin, board, user, comment): Local ID → MySQL blob (JSON / serialized Thrift) · Mapping tables (e.g., user has boards, pin has likes): Full ID → Full ID (+ timestamp) · Naming schema is noun_verb_noun · Queries are PK or index lookups (no joins) · Data DOES NOT MOVE · All tables exist on all shards · No schema changes required (index = new table)
  • 39. Loading a Page · Rendering user profile: SELECT body FROM users WHERE id=<local_user_id>; SELECT board_id FROM user_has_boards WHERE user_id=<user_id>; SELECT body FROM boards WHERE id IN (<board_ids>); SELECT pin_id FROM board_has_pins WHERE board_id=<board_id>; SELECT body FROM pins WHERE id IN (<pin_ids>) · Most of these calls will be a cache hit · Offsets/limits and the mapping sequence ID sort are omitted here
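
The first three queries on this slide can be tried end to end with a small sketch. It makes several simplifying assumptions: Python's built-in `sqlite3` stands in for MySQL, a single in-memory database stands in for one shard, local IDs are used in the mapping table where the slides use full IDs, and the cache layer is not modeled. The helper names (`open_shard`, `load_profile`) and the sample rows are hypothetical.

```python
import json
import sqlite3


def open_shard():
    """Create one in-memory stand-in shard with object and mapping tables."""
    db = sqlite3.connect(":memory:")
    db.executescript(
        """
        CREATE TABLE users (id INTEGER PRIMARY KEY, body TEXT);
        CREATE TABLE boards (id INTEGER PRIMARY KEY, body TEXT);
        CREATE TABLE user_has_boards (
            user_id INTEGER,
            board_id INTEGER,
            PRIMARY KEY (user_id, board_id)
        );
        """
    )
    return db


def load_profile(db, local_user_id):
    """Fetch a user and their boards using key lookups only (no joins)."""
    row = db.execute(
        "SELECT body FROM users WHERE id = ?", (local_user_id,)
    ).fetchone()
    if row is None:
        return None
    user = json.loads(row[0])

    # Mapping-table lookup: which boards belong to this user?
    board_ids = [
        r[0]
        for r in db.execute(
            "SELECT board_id FROM user_has_boards WHERE user_id = ? ORDER BY board_id",
            (local_user_id,),
        )
    ]

    boards = []
    if board_ids:
        marks = ",".join("?" * len(board_ids))  # one placeholder per ID
        rows = db.execute(
            f"SELECT body FROM boards WHERE id IN ({marks}) ORDER BY id", board_ids
        )
        boards = [json.loads(r[0]) for r in rows]
    return {"user": user, "boards": boards}


shard = open_shard()
shard.execute("INSERT INTO users VALUES (?, ?)", (1, json.dumps({"name": "example_user"})))
shard.executemany(
    "INSERT INTO boards VALUES (?, ?)",
    [(10, json.dumps({"title": "Recipes"})), (11, json.dumps({"title": "Travel"}))],
)
shard.executemany("INSERT INTO user_has_boards VALUES (?, ?)", [(1, 10), (1, 11)])
print(load_profile(shard, 1))
```

Each step is a primary-key or index lookup whose result feeds the next query, so the application performs the "join" itself and every step can be served from a cache independently.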
  • 40. Scripting · Must get old data into your shiny new shard · 500M pins, 1.6B follower rows, etc. · Build a scripting farm · Spawn more workers and complete the task faster · Pyres, based on GitHub's Resque queue
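
The "spawn more workers" idea can be illustrated with a local worker pool that copies rows in ID-range chunks. This sketch does not use Pyres or Resque: `concurrent.futures.ThreadPoolExecutor` stands in for the job queue, plain dictionaries stand in for the old database and the new shard, and all function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def chunk_ranges(first_id, last_id, chunk_size):
    """Yield inclusive (start, end) ID ranges covering first_id..last_id."""
    start = first_id
    while start <= last_id:
        end = min(start + chunk_size - 1, last_id)
        yield start, end
        start = end + 1


def migrate_chunk(source, target, id_range):
    """Copy every row whose ID falls in id_range; return the rows copied."""
    start, end = id_range
    copied = 0
    for row_id in range(start, end + 1):
        if row_id in source:
            target[row_id] = source[row_id]
            copied += 1
    return copied


def run_farm(source, target, first_id, last_id, chunk_size=100, workers=4):
    """Fan the chunks out to a worker pool and return the total rows copied."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        counts = pool.map(
            lambda id_range: migrate_chunk(source, target, id_range),
            chunk_ranges(first_id, last_id, chunk_size),
        )
        return sum(counts)


old_db = {i: f"pin-{i}" for i in range(1, 1001)}  # stand-in for the old table
new_shard = {}                                    # stand-in for the new shard
print(run_farm(old_db, new_shard, 1, 1000, chunk_size=64, workers=8))
```

Chunks cover disjoint ID ranges, so workers never write the same row and a failed chunk can be retried on its own.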
  • 41. Future · Sharded MySQL is here to stay · Auto-sharding on top of MySQL becoming viable · Clustering may become hardened in 5 to 10 years
  • 42. In The Works · Service Based Architecture · Connection limits · Isolation of functionality · Isolation of access (security) · Scaling the Team. Speaker note: connection limits + isolation of functionality = service-oriented architecture.
  • 43. Lesson Learned #3: Keep it fun.
  • 44. We are Hiring! jobs@pinterest.com
  • 45. Questions? marty@pinterest.com · yashh@pinterest.com · evrhet@pinterest.com
  • 46. Hangzhou edition · October 25-27, 2012 · Conference website: www.qconhangzhou.com