Seminar given as part of the CANS master's program at the Facultat d'Informàtica de Barcelona.
Anatomy of a Web application
Too many writes to the database — what can I do?
How can I take advantage of the "Cloud"?
Optimizing Facebook applications
EEDC 2010. Scaling Web Applications
1. 6.1. Web Scale 34330 EEDC Execution Environments for Distributed Computing 6.1.1. Anatomy of a service 6.1.2. Too many Writes to Database 6.1.3. Cheaper peaks 6.1.4. Facebook Platform Master in Computer Architecture, Networks and Systems - CANS
8. Problems may arise in… Network limits, interrupt limits, OS limits, bugs, configuration errors, faulty HW, error recovery, …
9. Problems may arise in… Speed of clients, #threads, content not in sync, unresponsive apps, too many sources of content, user persistence, configuration errors, bugs
12. Problems may arise in… Database concurrency, access to 3rd-party data (APIs), CPU- or memory-bound problems, datacenter replication, logging user actions
13. Problems may arise in… Database concurrency, modifying schemas, Massive tables -> indexes, disk performance, CPU/memory bound, datacenter replication
14. Problems may arise in… Availability and performance: more than 24h to analyze daily logs; mail not reaching the Inbox (spam folders); surpassing monitoring capacity
16. Too many writes to database There's no machine that can do 44k writes/sec over 1 TB of data. Scaling reads is easier: a big cache, replication. On every write you have to: update the data, update the transaction log, update the indexes, invalidate the cache, replicate, and write to 2 or more disks (RAID x). http://www.scribd.com/doc/2592098/DVPmysqlucFederation-at-Flickr-Doing-Billions-of-Queries-Per-Day
17. Case Database federation: sharding per User-ID; a Global Ring to know where the data is; PHP logic to connect shards and keep the data consistent. What's a shard? A horizontal partition of a table, usually by primary key. Benefits: you can scale for as long as you have budget. Disadvantages: you lose the ability to do any JOIN, COUNT, or RANGE between shards; your application logic has to be aware of the sharding; if you want to rebalance shards, you will need some kind of globally unique ID, so beware of auto-increments; more services needing HA, BCP, change control, and so on.
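The shard-directory idea above can be sketched in a few lines of Python. This is an illustrative toy, not Flickr's implementation: the `ShardDirectory` class and its least-loaded placement policy are assumptions; the point is that an explicit User_ID -> Shard_ID directory (unlike `user_id % N` hashing) lets you rebalance users later without remapping everyone.

```python
# Sketch of directory-based sharding (a simplified take on the "global ring").
# The class name and the least-loaded placement policy are illustrative
# assumptions, not Flickr's actual design.

class ShardDirectory:
    def __init__(self, shard_ids):
        self.shards = {sid: 0 for sid in shard_ids}   # shard_id -> user count
        self.user_to_shard = {}                       # User_ID -> Shard_ID

    def assign(self, user_id):
        """Place a new user on the least-loaded shard and remember the mapping."""
        shard_id = min(self.shards, key=self.shards.get)
        self.user_to_shard[user_id] = shard_id
        self.shards[shard_id] += 1
        return shard_id

    def lookup(self, user_id):
        """Every data access first resolves User_ID -> Shard_ID."""
        return self.user_to_shard[user_id]

directory = ShardDirectory(["shard1", "shard2"])
for uid in range(4):
    directory.assign(uid)
```

In production this mapping lives in a database and is cached (the memcached-with-TTL scheme on the next slide); the application consults it on every request before opening a connection to the right shard.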
18. Case Global Ring? Storing key-value mappings: User_ID -> Shard_ID, Photo_ID -> User_ID, Group_ID -> Shard_ID. Every access to data has to know where it lives -> memcached with a TTL of 30 minutes. Global IDs? You don't want two objects with the same ID! Strategies: GUIDs: 128-bit IDs, so bigger indexes, and poorly supported by MySQL. Central auto-increment: a table where, for every ID needed, you do an insert and let MySQL take care of everything; at 60 photos/sec it will be a BIG table. REPLACE INTO: a MySQL-only solution with small tables that allows for redundancy (one server provides odd IDs and another even).
19. Case: REPLACE INTO The Tickets64 schema looks like:

CREATE TABLE `Tickets64` (
  `id` bigint(20) unsigned NOT NULL auto_increment,
  `stub` char(1) NOT NULL default '',
  PRIMARY KEY (`id`),
  UNIQUE KEY `stub` (`stub`)
) ENGINE=MyISAM

SELECT * FROM Tickets64 returns a single row that looks something like:

+-------------------+------+
| id                | stub |
+-------------------+------+
| 72157623227190423 | a    |
+-------------------+------+

When they need a new globally unique 64-bit ID they issue the following SQL:

REPLACE INTO Tickets64 (stub) VALUES ('a');
SELECT LAST_INSERT_ID();
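The ticket-server trick works because REPLACE INTO deletes the row that conflicts on the `stub` unique key and inserts a fresh one, so the auto-increment column hands out a new ID on every call while the table stays at one row. As a runnable sketch (not Flickr's MySQL setup), the same mechanics can be emulated with SQLite, which also supports REPLACE INTO:

```python
import sqlite3

# Emulating Flickr's ticket server with SQLite (MySQL in the original slide).
# AUTOINCREMENT guarantees deleted ids are never reused, mirroring MySQL's
# auto_increment behavior for this pattern.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Tickets64 (
        id   INTEGER PRIMARY KEY AUTOINCREMENT,
        stub TEXT NOT NULL UNIQUE
    )
""")

def next_global_id(conn):
    """Return a fresh globally unique ID via the REPLACE INTO pattern."""
    cur = conn.execute("REPLACE INTO Tickets64 (stub) VALUES ('a')")
    return cur.lastrowid   # equivalent of SELECT LAST_INSERT_ID()

ids = [next_global_id(conn) for _ in range(3)]
# Every call returns a bigger ID, yet the table never grows beyond one row.
```

The odd/even redundancy mentioned above would be achieved by configuring one server with auto_increment_offset=1 and another with offset 2, both with an increment step of 2.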
20. Case PHP Logic You lose any kind of inter-shard relational query (no JOINs). You lose any kind of referential integrity (no foreign keys). You have to control distributed transactions yourself. Example: you select a Favorite (so you need to update your shard and the other user's shard): open 2 connections to the two shards; begin a transaction on both shards; add the data; if everything is OK -> commit, else roll back and report an error. So we improve scalability but impact code complexity and the performance of a single page view (hint: asynchronous database access).
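The cross-shard Favorite write can be sketched with two in-memory SQLite databases standing in for two MySQL shards (the slides describe PHP; Python is used here for a self-contained sketch). Note this is the naive two-connection pattern from the slide, not a true two-phase commit:

```python
import sqlite3

# Two in-memory SQLite DBs stand in for two MySQL shards.
shard_a = sqlite3.connect(":memory:")
shard_b = sqlite3.connect(":memory:")
for shard in (shard_a, shard_b):
    shard.execute("CREATE TABLE favorites (user_id INTEGER, photo_id INTEGER)")

def add_favorite(my_shard, owner_shard, user_id, photo_id):
    """Write the favorite to both shards; commit both or roll back both."""
    try:
        my_shard.execute("INSERT INTO favorites VALUES (?, ?)", (user_id, photo_id))
        owner_shard.execute("INSERT INTO favorites VALUES (?, ?)", (user_id, photo_id))
        my_shard.commit()
        owner_shard.commit()
        return True
    except sqlite3.Error:
        my_shard.rollback()
        owner_shard.rollback()
        return False

ok = add_favorite(shard_a, shard_b, user_id=1, photo_id=42)
```

A crash between the two commits still leaves the shards inconsistent, which is exactly the code-complexity cost the slide is pointing at.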
23. Case They get an arbitrarily scalable infrastructure. They have marginally more complex code. They "only" have 20 engineers, so scalability also means: roughly 2.5 million Flickr members per engineer; roughly 200 million photos per engineer; 28 user-facing pages; 23 administrative pages; 20 API methods, though only 7.5 public API methods; 80 API calls per second; 250 CPUs; 850 annual deploys; 16 feature flags.
25. Cheaper peaks If your capacity planning comes from the aggregate of all your customers, and you plan to have thousands of them, what can you do? Your performance impacts your customers' brands (so you'll have problems). You are a start-up without loads of money.
27. Case Have to store data for every page view their customers get. Do MAGIC over millions of rows to calculate related items for YOU. Show recommendations to the user. Only 2 snippets of Javascript/HTML. Less than 0.5 seconds per view.
28. Case Option A Every hit to the tracker becomes an insert into a MySQL instance sharded by customer. Every hit to the recommender recalculates the list of items to show, based on collective intelligence. Benefits: straightforward to code and manage; quick and easy for a proof of concept. Disadvantages: one customer at their peak could surpass the capacity of the MySQL instance; the same customer in their valley could be wasting money on an idle instance; our webserver could be overloaded by the sum of all our customers; the recommender is a CPU and memory hog and we need too many servers to cope with our estimated demand.
29. Case Option B Every hit to the tracker becomes an insert into a MySQL instance sharded by customer. A cron job recalculates, in advance, different sets of related items. Every hit to the recommender gets the corresponding set of items from the DB. Benefits: straightforward to code; the compute-intensive task is out of the critical path, it's asynchronous. Disadvantages: one customer at their peak could surpass the capacity of the MySQL instance; the same customer in their valley could be wasting money on an idle instance; our webserver could be overloaded by the sum of all our customers; we have to control what our cron jobs are doing, check for errors, and tune them so they don't bring down the database.
30. Case Option C Every hit to the tracker is only a request for a static image file with various parameters: /a.gif?b=1&c=2&… A cron job gets the log files from the webservers and the items stored in the database, and recalculates, in advance, different sets of related items. Every hit to the recommender gets the corresponding set of items from the DB (sharded by customer). Benefits: straightforward to code, we only had to move and parse files; a surge in page views doesn't bring down the database with writes; the compute-intensive task is out of the critical path, it's asynchronous. Disadvantages: one customer at their peak could surpass the capacity of the MySQL instance; the same customer in their valley could be wasting money on an idle instance; we have to control what our cron jobs are doing, check for errors, and tune them so they don't bring down the database; we could hit bandwidth limits.
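The cron step of Option C — turning tracker-pixel hits in the access logs into per-item view counts, entirely off the critical path — can be sketched in a few lines. The log format and the `customer`/`item` parameter names below are illustrative assumptions, not the real system's:

```python
from urllib.parse import urlparse, parse_qs
from collections import Counter

# Sketch of Option C's cron step: pull tracker-pixel hits out of webserver
# access logs and aggregate them offline, so page views never touch the
# database in the critical path. Log format and parameter names are assumed.
log_lines = [
    'GET /a.gif?customer=7&item=101 HTTP/1.1',
    'GET /a.gif?customer=7&item=101 HTTP/1.1',
    'GET /a.gif?customer=7&item=205 HTTP/1.1',
    'GET /index.html HTTP/1.1',                 # not a tracker hit: ignored
]

def aggregate_hits(lines):
    """Count tracker hits per (customer, item) pair."""
    counts = Counter()
    for line in lines:
        path = line.split()[1]
        url = urlparse(path)
        if url.path != "/a.gif":
            continue
        params = parse_qs(url.query)
        counts[(params["customer"][0], params["item"][0])] += 1
    return counts

hits = aggregate_hits(log_lines)
```

Serving /a.gif is pure static file delivery, so a traffic surge only stresses the webserver (and later, in Option E, the CDN), never the database.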
31. Case Option D Every hit to the tracker is only a request for a static image file with various parameters: /a.gif?b=1&c=2&… A cron job gets the log files from the webservers and the items stored in the database, and recalculates, in advance, different sets of related items. Every hit to the recommender gets the corresponding set of items from the DB. Went the Hadoop/HBase way, no more sharding. Benefits: easy to add and remove data servers on demand, so no waste or hard limits here; a surge in page views only costs money, and as we get paid per page view, that's OK; the compute-intensive task is out of the critical path, it's asynchronous. Disadvantages: beta software, poor documentation/examples; we have more complexity in our infrastructure; we could hit bandwidth limits.
32. Case: Map/Reduce Hadoop: it's "only" a framework for running Map/Reduce applications on large clusters. It allows replication and fault tolerance — HW failure will be the norm — using a distributed file system, HDFS. Map/Reduce: in a map/reduce application there are two kinds of jobs, Map and Reduce. Mappers read the HDFS blocks, do local processing, and run in parallel; from a webserver log file they can emit <url, #hits>. Reducers get the output of many mappers and consolidate the data; if there was a mapper per day, a reducer could calculate how many monthly hits a URL gets. HBase: the Hadoop/MR design favors throughput over latency, so it's used as an analytical platform, but HBase allows low-latency random access to very big tables (billions of rows by millions of columns). Column-oriented DB: Table -> Row -> ColumnFamily -> Timestamp => Value.
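The <url, #hits> job described above can be sketched in plain Python to show the data flow; Hadoop would run many mappers in parallel (one per HDFS block) and route each URL's pairs to a reducer, but the logic is the same:

```python
from collections import defaultdict
from itertools import chain

# Plain-Python simulation of the <url, #hits> Map/Reduce job from the slide.

def mapper(log_block):
    """Map phase: emit a (url, 1) pair for every line in this log block."""
    return [(line.split()[1], 1) for line in log_block]

def reducer(pairs):
    """Reduce phase: consolidate the mappers' output into per-URL totals."""
    totals = defaultdict(int)
    for url, count in pairs:
        totals[url] += count
    return dict(totals)

# One "block" per day, as in the slide's example of computing monthly hits.
day1 = ["GET /a.gif", "GET /b.gif", "GET /a.gif"]
day2 = ["GET /a.gif"]
monthly_hits = reducer(chain(mapper(day1), mapper(day2)))
```

In real Hadoop the framework also sorts and partitions the intermediate pairs so that all pairs for one URL reach the same reducer; that shuffle step is what makes the approach scale across machines.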
34. Case Option E Every hit to the tracker is only a request for a static image file with various parameters: /a.gif?b=1&c=2&… A cron job gets the log files from the webservers and the items stored in the database, and recalculates, in advance, different sets of related items. Every hit to the recommender gets the corresponding set of items from the DB. Went the Hadoop/HBase way, no more sharding. All static files served by a CDN. Benefits: easy to add and remove data servers on demand, so no waste or hard limits here; a surge in page views only costs money, and as we get paid per page view, that's OK; the compute-intensive task is out of the critical path, it's asynchronous; unlimited bandwidth. Disadvantages: beta software, poor documentation/examples; we have more complexity in our infrastructure.
35. Case: CDN What's a Content Delivery Network? Your server or HTTP repository (Amazon S3, …) is the Origin of the content. The CDN gives you a DNS name (bb.cdn.net) and you create a CNAME to it (www.example.com -> bb.cdn.net.). When a user asks for www.example.com, the CDN chooses which of its nodes is nearest to the user and returns that node's IP address(es). The user asks that CDN node for a piece of content (/a.gif); the node checks whether it has a fresh copy to send, or, on a MISS, checks with its upstream caches all the way to your Origin. So we get unlimited bandwidth and better latency (we can't surpass the speed of light).
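On the DNS side, the setup described above amounts to a single CNAME record in your zone. This fragment is illustrative only, using the names from the slide's example:

```
; In example.com's zone file: point www at the name the CDN gave you.
www.example.com.  IN  CNAME  bb.cdn.net.
```

From then on, the CDN's own (typically geo-aware) name servers answer for bb.cdn.net and steer each user to a nearby edge node.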
37. Case They get a completely scalable infrastructure at AWS. They can provision a new Cruncher, Datastore, or Recommender in a matter of minutes and remove it as soon as it's no longer needed. They don't have any upper limit on how many requests they can serve. All the requests that can impact the User Experience of their customers' users are served by a CDN. As there are only 3 kinds of servers, all managed as images, they don't need as many engineers to take care of the infrastructure.
39. Facebook Platform If your primary data source is not under your control and it's too far away, what happens? An API case.
40. Case Duplicated Gifts
41. Case Loving it More «Pongos» Hitting the bullseye?
42. Case It's a social wish-list application. When you access it, it checks whether your friends have enabled the application and shows their wish lists. You can share your wish lists on Facebook. You can capture wishes (gifts) and be shown a feed of possible merchants. Initial loading time is critical. We expect virality, so we won't have much response-time headroom.
43. Case Flow
44. Case Nice, but slow: 3 to 7 seconds to load
45. Case Define goals. Define metrics. Analyze metrics. Improve one at a time.
46. Case: Goals Time to load < 1 second. Everything works.
47. Case: Metrics Time to session setup: validating to Facebook; getting friends' information; lookups to the local database (lists, items, captured items). Time to load the «home» page: get HTML; get widgets; get Javascripts; get various graphic assets.
48. Case: Analyzing metrics Time to session setup: validating to Facebook (300 ms); getting friends' information (3 sec); lookups to the local database — lists, items, captured items (30 ms). Time to load the «home» page: get HTML (400 ms); get widgets (300 ms); get Javascripts (300 ms); get various graphic assets (500 ms).
49. Case: Facebook access From 3 seconds to 500 ms!
50. Case: Facebook access In ASP.net we "only" have 12 threads/CPU -> only 12 concurrent requests, so from 4 users/sec to 24/sec. We could use asynchronous calls, but there is low parallelism: if we don't have the result of GetAppUsers, we can't ask for GetUserInfo, so no speedup there. We could increase the default number of threads (.NET 4.0 defaults to 5000/CPU). We can gain failure resiliency by adjusting timeouts and increasing threads, connections, and so on.
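The dependency noted above (GetAppUsers must finish before any GetUserInfo call) still leaves the per-friend lookups free to run in parallel. A sketch of that call pattern, in Python rather than the app's ASP.net, with stub functions standing in for the Facebook API calls:

```python
from concurrent.futures import ThreadPoolExecutor

# Sketch of the call pattern from the slide. get_app_users/get_user_info are
# stand-in stubs for the Facebook API: the first call is a hard dependency,
# but the per-friend lookups it unlocks can fan out in parallel instead of
# running serially, one round-trip after another.

def get_app_users(user_id):
    """Stub: which of this user's friends have the app enabled?"""
    return [101, 102, 103]

def get_user_info(friend_id):
    """Stub: profile info for one friend (one API round-trip each)."""
    return {"id": friend_id, "name": f"user-{friend_id}"}

def load_friends(user_id):
    friends = get_app_users(user_id)   # sequential: nothing can run before this
    with ThreadPoolExecutor(max_workers=12) as pool:
        return list(pool.map(get_user_info, friends))   # parallel fan-out

profiles = load_friends(user_id=1)
```

With N friends and a latency of t per API call, the serial version costs roughly (1 + N) * t while the fan-out costs about 2t (one GetAppUsers plus one batch of parallel GetUserInfo calls), which is the shape of the 3 s -> 500 ms improvement on the previous slide.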
51. Case: Leveraging "free" tools Set far-future Expires headers on static files: users leverage their browser's cache, and requests are lighter on the server side. Use a "free" CDN to get jQuery et al.: Microsoft and Google provide public, free repositories of Javascript tools. Use CSS sprites: although graphic files are small, each needs a TCP connection to retrieve; combine most graphic assets into one big file and use CSS to select which one to show:

#nav li a {background-image:url('../img/image_nav.gif')}
#nav li a.item1 {background-position:0px 0px}
#nav li a:hover.item1 {background-position:0px -72px}
52. Case: more on sprites Average size: 2 KB/file. HTTP/1.1 (RFC 2616) suggests that browsers download no more than 2 components in parallel per hostname. Small files don't use all the available bandwidth (TCP slow start…). Latency also plays an important role.
53. About this session Sergi Morales, Founder & CTO of Expertos en TI. Phone: +34 6688-XPNTI. Email: sergi.morales+eedc@expertosenti.com. Blog: http://blog.expertosenti.com Web: http://www.expertosenti.com Expertos en TI: we help Internet-oriented projects leverage all the research done by the big sites (Flickr, Facebook, Twitter, Salesforce, Google, and so on) so they can improve their bottom line and be prepared for growth.
54. About the EEDC course 34330 Execution Environments for Distributed Computing (EEDC), Master in Computer Architecture, Networks and Systems (CANS). Computer Architecture Department (AC), Universitat Politècnica de Catalunya – Barcelona Tech (UPC). ECTS credits: 6. INSTRUCTOR Professor Jordi Torres. Phone: +34 93 401 7223. Email: torres@ac.upc.edu Office: Campus Nord, Mòdul C6, Room 217. Web: http://www.JordiTorres.org
55. 34330 EEDC Execution Environments for Distributed Computing Sergi Morales Founder & CTO T: 668897684 E: sergi.morales@expertosenti.com L: www.linkedin.com/in/sergimorales Master in Computer Architecture, Networks and Systems - CANS
56. Case Asynchronous access to the Facebook API server. Expect to fail. Tables with so many rows -> a key/value approach. Consistent hashing to load-balance data. Sticky servers?