This document discusses patterns for parallel computing. It outlines key concepts such as Amdahl's law and the main types of parallelism, data and task parallelism. Examples show how major tech companies such as Microsoft, Google, and Amazon apply parallelism at different levels of their infrastructure and applications to scale efficiently, and design principles are given for converting sequential programs into parallel ones while maintaining performance.
10. > Patterns > Multi-threading
Multi-threading
- Typically, functional decomposition into individual threads
- But explicit concurrent programming brings complexities: managing threads, semaphores, monitors, deadlocks, race conditions, mutual exclusion, synchronization, etc.
- Moving towards implicit parallelism:
  - Integrating concurrency & coordination into mainstream programming languages
  - Developing tools to ease development
  - Encapsulating parallelism in reusable components
  - Raising the semantic level: new approaches
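The complexities listed above can be made concrete with a minimal Python sketch (not from the deck): several threads incrementing a shared counter, where mutual exclusion via a lock is what makes the result deterministic.

```python
import threading

counter = 0
lock = threading.Lock()

def increment(n):
    global counter
    for _ in range(n):
        # Without the lock, this read-modify-write is a race condition:
        # two threads can read the same value and one update is lost.
        with lock:
            counter += 1

threads = [threading.Thread(target=increment, args=(50_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter)  # 200000 -- deterministic only because of the mutual exclusion
```

Removing the `with lock:` line typically yields a total below 200000, which is exactly the kind of subtle bug that motivates moving this bookkeeping into runtimes and reusable components.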
11. > Patterns > Multi-threading > Example
Photobucket
2007 stats:
- +30M searches processed / day
- 25M unique users / month in US, +46M worldwide
- +7B images uploaded
- +300K unique websites link to content
- #31 of top 50 sites in US; #41 of top 100 sites worldwide
- 18th largest ad-supported site in US
[Diagram: browsers routed via a central metadata/membership lookup and API to horizontally partitioned content pods holding thumbs, images, albums, and groups]
Scaling the performance:
- Browser handles concurrency
- Centralized lookup
- Horizontal partitioning of distributed content
12. > Patterns > Data Parallelism
Data Parallelism
- Loop-level parallelism
- Focuses on distributing the data across different parallel computing nodes: denormalization, sharding, horizontal partitioning, etc.
- Each processor performs the same task on different pieces of distributed data
- Emphasizes the distributed (parallelized) nature of the data
- Ideal for data that is read more than written (scale vs. consistency)
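A minimal Python sketch of the pattern (an illustration, not the deck's code): the same task is applied to different chunks of the data, and the partial results are composed at the end. Real CPU-bound work in Python would use processes rather than threads because of the GIL; the structure is identical.

```python
from concurrent.futures import ThreadPoolExecutor

def partial_sum(chunk):
    # The *same* task, applied to a different piece of the data.
    return sum(x * x for x in chunk)

data = list(range(100_000))
n_workers = 4
size = len(data) // n_workers
chunks = [data[i * size:(i + 1) * size] for i in range(n_workers)]

with ThreadPoolExecutor(max_workers=n_workers) as pool:
    partials = list(pool.map(partial_sum, chunks))

total = sum(partials)  # compose the partial results
print(total == sum(x * x for x in data))  # True
```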
13. > Patterns > Data Parallelism
Parallelizing Data in a Distributed Architecture
[Diagram: multiple browsers hit multiple web/app servers; an index routes each request to a data partition, with the key space split progressively from A-Z into A-M / N-Z and then A-G / H-M / N-S / T-Z]
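The routing step in the diagram can be sketched in a few lines of Python. The shard names and range map below are hypothetical, chosen to mirror the diagram's final A-G / H-M / N-S / T-Z split.

```python
import bisect

# Hypothetical range map mirroring the diagram's final split.
SHARD_UPPER_BOUNDS = ["G", "M", "S", "Z"]
SHARDS = ["shard-AG", "shard-HM", "shard-NS", "shard-TZ"]

def route(key: str) -> str:
    """Pick the shard whose letter range covers the key's first letter."""
    first = key[0].upper()
    i = bisect.bisect_left(SHARD_UPPER_BOUNDS, first)
    return SHARDS[i]

print(route("alice"))  # shard-AG
print(route("nadia"))  # shard-NS
```

Splitting a range further (e.g. A-M into A-G and H-M) only changes the range map, which is why a central index plus range partitioning scales incrementally.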
14. > Patterns > Data Parallelism > Example
Flickr
2007 stats:
- Serve 40,000 photos / second
- Handle 100,000 cache operations / second
- Process 130,000 database queries / second
Scaling the "read" data:
- Data denormalization
- Database replication and federation
- Vertical partitioning
- Central cluster for index lookups
- Large data sets horizontally partitioned as shards
- Grow by binary hashing of user buckets
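"Binary hashing of user buckets" can be illustrated with a small Python sketch (an assumption about the general technique, not Flickr's actual code): with a power-of-two shard count, doubling the number of shards consumes one more bit of the hash, so each old bucket splits cleanly in two and only the users whose extra bit is 1 need to move.

```python
import hashlib

def bucket(user_id: str, n_shards: int) -> int:
    """Map a user to a bucket; n_shards is assumed to be a power of two."""
    h = int(hashlib.md5(user_id.encode()).hexdigest(), 16)
    return h & (n_shards - 1)  # low bits of the hash select the bucket

# Growing from 4 to 8 shards: bucket b splits into b and b + 4.
for n in (4, 8):
    print(n, bucket("user-42", n))
```

The key property is that `bucket(u, 8) % 4 == bucket(u, 4)` for every user, which is what keeps a doubling migration cheap.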
15. > Patterns > Data Parallelism > Example
MySpace
2007 stats:
- 115B pageviews / month
- 5M concurrent users at peak
- +3B images, mp3s, videos
- +10M new images / day
- 160 Gbit/sec peak bandwidth
Scaling the "write" data:
- MyCache: distributed dynamic memory cache
- MyRelay: inter-node messaging transport handling +100K requests/sec, directing reads/writes to any node
- MySpace Distributed File System: geographically redundant distributed storage providing massive concurrent access to images, mp3s, videos, etc.
- MySpace Distributed Transaction Manager: broker for all non-transient writes to databases/SAN, with multi-phase commit across data centers
16. > Patterns > Data Parallelism > Example
Facebook
2009 stats:
- +200B pageviews / month
- >3.9T feed actions / day
- +300M active users
- >1B chat messages / day
- 100M search queries / day
- >6B minutes spent / day (ranked #2 on Internet)
- +20B photos, growing +2B / month
- 600,000 photos served / sec
- 25TB log data / day processed through Scribe
- 120M queries / sec on memcache
Scaling the "relational" data:
- Keeps data normalized, randomly distributed, accessed at high volumes
- Uses "shared nothing" architecture
17. > Patterns > Task Parallelism
Task Parallelism
- Functional parallelism
- Focuses on distributing execution processes (threads) across different parallel computing nodes
- Each processor executes a different thread (or process) on the same or different data
- Communication usually takes place to pass data from one thread to the next as part of a workflow
- Emphasizes the distributed (parallelized) nature of the processing (i.e., threads)
- Need to design how to compose partial output from concurrent processes
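The workflow-style communication described above can be sketched as a two-stage pipeline in Python (a minimal illustration, not from the deck): each stage runs a *different* task in its own thread, and data is passed stage-to-stage through queues.

```python
import queue
import threading

SENTINEL = None  # marks end of the stream

def stage(fn, inbox, outbox):
    # Each stage performs a different task; data flows through as a workflow.
    while (item := inbox.get()) is not SENTINEL:
        outbox.put(fn(item))
    outbox.put(SENTINEL)  # propagate shutdown to the next stage

q1, q2, q3 = queue.Queue(), queue.Queue(), queue.Queue()
threading.Thread(target=stage, args=(lambda x: x * 2, q1, q2)).start()
threading.Thread(target=stage, args=(lambda x: x + 1, q2, q3)).start()

for x in range(5):
    q1.put(x)
q1.put(SENTINEL)

results = []
while (r := q3.get()) is not SENTINEL:
    results.append(r)
print(results)  # [1, 3, 5, 7, 9]
```

Composing the partial output is trivial here because each stage has one worker and FIFO queues preserve order; with multiple workers per stage, ordering or keying of results becomes a design decision, which is the composition problem the slide calls out.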
18. > Patterns > Task Parallelism > Example
Google
2007 stats:
- +20 petabytes of data processed / day by +100K MapReduce jobs
- 1-petabyte sort took ~6 hours on ~4K servers, replicated onto ~48K disks
- +200 GFS clusters, each at 1-5K nodes, handling +5 petabytes of storage
- ~40 GB/sec aggregate read/write throughput across the cluster
- +500 servers for each search query, answered in <500ms
Scaling the process:
- MapReduce: parallel processing framework
- BigTable: structured hash database
- Google File System: massively scalable distributed storage
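The MapReduce programming model can be sketched in single-process Python (a toy word count illustrating the map / shuffle / reduce phases, not Google's implementation):

```python
from collections import defaultdict
from itertools import chain

def map_phase(doc):
    # Emit (key, value) pairs; runs independently per document.
    return [(word, 1) for word in doc.split()]

def shuffle(pairs):
    # Group all values by key; in the real system this is the network phase.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    # Combine all values for one key; runs independently per key.
    return key, sum(values)

docs = ["the quick fox", "the lazy dog", "the fox"]
pairs = chain.from_iterable(map_phase(d) for d in docs)
counts = dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())
print(counts["the"], counts["fox"])  # 3 2
```

Because both the map and reduce phases are embarrassingly parallel across documents and keys, the framework can distribute them over thousands of machines while the programmer writes only the two functions.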
20. > Design Principles
Parallelism for Scale-out: Sequential → Parallel
Convert a sequential and/or single-machine program into a form in which it can be executed in a concurrent, potentially distributed environment:
- Over-decompose for scaling
- Structured multi-threading with a data focus
- Relax sequential order to gain more parallelism
- Ensure atomicity of unordered interactions
- Consider data as well as control flow
- Careful data structure & locking choices to manage contention
- Use parallel data structures
- Minimize shared data and synchronization
- Continuous optimization
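Two of these principles, over-decomposition and relaxing sequential order, can be shown together in a short Python sketch (an illustration under the assumption that the work units are independent, not a prescribed implementation):

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def work(task_id):
    # Independent unit of work: no shared data, no ordering requirement.
    return task_id, task_id ** 2

# Over-decompose: far more tasks than workers, so the pool can balance load.
with ThreadPoolExecutor(max_workers=4) as pool:
    futures = [pool.submit(work, i) for i in range(100)]
    # Relax sequential order: consume results as they finish,
    # not in submission order; the dict composes them by key.
    results = dict(f.result() for f in as_completed(futures))

print(results[9])  # 81
```

Keying each result by `task_id` is what makes the unordered completion safe to compose, an instance of "ensure atomicity of unordered interactions".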
21. > Design Principles > Example
Amazon
Principles for Scalable Service Design (Werner Vogels, CTO, Amazon):
- Autonomy
- Asynchrony
- Controlled concurrency
- Controlled parallelism
- Decentralize
- Decompose into small, well-understood building blocks
- Failure tolerant
- Local responsibility
- Recovery built-in
- Simplicity
- Symmetry
22. > Microsoft Platform
Parallel computing on the Microsoft platform
Components spanning a spectrum of computing models:
- Concurrent Programming (.NET 4.0 Parallel APIs)
- Distributed Computing (CCR & DSS Runtime, Dryad)
- Cloud Computing (Azure Services Platform)
- Grid Computing (Windows HPC Server 2008)
- Massive Data Processing (SQL Server "Madison")
24. > Microsoft Platform > Distributed Computing
CCR & DSS Toolkit (Concurrency & Coordination Runtime; Decentralized Software Services)
- Supporting multi-core and concurrent applications by facilitating asynchronous operations
- Dealing with concurrency, exploiting parallel hardware, and handling partial failure
- Supporting robust, distributed applications based on a light-weight, state-driven service model
- Providing service composition, event notification, and data isolation
25. > Microsoft Platform > Distributed Computing
Dryad
- General-purpose execution environment for distributed, data-parallel applications
- Automated management of resources, scheduling, distribution, monitoring, fault tolerance, accounting, etc.
- Transparent concurrency and mutual-exclusion semantics
- Higher-level and domain-specific language support
26. > Microsoft Platform > Cloud Computing
Azure Services Platform
- Internet-scale, highly available cloud fabric
- Auto-provisioning of 64-bit compute nodes on Windows Server VMs
- Massively scalable distributed storage (table, blob, queue)
- Massively scalable and highly consistent relational database
27. > Microsoft Platform > Grid Computing
Windows HPC Server
- #10 fastest supercomputer in the world (top500.org): 30,720 cores, 180.6 teraflops, 77.5% efficiency
- Image-multicasting-based parallel deployment of cluster nodes
- Fault tolerance with failover clustering of the head node
- Policy-driven, NUMA-aware, multicore-aware job scheduler
- Inter-process distributed communication via MS-MPI
28. > Microsoft Platform > Massive Data Processing
SQL Server "Madison"
- Massively parallel processing (MPP) architecture
- +500TB to petabyte-scale databases
- "Ultra Shared Nothing" design: IO and CPU affinity within symmetric multi-processing (SMP) nodes; multiple physical instances of tables with dynamic re-distribution
- Distribute / partition large tables across multiple nodes
- Replicate small tables
- Replicate + distribute medium tables
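The distribute-large / replicate-small placement rule can be sketched in Python (a hypothetical simplification of the general MPP technique; the threshold and node names are invented for illustration):

```python
def place_table(rows, nodes, large_threshold):
    """Hash-distribute large tables; replicate small ones to every node."""
    if len(rows) >= large_threshold:
        # Distribute: each row lands on exactly one node, by hash of its key.
        placement = {n: [] for n in nodes}
        for key, value in rows:
            placement[nodes[hash(key) % len(nodes)]].append((key, value))
    else:
        # Replicate: every node holds a full copy, so joins stay node-local.
        placement = {n: list(rows) for n in nodes}
    return placement

nodes = ["node0", "node1", "node2"]
facts = [(i, f"event-{i}") for i in range(10_000)]  # large fact table
dims = [(1, "US"), (2, "EU")]                        # small dimension table

fact_placement = place_table(facts, nodes, large_threshold=1_000)
dim_placement = place_table(dims, nodes, large_threshold=1_000)
print(sum(len(v) for v in fact_placement.values()))  # 10000 (one node per row)
print(len(dim_placement["node0"]))                   # 2 (full copy per node)
```

Replicating the small table means a fact-to-dimension join never crosses nodes, which is the point of the small-table rule; medium tables get both treatments to balance storage against data movement.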
29. > Resources
For More Information:
- Architect Council Website (blogs.msdn.com/sac)
- This series (blogs.msdn.com/sac/pages/council-2009q2.aspx)
- .NET 4.0 Parallel APIs (msdn.com/concurrency)
- CCR & DSS Toolkit (microsoft.com/ccrdss)
- Dryad (research.microsoft.com/dryad)
- Azure Services Platform (azure.com)
- SQL Server "Madison" (microsoft.com/madison)
- Windows HPC Server 2008 (microsoft.com/hpc)