Patterns for Parallel Computing David Chou david.chou@microsoft.com blogs.msdn.com/dachou
> Outline An architectural conversation Concepts Patterns Design Principles Microsoft Platform
> Concepts Why is this interesting? Amdahl’s law (1967) Multi-core processors Virtualization High-performance computing Distributed architecture Web–scale applications Cloud computing  Paradigm shift!
> Concepts Parallel Computing == ?? Simultaneous multi-threading (Intel HyperThreading, IBM Cell microprocessor for PS3, etc.) Operating system multitasking (cooperative, preemptive; symmetric multi-processing, etc.) Server load-balancing & clustering (Oracle RAC, Windows HPC Server, etc.) Grid computing (SETI@home, Sun Grid, DataSynapse, DigiPede, etc.) Asynchronous programming (AJAX, JMS, MQ, event-driven, etc.) Multi-threaded & concurrent programming (java.lang.Thread, System.Threading.Thread, Click, LabVIEW, etc.) Massively parallel processing (MapReduce, Hadoop, Dryad, etc.)  Elements and best practices in all of these
> Patterns Types of Parallelism Bit-level parallelism (microprocessors) Instruction-level parallelism (compilers) Multiprocessing, multi-tasking (operating systems) HPC, clustering (servers) Multi-threading (application code) Data parallelism (massive distributed databases) Task parallelism (concurrent distributed processing)  Focus is moving “up” the technology stack…
>Patterns > HPC, Clustering Clustering Infrastructure for High Availability
>Patterns > HPC, Clustering High-Performance Computing [Diagram: two browsers, two web/app servers, two data stores each holding the full A-Z range]
>Patterns > HPC, Clustering > Example Microsoft.com Infrastructure and Application Footprint 7 Internet data centers & 3 CDN partnerships 120+ Websites, 1000’s apps and 2500 databases  20-30+ Gbits/sec Web traffic; 500+ Gbits/sec download traffic 2007 stats (microsoft.com):  #9 ranked domain in U.S; 54.0M UU for 36.0% reach #5 site worldwide; reaching 287.3M UU 15K req/sec, 35K concurrent connections on 80 servers 600 vroots, 350 IIS Web apps & 12 app pools Windows Server 2008, SQL Server 2008, IIS7, ASP.NET 3.5 2007 stats (Windows Update): 350M UScans/day, 60K ASP.NET req/sec, 1.5M concurrent connections 50B downloads for CY 2006 Update Egress – MS, Akamai, Level3 & Limelight (50-500+ Gbits/sec)
>Patterns > Multi-threading Multi-threaded programming [Diagram: execution time, sequential vs. concurrent]
>Patterns > Multi-threading Multi-threading Typically, functional decomposition into individual threads But, explicit concurrent programming brings complexities Managing threads, semaphores, monitors, dead-locks, race conditions, mutual exclusion, synchronization, etc. Moving towards implicit parallelism Integrating concurrency & coordination into mainstream programming languages Developing tools to ease development Encapsulating parallelism in reusable components  Raising the semantic level: new approaches
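The contrast between explicit concurrent programming and parallelism encapsulated in a reusable component can be sketched as follows. This is a minimal illustration rather than anything taken from the deck: the function names `count_with_explicit_threads` and `count_with_executor` are hypothetical, and Python's standard `threading` and `concurrent.futures` modules stand in for whatever platform is actually in use.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def count_with_explicit_threads(n_threads: int, increments: int) -> int:
    """Explicit style: we create the threads and guard shared state ourselves."""
    total = 0
    lock = threading.Lock()

    def worker() -> None:
        nonlocal total
        for _ in range(increments):
            # Without the lock, this read-modify-write could interleave
            # across threads and lose updates (a race condition).
            with lock:
                total += 1

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return total


def count_with_executor(n_tasks: int, increments: int) -> int:
    """Higher-level style: each task returns a private result; nothing is shared."""
    def task(_task_id: int) -> int:
        local = 0
        for _ in range(increments):
            local += 1
        return local

    # The pool owns thread creation, scheduling, and shutdown.
    with ThreadPoolExecutor(max_workers=4) as pool:
        return sum(pool.map(task, range(n_tasks)))
```

Both functions return `n * increments`, but only the first needs a lock. In standard CPython builds the global interpreter lock keeps these threads from speeding up CPU-bound work, so the sketch shows program structure, not performance.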
>Patterns > Multi-threading > Example Photobucket 2007 stats: +30M searches processed / day 25M UU/month in US, +46M worldwide +7B images uploaded +300K unique websites link to content #31 top 50 sites in US #41 top 100 sites worldwide 18th largest ad supported site in US [Diagram: Web Browser; API; Thumbs, Images, Albums, Groups, PIC; many Content Pods; Metadata; Membership] Scaling the performance: Browser handles concurrency Centralized lookup Horizontal partitioning of distributed content
>Patterns > Data Parallelism Data Parallelism Loop-level parallelism Focuses on distributing the data across different parallel computing nodes Denormalization, sharding, horizontal partitioning, etc. Each processor performs the same task on different pieces of distributed data Emphasizes the distributed (parallelized) nature of the data Ideal for data that is read more than written (scale vs. consistency)
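A minimal sketch of the idea that each processor performs the same task on a different piece of the data. It assumes records are `(key, value)` pairs and that a CRC32 hash of the key picks the shard. The helper names (`shard_for`, `partition`, `parallel_total`) are hypothetical, and a thread pool stands in for separate computing nodes.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor


def shard_for(key: str, n_shards: int) -> int:
    """Stable hash, so the same key always lands on the same shard."""
    return zlib.crc32(key.encode("utf-8")) % n_shards


def partition(records, n_shards):
    """Horizontally partition (key, value) records into n_shards lists."""
    shards = [[] for _ in range(n_shards)]
    for key, value in records:
        shards[shard_for(key, n_shards)].append((key, value))
    return shards


def total_per_shard(shard):
    """The same task, run independently on each shard."""
    return sum(value for _key, value in shard)


def parallel_total(records, n_shards=4):
    shards = partition(records, n_shards)
    with ThreadPoolExecutor(max_workers=n_shards) as pool:
        partials = list(pool.map(total_per_shard, shards))
    # Combining the partial results is the only step that sees all shards.
    return sum(partials)
```

Because no shard reads another shard's records, the per-shard work needs no locking, which is part of why the pattern suits read-heavy data.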
>Patterns > Data Parallelism Parallelizing Data in Distributed Architecture [Diagram: browsers and web/app servers in front of data stores partitioned by key range: a single A-Z store, a two-way split (A-M, N-Z), and a four-way split (A-G, H-M, N-S, T-Z) fronted by an Index]
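The index in front of the range-partitioned stores can be sketched as a sorted list of upper bounds searched with `bisect`. This assumes keys are routed by their first ASCII letter into the A-G, H-M, N-S and T-Z ranges from the diagram. The shard names and the `lookup_shard` function are hypothetical.

```python
import bisect

# Hypothetical shard map mirroring the four-way split in the diagram.
UPPER_BOUNDS = ["G", "M", "S", "Z"]  # inclusive upper bound per shard
SHARD_NAMES = ["shard-A-G", "shard-H-M", "shard-N-S", "shard-T-Z"]


def lookup_shard(key: str) -> str:
    """Return the shard responsible for a key, based on its first letter."""
    first = key[:1].upper()
    if len(first) != 1 or not "A" <= first <= "Z":
        raise ValueError(f"key must start with an ASCII letter: {key!r}")
    # bisect_left finds the first range whose upper bound is >= the letter.
    return SHARD_NAMES[bisect.bisect_left(UPPER_BOUNDS, first)]
```

Splitting a range then means editing the two lists rather than changing application code, at the cost of making the index a component every request depends on.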
>Patterns > Data Parallelism > Example Flickr 2007 stats: Serve 40,000 photos / second Handle 100,000 cache operations / second Process 130,000 database queries / second Scaling the “read” data: Data denormalization Database replication and  federation Vertical partitioning Central cluster for index lookups Large data sets horizontally partitioned as shards Grow by binary hashing of user buckets
>Patterns > Data Parallelism > Example MySpace 2007 stats: 115B pageviews/month 5M concurrent users @ peak +3B images, mp3, videos +10M new images/day 160 Gbit/sec peak bandwidth Scaling the “write” data: MyCache: distributed dynamic memory cache MyRelay: inter-node messaging transport handling +100K req/sec, directs reads/writes to any node MySpace Distributed File System: geographically redundant distributed storage providing massive concurrent access to images, mp3, videos, etc. MySpace Distributed Transaction Manager: broker for all non-transient writes to databases/SAN, multi-phase commit across data centers
>Patterns > Data Parallelism > Example Facebook 2009 stats: +200B pageviews/month >3.9T feed actions/day +300M active users >1B chat messages/day 100M search queries/day >6B minutes spent/day (ranked #2 on Internet) +20B photos, +2B/month growth 600,000 photos served / sec 25TB log data / day processed through Scribe 120M queries /sec on memcache Scaling the “relational” data: Keeps data normalized, randomly distributed, accessed at high volumes Uses “shared nothing” architecture
>Patterns > Task Parallelism Task Parallelism Functional parallelism Focuses on distributing execution processes (threads) across different parallel computing nodes Each processor executes a different thread (or process) on the same or different data Communication takes place usually to pass data from one thread to the next as part of a workflow Emphasizes the distributed (parallelized) nature of the processing (i.e. threads) Need to design how to compose partial output from concurrent processes
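A workflow in which each thread runs a different function and passes its output to the next could look like the sketch below. It assumes one thread per stage connected by FIFO queues. `stage`, `run_pipeline` and the `_DONE` sentinel are hypothetical names, and error handling is left out (a stage that raises would stall the pipeline).

```python
import queue
import threading

_DONE = object()  # sentinel marking the end of the stream


def stage(func, inbox, outbox):
    """Run one pipeline stage: apply func to each item until the sentinel."""
    while True:
        item = inbox.get()
        if item is _DONE:
            outbox.put(_DONE)  # propagate shutdown to the next stage
            return
        outbox.put(func(item))


def run_pipeline(items, funcs):
    """Run each function in its own thread; stages communicate via queues."""
    queues = [queue.Queue() for _ in range(len(funcs) + 1)]
    threads = [
        threading.Thread(target=stage, args=(f, queues[i], queues[i + 1]))
        for i, f in enumerate(funcs)
    ]
    for t in threads:
        t.start()
    for item in items:
        queues[0].put(item)
    queues[0].put(_DONE)

    # Compose the final output from what the last stage emits.
    results = []
    while True:
        out = queues[-1].get()
        if out is _DONE:
            break
        results.append(out)
    for t in threads:
        t.join()
    return results
```

With a single thread per stage and FIFO queues the output order matches the input order. Adding more workers to one stage would break that guarantee, which is the composition problem the slide refers to.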
>Patterns > Task Parallelism > Example Google 2007 stats: +20 petabytes of data processed / day by +100K MapReduce jobs  1 petabyte sort took ~6 hours on ~4K servers replicated onto ~48K disks +200 GFS clusters, each at 1-5K nodes, handling +5 petabytes of storage ~40 GB/sec aggregate read/write throughput across the cluster +500 servers for each search query < 500ms Scaling the process: MapReduce: parallel processing framework BigTable: structured hash database Google File System: massively scalable distributed storage
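The MapReduce model named here can be illustrated with a single-process word count. This is a toy sketch of the programming model only, not Google's implementation: the function names are hypothetical, and a thread pool stands in for a cluster.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(mapped):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for pairs in mapped:
        for key, value in pairs:
            groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: collapse the values for one key into a single result."""
    return key, sum(values)


def word_count(documents, workers=4):
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Map calls are independent, so they can run in parallel.
        mapped = list(pool.map(map_phase, documents))
        groups = shuffle(mapped)
        # Reduce calls are independent per key, so they can too.
        reduced = pool.map(lambda kv: reduce_phase(*kv), groups.items())
        return dict(reduced)
```

For example, `word_count(["a b a", "b c"])` returns `{"a": 2, "b": 2, "c": 1}`.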
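> Design Principles Parallelism for Speedup Amdahl’s law (1967): $\frac{1}{(1 - P) + \frac{P}{N}}$ Amdahl’s speedup: $\text{Max. speedup} \le \frac{p}{1 + f\,(p - 1)}$ Gustafson’s law (1988): $S(P) = P - \alpha\,(P - 1)$ Gustafson’s speedup: $S = a(n) + p\,(1 - a(n))$ Karp-Flatt metric (1990): $e = \frac{\frac{1}{\varphi} - \frac{1}{p}}{1 - \frac{1}{p}}$ Speedup: $S_p = \frac{T_1}{T_p}$ Efficiency: $E_p = \frac{S_p}{p} = \frac{T_1}{p\,T_p}$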
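To make the speedup formulas concrete, here is a small sketch that evaluates them in their commonly stated forms. The function and parameter names are hypothetical: `parallel_fraction` is the share of the work that can be parallelized, `serial_fraction` is the remainder, and `n` or `p` is the processor count.

```python
def amdahl_speedup(parallel_fraction: float, n: int) -> float:
    """Amdahl's law: fixed problem size, speedup capped by the serial part."""
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / n)


def gustafson_speedup(serial_fraction: float, n: int) -> float:
    """Gustafson's law: problem size grows with n, so speedup scales."""
    return n - serial_fraction * (n - 1)


def karp_flatt(speedup: float, p: int) -> float:
    """Karp-Flatt metric: serial fraction inferred from a measured speedup."""
    return (1.0 / speedup - 1.0 / p) / (1.0 - 1.0 / p)


def efficiency(speedup: float, p: int) -> float:
    """Efficiency: speedup per processor."""
    return speedup / p
```

With 95% of the work parallelizable on 8 processors, `amdahl_speedup(0.95, 8)` is about 5.93 and can never exceed 20 however many processors are added, while `gustafson_speedup(0.05, 8)` gives 7.65. Feeding the Amdahl result into `karp_flatt` recovers the 0.05 serial fraction.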
> Design Principles Parallelism for Scale-out Sequential → Parallel Convert sequential and/or single-machine program into a form in which it can be executed in a concurrent, potentially distributed environment Over-decompose for scaling Structured multi-threading with a data focus  Relax sequential order to gain more parallelism Ensure atomicity of unordered interactions  Consider data as well as control flow Careful data structure & locking choices to manage contention Use parallel data structures Minimize shared data and synchronization Continuous optimization
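Two of these principles, over-decomposing the work and minimizing shared data, can be sketched together. The example assumes a histogram over a list of values. It cuts the input into more chunks than there are workers and gives every chunk a private accumulator that is merged once at the end. The names `chunked`, `count_chunk` and `parallel_histogram`, and the default of 8 chunks per worker, are hypothetical choices.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def chunked(seq, n_chunks):
    """Split seq into at most n_chunks contiguous, near-equal slices."""
    size = max(1, -(-len(seq) // n_chunks))  # ceiling division
    return [seq[i:i + size] for i in range(0, len(seq), size)]


def count_chunk(chunk):
    """Each task fills a private Counter, so no locking is needed."""
    return Counter(chunk)


def parallel_histogram(values, workers=4, chunks_per_worker=8):
    # Over-decompose: many more chunks than workers, so a slow chunk
    # does not leave the other workers idle.
    chunks = chunked(values, workers * chunks_per_worker)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(count_chunk, chunks))
    # The merge is the only place where results meet, and it runs
    # on a single thread.
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total
```

Counting is commutative and associative, so the chunks can finish in any order without changing the result. That is what makes relaxing the sequential order safe in this case.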
>Design Principles > Example Amazon Principles for Scalable Service Design (Werner Vogels, CTO, Amazon) Autonomy Asynchrony Controlled concurrency Controlled parallelism Decentralize Decompose into small well-understood building blocks Failure tolerant Local responsibility Recovery built-in Simplicity Symmetry
> Microsoft Platform Parallel computing on the Microsoft platform Concurrent Programming (.NET 4.0 Parallel APIs) Distributed Computing (CCR & DSS Runtime, Dryad) Cloud Computing (Azure Services Platform) Grid Computing (Windows HPC Server 2008) Massive Data Processing (SQL Server “Madison”)  Components spanning a spectrum of computing models
> Microsoft Platform > Concurrent Programming .NET 4.0 Parallel APIs Task Parallel Library (TPL) Parallel LINQ (PLINQ) Data Structures Diagnostic Tools
> Microsoft Platform > Distributed Computing CCR & DSS Toolkit Concurrency & Coordination Runtime Decentralized Software Services Supporting multi-core and concurrent applications by facilitating asynchronous operations Dealing with concurrency, exploiting parallel hardware and handling partial failure Supporting robust, distributed applications based on a light-weight state-driven service model Providing service composition, event notification, and data isolation
> Microsoft Platform > Distributed Computing Dryad General-purpose execution environment for distributed, data-parallel applications Automated management of resources, scheduling, distribution, monitoring, fault tolerance, accounting, etc. Concurrency and mutual exclusion semantics transparency Higher-level and domain-specific language support
> Microsoft Platform > Cloud Computing Azure Services Platform Internet-scale, highly available cloud fabric Auto-provisioning 64-bit compute nodes on Windows Server VMs Massively scalable distributed storage (table, blob, queue) Massively scalable and highly consistent relational database
> Microsoft Platform > Grid Computing Windows HPC Server #10 fastest supercomputer in the world (top500.org) 30,720 cores 180.6 teraflops 77.5% efficiency Image multicasting-based parallel deployment of cluster nodes Fault tolerance with failover clustering of head node Policy-driven, NUMA-aware, multicore-aware, job scheduler Inter-process distributed communication via MS-MPI
> Microsoft Platform > Massive Data Processing SQL Server “Madison” Massively parallel processing (MPP) architecture +500TB to PB’s databases “Ultra Shared Nothing” design IO and CPU affinity within symmetric multi-processing (SMP) nodes Multiple physical instances of tables w/ dynamic re-distribution Distribute / partition large tables across multiple nodes Replicate small tables Replicate + distribute medium tables
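The distribute-large, replicate-small rule can be illustrated with a toy join. This is a sketch of the general shared-nothing idea and not of how “Madison” is implemented: it assumes a large `orders` table hash-distributed by `order_id` and a small `regions` lookup table copied to every node. All names are hypothetical.

```python
import zlib


def node_for(key, n_nodes: int) -> int:
    """Pick the node that owns a row of the distributed table."""
    return zlib.crc32(str(key).encode("utf-8")) % n_nodes


def build_cluster(orders, regions, n_nodes):
    """Distribute the large table; replicate the small one to every node."""
    nodes = [{"orders": [], "regions": dict(regions)} for _ in range(n_nodes)]
    for order in orders:
        nodes[node_for(order["order_id"], n_nodes)]["orders"].append(order)
    return nodes


def local_join_totals(node):
    """Join and aggregate using only data that lives on this node."""
    totals = {}
    for order in node["orders"]:
        name = node["regions"][order["region_id"]]
        totals[name] = totals.get(name, 0) + order["amount"]
    return totals


def cluster_totals(nodes):
    """Merge the per-node partial aggregates into the final answer."""
    merged = {}
    for node in nodes:
        for name, amount in local_join_totals(node).items():
            merged[name] = merged.get(name, 0) + amount
    return merged
```

Each node holds a full copy of the small table, so the join moves no rows between nodes. Only the small per-node aggregates travel to the merge step.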
> Resources For More Information Architect Council Website (blogs.msdn.com/sac) This series (blogs.msdn.com/sac/pages/council-2009q2.aspx) .NET 4.0 Parallel APIs (msdn.com/concurrency) CCR & DSS Toolkit (microsoft.com/ccrdss) Dryad (research.microsoft.com/dryad) Azure Services Platform (azure.com) SQL Server “Madison” (microsoft.com/madison) Windows HPC Server 2008 (microsoft.com/hpc)
Thank you! david.chou@microsoft.com blogs.msdn.com/dachou © 2009 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation.  Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation.  MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.

Genislab builds better products and faster go-to-market with Lean project man...Farhan Tariq
 
Sample pptx for embedding into website for demo
Sample pptx for embedding into website for demoSample pptx for embedding into website for demo
Sample pptx for embedding into website for demoHarshalMandlekar2
 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersNicole Novielli
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxLoriGlavin3
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxLoriGlavin3
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Hiroshi SHIBATA
 
Decarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityDecarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityIES VE
 
Testing tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesTesting tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesKari Kakkonen
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024Lonnie McRorey
 
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...panagenda
 
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...Scott Andery
 
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Alkin Tezuysal
 
Potential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsPotential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsRavi Sanghani
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersRaghuram Pandurangan
 

Dernier (20)

Generative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfGenerative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdf
 
2024 April Patch Tuesday
2024 April Patch Tuesday2024 April Patch Tuesday
2024 April Patch Tuesday
 
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
 
Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...
 
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
 
Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...
 
Sample pptx for embedding into website for demo
Sample pptx for embedding into website for demoSample pptx for embedding into website for demo
Sample pptx for embedding into website for demo
 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software Developers
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024
 
Decarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityDecarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a reality
 
Testing tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesTesting tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examples
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
 
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
 
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
 
Potential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsPotential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and Insights
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information Developers
 

Patterns For Parallel Computing

  • 1. Patterns for Parallel Computing David Chou david.chou@microsoft.com blogs.msdn.com/dachou
  • 2. > Outline An architectural conversation Concepts Patterns Design Principles Microsoft Platform
  • 3. > Concepts Why is this interesting? Amdahl’s law (1967) Multi-core processors Virtualization High-performance computing Distributed architecture Web–scale applications Cloud computing  Paradigm shift!
  • 4. > Concepts Parallel Computing == ?? Simultaneous multi-threading (Intel HyperThreading, IBM Cell microprocessor for PS3, etc.) Operating system multitasking (cooperative, preemptive; symmetric multi-processing, etc.) Server load-balancing & clustering (Oracle RAC, Windows HPC Server, etc.) Grid computing (SETI@home, Sun Grid, DataSynapse, Digipede, etc.) Asynchronous programming (AJAX, JMS, MQ, event-driven, etc.) Multi-threaded & concurrent programming (java.lang.Thread, System.Threading.Thread, Cilk, LabVIEW, etc.) Massively parallel processing (MapReduce, Hadoop, Dryad, etc.)  Elements and best practices in all of these
  • 5. > Patterns Types of Parallelism Bit-level parallelism (microprocessors) Instruction-level parallelism (compilers) Multiprocessing, multi-tasking (operating systems) HPC, clustering (servers) Multi-threading (application code) Data parallelism (massive distributed databases) Task parallelism (concurrent distributed processing)  Focus is moving “up” the technology stack…
  • 6. >Patterns > HPC, Clustering Clustering Infrastructure for High Availability
  • 7. >Patterns > HPC, Clustering High-Performance Computing [Diagram: Browsers; Web/App Servers; A-Z data stores]
  • 8. >Patterns > HPC, Clustering > Example Microsoft.com Infrastructure and Application Footprint 7 Internet data centers & 3 CDN partnerships 120+ Websites, 1000’s apps and 2500 databases 20-30+ Gbits/sec Web traffic; 500+ Gbits/sec download traffic 2007 stats (microsoft.com): #9 ranked domain in U.S; 54.0M UU for 36.0% reach #5 site worldwide; reaching 287.3M UU 15K req/sec, 35K concurrent connections on 80 servers 600 vroots, 350 IIS Web apps & 12 app pools Windows Server 2008, SQL Server 2008, IIS7, ASP.NET 3.5 2007 stats (Windows Update): 350M UScans/day, 60K ASP.NET req/sec, 1.5M concurrent connections 50B downloads for CY 2006 Update Egress – MS, Akamai, Level3 & Limelight (50-500+ Gbits/sec)
  • 9. >Patterns > Multi-threading Multi-threaded programming [Diagram: execution time, Sequential vs. Concurrent]
  • 10. >Patterns > Multi-threading Multi-threading Typically, functional decomposition into individual threads But, explicit concurrent programming brings complexities Managing threads, semaphores, monitors, dead-locks, race conditions, mutual exclusion, synchronization, etc. Moving towards implicit parallelism Integrating concurrency & coordination into mainstream programming languages Developing tools to ease development Encapsulating parallelism in reusable components Raising the semantic level: new approaches
  • 11. >Patterns > Multi-threading > Example Photobucket 2007 stats: +30M searches processed / day 25M UU/month in US, +46M worldwide +7B images uploaded +300K unique websites link to content #31 top 50 sites in US #41 top 100 sites worldwide 18th largest ad supported site in US Scaling the performance: Browser handles concurrency Centralized lookup Horizontal partitioning of distributed content [Diagram: Web Browser; Thumbs, Images, Albums, Groups; API; PIC; Content Pods; Metadata; Membership]
  • 12. >Patterns > Data Parallelism Data Parallelism Loop-level parallelism Focuses on distributing the data across different parallel computing nodes Denormalization, sharding, horizontal partitioning, etc. Each processor performs the same task on different pieces of distributed data Emphasizes the distributed (parallelized) nature of the data Ideal for data that is read more than written (scale vs. consistency)
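The data-parallel idea on this slide (the same task applied to different horizontal partitions of the data) can be sketched roughly as follows. This is a minimal illustration, not taken from the deck: the function names, the byte-sum shard key, and the choice of a thread pool are all assumptions made for the example.

```python
from concurrent.futures import ThreadPoolExecutor


def partition(records, num_shards):
    """Horizontally partition (key, value) records into num_shards shards.

    The shard is chosen by a simple deterministic hash of the key
    (sum of its bytes); this hash is an assumption for illustration.
    """
    shards = [[] for _ in range(num_shards)]
    for key, value in records:
        shards[sum(key.encode("utf-8")) % num_shards].append((key, value))
    return shards


def total_size(shard):
    """The single task that every worker runs on its own shard."""
    return sum(size for _, size in shard)


def parallel_total(records, num_shards=4):
    """Run the same task on every shard concurrently, then combine."""
    shards = partition(records, num_shards)
    with ThreadPoolExecutor(max_workers=num_shards) as pool:
        partials = list(pool.map(total_size, shards))
    return sum(partials)
```

Threads are used here only to show the structure. In CPython, CPU-bound work would more likely use `ProcessPoolExecutor`, and in a distributed setting each shard would live on a different node.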
  • 13. >Patterns > Data Parallelism Parallelizing Data in Distributed Architecture [Diagram: Browsers; Web/App Servers; Index; data partitions A-Z, A-M, N-Z, A-G, H-M, N-S, T-Z]
  • 14. >Patterns > Data Parallelism > Example Flickr 2007 stats: Serve 40,000 photos / second Handle 100,000 cache operations / second Process 130,000 database queries / second Scaling the “read” data: Data denormalization Database replication and federation Vertical partitioning Central cluster for index lookups Large data sets horizontally partitioned as shards Grow by binary hashing of user buckets
  • 15. >Patterns > Data Parallelism > Example MySpace 2007 stats: 115B pageviews/month 5M concurrent users @ peak +3B images, mp3, videos +10M new images/day 160 Gbit/sec peak bandwidth Scaling the “write” data: MyCache: distributed dynamic memory cache MyRelay: inter-node messaging transport handling +100K req/sec, directs reads/writes to any node MySpace Distributed File System: geographically redundant distributed storage providing massive concurrent access to images, mp3, videos, etc. MySpace Distributed Transaction Manager: broker for all non-transient writes to databases/SAN, multi-phase commit across data centers
  • 16. >Patterns > Data Parallelism > Example Facebook 2009 stats: +200B pageviews/month >3.9T feed actions/day +300M active users >1B chat mesgs/day 100M search queries/day >6B minutes spent/day (ranked #2 on Internet) +20B photos, +2B/month growth 600,000 photos served / sec 25TB log data / day processed thru Scribe 120M queries /sec on memcache Scaling the “relational” data: Keeps data normalized, randomly distributed, accessed at high volumes Uses “shared nothing” architecture
  • 17. >Patterns > Task Parallelism Task Parallelism Functional parallelism Focuses on distributing execution processes (threads) across different parallel computing nodes Each processor executes a different thread (or process) on the same or different data Communication takes place usually to pass data from one thread to the next as part of a workflow Emphasizes the distributed (parallelized) nature of the processing (i.e. threads) Need to design how to compose partial output from concurrent processes
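As a contrast with data parallelism, the task-parallel shape described on this slide (different functions running concurrently, with their partial outputs composed afterwards) might look like the sketch below. The three analysis functions and the `analyze` helper are hypothetical names chosen for the example; here all tasks read the same input.

```python
from concurrent.futures import ThreadPoolExecutor


def word_count(text):
    return len(text.split())


def char_count(text):
    return len(text)


def longest_word(text):
    # On ties, max() returns the first word of maximal length.
    return max(text.split(), key=len)


def analyze(text):
    """Run three different tasks concurrently on the same data.

    The final dictionary is the composition step: partial outputs
    from independent tasks are gathered into one result.
    """
    tasks = {"words": word_count, "chars": char_count, "longest": longest_word}
    with ThreadPoolExecutor(max_workers=len(tasks)) as pool:
        futures = {name: pool.submit(fn, text) for name, fn in tasks.items()}
        return {name: future.result() for name, future in futures.items()}
```

The design question the slide raises, how to compose partial output, shows up in the last line: each task is independent, so composition is just a keyed merge. Tasks that feed one another would instead form a pipeline.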
  • 18. >Patterns > Task Parallelism > Example Google 2007 stats: +20 petabytes of data processed / day by +100K MapReduce jobs 1 petabyte sort took ~6 hours on ~4K servers replicated onto ~48K disks +200 GFS clusters, each at 1-5K nodes, handling +5 petabytes of storage ~40 GB/sec aggregate read/write throughput across the cluster +500 servers for each search query < 500ms Scaling the process: MapReduce: parallel processing framework BigTable: structured hash database Google File System: massively scalable distributed storage
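The MapReduce programming model named on this slide can be illustrated with a tiny single-process word count. This is only a sketch of the map, shuffle, and reduce phases under the assumption of in-memory data; it is not Google's implementation, and every function name is hypothetical.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def map_reduce(documents):
    """Run the three phases in sequence over a list of documents."""
    pairs = [pair for doc in documents for pair in map_phase(doc)]
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())
```

In a real framework the map calls run in parallel across many machines, and the reduce calls run in parallel per key. The sequential version keeps the same contract between phases.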
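  • 19. > Design Principles Parallelism for Speedup Amdahl’s law (1967): $\frac{1}{(1-P)+\frac{P}{N}}$; Amdahl’s speedup: $\text{Max. Speedup} \le \frac{p}{1+f\cdot(p-1)}$; Gustafson’s law (1988): $S(P) = P - \alpha\cdot(P-1)$; Gustafson’s speedup: $S = a(n) + p\cdot(1-a(n))$; Karp-Flatt metric (1990): $e = \frac{\frac{1}{\varphi}-\frac{1}{p}}{1-\frac{1}{p}}$; Speedup: $S_p = \frac{T_1}{T_p}$; Efficiency: $E_p = \frac{S_p}{p} = \frac{T_1}{p\,T_p}$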
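A small worked example may make the speedup laws on this slide more concrete. The functions below use the standard textbook forms of Amdahl's law, Gustafson's law, the Karp-Flatt metric, and parallel efficiency. The parameter names and sample numbers are assumptions for illustration, not values from the deck.

```python
def amdahl_speedup(parallel_fraction, processors):
    """Amdahl (fixed problem size): 1 / ((1 - P) + P / N)."""
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / processors)


def gustafson_speedup(serial_fraction, processors):
    """Gustafson (problem size scales with machine): P - alpha * (P - 1)."""
    return processors - serial_fraction * (processors - 1)


def karp_flatt(measured_speedup, processors):
    """Experimentally determined serial fraction e; requires processors > 1."""
    inv_p = 1.0 / processors
    return (1.0 / measured_speedup - inv_p) / (1.0 - inv_p)


def efficiency(t1, tp, processors):
    """E_p = S_p / p, where S_p = T_1 / T_p."""
    return (t1 / tp) / processors
```

With a 95% parallel fraction on 8 processors, `amdahl_speedup(0.95, 8)` is about 5.93, while `gustafson_speedup(0.05, 8)` is 7.65. Feeding the Amdahl speedup back into `karp_flatt` recovers the 0.05 serial fraction, which is how the metric is typically used to diagnose measured runs.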
  • 20. > Design Principles Parallelism for Scale-out Sequential → Parallel Convert sequential and/or single-machine program into a form in which it can be executed in a concurrent, potentially distributed environment Over-decompose for scaling Structured multi-threading with a data focus Relax sequential order to gain more parallelism Ensure atomicity of unordered interactions Consider data as well as control flow Careful data structure & locking choices to manage contention Use parallel data structures Minimize shared data and synchronization Continuous optimization
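Two of the principles on this slide, over-decomposing the work and minimizing shared data and synchronization, can be sketched together. In this hypothetical example the input is split into many more chunks than there are workers, each worker builds a private partial result without locks, and the only merge happens in the calling thread. The names and default sizes are assumptions.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def chunks(items, size):
    """Over-decompose: yield many small chunks rather than one per worker."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def count_chunk(chunk):
    """Each task builds its own private Counter; nothing is shared."""
    return Counter(chunk)


def parallel_histogram(items, workers=4, chunk_size=1000):
    """Merge private partial results in the caller, so no lock is needed."""
    total = Counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for partial in pool.map(count_chunk, chunks(items, chunk_size)):
            total.update(partial)
    return total
```

The alternative, a single shared counter guarded by a lock, would serialize every update. Keeping state private until a final merge trades a little memory for far less contention, and small chunks let the pool balance uneven work.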
  • 21. >Design Principles > Example Amazon Principles for Scalable Service Design (Werner Vogels, CTO, Amazon) Autonomy Asynchrony Controlled concurrency Controlled parallelism Decentralize Decompose into small well-understood building blocks Failure tolerant Local responsibility Recovery built-in Simplicity Symmetry
  • 22. > Microsoft Platform Parallel computing on the Microsoft platform Concurrent Programming (.NET 4.0 Parallel APIs) Distributed Computing (CCR & DSS Runtime, Dryad) Cloud Computing (Azure Services Platform) Grid Computing (Windows HPC Server 2008) Massive Data Processing (SQL Server “Madison”)  Components spanning a spectrum of computing models
  • 23. > Microsoft Platform > Concurrent Programming .NET 4.0 Parallel APIs Task Parallel Library (TPL) Parallel LINQ (PLINQ) Data Structures Diagnostic Tools
  • 24. > Microsoft Platform > Distributed Computing CCR & DSS Toolkit Concurrency & Coordination Runtime Decentralized Software Services Supporting multi-core and concurrent applications by facilitating asynchronous operations Dealing with concurrency, exploiting parallel hardware and handling partial failure Supporting robust, distributed applications based on a light-weight state-driven service model Providing service composition, event notification, and data isolation
  • 25. > Microsoft Platform > Distributed Computing Dryad General-purpose execution environment for distributed, data-parallel applications Automated management of resources, scheduling, distribution, monitoring, fault tolerance, accounting, etc. Concurrency and mutual exclusion semantics transparency Higher-level and domain-specific language support
  • 26. > Microsoft Platform > Cloud Computing Azure Services Platform Internet-scale, highly available cloud fabric Auto-provisioning 64-bit compute nodes on Windows Server VMs Massively scalable distributed storage (table, blob, queue) Massively scalable and highly consistent relational database
  • 27. > Microsoft Platform > Grid Computing Windows HPC Server #10 fastest supercomputer in the world (top500.org) 30,720 cores 180.6 teraflops 77.5% efficiency Image multicasting-based parallel deployment of cluster nodes Fault tolerance with failover clustering of head node Policy-driven, NUMA-aware, multicore-aware, job scheduler Inter-process distributed communication via MS-MPI
  • 28. > Microsoft Platform > Massive Data Processing SQL Server “Madison” Massively parallel processing (MPP) architecture +500TB to PB’s databases “Ultra Shared Nothing” design IO and CPU affinity within symmetric multi-processing (SMP) nodes Multiple physical instances of tables w/ dynamic re-distribution Distribute / partition large tables across multiple nodes Replicate small tables Replicate + distribute medium tables
  • 29. > Resources For More Information Architect Council Website (blogs.msdn.com/sac) This series (blogs.msdn.com/sac/pages/council-2009q2.aspx) .NET 4.0 Parallel APIs (msdn.com/concurrency) CCR & DSS Toolkit (microsoft.com/ccrdss) Dryad (research.microsoft.com/dryad) Azure Services Platform (azure.com) SQL Server “Madison” (microsoft.com/madison) Windows HPC Server 2008 (microsoft.com/hpc)
  • 30. Thank you! david.chou@microsoft.com blogs.msdn.com/dachou © 2009 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.

Editor's notes

  1. SETI@Home states (stat: today; change in last 24 hours): Teams: 55,848; 12. Active teams: 15,817; 4. Users: 977,698; 291. Active users: 148,334; -65. Hosts: 2.34e+6; 930. Active hosts: 238,234; -256. Total credit: 4.89e+10; 4.97e+7. Recent average: 6.31e+7; -1,352,173. Total FLOPs: 4.221e+22; 4.298e+19
  2. Source: Cal Henderson, Chief Architect, Flickr
  3. Source: Cal Henderson, Chief Architect, Flickr
  4. Source: Aber Whitcomb, Co-Founder and CTO, MySpace; Jim Benedetto, SVP Technical Operations, MySpace
  5. Source: Jeff Rothschild, VP of Technology, Facebook
  6. Source: Jeffrey Dean and Sanjay Ghemawat, Google
  7. Source: Werner Vogels, CTO, Amazon
  8. Deployed at MySpace for messaging infrastructure
  9. Deployed in AdCenter for massive log processing