Visual Studio 2010 Using the Parallel Computing Platform Phil Pennington philpenn@microsoft.com
Agenda: What’s new with Windows? Parallel Computing Tools in Visual Studio. Using .NET Parallel Extensions.
First, an Example: Monte Carlo Approximation of Pi. S = 4*r*r (area of the square); C = Pi*r*r (area of the inscribed circle); so Pi = 4*(C/S). For each point P, d(P) = SQRT((x * x) + (y * y)); if (d < r) then P(x,y) is in C.
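The estimate on this slide can be sketched in C# with the `Parallel.For` overload that carries thread-local state. This is a minimal sketch under stated assumptions, not the presenter's demo code: the names `MonteCarloPi`, `Estimate`, and `LocalState`, the per-worker seeding scheme, and the choice r = 1 are all hypothetical.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class MonteCarloPi
{
    // Per-worker state: Random is not thread-safe, so each worker owns one.
    private sealed class LocalState
    {
        public readonly Random Rng;
        public long Hits;
        public LocalState(Random rng) { Rng = rng; }
    }

    // Samples points in the square [-1, 1) x [-1, 1) (r = 1) and counts
    // those that fall inside the circle, then applies Pi = 4 * (C / S).
    public static double Estimate(int samples, int seed)
    {
        long inside = 0;
        int nextSeed = seed; // assumption: distinct seeds by simple increment

        Parallel.For(0, samples,
            () => new LocalState(new Random(Interlocked.Increment(ref nextSeed))),
            (i, loopState, local) =>
            {
                double x = 2.0 * local.Rng.NextDouble() - 1.0;
                double y = 2.0 * local.Rng.NextDouble() - 1.0;
                if (Math.Sqrt(x * x + y * y) < 1.0) local.Hits++;
                return local;
            },
            // Merge each worker's private count into the shared total once.
            local => { Interlocked.Add(ref inside, local.Hits); });

        return 4.0 * inside / samples;
    }

    public static void Main()
    {
        Console.WriteLine(MonteCarloPi.Estimate(1000000, 42));
    }
}
```

Keeping the hit counter in thread-local state means the shared total is touched once per worker rather than once per sample, which is the usual reason to prefer this overload for reductions.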
Windows and Maximum Processors: Before Windows 7 / Windows Server 2008 R2, the maximum number of Logical Processors (LPs) was dictated by the processor’s integral word size, because LP state (e.g. idle, affinity) is represented in a word-sized bitmask. 32-bit Windows: 32 LPs. 64-bit Windows: 64 LPs. [Figure: 32-bit idle processor mask, bits 31 to 0, each bit marked busy or idle]
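To see why a word-sized bitmask caps the processor count, here is a small illustrative sketch in C#. The class `ProcessorMask` and its methods are hypothetical helpers, not a Windows or .NET API; they only model one bit per logical processor.

```csharp
using System;

public static class ProcessorMask
{
    // One bit per logical processor: bit i set means LP i is idle.
    public static ulong MarkIdle(ulong mask, int lp) { return mask | (1UL << lp); }
    public static ulong MarkBusy(ulong mask, int lp) { return mask & ~(1UL << lp); }
    public static bool IsIdle(ulong mask, int lp) { return (mask & (1UL << lp)) != 0; }

    public static void Main()
    {
        ulong idle = 0;
        idle = MarkIdle(idle, 0);
        idle = MarkIdle(idle, 63); // highest LP a 64-bit mask can describe
        idle = MarkBusy(idle, 0);

        Console.WriteLine(IsIdle(idle, 63)); // True
        Console.WriteLine(IsIdle(idle, 0));  // False
    }
}
```

A 64-bit word has no bit 64, so a 65th logical processor cannot be represented without either a wider mask or a second mask, which is the role processor groups play on the following slides.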
Processor Groups: New with Windows 7 and Windows Server 2008 R2. [Figure: hierarchy of Group, NUMA node, socket, core, LP]
Processor Groups, Example: 2 groups, 4 NUMA nodes, 8 sockets, 32 cores, 128 LPs. [Figure: tree of groups, NUMA nodes, sockets, cores, and LPs]
Many-Core Topology APIs: Discovery
Many-Core Topology APIs: Resource Localization
Many-Core Topology APIs: Memory Management
User Mode Scheduling: Architectural Perspective. [Figure labels: UMS Scheduler’s Ready List; Your Scheduler Logic; Wait Reasons (Yield, Blocked, Created); CPU 1 and CPU 2; UMS Completion List; Worker Threads W1 to W4; Scheduler Threads S1 and S2; Application / Kernel boundary; Blocked Worker Threads]
Task Scheduling with a UMS Scheduler: Maximize Quantum, Minimize Blocking Effects. Tasks are run by worker threads, which the scheduler controls. [Figure: timelines for worker threads WT0 to WT3, without UMS (signal-and-wait, showing a dead zone) and with UMS (UMS yield)]
Load-Balancing, Work-Stealing Scheduler. Dynamic scheduling improves performance by distributing work efficiently at runtime. [Figure: CPU0 to CPU3 under dynamic scheduling vs. static scheduling]
Demos The Platform - Topology - Schedulers
Agenda: What’s new with Windows? Parallel Computing Tools in Visual Studio. Using .NET Parallel Extensions.
Visual Studio 2010, .NET Developer: Tools, Programming Models, Runtimes. Tools: Debugger, Profiler. Programming Models (Structured Parallelism): Parallel LINQ (PLINQ), Task Parallel Library (TPL), Data Structures. .NET Parallel Extensions (managed library): Task Scheduler, Resource Manager. .NET Runtime: Thread Pools.
Thread-Pool Scheduler in .NET 4.0. The global queue (FIFO) is shared by the legacy ThreadPool API and TPL. Local work queues (LIFO, one per thread) and the work-stealing scheduler are TPL only. [Figure: threads 1 to N each run a dispatch loop; tasks T1 to T5 are enqueued on and dequeued from the global queue, T6 to T8 sit in per-thread local queues, and threads steal from each other’s local queues]
Task Parallel Library (TPL): Task Concepts. Common functionality: waiting, cancellation, continuations, parent/child relationships.
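The four capabilities named on this slide can each be shown in a few lines. The following is a hedged sketch using the .NET 4 `Task` API; the class `TaskConcepts` and its method names are hypothetical and chosen only for illustration.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class TaskConcepts
{
    // Parent/child + waiting: the parent task does not complete
    // until its attached children have completed.
    public static int SumWithChildren()
    {
        int[] parts = new int[2];
        Task parent = Task.Factory.StartNew(() =>
        {
            Task.Factory.StartNew(() => { parts[0] = 1; }, TaskCreationOptions.AttachedToParent);
            Task.Factory.StartNew(() => { parts[1] = 2; }, TaskCreationOptions.AttachedToParent);
        });
        parent.Wait();
        return parts[0] + parts[1];
    }

    // Continuation: runs after the antecedent finishes and consumes its result.
    public static int SquareThenAddOne(int x)
    {
        Task<int> square = Task.Factory.StartNew(() => x * x);
        Task<int> next = square.ContinueWith(t => t.Result + 1);
        return next.Result;
    }

    // Cancellation: a task created with an already-canceled token never runs.
    public static bool IsCanceledBeforeStart()
    {
        var cts = new CancellationTokenSource();
        cts.Cancel();
        Task t = Task.Factory.StartNew(() => { }, cts.Token);
        try { t.Wait(); }
        catch (AggregateException ex)
        {
            // Wait() surfaces the cancellation wrapped in an AggregateException.
            ex.Handle(e => e is TaskCanceledException);
        }
        return t.IsCanceled;
    }

    public static void Main()
    {
        Console.WriteLine(SumWithChildren());       // 3
        Console.WriteLine(SquareThenAddOne(3));     // 10
        Console.WriteLine(IsCanceledBeforeStart()); // True
    }
}
```

The sketch uses `Task.Factory.StartNew` because it is available in .NET 4 and leaves the parent open to attached children.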
Primitives and Structures. Thread-safe, scalable collections: IProducerConsumerCollection<T>, ConcurrentQueue<T>, ConcurrentStack<T>, ConcurrentBag<T>, ConcurrentDictionary<TKey,TValue>. Phases and work exchange: Barrier, BlockingCollection<T>, CountdownEvent. Partitioning: {Orderable}Partitioner<T>, Partitioner.Create. Exception handling: AggregateException. Initialization: Lazy<T>, LazyInitializer.EnsureInitialized<T>, ThreadLocal<T>. Locks: ManualResetEventSlim, SemaphoreSlim, SpinLock, SpinWait. Cancellation: CancellationToken{Source}.
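As one illustration of the "work exchange" group, a bounded `BlockingCollection<T>` can connect a producer task to a consumer. This is a minimal sketch; the class `ProducerConsumer`, the method `SumThroughQueue`, and the capacity of 4 are assumptions for the example, not anything taken from the slides.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

public static class ProducerConsumer
{
    // Sends 1..count through a bounded queue and sums what arrives.
    public static int SumThroughQueue(int count)
    {
        // Bounded to 4 items, so the producer blocks when the consumer falls behind.
        using (var queue = new BlockingCollection<int>(4))
        {
            Task producer = Task.Factory.StartNew(() =>
            {
                for (int i = 1; i <= count; i++) queue.Add(i);
                queue.CompleteAdding(); // lets the consuming loop terminate
            });

            int sum = 0;
            foreach (int item in queue.GetConsumingEnumerable()) sum += item;

            producer.Wait();
            return sum;
        }
    }

    public static void Main()
    {
        Console.WriteLine(SumThroughQueue(100)); // 5050
    }
}
```

By default `BlockingCollection<T>` wraps a `ConcurrentQueue<T>`, so items arrive in FIFO order. Another `IProducerConsumerCollection<T>` such as `ConcurrentStack<T>` can be passed to the constructor instead.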
Parallel Debugging: two new debugger tool windows, “Parallel Tasks” and “Parallel Stacks”, supporting both native and managed code.
Parallel Tasks
Where are my tasks running (location, call stack)?
Which tasks are blocked?
How many tasks are waiting to run?
Task-specific view (Task status)
Easy navigation to any executing method
Rich UI (zooming, panning, bird’s eye view, flagging, tooltips)
Parallel Profiling
CPU Utilization: other processes, number of cores, idle time, your process.
Threads: measure time for interesting segments; hide uninteresting threads; zoom in and out; detailed thread analysis (one channel per thread); active legend; usage hints; call stacks.
Cores: each logical core in a swim lane; one color per thread; migration visualization; cross-core migration details.
Demo: Libraries, Languages, Debuggers, Profilers
Agenda: What’s new with Windows? Parallel Computing Tools in Visual Studio. Using .NET Parallel Extensions.
Thinking Parallel - “Task” vs. “Data” Parallelism
Task Parallelism:
Parallel.Invoke(
    () => { Console.WriteLine("Begin first task..."); },
    () => { Console.WriteLine("Begin second task..."); },
    () => { Console.WriteLine("Begin third task..."); }
);
Data Parallelism:
IEnumerable<int> numbers = Enumerable.Range(2, 100-3);
var myQuery =
    from n in numbers.AsParallel()
    // Trial division by 2..sqrt(n). The divisor range is empty for n = 2 and n = 3,
    // so both are reported as prime (a count of (int)Math.Sqrt(n) would test 2 % 2 and drop 2).
    where Enumerable.Range(2, Math.Max(0, (int)Math.Sqrt(n) - 1)).All(i => n % i > 0)
    select n;
int[] primes = myQuery.ToArray();

Using Parallel Computing Platform - NHDNUG

  • 34. Thinking Parallel – How to Partition Work? Several partitioning schemes are built in. Chunk: works with any IEnumerable<T>; a single enumerator is shared and chunks are handed out on demand. Range: works only with IList<T>; the input is divided into contiguous regions, one per partition. Stripe: works only with IList<T>; elements are handed out round-robin to each partition. Hash: works with any IEnumerable<T>; elements are assigned to a partition based on hash code. Custom partitioning is available through Partitioner<T>. Partitioner.Create is available for tighter control over the built-in partitioning schemes.
  • 35. Thinking Parallel – How to Execute Tasks?
  • 36. Thinking Parallel – How to Collate Results?
  • 38. Resources. Native APIs/runtimes (Visual C++ 10), tasks, loops, collections, and Agents: http://msdn.microsoft.com/en-us/library/dd504870(VS.100).aspx. Tools (in the VS2010 IDE), debugger and profiler: http://msdn.microsoft.com/en-us/library/dd460685(VS.100).aspx. Managed APIs/runtimes (.NET 4), tasks, loops, collections, and PLINQ: http://msdn.microsoft.com/en-us/library/dd460693(VS.100).aspx. General, VS2010 Parallel Computing Developer Center: http://msdn.microsoft.com/en-us/concurrency/default.aspx
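The partitioning slide (34) mentions `Partitioner.Create` for tighter control over the built-in schemes. The sketch below shows two of its overloads: an integer range split into contiguous sub-ranges, and a list handed out with load balancing enabled. The class `PartitioningSketch` and its method names are hypothetical, and `n` is assumed to be greater than zero because `Partitioner.Create(0, n)` rejects an empty range.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

public static class PartitioningSketch
{
    // Range partitioning: each work item is a contiguous [Item1, Item2) block,
    // so the delegate runs once per block instead of once per element.
    public static long SumOfSquares(int n)
    {
        long total = 0;
        Parallel.ForEach(Partitioner.Create(0, n), range =>
        {
            long local = 0;
            for (int i = range.Item1; i < range.Item2; i++) local += (long)i * i;
            Interlocked.Add(ref total, local);
        });
        return total;
    }

    // Load-balanced partitioning over an IList<T>: elements are handed out
    // on demand rather than in fixed per-partition regions.
    public static int CountEven(IList<int> values)
    {
        int count = 0;
        Parallel.ForEach(Partitioner.Create(values, true), v =>
        {
            if (v % 2 == 0) Interlocked.Increment(ref count);
        });
        return count;
    }

    public static void Main()
    {
        Console.WriteLine(SumOfSquares(1000)); // 332833500
        Console.WriteLine(CountEven(new List<int> { 1, 2, 3, 4, 5, 6 })); // 3
    }
}
```

Fixed ranges keep per-element overhead low when every element costs about the same. Load-balanced hand-out tends to help when element costs vary, which is the static versus dynamic trade-off shown on the work-stealing slide.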

Editor's notes

  1. Let’s use this slide for an “Architectural Perspective” of UMS.
     S1 and S2 are the first threads created within a UMS solution. These are “Scheduler Threads” or “Primary Threads”. These threads represent “cores” or physical CPUs from a Scheduler perspective. These are normal threads to begin with, but you would typically first establish processor affinity using the new CreateRemoteThreadEx API and then use a new API, EnterUmsSchedulingMode, to specify that the new thread is a Scheduler thread. You pass in a callback function pointer, i.e. UMSSchedulerProc, to begin executing instructions on the Scheduler thread. A UMS worker thread is created by calling CreateRemoteThreadEx with the PROC_THREAD_ATTRIBUTE_UMS_THREAD attribute and specifying a UMS thread context and a completion list. The OS places these threads into the Completion List, and your Scheduler logic takes over, typically placing the new threads onto the Scheduler’s Ready List.
     The first thing that a Scheduler should do is move its associated Worker threads onto the Scheduler’s Ready List. Then it can begin executing your custom scheduler logic.
     Each of the Scheduler threads should then pop a Worker thread off of the Ready List and run it on the associated core. When this occurs, the Scheduler thread context is essentially lost forever: the Worker thread now owns the core and is executing. The Scheduler thread will not regain the core until a processor Yield event occurs.
     The first thing that could happen is that this thread could yield. Yield is again a Scheduler callback mechanism and perhaps the single most important function of UMS. It’s within the Yield that you will implement your own synchronization primitives and scheduling logic. Ideally, the yielding thread provides some contextual information to the scheduler (maybe it wants to wait on some specific application domain event to occur). Your Scheduler would look at this Yield request and associated context and make a scheduling decision.
     Maybe the Scheduler places the Worker thread within a Wait list for that specific event or event type. Now your Scheduler has to decide what to run next.
     Maybe the next Worker thread from the Ready List, for instance, and we’re back running again. Note that no kernel scheduling context switch was necessary. Maybe that wait event handling took 200 cycles in user mode. It may have cost 10 times that with a kernel context switch.
     Let’s now assume that this worker performs a system call. At this point, we switch the worker thread to its kernel-mode context and the thread continues to run within the kernel. If it does not block (in other words, if it doesn’t use one of the kernel synchronization primitives), then it just continues to run. If the thread never blocks in the kernel, then it just returns to user mode and continues to run and do work.
     Let’s assume that the thread does block. Maybe a page fault occurred, for instance. Now our Scheduler thread regains control of the processor via a callback from the kernel. The kernel is telling your Scheduler that a worker thread is blocked and the reason for that block. This is the point where we integrate kernel synchronization with user synchronization. But now, you get to decide what to run next.
     The Scheduler looks at the state of its affairs and perhaps decides to run the next Worker thread from the Ready List, for example. Let’s assume that later in time Worker 3 unblocks.
     The kernel will now place this unblocked Worker thread into the UMS Completion List.
     At the next Yield event, for instance, we get another Scheduler decision opportunity. Maybe this Yield contains information that affects the state of our Wait list.
     The first thing that the Scheduler should do, however, is manage the Completion List and move any unblocked threads to the Ready List.
     Next, our Scheduler must make a priority decision. Maybe our Waiting thread gets to run again and our Yielding thread gets placed upon the Ready List. And we’re done.
  2. UMS is an enabler for: finer-grained parallelism; more deterministic behavior; better cache locality. UMS allows your Scheduler to boost performance in certain situations: apps that have a lot of blocking, for example.
  3. Think Tasks, not Threads. Threads represent execution flow, not work: hard-coded; significant system overhead; minimal intrinsic parallel constructs. QueueUserWorkItem() is handy for fire-and-forget. But what about: waiting, canceling, continuing, composing, exceptions, dataflow, integration, debugging?
  4. Now, let’s first consider the tools architecture from a .NET developer’s perspective. Let’s start with the .NET Runtime and the .NET Parallel Extensions library. In a moment, we’ll look at how a developer uses the extensions within their application. The .NET Parallel Extensions provide the benefits of concurrent task scheduling without you having to build a custom scheduler that is appropriately reentrant, thread-safe, and non-blocking.
     The Parallel Extensions library contains a Task Scheduler and a Resource Manager component that integrate with the underlying .NET Runtime. The Resource Manager manages access to system resources like the collection of available CPUs. The Scheduler leverages only thread pools for task scheduling.
     The Parallel Extensions also support multiple Programming Models. The Task Parallel Library (TPL) is an easy and convenient way to express fine-grained parallelism within your applications. The TPL provides patterns for Task Execution, Synchronization, and Data Sharing. PLINQ (Parallel LINQ) enables parallel execution of LINQ queries over in-memory data such as collections or XML; it parallelizes LINQ to Objects rather than queries sent to a SQL database.
     The Parallel Extensions also include Data Structures that are “scheduler aware”, enabling you to optimally specify task scheduling requests and custom scheduler policies.
     Again, Visual Studio 2010 includes new tools for parallel application development and testing. These include a new parallel debugger and a new parallel application profiler. Let’s take a brief look at a simple .NET parallel application along with the Visual Studio 2010 Debugger and Parallel Performance Analyzer.
     Pure .NET libraries. Feature areas: Task Parallel Library; Parallel LINQ; synchronization primitives and thread-safe data structures; enhanced ThreadPool.
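The decision loop described in note 1 (drain the Completion List into the Ready List, then pick a worker to run) can be modeled with plain queues. This is a toy, single-threaded simulation for illustration only: it does not call the Windows UMS API, and the class `UmsSchedulerModel`, the `Run` method, the worker names, and the rule that a blocked worker unblocks once nothing else is ready are all assumptions.

```csharp
using System;
using System.Collections.Generic;

public static class UmsSchedulerModel
{
    // Simulates one scheduler thread. Each worker runs once to completion,
    // except `blocksOnce`, which blocks in the "kernel" on its first run.
    public static List<string> Run(IEnumerable<string> created, string blocksOnce)
    {
        var completionList = new Queue<string>(created); // new or unblocked workers
        var readyList = new Queue<string>();
        var blocked = new Queue<string>();
        var trace = new List<string>();
        bool hasBlocked = false;

        while (completionList.Count > 0 || readyList.Count > 0 || blocked.Count > 0)
        {
            // Step 1: always move completed/unblocked workers to the ready list first.
            while (completionList.Count > 0) readyList.Enqueue(completionList.Dequeue());

            // Modeled kernel behavior (assumption): when nothing is ready,
            // a blocked worker unblocks and lands on the completion list.
            if (readyList.Count == 0)
            {
                completionList.Enqueue(blocked.Dequeue());
                continue;
            }

            // Step 2: scheduling decision, here simply FIFO from the ready list.
            string worker = readyList.Dequeue();
            if (worker == blocksOnce && !hasBlocked)
            {
                hasBlocked = true;
                blocked.Enqueue(worker);
                trace.Add(worker + ":blocked");
            }
            else
            {
                trace.Add(worker + ":ran");
            }
        }
        return trace;
    }

    public static void Main()
    {
        // Prints: W1:ran, W2:blocked, W3:ran, W2:ran
        Console.WriteLine(string.Join(", ", Run(new[] { "W1", "W2", "W3" }, "W2")));
    }
}
```

In a real UMS scheduler the FIFO choice in step 2 is where custom priority or wait-list logic would go.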