NUMA
MAATALLA Abed
abedmaatalla@gmail.com
• What is NUMA?
• History of processors.
• A close look at NUMA.
• UMA, NUMA & NUMA SMP architectures.
• Barriers of NUMA.
• Solutions.
• Existing simulators.
• Benefits of NUMA.
What is NUMA?
• Non-Uniform Memory Access: accessing some regions of
memory takes longer than accessing others
• Designed to improve scalability on large SMPs
• A processor can access its own local memory faster than
non-local memory.
SMP: symmetric multiprocessing
What is NUMA?
• Groups of processors (NUMA nodes) have their own local
memory
– Any processor can access any memory, including memory
not "owned" by its group (remote memory)
– Non-uniform: accessing local memory is faster than
accessing remote memory
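The local/remote distinction above can be sketched as a toy latency model. The numbers are illustrative round figures, not measurements; real ratios depend on the hardware, but remote access is typically noticeably slower than local.

```python
# Toy model of NUMA access latency. The latency values are
# hypothetical placeholders, not measured figures.
LOCAL_LATENCY_NS = 100    # assumed local DRAM access cost
REMOTE_LATENCY_NS = 200   # assumed remote (cross-node) access cost

def access_latency_ns(cpu_node: int, memory_node: int) -> int:
    """Modelled latency for a CPU on cpu_node touching memory
    that lives on memory_node."""
    return LOCAL_LATENCY_NS if cpu_node == memory_node else REMOTE_LATENCY_NS

print(access_latency_ns(0, 0))  # local access  -> 100
print(access_latency_ns(0, 1))  # remote access -> 200
```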
What is NUMA?
• Nodes are linked to each other by a high-speed interconnect
• NUMA limits the number of CPUs contending for any single memory bus
• Each group of processors has its own memory and possibly its own I/O
channels
• The number of CPUs within a NUMA node depends on the hardware
vendor.
What is NUMA?
• Facts:
– (most) memory is
allocated at task startup.
– tasks are (usually) free to
run on any processor.
– both local and remote
accesses can happen
during a task's life.
History of processors.
• The common mental model of CPUs is stuck in the 1980s:
boxes that do arithmetic, logic, bit twiddling and shifting,
and load and store things in memory. Newer
developments, such as vector instructions (SIMD) and
hardware support for virtualization, no longer fit that model.
• Many supercomputer designs of the 1980s and 1990s
focused on providing high-speed memory access rather
than faster processors, allowing those computers to
work on large data sets at speeds other systems could
not approach.
History of processors.
• The first commercial implementation of a NUMA-based
Unix system was the Symmetrical Multi Processing
XPS-100 family of servers, designed by Dan Gielan of
VAST Corporation for Honeywell Information Systems Italy.
Close look on NUMA.
• One can view NUMA as a tightly coupled form of cluster
computing. The addition of virtual memory paging to a
cluster architecture can allow NUMA to be implemented
entirely in software. However, the inter-node latency of
software-based NUMA remains several orders of
magnitude greater (slower) than that of hardware-based
NUMA.
• NUMA came about to solve performance problems by
providing separate memory for each processor and
avoiding the performance hit that occurs when several
processors attempt to address the same memory.
Close look on NUMA
• Threads that share memory should be on the same
socket, and a memory-mapped-I/O-heavy thread should
make sure it’s on the socket that’s closest to the I/O
device it’s talking to.
• There are multiple levels of memory (per-core caches and
the shared last-level cache, LLC) because CPUs have become
much faster than memory; this hierarchy is often called the
memory tree.
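The placement advice above can be acted on from software. A minimal Linux-only sketch, using Python's standard `os.sched_setaffinity`: pin the current process (and its threads) to the CPUs of one NUMA node so threads that share memory stay on one socket. The CPU set for node 0 below is an assumption; on a real machine it should be read from `/sys/devices/system/node/node0/cpulist`.

```python
# Linux-only sketch: restrict the calling process to the CPUs of
# one NUMA node. NODE0_CPUS is a hypothetical CPU set; read the
# real one from /sys/devices/system/node/node0/cpulist.
import os

NODE0_CPUS = {0, 1, 2, 3}  # assumption: CPUs belonging to NUMA node 0

def pin_to_node(cpus):
    """Pin the calling process (and all its threads) to `cpus`.
    Returns the resulting affinity set, or None where unsupported."""
    if hasattr(os, "sched_setaffinity"):   # available on Linux only
        os.sched_setaffinity(0, cpus)      # 0 = the calling process
        return os.sched_getaffinity(0)
    return None
```

On non-Linux platforms the function simply reports `None`; dedicated tools like `numactl --cpunodebind` do the same job from the shell.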
Close look on NUMA
• NUMA vs. ccNUMA: the difference is almost
nonexistent at this point. ccNUMA stands for Cache-Coherent
NUMA, but NUMA and ccNUMA have really
come to be synonymous. Applications for non-cache-coherent
NUMA machines are almost non-existent, and
they are a real pain to program for, so unless specifically
stated otherwise, NUMA actually means ccNUMA.
Close look on NUMA
• When a processor looks for data at a certain memory
address, it first looks in the L1 cache on the
microprocessor itself, then in the somewhat larger L2
cache nearby, and then in a third level of cache that the
NUMA configuration provides, before seeking the
data in the "remote memory" located near the other
microprocessors. Each of these groups forms a node in the
interconnection network. NUMA maintains a hierarchical
view of the data on all the nodes.
• Interconnection Network (ICN): as mentioned above, the
ICN links the nodes to allow the exchange of data between
them (just as, in a cluster, the physical link allows the
exchange of data).
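The lookup walk described above can be sketched as a small model: probe each level in order and accumulate the cost until the data is found. Level latencies are illustrative round numbers, not measurements.

```python
# Toy walk down the memory hierarchy: L1 -> L2 -> L3 -> local DRAM
# -> remote DRAM. Latencies are hypothetical round numbers.
HIERARCHY = [
    ("L1", 1),
    ("L2", 4),
    ("L3", 12),
    ("local DRAM", 100),
    ("remote DRAM", 200),
]

def lookup(address, contents):
    """Probe each level in order until one holds `address`.
    `contents` maps level name -> set of addresses present there.
    Returns (level found, accumulated cost in ns); data not cached
    anywhere is charged the full walk down to remote DRAM."""
    total = 0
    for level, latency in HIERARCHY:
        total += latency
        if address in contents.get(level, set()):
            return level, total
    return "remote DRAM", total

contents = {"L1": {0x10}, "L3": {0x20}}
print(lookup(0x10, contents))  # hit in L1: cheap
print(lookup(0x20, contents))  # miss L1 and L2, hit in L3: dearer
```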
UMA, NUMA & NUMA SMP architect
• Uniform Memory Access (UMA): all
processors have the same latency to
access memory. This architecture is
scalable only for a limited number of
processors.
• Non-Uniform Memory
Access (NUMA): each processor has
its own local memory; the memory of
other processors is accessible, but the
latency to access it is not the
same. This event is called a "remote
memory access".
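The non-uniform latencies above are usually summarised as a node distance table, in the style of what `numactl --hardware` reports (10 means local; larger values mean proportionally slower access). The 2-node matrix below is hypothetical.

```python
# Hypothetical 2-node NUMA distance table, numactl-style:
# DISTANCE[src][dst], where 10 = local access.
DISTANCE = [
    [10, 21],   # node 0 -> node 0, node 0 -> node 1
    [21, 10],   # node 1 -> node 0, node 1 -> node 1
]

def relative_cost(src_node, dst_node):
    """Relative memory access cost, normalised so local = 1.0."""
    return DISTANCE[src_node][dst_node] / DISTANCE[src_node][src_node]

print(relative_cost(0, 0))  # 1.0 -- local
print(relative_cost(0, 1))  # 2.1 -- remote, ~2x slower in this model
```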
UMA, NUMA & NUMA SMP architect
• NUMA SMP: the hardware
trend is to use NUMA systems
with several NUMA nodes, as
shown in the figure. A NUMA node
has a group of processors
sharing memory. A
NUMA node can use its local
bus to interact with local
memory. Multiple NUMA
nodes can be added to form an
SMP. A common SMP bus can
interconnect all NUMA nodes.
Barriers of NUMA.
• Spread data between memories.
Barriers of NUMA.
• Spread tasks between sockets.
Barriers of NUMA.
• IO NUMA: needs to be considered during placement /
scheduling.
Barriers of NUMA.
• There was just memory in the 80s. Then CPUs got fast
enough relative to memory that people wanted to add a
cache. It’s bad news if the cache is inconsistent with the
backing store (memory), so the cache has to keep some
information about what it’s holding on to, so it knows
if/when it needs to write things back to the backing store.
Barriers of NUMA.
• Data requested by more
than one processor.
• How far apart the
processors are from their
associated memory
banks.
Solutions
• Some hardware implementations exist to solve some of
these problems, but buying a high-end server to test
new approaches on is expensive, and such machines need
special conditions, like cooling and space.
• As developers, we could create a simulator to implement
different approaches in order to analyse and improve performance
and scalability. This means the simulator needs to handle
both the software and the hardware side: flagging remote
memory access events, calculating the execution time of each
process, modelling I/O events, etc.
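The core of such a simulator can be sketched in a few lines: replay a trace of memory accesses per task and accumulate modelled execution time, counting remote-access events along the way. The trace format and latency numbers below are hypothetical simplifications.

```python
# Minimal sketch of the simulator idea above: charge each memory
# access a modelled cost and count remote accesses. The latency
# constants are hypothetical placeholders.
LOCAL_NS, REMOTE_NS = 100, 200

def simulate(task_node, trace):
    """`trace` is the ordered list of memory-node ids a task touches.
    Returns (modelled time in ns, number of remote accesses)."""
    total, remote = 0, 0
    for mem_node in trace:
        if mem_node == task_node:
            total += LOCAL_NS
        else:
            total += REMOTE_NS
            remote += 1
    return total, remote

# A task on node 0 with two of five accesses going remote:
print(simulate(0, [0, 0, 1, 0, 1]))  # (700, 2)
```

A real simulator would add per-process scheduling and I/O events on top of this accounting, but the remote-access bookkeeping stays the same.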
Existing simulators
A number of existing projects can be
named, such as RSIM, SICOSYS, SIMT and simNUMA.
These projects have done a fairly good job, each
with its strong points and weak points, but the work has
only started and there is much more to cover and to
implement in this field.
There are many approaches and theories that need to
be tested and proved or disproved.
For the reasons mentioned above, simulators will play an
important role in the near future.
Benefit of NUMA
The main benefit, as mentioned above, is scalability. It is
extremely difficult to scale SMPs to large CPU counts: at
those counts, the memory bus is under heavy contention.
NUMA is one way of reducing the number of CPUs
competing for access to a shared memory bus. This is
accomplished by having several memory busses and only
a small number of CPUs on each of those busses.
I’m interested in things that
CPUs can’t do yet but will be
able to do in the near future.
Thank you
