NUMA
MAATALLA Abed
abedmaatalla@gmail.com
• What is NUMA?
• History of processors.
• Close look on NUMA.
• UMA, NUMA & NUMA SMP architectures.
• Barriers of NUMA.
• Solutions.
• Existing simulators.
• Benefits of NUMA
What is NUMA?
• Non-Uniform Memory Access: it will take longer
to access some regions of memory than others
• Designed to improve scalability on large SMPs
• Processor can access its own local memory faster than
non-local memory.
SMP: symmetric multiprocessing
What is NUMA?
• Groups of processors (NUMA node) have their own local
memory
– Any processor can access any memory, including the
one not "owned" by its group (remote memory)
– Non-uniform: accessing local memory is faster than
accessing remote memory
What is NUMA?
• Nodes are linked to each other by a high-speed interconnect
• NUMA limits the number of CPUs competing for any one memory bus
• Each group of processors has its own memory and possibly its own I/O
channels
• The number of CPUs within a NUMA node depends on the hardware
vendor.
What is NUMA?
• Facts:
– (most of) memory is
allocated at task startup.
– tasks are (usually) free to
run on any processor.
Both local and remote
accesses can happen
during a task's life.
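The consequence of these facts can be made concrete with a toy model: the average memory latency a task sees depends on what fraction of its accesses land on remote nodes. The latency numbers below are illustrative assumptions, not measurements of any real machine.

```python
# Toy model: effective memory latency under NUMA.
# Latencies are illustrative assumptions (nanoseconds), not measurements.
LOCAL_NS = 100    # access to the node's own memory
REMOTE_NS = 300   # access over the interconnect to another node

def effective_latency(remote_fraction):
    """Average latency when remote_fraction of accesses hit remote memory."""
    return (1 - remote_fraction) * LOCAL_NS + remote_fraction * REMOTE_NS

# A task whose memory was allocated on its startup node, then migrated
# to another node, sees its average latency jump:
print(effective_latency(0.0))   # all accesses local -> 100.0
print(effective_latency(0.5))   # half remote after migration -> 200.0
```

This is why memory allocated at startup plus free task migration is a problematic combination: migration silently turns local accesses into remote ones.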
History of processors.
• Our mental model of CPUs is often stuck in the 1980s: basically
boxes that do arithmetic, logic, bit twiddling and shifting,
and loading and storing things in memory. But various newer
developments have changed that picture, such as vector
instructions (SIMD) and hardware support for virtualization.
• Many supercomputer designs of the 1980s and 1990s
focused on providing high-speed memory access as
opposed to faster processors, allowing the computers to
work on large data sets at speeds other systems could
not approach.
History of processors.
• The first commercial implementation of a NUMA-based
Unix system was the Symmetrical Multi Processing XPS-
100 family of servers, designed by Dan Gielan of VAST
Corporation for Honeywell Information Systems Italy.
Close look on NUMA.
• One can view NUMA as a tightly coupled form of cluster
computing. The addition of virtual memory paging to a
cluster architecture can allow the implementation of
NUMA entirely in software. However, the inter-node
latency of software-based NUMA remains several orders
of magnitude greater (slower) than that of hardware-
based NUMA.
• NUMA addresses these performance problems by
providing separate memory for each processor,
avoiding the performance hit when several processors
attempt to address the same memory.
Close look on NUMA
• Threads that share memory should be on the same
socket, and a memory-mapped I/O heavy thread should
make sure it’s on the socket that’s closest to the I/O
device it’s talking to.
• There are multiple levels of memory, such as per-core
caches and the last-level cache (LLC), because CPUs have
become faster than memory and accesses need to be sped up;
this hierarchy is sometimes called the memory tree.
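The placement advice above can be sketched with a node-distance matrix of the kind `numactl --hardware` reports on Linux. The matrix below is a made-up two-socket example, not real hardware data:

```python
# Hypothetical 2-node distance matrix in the style of `numactl --hardware`:
# DISTANCE[i][j] is the relative cost for node i to reach node j's memory
# (10 = local, larger = farther).
DISTANCE = [
    [10, 21],   # node 0: local = 10, remote = 21
    [21, 10],   # node 1
]

def best_node_for_device(device_node):
    """Pick the NUMA node with the lowest cost to reach the device's node."""
    costs = [DISTANCE[n][device_node] for n in range(len(DISTANCE))]
    return costs.index(min(costs))

# An I/O-heavy thread talking to a NIC attached to node 1
# should be scheduled on node 1:
print(best_node_for_device(1))
```

The same idea, with a real distance table, is what NUMA-aware schedulers and tools like `numactl --cpunodebind` rely on.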
Close look on NUMA
• NUMA VS ccNUMA: The difference is almost
nonexistent at this point. ccNUMA stands for Cache-
Coherent NUMA, but NUMA and ccNUMA have really
come to be synonymous. The applications for non-cache
coherent NUMA machines are almost non-existent, and
they are a real pain to program for, so unless specifically
stated otherwise, NUMA actually means ccNUMA.
Close look on NUMA
• When a processor looks for data at a certain memory
address, it first looks in the L1 cache on the
microprocessor itself, then in the somewhat larger L2 and
L3 caches nearby, and then in the extra level of cache
that the NUMA configuration provides, before seeking the
data in the "remote memory" located near the other
microprocessors. Each of these nodes is connected by the
interconnection network, and NUMA maintains a hierarchical
view of the data on all the nodes.
• Interconnection Network (ICN): as mentioned above, the
ICN links nodes so they can exchange data with each other
(just as, in a cluster, a physical link allows the
exchange of data).
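The lookup order described above can be sketched as a walk down the hierarchy, stopping at the first level that holds the address. Level names follow the slide; the latencies and cached addresses are illustrative assumptions:

```python
# Toy lookup walking the memory hierarchy in the order described above.
# Tuples: (level name, illustrative latency in ns, addresses currently held).
HIERARCHY = [
    ("L1",     1,   {0x10}),
    ("L2",     4,   {0x10, 0x20}),
    ("L3",     15,  {0x10, 0x20, 0x30}),
    ("local",  100, {0x10, 0x20, 0x30, 0x40}),
    ("remote", 300, None),   # remote memory backs everything else
]

def lookup(addr):
    """Return (level, latency) for the first level holding addr."""
    for level, latency, held in HIERARCHY:
        if held is None or addr in held:
            return level, latency

print(lookup(0x20))  # found in L2
print(lookup(0x99))  # falls all the way through to remote memory
```

The steep jump between the last two levels is exactly the "non-uniform" part of NUMA.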
UMA, NUMA & NUMA SMP architectures
• Uniform Memory Access (UMA): all
processors have the same latency to
access memory. This architecture is
scalable only for a limited number of
processors.
• Non-Uniform Memory
Access (NUMA): each processor has
its own local memory; the memory of
other processors is accessible, but the
latency to access it is not the
same, an event called "remote
memory access".
UMA, NUMA & NUMA SMP architectures
• NUMA SMP: the hardware
trend is to use NUMA systems
with several NUMA nodes, as
shown in the figure. A NUMA node
has a group of processors
with shared memory. A
NUMA node can use its local
bus to interact with local
memory, and multiple NUMA
nodes can be added to form an
SMP. A common SMP bus can
interconnect all NUMA nodes.
Barriers of NUMA.
• Spread data between memories.
Barriers of NUMA.
• Spread tasks between sockets.
Barriers of NUMA.
• IO NUMA: needs to be considered during placement /
scheduling.
Barriers of NUMA.
• There was just memory in the 80s. Then CPUs got fast
enough relative to memory that people wanted to add a
cache. It’s bad news if the cache is inconsistent with the
backing store (memory), so the cache has to keep some
information about what it’s holding on to so it knows
if/when it needs to write things to the backing store.
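The bookkeeping described above can be sketched as a minimal write-back cache: each cached entry carries a dirty flag so the cache knows when it must write a value back to the backing store.

```python
# Minimal write-back cache over a dict-backed "memory".
memory = {"x": 1}
cache = {}   # addr -> [value, dirty]

def write(addr, value):
    cache[addr] = [value, True]    # dirty: backing store is now stale

def read(addr):
    if addr not in cache:
        cache[addr] = [memory[addr], False]   # fill from backing store
    return cache[addr][0]

def flush(addr):
    value, dirty = cache[addr]
    if dirty:                      # only dirty entries touch the backing store
        memory[addr] = value
        cache[addr][1] = False

write("x", 42)
print(memory["x"])   # still 1: cache and memory are inconsistent
flush("x")
print(memory["x"])   # 42 after write-back
```

The window between `write` and `flush`, where cache and memory disagree, is precisely the consistency problem that cache-coherence protocols (and ccNUMA hardware) exist to manage across processors.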
Barriers of NUMA.
• Data request by more
than one processor.
• How far apart the
processors are from their
associated memory
banks.
Solutions
• Some hardware implementations exist that solve some of
these problems, but buying a high-end server to test new
approaches on is expensive, and such machines need special
conditions such as cooling and space.
• We as developers could create a simulator that implements
different approaches, to analyse and improve performance
and scalability. This means the simulator needs to handle
both the software and the hardware side: flagging remote
memory access events, calculating the execution time of
each process, modelling I/O events, etc.
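A minimal version of such a simulator might just replay a trace of (cpu, page) accesses, classify each as local or remote, and accumulate simulated time. The node layout and latencies below are assumptions chosen for the sketch:

```python
# Toy NUMA simulator: 2 nodes, 2 CPUs each; pages 0-3 live on node 0,
# pages 4-7 on node 1. Latencies are illustrative, not measured.
CPU_NODE = {0: 0, 1: 0, 2: 1, 3: 1}
LOCAL_NS, REMOTE_NS = 100, 300

def page_node(page):
    """Home node of a page under this fixed placement."""
    return 0 if page < 4 else 1

def simulate(trace):
    """trace: list of (cpu, page). Returns (remote_count, total_ns)."""
    remote, total = 0, 0
    for cpu, page in trace:
        if CPU_NODE[cpu] == page_node(page):
            total += LOCAL_NS
        else:
            remote += 1
            total += REMOTE_NS
    return remote, total

# CPU 0 touches a local page, then a remote one; CPU 3 stays local:
print(simulate([(0, 1), (0, 5), (3, 6)]))
```

A real simulator would add page migration, cache modelling and I/O events on top, but even this skeleton is enough to compare placement policies by their remote-access counts.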
Existing simulators
There is a small number of existing projects that could be
named, such as RSIM, SICOSYS, SIMT and simNUMA.
Each of these has done a fairly good job, and each has its
strong and weak points, but the work has only started and
there is much more to cover and to
implement in this field.
There are many approaches and theories that need to
be tested and proved or disproved.
For the reasons mentioned above, simulators will play an
important role in the near future.
Benefit of NUMA
The main benefit, as mentioned above, is scalability. It is
extremely difficult to scale SMPs to large numbers of CPUs:
at high CPU counts, the shared memory bus comes under heavy
contention. NUMA is one way of reducing the number of CPUs
competing for access to a shared memory bus. This is
accomplished by having several memory buses and only a
small number of CPUs on each of those buses.
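The argument can be put in numbers with a crude model: with n CPUs sharing k memory buses, each bus carries roughly n / k contenders (real contention depends on the workload, so treat this as a sketch only):

```python
def contenders_per_bus(cpus, buses):
    """Crude contention measure: CPUs competing on each memory bus."""
    return cpus / buses

print(contenders_per_bus(64, 1))   # one big SMP bus: 64 CPUs contend
print(contenders_per_bus(64, 8))   # 8 NUMA nodes: only 8 per local bus
```

Splitting one bus into eight cuts per-bus contention eightfold, which is the whole point of giving each NUMA node its own memory bus.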
I’m interested in things that
CPUs can’t do yet but will be
able to do in the near future.
Thank you