SlideShare une entreprise Scribd logo
1  sur  35
Télécharger pour lire hors ligne
1
ARMv7-A Architecture
Overview
David Brash
Architecture Program Manager, ARM Ltd.
2
ARM Architecture roadmap
Announced Aug-2010:
• Large PhysAddr Extn
• PA => 40-bits
•Virtualization Extn
(v8 builds on these)
3
ARMv7-A registers
R0
R1
R2
R3
R4
R5
R6
R7
R8
R9
R10
R11
R12
R13 (SP)
R14 (LR)
R0 R0 R0 R0 R0R0
SP_svc
LR_svc
SP_irq
LR_irq
SP_und
LR_und
SP_fiq
LR_fiq
SP_abt
LR_abt
R8_fiq
R9_fiq
R10_fiq
R11_fiq
R12_fiq
SP_hyp
R1
R2
R3
R4
R5
R6
R7
R8
R9
R10
R11
R12
R1
R2
R3
R4
R5
R6
R7
R8
R9
R10
R11
R12
R1
R2
R3
R4
R5
R6
R7
R8
R9
R10
R11
R12
R1
R2
R3
R4
R5
R6
R7
R8
R9
R10
R11
R12
R1
R2
R3
R4
R5
R6
R7
R8
R9
R10
R11
R12
R1
R2
R3
R4
R5
R6
R7
LR_(BL)
CPSR bits
• Flags: NZCVQ
• IT[7:0] status bits
• GE[3:0] Adv SIMD flags
• J (Jazelle)
• T (Thumb)
• E (endian)
• I, A, F masks
• M[4:0] (mode)
SPSR_svc SPSR_abt SPSR_und SPSR_irq SPSR_hyp
ELR_hyp
SPSR_fiq
CPSR
4
Classic ARM MMU
 32-bit physical address space
 2-level translation tables
 Pointed to by TTBR0 (user mappings) and TTBR1 (kernel mappings
but with restrictions to the user/kernel memory split)
 32-bit page table entries
 1st
level contains 4096 entries (4 pages for PGD)
 1MB section per entry or
 Pointer to a 2nd
level table
 Implementation-defined 16MB supersections
 2nd
level contains 256 entries pointing to 4KB page each
 1KB per 2nd
level page table
 ARMv6/v7 introduced TEX remapping
 Memory type becomes a 3-bit index
5
Classic ARM MMU (cont’d)
 Other features
 XN (eXecute Never) bit
 Different memory types: Normal (cacheable and non-cacheable),
Device, Strongly Ordered
 Shareability attributes for SMP systems
 ASID-tagged TLB (ARMv6 onwards)
 Avoids TLB flushing at context switch
 8-bit ASID value assigned to an mm_struct
 Dynamically allocated (there can be more than 256 processes)
6
Classic ARM MMU Limitations
 Only 32-bit physical address space
 Growing market requiring more than 4GB physical address space
(both RAM and peripherals)
 Supersections can be used to allow up to 40-bit addresses using
16MB sections (implementation-defined feature)
 Prior to ARMv6, not a direct link between access permissions
and Linux PTE bits
 Simplified permission model introduced with ARMv6 but not used by
Linux
 2nd
level page table does not fill a full 4K page
 ARM Linux workarounds
 Separate array for the Linux PTE bits
 1st
level entry consists of two 32-bit locations pointing to 2KB 2nd
level
page table entries
7
The ARMv7-A Virtualization Extensions
Popek and Goldberg summarized the concept in 1974:
 Equivalence/Fidelity
 A program running under the hypervisor should exhibit a behaviour
essentially identical to that demonstrated when running on an
equivalent machine directly.
 Resource control / Safety
 The hypervisor should be in complete control of the virtualized
resources.
 Efficiency/Performance
 A statistically dominant fraction of machine instructions must be
executed without hypervisor intervention.
"Formal Requirements for Virtualizable Third Generation Architectures". Communications of the ACM 17
8
ARM hypervisor support philosophy
 Virtual machine (VM) scheduling and resource sharing
 New Hyp mode for Hypervisor execution
 Minimise Hypervisor intervention for “routine” GuestOS tasks
 Guest OS page table management
 Interrupt control
 Guest OS Device Drivers
 Syndrome support for trapping key instructions
 GuestOS load/store emulation
 Privileged control instructions
 System instructions (MRS, MSR) to read/write key registers
 Virtualized ID register management
9
ARM virtualization extension - modes
A new privilege layer – hypervisor mode
 Guest OS kernel given same privilege structure as for a non-
virtualized environment
 Can run the same instructions
 Hypervisor mode has higher privilege
 VMM can control a wide range of OS accesses to hardware
User Mode
(Non-privileged)
Supervisor Mode
(Privileged)
Hyp Mode
(More Privileged)
Guest Operating System1
App2App1
Guest Operating System2
App2App1
Virtual Machine Monitor (VMM) or
Hypervisor
10
Hypervisor mode and security extensions
Hypervisor mode applies to normal world
Secure world supports a single virtual machine
Monitor mode controls transition between worlds
Guest Operating System1
App2App1
Guest Operating System2
App2App1
Virtual Machine Monitor (VMM) or
Hypervisor
TrustZone Monitor
Secure World OS
Trusted App2Trusted App1
11
ARMv7-A Virtualization - key features 1
 Trap and control support (HYP mode)
 Rich set of trap options (TLB/cache ops, ID groups, instructions)
 Syndrome register support
 EC: exception class (instr type, I/D abort into/within HYP)
 IL: instruction length (0 == 16-bit; 1 == 32-bit)
 ISS: instruction specific syndrome (instr fields, reason code, ...)
3
1
2
6
2
5
2
4
0
EC I
L
ISS
12
ARMv7-A Virtualization - key features 2
 Dedicated Exception Link Register (ELR)
 Stores preferred return address on exception entry
 New instruction – ERET – for exception return from HYP mode
 Other modes overload exception model and procedure call LRs
 R14 used by exception entry BL and BLX instructions
 Address translation
13
Two layers of address translation
Stage 1 translation
owned by guest OS
Virtual address map Intermediate physical
address map
Real System Physical
address map
Stage 2 translation
owned by the VMM
Hardware has 2-stage
memory translation
Tables from Guest OS
translate VA to IPA
Second set of tables from
VMM translate IPA to PA
Allows aborts to be routed to
appropriate software layer
14
LPAE – Stage 1 (VA => {I}PA)
 Managed by the OS
 64-bit descriptors, 512 entries per table
 4KB table size == 4KB page size
 1 or 2 Translation Table Base Registers
 3 levels of table supported
 up to 9 address bits per level
 2 bits at Level 1
 9 bits at Levels 2 and 3
 Page Table Entry:
32-bitVirtual
Address
TTBR1
space
TTBR0
space
Not mapped
(Fault)
232- TTCR.T1SZ
232-TTCR.T0SZ
0
232
T1SZ ≤ T0SZ
If T1SZ == T0SZ == 0 then TTBR0 always used
31:30 VA Bits <29:21> VA Bits <20:12> VA Bits <11:0>
L1 Level 2 table index Level 3 table (page) index Page offset address
6
3
5
2
4
0
3
0
1
2
2 1 0
Upper attributes SBZ Address out SBZ Lower attributes
Valid
Block/TableBlock/page
• Memory type
• Cache policy
• Shareability, AF, nG, NS flags
• Access permissions
•Table override bits (NS, XN, PXN, AP[1:0])
• Block/Page XN, PXN bits
• Block/Page 16x entry contiguous hint
• IGNORED bits
15
LPAE – Stage 2 (IPA => PA)
 Managed by the VMM / hypervisor
 Same table walk scheme as stage 1, now up to 40-bit input address:
 2x contiguous 4KB tables allowed at L1; support 240
address space
 Up to 1-16 contiguous tables allowed at L2; support 230
- 234
address space
 1x Translation Table Base register (VTBR)
 Page table entry:
6
3
5
2
4
0
3
0
1
2
2 1 0
Upper attributes SBZ Address out SBZ Lower attributes
Valid
Block/Table
Block / page only
• XN bit
• 16x entry contiguous hint
• IGNORED bits
Block/page
• Memory type
• Cache policy
• Shareability, AF bit
• R/W Access permissions
3
9
3
4
3
0
2
1
1
2
0
IPA Bits <39:30> IPA Bits <29:21> IPA Bits <20:12> IPA Bits <11:0>
IPA Bits <11:0> IPA Bits <33;21> IPA Bits <20:12> IPA Bits <11:0>
16
Linux Support for
ARM LPAE
Catalin Marinas
ELC Europe 2011
17
ARM LPAE Features
 40-bit physical addresses (1TB)
 40-bit intermediate physical addresses (guest PA space)
 3-level translation tables
 Pointed to by TTBR0 (user mappings) and TTBR1 (kernel mappings)
 Not as restrictive on user/kernel memory split (can use 3:1)
 With 1GB kernel mapping, the 1st
level is skipped
 64-bit entries in each level
 1st
level contains 4 entries (stage 1 translation)
 1GB section or
 Pointer to 2nd
level table
 2nd
level contains 512 entries (4KB in total)
 2MB section or
 Pointer to 3rd
level
18
ARM LPAE Features (cont’d)
 3rd
level contains 512 entries (4KB)
 Each addressing a 4KB range
 Possibility to set a contiguity flag for 16 consecutive pages
 LDRD/STRD (64-bit load/store) instructions are atomic on
ARM processors supporting LPAE
 Only the simplified page permission model is supported
 No kernel RW and user RO combination
 Domains are no longer present (they have already been
removed in ARMv7 Linux)
 Additional spare bits to be used by the OS
 Dedicated bits for user, read-only and access flag (young)
settings
19
ARM LPAE Features (cont’d)
 ASID is part of the TTBR0 register
 Simpler context switching code (no need to deal with speculative TLB
fetching with the wrong ASID)
 The Context ID register can be used solely for debug/trace
 Additional permission control
 PXN – Privileged eXecute Never
 SCTLR.WXN, SCTLR.UWXN – prevent execution from writable
locations (the latter only for user accesses)
 APTable – restrict permissions in subsequent page table levels
 XNTable, PXNTable – override XN and PXN bits in subsequent page
table levels
 New registers for the memory region attributes
 MAIR0, MAIR1 – 32-bit Memory Attribute Indirection Registers
 8 memory types can be configured at a time
20
ARM LPAE Features
1GB block
Table ptr
1st level
table
VA[31:30]
TTBRx
4KB
2MB block
Table ptr
2nd level
table
VA[29:21]
4KB
PTE
PTE
3rd level
table
VA[20:12]
4KB
64-bit entry 64-bit entry64-bit entry
PGD PMD PTE
21
Linux and ARM LPAE
 Linux + ARM LPAE has the same memory layout as the
classic MMU implementation
 Described in Documentation/arm/memory.txt
 0..TASK_SIZE – user space
 PAGE_OFFSET-16M..PAGE_OFFSET-2M – module space
 PAGE_OFFSET-2M..PAGE_OFFSET – highmem mappings
 Highmem is already supported by the classic MMU
 Memory beyond 4G is only accessible via highmem
 Page mapping functions use pfn (32-bit variable, PAGE_SHIFT == 12,
maximum 44-bit physical addresses)
 Original ARM kernel port assumed 32-bit physical addresses
 LPAE redefines phys_addr_t, dma_addr_t as u64 (generic typedefs)
 New ATAG_MEM64 defined for specifying the 64-bit memory layout
22
Linux and ARM LPAE (cont’d)
 Hard-coded assumptions about 2 levels of page tables
 PGDIR_(SHIFT|SIZE|MASK) references converted to PMD_*
 swapper_pg_dir extended to cover both 1st
and 2nd
levels of
page tables
 1 page for PGD (only 4 entries used for stage 1 translations)
 4 pages for PMD
 init_mm.pgd points to swapper_pg_dir
 TTBR0 used for user mappings and always points to PGD
 TTBR1 used for kernel mapping:
 3:1 split – TTBR1 points to 4th
page of PMD (2 levels only, 1GB)
 Classic MMU does not allow the use of TTBR1 for the 3:1 split
 2:2 split – TTBR1 points to 3rd
PGD entry
 1:3 split – TTBR1 points to 1st
PGD entry
23
Linux and ARM LPAE (cont’d)
 The page table definitions have been separated into
pgtable*-2level.h and pgtable*-3level.h files
 Few PTE bits shared between classic and LPAE definitions
 Negated definitions of L_PTE_EXEC and L_PTE_WRITE to match the
corresponding hardware bits
 Memory types are the same and they represent an index in the TEX
remapping registers (PRRR/NMRR or MAIR0/MAIR1)
 The proc-v7.S file has been duplicated into proc-v7lpae.S
 Different register setup for TTBRx and TEX remapping (MAIRx)
 Simpler cpu_v7_set_pte_ext (1:1 mapping between hardware and
software PTE bits)
 Simpler cpu_v7_switch_mm (ASID switched with TTBR0)
 Current ARM code converted to pgtable-nopud.h
 Not using „nopmd‟ with classic MMU
24
Linux and ARM LPAE (cont’d)
 Lowmem is mapped using 2MB sections in the 2nd
level table
 PGD and PMD only allocated from lowmem
 PTE tables can be allocated from highmem
 Exception handling
 Different IFSR/DFSR registers structure and exception numbering –
arch/arm/mm/fault.c modified accordingly
 Error reporting includes PMD information as well
 PGD allocation/freeing
 Kernel PGD entries copied to the new PGD during pgd_alloc()
 Modules and pkmap entries added to the PMD during fault handling
 Identity mapping (PA == VA)
 Required for enabling or disabling the MMU – secondary CPU
booting, CPU hotplug, power management, kexec
25
Linux and ARM LPAE (cont’d)
 Uses pgd_alloc() and pgd_free()
 When PHYS_OFFSET > PAGE_OFFSET, kernel PGD entries may be
overridden
 swapper_pg_dir entries marked with an additional bit –
L_PGD_SWAPPER
 pgd_free() skips such entries during clean-up
26
Current Status and Future
 Initial development done on software models
 Tested on real hardware (FPGA and silicon)
 Parts of the LPAE patch set already in mainline
 Mainly preparatory patches, not core functionality
 Aiming for full support in Linux 3.3
 Hardware supporting LPAE
 ARM Cortex-A15, Cortex-A7 processors
 TI OMAP5 (dual-core ARM Cortex-A15)
 Other developments
 KVM support for Cortex-A15 – implemented by Christoffer Dall at
Columbia University
 Uses the Virtualisation extensions together with the LPAE stage 2
translations
27
Reference
 ARM Architecture Reference Manual rev C
 Currently beta, not publicly available yet
 Specifications publicly available on ARM Infocenter
 http://infocenter.arm.com/
 ARM architecture -> Reference Manuals -> ARMv7-AR LPA
Virtualisation Extensions
 Linux patches – ARM architecture development tree
 Hosts the latest ARM architecture developments before patches are
merged into the mainline kernel
 git://github.com/cmarinas/linux.git
 When the kernel.org accounts are back
 git://git.kernel.org/pub/scm/linux/cmarinas/linux-arm-arch.git
28
ARM Architecture
System Features
29
Interrupt Controller
 ARM standardising on the “Generic Interrupt Controller” [GIC]
 Supports the ARMv7 Security Extension
 Interrupt groups support (Nonsecure) IRQ and (Secure) FIQ split
 Distributor and cpu interface functionality
 GICv2 adds support for a virtualized cpu interface
 VMM manages physical interface & queues entries for each VM.
 GuestOS (VM) configures, acknowledges, processes and completes
interrupts directly. Traps to VMM where necessary.
 Hypervisor can virtualize FIQs from the physical (Nonsecure, IRQ)
interface.
30
Interrupt Handling – GICv2
Secure handler
(monitor mode)
or
OS FIQ handler
OS IRQ handler
GuestOS IRQ handler
GuestOS FIQ handler
VirtualCpu InterfaceN
• cpu-specific control + state
• Interrupt ACK / END (retire)
Virtual Grp0: FIQ
Virtual Grp1: IRQ
Hypervisor – virtualized distributor
• Grp1 physical => Grp1/Grp0 virtual
Virtual interrupt management
• interrupt queues (list registers)
Virtualization support
* ARM Architecture Security Extns
NEW
Distributor - control and configuration
• Assignment (≤8 cpu interfaces)
• Interrupt grouping (Grp0, Grp1)
Grp0: (Secure*, FIQ)
Grp1: IRQ always
Cpu Interface0
• cpu-specific control + state
• Interrupt ACK / END (retire)
Grp0: (Secure*, FIQ)
Grp1: IRQ always
GICv1
Cpu InterfaceN
• cpu-specific control + state
• Interrupt ACK / END (retire)
• software
• private (per cpu) peripheral
• shared peripheral
Interrupt sources:
31
Generic Timer
 Shared “always on” counter
 Fast read access for reliable
distribution of time.
 ≥ ComparedValue or ≤ 0 timer
event configuration support
 Maskable interrupts
 Offset capability for virtual time
 Event stream support
 Hypervisor trap support
SoC
Counter
Power
Controller
CPU Cluster1
CPU0
Timer0
CPU1
Timer1
Counter Interface
Memory Interconnect and Memory Controller
Shared Cache
$ $
Timer Distribution
Bus
Peripheral Bus
Always Powered
Power Domain
GIC
CPU Cluster0
CPU0
Timer0
CPU1
Timer1
Counter Interface
Shared Cache
$ $
GIC
Systemevents
Implementation example:
32
System MMU
Local Coherence Bus (no snooping on
bus)
“Event pulse” enabling next cycle
notifications
 System MMU architecture translation options from pre-set tables
 No Translation
 VA -> IPA (Stage 1) or IPA->PA (Stage 2) only
 VA->PA (Stage1 and Stage2)
 Can share page tables with ARM cores – relate contexts to VMIDs/ASIDs
L1
I-cache
L1
D-cache
Core0
L1
I-cache
L1
D-cache
CoreN
L2 cache
System interconnect
DMA
System MMU System Memory
33
Quad
Cortex-A15
Quad
Cortex-A15
AMBA 4 Cache Coherent Interconnect
CCI-400
I/O
device
MMU-400
DMC-400
Dynamic Memory Controller
AXI Network
Interconnect
Slaves Slaves
AXI Network
Interconnect
LCDVideo
DDR3/
LPDDR2
DDR3/
LPDDR2
PHY
GIC-400
Mali
Graphics
PHY
MMU-400 MMU-400
Completing the SoC
IPA
PA
Intermediate Physical Address
(IPA)
The address the OS believes
it‟s accessing
Physical Address (PA)
The actual address the VMM
assigns
34
To summarize: ARMv7-A additions
 A natural progression of well established features applied to
an architecture long associated with low power.
 „Classic‟ uses and solutions will apply to ARM markets:
 Low cost/low power server opportunities
 Application OS + RTOS/microkernel smart mobile devices
 Feature split by „Manufacturers‟ versus User‟ VM environment
 Automotive, home, ...
 Opportunities for new use cases?
 Today‟s entrepreneurs/ideas
=> tomorrow‟s major businesses
35
Thank you

Contenu connexe

Tendances

Verification Strategy for PCI-Express
Verification Strategy for PCI-ExpressVerification Strategy for PCI-Express
Verification Strategy for PCI-Express
DVClub
 
Linux Porting to a Custom Board
Linux Porting to a Custom BoardLinux Porting to a Custom Board
Linux Porting to a Custom Board
Patrick Bellasi
 
Pci express3-device-architecture-optimizations-idf2009-presentation
Pci express3-device-architecture-optimizations-idf2009-presentationPci express3-device-architecture-optimizations-idf2009-presentation
Pci express3-device-architecture-optimizations-idf2009-presentation
jkcontee
 
PCI Express Verification using Reference Modeling
PCI Express Verification using Reference ModelingPCI Express Verification using Reference Modeling
PCI Express Verification using Reference Modeling
DVClub
 

Tendances (20)

ARM - Advance RISC Machine
ARM - Advance RISC MachineARM - Advance RISC Machine
ARM - Advance RISC Machine
 
Pcie drivers basics
Pcie drivers basicsPcie drivers basics
Pcie drivers basics
 
PCIe
PCIePCIe
PCIe
 
Spi drivers
Spi driversSpi drivers
Spi drivers
 
PCIe Gen 3.0 Presentation @ 4th FPGA Camp
PCIe Gen 3.0 Presentation @ 4th FPGA CampPCIe Gen 3.0 Presentation @ 4th FPGA Camp
PCIe Gen 3.0 Presentation @ 4th FPGA Camp
 
SFO15-TR9: PSCI, ACPI (and UEFI to boot)
SFO15-TR9: PSCI, ACPI (and UEFI to boot)SFO15-TR9: PSCI, ACPI (and UEFI to boot)
SFO15-TR9: PSCI, ACPI (and UEFI to boot)
 
Verification Strategy for PCI-Express
Verification Strategy for PCI-ExpressVerification Strategy for PCI-Express
Verification Strategy for PCI-Express
 
Linux Porting to a Custom Board
Linux Porting to a Custom BoardLinux Porting to a Custom Board
Linux Porting to a Custom Board
 
GPU Acceleration for Containers on Intel Processor Graphics
GPU Acceleration for Containers on Intel Processor GraphicsGPU Acceleration for Containers on Intel Processor Graphics
GPU Acceleration for Containers on Intel Processor Graphics
 
Platform Drivers
Platform DriversPlatform Drivers
Platform Drivers
 
Double data rate (ddr)
Double data rate (ddr)Double data rate (ddr)
Double data rate (ddr)
 
The Low-Risk Path to Building Autonomous Car Architectures
The Low-Risk Path to Building Autonomous Car ArchitecturesThe Low-Risk Path to Building Autonomous Car Architectures
The Low-Risk Path to Building Autonomous Car Architectures
 
Embedded Linux BSP Training (Intro)
Embedded Linux BSP Training (Intro)Embedded Linux BSP Training (Intro)
Embedded Linux BSP Training (Intro)
 
I2C Drivers
I2C DriversI2C Drivers
I2C Drivers
 
Pci express3-device-architecture-optimizations-idf2009-presentation
Pci express3-device-architecture-optimizations-idf2009-presentationPci express3-device-architecture-optimizations-idf2009-presentation
Pci express3-device-architecture-optimizations-idf2009-presentation
 
Advanced Pipelining in ARM Processors.pptx
Advanced Pipelining  in ARM Processors.pptxAdvanced Pipelining  in ARM Processors.pptx
Advanced Pipelining in ARM Processors.pptx
 
PCI Express Verification using Reference Modeling
PCI Express Verification using Reference ModelingPCI Express Verification using Reference Modeling
PCI Express Verification using Reference Modeling
 
An introduction to the linux kernel and device drivers (NTU CSIE 2016.03)
An introduction to the linux kernel and device drivers (NTU CSIE 2016.03)An introduction to the linux kernel and device drivers (NTU CSIE 2016.03)
An introduction to the linux kernel and device drivers (NTU CSIE 2016.03)
 
PCI Drivers
PCI DriversPCI Drivers
PCI Drivers
 
Linux Initialization Process (2)
Linux Initialization Process (2)Linux Initialization Process (2)
Linux Initialization Process (2)
 

Similaire à Q4.11: ARM Architecture

Intel x86 Architecture
Intel x86 ArchitectureIntel x86 Architecture
Intel x86 Architecture
ChangWoo Min
 

Similaire à Q4.11: ARM Architecture (20)

Intel x86 Architecture
Intel x86 ArchitectureIntel x86 Architecture
Intel x86 Architecture
 
ARM Architecture
ARM ArchitectureARM Architecture
ARM Architecture
 
ARM_2.ppt
ARM_2.pptARM_2.ppt
ARM_2.ppt
 
Arm architecture
Arm architectureArm architecture
Arm architecture
 
Unit 4 _ ARM Processors .pptx
Unit 4 _ ARM Processors .pptxUnit 4 _ ARM Processors .pptx
Unit 4 _ ARM Processors .pptx
 
ARM Cortex-M3 Training
ARM Cortex-M3 TrainingARM Cortex-M3 Training
ARM Cortex-M3 Training
 
Arm11
Arm11Arm11
Arm11
 
ARM stacks, subroutines, Cortex M3, LPC 214X
ARM  stacks, subroutines, Cortex M3, LPC 214XARM  stacks, subroutines, Cortex M3, LPC 214X
ARM stacks, subroutines, Cortex M3, LPC 214X
 
ARM Versions, architecture
ARM Versions, architectureARM Versions, architecture
ARM Versions, architecture
 
Introduction to arm architecture
Introduction to arm architectureIntroduction to arm architecture
Introduction to arm architecture
 
The Cell Processor
The Cell ProcessorThe Cell Processor
The Cell Processor
 
Lecture9
Lecture9Lecture9
Lecture9
 
Processor types
Processor typesProcessor types
Processor types
 
EC8791 ARM Processor and Peripherals.pptx
EC8791 ARM Processor and Peripherals.pptxEC8791 ARM Processor and Peripherals.pptx
EC8791 ARM Processor and Peripherals.pptx
 
Q4.11: ARM Technology Update Plenary
Q4.11: ARM Technology Update PlenaryQ4.11: ARM Technology Update Plenary
Q4.11: ARM Technology Update Plenary
 
Lec12 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- P6, Netbur...
Lec12 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- P6, Netbur...Lec12 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- P6, Netbur...
Lec12 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- P6, Netbur...
 
x86_1.ppt
x86_1.pptx86_1.ppt
x86_1.ppt
 
arm_3.ppt
arm_3.pptarm_3.ppt
arm_3.ppt
 
Unit vi (2)
Unit vi (2)Unit vi (2)
Unit vi (2)
 
It322 intro 1
It322 intro 1It322 intro 1
It322 intro 1
 

Plus de Linaro

Deep Learning Neural Network Acceleration at the Edge - Andrea Gallo
Deep Learning Neural Network Acceleration at the Edge - Andrea GalloDeep Learning Neural Network Acceleration at the Edge - Andrea Gallo
Deep Learning Neural Network Acceleration at the Edge - Andrea Gallo
Linaro
 
HPC network stack on ARM - Linaro HPC Workshop 2018
HPC network stack on ARM - Linaro HPC Workshop 2018HPC network stack on ARM - Linaro HPC Workshop 2018
HPC network stack on ARM - Linaro HPC Workshop 2018
Linaro
 
Intelligent Interconnect Architecture to Enable Next Generation HPC - Linaro ...
Intelligent Interconnect Architecture to Enable Next Generation HPC - Linaro ...Intelligent Interconnect Architecture to Enable Next Generation HPC - Linaro ...
Intelligent Interconnect Architecture to Enable Next Generation HPC - Linaro ...
Linaro
 
Andrew J Younge - Vanguard Astra - Petascale Arm Platform for U.S. DOE/ASC Su...
Andrew J Younge - Vanguard Astra - Petascale Arm Platform for U.S. DOE/ASC Su...Andrew J Younge - Vanguard Astra - Petascale Arm Platform for U.S. DOE/ASC Su...
Andrew J Younge - Vanguard Astra - Petascale Arm Platform for U.S. DOE/ASC Su...
Linaro
 
HKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainline
HKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainlineHKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainline
HKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainline
Linaro
 
HKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainline
HKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainlineHKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainline
HKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainline
Linaro
 
HKG18- 115 - Partitioning ARM Systems with the Jailhouse Hypervisor
HKG18- 115 - Partitioning ARM Systems with the Jailhouse HypervisorHKG18- 115 - Partitioning ARM Systems with the Jailhouse Hypervisor
HKG18- 115 - Partitioning ARM Systems with the Jailhouse Hypervisor
Linaro
 
HKG18-TR08 - Upstreaming SVE in QEMU
HKG18-TR08 - Upstreaming SVE in QEMUHKG18-TR08 - Upstreaming SVE in QEMU
HKG18-TR08 - Upstreaming SVE in QEMU
Linaro
 
HKG18-120 - Devicetree Schema Documentation and Validation
HKG18-120 - Devicetree Schema Documentation and Validation HKG18-120 - Devicetree Schema Documentation and Validation
HKG18-120 - Devicetree Schema Documentation and Validation
Linaro
 
HKG18-223 - Trusted FirmwareM: Trusted boot
HKG18-223 - Trusted FirmwareM: Trusted bootHKG18-223 - Trusted FirmwareM: Trusted boot
HKG18-223 - Trusted FirmwareM: Trusted boot
Linaro
 

Plus de Linaro (20)

Deep Learning Neural Network Acceleration at the Edge - Andrea Gallo
Deep Learning Neural Network Acceleration at the Edge - Andrea GalloDeep Learning Neural Network Acceleration at the Edge - Andrea Gallo
Deep Learning Neural Network Acceleration at the Edge - Andrea Gallo
 
Arm Architecture HPC Workshop Santa Clara 2018 - Kanta Vekaria
Arm Architecture HPC Workshop Santa Clara 2018 - Kanta VekariaArm Architecture HPC Workshop Santa Clara 2018 - Kanta Vekaria
Arm Architecture HPC Workshop Santa Clara 2018 - Kanta Vekaria
 
Huawei’s requirements for the ARM based HPC solution readiness - Joshua Mora
Huawei’s requirements for the ARM based HPC solution readiness - Joshua MoraHuawei’s requirements for the ARM based HPC solution readiness - Joshua Mora
Huawei’s requirements for the ARM based HPC solution readiness - Joshua Mora
 
Bud17 113: distribution ci using qemu and open qa
Bud17 113: distribution ci using qemu and open qaBud17 113: distribution ci using qemu and open qa
Bud17 113: distribution ci using qemu and open qa
 
OpenHPC Automation with Ansible - Renato Golin - Linaro Arm HPC Workshop 2018
OpenHPC Automation with Ansible - Renato Golin - Linaro Arm HPC Workshop 2018OpenHPC Automation with Ansible - Renato Golin - Linaro Arm HPC Workshop 2018
OpenHPC Automation with Ansible - Renato Golin - Linaro Arm HPC Workshop 2018
 
HPC network stack on ARM - Linaro HPC Workshop 2018
HPC network stack on ARM - Linaro HPC Workshop 2018HPC network stack on ARM - Linaro HPC Workshop 2018
HPC network stack on ARM - Linaro HPC Workshop 2018
 
It just keeps getting better - SUSE enablement for Arm - Linaro HPC Workshop ...
It just keeps getting better - SUSE enablement for Arm - Linaro HPC Workshop ...It just keeps getting better - SUSE enablement for Arm - Linaro HPC Workshop ...
It just keeps getting better - SUSE enablement for Arm - Linaro HPC Workshop ...
 
Intelligent Interconnect Architecture to Enable Next Generation HPC - Linaro ...
Intelligent Interconnect Architecture to Enable Next Generation HPC - Linaro ...Intelligent Interconnect Architecture to Enable Next Generation HPC - Linaro ...
Intelligent Interconnect Architecture to Enable Next Generation HPC - Linaro ...
 
Yutaka Ishikawa - Post-K and Arm HPC Ecosystem - Linaro Arm HPC Workshop Sant...
Yutaka Ishikawa - Post-K and Arm HPC Ecosystem - Linaro Arm HPC Workshop Sant...Yutaka Ishikawa - Post-K and Arm HPC Ecosystem - Linaro Arm HPC Workshop Sant...
Yutaka Ishikawa - Post-K and Arm HPC Ecosystem - Linaro Arm HPC Workshop Sant...
 
Andrew J Younge - Vanguard Astra - Petascale Arm Platform for U.S. DOE/ASC Su...
Andrew J Younge - Vanguard Astra - Petascale Arm Platform for U.S. DOE/ASC Su...Andrew J Younge - Vanguard Astra - Petascale Arm Platform for U.S. DOE/ASC Su...
Andrew J Younge - Vanguard Astra - Petascale Arm Platform for U.S. DOE/ASC Su...
 
HKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainline
HKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainlineHKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainline
HKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainline
 
HKG18-100K1 - George Grey: Opening Keynote
HKG18-100K1 - George Grey: Opening KeynoteHKG18-100K1 - George Grey: Opening Keynote
HKG18-100K1 - George Grey: Opening Keynote
 
HKG18-318 - OpenAMP Workshop
HKG18-318 - OpenAMP WorkshopHKG18-318 - OpenAMP Workshop
HKG18-318 - OpenAMP Workshop
 
HKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainline
HKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainlineHKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainline
HKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainline
 
HKG18-315 - Why the ecosystem is a wonderful thing, warts and all
HKG18-315 - Why the ecosystem is a wonderful thing, warts and allHKG18-315 - Why the ecosystem is a wonderful thing, warts and all
HKG18-315 - Why the ecosystem is a wonderful thing, warts and all
 
HKG18- 115 - Partitioning ARM Systems with the Jailhouse Hypervisor
HKG18- 115 - Partitioning ARM Systems with the Jailhouse HypervisorHKG18- 115 - Partitioning ARM Systems with the Jailhouse Hypervisor
HKG18- 115 - Partitioning ARM Systems with the Jailhouse Hypervisor
 
HKG18-TR08 - Upstreaming SVE in QEMU
HKG18-TR08 - Upstreaming SVE in QEMUHKG18-TR08 - Upstreaming SVE in QEMU
HKG18-TR08 - Upstreaming SVE in QEMU
 
HKG18-113- Secure Data Path work with i.MX8M
HKG18-113- Secure Data Path work with i.MX8MHKG18-113- Secure Data Path work with i.MX8M
HKG18-113- Secure Data Path work with i.MX8M
 
HKG18-120 - Devicetree Schema Documentation and Validation
HKG18-120 - Devicetree Schema Documentation and Validation HKG18-120 - Devicetree Schema Documentation and Validation
HKG18-120 - Devicetree Schema Documentation and Validation
 
HKG18-223 - Trusted FirmwareM: Trusted boot
HKG18-223 - Trusted FirmwareM: Trusted bootHKG18-223 - Trusted FirmwareM: Trusted boot
HKG18-223 - Trusted FirmwareM: Trusted boot
 

Dernier

Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
Joaquim Jorge
 

Dernier (20)

Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
Top 10 Most Downloaded Games on Play Store in 2024
Top 10 Most Downloaded Games on Play Store in 2024Top 10 Most Downloaded Games on Play Store in 2024
Top 10 Most Downloaded Games on Play Store in 2024
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 

Q4.11: ARM Architecture

  • 2. 2 ARM Architecture roadmap Announced Aug-2010: • Large PhysAddr Extn • PA => 40-bits •Virtualization Extn (v8 builds on these)
  • 3. 3 ARMv7-A registers R0 R1 R2 R3 R4 R5 R6 R7 R8 R9 R10 R11 R12 R13 (SP) R14 (LR) R0 R0 R0 R0 R0R0 SP_svc LR_svc SP_irq LR_irq SP_und LR_und SP_fiq LR_fiq SP_abt LR_abt R8_fiq R9_fiq R10_fiq R11_fiq R12_fiq SP_hyp R1 R2 R3 R4 R5 R6 R7 R8 R9 R10 R11 R12 R1 R2 R3 R4 R5 R6 R7 R8 R9 R10 R11 R12 R1 R2 R3 R4 R5 R6 R7 R8 R9 R10 R11 R12 R1 R2 R3 R4 R5 R6 R7 R8 R9 R10 R11 R12 R1 R2 R3 R4 R5 R6 R7 R8 R9 R10 R11 R12 R1 R2 R3 R4 R5 R6 R7 LR_(BL) CPSR bits • Flags: NZCVQ • IT[7:0] status bits • GE[3:0] Adv SIMD flags • J (Jazelle) • T (Thumb) • E (endian) • I, A, F masks • M[4:0] (mode) SPSR_svc SPSR_abt SPSR_und SPSR_irq SPSR_hyp ELR_hyp SPSR_fiq CPSR
  • 4. 4 Classic ARM MMU  32-bit physical address space  2-level translation tables  Pointed to by TTBR0 (user mappings) and TTBR1 (kernel mappings but with restrictions to the user/kernel memory split)  32-bit page table entries  1st level contains 4096 entries (4 pages for PGD)  1MB section per entry or  Pointer to a 2nd level table  Implementation-defined 16MB supersections  2nd level contains 256 entries pointing to 4KB page each  1KB per 2nd level page table  ARMv6/v7 introduced TEX remapping  Memory type becomes a 3-bit index
  • 5. 5 Classic ARM MMU (cont’d)  Other features  XN (eXecute Never) bit  Different memory types: Normal (cacheable and non-cacheable), Device, Strongly Ordered  Shareability attributes for SMP systems  ASID-tagged TLB (ARMv6 onwards)  Avoids TLB flushing at context switch  8-bit ASID value assigned to an mm_struct  Dynamically allocated (there can be more than 256 processes)
  • 6. 6 Classic ARM MMU Limitations  Only 32-bit physical address space  Growing market requiring more than 4GB physical address space (both RAM and peripherals)  Supersections can be used to allow up to 40-bit addresses using 16MB sections (implementation-defined feature)  Prior to ARMv6, not a direct link between access permissions and Linux PTE bits  Simplified permission model introduced with ARMv6 but not used by Linux  2nd level page table does not fill a full 4K page  ARM Linux workarounds  Separate array for the Linux PTE bits  1st level entry consists of two 32-bit locations pointing to 2KB 2nd level page table entries
  • 7. 7 The ARMv7-A Virtualization Extensions Popek and Goldberg summarized the concept in 1974:  Equivalence/Fidelity  A program running under the hypervisor should exhibit a behaviour essentially identical to that demonstrated when running on an equivalent machine directly.  Resource control / Safety  The hypervisor should be in complete control of the virtualized resources.  Efficiency/Performance  A statistically dominant fraction of machine instructions must be executed without hypervisor intervention. "Formal Requirements for Virtualizable Third Generation Architectures". Communications of the ACM 17
  • 8. 8 ARM hypervisor support philosophy  Virtual machine (VM) scheduling and resource sharing  New Hyp mode for Hypervisor execution  Minimise Hypervisor intervention for “routine” GuestOS tasks  Guest OS page table management  Interrupt control  Guest OS Device Drivers  Syndrome support for trapping key instructions  GuestOS load/store emulation  Privileged control instructions  System instructions (MRS, MSR) to read/write key registers  Virtualized ID register management
  • 9. 9 ARM virtualization extension - modes A new privilege layer – hypervisor mode  Guest OS kernel given same privilege structure as for a non- virtualized environment  Can run the same instructions  Hypervisor mode has higher privilege  VMM can control a wide range of OS accesses to hardware User Mode (Non-privileged) Supervisor Mode (Privileged) Hyp Mode (More Privileged) Guest Operating System1 App2App1 Guest Operating System2 App2App1 Virtual Machine Monitor (VMM) or Hypervisor
  • 10. 10 Hypervisor mode and security extensions Hypervisor mode applies to normal world Secure world supports a single virtual machine Monitor mode controls transition between worlds Guest Operating System1 App2App1 Guest Operating System2 App2App1 Virtual Machine Monitor (VMM) or Hypervisor TrustZone Monitor Secure World OS Trusted App2Trusted App1
  • 11. 11 ARMv7-A Virtualization - key features 1  Trap and control support (HYP mode)  Rich set of trap options (TLB/cache ops, ID groups, instructions)  Syndrome register support  EC: exception class (instr type, I/D abort into/within HYP)  IL: instruction length (0 == 16-bit; 1 == 32-bit)  ISS: instruction specific syndrome (instr fields, reason code, ...) 3 1 2 6 2 5 2 4 0 EC I L ISS
  • 12. 12 ARMv7-A Virtualization - key features 2  Dedicated Exception Link Register (ELR)  Stores preferred return address on exception entry  New instruction – ERET – for exception return from HYP mode  Other modes overload exception model and procedure call LRs  R14 used by exception entry BL and BLX instructions  Address translation
  • 13. 13 Two layers of address translation Stage 1 translation owned by guest OS Virtual address map Intermediate physical address map Real System Physical address map Stage 2 translation owned by the VMM Hardware has 2-stage memory translation Tables from Guest OS translate VA to IPA Second set of tables from VMM translate IPA to PA Allows aborts to be routed to appropriate software layer
  • 14. 14 LPAE – Stage 1 (VA => {I}PA)  Managed by the OS  64-bit descriptors, 512 entries per table  4KB table size == 4KB page size  1 or 2 Translation Table Base Registers  3 levels of table supported  up to 9 address bits per level  2 bits at Level 1  9 bits at Levels 2 and 3  Page Table Entry: 32-bitVirtual Address TTBR1 space TTBR0 space Not mapped (Fault) 232- TTCR.T1SZ 232-TTCR.T0SZ 0 232 T1SZ ≤ T0SZ If T1SZ == T0SZ == 0 then TTBR0 always used 31:30 VA Bits <29:21> VA Bits <20:12> VA Bits <11:0> L1 Level 2 table index Level 3 table (page) index Page offset address 6 3 5 2 4 0 3 0 1 2 2 1 0 Upper attributes SBZ Address out SBZ Lower attributes Valid Block/TableBlock/page • Memory type • Cache policy • Shareability, AF, nG, NS flags • Access permissions •Table override bits (NS, XN, PXN, AP[1:0]) • Block/Page XN, PXN bits • Block/Page 16x entry contiguous hint • IGNORED bits
  • 15. 15 LPAE – Stage 2 (IPA => PA)  Managed by the VMM / hypervisor  Same table walk scheme as stage 1, now up to 40-bit input address:  2x contiguous 4KB tables allowed at L1; support 240 address space  Up to 1-16 contiguous tables allowed at L2; support 230 - 234 address space  1x Translation Table Base register (VTBR)  Page table entry: 6 3 5 2 4 0 3 0 1 2 2 1 0 Upper attributes SBZ Address out SBZ Lower attributes Valid Block/Table Block / page only • XN bit • 16x entry contiguous hint • IGNORED bits Block/page • Memory type • Cache policy • Shareability, AF bit • R/W Access permissions 3 9 3 4 3 0 2 1 1 2 0 IPA Bits <39:30> IPA Bits <29:21> IPA Bits <20:12> IPA Bits <11:0> IPA Bits <11:0> IPA Bits <33;21> IPA Bits <20:12> IPA Bits <11:0>
  • 16. 16 Linux Support for ARM LPAE Catalin Marinas ELC Europe 2011
  • 17. 17 ARM LPAE Features  40-bit physical addresses (1TB)  40-bit intermediate physical addresses (guest PA space)  3-level translation tables  Pointed to by TTBR0 (user mappings) and TTBR1 (kernel mappings)  Not as restrictive on user/kernel memory split (can use 3:1)  With 1GB kernel mapping, the 1st level is skipped  64-bit entries in each level  1st level contains 4 entries (stage 1 translation)  1GB section or  Pointer to 2nd level table  2nd level contains 512 entries (4KB in total)  2MB section or  Pointer to 3rd level
  • 18. 18 ARM LPAE Features (cont’d)  3rd level contains 512 entries (4KB)  Each addressing a 4KB range  Possibility to set a contiguity flag for 16 consecutive pages  LDRD/STRD (64-bit load/store) instructions are atomic on ARM processors supporting LPAE  Only the simplified page permission model is supported  No kernel RW and user RO combination  Domains are no longer present (they have already been removed in ARMv7 Linux)  Additional spare bits to be used by the OS  Dedicated bits for user, read-only and access flag (young) settings
  • 19. 19 ARM LPAE Features (cont’d)  ASID is part of the TTBR0 register  Simpler context switching code (no need to deal with speculative TLB fetching with the wrong ASID)  The Context ID register can be used solely for debug/trace  Additional permission control  PXN – Privileged eXecute Never  SCTLR.WXN, SCTLR.UWXN – prevent execution from writable locations (the latter only for user accesses)  APTable – restrict permissions in subsequent page table levels  XNTable, PXNTable – override XN and PXN bits in subsequent page table levels  New registers for the memory region attributes  MAIR0, MAIR1 – 32-bit Memory Attribute Indirection Registers  8 memory types can be configured at a time
  • 20. 20 ARM LPAE Features 1GB block Table ptr 1st level table VA[31:30] TTBRx 4KB 2MB block Table ptr 2nd level table VA[29:21] 4KB PTE PTE 3rd level table VA[20:12] 4KB 64-bit entry 64-bit entry64-bit entry PGD PMD PTE
  • 21. 21 Linux and ARM LPAE  Linux + ARM LPAE has the same memory layout as the classic MMU implementation  Described in Documentation/arm/memory.txt  0..TASK_SIZE – user space  PAGE_OFFSET-16M..PAGE_OFFSET-2M – module space  PAGE_OFFSET-2M..PAGE_OFFSET – highmem mappings  Highmem is already supported by the classic MMU  Memory beyond 4G is only accessible via highmem  Page mapping functions use pfn (32-bit variable, PAGE_SHIFT == 12, maximum 44-bit physical addresses)  Original ARM kernel port assumed 32-bit physical addresses  LPAE redefines phys_addr_t, dma_addr_t as u64 (generic typedefs)  New ATAG_MEM64 defined for specifying the 64-bit memory layout
  • 22. 22 Linux and ARM LPAE (cont’d)  Hard-coded assumptions about 2 levels of page tables  PGDIR_(SHIFT|SIZE|MASK) references converted to PMD_*  swapper_pg_dir extended to cover both 1st and 2nd levels of page tables  1 page for PGD (only 4 entries used for stage 1 translations)  4 pages for PMD  init_mm.pgd points to swapper_pg_dir  TTBR0 used for user mappings and always points to PGD  TTBR1 used for kernel mapping:  3:1 split – TTBR1 points to 4th page of PMD (2 levels only, 1GB)  Classic MMU does not allow the use of TTBR1 for the 3:1 split  2:2 split – TTBR1 points to 3rd PGD entry  1:3 split – TTBR1 points to 1st PGD entry
  • 23. 23 Linux and ARM LPAE (cont’d)  The page table definitions have been separated into pgtable*-2level.h and pgtable*-3level.h files  Few PTE bits shared between classic and LPAE definitions  Negated definitions of L_PTE_EXEC and L_PTE_WRITE to match the corresponding hardware bits  Memory types are the same and they represent an index in the TEX remapping registers (PRRR/NMRR or MAIR0/MAIR1)  The proc-v7.S file has been duplicated into proc-v7lpae.S  Different register setup for TTBRx and TEX remapping (MAIRx)  Simpler cpu_v7_set_pte_ext (1:1 mapping between hardware and software PTE bits)  Simpler cpu_v7_switch_mm (ASID switched with TTBR0)  Current ARM code converted to pgtable-nopud.h  Not using „nopmd‟ with classic MMU
  • 24. 24 Linux and ARM LPAE (cont’d)  Lowmem is mapped using 2MB sections in the 2nd level table  PGD and PMD only allocated from lowmem  PTE tables can be allocated from highmem  Exception handling  Different IFSR/DFSR registers structure and exception numbering – arch/arm/mm/fault.c modified accordingly  Error reporting includes PMD information as well  PGD allocation/freeing  Kernel PGD entries copied to the new PGD during pgd_alloc()  Modules and pkmap entries added to the PMD during fault handling  Identity mapping (PA == VA)  Required for enabling or disabling the MMU – secondary CPU booting, CPU hotplug, power management, kexec
  • 25. 25 Linux and ARM LPAE (cont’d)  Uses pgd_alloc() and pgd_free()  When PHYS_OFFSET > PAGE_OFFSET, kernel PGD entries may be overridden  swapper_pg_dir entries marked with an additional bit – L_PGD_SWAPPER  pgd_free() skips such entries during clean-up
  • 26. 26 Current Status and Future  Initial development done on software models  Tested on real hardware (FPGA and silicon)  Parts of the LPAE patch set already in mainline  Mainly preparatory patches, not core functionality  Aiming for full support in Linux 3.3  Hardware supporting LPAE  ARM Cortex-A15, Cortex-A7 processors  TI OMAP5 (dual-core ARM Cortex-A15)  Other developments  KVM support for Cortex-A15 – implemented by Christoffer Dall at Columbia University  Uses the Virtualisation extensions together with the LPAE stage 2 translations
  • 27. 27 Reference  ARM Architecture Reference Manual rev C  Currently beta, not publicly available yet  Specifications publicly available on ARM Infocenter  http://infocenter.arm.com/  ARM architecture -> Reference Manuals -> ARMv7-AR LPA Virtualisation Extensions  Linux patches – ARM architecture development tree  Hosts the latest ARM architecture developments before patches are merged into the mainline kernel  git://github.com/cmarinas/linux.git  When the kernel.org accounts are back  git://git.kernel.org/pub/scm/linux/cmarinas/linux-arm-arch.git
  • 29. 29 Interrupt Controller  ARM standardising on the “Generic Interrupt Controller” [GIC]  Supports the ARMv7 Security Extension  Interrupt groups support (Nonsecure) IRQ and (Secure) FIQ split  Distributor and cpu interface functionality  GICv2 adds support for a virtualized cpu interface  VMM manages physical interface & queues entries for each VM.  GuestOS (VM) configures, acknowledges, processes and completes interrupts directly. Traps to VMM where necessary.  Hypervisor can virtualize FIQs from the physical (Nonsecure, IRQ) interface.
  • 30. 30 Interrupt Handling – GICv2 Secure handler (monitor mode) or OS FIQ handler OS IRQ handler GuestOS IRQ handler GuestOS FIQ handler VirtualCpu InterfaceN • cpu-specific control + state • Interrupt ACK / END (retire) Virtual Grp0: FIQ Virtual Grp1: IRQ Hypervisor – virtualized distributor • Grp1 physical => Grp1/Grp0 virtual Virtual interrupt management • interrupt queues (list registers) Virtualization support * ARM Architecture Security Extns NEW Distributor - control and configuration • Assignment (≤8 cpu interfaces) • Interrupt grouping (Grp0, Grp1) Grp0: (Secure*, FIQ) Grp1: IRQ always Cpu Interface0 • cpu-specific control + state • Interrupt ACK / END (retire) Grp0: (Secure*, FIQ) Grp1: IRQ always GICv1 Cpu InterfaceN • cpu-specific control + state • Interrupt ACK / END (retire) • software • private (per cpu) peripheral • shared peripheral Interrupt sources:
  • 31. 31 Generic Timer  Shared “always on” counter  Fast read access for reliable distribution of time.  ≥ ComparedValue or ≤ 0 timer event configuration support  Maskable interrupts  Offset capability for virtual time  Event stream support  Hypervisor trap support SoC Counter Power Controller CPU Cluster1 CPU0 Timer0 CPU1 Timer1 Counter Interface Memory Interconnect and Memory Controller Shared Cache $ $ Timer Distribution Bus Peripheral Bus Always Powered Power Domain GIC CPU Cluster0 CPU0 Timer0 CPU1 Timer1 Counter Interface Shared Cache $ $ GIC Systemevents Implementation example:
  • 32. 32 System MMU Local Coherence Bus (no snooping on bus) “Event pulse” enabling next cycle notifications  System MMU architecture translation options from pre-set tables  No Translation  VA -> IPA (Stage 1) or IPA->PA (Stage 2) only  VA->PA (Stage1 and Stage2)  Can share page tables with ARM cores – relate contexts to VMIDs/ASIDs L1 I-cache L1 D-cache Core0 L1 I-cache L1 D-cache CoreN L2 cache System interconnect DMA System MMU System Memory
  • 33. 33 Quad Cortex-A15 Quad Cortex-A15 AMBA 4 Cache Coherent Interconnect CCI-400 I/O device MMU-400 DMC-400 Dynamic Memory Controller AXI Network Interconnect Slaves Slaves AXI Network Interconnect LCDVideo DDR3/ LPDDR2 DDR3/ LPDDR2 PHY GIC-400 Mali Graphics PHY MMU-400 MMU-400 Completing the SoC IPA PA Intermediate Physical Address (IPA) The address the OS believes it‟s accessing Physical Address (PA) The actual address the VMM assigns
  • 34. 34 To summarize: ARMv7-A additions  A natural progression of well established features applied to an architecture long associated with low power.  „Classic‟ uses and solutions will apply to ARM markets:  Low cost/low power server opportunities  Application OS + RTOS/microkernel smart mobile devices  Feature split by „Manufacturers‟ versus User‟ VM environment  Automotive, home, ...  Opportunities for new use cases?  Today‟s entrepreneurs/ideas => tomorrow‟s major businesses