SOFTWARE & SYSTEMS
DESIGN
3 – Instruction Sets
AGENDA
• Instruction Sets
VFP and NEON
Pipelines
Cycle Counting
AAETC3v00
INSTRUCTION SET
• ARM instruction set
– All instructions are 32-bit
– Most instructions can be executed conditionally
• Thumb instruction set
– 16-bit instruction set
– No conditional execution (except for branches)
– Optimized for code density from C code (~65% of ARM code size)
• Thumb-2 technology
– Extension to Thumb instruction set
– Mix of 16-bit and 32-bit instructions
– Conditional execution via the IT instruction
– Higher performance than Thumb and smaller than ARM
ASSEMBLER SYNTAX
• Data processing instructions
<operation><condition> Rd, Rn, <op2>
ADDEQ r4, r5, r6 // if (EQ) r4 = r5 + r6
ORR r2, r3, r6, LSL #4 // r2 = r3 | (r6 << 4)
SUBS r5, r7, #4 // r5 = r7 - 4; set flags
MOV r4, #7 // r4 = 7
• Memory access instructions
<operation><size> Rd, [<address>]
LDR r0, [r6, #4] // r0 = *(r6 + 4)
STRB r4, [r7], #8 // *(byte *) r7 = r4; r7 += 8
<operation><addressing mode> <Rn>{!}, <register list>
LDMIA r0, {r1, r2, r7}
STMFD sp!, {r4-r11, lr}
• Program flow instructions
<branch> <label>
BL foo
B bar
DATA PROCESSING INSTRUCTIONS
• These instructions operate on the contents of registers
– They DO NOT affect memory
• Manipulation instructions (have a destination register)
– Arithmetic: ADD, ADC, SUB, SBC, RSB, RSC
– Logical: AND, ORR, EOR, ORN (Thumb-2 only), BIC
– Move: MOV, MVN
• Comparison instructions (set flags only)
– CMN (ADDS), CMP (SUBS), TST (ANDS), TEQ (EORS)
• Syntax:
<Operation>{S}{<cond>} {Rd,} Rn, Operand2
• Examples:
ADD r0, r1, r2 ; r0 = r1 + r2
TEQ r0, r1 ; if r0 = r1, Z flag will be set
MOV r0, r1 ; copy r1 to r0
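CMP behaves like SUBS with the result thrown away. As a sketch of what "set flags" means, the C model below computes N, Z, C and V for a 32-bit subtraction. The names nzcv_t, subs_flags and cmp_flags are hypothetical helpers for illustration, not part of any ARM library.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical container for the four ALU condition flags. */
typedef struct { bool n, z, c, v; } nzcv_t;

/* Model of SUBS Rd, Rn, Rm: returns Rn - Rm and sets the flags. */
static uint32_t subs_flags(uint32_t rn, uint32_t rm, nzcv_t *f)
{
    assert(f != NULL);
    uint32_t r = rn - rm;                        /* wraps modulo 2^32 */
    f->n = (r >> 31) != 0;                       /* N: bit 31 of the result */
    f->z = (r == 0);                             /* Z: result is zero */
    f->c = (rn >= rm);                           /* C: set when no borrow occurs */
    f->v = (((rn ^ rm) & (rn ^ r)) >> 31) != 0;  /* V: signed overflow */
    return r;
}

/* CMP Rn, Rm: the same flag update, result discarded. */
static nzcv_t cmp_flags(uint32_t rn, uint32_t rm)
{
    nzcv_t f;
    (void)subs_flags(rn, rm, &f);
    return f;
}
```

Conditional suffixes then test these flags; for example EQ tests Z, and HS (unsigned higher or same) tests C.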
MULTIPLY / DIVIDE
• 32-bit multiplication: Rd = Rn × Rm, with optional accumulation (+ or -) of Ra
– MUL, MLA, MLS
• 64-bit multiplication: RdHi:RdLo = Rn × Rm, with optional accumulation (+) into RdHi:RdLo
– UMULL, SMULL, UMLAL, SMLAL
Examples:
MLA r0, r1, r2, r3 ; r0 = r3 + (r1 * r2)
[U|S]MULL r4, r5, r2, r3 ; r5:r4 = r2 * r3
Division (optional in ARMv7-A):
SDIV r0, r1, r2 ; signed: r0 = r1 / r2
UDIV r0, r1, r2 ; unsigned: r0 = r1 / r2
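The C sketch below models the multiply and divide examples: 32-bit forms keep only the low word, the long forms return a 64-bit product split into RdHi:RdLo, and the divides round toward zero. The helper names are hypothetical. The division model assumes divide-by-zero trapping is not enabled, in which case the result is 0; INT32_MIN / -1 is modeled as wrapping to INT32_MIN.

```c
#include <assert.h>
#include <stdint.h>

/* MLA Rd, Rn, Rm, Ra: Rd = Ra + Rn * Rm (low 32 bits). */
static uint32_t mla(uint32_t rn, uint32_t rm, uint32_t ra) { return ra + rn * rm; }

/* MLS Rd, Rn, Rm, Ra: Rd = Ra - Rn * Rm (low 32 bits). */
static uint32_t mls(uint32_t rn, uint32_t rm, uint32_t ra) { return ra - rn * rm; }

/* UMULL RdLo, RdHi, Rn, Rm: full 64-bit unsigned product. */
static void umull(uint32_t rn, uint32_t rm, uint32_t *lo, uint32_t *hi)
{
    uint64_t p = (uint64_t)rn * rm;
    *lo = (uint32_t)p;
    *hi = (uint32_t)(p >> 32);
}

/* SMULL: as UMULL, but the operands are two's complement values. */
static void smull(uint32_t rn, uint32_t rm, uint32_t *lo, uint32_t *hi)
{
    uint64_t p = (uint64_t)((int64_t)(int32_t)rn * (int32_t)rm);
    *lo = (uint32_t)p;
    *hi = (uint32_t)(p >> 32);
}

/* SDIV: rounds toward zero; divide by zero assumed to return 0. */
static int32_t sdiv(int32_t n, int32_t d)
{
    if (d == 0) return 0;
    if (n == INT32_MIN && d == -1) return INT32_MIN;  /* overflow wraps */
    return n / d;
}

/* UDIV: unsigned; same divide-by-zero assumption. */
static uint32_t udiv(uint32_t n, uint32_t d) { return d == 0 ? 0 : n / d; }
```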
BIT MANIPULATION INSTRUCTIONS
• Worked example (register contents in hex)
r0 = 0x00AC4E74
BFI r0, r0, #9, #6 ; Bit Field Insert: r0[14:9] = r0[5:0] (0b110100), giving r0 = 0x00AC6874
UBFX r1, r0, #18, #7 ; Bit Field Extract: r1 = r0[24:18], zero extended, giving r1 = 0x0000002B
BFC r1, #3, #4 ; Bit Field Clear: r1[6:3] = 0, giving r1 = 0x00000003
RBIT r2, r1 ; Reverse Bit Order, giving r2 = 0xC0000000
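To check the worked example, here is a C model of the four instructions. field_mask, bfi, ubfx, bfc and rbit are hypothetical helper names, and the model asserts lsb + width <= 32, the constraint the instructions themselves impose.

```c
#include <assert.h>
#include <stdint.h>

/* Mask with `width` one bits starting at bit `lsb`. */
static uint32_t field_mask(unsigned lsb, unsigned width)
{
    assert(width >= 1 && lsb + width <= 32);
    uint32_t ones = (width == 32) ? 0xFFFFFFFFu : ((1u << width) - 1u);
    return ones << lsb;
}

/* BFI Rd, Rn, #lsb, #width: Rd[lsb+width-1:lsb] = Rn[width-1:0]. */
static uint32_t bfi(uint32_t rd, uint32_t rn, unsigned lsb, unsigned width)
{
    uint32_t m = field_mask(lsb, width);
    return (rd & ~m) | ((rn << lsb) & m);
}

/* UBFX Rd, Rn, #lsb, #width: Rn[lsb+width-1:lsb], zero extended. */
static uint32_t ubfx(uint32_t rn, unsigned lsb, unsigned width)
{
    return (rn & field_mask(lsb, width)) >> lsb;
}

/* BFC Rd, #lsb, #width: clear Rd[lsb+width-1:lsb]. */
static uint32_t bfc(uint32_t rd, unsigned lsb, unsigned width)
{
    return rd & ~field_mask(lsb, width);
}

/* RBIT Rd, Rm: reverse the order of all 32 bits. */
static uint32_t rbit(uint32_t x)
{
    uint32_t r = 0;
    for (int i = 0; i < 32; ++i) {
        r = (r << 1) | (x & 1u);  /* move the next low bit of x into r */
        x >>= 1;
    }
    return r;
}
```

Replaying the slide: bfi(0x00AC4E74, 0x00AC4E74, 9, 6) gives 0x00AC6874, and the following UBFX, BFC and RBIT steps give 0x2B, 0x3 and 0xC0000000.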
BYTE REVERSAL
• Byte Reversal Instructions
REV{cond} Rd, Rm Reverses the bytes in a word
REV16{cond} Rd, Rm Reverses the bytes in each halfword
REVSH{cond} Rd, Rm Reverses the bottom two bytes, and sign extends to 32 bits
• Example: reversing the byte order of r0 (bytes 3 2 1 0 become 0 1 2 3)
V6 and later
REV r0, r0
Pre-V6
EOR r1, r0, r0, ROR #16
BIC r1, r1, #0xFF0000
MOV r0, r0, ROR #8
EOR r0, r0, r1, LSR #8
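The pre-V6 sequence can be checked against a direct definition of REV. The C sketch below models REV, REV16 and REVSH (hypothetical helper names) and replays the four EOR/BIC/MOV/EOR steps one line per instruction.

```c
#include <assert.h>
#include <stdint.h>

/* Rotate right, as the ROR operand shift does. */
static uint32_t ror32(uint32_t x, unsigned n)
{
    n &= 31u;
    return n ? (x >> n) | (x << (32u - n)) : x;
}

static uint32_t rev(uint32_t x)     /* bytes 3 2 1 0 -> 0 1 2 3 */
{
    return (x >> 24) | ((x >> 8) & 0x0000FF00u) |
           ((x << 8) & 0x00FF0000u) | (x << 24);
}

static uint32_t rev16(uint32_t x)   /* swap bytes within each halfword */
{
    return ((x >> 8) & 0x00FF00FFu) | ((x << 8) & 0xFF00FF00u);
}

static uint32_t revsh(uint32_t x)   /* swap bottom two bytes, sign extend */
{
    uint32_t h = ((x & 0xFFu) << 8) | ((x >> 8) & 0xFFu);
    return (uint32_t)(int32_t)(int16_t)h;
}

/* The pre-V6 four-instruction sequence. */
static uint32_t rev_pre_v6(uint32_t r0)
{
    uint32_t r1 = r0 ^ ror32(r0, 16);  /* EOR r1, r0, r0, ROR #16 */
    r1 &= ~0x00FF0000u;                /* BIC r1, r1, #0xFF0000   */
    r0 = ror32(r0, 8);                 /* MOV r0, r0, ROR #8      */
    r0 ^= r1 >> 8;                     /* EOR r0, r0, r1, LSR #8  */
    return r0;
}
```

For 0x12345678, rev and rev_pre_v6 both give 0x78563412, and rev16 gives 0x34127856.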
SIMD
• ARMv6 added a number of instructions which perform SIMD (Single Instruction
Multiple Data) operations using ARM registers
– Includes instructions for addition, subtraction, multiplication and sum of absolute
differences
– Instructions can work on four 8-bit quantities, or two 16-bit quantities
– Signed/unsigned and saturating versions available of many instructions
– CPSR GE bits used instead of normal ALU flags
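• There are instructions for packing (PKHBT/PKHTB) and unpacking (UXTH/UXTB) registers
• Example: UADD16 Rd, Rm, Rs
– The top halfwords of Rm and Rs are added into the top halfword of Rd, setting GE[3:2]
– The bottom halfwords are added into the bottom halfword of Rd, setting GE[1:0]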
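A minimal C model of UADD16, assuming the GE bits are returned through a separate out-parameter (a simplification; on hardware they live in the CPSR). Each halfword sum that carries out (is 0x10000 or more) sets the corresponding pair of GE bits. The function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* UADD16 Rd, Rm, Rs: two independent 16-bit unsigned additions.
   *ge receives GE[3:0]: bits 1:0 for the bottom halfword, 3:2 for the top. */
static uint32_t uadd16(uint32_t rm, uint32_t rs, unsigned *ge)
{
    assert(ge != NULL);
    uint32_t lo = (rm & 0xFFFFu) + (rs & 0xFFFFu);
    uint32_t hi = (rm >> 16) + (rs >> 16);
    *ge = ((lo > 0xFFFFu) ? 0x3u : 0u) | ((hi > 0xFFFFu) ? 0xCu : 0u);
    /* Each lane wraps independently: no carry crosses between halfwords. */
    return (lo & 0xFFFFu) | ((hi & 0xFFFFu) << 16);
}
```

The GE bits are typically consumed afterwards by SEL, which picks bytes from one of two registers according to GE.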
SATURATED MATH AND CLZ
• Support for Saturated Arithmetic
– Targeted at DSP & control applications
– Overflow sets Q flag (sticky) not V, and sets result to +/- max value
QSUB{cond} Rd, Rm, Rn ; Rd = saturate(Rm - Rn)
QADD{cond} Rd, Rm, Rn ; Rd = saturate(Rm + Rn)
QDSUB{cond} Rd, Rm, Rn ; Rd = saturate(Rm - saturate(Rn * 2))
QDADD{cond} Rd, Rm, Rn ; Rd = saturate(Rm + saturate(Rn * 2))
– Results are clamped to the signed 32-bit range: 0x80000000 (most negative) to 0x7FFFFFFF (most positive)
• Count Leading Zeros
CLZ{cond} Rd, Rm
– Returns the number of zero bits above the most significant set bit (32 if Rm is zero)
– Example: for Rm = 0000 0000 0010 1100 0100 1110 0111 0100 (0x002C4E74), CLZ returns 10
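The C sketch below models the saturating instructions with a sticky Q flag passed by pointer (on hardware Q is a CPSR bit), plus CLZ. The helper names are hypothetical; the arithmetic is done in 64 bits so that overflow can be detected before clamping.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Clamp to the signed 32-bit range; set Q (never clear it) on clamping. */
static int32_t sat32(int64_t v, bool *q)
{
    assert(q != NULL);
    if (v > INT32_MAX) { *q = true; return INT32_MAX; }
    if (v < INT32_MIN) { *q = true; return INT32_MIN; }
    return (int32_t)v;
}

static int32_t qadd(int32_t rm, int32_t rn, bool *q)  { return sat32((int64_t)rm + rn, q); }
static int32_t qsub(int32_t rm, int32_t rn, bool *q)  { return sat32((int64_t)rm - rn, q); }
static int32_t qdadd(int32_t rm, int32_t rn, bool *q) { return sat32((int64_t)rm + sat32((int64_t)rn * 2, q), q); }
static int32_t qdsub(int32_t rm, int32_t rn, bool *q) { return sat32((int64_t)rm - sat32((int64_t)rn * 2, q), q); }

/* CLZ Rd, Rm: zero bits above the most significant set bit (32 for 0). */
static unsigned clz32(uint32_t x)
{
    unsigned n = 0;
    if (x == 0) return 32;
    while ((x & 0x80000000u) == 0) { x <<= 1; ++n; }
    return n;
}
```

With GCC or Clang, __builtin_clz gives the same answer for non-zero inputs (it is undefined for 0, which is why the model special-cases it).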
SATURATION
• Saturate a value to a specified bit position (effectively saturating to any
power of 2)
– USAT - Unsigned saturate 32-bit
• Syntax: USAT Rd, #sat, Rm {, shift}
• Operation: Rd = Saturate(Shift(Rm), #sat)
– Variants
SSAT - signed saturation
USAT16 - unsigned saturation of two signed 16-bit halfwords (no shift allowed)
SSAT16 - signed saturation of two signed 16-bit halfwords (no shift allowed)
– #sat is an immediate value: 0 to 31 for USAT (0 to 15 for USAT16), 1 to 32 for SSAT (1 to 16 for SSAT16)
– {shift} is optional and is limited to LSL or ASR
– Q flag is set if saturation occurs
– Example with a saturation position of 3: unsigned saturation clamps to min 0 and max 0b00111 (7); signed saturation clamps to min 0b...11100 (-4) and max 0b00011 (3)
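A C sketch of USAT and SSAT, assuming any optional shift has already been applied to the input. The function names are hypothetical; Q is modeled as a sticky flag passed by pointer, and the asserts encode the #sat ranges above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* USAT Rd, #sat, Rm: clamp a signed value to 0 .. 2^sat - 1. */
static uint32_t usat(int32_t v, unsigned sat, bool *q)
{
    assert(sat <= 31);
    int64_t max = ((int64_t)1 << sat) - 1;
    if (v < 0)   { *q = true; return 0; }
    if (v > max) { *q = true; return (uint32_t)max; }
    return (uint32_t)v;
}

/* SSAT Rd, #sat, Rm: clamp to -2^(sat-1) .. 2^(sat-1) - 1. */
static int32_t ssat(int32_t v, unsigned sat, bool *q)
{
    assert(sat >= 1 && sat <= 32);
    int64_t max = ((int64_t)1 << (sat - 1)) - 1;
    int64_t min = -((int64_t)1 << (sat - 1));
    if (v > max) { *q = true; return (int32_t)max; }
    if (v < min) { *q = true; return (int32_t)min; }
    return v;
}
```

With #sat = 3, usat clamps to 0..7 and ssat to -4..3, matching the example above.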
SINGLE / DOUBLE REGISTER DATA TRANSFER
• Use to move data between one or two registers and memory
LDRD STRD Doubleword
LDR STR Word
LDRB STRB Byte
LDRH STRH Halfword
LDRSB Signed byte load
LDRSH Signed halfword load
• Syntax:
– LDR{<size>}{<cond>} Rd, <address>
– STR{<size>}{<cond>} Rd, <address>
• Example:
– LDRB r0, [r1] ; load bottom byte of r0 from the
; byte of memory at address in r1
– For byte and halfword loads, the remaining upper bits of Rd are zero filled (LDRB, LDRH) or sign extended (LDRSB, LDRSH)
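A small C model of how a narrow load fills the 32-bit destination register, assuming little-endian memory represented as a byte array. The function names are hypothetical; the point is the difference between zero extension and sign extension.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* LDRB: one byte, zero extended to 32 bits. */
static uint32_t ldrb(const uint8_t *mem, size_t addr) { return mem[addr]; }

/* LDRSB: one byte, sign extended to 32 bits. */
static uint32_t ldrsb(const uint8_t *mem, size_t addr)
{
    return (uint32_t)(int32_t)(int8_t)mem[addr];
}

/* LDRH: one halfword (little-endian assumed), zero extended. */
static uint32_t ldrh(const uint8_t *mem, size_t addr)
{
    assert(mem != NULL);
    return (uint32_t)mem[addr] | ((uint32_t)mem[addr + 1] << 8);
}

/* LDRSH: one halfword, sign extended. */
static uint32_t ldrsh(const uint8_t *mem, size_t addr)
{
    return (uint32_t)(int32_t)(int16_t)ldrh(mem, addr);
}
```

The same byte 0x80 loads as 0x00000080 with LDRB but as 0xFFFFFF80 with LDRSB.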
ADDRESSING MEMORY
• The address accessed by LDR/STR is specified by a base register with
an optional offset
– Base register only (no offset)
LDR r0, [r1]
– Base register plus constant
LDR r0, [r1, #8]
– Base register, plus register (optionally shifted by an immediate value)
LDR r0, [r1, r2]
LDR r0, [r1, r2, LSL #2]
– The offset can be either added to or subtracted from the base register
LDR r0, [r1, #-8]
LDR r0, [r1, -r2]
LDR r0, [r1, -r2, LSL #2]
[Figure: memory address = r1 +/- offset (#8, or r2, LSL #2); the word at that address is loaded into r0]
PRE- AND POST-INDEXED ADDRESSING
• Post-indexed (add offset after memory access)
LDR r0, [r1], #12
– Always update base register (r1)
• Pre-indexed (add offset before memory access)
LDR r0, [r1, #12]{!}
– If ‘!’ present, update base register (r1)
[Figure: post-indexed loads r0 from the address in r1, then writes r1 + #12 back to r1; pre-indexed loads r0 from r1 + #12 and writes that address back to r1 only when ‘!’ is present]
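The C sketch below models which address an LDR/STR accesses and what happens to the base register in each indexing form, plus the shifted register offset from the previous slide. index_mode_t, ldst_address and reg_offset_address are hypothetical names; address arithmetic wraps modulo 2^32.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef enum { OFFSET, PRE_INDEXED, POST_INDEXED } index_mode_t;

/* Returns the address accessed and updates *rn as the mode requires.
   A negative offset models forms such as [r1, #-8]. */
static uint32_t ldst_address(uint32_t *rn, int32_t offset, index_mode_t mode)
{
    uint32_t base = *rn;
    uint32_t sum = base + (uint32_t)offset;
    switch (mode) {
    case OFFSET:       return sum;              /* LDR r0, [r1, #off]  */
    case PRE_INDEXED:  *rn = sum; return sum;   /* LDR r0, [r1, #off]! */
    case POST_INDEXED: *rn = sum; return base;  /* LDR r0, [r1], #off  */
    }
    assert(0 && "unknown addressing mode");
    return base;
}

/* Register offset: [rn, {-}rm, LSL #lsl]. */
static uint32_t reg_offset_address(uint32_t rn, uint32_t rm, unsigned lsl, bool subtract)
{
    assert(lsl < 32);
    uint32_t off = rm << lsl;
    return subtract ? rn - off : rn + off;
}
```

With r1 = 0x1000, the post-indexed form reads from 0x1000 and leaves r1 = 0x100C, while the pre-indexed form with ‘!’ reads from 0x100C and also leaves r1 = 0x100C.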
MULTIPLE REGISTER DATA TRANSFER
• These instructions move data between multiple registers and memory
• Syntax
<LDM|STM>{<addressing_mode>}{<cond>} Rb{!}, <register list>
• 4 addressing modes
• Increment after/before (IA/IB)
• Decrement after/before (DA/DB)
[Figure: LDM r10, {r0, r1, r4} in each mode; r0, r1 and r4 always occupy increasing addresses, starting at the base register (Rb) for IA, just above it for IB, ending at it for DA, and ending just below it for DB]
• Also
PUSH/POP, equivalent to STMDB/LDMIA with SP! as base register
• Example
LDM r10, {r0,r1,r4} ; load registers, using r10 base
PUSH {r4-r6,pc} ; store registers, using SP base
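A C sketch of the address arithmetic behind the four LDM/STM modes. It relies on the rule that registers are always transferred lowest-numbered register to lowest address, so each mode only changes where the block of 4 × n bytes starts and what is written back. The names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

typedef enum { IA, IB, DA, DB } ldm_mode_t;

/* Lowest address touched by an LDM/STM of n registers with base rb. */
static uint32_t ldm_lowest_address(uint32_t rb, unsigned n, ldm_mode_t mode)
{
    assert(n >= 1 && n <= 16);
    switch (mode) {
    case IA: return rb;              /* first word at the base         */
    case IB: return rb + 4;          /* first word just above the base */
    case DA: return rb - 4 * n + 4;  /* last word at the base          */
    case DB: return rb - 4 * n;      /* last word just below the base  */
    }
    return rb;
}

/* Value written back to rb when the '!' suffix is present. */
static uint32_t ldm_writeback(uint32_t rb, unsigned n, ldm_mode_t mode)
{
    return (mode == IA || mode == IB) ? rb + 4 * n : rb - 4 * n;
}
```

For example, a four-register PUSH (STMDB sp!) with sp = 0x2000 stores to 0x1FF0 to 0x1FFC and leaves sp = 0x1FF0, which is the full descending stack convention.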
INSTRUCTIONS FOR LOADING CONSTANTS
• The assembler provides some instructions for loading values into registers
– These are the recommended mechanisms for loading constants into registers
• PC- or register-relative constants
ADR Rn, label
• Add or subtract an immediate value to or from the PC to generate the address of the label into the specified register, using one instruction
• ADRL pseudo instruction uses two instructions, giving a better range
• Can be used to generate addresses for position independent code (but only if in same code section)
• Constant determined at run time
• Absolute constants
LDR Rn, =<constant>
LDR Rn, =label
• Pseudo instruction
• Assembler will use optimal sequence to generate constant into specified register (one of MOV, MVN or an LDR from a literal pool)
• Can load to the PC, causing a branch
• Use for absolute addressing and references outside the current section (resulting in position dependent code)
• Constant determined at assembly or link time
LDR= EXAMPLES
• The following examples show how the LDR= pseudo instruction makes code more readable, portable and flexible
Code Disassembly
LDR r0, =0x2543 MOV r0, #0x2543
LDR r0, =0xFFFF43FF MVN r0, #0xBC00
LDR r0, =0xFFFFF5 LDR r0, [pc, #xx]
...
DCD 0xFFFFF5
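The choice the assembler makes can be sketched in C. An ARM-state data-processing immediate is an 8-bit value rotated right by an even amount, so a constant is usable with MOV if rotating it left by some even amount leaves a value of at most 0xFF, and usable with MVN if its bitwise inverse is. Note that 0x2543 is not encodable that way, so the single MOV in the first example is presumably the 16-bit MOVW form available from ARMv6T2; the has_movw parameter models that assumption. choose_ldr_const is a hypothetical model, not any assembler's actual algorithm.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if v is an 8-bit value rotated right by an even amount (0..30). */
static bool arm_imm_encodable(uint32_t v)
{
    for (unsigned rot = 0; rot < 32; rot += 2) {
        /* Rotating left by rot undoes a rotate right by rot. */
        uint32_t imm8 = rot ? (v << rot) | (v >> (32 - rot)) : v;
        if (imm8 <= 0xFFu)
            return true;
    }
    return false;
}

typedef enum { USE_MOV, USE_MVN, USE_MOVW, USE_LITERAL_POOL } ldr_const_strategy_t;

/* Hypothetical model of how LDR Rd, =v might be expanded. */
static ldr_const_strategy_t choose_ldr_const(uint32_t v, bool has_movw)
{
    if (arm_imm_encodable(v))      return USE_MOV;
    if (arm_imm_encodable(~v))     return USE_MVN;  /* MVN writes ~imm */
    if (has_movw && v <= 0xFFFFu)  return USE_MOVW;
    return USE_LITERAL_POOL;                         /* LDR from a DCD */
}
```

For the three slide constants this gives MOVW (or a literal pool without MOVW) for 0x2543, MVN for 0xFFFF43FF (the inverse 0x0000BC00 is 0xBC rotated), and a literal pool for 0xFFFFF5.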
BRANCH INSTRUCTIONS
• Branch instructions have the following format
B{<cond>} label
– Might not cause a pipeline flush (branch prediction)
– Branch range depends on instruction set and width
• A BL instruction additionally generates a return address in r14 (lr)
– Returning is performed by restoring the program counter (pc) from lr
[Figure: func1 calls func2 with BL, and func2 returns with BX lr]
func1
:
BL func2
:
func2
:
:
BX lr
C source of func1:
void func1 (void)
{
:
func2();
:
}
BRANCH RANGES
• The range of a branch instruction depends on which instruction set is being used
• It also varies between different types of branch
Instruction ARM Thumb
B ±32MB ±16MB
CBZ/CBNZ - 126 bytes
BL/BLX (imm) ±32MB ±16MB
BLX (reg) Any Any
BX Any Any
TBB - 510 bytes
TBH - 131070 bytes
“Any” indicates an instruction which can branch to any address in the 4GB address space
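The ranges in the table follow from the encodings: ARM B holds a signed 24-bit word offset, the 32-bit Thumb B.W a signed 24-bit halfword offset, and CBZ/CBNZ an unsigned 6-bit halfword offset, so it can only branch forwards. The C sketch below checks a byte offset measured from the PC value (branch address + 8 in ARM state, + 4 in Thumb state). The names are hypothetical, and conditional Thumb B.W, which has a shorter range, is not modeled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef enum { ARM_B, THUMB2_B_W, THUMB_CBZ } branch_kind_t;

/* offset = target address - PC value seen by the branch. */
static bool branch_in_range(branch_kind_t kind, int64_t offset)
{
    switch (kind) {
    case ARM_B:       /* signed 24-bit word offset: about +/-32MB */
        return offset % 4 == 0 &&
               offset >= -(INT64_C(1) << 25) && offset <= (INT64_C(1) << 25) - 4;
    case THUMB2_B_W:  /* signed 24-bit halfword offset: about +/-16MB */
        return offset % 2 == 0 &&
               offset >= -(INT64_C(1) << 24) && offset <= (INT64_C(1) << 24) - 2;
    case THUMB_CBZ:   /* unsigned halfword offset: forward only, 0..126 */
        return offset % 2 == 0 && offset >= 0 && offset <= 126;
    }
    return false;
}
```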
READING AND WRITING PC
• In general, writing PC causes a branch to the value written
– Bit zero controls the execution state (ARM or Thumb) at the destination
– The bottom bit of the destination address is always forced to zero
– Writing a value with ‘10’ in the bottom two bits results in unpredictable behavior
– Note that architectures prior to ARMv7 do not change state when the PC is written
directly
• Loading PC from memory behaves similarly
– Architectures prior to ARMv5T do not change state when the PC is loaded from memory
• The PC reads as the address of the current instruction plus an offset
– In ARM state, the offset is 8
– In Thumb state, the offset is 4
– This reflects the 3-stage structure of the ARM7TDMI pipeline
– In Thumb state, the bottom bit always reads as zero
– In ARM state, the bottom two bits will always read as zero
CHANGING STATE
• Changing between ARM and Thumb states (or “interworking”) can be carried out
using the Branch Exchange instruction
BX Rn
BLX Rn
– Bit 0 of Rn determines the exchange behavior
• Unset (0) - change to (or remain in) ARM state
• Set (1) - change to (or remain in) Thumb state
• Branch and Link with Exchange
– Used to branch to a subroutine which is known to be in the opposite instruction set
– When branching to imported labels, use BL; the linker will substitute BLX if necessary
BLX offset ; ARM/Thumb instruction which always
; changes state (and sets LR)
• All instructions which modify the PC can cause a state change
– Depending on bit 0 of the result
– For data processing instructions, state changes only if S variant not used
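A C sketch of the two rules on these slides: the PC reads as the instruction address plus 8 (ARM) or 4 (Thumb), and BX takes the destination state from bit 0 of the register while clearing that bit in the branch target. The names are hypothetical; the assert encodes the unpredictable '10' case for an ARM destination. Thumb PC-relative loads additionally word-align the PC value, which this sketch does not model.

```c
#include <assert.h>
#include <stdint.h>

typedef enum { STATE_ARM, STATE_THUMB } isa_state_t;

/* Value read from the PC by the instruction at insn_addr. */
static uint32_t pc_read_value(uint32_t insn_addr, isa_state_t state)
{
    return insn_addr + (state == STATE_ARM ? 8u : 4u);
}

/* BX Rn: bit 0 selects the destination state; bit 0 is cleared in the target. */
static uint32_t bx_target(uint32_t rn, isa_state_t *state)
{
    *state = (rn & 1u) ? STATE_THUMB : STATE_ARM;
    /* '10' in the bottom two bits (ARM destination, not word aligned) is unpredictable. */
    assert(!(*state == STATE_ARM && (rn & 2u)));
    return rn & ~1u;
}
```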
IF-THEN
• Thumb only, makes the next 1-4 instructions
conditional
• Syntax
IT{T|E}{T|E}{T|E} <cond>
– Any condition code may be used
– Doesn’t affect condition flags
– 16-bit instructions in the IT block do not affect condition flags (except CMP, CMN & TST)
– 32-bit instructions do affect condition flags (normal rules apply)
– No need to write this instruction: the assembler will insert it for you where necessary
• Current “if-then status” stored in CPSR
– Conditional block may be safely interrupted and returned to
– Not recommended to branch into or out of ‘if-then’ block
; if (r0 == 0)
; r0 = *r1 + 2;
; else
; r0 = *r2 + 4;
; if
CMP r0, #0
ITTEE EQ
; then
LDREQ r0, [r1]
ADDEQ r0, #2
; else
LDRNE r0, [r2]
ADDNE r0, #4
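The IT mnemonic itself encodes which slots are "then" and which are "else": the first instruction always uses <cond>, and each following T or E selects <cond> or its inverse. The C sketch below (it_block_slots is a hypothetical helper) expands a mnemonic such as ITTEE into those per-slot choices, which for ITTEE EQ gives EQ, EQ, NE, NE as in the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Expand an IT mnemonic into per-slot flags:
   true  = slot executes when <cond> holds (first slot, and each 'T'),
   false = slot executes when <cond> fails ('E').
   Returns the block length (1 to 4). */
static int it_block_slots(const char *mnemonic, bool then_slot[4])
{
    assert(strncmp(mnemonic, "IT", 2) == 0);
    int n = 0;
    then_slot[n++] = true;  /* first instruction always uses <cond> */
    for (const char *p = mnemonic + 2; *p != '\0'; ++p) {
        assert(n < 4 && (*p == 'T' || *p == 'E'));
        then_slot[n++] = (*p == 'T');
    }
    return n;
}
```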
STATUS REGISTER ACCESS
• MRS and MSR allow contents of CPSR/SPSR to be transferred
to/from a general purpose register or be set to an immediate value
– MSR allows the whole status register, or just parts of it, to be updated
MRS r0,CPSR ; read CPSR into r0
BIC r0,r0,#0x80 ; clear bit 7 to enable IRQ
MSR CPSR_c,r0 ; write modified value to ‘c’ byte only
• User mode programs may read all bits of the CPSR but may only change the flag bits
• CPS can be used to directly modify some bits in the CPSR
– These are related to interrupt enable/disable and operating mode
• SETEND instruction selects the endianness of data accesses
– For use in systems with mixed endian data (e.g. peripherals)
SETEND BE
LDR r0, [r7], #4 ; big-endian
SETEND LE
LDR r1, [r7], #4 ; little-endian
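The MRS/BIC/MSR example can be replayed in C to show what the _c field suffix does: only the selected byte of the CPSR is written, so the flags are untouched when interrupts are re-enabled. msr_fields and the macro names are hypothetical, and the User mode write restriction is not modeled.

```c
#include <assert.h>
#include <stdint.h>

#define CPSR_I_BIT   (1u << 7)     /* 1 = IRQs masked                     */
#define CPSR_C_FIELD 0x000000FFu   /* _c: control byte (I, F, T, mode)    */
#define CPSR_F_FIELD 0xFF000000u   /* _f: flags byte (N, Z, C, V, Q, ...) */

/* MSR CPSR_<fields>, Rm: write only the bytes selected by field_mask. */
static uint32_t msr_fields(uint32_t cpsr, uint32_t value, uint32_t field_mask)
{
    return (cpsr & ~field_mask) | (value & field_mask);
}
```

Starting from a CPSR of 0x600001D3 (Z and C set, A, I and F masked, SVC mode), clearing bit 7 in r0 and writing it back with the _c mask gives 0x60000153: IRQs are enabled and everything else is unchanged.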
SYSTEM CONTROL INSTRUCTIONS
• ARM uses coprocessors for “internal functions” so as not to enforce
a particular memory map
– System Control Coprocessor: cp15
• Used for processor configuration: System ID, caches, MMU, TCMs, etc.
– Debug Coprocessor: cp14
• Can be used to access debug control registers
– VFP and NEON: cp10 and cp11
• In earlier versions of the architecture, designers were permitted to
add external coprocessors
– This is not permitted in ARMv7 architecture profiles
AGENDA
Instruction Sets
• VFP and NEON
Pipelines
Cycle Counting
VFP ARCHITECTURE
• VFP (Vector Floating Point) is ARM’s floating point architecture
– There have been 4 versions of the architecture to date (VFPv1 is no longer supported)
– VFPv2 is supported by ARM9 and ARM11 processor families
– VFPv3 and VFPv4 are optional extensions to the ARMv7-AR architecture profiles
• VFPv3 (Cortex-A8, Cortex-A9, Cortex-R4, Cortex-R5)
– Can be implemented with either 16 (VFPv3-D16) or 32 (VFPv3-D32) registers
– Can be extended with half-precision conversion functions
• VFPv4 (Cortex-A5, Cortex-A7 and Cortex-A15)
– Includes half-precision conversion functions
– Supports fused multiply-add operations
THE NEON ARCHITECTURE EXTENSION
• NEON refers to the Advanced SIMD instruction set extension
– Optional extension to ARMv7-AR architecture profiles
– The NEON register set is separate from the core register bank
– NEON instructions support parallel operations on vectors of elements held in registers
– Advanced SIMDv1 is the base NEON architecture
• Can be extended with half-precision conversion functions
– Advanced SIMDv2 adds fused multiply-add operations
AGENDA
Instruction Sets
VFP and NEON
• Pipelines
Cycle Counting
HISTORIC PIPELINES
• ARM7: Fetch, Decode, Execute
• ARM9: Fetch, Decode, Execute, Memory, Writeback
• ARM1136: Fetch 1, Fetch 2, Decode, Issue, then three parallel paths
– Shift, ALU, Saturate, Writeback
– MAC 1, MAC 2, MAC 3, Writeback
– Address, Data 1, Data 2, Writeback
• Cortex-A9: Fetch 1, Fetch 2, Fetch 3, Queue, Decode, Rename, Issue, then parallel paths
– Execute 1, Execute 2, Writeback (two such paths)
– MAC 1, MAC 2, Writeback
– Address, Load/Store, Writeback
– Data Engine, Writeback
ARM7TDMI PIPELINE (DATA PROC)
[Figure: ARM7TDMI three-stage pipeline (Fetch, Decode, Execute) over cycles 1 to 6 for the sequence ADD, SUB, MOV, AND, ORR, EOR, CMP, RSB; each instruction enters the pipeline one cycle after the previous one]
• In this example it takes 6 clock cycles to execute 6 instructions
• All operations here are on registers (single cycle execution)
• Clock cycles per Instruction (CPI) = 1
ARM7TDMI PIPELINE (LDR)
[Figure: ARM7TDMI pipeline over cycles 1 to 6 for the sequence ADD, SUB, LDR, MOV, AND, ORR; the LDR occupies the Execute, Data and Writeback stages, holding the following instructions back]
• In this example it takes 6 clock cycles to execute 4 instructions
• Clock cycles per Instruction (CPI) = 1.5
ARM7TDMI PIPELINE (BRANCH)
[Figure: ARM7TDMI pipeline over cycles 1 to 5. The BL at 0x8000 passes through Execute, Linkret and Adjust; the instructions already fetched from 0x8004 and 0x8008 are discarded, and the pipeline refills from the branch target: 0x8FEC ADD, 0x8FF0 SUB, 0x8FF4 MOV]
• Refilling the pipeline
• Note that the core is executing in ARM state
ARM7TDMI PIPELINE (INTERRUPT)
[Figure: ARM7TDMI pipeline over cycles 1 to 8 for the sequence 0x8000 ADD, 0x8004 SUB, 0x8008 MOV, 0x800C X when an IRQ arrives. IRQ entry passes through Execute, Linkret and Adjust; the core then fetches the IRQ vector at 0x0018 (B to 0xAF00), discards 0x001C and 0x0020, and refills from the handler: 0xAF00 STMFD, 0xAF04 MOV, 0xAF08 LDR]
IRQ interrupt minimum latency (service routine entry) = 7 cycles
ARM9TDMI PIPELINE (LDR INTERLOCK)
[Figure: ARM9TDMI five-stage pipeline over cycles 1 to 9 for ADD R1, R1, R2; SUB R3, R4, R1; LDR R4, [R7]; ORR R8, R3, R4; AND R6, R3, R1; EOR R3, R1, R2. The ORR needs R4 from the LDR and waits one interlock cycle (I) before Execute]
• In this example it takes 7 clock cycles to execute 6 instructions, a CPI of about 1.2 (7/6 ≈ 1.17)
• The LDR instruction immediately followed by a data operation using the same register causes an interlock
F - Fetch, D - Decode, E - Execute, I - Interlock, M - Memory, W - Writeback
ARM9TDMI PIPELINE (LDR)
[Figure: ARM9TDMI pipeline over cycles 1 to 9 for ADD, SUB, ORR, AND, LDR and EOR, scheduled so that the ORR reading R4 does not immediately follow the LDR; no interlock occurs]
• In this example it takes 6 cycles to execute 6 instructions, CPI of 1
• Cycle 4 has simultaneous I & D memory accesses
• In cycle 5, the R4 data is available to the ORR before it is written to the register
– Internal forwarding paths are used
F - Fetch, D - Decode, E - Execute, I - Interlock, M - Memory, W - Writeback
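The two ARM9TDMI slides differ only in instruction order. The C sketch below is a deliberately simplified model, not a cycle-accurate ARM9TDMI model: it assumes one instruction per cycle in steady state, full forwarding, and a single one-cycle interlock whenever a load's destination is read by the very next instruction; pipeline fill is ignored. The insn_t record and the orders used in the usage note (ADD, SUB, LDR, ORR, AND, EOR for the interlock case) are an assumed reading of the diagrams.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical instruction record: register numbers, -1 when unused. */
typedef struct {
    int dest, src1, src2;
    bool is_load;
} insn_t;

/* Steady-state cycles with a 1-cycle load-use interlock as the only hazard. */
static unsigned steady_state_cycles(const insn_t *prog, size_t n)
{
    unsigned cycles = 0;
    for (size_t i = 0; i < n; ++i) {
        cycles += 1;  /* one instruction completes per cycle */
        if (i > 0 && prog[i - 1].is_load && prog[i - 1].dest >= 0 &&
            (prog[i].src1 == prog[i - 1].dest || prog[i].src2 == prog[i - 1].dest))
            cycles += 1;  /* interlock bubble */
    }
    return cycles;
}
```

With the LDR R4 immediately followed by ORR R8, R3, R4 the model gives 7 cycles for 6 instructions; moving AND R6, R3, R1 between them gives 6, which is the scheduling benefit the second slide illustrates.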
CORTEX-R4 PIPELINE
[Figure: Cortex-R4 pipeline. The prefetch unit (Fetch1, Fetch2, Pre-Decode, instruction queue) feeds a common decode pipeline (Decode, Issue), which feeds 4 parallel back-end pipelines, shown as Shift, ALU, Sat; MAC 1, MAC 2, MAC 3; AGU, Data Cache 1, Data Cache 2, Format; and Branch1, Branch2, Branch3, with write-back (Wr) stages. An optional FPU pipeline (FPU0, FPU1, FPU2) sits alongside]
• Dual issue can occur for certain instruction sequences
– Enabled at reset, can be disabled in CP15
• AGU = Address Generation Unit
• Separate divide pipeline for hardware DIV instruction
CORTEX-A9 PIPELINE
[Figure: Cortex-A9 pipeline. The prefetch unit (instruction fetching, with the branch monitor BM) fills the instruction queue (IQ); instructions then pass through decode (De), register renaming (Re) and issue (ISS) to the main pipeline P0 (Ex1, Ex2, WB), the dual pipeline P1 (Ex1, Ex2, WB), the MAC pipeline M (M1, M2), the load/store pipeline LS (AGU, LSU, WB) and the data engine DE]
• IQ: Instruction Queue
• Re: Register renaming
• BM: Branch Monitor
• P0: Main execution pipeline
• M: MAC pipeline
• P1: Secondary (“dual”) execution pipeline
• AGU: Address Generation Unit
• LSU: Load/Store Unit
• DE: Data Engine (NEON and/or FPU) pipeline
CORTEX-A15 AND CORTEX-A7
[Figure: Cortex-A15 pipeline: Fetch (with loop cache); Decode, Rename & Dispatch; Queue; Issue; then parallel Integer, Integer, Multiply, Floating-Point / NEON, Branch, Load and Store pipelines; Writeback. Cortex-A7 pipeline: Fetch; Decode; Queue; Issue (dual issue); then Integer, Multiply, Floating-Point / NEON and Load/Store pipelines; Writeback]
• Cortex-A15 and Cortex-A7 form an architecturally-identical pair
– Cortex-A15 is optimized for performance
– Cortex-A7 is optimized for power consumption
– Together they can be built into a big.LITTLE configuration
AGENDA
Instruction Sets
VFP and NEON
Pipelines
• Cycle Counting
CYCLE COUNTING
• Early pipelines (e.g. ARM7TDMI) were entirely deterministic and
predictable
• Later pipelines introduce interlocks and inter-instruction
dependencies
– Address, resource and data dependencies are all possible
– Interactions between instructions become very complicated
• On ARMv7 cores, manual cycle counting is not really possible, so you need to use:
– Cycle-accurate trace
– Simulation models
– Performance Monitoring Unit (see later)
PERFORMANCE MONITORING HARDWARE
• ARMv7-A cores include a performance monitoring unit (PMU)
• A PMU provides a non-intrusive method of collecting execution information
from the core
– Enabling the PMU does not change the timing of the core
• The PMU provides:
– Cycle counter – counts execution cycles (optional 1/64 divider)
– Programmable event counters
• The number of counters and available events vary between cores
– The PMU can be configured to generate interrupts if a counter overflows
• Some examples common to most cores:
– Cache hits or misses, TLB misses (on MMU cores), branch predictions (correct/incorrect), number of instructions executed, etc.
• Some events are architecturally defined while others are core-dependent
– Check the ARM ARM and your core’s TRM for a full list
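When timing code with the cycle counter, software usually reads it before and after the region of interest and subtracts. On ARMv7-A the counter is PMCCNTR (for example MRC p15, 0, <Rt>, c9, c13, 0) and the 1/64 divider is selected in the PMU control register (PMCR). The portable part of that calculation is sketched below with a hypothetical helper: unsigned subtraction absorbs one wrap of the 32-bit counter, and each tick counts as 64 cycles when the divider is enabled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Core cycles between two reads of a free-running 32-bit cycle counter.
   Subtraction is modulo 2^32, so one wrap between the reads is handled;
   more than one wrap needs the overflow flag or interrupt to detect. */
static uint64_t cycles_between(uint32_t start, uint32_t end, bool divide_by_64)
{
    uint32_t ticks = end - start;
    return divide_by_64 ? (uint64_t)ticks * 64u : (uint64_t)ticks;
}
```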
Micro-Scholarship, What it is, How can it help me.pdfPoh-Sun Goh
 
Wellbeing inclusion and digital dystopias.pptx
Wellbeing inclusion and digital dystopias.pptxWellbeing inclusion and digital dystopias.pptx
Wellbeing inclusion and digital dystopias.pptxJisc
 
Basic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptxBasic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptxDenish Jangid
 
This PowerPoint helps students to consider the concept of infinity.
This PowerPoint helps students to consider the concept of infinity.This PowerPoint helps students to consider the concept of infinity.
This PowerPoint helps students to consider the concept of infinity.christianmathematics
 
Food safety_Challenges food safety laboratories_.pdf
Food safety_Challenges food safety laboratories_.pdfFood safety_Challenges food safety laboratories_.pdf
Food safety_Challenges food safety laboratories_.pdfSherif Taha
 
Kodo Millet PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...
Kodo Millet  PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...Kodo Millet  PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...
Kodo Millet PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...pradhanghanshyam7136
 
80 ĐỀ THI THỬ TUYỂN SINH TIẾNG ANH VÀO 10 SỞ GD – ĐT THÀNH PHỐ HỒ CHÍ MINH NĂ...
80 ĐỀ THI THỬ TUYỂN SINH TIẾNG ANH VÀO 10 SỞ GD – ĐT THÀNH PHỐ HỒ CHÍ MINH NĂ...80 ĐỀ THI THỬ TUYỂN SINH TIẾNG ANH VÀO 10 SỞ GD – ĐT THÀNH PHỐ HỒ CHÍ MINH NĂ...
80 ĐỀ THI THỬ TUYỂN SINH TIẾNG ANH VÀO 10 SỞ GD – ĐT THÀNH PHỐ HỒ CHÍ MINH NĂ...Nguyen Thanh Tu Collection
 
Fostering Friendships - Enhancing Social Bonds in the Classroom
Fostering Friendships - Enhancing Social Bonds  in the ClassroomFostering Friendships - Enhancing Social Bonds  in the Classroom
Fostering Friendships - Enhancing Social Bonds in the ClassroomPooky Knightsmith
 

Dernier (20)

How to setup Pycharm environment for Odoo 17.pptx
How to setup Pycharm environment for Odoo 17.pptxHow to setup Pycharm environment for Odoo 17.pptx
How to setup Pycharm environment for Odoo 17.pptx
 
Beyond_Borders_Understanding_Anime_and_Manga_Fandom_A_Comprehensive_Audience_...
Beyond_Borders_Understanding_Anime_and_Manga_Fandom_A_Comprehensive_Audience_...Beyond_Borders_Understanding_Anime_and_Manga_Fandom_A_Comprehensive_Audience_...
Beyond_Borders_Understanding_Anime_and_Manga_Fandom_A_Comprehensive_Audience_...
 
How to Create and Manage Wizard in Odoo 17
How to Create and Manage Wizard in Odoo 17How to Create and Manage Wizard in Odoo 17
How to Create and Manage Wizard in Odoo 17
 
HMCS Vancouver Pre-Deployment Brief - May 2024 (Web Version).pptx
HMCS Vancouver Pre-Deployment Brief - May 2024 (Web Version).pptxHMCS Vancouver Pre-Deployment Brief - May 2024 (Web Version).pptx
HMCS Vancouver Pre-Deployment Brief - May 2024 (Web Version).pptx
 
Holdier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdfHoldier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdf
 
Application orientated numerical on hev.ppt
Application orientated numerical on hev.pptApplication orientated numerical on hev.ppt
Application orientated numerical on hev.ppt
 
Python Notes for mca i year students osmania university.docx
Python Notes for mca i year students osmania university.docxPython Notes for mca i year students osmania university.docx
Python Notes for mca i year students osmania university.docx
 
Single or Multiple melodic lines structure
Single or Multiple melodic lines structureSingle or Multiple melodic lines structure
Single or Multiple melodic lines structure
 
Jamworks pilot and AI at Jisc (20/03/2024)
Jamworks pilot and AI at Jisc (20/03/2024)Jamworks pilot and AI at Jisc (20/03/2024)
Jamworks pilot and AI at Jisc (20/03/2024)
 
Exploring_the_Narrative_Style_of_Amitav_Ghoshs_Gun_Island.pptx
Exploring_the_Narrative_Style_of_Amitav_Ghoshs_Gun_Island.pptxExploring_the_Narrative_Style_of_Amitav_Ghoshs_Gun_Island.pptx
Exploring_the_Narrative_Style_of_Amitav_Ghoshs_Gun_Island.pptx
 
FSB Advising Checklist - Orientation 2024
FSB Advising Checklist - Orientation 2024FSB Advising Checklist - Orientation 2024
FSB Advising Checklist - Orientation 2024
 
NO1 Top Black Magic Specialist In Lahore Black magic In Pakistan Kala Ilam Ex...
NO1 Top Black Magic Specialist In Lahore Black magic In Pakistan Kala Ilam Ex...NO1 Top Black Magic Specialist In Lahore Black magic In Pakistan Kala Ilam Ex...
NO1 Top Black Magic Specialist In Lahore Black magic In Pakistan Kala Ilam Ex...
 
Micro-Scholarship, What it is, How can it help me.pdf
Micro-Scholarship, What it is, How can it help me.pdfMicro-Scholarship, What it is, How can it help me.pdf
Micro-Scholarship, What it is, How can it help me.pdf
 
Wellbeing inclusion and digital dystopias.pptx
Wellbeing inclusion and digital dystopias.pptxWellbeing inclusion and digital dystopias.pptx
Wellbeing inclusion and digital dystopias.pptx
 
Basic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptxBasic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptx
 
This PowerPoint helps students to consider the concept of infinity.
This PowerPoint helps students to consider the concept of infinity.This PowerPoint helps students to consider the concept of infinity.
This PowerPoint helps students to consider the concept of infinity.
 
Food safety_Challenges food safety laboratories_.pdf
Food safety_Challenges food safety laboratories_.pdfFood safety_Challenges food safety laboratories_.pdf
Food safety_Challenges food safety laboratories_.pdf
 
Kodo Millet PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...
Kodo Millet  PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...Kodo Millet  PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...
Kodo Millet PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...
 
80 ĐỀ THI THỬ TUYỂN SINH TIẾNG ANH VÀO 10 SỞ GD – ĐT THÀNH PHỐ HỒ CHÍ MINH NĂ...
80 ĐỀ THI THỬ TUYỂN SINH TIẾNG ANH VÀO 10 SỞ GD – ĐT THÀNH PHỐ HỒ CHÍ MINH NĂ...80 ĐỀ THI THỬ TUYỂN SINH TIẾNG ANH VÀO 10 SỞ GD – ĐT THÀNH PHỐ HỒ CHÍ MINH NĂ...
80 ĐỀ THI THỬ TUYỂN SINH TIẾNG ANH VÀO 10 SỞ GD – ĐT THÀNH PHỐ HỒ CHÍ MINH NĂ...
 
Fostering Friendships - Enhancing Social Bonds in the Classroom
Fostering Friendships - Enhancing Social Bonds  in the ClassroomFostering Friendships - Enhancing Social Bonds  in the Classroom
Fostering Friendships - Enhancing Social Bonds in the Classroom
 

ARM AAE - Instruction Sets

  • 1. SOFTWARE & SYSTEMS DESIGN 3 – Instruction Sets
  • 2. AGENDA • Instruction Sets – VFP and NEON – Pipelines – Cycle Counting (AAETC3v00 Instruction Sets 2)
  • 3. INSTRUCTION SET • ARM instruction set – All instructions are 32-bit – Most instructions can be executed conditionally • Thumb instruction set – 16-bit instruction set – No conditional execution (except for branches) – Optimized for code density from C code (~65% of ARM code size) • Thumb-2 technology – Extension to the Thumb instruction set – Mix of 16-bit and 32-bit instructions – Conditional execution via the IT instruction – Higher performance than Thumb, and smaller code than ARM
  • 4. ASSEMBLER SYNTAX • Data processing instructions: <operation>{<condition>} Rd, Rn, <op2>
      ADDEQ r4, r5, r6          // if (EQ) r4 = r5 + r6
      ORR   r2, r3, r6, LSL #4  // r2 = r3 | (r6 << 4)
      SUBS  r5, r7, #4          // r5 = r7 - 4; set flags
      MOV   r4, #7              // r4 = 7
    • Memory access instructions: <operation><size> Rd, [<address>]
      LDR   r0, [r6, #4]        // r0 = *(r6 + 4)
      STRB  r4, [r7], #8        // *(byte *)r7 = r4; r7 += 8
    <operation><addressing mode> Rn{!}, <register list>
      LDMIA r0, {r1, r2, r7}
      STMFD sp!, {r4-r11, lr}
    • Program flow instructions: <branch> <label>
      BL    foo
      B     bar
  • 5. DATA PROCESSING INSTRUCTIONS • These instructions operate on the contents of registers – They DO NOT affect memory • Data manipulation (has a destination register): arithmetic ADD, ADC, SUB, SBC, RSB, RSC; logical AND, ORR, EOR, ORN (Thumb-2 only), BIC; move MOV, MVN • Comparison (sets flags only): CMN (as ADDS), CMP (as SUBS), TST (as ANDS), TEQ (as EORS), with the result discarded • Syntax: <Operation>{S}{<cond>} {Rd,} Rn, Operand2 • Examples:
      ADD r0, r1, r2   ; r0 = r1 + r2
      TEQ r0, r1       ; if r0 = r1, Z flag will be set
      MOV r0, r1       ; copy r1 to r0
  • 6. MULTIPLY / DIVIDE • 32-bit multiplication: Rd = Rn × Rm, with optional accumulation (MLA adds Ra, MLS subtracts from Ra): MUL, MLA, MLS • 64-bit multiplication: RdHi:RdLo = Rn × Rm, with optional accumulation into RdHi:RdLo: UMULL, SMULL, UMLAL, SMLAL • Examples:
      MLA       r0, r1, r2, r3   ; r0 = r3 + (r1 * r2)
      [U|S]MULL r4, r5, r2, r3   ; r5:r4 = r2 * r3
    • Division (optional in ARMv7-A):
      SDIV r0, r1, r2   ; signed:   r0 = r1 / r2
      UDIV r0, r1, r2   ; unsigned: r0 = r1 / r2
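The multiply semantics above can be modelled in C to check what ends up in each register. This is a minimal sketch, not ARM's pseudocode: the helper names `mla`, `mls`, `umull` and `smlal` are hypothetical, and it assumes the usual two's-complement behavior of registers (32-bit results wrap modulo 2^32, long results modulo 2^64).

```c
#include <stdint.h>

/* MLA Rd, Rn, Rm, Ra : Rd = Ra + Rn * Rm (only the low 32 bits are kept) */
static uint32_t mla(uint32_t rn, uint32_t rm, uint32_t ra) {
    return ra + rn * rm;                 /* unsigned arithmetic wraps mod 2^32 */
}

/* MLS Rd, Rn, Rm, Ra : Rd = Ra - Rn * Rm (low 32 bits) */
static uint32_t mls(uint32_t rn, uint32_t rm, uint32_t ra) {
    return ra - rn * rm;
}

/* UMULL RdLo, RdHi, Rn, Rm : RdHi:RdLo = Rn * Rm (unsigned, full 64-bit product) */
static void umull(uint32_t *lo, uint32_t *hi, uint32_t rn, uint32_t rm) {
    uint64_t p = (uint64_t)rn * rm;
    *lo = (uint32_t)p;
    *hi = (uint32_t)(p >> 32);
}

/* SMLAL RdLo, RdHi, Rn, Rm : RdHi:RdLo += Rn * Rm (signed 64-bit accumulate) */
static void smlal(uint32_t *lo, uint32_t *hi, int32_t rn, int32_t rm) {
    uint64_t acc = ((uint64_t)*hi << 32) | *lo;
    acc += (uint64_t)((int64_t)rn * rm); /* two's-complement add mod 2^64 */
    *lo = (uint32_t)acc;
    *hi = (uint32_t)(acc >> 32);
}
```

For example, `umull(&lo, &hi, 0xFFFFFFFF, 2)` leaves `hi = 1` and `lo = 0xFFFFFFFE`, the carry that a plain `MUL` would discard.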
  • 7. BIT MANIPULATION INSTRUCTIONS
      BFI  r0, r0, #9, #6    ; Bit Field Insert: copy the low 6 bits of the source into bits [14:9] of r0
      UBFX r1, r0, #18, #7   ; Bit Field Extract: bits [24:18] of r0, zero-extended into r1
      BFC  r1, #3, #4        ; Bit Field Clear: clear bits [6:3] of r1
      RBIT r2, r1            ; Reverse Bit Order: bit 0 <-> bit 31, bit 1 <-> bit 30, ...
    [Figure: 32-bit register diagrams illustrating each operation]
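A bit-level C model of the four bit-field instructions, assuming the operand order `BFI Rd, Rn, #lsb, #width`, `UBFX Rd, Rn, #lsb, #width`, `BFC Rd, #lsb, #width` and `RBIT Rd, Rm`. The helper functions are hypothetical illustrations, and the test values are chosen for the example rather than read from the slide's register diagrams.

```c
#include <stdint.h>

/* Mask of `width` ones starting at bit `lsb` (width >= 1, lsb + width <= 32). */
static uint32_t field_mask(unsigned lsb, unsigned width) {
    uint32_t ones = (width >= 32) ? 0xFFFFFFFFu : ((1u << width) - 1u);
    return ones << lsb;
}

/* BFI: copy Rn[width-1:0] into Rd[lsb+width-1:lsb], other Rd bits unchanged */
static uint32_t bfi(uint32_t rd, uint32_t rn, unsigned lsb, unsigned width) {
    uint32_t m = field_mask(lsb, width);
    return (rd & ~m) | ((rn << lsb) & m);
}

/* UBFX: extract Rn[lsb+width-1:lsb] and zero-extend it */
static uint32_t ubfx(uint32_t rn, unsigned lsb, unsigned width) {
    return (rn & field_mask(lsb, width)) >> lsb;
}

/* BFC: clear Rd[lsb+width-1:lsb] */
static uint32_t bfc(uint32_t rd, unsigned lsb, unsigned width) {
    return rd & ~field_mask(lsb, width);
}

/* RBIT: reverse the order of all 32 bits */
static uint32_t rbit(uint32_t rm) {
    uint32_t r = 0;
    for (int i = 0; i < 32; i++) {
        r = (r << 1) | (rm & 1u);   /* shift out of rm's bottom, into r's bottom */
        rm >>= 1;
    }
    return r;
}
```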
  • 8. BYTE REVERSAL • Byte reversal instructions: REV{cond} Rd, Rm reverses the bytes in a word; REV16{cond} Rd, Rm reverses the bytes in each halfword; REVSH{cond} Rd, Rm reverses the bottom two bytes and sign-extends the result to 32 bits • ARMv6 and later:
      REV r0, r0
    • Pre-ARMv6 equivalent:
      EOR r1, r0, r0, ROR #16
      BIC r1, r1, #0xFF0000
      MOV r0, r0, ROR #8
      EOR r0, r0, r1, LSR #8
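To see why the pre-v6 sequence works, this C sketch mirrors it one instruction per line and compares it with a direct byte swap. The rotate helper and the function names are hypothetical; `revsh` sign-extends arithmetically to avoid an implementation-defined narrowing cast.

```c
#include <stdint.h>

/* Rotate right, as in the barrel shifter's ROR */
static uint32_t ror32(uint32_t x, unsigned n) {
    n &= 31u;
    return n ? (x >> n) | (x << (32u - n)) : x;
}

/* Reference REV: reverse the four bytes of a word */
static uint32_t rev(uint32_t x) {
    return (x >> 24) | ((x >> 8) & 0x0000FF00u) |
           ((x << 8) & 0x00FF0000u) | (x << 24);
}

/* The four-instruction pre-v6 sequence, one C statement per instruction */
static uint32_t rev_pre_v6(uint32_t r0) {
    uint32_t r1;
    r1 = r0 ^ ror32(r0, 16);   /* EOR r1, r0, r0, ROR #16 */
    r1 &= ~0x00FF0000u;        /* BIC r1, r1, #0xFF0000   */
    r0 = ror32(r0, 8);         /* MOV r0, r0, ROR #8      */
    r0 ^= r1 >> 8;             /* EOR r0, r0, r1, LSR #8  */
    return r0;
}

/* REV16: reverse the bytes within each halfword */
static uint32_t rev16(uint32_t x) {
    return ((x >> 8) & 0x00FF00FFu) | ((x << 8) & 0xFF00FF00u);
}

/* REVSH: swap the bottom two bytes, then sign-extend from bit 15 */
static int32_t revsh(uint32_t x) {
    int32_t h = (int32_t)(((x >> 8) & 0xFFu) | ((x & 0xFFu) << 8));
    return (h & 0x8000) ? h - 0x10000 : h;
}
```

Writing the word as bytes [A B C D] (most significant first), after the BIC r1 holds A^C in bytes 3 and 1. Shifted right by 8 and XORed into the rotated word [D A B C], it turns A into C and C into A, leaving [D C B A].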
  • 9. SIMD • ARMv6 added a number of instructions that perform SIMD (Single Instruction Multiple Data) operations using ARM registers – Includes instructions for addition, subtraction, multiplication and sum of absolute differences – Instructions can work on four 8-bit quantities or two 16-bit quantities – Signed/unsigned and saturating versions of many instructions are available – CPSR GE bits are used instead of the normal ALU flags • Example: UADD16 Rd, Rm, Rs adds each halfword of Rm to the corresponding halfword of Rs, writes the two sums to Rd, and sets GE[1:0] (low halfword) and GE[3:2] (high halfword) • There are instructions for packing (PKHBT/PKHTB) and unpacking (UXTH/UXTB) registers
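A C model of UADD16 showing how each halfword lane wraps independently and sets its own pair of GE bits when its unsigned sum carries out (GE[1:0] for the low halfword, GE[3:2] for the high one). The struct and function names are hypothetical; operands follow the slide's `UADD16 Rd, Rm, Rs` naming.

```c
#include <stdint.h>

/* Result of UADD16: the packed sums plus the four CPSR GE bits */
typedef struct {
    uint32_t rd;
    unsigned ge;   /* GE[3:0]: bits 1:0 for the low lane, bits 3:2 for the high lane */
} uadd16_result;

static uadd16_result uadd16(uint32_t rm, uint32_t rs) {
    uint32_t lo = (rm & 0xFFFFu) + (rs & 0xFFFFu);   /* up to 17 bits */
    uint32_t hi = (rm >> 16) + (rs >> 16);
    uadd16_result r;
    r.rd = ((hi & 0xFFFFu) << 16) | (lo & 0xFFFFu); /* each lane wraps on its own */
    r.ge = 0;
    if (lo >= 0x10000u) r.ge |= 0x3u;                /* carry out of the low lane  */
    if (hi >= 0x10000u) r.ge |= 0xCu;                /* carry out of the high lane */
    return r;
}
```

Note that the carry out of the low lane does not propagate into the high lane, which is what distinguishes this from an ordinary 32-bit ADD.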
  • 10. SATURATED MATH AND CLZ • Support for saturated arithmetic – Targeted at DSP and control applications – On overflow the result is set to the maximum positive (0x7FFFFFFF) or negative (0x80000000) value, and the sticky Q flag is set (not V)
      QADD{cond}  Rd, Rm, Rn   ; Rd = saturate(Rm + Rn)
      QSUB{cond}  Rd, Rm, Rn   ; Rd = saturate(Rm - Rn)
      QDADD{cond} Rd, Rm, Rn   ; Rd = saturate(Rm + saturate(Rn * 2))
      QDSUB{cond} Rd, Rm, Rn   ; Rd = saturate(Rm - saturate(Rn * 2))
    • Count Leading Zeros: CLZ{cond} Rd, Rm – Returns the number of unset bits above the most significant set bit – e.g. for Rm = 0000 0000 0010 1100 0100 1110 0111 0100 (0x002C4E74), CLZ returns 10
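The sketch below models QADD, QDADD and CLZ in C, with a global variable standing in for the sticky Q flag (on hardware the flag lives in the CPSR and is only cleared by an explicit write). All names are hypothetical; the CLZ check reuses the bit pattern from the slide's example.

```c
#include <stdint.h>

static int q_flag = 0;   /* stand-in for the sticky CPSR Q bit */

/* Clamp a 64-bit intermediate into int32 range, setting Q if clamping happened */
static int32_t sat32(int64_t v) {
    if (v > INT32_MAX) { q_flag = 1; return INT32_MAX; }
    if (v < INT32_MIN) { q_flag = 1; return INT32_MIN; }
    return (int32_t)v;
}

/* QADD Rd, Rm, Rn : Rd = saturate(Rm + Rn) */
static int32_t qadd(int32_t rm, int32_t rn) {
    return sat32((int64_t)rm + rn);
}

/* QDADD Rd, Rm, Rn : Rd = saturate(Rm + saturate(Rn * 2)) */
static int32_t qdadd(int32_t rm, int32_t rn) {
    return sat32((int64_t)rm + sat32((int64_t)rn * 2));
}

/* CLZ Rd, Rm : zero bits above the most significant set bit (32 if Rm == 0) */
static unsigned clz32(uint32_t x) {
    unsigned n = 0;
    for (uint32_t bit = 0x80000000u; bit != 0 && !(x & bit); bit >>= 1)
        n++;
    return n;
}
```

Because `q_flag` is never reset by the arithmetic helpers, a single check after a long run of saturating operations reveals whether any of them clipped.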
  • 11. SATURATION • Saturate a value to a specified bit position (effectively saturating to any power of 2) – USAT: unsigned saturate, 32-bit • Syntax: USAT Rd, #sat, Rm {, shift} • Operation: Rd = Saturate(Shift(Rm), #sat), i.e. clamped to the range 0 to 2^sat - 1 – Variants: SSAT, signed saturation (range -2^(sat-1) to 2^(sat-1) - 1); USAT16, saturates two 16-bit unsigned halfwords (no shift allowed); SSAT16, signed saturation of two 16-bit halfwords (no shift allowed) – For USAT, #sat is an immediate value in the range 0 to 31 (1 to 32 for SSAT) – {shift} is optional and is limited to LSL or ASR – The Q flag is set if saturation occurs
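A C sketch of the USAT/SSAT clamping rules, assuming any optional shift has already been applied to `rm` and passing the Q flag back through a pointer. The function names are hypothetical; the ranges used are the standard ones (0 to 2^sat - 1 unsigned, -2^(sat-1) to 2^(sat-1) - 1 signed).

```c
#include <stdint.h>

/* USAT Rd, #sat, Rm : clamp signed Rm to [0, 2^sat - 1], sat in 0..31 */
static uint32_t usat(int32_t rm, unsigned sat, int *q) {
    int64_t max = ((int64_t)1 << sat) - 1;
    if (rm < 0)   { *q = 1; return 0; }
    if (rm > max) { *q = 1; return (uint32_t)max; }
    return (uint32_t)rm;
}

/* SSAT Rd, #sat, Rm : clamp Rm to [-2^(sat-1), 2^(sat-1) - 1], sat in 1..32 */
static int32_t ssat(int32_t rm, unsigned sat, int *q) {
    int64_t max = ((int64_t)1 << (sat - 1)) - 1;
    int64_t min = -((int64_t)1 << (sat - 1));
    if (rm > max) { *q = 1; return (int32_t)max; }
    if (rm < min) { *q = 1; return (int32_t)min; }
    return rm;
}
```

A typical use is `usat(x, 8, &q)` to clamp a filter result into an 8-bit pixel value without a compare-and-branch sequence.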
  • 12. SINGLE / DOUBLE REGISTER DATA TRANSFER • Used to move data between one or two registers and memory: LDRD/STRD (doubleword), LDR/STR (word), LDRB/STRB (byte), LDRH/STRH (halfword), LDRSB (signed byte load), LDRSH (signed halfword load) – For narrow loads, any remaining space in Rd is zero-filled or sign-extended • Syntax: – LDR{<size>}{<cond>} Rd, <address> – STR{<size>}{<cond>} Rd, <address> • Example: – LDRB r0, [r1] ; load the bottom byte of r0 from the byte of memory at the address in r1 (bits [31:8] are zero-filled)
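The difference between zero-filling and sign-extending loads is easiest to see on a byte or halfword with its top bit set. This C sketch uses hypothetical helper names and performs the sign extension arithmetically rather than through a narrowing cast; the halfword helpers read in host byte order.

```c
#include <stdint.h>
#include <string.h>

/* LDRB: zero-fill bits [31:8] */
static uint32_t ldrb(const uint8_t *p) { return p[0]; }

/* LDRSB: sign-extend from bit 7 */
static int32_t ldrsb(const uint8_t *p) {
    int32_t v = p[0];
    return (v & 0x80) ? v - 0x100 : v;
}

/* LDRH: zero-fill bits [31:16] (memcpy avoids alignment/aliasing issues) */
static uint32_t ldrh(const void *p) {
    uint16_t h;
    memcpy(&h, p, sizeof h);
    return h;
}

/* LDRSH: sign-extend from bit 15 */
static int32_t ldrsh(const void *p) {
    uint16_t h;
    memcpy(&h, p, sizeof h);
    return (h & 0x8000u) ? (int32_t)h - 0x10000 : (int32_t)h;
}
```

For the byte 0x80, `LDRB` gives 0x00000080 while `LDRSB` gives 0xFFFFFF80 (-128).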
  • 13. ADDRESSING MEMORY • The address accessed by LDR/STR is specified by a base register with an optional offset – Base register only (no offset): LDR r0, [r1] – Base register plus constant: LDR r0, [r1, #8] – Base register plus register (optionally shifted by an immediate value): LDR r0, [r1, r2] and LDR r0, [r1, r2, LSL #2] – The offset can be either added to or subtracted from the base register: LDR r0, [r1, #-8], LDR r0, [r1, -r2] and LDR r0, [r1, -r2, LSL #2]
  • 14. PRE- AND POST-INDEXED ADDRESSING • Pre-indexed (offset added before the memory access): LDR r0, [r1, #12]{!} – Accesses the address r1 + 12 – If '!' is present, the base register (r1) is updated to r1 + 12 • Post-indexed (offset added after the memory access): LDR r0, [r1], #12 – Accesses the address in r1 – Always updates the base register: r1 = r1 + 12
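A toy C model of the offset, register-offset, pre-indexed and post-indexed forms, using a small word-addressed memory array. Everything here (the `cpu` struct, the helper names, the memory size) is a hypothetical illustration rather than a real simulator API, and it assumes word-aligned addresses within the array.

```c
#include <stdint.h>

/* Hypothetical register file and 256-byte memory, for illustration only */
typedef struct { uint32_t r[16]; uint32_t mem[64]; } cpu;

static uint32_t load(const cpu *c, uint32_t addr) { return c->mem[addr / 4]; }

/* LDR rt, [rn, #off] : offset addressing, base unchanged */
static void ldr_offset(cpu *c, int rt, int rn, int32_t off) {
    c->r[rt] = load(c, c->r[rn] + (uint32_t)off);   /* wraps like the hardware adder */
}

/* LDR rt, [rn, rm, LSL #sh] : register offset scaled by a shift */
static void ldr_reg_lsl(cpu *c, int rt, int rn, int rm, unsigned sh) {
    c->r[rt] = load(c, c->r[rn] + (c->r[rm] << sh));
}

/* LDR rt, [rn, #off]! : pre-indexed, base updated to the accessed address */
static void ldr_pre_wb(cpu *c, int rt, int rn, int32_t off) {
    uint32_t addr = c->r[rn] + (uint32_t)off;
    c->r[rt] = load(c, addr);
    c->r[rn] = addr;
}

/* LDR rt, [rn], #off : post-indexed, access at rn, then rn += off */
static void ldr_post(cpu *c, int rt, int rn, int32_t off) {
    c->r[rt] = load(c, c->r[rn]);
    c->r[rn] += (uint32_t)off;
}
```

`ldr_reg_lsl` with a shift of 2 is the form a compiler typically uses for `p[i]` on an array of 32-bit elements, since `i << 2` is the byte offset of element `i`.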
  • 15. MULTIPLE REGISTER DATA TRANSFER • These instructions move data between multiple registers and memory • Syntax: <LDM|STM>{<addressing_mode>}{<cond>} Rb{!}, <register list> • 4 addressing modes: Increment After (IA), Increment Before (IB), Decrement After (DA), Decrement Before (DB) – In every mode the lowest-numbered register is transferred to or from the lowest address • Also PUSH/POP, equivalent to STMDB/LDMIA with SP! as the base register • Examples:
      LDM  r10, {r0, r1, r4}   ; load registers, using r10 as the base
      PUSH {r4-r6, lr}         ; store registers, using SP as the base
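The four LDM/STM addressing modes differ only in the first address used and in the written-back base value; registers always go in ascending order from the lowest address. A hypothetical C helper that computes those addresses for an n-register list:

```c
#include <stdint.h>

typedef enum { IA, IB, DA, DB } ldm_mode;

/* Fill addr[k] with the address used for the k-th lowest-numbered register
   in an n-register list, and return the value written back to the base
   register when '!' is used. */
static uint32_t ldm_addresses(ldm_mode mode, uint32_t base, unsigned n,
                              uint32_t addr[]) {
    uint32_t start;
    switch (mode) {
    case IA: start = base;             break;   /* first access at base        */
    case IB: start = base + 4;         break;   /* increment before first      */
    case DA: start = base - 4 * n + 4; break;   /* last access at base         */
    default: start = base - 4 * n;     break;   /* DB: last access at base - 4 */
    }
    for (unsigned k = 0; k < n; k++)
        addr[k] = start + 4 * k;                /* lowest register, lowest address */
    return (mode == IA || mode == IB) ? base + 4 * n : base - 4 * n;
}
```

With SP as the base, `PUSH` corresponds to mode DB with writeback and `POP` to mode IA with writeback, so a matching pair restores SP to its original value.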
  • 16. INSTRUCTIONS FOR LOADING CONSTANTS • The assembler provides some instructions for loading values into registers – These are the recommended mechanisms for loading constants into registers • PC- or register-relative constants: ADR Rn, label – Adds or subtracts an immediate value to or from the PC to generate the address of the label in the specified register, using one instruction – The ADRL pseudo-instruction uses two instructions, giving a better range – Can be used to generate addresses for position-independent code (but only if the label is in the same code section) – Constant determined at run time • Absolute constants: LDR Rn, =<constant> or LDR Rn, =label – Pseudo-instruction – The assembler uses the optimal sequence to generate the constant in the specified register (one of MOV, MVN or an LDR from a literal pool) – Can load to the PC, causing a branch – Use for absolute addressing and references outside the current section (resulting in position-dependent code) – Constant determined at assembly or link time
  • 17. LDR= EXAMPLES • The following examples show how the LDR= pseudo-instruction makes code more readable, portable and flexible (code -> disassembly):
      LDR r0, =0x2543       ->  MOV r0, #0x2543
      LDR r0, =0xFFFF43FF   ->  MVN r0, #0xBC00
      LDR r0, =0xFFFFF5     ->  LDR r0, [pc, #xx] ... DCD 0xFFFFF5
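The choice the assembler makes for `LDR Rd, =constant` can be approximated in C. This is a simplified model, not any particular assembler's algorithm: it checks the ARM-state modified-immediate form (an 8-bit value rotated right by an even amount), then the inverted value for MVN, then a 16-bit MOVW (assumed available, as on ARMv6T2 and later, which is presumably how `0x2543` ends up as a single MOV in the disassembly above), and otherwise falls back to a literal pool. Thumb-2 immediate encodings differ and are not modelled; all names are hypothetical.

```c
#include <stdint.h>
#include <stdbool.h>

typedef enum { USE_MOV, USE_MVN, USE_MOVW, USE_LITERAL } const_strategy;

static uint32_t rotl32(uint32_t x, unsigned n) {
    n &= 31u;
    return n ? (x << n) | (x >> (32u - n)) : x;
}

/* ARM-state modified immediate: imm8 rotated right by an even amount.
   Rotating v left by the same amount must then give a value <= 0xFF. */
static bool arm_imm_ok(uint32_t v) {
    for (unsigned rot = 0; rot < 32; rot += 2)
        if (rotl32(v, rot) <= 0xFFu)
            return true;
    return false;
}

/* Simplified model of how LDR Rd, =v might be expanded */
static const_strategy pick_ldr_eq(uint32_t v, bool has_movw) {
    if (arm_imm_ok(v))             return USE_MOV;
    if (arm_imm_ok(~v))            return USE_MVN;   /* MVN writes ~imm */
    if (has_movw && v <= 0xFFFFu)  return USE_MOVW;
    return USE_LITERAL;                              /* LDR Rd, [pc, #off] + DCD */
}
```

Running it over the slide's three constants reproduces the three disassembled forms: 0x2543 needs MOVW, 0xFFFF43FF is `MVN #0xBC00`, and 0x00FFFFF5 can only come from a literal pool.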
  • 18. BRANCH INSTRUCTIONS • Branch instructions have the following format: B{<cond>} label – Might not cause a pipeline flush (branch prediction) – Branch range depends on instruction set and width • A BL instruction additionally generates a return address in r14 (lr) – Returning is performed by restoring the program counter (pc) from lr • Example: the C code void func1(void) { ...; func2(); ... } corresponds to
      func1:  ...
              BL  func2
              ...
      func2:  ...
              BX  lr
  • 19. BRANCH RANGES • The range of a branch instruction depends on which instruction set is being used • It also varies between different types of branch
      Instruction     ARM      Thumb
      B               ±32MB    ±16MB
      CBZ/CBNZ        -        126 bytes
      BL/BLX (imm)    ±32MB    ±16MB
      BLX (reg)       Any      Any
      BX              Any      Any
      TBB             -        510 bytes
      TBH             -        131070 bytes
    • "Any" indicates an instruction which can branch to any address in the 4GB address space
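The ranges in the table follow from the width of each instruction's offset field and the unit it is scaled by. A small C check of that arithmetic, with hypothetical helper names; the field widths assumed are those of the standard encodings (24-bit word offset for ARM B/BL, 24-bit halfword offset for unconditional 32-bit Thumb B and BL, 6-bit forward halfword offset for CBZ/CBNZ, 8- and 16-bit halfword table entries for TBB/TBH). The ± figures are rounded; the exact forward limit of a signed offset is one unit less.

```c
#include <stdint.h>

/* Reach of a signed offset field: +/- 2^(bits-1) units of `scale` bytes */
static int64_t signed_reach(unsigned bits, unsigned scale) {
    return ((int64_t)1 << (bits - 1)) * scale;
}

/* Largest forward offset of an unsigned field: (2^bits - 1) units of `scale` bytes */
static int64_t unsigned_reach(unsigned bits, unsigned scale) {
    return (((int64_t)1 << bits) - 1) * scale;
}
```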
  • 20. READING AND WRITING PC • In general, writing the PC causes a branch to the value written – Bit 0 controls the execution state (ARM or Thumb) at the destination – The bottom bit of the destination address is always forced to zero – Writing a value with '10' in the bottom two bits (an ARM-state destination that is not word-aligned) results in unpredictable behavior – Note that architectures prior to ARMv7 do not change state when the PC is written directly (e.g. by a data processing instruction) • Loading the PC from memory behaves similarly – Architectures prior to ARMv5T do not change state when the PC is loaded from memory • The PC reads as the address of the current instruction plus an offset – In ARM state, the offset is 8 – In Thumb state, the offset is 4 – This reflects the 3-stage structure of the ARM7TDMI pipeline – In Thumb state, the bottom bit always reads as zero – In ARM state, the bottom two bits always read as zero
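A simplified C model of the PC rules above, assuming ARMv7 behavior: the value an instruction reads from the PC, and how a value written by an interworking branch (BX/BLX Rm, or a load to the PC) is split into a target address and an execution state. The names are hypothetical.

```c
#include <stdint.h>
#include <stdbool.h>

/* Value seen by an instruction at address `addr` when it reads the PC */
static uint32_t pc_read_value(uint32_t addr, bool thumb) {
    return addr + (thumb ? 4u : 8u);
}

/* Where an interworking write to the PC sends execution */
typedef struct {
    uint32_t target;      /* address actually loaded into the PC     */
    bool thumb;           /* execution state after the branch        */
    bool unpredictable;   /* ARM-state target with bits [1:0] == 0b10 */
} branch_dest;

static branch_dest interwork(uint32_t value) {
    branch_dest d;
    d.thumb = (value & 1u) != 0;        /* bit 0 selects the state */
    d.target = value & ~1u;             /* bit 0 is forced to zero */
    d.unpredictable = !d.thumb && (value & 2u) != 0;
    return d;
}
```

For example, `interwork(0x8001)` resumes Thumb execution at 0x8000, while `interwork(0x8002)` is the unpredictable case.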
  • 21. CHANGING STATE • Changing between ARM and Thumb states (or "interworking") can be carried out using the Branch Exchange instructions BX Rn and BLX Rn – Bit 0 of Rn determines the exchange behavior: unset (0), change to (or remain in) ARM state; set (1), change to (or remain in) Thumb state • Branch and Link with Exchange – Used to branch to a subroutine that is known to be in the opposite instruction set – When branching to imported labels, use BL; the linker will substitute BLX if necessary
      BLX offset   ; ARM/Thumb instruction which always changes state (and sets LR)
    • All instructions that modify the PC can cause a state change – Depending on bit 0 of the result – For data processing instructions, state changes only if the S variant is not used
  • 22. IF-THEN • Thumb only; makes the next 1 to 4 instructions conditional • Syntax: IT{T|E}{T|E}{T|E} <cond> – Any condition code may be used – Does not affect the condition flags – 16-bit instructions in the IT block do not affect the condition flags (except CMP, CMN and TST) – 32-bit instructions do affect the condition flags (normal rules apply) – No need to write this instruction yourself: the assembler will insert it where necessary • The current "if-then status" is stored in the CPSR – A conditional block may be safely interrupted and returned to – Branching into or out of an IT block is not recommended • Example:
      ; if (r0 == 0)
      ;     r0 = *r1 + 2;
      ; else
      ;     r0 = *r2 + 4;
      CMP   r0, #0      ; if
      ITTEE EQ
      LDREQ r0, [r1]    ; then
      ADDEQ r0, #2
      LDRNE r0, [r2]    ; else
      ADDNE r0, #4
  • 23. STATUS REGISTER ACCESS • MRS and MSR allow the contents of the CPSR/SPSR to be transferred to/from a general-purpose register, or (MSR) set to an immediate value – MSR allows the whole status register, or just parts of it, to be updated
      MRS r0, CPSR        ; read CPSR into r0
      BIC r0, r0, #0x80   ; clear bit 7 to enable IRQ
      MSR CPSR_c, r0      ; write modified value to 'c' byte only
    • User mode programs may read all bits of the CPSR but may only change the flag bits • CPS can be used to directly modify some bits in the CPSR – These are related to interrupt enable/disable and operating mode • The SETEND instruction selects the endianness of data accesses – For use in systems with mixed-endian data (e.g. peripherals)
      SETEND BE
      LDR    r0, [r7], #4   ; big-endian
      SETEND LE
      LDR    r1, [r7], #4   ; little-endian
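MSR's field suffix (`_c`, `_x`, `_s`, `_f`) selects which byte of the PSR is written. A C sketch of that masking, applied to the slide's MRS/BIC/MSR sequence; the starting CPSR value (flags Z and C set, Supervisor mode, IRQ and FIQ masked) and the helper names are hypothetical.

```c
#include <stdint.h>

/* MSR field suffixes, as a bit set selecting PSR bytes */
enum { PSR_C = 1u << 0,   /* control,   bits [7:0]   */
       PSR_X = 1u << 1,   /* extension, bits [15:8]  */
       PSR_S = 1u << 2,   /* status,    bits [23:16] */
       PSR_F = 1u << 3 }; /* flags,     bits [31:24] */

#define CPSR_I_BIT 0x80u  /* bit 7: 1 = IRQs masked */

/* Write `value` into only the selected bytes of `psr` */
static uint32_t msr_write(uint32_t psr, uint32_t value, unsigned fields) {
    uint32_t mask = 0;
    if (fields & PSR_C) mask |= 0x000000FFu;
    if (fields & PSR_X) mask |= 0x0000FF00u;
    if (fields & PSR_S) mask |= 0x00FF0000u;
    if (fields & PSR_F) mask |= 0xFF000000u;
    return (psr & ~mask) | (value & mask);
}
```

Usage, mirroring the three instructions on the slide:

    uint32_t cpsr = 0x600000D3u;          /* hypothetical starting value */
    uint32_t r0 = cpsr;                   /* MRS r0, CPSR      */
    r0 &= ~CPSR_I_BIT;                    /* BIC r0, r0, #0x80 */
    cpsr = msr_write(cpsr, r0, PSR_C);    /* MSR CPSR_c, r0    */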
  • 24. SYSTEM CONTROL INSTRUCTIONS • ARM uses coprocessors for "internal functions" so as not to enforce a particular memory map – System Control Coprocessor: CP15 • Used for processor configuration: system ID, caches, MMU, TCMs, etc. – Debug Coprocessor: CP14 • Can be used to access debug control registers – VFP and NEON: CP10 and CP11 • In earlier versions of the architecture, designers were permitted to add external coprocessors – This is not permitted in the ARMv7 architecture profiles
  • 25. AGENDA – Instruction Sets • VFP and NEON – Pipelines – Cycle Counting
  • 26. VFP ARCHITECTURE • VFP (Vector Floating Point) is ARM's floating-point architecture – There have been 4 versions of the architecture to date (VFPv1 is no longer supported) – VFPv2 is supported by the ARM9 and ARM11 processor families – VFPv3 and VFPv4 are optional extensions to the ARMv7-AR architecture profiles • VFPv3 (Cortex-A8, Cortex-A9, Cortex-R4, Cortex-R5) – Can be implemented with either 16 (VFPv3-D16) or 32 (VFPv3-D32) double-precision registers – Can be extended with half-precision conversion functions • VFPv4 (Cortex-A5, Cortex-A7 and Cortex-A15) – Includes half-precision conversion functions – Supports fused multiply-add operations
  • 27. THE NEON ARCHITECTURE EXTENSION • NEON refers to the Advanced SIMD instruction set extension – Optional extension to the ARMv7-AR architecture profiles – The NEON register set is separate from the core register bank – NEON instructions support parallel operations on vectors of elements held in registers – Advanced SIMDv1 is the base NEON architecture • Can be extended with half-precision conversion functions – Advanced SIMDv2 adds fused multiply-add operations
  • 28. AGENDA – Instruction Sets – VFP and NEON • Pipelines – Cycle Counting
  • 29. HISTORIC PIPELINES [Figure: pipeline stage diagrams]
      ARM7: Fetch → Decode → Execute
      ARM9: Fetch → Decode → Execute → Memory → Writeback
      ARM1136: Fetch 1 → Fetch 2 → Decode → Issue, then three parallel back ends: MAC 1 → MAC 2 → MAC 3 → Writeback; Shift → ALU → Saturate → Writeback; Address → Data 1 → Data 2 → Writeback
      Cortex-A9: Fetch 1 → Fetch 2 → Fetch 3 → Queue → Decode → Rename → Issue, then parallel back ends: Execute 1 → Execute 2 → Writeback (two integer pipelines); MAC 1 → MAC 2 → Writeback; Address → Load/Store → Writeback; Data Engine → Writeback
  • 30. ARM7TDMI PIPELINE (DATA PROC) [Figure: Fetch/Decode/Execute timing over cycles 1 to 6 for ADD, SUB, MOV, AND, ORR, EOR, CMP, RSB] • In this example it takes 6 clock cycles to execute 6 instructions • All operations here are on registers (single-cycle execution) • Clock cycles per instruction (CPI) = 1
  • 31. ARM7TDMI PIPELINE (LDR) [Figure: timing over cycles 1 to 6 for ADD, SUB, LDR, MOV, AND, ORR; the LDR spends three cycles in its execute phase (Execute, Data, Writeback)] • In this example it takes 6 clock cycles to execute 4 instructions • Clock cycles per instruction (CPI) = 1.5
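The CPI figures on the two ARM7TDMI slides can be reproduced with a very simple model: in steady state one instruction completes per execute-stage cycle, a register-only data processing instruction spends 1 cycle in execute, and an LDR spends 3 (Execute, Data, Writeback). A hypothetical C sketch of that arithmetic, which ignores branches, multiplies and memory wait states:

```c
/* Execute-stage cycles per instruction kind on a simplified ARM7TDMI model */
typedef enum { DP, LDR } op_kind;

static unsigned exec_cycles(op_kind k) {
    return k == LDR ? 3u : 1u;   /* LDR: Execute + Data + Writeback */
}

/* Steady-state CPI: total execute cycles divided by instruction count */
static double cpi(const op_kind *prog, unsigned n) {
    unsigned cycles = 0;
    for (unsigned i = 0; i < n; i++)
        cycles += exec_cycles(prog[i]);
    return (double)cycles / n;
}
```

For ADD, SUB, LDR, MOV this gives (1 + 1 + 3 + 1) / 4 = 1.5, matching the LDR slide.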
  • 32. ARM7TDMI PIPELINE (BRANCH) [Figure: timing over cycles 1 to 5: BL at 0x8000 executes (Execute, Linkret, Adjust), the instructions fetched at 0x8004 and 0x8008 are discarded, and the pipeline refills from the target: 0x8FEC ADD, 0x8FF0 SUB, 0x8FF4 MOV] • Refilling the pipeline • Note that the core is executing in ARM state
  • 33. ARM7TDMI PIPELINE (INTERRUPT) [Figure: timing over cycles 1 to 8 for 0x8000 ADD, 0x8004 SUB, 0x8008 MOV and 0x800C X; when the IRQ is taken the core goes through IRQ, Linkret and Adjust cycles, fetches from the IRQ vector at 0x0018 (B to 0xAF00), discards 0x001C and 0x0020, and refills from the handler: 0xAF00 STMFD, 0xAF04 MOV, 0xAF08 LDR] • IRQ interrupt minimum latency (service routine entry) = 7 cycles
  • 34. ARM9TDMI PIPELINE (LDR INTERLOCK) [Figure: F/D/E/I/M/W timing over cycles 1 to 9 for ADD R1, R1, R2; SUB R3, R4, R1; ORR R8, R3, R4; AND R6, R3, R1; LDR R4, [R7]; EOR R3, R1, R2] • In this example it takes 7 clock cycles to execute 6 instructions, a CPI of about 1.2 • An LDR instruction immediately followed by a data operation using the same register causes an interlock • Key: F = Fetch, D = Decode, E = Execute, I = Interlock, M = Memory, W = Writeback
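The interlock rule can be expressed as a check on adjacent instructions. This hypothetical C sketch charges one cycle per instruction plus one stall whenever an instruction reads the register loaded by the LDR immediately before it; it ignores every other hazard. The instruction order in the test is an illustrative reading of the diagram, not necessarily the slide's exact sequence.

```c
#include <stdbool.h>

/* Minimal instruction description: destination and up to two sources (-1 = unused) */
typedef struct { bool is_load; int rd; int rs1; int rs2; } insn;

static bool reads(const insn *i, int reg) {
    return reg >= 0 && (i->rs1 == reg || i->rs2 == reg);
}

/* Cycles for straight-line code on a simplified ARM9TDMI model:
   one per instruction, plus one interlock cycle for each load-use pair */
static unsigned cycles_arm9(const insn *prog, unsigned n) {
    unsigned cycles = n;
    for (unsigned k = 1; k < n; k++)
        if (prog[k - 1].is_load && reads(&prog[k], prog[k - 1].rd))
            cycles++;
    return cycles;
}
```

Moving an independent instruction (such as the AND) between the LDR and its first consumer removes the stall, which is the effect the next slide shows.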
  • 35. ARM9TDMI PIPELINE (LDR) [Figure: F/D/E/M/W timing over cycles 1 to 9 for the same six instructions, reordered so that no interlock occurs] • In this example it takes 6 cycles to execute 6 instructions, a CPI of 1 • Cycle 4 has simultaneous instruction and data memory accesses • In cycle 5 the R4 data is available to ORR before it is written to the register – Internal forwarding paths are used • Key: F = Fetch, D = Decode, E = Execute, I = Interlock, M = Memory, W = Writeback
  • 36. CORTEX-R4 PIPELINE [Figure: Prefetch Unit (Fetch 1, Fetch 2, instruction queue) → common decode pipeline (Pre-Decode, Decode, Issue) → 4 parallel back-end pipelines: Shift → ALU → Sat → Wr; MAC 1 → MAC 2 → MAC 3 → Wr; AGU → Data Cache 1 → Data Cache 2 → Format → Wr; Branch 1 → Branch 2 → Branch 3; plus an optional FPU (FPU 0 → FPU 1 → FPU 2)] • Dual issue can occur for certain instruction sequences – Enabled at reset, can be disabled in CP15 • AGU = Address Generation Unit • Separate divide pipeline for the hardware DIV instruction
  • 37. CORTEX-A9 PIPELINE [Figure: Prefetch Unit (instruction fetching) → IQ → De → Re → ISS, with the Branch Monitor (BM), feeding parallel pipelines: Main (P0) Ex1 → Ex2 → WB; Dual (P1) Ex1 → Ex2 → WB; Mac (M) M1 → M2 → WB; Load/store (LS) AGU → LSU → WB; Data Engine (DE) → WB] • IQ: Instruction Queue • Re: Register renaming • BM: Branch Monitor • P0: Main execution pipeline • M: MAC pipeline • P1: Secondary ("dual") execution pipeline • AGU: Address Generation Unit • LSU: Load/Store Unit • DE: Data Engine (NEON and/or FPU) pipeline
  • 38. CORTEX-A15 AND CORTEX-A7 [Figure: Cortex-A15: Fetch (with Loop Cache) → Decode, Rename & Dispatch → Queue → Issue → Integer, Integer, Multiply, Floating-Point / NEON, Branch, Load and Store pipelines → Writeback. Cortex-A7: Fetch → Decode → Queue → Issue (dual issue) → Integer, Multiply, Floating-Point / NEON and Load/Store pipelines → Writeback] • Cortex-A15 and Cortex-A7 form an architecturally identical pair – Cortex-A15 is optimized for performance – Cortex-A7 is optimized for power consumption – Together they can be built into a big.LITTLE configuration
  • 39. AGENDA – Instruction Sets – VFP and NEON – Pipelines • Cycle Counting
  • 40. CYCLE COUNTING • Early pipelines (e.g. ARM7TDMI) were entirely deterministic and predictable • Later pipelines introduce interlocks and inter-instruction dependencies – Address, resource and data dependencies are all possible – Interactions between instructions become very complicated • On ARMv7 cores, manual cycle counting is not really possible, so you need to use: – Cycle-accurate trace – Simulation models – The Performance Monitoring Unit (see later)
  • 41. PERFORMANCE MONITORING HARDWARE • ARMv7-A cores include a Performance Monitoring Unit (PMU) • A PMU provides a non-intrusive method of collecting execution information from the core – Enabling the PMU does not change the timing of the core • The PMU provides: – A cycle counter, which counts execution cycles (optional 1/64 divider) – Programmable event counters • The number of counters and available events vary between cores – The PMU can be configured to generate interrupts if a counter overflows • Some examples common to most cores: – Cache hits or misses, TLB misses (on MMU cores), branch predictions (correct/incorrect), number of instructions executed, etc. • Some events are architecturally defined while others are core-dependent – Check the ARM ARM and your core's TRM for a full list
  • 42. SOFTWARE & SYSTEMS DESIGN 3 – Instruction Sets