MODULE 2
PROCESSOR
ORGANIZATION AND
ARCHITECTURE
By Prof. Vinit Raut
Classification of Processors
• Categorized by memory organization
 Von-Neumann architecture
 Harvard architecture
• Categorized by instruction type
 CISC
 RISC
 VLIW
Von Neumann Model
• In 1946, John von Neumann and his colleagues
began the design of a new stored program
computer referred to as the IAS (Institute for
Advanced Study) computer.
• Stores program and data in same memory.
• It was designed to overcome the limitation of the
earlier ENIAC computer.
• The limitation of the ENIAC computer:
the task of entering and altering programs for
the ENIAC was extremely tedious.
Structure of IAS computer
IAS consists of-
• A main memory, which stores both data and
instructions
• An ALU capable of operating on binary data
• A control unit, which interprets the instructions in
memory and causes them to be executed
• I/O equipment operated by the control unit
 IAS Memory Formats
 The memory of IAS consists of 1000 storage locations,
called words, of 40 binary digits (bits) each.
 Both data and instructions are stored there.
 Each number is represented by a sign bit and a 39-bit
value.
[Number word format: bit 0 = sign bit, bits 1–39 = value]
 A word may also contain two 20-bit instructions, with
each instruction consisting of an 8-bit operation
code (opcode) specifying the operation to be performed
and a 12-bit address designating one of the words in
memory.
[Instruction word format: left instruction in bits 0–19, right instruction in
bits 20–39; each instruction = 8-bit opcode + 12-bit address]
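As an illustration (not part of the original slides), a short Python sketch of how two 20-bit instructions could be packed into and unpacked from a 40-bit IAS word; the field widths follow the format described above.

# Hypothetical illustration of the IAS word layout described above.
def pack_instruction_word(left_op, left_addr, right_op, right_addr):
    """Pack two (8-bit opcode, 12-bit address) instructions into a 40-bit word."""
    assert 0 <= left_op < 256 and 0 <= right_op < 256
    assert 0 <= left_addr < 4096 and 0 <= right_addr < 4096
    left = (left_op << 12) | left_addr      # bits 0-19 of the word
    right = (right_op << 12) | right_addr   # bits 20-39 of the word
    return (left << 20) | right

def unpack_instruction_word(word):
    """Return ((opcode, address), (opcode, address)) for the left and right halves."""
    left, right = word >> 20, word & 0xFFFFF
    return (left >> 12, left & 0xFFF), (right >> 12, right & 0xFFF)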
Harvard Architecture
• Physically separate storage and signal pathways for
instructions and data.
• Originated from the Harvard Mark I relay-based
computer, which stored
 Instructions on punched tape (24 bits wide)
 Data in electro-mechanical counters
• In some systems, instructions can be stored in read-only
memory while data memory generally requires read-write
memory.
• In some systems, there is much more instruction
memory than data memory.
• Used in MCS-51, MIPS etc.
Harvard Architecture
Register Organization
• CPU must have some working space (temporary
storage) called registers.
• A computer system employs a memory hierarchy.
• At the highest level of hierarchy, memory is faster,
smaller and more expensive.
• Within the CPU, there is a set of registers which can be
treated as a memory in the highest level of hierarchy.
Register Organization
• The registers in the CPU can be categorized into two
groups
1. User-visible registers:
– These enable the machine- or assembly-language
programmer to minimize main memory references by optimizing
the use of registers.
2. Control and status registers:
– These are used by the control unit to control the operation of the
CPU.
– Operating system programs may also use these in privileged
mode to control the execution of programs.
User-visible registers
• General Purpose
• Data
• Address
• Condition Codes
1. General Purpose Registers:
 Used for a variety of functions by the programmer.
 Sometimes used for holding operands(data) of an
instruction.
 Sometimes used for addressing functions (e.g., register
indirect, displacement).
2. Data registers:
 Used to hold only data.
 Cannot be employed in the calculation of an operand address.
3. Address registers:
 Used exclusively for the purpose of addressing.
 Examples include the following:
1. Segment pointer:
– In a machine with segment addressing, a segment register
holds the address of the base of the segment.
– There may be multiple registers, one for the code segment and
one for the data segment.
2. Index registers:
– These are used for indexed addressing and may be auto
indexed.
3. Stack pointer:
– A dedicated register that points to the top of the stack.
– Automatically incremented or decremented by PUSH and POP
operations.
4. Condition Codes Registers:
 Sets of individual bits
• e.g. result of last operation was zero
 Can be read (implicitly) by programs
• e.g. Jump if zero
 Can not (usually) be set by programs
Control and status registers
• Four registers are essential to instruction execution:
1. Program Counter (PC):
– Contains the address of an instruction to be fetched.
2. Instruction Register (IR):
– Contains the instruction most recently fetched.
3. Memory Address Register (MAR):
– Contains the address of the main memory location from which
information is to be fetched or to which information is to be stored.
4. Memory Buffer Register (MBR):
– Contains a word of data to be written to memory or the word
most recently read.
 Program Status Word (PSW)
 Condition code bits are collected into one or more registers,
known as the program status word (PSW), that contains
status information.
 Common fields or flags include the following:
• Sign: Contains the sign bit of the result of the last arithmetic
operation.
• Zero: Set when the result is zero.
• Carry: Set if an operation resulted in a carry (addition) into or
borrow (subtraction) out of a high order bit.
• Equal: Set if a logical compare result is equal.
• Overflow: Used to indicate arithmetic overflow.
• Interrupt enable/disable: Used to enable or disable
interrupts.
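As an illustration (not from the slides), a small Python sketch that derives Sign, Zero, Carry and Overflow flags after an 8-bit addition, in the spirit of the PSW fields listed above; the bit width and flag names are assumptions.

# Hypothetical sketch: compute PSW-style flags for an 8-bit addition.
def add_with_flags(a, b, bits=8):
    mask = (1 << bits) - 1
    raw = (a & mask) + (b & mask)
    result = raw & mask
    flags = {
        "sign": bool(result >> (bits - 1)),   # MSB of the result
        "zero": result == 0,
        "carry": raw > mask,                  # carry out of the high-order bit
        "overflow": ((a ^ result) & (b ^ result) & (1 << (bits - 1))) != 0,
    }
    return result, flags

print(add_with_flags(200, 100))  # carry set: 300 does not fit in 8 bits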
Register organization of INTEL
8086 processor
• 16-bit flags, Instruction Pointer
• General Registers, 16 bits
 AX – Accumulator, favored in calculations
 BX – Base, normally holds the address of a variable or function
 CX – Count, normally used for loops
 DX – Data, normally used for multiply/divide
• Segment, 16 bits
 SS – Stack, base segment of stack in memory
 CS – Code, base location of code
 DS – Data, base location of variable data
 ES – Extra, additional location for memory data
Register organization of INTEL
8086 processor
• Index, 16 bits
 BP – Base Pointer, offset from SS for locating subroutines
 SP – Stack Pointer, offset from SS for top of stack
 SI – Source Index, used for copying data/strings
 DI – Destination Index, used for copying data/strings
INSTRUCTION FORMAT
• The operations of the computer system are determined by
the instructions executed by the central processing
unit.
• These instructions are known as machine instructions
and are in the form of binary codes.
• Each instruction of the CPU has specific information
fields which are required to execute it.
• These information fields of an instruction are called the
elements of the instruction.
Elements of Instruction
1. Operation Code:
Binary code that specifies which operation is to be performed.
2. Source operand address:
Specifies one or more source operands.
3. Destination operand address:
The operation executed by the CPU may produce a result, which is
stored at the destination address.
4. Next instruction address:
Tells the CPU from where to fetch the next instruction after
completion of execution of the current instruction.
Representation of Instruction
Opcode | Operand address 1 | Operand address 2
Instruction Types According to
Number of Addresses
Three Address Instruction
Two Address Instruction
One Address Instruction
Zero Address Instruction
• The locations of the operands are defined implicitly.
• For implicit reference, a processor register is used; it
is termed the accumulator (AC).
• E.g. CMA //complements the content of the accumulator
• i.e. AC ← AC′ (complement of AC)
Instruction Format Design
Issues:
• An instruction consists of an opcode and one or more operands,
implicitly or explicitly.
• Each explicit operand is referenced using one of the addressing
modes available on that machine.
• An instruction format defines the layout of the bits
allocated to these elements of an instruction.
• Some of the issues affecting instruction format design are:
1. Instruction Length
2. Allocation of bits for different fields in an instruction
3. Variable length instruction
1. Instruction Length
• A longer instruction means more time in fetching an
instruction.
• For example, a 32-bit instruction on a machine with a
word size of 16 bits will need two memory fetches to bring in
the instruction.
• Programmer desires:
– More opcodes and operands in an instruction, as this will reduce the
program length.
– More addressing modes for greater flexibility in accessing
various types of data.
Factors for deciding the
instruction length:
A. Memory Size
– More bits are required in address field to access larger memory
range.
B. Memory Organization
– If the system supports virtual memory, then the addressable range is
larger than the physical memory; hence more address bits are
required.
C. Bus Structure
– The instruction length should be equal to data bus length or
multiple of it.
D. Processor Speed
– The data transfer rate from the memory should be equal to the
processor speed.
2. Allocation of Bits
• More opcodes obviously mean more bits in the opcode
field.
• Factors which are considered for selection of addressing
bits are:
A. Number of Addressing modes:
• More addressing modes, more bits will be needed.
B. Number of Operands:
• More operands – more bits needed
C. Register versus memory:
• If more operand references can be made to CPU registers rather
than to memory, fewer bits are needed,
• because the number of registers is far smaller than the memory size.
D. Number of Register Sets:
• Assume a machine has 16 general-purpose registers; a
register address then requires 4 bits.
• However, if these 16 registers are divided into two groups of 8,
a register within a group needs only 3 bits for register
addressing.
E. Address Range:
• The range of addresses that can be referenced is related to the
number of address bits.
• With displacement addressing, the range is opened up to the
length of the address register.
F. Address Granularity:
• In a system with 16- or 32-bit words, an address can reference
a word or a byte at the designer's choice.
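A tiny Python sketch (an illustration, not part of the slides) of the bit-count arithmetic used above: the number of address bits needed to select one of n registers or memory words.

import math

def address_bits(n_locations):
    """Bits needed to uniquely address n_locations registers or words."""
    return math.ceil(math.log2(n_locations))

print(address_bits(16))    # 16 general-purpose registers -> 4 bits
print(address_bits(8))     # 8 registers per group        -> 3 bits
print(address_bits(4096))  # 4K-word memory               -> 12 bits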
3. Variable length Instruction
• Instead of a single fixed-length instruction format, the
designer may choose to provide a variety of instruction
formats of different lengths.
• Addressing can be more flexible, with various
combinations of register and memory references plus
addressing modes.
• Disadvantage: an increase in the complexity of the
CPU.
Concept of Program Execution
• The instructions constituting a program to be executed
by a computer are loaded in sequential locations in its
main memory.
• The processor fetches one instruction at a time and performs
the operation specified.
• Instructions are fetched from successive memory
locations until a branch or a jump instruction is
encountered.
• Processor keeps track of the address of the memory
location containing the next instruction to be fetched
using Program Counter (PC).
• The Instruction Register (IR) holds the instruction currently being
executed.
Executing an Instruction
1. Fetch the contents of the memory location pointed
to by the PC. The contents of this location are
loaded into the IR (fetch phase).
IR ← [[PC]]
2. Assuming that the memory is byte addressable,
increment the contents of the PC by 4 (fetch
phase).
PC ← [PC] + 4
3. Carry out the actions specified by the instruction in
the IR (execution phase).
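A minimal Python sketch (register names and memory contents are assumptions, not from the slides) of the fetch phase just described: load IR from the word addressed by PC, then advance PC by 4 for a byte-addressable memory with 4-byte instructions.

# Hypothetical fetch-phase model: memory maps byte addresses to 32-bit words.
memory = {0: 0x1234ABCD, 4: 0x9E000010}   # example instruction words
PC, IR = 0, 0

def fetch():
    global PC, IR
    IR = memory[PC]      # IR <- [[PC]]  (fetch phase)
    PC = PC + 4          # PC <- [PC] + 4
    return IR

print(hex(fetch()))  # 0x1234abcd; PC now points to the next instruction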
Addressing Modes
• The term addressing mode refers to the mechanism employed for
specifying operands.
• An operand can be specified as part of the instruction, or a
reference to a memory location can be given.
• An operand could also be specified by the address of a CPU register.
• The most common addressing techniques are:
 Immediate
 Direct
 Indirect
 Register
 Register Indirect
 Displacement
 Stack
Addressing Modes
To explain the addressing modes, we use the following notation:
A = contents of an address field in the instruction that refers to a
memory location
R = contents of an address field in the instruction that refers to a register
EA = actual (effective) address of the location containing the referenced
operand
(X) = contents of memory location X or register X
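To make the notation concrete, here is a hedged Python sketch (illustrative only; the register and memory contents are invented) that returns the operand for each of the common modes using the A, R, EA and (X) notation above.

# Hypothetical machine state for illustration.
memory = {1000: 2000, 2000: 77, 300: 55}
registers = {"R1": 1000}

def operand(mode, A=None, R=None):
    """Return the operand value for the given addressing mode."""
    if mode == "immediate":          # operand = A
        return A
    if mode == "direct":             # EA = A
        return memory[A]
    if mode == "indirect":           # EA = (A)
        return memory[memory[A]]
    if mode == "register":           # operand is in register R
        return registers[R]
    if mode == "register_indirect":  # EA = (R)
        return memory[registers[R]]
    if mode == "displacement":       # EA = A + (R)
        return memory[A + registers[R]]
    raise ValueError(mode)

print(operand("immediate", A=300))              # 300
print(operand("direct", A=300))                 # 55
print(operand("indirect", A=1000))              # memory[2000] = 77
print(operand("register_indirect", R="R1"))     # memory[1000] = 2000
print(operand("displacement", A=1000, R="R1"))  # memory[2000] = 77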
1. Immediate Addressing:
• The operand is actually present in the instruction
• OPERAND = A
• This mode can be used to define and use constants or
set initial values of variables.
• The advantage of immediate addressing is that no
memory reference other than the instruction fetch is
required to obtain the operand.
• e.g. MOVE R0,300
Immediate addressing
2. Direct Addressing
• The address field contains the effective address of the
operand: EA= A
• It requires only one memory reference and no special
calculation.
• Here, 'A' indicates the memory address field for the
operand.
• e.g. MOVE R1, 1001
Direct addressing
3. Indirect Addressing
• The effective address of the operand is stored in the
memory and the instruction contains the address of the
memory containing the address of the data.
• This is known as indirect addressing:
EA = (A)
• Here 'A' indicates the memory
address field of the required
operand.
• E.g. MOVE R0,(1000)
Indirect addressing
4. Register Addressing
• The instruction specifies the address of the register
containing the operand.
• The instruction contains the name of a CPU register.
EA = R indicates a register where the operand is present.
• E.g. MOVE R1, 1010
Register addressing
5. Register Indirect Addressing
• The effective address of the operand is stored in a
register and instruction contains the address of the
register containing the address of the data.
• EA = (R)
• Here 'R' indicates the register
that holds the address of the required
operand.
• E.g. MOVE R0,(R1)
Register indirect addressing
6. Displacement Addressing
• A combination of both direct addressing and register
indirect addressing modes.
EA = A + (R)
• The value contained in one address field (value = A) is
used directly. The other address field refers to a register
whose contents are added to A to produce the effective
address.
6. Displacement Addressing
Three of the most common uses of displacement addressing
are:
Relative addressing
Base-register addressing
Indexing
Relative addressing
• For relative addressing, the implicitly referenced register is the
program counter (PC).
• The current instruction address is added to the address field to
produce the EA.
• Thus, the effective address is a displacement relative to the
address of the instruction.
• e.g. 1001 JC X1
1050 X1: ADD R1,5
• X1=address of the target instruction-address of the current instruction
=1050-1001=49
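A hedged Python sketch (values taken from the example above) of how an assembler could compute the relative offset stored in the branch instruction, and how the processor recovers the target address from it.

# Illustrative only: offsets measured from the address of the branch instruction,
# matching the slide's example (X1 = 1050 - 1001 = 49).
def encode_offset(branch_address, target_address):
    return target_address - branch_address

def effective_address(pc_value, offset):
    return pc_value + offset     # EA = (PC) + offset

offset = encode_offset(1001, 1050)               # 49
print(offset, effective_address(1001, offset))   # 49 1050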
Base Register Addressing
• The base register(reference register) contains a memory
address, and the address field contains a displacement from
that base address specified by the base register.
EA=A+(B)
Indexing or Indexed
Addressing
• Used to access the elements of an array, which are stored in
consecutive locations of memory.
EA = A + (R)
• Address field A gives the main memory address and R
contains a positive displacement with respect to the base
address.
• The displacement can be specified either directly in the
instruction or through another register.
• E.g. MOVE R1, (BR+5) MOVE R0,(BR+R1)
[Figure: indexed addressing = starting address + offset (index)]
Auto Indexing
• Index registers are generally used for iterative tasks, so it
is typical that the index register needs to be incremented or
decremented after each reference to it.
• Because this is such a common operation, some systems
will automatically do this as part of the same instruction
cycle.
• This is known as auto-indexing.
• Two types of auto-indexing
1. auto-incrementing
2. auto-decrementing.
a. Auto Increment Mode
• Register R contains the address of the operand.
• After accessing the operand, the contents of register R are
incremented to point to the next item in the list.
• Auto-indexing using increment can be depicted as
follows:
EA = A + (R) or EA = (R)+
(R) = (R) + 1
• E.g. MOVE R1,1010 /*starting Memory location 1010
is stored in R1*/
• ADD AC,(R1)+ /*contents of 1010 ML are added to AC and
the contents of R1 is incremented by 1*/
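A small Python sketch (illustrative memory and register contents are assumptions) of the ADD AC,(R1)+ example above: the operand is read through R1, and R1 is then incremented to point at the next list item.

# Hypothetical state: R1 points at memory location 1010.
memory = {1010: 5, 1011: 9}
regs = {"R1": 1010, "AC": 0}

def add_ac_autoincrement(reg):
    """ADD AC,(reg)+  ->  AC = AC + memory[(reg)]; (reg) = (reg) + 1"""
    regs["AC"] += memory[regs[reg]]
    regs[reg] += 1

add_ac_autoincrement("R1")
print(regs)   # {'R1': 1011, 'AC': 5}
add_ac_autoincrement("R1")
print(regs)   # {'R1': 1012, 'AC': 14}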
b. Auto Decrement Mode
• The contents of the register specified in the instruction are
decremented, and the decremented contents are then used as the
effective address of the operand.
• Auto-indexing using decrement can be depicted as
follows:
EA = A – (R) or EA = –(R)
(R) = (R) – 1
• The contents of the register are decremented before being
used as the effective address.
• E.g. ADD R1,-(R2)
7. Stack Addressing
• A stack is a linear array or list of locations.
• Sometimes referred to as a pushdown list or last-in-
first-out queue.
• Associated with the stack is a pointer whose value is the
address of the top of the stack.
• The stack pointer is maintained in a register. Thus,
references to stack locations in memory are in fact
register indirect addresses.
• The stack mode of addressing is a form of implied
addressing.
• E.g. PUSH and POP
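An illustrative Python sketch (the full-descending layout and starting address are assumptions) of stack addressing: the stack pointer register is used register-indirectly, and PUSH/POP adjust it automatically.

# Hypothetical full-descending stack: SP points at the current top-of-stack word.
memory = {}
SP = 2000           # initial stack pointer value

def push(value):
    global SP
    SP -= 1          # pre-decrement, then store through SP (register indirect)
    memory[SP] = value

def pop():
    global SP
    value = memory[SP]
    SP += 1          # load through SP, then post-increment
    return value

push(10); push(20)
print(pop(), pop(), SP)   # 20 10 2000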
Basic Instruction Cycle
• The fetch cycle reads the next instruction from
the memory into the CPU and, along with that, updates
the contents of the program counter.
• In the execution phase, the processor interprets the opcode and
performs the indicated operation.
• The instruction fetch and execution phases together are known as
the instruction cycle.
Basic Instruction Cycle with
Interrupt
An instruction cycle includes the following sub cycles:
1. Fetch: Read the next instruction from memory into the
processor.
2. Execute: Interpret the opcode and perform the indicated
operation.
3. Interrupt: If interrupts are enabled and an interrupt has
occurred, save the current process state and service the
interrupt.
Basic Instruction Cycle with
Interrupt
The Indirect Cycle
• The execution of an instruction may involve one or more
operands in memory, each of which requires a memory
access.
• Further, if indirect addressing is used, then additional
memory accesses are required.
• Fetching these indirect addresses requires one more
instruction subcycle.
• After an instruction is fetched, it is examined to determine
if any indirect addressing is involved. If so, the required
operands are fetched using indirect addressing.
The Indirect Cycle
Instruction Cycle State
Diagram
• Instruction address calculation (iac): Determine the address of
the next instruction to be executed. Usually, this involves adding a
fixed number to the address of the previous instruction.
• Instruction fetch (if): Read instruction from its memory location into
the processor.
• Instruction operation decoding (iod): Analyze instruction to
determine type of operation to be performed and operand(s) to be
used.
• Operand address calculation (oac): If the operation involves
reference to an operand in memory or available via I/O, then
determine the address of the operand.
• Operand fetch (of): Fetch the operand from memory or read it in
from I/O.
• Data operation (do): Perform the operation indicated in the
instruction.
• Operand store (os): Write the result into memory or out to I/O.
Instruction Interpretation and
Sequencing
• Every processor has some basic types of instructions, such as
data transfer instructions, arithmetic and logical
instructions, branch instructions and so on.
• To perform a particular task on the computer, it is the
programmer's job to select and write appropriate
instructions one after the other. This job of the
programmer is known as instruction sequencing.
• Two types:
1. Straight line sequencing
2. Branch instruction
1.Straight line sequencing
• The processor executes a program with the help of the Program
Counter (PC), which holds the address of the next instruction
to be executed.
• To begin execution of a program, the address of its first
instruction is placed into the PC.
• The processor control circuit fetches the instruction from the
memory address specified by the PC and executes it, one
instruction at a time.
• At the same time the contents of the PC are incremented so as to
point to the address of the next instruction.
• This is called straight-line sequencing.
2. Branch Instruction
• After executing a decision-making instruction, the processor
has to follow one of two possible program sequences.
• A branch instruction transfers program control from one
straight-line sequence to another straight-line sequence of
instructions.
• In a branch instruction, the new address, called the target
address or branch target, is loaded into the PC and the
instruction is fetched from the new address.
Register Transfers
[Figure 7.1: Single-bus organization of the datapath inside a processor]
[Figure 7.2: Input and output gating for the registers in Figure 7.1 – register Ri
with Riin/Riout control signals; ALU operand register Y fed through a MUX with
Constant 4 and a Select input; result register Z with Zin/Zout]
Register Transfers
• The data transfer between registers and common bus is shown by a
line with arrow heads.
• But in actual practice each register has input and output gating, and
these gates are controlled by corresponding control signals.
• Riin and Riout : control signals for input and output gating of register
Ri.
• When Riin=1, the data available on the common data bus is loaded
into register Ri.
• When Riout=1, the contents of Ri are placed on the common data
bus.
• The signals Riin and Riout are commonly known as input enable and
output enable signals of registers, respectively.
Register Transfers
• Consider the transfer of data from register R1 to R2. This
can be done as follows:
1. Activate R1out=1, this signal places the contents of
register R1 on the common bus.
2. Activate R2in=1, this loads data from the common data
bus into the register R2.
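A hedged Python sketch (the bus model is an assumption; signal names mirror the slides) of the R1-to-R2 transfer: asserting R1out places R1 on the common bus, and asserting R2in latches the bus value into R2.

# Minimal single-bus model for illustration.
registers = {"R1": 42, "R2": 0}
bus = None

def assert_out(name):
    """Riout = 1: place the contents of register Ri on the common bus."""
    global bus
    bus = registers[name]

def assert_in(name):
    """Riin = 1: load the value on the common bus into register Ri."""
    registers[name] = bus

# Transfer R1 -> R2
assert_out("R1")   # step 1: R1out = 1
assert_in("R2")    # step 2: R2in = 1
print(registers)   # {'R1': 42, 'R2': 42}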
Micro Operations
• The primary function of a processor unit is to execute a
sequence of instructions stored in memory.
• The sequence of operations involved in processing an
instruction constitutes an instruction cycle, which can be
divided into three major phases: the fetch cycle, the decode
cycle and the execute cycle.
• To carry out the fetch, decode and execute cycles, the
processor unit has to perform a set of operations called
micro-operations.
Micro-operation includes
• Transfer a word of data from one CPU register to
another or to the ALU.
• Perform an arithmetic or a logic operation on the data
from CPU register and store the result in a CPU register.
• Fetch the contents of a given memory location and load
them into a CPU register.
• Store a word of data from a CPU register into a given
memory location.
1. Performing an Arithmetic or Logic
Operation
• The ALU is a combinational circuit that has no
internal storage.
• The ALU gets its two operands from the MUX and the bus. The
result is temporarily stored in register Z.
• What is the sequence of operations to add the
contents of register R1 to those of R2 and store the
result in R3?
Control Signals:
1. R1out, Yin
2. R2out, Select Y, Add, Zin
3. Zout, R3in
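An illustrative Python sketch (control-signal granularity is simplified; names follow the slides) of the three-step control sequence above that adds R1 and R2 into R3 through the Y and Z registers.

# Single-bus ALU model: Y holds one operand, Z latches the ALU result.
regs = {"R1": 3, "R2": 4, "R3": 0, "Y": 0, "Z": 0}
bus = None

def step1():                       # R1out, Yin
    global bus
    bus = regs["R1"]; regs["Y"] = bus

def step2():                       # R2out, Select Y, Add, Zin
    global bus
    bus = regs["R2"]; regs["Z"] = regs["Y"] + bus

def step3():                       # Zout, R3in
    global bus
    bus = regs["Z"]; regs["R3"] = bus

step1(); step2(); step3()
print(regs["R3"])   # 7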
2. Fetching a Word from
Memory
• The processor loads the required address into MAR; at the same
time it issues the Read signal.
• When the requested data is received from the memory, it is
stored into MDR.
• Consider the instruction MOVE R2,(R1)
• The processor waits until it receives an indication that
the requested operation has been completed (Memory-
Function-Completed, MFC).
• The actions needed to execute this instruction are:
 MAR ← [R1]
 Activate the read control signal to perform Read operation
 If memory is slow, activate wait for memory function complete
(WMFC)
 Load MDR from the memory bus
 R2 ← [MDR]
2. Fetching a Word from
Memory
• The various control signals which are necessary to
activate to perform given action in each step:
Control Signals:
1. R1out, MARin, Read
2. WMFC
3. MDRout, R2in
3. Storing A Word In Memory:
 To write a word of data into a memory location, the processor
has to load the address of the desired memory location
into the MAR, load the data to be written into the
MDR, and activate the Write control signal.
 Consider the instruction MOVE (R2),R1.
 The actions needed to execute this instruction are:
 MAR ← [R2]
 MDR ← [R1]
 Activate the write control signal to perform write operation
 If memory is slow, activate wait for memory function complete
(WMFC)
3. Storing A Word In Memory:
 The various control signals which are necessary to
activate to perform given action in each step:
Control Signals:
1. R2out, MARin
2. R1out, MDRin, Write
3. WMFC
Execution of a Complete
Instruction
• ADD R1,(R2)
This instruction adds the contents of register R1 and the contents of the
memory location specified by register R2, and stores the result in register
R1.
1.Fetch the instruction
2.Fetch the first operand (the contents of the
memory location pointed to by R2)
3.Perform the addition
4.Load the result into R1
Execution of a Complete
Instruction
ADD R1,(R2)
Micro Instructions:
Instruction Fetch (steps 1-3):
MAR ← PC
MDR ← M(MAR)
PC ← PC + 1
IR ← MDR (opcode)
Operand Fetch (step 4):
MAR ← R2
MDR ← M(MAR)
Execute Cycle (steps 5-7):
Y ← MDR
Z ← R1 + Y
R1 ← Z
Control Signals:
1. PCout, MARin, Read, Select4, Add, Zin
2. Zout, PCin, Yin, WMFC
3. MDRout, IRin
4. R2out, MARin, Read
5. MDRout, Yin, WMFC
6. R1out, Select Y, Add, Zin
7. Zout, R1in, End
Execution of Branch
Instructions
• A branch instruction replaces the contents of
PC with the branch target address, which is
usually obtained by adding an offset X given
in the branch instruction.
• The offset X is usually the difference
between the branch target address and the
address immediately following the branch
instruction.
Execution of Branch
Instructions
The control sequence for unconditional
branch instruction is as follows:
1. PCout, MARin, Read, Select4, Add, Zin
2. Zout, PCin, Yin, WMFC
3. MDRout, IRin
The first three steps constitute the opcode fetch operation
4. Offset_field_of_IRout, Select Y, Add, Zin
The contents of PC and the offset field of the IR register are added and the
result is stored in register Z.
5. Zout, PCin, End
Execution of Branch
Instructions
 In the case of a conditional branch instruction, the status of the
condition codes specified in the instruction is checked.
 If the status specified within the instruction matches
the current status of the condition codes, the branch target
address is loaded into the PC by adding the offset
specified in the instruction to the contents of the PC.
 Otherwise the processor fetches the next instruction in the
sequence.
Quiz
• Write control sequence
with micro-program for
ADD R1,R2
including the instruction
fetch phase? (Assume
single bus architecture)
• Write control sequence
for
SUB (R3)+,R1
Hint:
R1 = R1 – (R3)
R3 = R3 + 1
[Figure 7.1: Single-bus organization of the datapath inside a processor – PC, MAR,
MDR, Y, Z, IR, TEMP and R0…R(n-1) on the internal processor bus; ALU with Add, Sub
and XOR control lines, Carry-in, Constant 4 and MUX Select; instruction decoder and
control logic issuing control signals; memory address and data lines]
ADD R1,R2
Micro Instructions:
Instruction Fetch (steps 1-3):
MAR ← PC
MDR ← M(MAR)
PC ← PC + 1
IR ← MDR (opcode)
Operand Fetch:
Not required
Execute Cycle (steps 4-5):
Y ← R1
Z ← R2 + Y
R1 ← Z
Control Signals:
1. PCout, MARin, Read, Select4, Add, Zin
2. Zout, PCin, Yin, WMFC
3. MDRout, IRin
4. R1out, Yin, R2out, Select Y, Add, Zin
5. Zout, R1in, End
SUB (R3)+,R1
Micro Instructions:
Instruction Fetch (steps 1-3):
MAR ← PC
MDR ← M(MAR)
PC ← PC + 1
IR ← MDR (opcode)
Operand Fetch (step 4):
MAR ← R3
MDR ← M(MAR)
Execute Cycle (steps 5-9):
Y ← MDR
Z ← R1 - Y
R1 ← Z
R3 ← R3 + 1
Control Signals:
1. PCout, MARin, Read, Select4, Add, Zin
2. Zout, PCin, Yin, WMFC
3. MDRout, IRin
4. R3out, MARin, Read
5. MDRout, Yin, WMFC
6. R1out, Select Y, Sub, Zin
7. Zout, R1in
8. R3out, Select4, Add, Zin
9. Zout, R3in, End
Arithmetic Logic Unit (ALU)
 In this and the next section, we deal with detailed design
of typical ALUs and shifters
 Decompose the ALU into:
• An arithmetic circuit
• A logic circuit
• A selector to pick between the two circuits
 Arithmetic circuit design
• Decompose the arithmetic circuit into:
 An n-bit parallel adder
 A block of logic that selects four choices for the B input to the adder
Arithmetic Circuit Design
 There are only four functions of B to select as Y in G = A + Y:
• All 0's
• B
• B' (complement of B)
• All 1's

S1 S0 | Y input | Cin = 0: G = A + Y               | Cin = 1: G = A + Y + 1
0  0  | All 0's | G = A (transfer of A)            | G = A + 1 (increment)
0  1  | B       | G = A + B (addition)             | G = A + B + 1
1  0  | B'      | G = A + B' (subtraction, 1's C)  | G = A + B' + 1 (subtraction, 2's C)
1  1  | All 1's | G = A – 1 (decrement)            | G = A (transfer of A)

[Figure: one n-bit arithmetic circuit – B input logic driven by S1, S0 produces Y;
an n-bit parallel adder computes G = X + Y + Cin with X = A and carry-out Cout]
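A hedged Python sketch (the bit width is an arbitrary choice) of the arithmetic-circuit behaviour tabulated above: S1, S0 select the Y input applied to the parallel adder, and Cin selects between G = A + Y and G = A + Y + 1.

def arithmetic_circuit(A, B, S1, S0, Cin, bits=4):
    """Model of the n-bit arithmetic circuit: G = A + Y + Cin."""
    mask = (1 << bits) - 1
    Y = {(0, 0): 0,               # all 0's -> transfer / increment
         (0, 1): B & mask,        # B       -> addition
         (1, 0): ~B & mask,       # B'      -> subtraction (1's / 2's complement)
         (1, 1): mask}[(S1, S0)]  # all 1's -> decrement / transfer
    return (A + Y + Cin) & mask

print(arithmetic_circuit(5, 3, 0, 1, 0))  # 5 + 3 = 8
print(arithmetic_circuit(5, 3, 1, 0, 1))  # 5 - 3 = 2  (2's complement subtraction)
print(arithmetic_circuit(5, 0, 1, 1, 0))  # 5 - 1 = 4  (decrement)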
Logic Circuit
 The text gives a circuit implemented using a multiplexer
plus gates implementing: AND, OR, XOR and NOT
 Here we custom design a circuit for bit Gi by beginning
with a truth table organized as logic operation K-map
and assigning (S1, S0) codes to AND, OR, etc.
 Gi = S0·Ai·Bi' + S1'·Ai·Bi + S0·Ai'·Bi + S1·S0'·Ai'
(primes denote complemented signals)
 Custom design better

Truth table organized as a logic-operation K-map, with (S1, S0) codes assigned to the operations:

        S1 S0 = 0 0 (AND)   0 1 (OR)   1 1 (XOR)   1 0 (NOT)
Ai Bi
0  0            0            0           0           1
0  1            0            1           1           1
1  1            1            1           0           0
1  0            0            1           1           0
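A hedged Python sketch that checks the reconstructed Gi expression against the logic-operation table above (signal names follow the slides; a prime is modelled as 1 - x).

def gi(S1, S0, Ai, Bi):
    """Gi = S0·Ai·Bi' + S1'·Ai·Bi + S0·Ai'·Bi + S1·S0'·Ai'"""
    n = lambda x: 1 - x   # complement of a 1-bit signal
    return (S0 & Ai & n(Bi)) | (n(S1) & Ai & Bi) | (S0 & n(Ai) & Bi) | (S1 & n(S0) & n(Ai))

expected = {(0, 0): lambda a, b: a & b,      # AND
            (0, 1): lambda a, b: a | b,      # OR
            (1, 1): lambda a, b: a ^ b,      # XOR
            (1, 0): lambda a, b: 1 - a}      # NOT Ai

assert all(gi(s1, s0, a, b) == f(a, b)
           for (s1, s0), f in expected.items()
           for a in (0, 1) for b in (0, 1))
print("Gi matches the AND/OR/XOR/NOT table")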
Arithmetic Logic Unit (ALU)
 The custom circuit has interchanged the (S1,S0) codes for XOR
and NOT compared to the MUX circuit. To preserve compatibility
with the text, we use the MUX solution.
 Next, use the arithmetic circuit, the logic circuit, and a 2-way
multiplexer to form the ALU.
 The input connections to the arithmetic circuit and logic circuit
have been assigned to prepare for seamless addition of the shifter,
keeping the selection codes for the combined ALU and the shifter
at 4 bits:
• Carry-in Ci and Carry-out Ci+1 go between bits
• Ai and Bi are connected to both units
• A new signal S2 performs the arithmetic/logic selection
• The select signal entering the LSB of the arithmetic circuit, Cin,
is connected to the least significant selection input for the logic
circuit, S0.
Arithmetic Logic Unit (ALU)
 The next most significant select signals, S0 for the arithmetic circuit
and S1 for the logic circuit, are wired together, completing the two
select signals for the logic circuit.
 The remaining S1 completes the three select signals for the arithmetic
circuit.
[Figure: one ALU stage – one stage of the arithmetic circuit and one stage of the
logic circuit, both fed by Ai, Bi, S0 and S1, drive a 2-to-1 MUX selected by S2 to
produce Gi; the carries Ci and Ci+1 chain between bit positions]
Shifters
 Required for data processing, multiplication, division etc.
 Direction: Left, Right
 Number of positions with examples:
• Single bit:
 1 position
 0 and 1 positions
• Multiple bit:
 1 to n – 1 positions
 0 to n – 1 positions
 Filling of vacant positions
• Many options depending on instruction set
• Here, will provide input lines or zero fill
4-Bit Basic Left/Right Shifter
 Serial Inputs:
• IR for right shift
• IL for left shift
 Serial Outputs
• R for right shift (Same as MSB input)
• L for left shift (Same as LSB input)
 Shift Functions:
(S1, S0) = 00 Pass B unchanged
01 Right shift
10 Left shift
11 Unused
[Figure: 4-bit basic shifter — four MUXes (select S, 2 bits) produce outputs H3, H2, H1, H0 from data inputs B3, B2, B1, B0, with serial input IR and serial output R for right shifts and serial input IL and serial output L for left shifts.]
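A behavioural sketch of this 4-bit shifter (Python), using the (S1, S0) codes listed above; the serial inputs IR and IL default to zero fill.

# Behavioral sketch of the 4-bit left/right shifter: (S1,S0) = 00 pass, 01 right
# shift, 10 left shift. IR/IL are the serial inputs; the serial output is the
# bit shifted out (B0 on a right shift, B3 on a left shift).

def shifter4(b, s, i_r=0, i_l=0):
    """b is a list [B3, B2, B1, B0]; returns (H, serial_out)."""
    b3, b2, b1, b0 = b
    if s == 0b00:                               # pass B unchanged
        return [b3, b2, b1, b0], None
    if s == 0b01:                               # right shift: IR enters at the MSB
        return [i_r, b3, b2, b1], b0
    if s == 0b10:                               # left shift: IL enters at the LSB
        return [b2, b1, b0, i_l], b3
    raise ValueError("select code 11 is unused")

print(shifter4([1, 0, 1, 1], 0b01))             # ([0, 1, 0, 1], 1) right shift, zero fill
print(shifter4([1, 0, 1, 1], 0b10, i_l=1))      # ([0, 1, 1, 1], 1) left shift, IL = 1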
What is Pipelining?
 A technique used in advanced microprocessors where the
microprocessor begins executing a second instruction
before the first has been completed.
 A Pipeline is a series of stages, where some work is done
at each stage. The work is not finished until it has passed
through all stages.
 With pipelining, the computer architecture allows the next
instructions to be fetched while the processor is
performing arithmetic operations, holding them in a buffer
close to the processor until each instruction operation can
be performed.
How Does the Pipeline Work?
 The pipeline is divided into segments and each
segment can execute its operation concurrently
with the other segments. Once a segment
completes an operation, it passes the result to
the next segment in the pipeline and fetches the
next operations from the preceding segment.
[Figures: basic ideas of pipelining; data dependence]
Advantages/Disadvantages
 Advantages:
• More efficient use of the processor
• Faster execution of a large number of instructions
 Disadvantages:
• Pipelining involves adding hardware to the chip
• Inability to run the pipeline continuously at full speed,
because pipeline hazards disrupt the smooth execution of
the pipeline.
Instruction Pipelining
 Pipelining can be applied to the instruction stream just as it is to the data
stream
 Consecutive instructions are read from memory while
previous instructions are executed in various pipeline
stages.
 Difficulties
• Different execution times for different pipeline stages
• Some instructions may skip some of the stages, e.g. no need for
effective address calculation in immediate or register addressing
• Two or more stages may require access to memory at the same time,
e.g. instruction fetch and operand fetch at the same time
Pipeline Stages
 Consider the following decomposition for processing the
instructions
 Fetch instruction (FI) – Read into a buffer
 Decode instruction (DI) – Determine opcode, operands
 Calculate operands (CO) – Indirect, Register indirect, etc.
 Fetch operands (FO) – Fetch operands from memory
 Execute instructions (EI)- Execute
 Write operand (WO) – Store result if applicable
 Overlap these operations to make a 6 stage pipeline
Timing of Instruction Pipeline – Six Stages
Reduction in instruction execution time from 54 time units to 14 time units:
nine instructions passing through six stages take 9 × 6 = 54 time units
without overlap, but only 6 + (9 − 1) = 14 time units when pipelined.
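The 54-to-14 figure follows directly from the standard formulas for an ideal pipeline; a quick check in Python (assuming nine instructions, six stages, one time unit per stage and no stalls):

# Sequential vs. pipelined time for k stages and n instructions
# (one time unit per stage, no stalls).

def execution_times(n_instructions, n_stages):
    sequential = n_instructions * n_stages          # n * k
    pipelined = n_stages + (n_instructions - 1)     # k + (n - 1)
    return sequential, pipelined

print(execution_times(9, 6))   # (54, 14)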
Data Dependencies
• When two instructions access the same register.
• RAW: Read-After-Write
• True dependency
• WAR: Write-After-Read
• Anti-dependency
• WAW: Write-After-Write
• False-dependency
• Key problem with regular in-order pipelines is
RAW.
Pipeline Hazards
 Structural Hazard or Resource Conflict – Two instructions
need to access the same resource at the same time.
 Data Hazard or Data Dependency – An instruction uses
the result of the previous instruction before it is ready.
 A hazard occurs exactly when an instruction tries to
read a register in its ID stage that an earlier instruction
intends to write in its WO stage (see the sketch below).
 Control Hazard or Branch Difficulty – The address of the next
instruction to fetch depends on a previous (branch) instruction
 Conditional branches break the pipeline
 Stuff we fetched in advance is useless if we take the branch
 The pipeline implementation needs to detect and resolve
hazards.
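The hazard condition stated above can be written as a simple check between the instruction being decoded and the earlier instructions still in flight. A minimal Python sketch; the Instr record and the register names are illustrative assumptions, not part of any particular processor.

# Minimal RAW-hazard check: does the instruction now in ID read a register
# that an earlier, still-unfinished instruction is going to write?

from dataclasses import dataclass

@dataclass
class Instr:
    dest: str      # register written by this instruction (or None)
    srcs: tuple    # registers read by this instruction

def raw_hazard(decoding, in_flight):
    """in_flight: earlier instructions that have not yet written back."""
    return any(i.dest is not None and i.dest in decoding.srcs for i in in_flight)

add_ = Instr(dest="r1", srcs=("r2", "r3"))   # add r1, r2, r3
sub_ = Instr(dest="r4", srcs=("r1", "r3"))   # sub r4, r1, r3
print(raw_hazard(sub_, [add_]))              # True -> stall or forward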
Structural Hazard
• When the IF stage requires memory access for an instruction fetch and the MEM
stage needs memory access for an operand fetch at the same time.
• Solutions:
 Stalling (waiting/delaying)
 The conflicting fetch is delayed by one clock cycle
 Split cache
 Separate caches for
instructions (code cache)
and operands (data cache)
Data Hazard
• Data hazards occur when data is used before it is ready.
Read After Write (RAW)
InstrJ tries to read operand before InstrI writes it
• Caused by a “Dependence” (in compiler nomenclature).
This hazard results from an actual need for
communication.
Execution Order is:
InstrI
InstrJ
I: add r1,r2,r3
J: sub r4,r1,r3
Data Hazard (cont.)
Write After Read (WAR)
InstrJ tries to write operand before InstrI reads it
– Gets wrong operand
– Called an “anti-dependence” by compiler writers.
This results from reuse of the name “r1”.
Execution Order is:
InstrI
InstrJ
I: sub r4,r1,r3
J: add r1,r2,r3
K: mul r6,r1,r7
Data Hazard (cont.)
Write After Write (WAW)
InstrJ tries to write operand before InstrI writes it
– Leaves wrong result ( InstrI not InstrJ )
• Called an “output dependence” by compiler writers
This also results from the reuse of name “r1”.
Execution Order is:
InstrI
InstrJ
I: sub r1,r4,r3
J: add r1,r2,r3
K: mul r6,r1,r7
Data Hazard (cont.)
• Solutions for Data Hazards
– Stalling
– Forwarding:
» connect new value directly to next stage
– Reordering
Data Hazard - Stalling
[Figure: pipeline timing with a stall —
add $s0, $t0, $t1 proceeds through IF ID EX MEM WB;
sub $t2, $s0, $t3 completes IF and is then held (bubbles inserted) until the add
writes $s0 back in WB, after which it reads $s0 and continues through EX and MEM.]
Data Hazard - Forwarding
• Key idea: connect new value directly to next stage
• Still read s0, but ignore in favor of new result
[Figure: pipeline timing with forwarding —
add $s0, $t0, $t1 and sub $t2, $s0, $t3 run back-to-back; the new value of $s0 is
fed directly from the add to the stage that needs it in the sub, so the stale $s0
value read from the register file is ignored and no stall is required.]
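A behavioural sketch of the forwarding idea (Python; the latch name and values are illustrative assumptions): the operand read from the register file is ignored whenever a newer value for the same register is available on the bypass path.

# Forwarding (bypassing): if the EX/MEM latch of an earlier instruction holds a
# result destined for a register we are about to read, use that value instead
# of the stale one from the register file.

def read_operand(reg, regfile, ex_mem_dest=None, ex_mem_value=None):
    if reg == ex_mem_dest:            # newer value available on the bypass path
        return ex_mem_value
    return regfile[reg]               # otherwise the register file is up to date

regfile = {"s0": 7, "t3": 2}          # s0 still holds the old value 7
# add $s0,$t0,$t1 just produced 42 in EX/MEM; sub $t2,$s0,$t3 is entering EX:
operand = read_operand("s0", regfile, ex_mem_dest="s0", ex_mem_value=42)
print(operand - regfile["t3"])        # 40, computed with the forwarded value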
Data Hazard
This is another representation of the stall.
Without the stall, the instructions after the load would read R1 before it is written:
LW  R1, 0(R2)    IF ID EX MEM WB
SUB R4, R1, R5       IF ID EX MEM WB
AND R6, R1, R7          IF ID EX MEM WB
OR  R8, R1, R9             IF ID EX MEM WB
With the stall inserted:
LW  R1, 0(R2)    IF ID EX MEM WB
SUB R4, R1, R5       IF ID stall EX MEM WB
AND R6, R1, R7          IF stall ID EX MEM WB
OR  R8, R1, R9             stall IF ID EX MEM WB
Data Hazard - Reordering
• Consider a program segment
1. t = 5
2. x = y + z
3. p = x + 1
• Instruction 3 is dependent on instruction 2 but
Instruction 2 and 3 has no dependency on instruction 1
• So after reordering program segment will be
1. x = y + z
2. t = 5
3. p = x + 1
[Figure: pipelined execution (stages IF ID OF EX) — in the original order,
p = x + 1 stalls one cycle waiting for x = y + z; after reordering, all three
instructions flow through the pipeline with no stall.]
Control Hazard
• Caused by branch instructions – unconditional and
conditional branches.
 Unconditional – branch always
 Conditional – may or may not cause branching
• In a pipelined processor, the following actions are critical:
 Timely detection of a branch instruction
 Early calculation of branch address
 Early testing of branch condition for conditional branch instructions
Control Hazard (cont.)
• Branch Not Taken
Control Hazard (cont.)
• Branch in a Pipeline – Flushed Pipeline
Dealing with Branches
• Delayed branches
• Branch prediction
• Multiple Streams
• Prefetch Branch Target
• Loop buffer
Delayed Branch
 Delayed Branch – used with RISC machines
 Requires some clever rearrangement of instructions
 Burden on programmers but can increase performance
 Most RISC machines don’t flush the pipeline in case of a
branch
 Called the Delayed Branch
 This means if we take a branch, we’ll still continue to execute
whatever is currently in the pipeline, at a minimum the next instruction
 Benefit: Simplifies the hardware quite a bit
 But we need to make sure it is safe to execute the remaining
instructions in the pipeline
 Simple solution to get same behavior as a flushed pipeline: Insert
NOP – No Operation – instructions after a branch
 Called the Delay Slot
Delayed Branch (cont.)
 Normal vs Delayed Branch
 One delay slot - Next instruction is always in the pipeline.
“Normal” path contains an implicit “NOP” instruction as the
pipeline gets flushed. Delayed branch requires explicit NOP
instruction placed in the code!
Delayed Branch (cont.)
 Optimized Delayed Branch
 But we can optimize this code by rearrangement! Notice we
always Add 1 to A so we can use this instruction to fill the delay
slot
Branch Prediction
 Predict never taken
 Assume that jump will not happen
 Always fetch next instruction
 68020 & VAX 11/780
 VAX will not prefetch after branch if a page fault would
result
 Predict always taken
 Assume that jump will happen
 Always fetch target instruction
 Studies indicate branches are taken around 60% of the
time in most programs
Branch Prediction (cont.)
 Predict by Opcode
 Some types of branch instructions are more likely to result in a
jump than others (e.g. LOOP vs. JUMP)
 Can get up to 75% success
 Taken/Not taken switch – 1 bit branch predictor
 Based on previous history
 If a branch was taken last time, predict it will be taken again
 If a branch was not taken last time, predict it will not be taken again
 Good for loops
 Could use a single bit to indicate history of the previous result
 Need to somehow store this bit with each branch instruction
 Could use more bits to remember a more elaborate history
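A minimal sketch of the 1-bit taken/not-taken predictor described above (Python; a dictionary keyed by branch address stands in for the history bit stored with each branch instruction).

# 1-bit branch predictor: predict that a branch will do what it did last time.

class OneBitPredictor:
    def __init__(self):
        self.history = {}                      # branch address -> last outcome (True = taken)

    def predict(self, pc):
        return self.history.get(pc, False)     # assume not taken on first sight

    def update(self, pc, taken):
        self.history[pc] = taken

bp = OneBitPredictor()
outcomes = [True] * 9 + [False]                # a loop branch taken 9 times, then it exits
correct = 0
for taken in outcomes:
    correct += bp.predict(0x400) == taken
    bp.update(0x400, taken)
print(correct, "of", len(outcomes))            # 8 of 10: good for loops, misses at entry/exit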
Performance Measures
• The most important measure of the performance of a
computer is how quickly it can execute programs.
• The performance of a computer is affected by the design
of its hardware, the compiler and its machine-language
instructions, the instruction set, the implementation language, etc.
• The computer user is always interested in reducing the
execution time.
• The execution time is also referred to as the response time.
• Reduction in response time increases the throughput.
• Throughput: the total amount of work done in a given time.
1. The System Clock
• Processor circuits are controlled by a timing signal called
a clock.
• The clock defines regular time intervals, called clock
cycles.
• To execute a machine instruction, the processor divides
the action to be performed into a sequence of basic steps,
such that each step can be completed in one clock cycle.
• The constant cycle time (in nanoseconds) is denoted by t.
• The clock rate is given by f = 1/t, which is measured in cycles
per second (CPS).
• The unit for this measurement is hertz (Hz): one hertz equals
one cycle per second.
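• For example, a clock rate f = 500 MHz corresponds to a cycle time t = 1/f = 2 ns, and f = 2 GHz corresponds to t = 0.5 ns.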
2. Instruction Execution Rate
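A standard set of definitions for this measure (assumed here, following the usual treatment rather than taken verbatim from the slide):
• If Ic is the number of machine instructions executed and CPI is the average number of clock cycles per instruction, the program execution time is T = Ic × CPI × t = (Ic × CPI) / f.
• The instruction execution rate in MIPS (millions of instructions per second) is then MIPS rate = Ic / (T × 10^6) = f / (CPI × 10^6).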
3. Speedup
• Increase in speed of a parallel system compared to a uniprocessor
system:
Speedup(N) = T(1) / T(N)
Where,
T(N) represents the execution time taken by the program running on N processors,
and
T(1) represents the time taken by the best serial implementation of the program
measured on one processor.
4. Efficiency
Efficiency(N) = Speedup(N) / N
Where,
Speedup(N) = speedup measured on N processors
5. Throughput (ωp)
6. Amdahl’s Law
• Gene Amdahl [AMDA67]
• Potential speed up of program using
multiple processors
• Concluded that:
—Code needs to be parallelizable
—Speed up is bound, giving diminishing returns
for more processors
• Task dependent
—Servers gain by maintaining multiple
connections on multiple processors
—Databases can be split into parallel tasks
6. Amdahl’s Law
• For a program running on a single processor:
—Fraction f of the code is infinitely parallelizable with no
scheduling overhead
—Fraction (1 − f) of the code is inherently serial
—T is the total execution time of the program on a single processor
—N is the number of processors that fully exploit the parallel
portions of the code
• Conclusions:
—When f is small, parallel processors have little effect
—As N → ∞, the speedup is bounded by 1/(1 − f)
– Diminishing returns for using more processors
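Written out with the notation above, the execution time on N processors is (1 − f)T + fT/N, so
Speedup(N) = T / ((1 − f)T + fT/N) = 1 / ((1 − f) + f/N)
For example, with f = 0.9 and N = 10, Speedup = 1 / (0.1 + 0.09) ≈ 5.3, and even with unlimited processors the speedup cannot exceed 1/(1 − f) = 10.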
6. Amdahl’s Law
Fig. Speed-up vs number of processors for Amdahl’s law
6. Amdahl’s Law Exercise
Thank You
Processor Organization and Architecture

  • 2. Classification of Processors • Categorized by memory organization  Von-Neumann architecture  Harvard architecture • Categorized by instruction type  CISC  RISC  VLIW • Categorized by memory organization  Von-Neumann architecture  Harvard architecture • Categorized by instruction type  CISC  RISC  VLIW
  • 3. Von Neumann Model • In 1946, John von Neumann and his colleagues began the design of a new stored program computer referred to as the IAS (Institute for Advanced Study) computer. • Stores program and data in same memory. • It was designed to overcome the limitation of previous ENIAC computer. • The limitation of ENIAC computer:- The task of entering and altering programs for the ENIAC was extremely tedious. • In 1946, John von Neumann and his colleagues began the design of a new stored program computer referred to as the IAS (Institute for Advanced Study) computer. • Stores program and data in same memory. • It was designed to overcome the limitation of previous ENIAC computer. • The limitation of ENIAC computer:- The task of entering and altering programs for the ENIAC was extremely tedious.
  • 4. Structure of IAS computer
  • 5. Structure of IAS computer IAS consists of- • A main memory, which stores both data and instructions • An ALU capable of operating on binary data • A control unit, which interprets the instructions in memory and causes them to be executed • I/O equipment operated by the control unit IAS consists of- • A main memory, which stores both data and instructions • An ALU capable of operating on binary data • A control unit, which interprets the instructions in memory and causes them to be executed • I/O equipment operated by the control unit
  • 6.  IAS Memory Formats  The memory of IAS consists of 1000 storage locations, called words, of 40 binary digits(bits) each.  Both data and instructions are stored there.  Each number is represented by a sign bit and a 39-bit value. 0 1 39 Sign bit  A word may also contain two 20-bit instructions, with each instruction consisting of an 8-bit operation code(opcode) specifying the operation to be performed and a12-bit address designating one of the words in memory . Left Instruction Right Instruction 0 8 20 28 39 Opcode OpcodeAddress Address  IAS Memory Formats  The memory of IAS consists of 1000 storage locations, called words, of 40 binary digits(bits) each.  Both data and instructions are stored there.  Each number is represented by a sign bit and a 39-bit value. 0 1 39 Sign bit  A word may also contain two 20-bit instructions, with each instruction consisting of an 8-bit operation code(opcode) specifying the operation to be performed and a12-bit address designating one of the words in memory . Left Instruction Right Instruction 0 8 20 28 39
  • 7. Harvard Architecture • Physically separate storage and signal pathways for instructions and data. • Originated from the Harvard Mark I relay-based computer, which stored  Instructions on punched tape (24 bits wide)  Data in electro-mechanical counters • In some systems, instructions can be stored in read-only memory while data memory generally requires read-write memory. • In some systems, there is much more instruction memory than data memory. • Used in MCS-51, MIPS etc. • Physically separate storage and signal pathways for instructions and data. • Originated from the Harvard Mark I relay-based computer, which stored  Instructions on punched tape (24 bits wide)  Data in electro-mechanical counters • In some systems, instructions can be stored in read-only memory while data memory generally requires read-write memory. • In some systems, there is much more instruction memory than data memory. • Used in MCS-51, MIPS etc.
  • 9. Register Organization • CPU must have some working space (temporary storage) called registers. • A computer system employs a memory hierarchy. • At the highest level of hierarchy, memory is faster, smaller and more expensive. • Within the CPU, there is a set of registers which can be treated as a memory in the highest level of hierarchy. • CPU must have some working space (temporary storage) called registers. • A computer system employs a memory hierarchy. • At the highest level of hierarchy, memory is faster, smaller and more expensive. • Within the CPU, there is a set of registers which can be treated as a memory in the highest level of hierarchy.
  • 10. Register Organization • The registers in the CPU can be categorized into two groups 1. User-visible registers: – These enables the machine - or assembly-language programmer to minimize main memory reference by optimizing use of registers. 2. Control and status registers: – These are used by the control unit to control the operation of the CPU. – Operating system programs may also use these in privileged mode to control the execution of program. • The registers in the CPU can be categorized into two groups 1. User-visible registers: – These enables the machine - or assembly-language programmer to minimize main memory reference by optimizing use of registers. 2. Control and status registers: – These are used by the control unit to control the operation of the CPU. – Operating system programs may also use these in privileged mode to control the execution of program.
  • 11. User-visible registers • General Purpose • Data • Address • Condition Codes • General Purpose • Data • Address • Condition Codes
  • 12. 1. General Purpose Registers:  Used for a variety of functions by the programmer.  Sometimes used for holding operands(data) of an instruction.  Sometimes used for addressing functions (e.g., register indirect, displacement). 2. Data registers:  Used to hold only data.  Cannot be employed in the calculation of an operand address. 1. General Purpose Registers:  Used for a variety of functions by the programmer.  Sometimes used for holding operands(data) of an instruction.  Sometimes used for addressing functions (e.g., register indirect, displacement). 2. Data registers:  Used to hold only data.  Cannot be employed in the calculation of an operand address.
  • 13. 3. Address registers:  Used exclusively for the purpose of addressing.  Examples include the following: 1. Segment pointer: – In a machine with segment addressing, a segment register holds the address of the base of the segment. – There may be multiple registers, one for the code segment and one for the data segment. 2. Index registers: – These are used for indexed addressing and may be auto indexed. 3. Stack pointer: – A dedicated register that points to the top of the stack. – Auto incremented or auto decremented using PUSH or POP operation 3. Address registers:  Used exclusively for the purpose of addressing.  Examples include the following: 1. Segment pointer: – In a machine with segment addressing, a segment register holds the address of the base of the segment. – There may be multiple registers, one for the code segment and one for the data segment. 2. Index registers: – These are used for indexed addressing and may be auto indexed. 3. Stack pointer: – A dedicated register that points to the top of the stack. – Auto incremented or auto decremented using PUSH or POP operation
  • 14. 4. Condition Codes Registers:  Sets of individual bits • e.g. result of last operation was zero  Can be read (implicitly) by programs • e.g. Jump if zero  Can not (usually) be set by programs
  • 15. Control and status registers • Four registers are essential to instruction execution: 1. Program Counter (PC): – Contains the address of an instruction to be fetched. 2. Instruction Register (IR): – Contains the instruction most recently fetched. 3. Memory Address Register (MAR): – Contains the address of a location of main memory from where information has to be fetched or information has to be stored. 4. Memory Buffer Register (MBR): – Contains a word of data to be written to memory or the word most recently read. • Four registers are essential to instruction execution: 1. Program Counter (PC): – Contains the address of an instruction to be fetched. 2. Instruction Register (IR): – Contains the instruction most recently fetched. 3. Memory Address Register (MAR): – Contains the address of a location of main memory from where information has to be fetched or information has to be stored. 4. Memory Buffer Register (MBR): – Contains a word of data to be written to memory or the word most recently read.
  • 16.  Program Status Word (PSW)  Condition code bits are collected into one or more registers, known as the program status word (PSW), that contains status information.  Common fields or flags include the following: • Sign: Contains the sign bit of the result of the last arithmetic operation. • Zero: Set when the result is zero. • Carry: Set if an operation resulted in a carry (addition) into or borrow (subtraction) out of a high order bit. • Equal: Set if a logical compare result is equal. • Overflow: Used to indicate arithmetic overflow. • Interrupt enable/disable: Used to enable or disable interrupts. Control and status registers  Program Status Word (PSW)  Condition code bits are collected into one or more registers, known as the program status word (PSW), that contains status information.  Common fields or flags include the following: • Sign: Contains the sign bit of the result of the last arithmetic operation. • Zero: Set when the result is zero. • Carry: Set if an operation resulted in a carry (addition) into or borrow (subtraction) out of a high order bit. • Equal: Set if a logical compare result is equal. • Overflow: Used to indicate arithmetic overflow. • Interrupt enable/disable: Used to enable or disable interrupts.
  • 17. Register organization of INTEL 8086 processor
  • 18. Register organization of INTEL 8086 processor • 16-bit flags, Instruction Pointer • General Registers, 16 bits  AX – Accumulator, favored in calculations  BX – Base, normally holds an address of a variable or func  CX – Count, normally used for loops  DX – Data, normally used for multiply/divide • Segment, 16 bits  SS – Stack, base segment of stack in memory  CS – Code, base location of code  DS – Data, base location of variable data  ES – Extra, additional location for memory data • 16-bit flags, Instruction Pointer • General Registers, 16 bits  AX – Accumulator, favored in calculations  BX – Base, normally holds an address of a variable or func  CX – Count, normally used for loops  DX – Data, normally used for multiply/divide • Segment, 16 bits  SS – Stack, base segment of stack in memory  CS – Code, base location of code  DS – Data, base location of variable data  ES – Extra, additional location for memory data
  • 19. Register organization of INTEL 8086 processor • Index, 16 bits  BP – Base Pointer, offset from SS for locating subroutines  SP – Stack Pointer, offset from SS for top of stack  SI – Source Index, used for copying data/strings  DI – Destination Index, used for copy data/strings • Index, 16 bits  BP – Base Pointer, offset from SS for locating subroutines  SP – Stack Pointer, offset from SS for top of stack  SI – Source Index, used for copying data/strings  DI – Destination Index, used for copy data/strings
  • 20. INSTRUCTION FORMAT • The operation of the computer system are determined by the instructions executed by the central processing unit. • These instructions are known as machine instruction and are in the form of binary codes. • Each instruction of the CPU has specific information field which are required to execute it. • These information field of instructions are called elements of instruction. • The operation of the computer system are determined by the instructions executed by the central processing unit. • These instructions are known as machine instruction and are in the form of binary codes. • Each instruction of the CPU has specific information field which are required to execute it. • These information field of instructions are called elements of instruction.
  • 21. Elements of Instruction 1. Operation Code: Binary code that specifies which operation to be performed. 2. Source operand address: Specifies one or more source operands 3. Destination operand address: The operation executed by the CPU may produce result which is stored in the destination address. 4. Next instruction address: Tells the CPU from where to fetch the next instruction after completion of execution of current instruction. 1. Operation Code: Binary code that specifies which operation to be performed. 2. Source operand address: Specifies one or more source operands 3. Destination operand address: The operation executed by the CPU may produce result which is stored in the destination address. 4. Next instruction address: Tells the CPU from where to fetch the next instruction after completion of execution of current instruction.
  • 22. Representation of Instruction OpcodeOpcode Operand address1Operand address1 Operand address2Operand address2OpcodeOpcode Operand address1Operand address1 Operand address2Operand address2
  • 23. Instruction Types According to Number of Addresses
  • 27. Zero Address Instruction • The location of the operands are defined implicitly • For implicit reference, a processor register is used and it is termed as accumulator(AC). • E.g. CMA //complements the content of accumulator • i.e. AC AC • The location of the operands are defined implicitly • For implicit reference, a processor register is used and it is termed as accumulator(AC). • E.g. CMA //complements the content of accumulator • i.e. AC AC
  • 28. Instruction Format Design Issues: • An instruction consists of an opcode and one or more operands, implicitly or explicitly. • Each explicit operand is referenced using one of the addressing mode that is available for that machine. • An instruction format is used to define the layout of the bits allocated to these elements of instructions. • Some of issues effecting instruction design are: 1. Instruction Length 2. Allocation of bits for different fields in an instruction 3. Variable length instruction • An instruction consists of an opcode and one or more operands, implicitly or explicitly. • Each explicit operand is referenced using one of the addressing mode that is available for that machine. • An instruction format is used to define the layout of the bits allocated to these elements of instructions. • Some of issues effecting instruction design are: 1. Instruction Length 2. Allocation of bits for different fields in an instruction 3. Variable length instruction
  • 29. 1. Instruction Length • A longer instruction means more time in fetching an instruction. • For e.g. an instruction of length 32 bit on a machine with word size of 16 bit will need two memory fetch to bring the instruction. • Programmer desires: – More opcode and operands in a instruction as it will reduce the program length. – More addressing mode for greater flexibility in accessing various types of data. • A longer instruction means more time in fetching an instruction. • For e.g. an instruction of length 32 bit on a machine with word size of 16 bit will need two memory fetch to bring the instruction. • Programmer desires: – More opcode and operands in a instruction as it will reduce the program length. – More addressing mode for greater flexibility in accessing various types of data.
  • 30. Factors for deciding the instruction length: A. Memory Size – More bits are required in address field to access larger memory range. B. Memory Organization – If the system supports virtual memory then memory range is larger than the physical memory. Hence required the more number of addressing bits. C. Bus Structure – The instruction length should be equal to data bus length or multiple of it. D. Processor Speed – The data transfer rate from the memory should be equal to the processor speed. A. Memory Size – More bits are required in address field to access larger memory range. B. Memory Organization – If the system supports virtual memory then memory range is larger than the physical memory. Hence required the more number of addressing bits. C. Bus Structure – The instruction length should be equal to data bus length or multiple of it. D. Processor Speed – The data transfer rate from the memory should be equal to the processor speed.
  • 31. 2. Allocation of Bits • More opcodes obviously mean more bits in the opcode field. • Factors which are considered for selection of addressing bits are: A. Number of Addressing modes: • More addressing modes, more bits will be needed. B. Number of Operands: • More operands – more number of bits needed C. Register versus memory: • If more and more registers can be used for operand reference then the fewer bits are needed • As number of register are far less than memory size. • More opcodes obviously mean more bits in the opcode field. • Factors which are considered for selection of addressing bits are: A. Number of Addressing modes: • More addressing modes, more bits will be needed. B. Number of Operands: • More operands – more number of bits needed C. Register versus memory: • If more and more registers can be used for operand reference then the fewer bits are needed • As number of register are far less than memory size.
  • 32. D. Number of Register Sets: • Assume that A machine has 16 general purpose registers, a register address require 4 bits. • However if these 16 registers are divided into two groups, then one of the 8 register of a group will need 3 bits for register addressing. E. Address Range: • The range of addresses that can be referenced is related to the number of address bits. • With displacement addressing, the range is opened up to the length of the address register. F. Address Granularity: • In a system with 16- or 32-bit words, an address can reference a word or a byte at the designer's choice. 2. Allocation of Bits D. Number of Register Sets: • Assume that A machine has 16 general purpose registers, a register address require 4 bits. • However if these 16 registers are divided into two groups, then one of the 8 register of a group will need 3 bits for register addressing. E. Address Range: • The range of addresses that can be referenced is related to the number of address bits. • With displacement addressing, the range is opened up to the length of the address register. F. Address Granularity: • In a system with 16- or 32-bit words, an address can reference a word or a byte at the designer's choice.
  • 33. 3. Variable length Instruction • Instead of looking for fixed length instruction format, designer may choose to provide a variety of instructions formats of different lengths. • Addressing can be more flexible, with various combinations of register and memory references plus addressing modes. • Disadvantage: an increase in the complexity of the CPU. • Instead of looking for fixed length instruction format, designer may choose to provide a variety of instructions formats of different lengths. • Addressing can be more flexible, with various combinations of register and memory references plus addressing modes. • Disadvantage: an increase in the complexity of the CPU.
  • 34. Concept of Program Execution • The instructions constituting a program to be executed by a computer are loaded in sequential locations in its main memory. • Processor fetches one instruction at a time and perform the operation specified. • Instructions are fetched from successive memory locations until a branch or a jump instruction is encountered. • Processor keeps track of the address of the memory location containing the next instruction to be fetched using Program Counter (PC). • Instruction Register (IR) • The instructions constituting a program to be executed by a computer are loaded in sequential locations in its main memory. • Processor fetches one instruction at a time and perform the operation specified. • Instructions are fetched from successive memory locations until a branch or a jump instruction is encountered. • Processor keeps track of the address of the memory location containing the next instruction to be fetched using Program Counter (PC). • Instruction Register (IR)
• 35. Executing an Instruction
1. Fetch the contents of the memory location pointed to by the PC. The contents of this location are loaded into the IR (fetch phase). IR ← [[PC]]
2. Assuming that the memory is byte-addressable, increment the contents of the PC by 4 (fetch phase). PC ← [PC] + 4
3. Carry out the actions specified by the instruction in the IR (execution phase).
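The loop below is a minimal sketch of these three steps, assuming a toy byte-addressable memory modelled as a dictionary of 4-byte instruction words; the helper decode_and_execute and the instruction names are hypothetical placeholders, not any real instruction set.

```python
# Minimal sketch of the fetch/increment/execute steps above (illustrative only).
# Assumes a byte-addressable memory in which every instruction occupies 4 bytes.

def decode_and_execute(ir, pc):
    # A real implementation would decode the opcode and operands here;
    # this sketch simply falls through to the next instruction.
    return pc

def run(memory, pc, steps):
    """memory: dict mapping addresses to instruction words."""
    for _ in range(steps):
        ir = memory[pc]                      # 1. Fetch: IR <- [[PC]]
        pc = pc + 4                          # 2. Increment: PC <- [PC] + 4
        pc = decode_and_execute(ir, pc)      # 3. Execute; a branch would overwrite PC
    return pc

memory = {0: "ADD", 4: "SUB", 8: "HALT"}     # hypothetical instruction words
print(run(memory, pc=0, steps=3))            # 12
```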
• 36. Addressing Modes
• The term addressing mode refers to the mechanism used to specify operands.
• An operand can be given as part of the instruction, or a reference to a memory location can be given.
• An operand can also be held in a CPU register.
• The most common addressing techniques are:  Immediate  Direct  Indirect  Register  Register Indirect  Displacement  Stack
• 37. Addressing Modes
To explain the addressing modes, we use the following notation:
A = contents of an address field in the instruction that refers to a memory location
R = contents of an address field in the instruction that refers to a register
EA = actual (effective) address of the location containing the referenced operand
(X) = contents of memory location X or register X
• 38. 1. Immediate Addressing
• The operand is actually present in the instruction: OPERAND = A
• This mode can be used to define and use constants or to set initial values of variables.
• The advantage of immediate addressing is that no memory reference other than the instruction fetch is required to obtain the operand.
• e.g. MOVE R0, 300 (the constant 300 itself is the operand)
• 39. 2. Direct Addressing
• The address field contains the effective address of the operand: EA = A
• It requires only one memory reference and no special calculation.
• Here, A is the memory address field of the instruction.
• e.g. MOVE R1, 1001
• 40. 3. Indirect Addressing
• The effective address of the operand is stored in memory; the instruction contains the address of the memory location that holds the address of the data.
• This is known as indirect addressing: EA = (A)
• Here, A is the memory address field of the instruction.
• e.g. MOVE R0, (1000)
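A small illustration of the difference between direct and indirect addressing, sketched with made-up addresses and values (memory is modelled as a plain Python list):

```python
# Direct vs. indirect effective-address computation (illustrative values only).
memory = [0] * 2048
memory[1001] = 42        # operand used by direct addressing
memory[1000] = 1001      # pointer used by indirect addressing

A = 1001
ea_direct = A                       # Direct:   EA = A
operand_direct = memory[ea_direct]

A = 1000
ea_indirect = memory[A]             # Indirect: EA = (A) -- one extra memory access
operand_indirect = memory[ea_indirect]

print(operand_direct, operand_indirect)   # 42 42
```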
• 41. 4. Register Addressing
• The instruction specifies the register containing the operand.
• The instruction contains the name of a CPU register: EA = R, the register in which the operand is present.
• e.g. MOVE R1, R2
• 42. 5. Register Indirect Addressing
• The effective address of the operand is held in a register; the instruction contains the address of the register that holds the address of the data: EA = (R)
• Here, R is the register that holds the memory address of the operand.
• e.g. MOVE R0, (R1)
• 43. 6. Displacement Addressing
• A combination of direct addressing and register indirect addressing: EA = A + (R)
• The value contained in one address field (value = A) is used directly. The other address field refers to a register whose contents are added to A to produce the effective address.
• 44. 6. Displacement Addressing
Three of the most common uses of displacement addressing are:
 Relative addressing
 Base-register addressing
 Indexing
• 45. Relative Addressing
• For relative addressing, the implicitly referenced register is the program counter (PC).
• The current instruction address is added to the address field to produce the EA.
• Thus, the effective address is a displacement relative to the address of the instruction.
• e.g. the instruction at address 1001 is JC X1 and the branch target X1: ADD R1, 5 is at address 1050; the offset encoded in the instruction is X1 = target address − current instruction address = 1050 − 1001 = 49.
• 46. Base-Register Addressing
• The base register (reference register) contains a memory address, and the address field contains a displacement from that base address: EA = A + (B)
• 47. Indexing or Indexed Addressing
• Used to access elements of an array stored in consecutive memory locations: EA = A + (R)
• The address field A gives the starting (base) address in main memory, and R contains a positive displacement (index) with respect to that base address.
• The displacement can be specified either directly in the instruction or through another register.
• e.g. MOVE R1, (BR+5)   MOVE R0, (BR+R1)
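A minimal sketch of indexed addressing used to step through an array, assuming an illustrative base address of 1500 and five elements:

```python
# Indexed addressing: EA = A + (R), where A is the array's base address and the
# register R holds the index. All addresses and values are illustrative.
memory = {1500 + i: (i + 1) * 10 for i in range(5)}   # 5-element array at base 1500

A = 1500                      # base address encoded in the instruction
total = 0
for R in range(5):            # R plays the role of the index register
    ea = A + R                # EA = A + (R)
    total += memory[ea]       # fetch the element and accumulate
print(total)                  # 10 + 20 + 30 + 40 + 50 = 150
```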
• 48. Auto-Indexing
• Index registers are generally used for iterative tasks, so it is typical to need to increment or decrement the index register after each reference to it.
• Because this is such a common operation, some systems do it automatically as part of the same instruction cycle.
• This is known as auto-indexing.
• Two types of auto-indexing: 1. auto-incrementing 2. auto-decrementing.
• 49. a. Auto-Increment Mode
• Register R contains the address of the operand.
• After accessing the operand, the contents of register R are incremented to point to the next item in the list.
• Auto-indexing using increment can be written as: EA = A + (R) or EA = (R), followed by (R) ← (R) + 1; in assembly notation the operand is written (R)+.
• e.g. MOVE R1, 1010 /* the starting memory address 1010 is placed in R1 */
  ADD AC, (R1)+ /* the contents of memory location 1010 are added to AC, and the contents of R1 are incremented by 1 */
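The MOVE/ADD example above can be sketched as follows; the memory contents and register values are illustrative only:

```python
# Register-indirect addressing with auto-increment, mirroring the example above.
memory = {1010: 7, 1011: 8, 1012: 9}

R1 = 1010                 # MOVE R1, 1010  -- starting address placed in R1
AC = 0
for _ in range(3):        # three ADD AC, (R1)+ operations
    AC += memory[R1]      # EA = (R1); operand fetched and added to AC
    R1 += 1               # auto-increment: (R1) <- (R1) + 1
print(AC, R1)             # 24 1013
```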
• 50. b. Auto-Decrement Mode
• The contents of the register specified in the instruction are decremented first, and the decremented contents are then used as the effective address of the operand.
• Auto-indexing using decrement can be written as: (R) ← (R) − 1, then EA = A + (R) or EA = (R); in assembly notation the operand is written −(R).
• e.g. ADD R1, −(R2)
• 51. 7. Stack Addressing
• A stack is a linear array or list of locations, sometimes referred to as a pushdown list or last-in-first-out queue.
• Associated with the stack is a pointer whose value is the address of the top of the stack.
• The stack pointer is maintained in a register; thus, references to stack locations in memory are in fact register indirect addresses.
• The stack mode of addressing is a form of implied addressing.
• e.g. PUSH and POP
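A minimal sketch of PUSH and POP working through a stack pointer, assuming a small memory array and a stack that grows toward lower addresses (both assumptions are illustrative):

```python
# Stack addressing sketch: PUSH and POP implicitly use the stack pointer (SP),
# which indexes a small memory array. A full-descending stack is assumed.
memory = [0] * 16
SP = 16                   # stack grows toward lower addresses

def push(value):
    global SP
    SP -= 1               # pre-decrement SP
    memory[SP] = value    # store at the new top of stack

def pop():
    global SP
    value = memory[SP]    # read the top of stack
    SP += 1               # post-increment SP
    return value

push(5); push(9)
print(pop(), pop())       # 9 5
```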
• 52. Basic Instruction Cycle
• The fetch cycle involves reading the next instruction from memory into the CPU and, along with that, updating the contents of the program counter.
• In the execution phase, the processor interprets the opcode and performs the indicated operation.
• The instruction fetch and execution phases together are known as the instruction cycle.
• 53. Basic Instruction Cycle with Interrupt
An instruction cycle includes the following subcycles:
1. Fetch: read the next instruction from memory into the processor.
2. Execute: interpret the opcode and perform the indicated operation.
3. Interrupt: if interrupts are enabled and an interrupt has occurred, save the current process state and service the interrupt.
  • 54. Basic Instruction Cycle with Interrupt
• 55. The Indirect Cycle
• The execution of an instruction may involve one or more operands in memory, each of which requires a memory access.
• Further, if indirect addressing is used, additional memory accesses are required.
• Fetching these indirect addresses can be viewed as one more instruction subcycle.
• After an instruction is fetched, it is examined to determine whether any indirect addressing is involved. If so, the required operands are fetched using indirect addressing.
• 58.
• Instruction address calculation (iac): determine the address of the next instruction to be executed. Usually this involves adding a fixed number to the address of the previous instruction.
• Instruction fetch (if): read the instruction from its memory location into the processor.
• Instruction operation decoding (iod): analyze the instruction to determine the type of operation to be performed and the operand(s) to be used.
• Operand address calculation (oac): if the operation involves a reference to an operand in memory or available via I/O, determine the address of the operand.
• Operand fetch (of): fetch the operand from memory or read it in from I/O.
• Data operation (do): perform the operation indicated in the instruction.
• Operand store (os): write the result into memory or out to I/O.
• 59. Instruction Interpretation and Sequencing
• Every processor has some basic types of instructions, such as data transfer, arithmetic and logical, and branch instructions.
• To perform a particular task on the computer, it is the programmer's job to select and write appropriate instructions one after the other. This job is known as instruction sequencing.
• Two cases: 1. Straight-line sequencing 2. Branch instructions
• 60. 1. Straight-Line Sequencing
• The processor executes a program with the help of the Program Counter (PC), which holds the address of the next instruction to be executed.
• To begin execution of a program, the address of its first instruction is placed into the PC.
• The processor control circuit fetches the instruction from the memory address specified by the PC and executes instructions one at a time.
• At the same time, the contents of the PC are incremented so as to point to the address of the next instruction.
• This is called straight-line sequencing.
• 61. 2. Branch Instructions
• After executing a decision-making instruction, the processor has to follow one of two program sequences.
• A branch instruction transfers program control from one straight-line sequence of instructions to another.
• In a branch instruction, the new address, called the target address or branch target, is loaded into the PC, and the next instruction is fetched from that address.
  • 62. Figure 7.1. Single-bus organization of the datapath inside a processor.
• 63. Register Transfers — Figure 7.2. Input and output gating for the registers in Figure 7.1.
• 64. Register Transfers
• The data transfer between registers and the common bus is shown by a line with arrowheads.
• In practice, each register has input and output gating, and these gates are controlled by corresponding control signals.
• Riin and Riout: control signals for the input and output gating of register Ri.
• When Riin = 1, the data available on the common data bus is loaded into register Ri.
• When Riout = 1, the contents of Ri are placed on the common data bus.
• The signals Riin and Riout are commonly known as the input-enable and output-enable signals of the register, respectively.
• 65. Register Transfers
• Consider the transfer of data from register R1 to R2. This can be done as follows:
1. Activate R1out = 1; this signal places the contents of register R1 on the common bus.
2. Activate R2in = 1; this loads the data from the common data bus into register R2.
• 66. Micro-Operations
• The primary function of a processor unit is to execute a sequence of instructions stored in memory.
• The sequence of operations involved in processing an instruction constitutes an instruction cycle, which can be divided into three major phases: the fetch cycle, the decode cycle and the execute cycle.
• To perform the fetch, decode and execute cycles, the processor has to perform a set of operations called micro-operations.
• 67. Micro-operations include:
• Transfer a word of data from one CPU register to another or to the ALU.
• Perform an arithmetic or a logic operation on data from CPU registers and store the result in a CPU register.
• Fetch the contents of a given memory location and load them into a CPU register.
• Store a word of data from a CPU register into a given memory location.
• 68. 1. Performing an Arithmetic or Logic Operation
• The ALU is a combinational circuit that has no internal storage.
• The ALU gets its two operands from the MUX and the bus. The result is temporarily stored in register Z.
• What is the sequence of operations to add the contents of register R1 to those of R2 and store the result in R3?
Control Signals:
1. R1out, Yin
2. R2out, Select Y, Add, Zin
3. Zout, R3in
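The three control steps can be mimicked in a toy simulation; the register dictionary and the `bus` variable below are illustrative stand-ins for the single-bus datapath of Figure 7.2, not a faithful hardware model:

```python
# Toy simulation of the three control steps for R3 <- R1 + R2
# on the single-bus organization (register names follow Figure 7.2).
regs = {"R1": 5, "R2": 7, "R3": 0, "Y": 0, "Z": 0}

# Step 1: R1out, Yin                -- R1 drives the bus, Y latches it
bus = regs["R1"]
regs["Y"] = bus

# Step 2: R2out, Select Y, Add, Zin -- R2 drives the bus, ALU adds Y and the bus
bus = regs["R2"]
regs["Z"] = regs["Y"] + bus

# Step 3: Zout, R3in                -- Z drives the bus, R3 latches the result
bus = regs["Z"]
regs["R3"] = bus

print(regs["R3"])   # 12
```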
• 69. 2. Fetching a Word from Memory
• The processor loads the required address into the MAR; at the same time it issues the Read signal.
• When the requested data is received from memory, it is stored in the MDR.
• 70. 2. Fetching a Word from Memory
• Consider the instruction MOVE R2, (R1).
• The processor waits until it receives an indication that the requested operation has been completed (Memory-Function-Completed, MFC).
• The actions needed to execute this instruction are:
 MAR ← [R1]
 Activate the Read control signal to perform the read operation
 If memory is slow, wait for memory function completed (WMFC)
 Load the MDR from the memory bus
 R2 ← [MDR]
• 71. 2. Fetching a Word from Memory
• The control signals that must be activated to perform the action in each step:
Control Signals:
1. R1out, MARin, Read
2. WMFC
3. MDRout, R2in
• 72. 3. Storing a Word in Memory
 To write a word of data into a memory location, the processor has to load the address of the desired memory location into the MAR, load the data to be written into the MDR, and activate the Write control signal.
 Consider the instruction MOVE (R2), R1.
 The actions needed to execute this instruction are:
 MAR ← [R2]
 MDR ← [R1]
 Activate the Write control signal to perform the write operation
 If memory is slow, wait for memory function completed (WMFC)
• 73. 3. Storing a Word in Memory
 The control signals that must be activated to perform the action in each step:
Control Signals:
1. R2out, MARin
2. R1out, MDRin, Write
3. WMFC
• 74. Execution of a Complete Instruction
• ADD R1, (R2): this instruction adds the contents of register R1 and the contents of the memory location specified by register R2, and stores the result in register R1.
1. Fetch the instruction
2. Fetch the second operand (the contents of the memory location pointed to by R2)
3. Perform the addition
4. Load the result into R1
• 75. Execution of a Complete Instruction: ADD R1, (R2)
Control Signals:
1. PCout, MARin, Read, Select4, Add, Zin
2. Zout, PCin, Yin, WMFC
3. MDRout, IRin
4. R2out, MARin, Read
5. MDRout, Yin, WMFC
6. R1out, Select Y, Add, Zin
7. Zout, R1in, End
Micro-Operations:
Instruction Fetch (1–3): MAR ← PC; MDR ← M(MAR); PC ← PC + 4; IR ← MDR (opcode)
Operand Fetch (4): MAR ← R2; MDR ← M(MAR)
Execute Cycle (5–7): Y ← MDR; Z ← R1 + Y; R1 ← Z
• 76. Execution of Branch Instructions
• A branch instruction replaces the contents of the PC with the branch target address, which is usually obtained by adding an offset X given in the branch instruction.
• The offset X is usually the difference between the branch target address and the address immediately following the branch instruction.
• 77. Execution of Branch Instructions
The control sequence for an unconditional branch instruction is as follows:
1. PCout, MARin, Read, Select4, Add, Zin
2. Zout, PCin, Yin, WMFC
3. MDRout, IRin
(The first three steps constitute the opcode fetch operation.)
4. Offset_field_of_IRout, Select Y, Add, Zin
(The contents of the PC and the offset field of the IR are added, and the result is stored in register Z.)
5. Zout, PCin, End
• 78. Execution of Branch Instructions
 In the case of a conditional branch instruction, the status of the condition code specified in the instruction is checked.
 If the status specified in the instruction matches the current status of the condition codes, the branch target address is loaded into the PC by adding the offset specified in the instruction to the contents of the PC.
 Otherwise, the processor fetches the next instruction in sequence.
• 79. Quiz
• Write the control sequence with micro-operations for ADD R1, R2, including the instruction fetch phase (assume the single-bus architecture).
• Write the control sequence for SUB (R3)+, R1.
Hint: R1 = R1 − (R3); R3 = R3 + 1
(Refer to Figure 7.1, the single-bus organization of the datapath inside a processor.)
• 80. ADD R1, R2
Control Signals:
1. PCout, MARin, Read, Select4, Add, Zin
2. Zout, PCin, Yin, WMFC
3. MDRout, IRin
4. R1out, Yin
5. R2out, Select Y, Add, Zin
6. Zout, R1in, End
Micro-Operations:
Instruction Fetch (1–3): MAR ← PC; MDR ← M(MAR); PC ← PC + 4; IR ← MDR (opcode)
Operand Fetch: not required
Execute Cycle (4–6): Y ← R1; Z ← R2 + Y; R1 ← Z
• 81. SUB (R3)+, R1
Control Signals:
1. PCout, MARin, Read, Select4, Add, Zin
2. Zout, PCin, Yin, WMFC
3. MDRout, IRin
4. R3out, MARin, Read
5. MDRout, Yin, WMFC
6. R1out, Select Y, Sub, Zin
7. Zout, R1in
8. R3out, Select4, Add, Zin
9. Zout, R3in, End
Micro-Operations:
Instruction Fetch (1–3): MAR ← PC; MDR ← M(MAR); PC ← PC + 4; IR ← MDR (opcode)
Operand Fetch (4): MAR ← R3; MDR ← M(MAR)
Execute Cycle (5–9): Y ← MDR; Z ← R1 − Y; R1 ← Z; R3 ← R3 + 1
• 82. Arithmetic Logic Unit (ALU)
 In this and the next section, we deal with the detailed design of typical ALUs and shifters.
 Decompose the ALU into:
• An arithmetic circuit
• A logic circuit
• A selector to pick between the two circuits
 Arithmetic circuit design — decompose the arithmetic circuit into:
 An n-bit parallel adder
 A block of logic that selects one of four choices for the B input to the adder
• 83. Arithmetic Circuit Design
 There are only four functions of B to select as Y in G = A + Y: all 0s, B, B' (the complement of B), and all 1s.
 The B input logic feeds Y into an n-bit parallel adder, with X = A and G = X + Y + Cin. The resulting operations, selected by (S1, S0) and the carry-in Cin, are:
 Y = all 0s: Cin = 0 → G = A (transfer of A);  Cin = 1 → G = A + 1 (increment)
 Y = B:      Cin = 0 → G = A + B (addition);   Cin = 1 → G = A + B + 1
 Y = B':     Cin = 0 → G = A + B' (subtraction, 1's complement);  Cin = 1 → G = A + B' + 1 (subtraction, 2's complement)
 Y = all 1s: Cin = 0 → G = A − 1 (decrement);  Cin = 1 → G = A (transfer of A)
• 84. Logic Circuit
 The text gives a circuit implemented using a multiplexer plus gates implementing AND, OR, XOR and NOT.
 Alternatively, a custom circuit for bit Gi can be designed by starting from a truth table organized as a K-map over the logic operations and assigning (S1, S0) codes to AND, OR, etc.; Gi is then a sum-of-products expression in S1, S0, Ai and Bi.
 The custom design is better.
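A behavioural sketch of one bit stage of the MUX-based logic circuit, assuming the common code assignment 00 = AND, 01 = OR, 10 = XOR, 11 = NOT; the exact codes in the text's figure may differ, as the next slide notes:

```python
# One stage of the MUX-based logic circuit: (S1, S0) selects among AND, OR,
# XOR and NOT of the operand bits Ai and Bi. Code assignment is assumed.
def logic_stage(s1, s0, ai, bi):
    outputs = {
        (0, 0): ai & bi,        # AND
        (0, 1): ai | bi,        # OR
        (1, 0): ai ^ bi,        # XOR
        (1, 1): 1 - ai,         # NOT A
    }
    return outputs[(s1, s0)]

# Exhaustive check over one bit position:
for s1, s0 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(s1, s0, [logic_stage(s1, s0, a, b) for a in (0, 1) for b in (0, 1)])
```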
• 85. Arithmetic Logic Unit (ALU)
 The custom circuit interchanges the (S1, S0) codes for XOR and NOT compared with the MUX circuit. To preserve compatibility with the text, we use the MUX solution.
 Next, the arithmetic circuit, the logic circuit, and a 2-way multiplexer are combined to form the ALU.
 The input connections to the arithmetic circuit and the logic circuit are assigned to prepare for seamless addition of the shifter, keeping the selection codes for the combined ALU and shifter at 4 bits:
• Carry-in Ci and carry-out Ci+1 go between bit stages
• Ai and Bi are connected to both units
• A new signal S2 performs the arithmetic/logic selection
• The select signal entering the LSB of the arithmetic circuit, Cin, is connected to the least significant selection input of the logic circuit, S0.
• 86. Arithmetic Logic Unit (ALU)
 The next most significant select signals, S0 for the arithmetic circuit and S1 for the logic circuit, are wired together, completing the two select signals for the logic circuit.
 The remaining S1 completes the three select signals for the arithmetic circuit.
(Figure: one ALU bit stage — one stage of the arithmetic circuit and one stage of the logic circuit, both fed by Ai, Bi, S0 and S1, drive a 2-to-1 MUX whose select input is S2 and whose output is Gi.)
• 87. Shifters
 Required for data processing, multiplication, division, etc.
 Direction: left, right
 Number of positions, with examples:
• Single bit: 1 position; 0 and 1 positions
• Multiple bit: 1 to n − 1 positions; 0 to n − 1 positions
 Filling of vacant positions:
• Many options depending on the instruction set
• Here, input lines or zero fill are provided
• 88. 4-Bit Basic Left/Right Shifter
 Serial inputs: IR for right shift, IL for left shift
 Serial outputs: R for right shift (same as MSB input), L for left shift (same as LSB input)
 Shift functions: (S1, S0) = 00 pass B unchanged, 01 right shift, 10 left shift, 11 unused
(Figure: each output bit H3–H0 is produced by a 3-input MUX selecting among the unchanged, right-shifted and left-shifted versions of B3–B0.)
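A behavioural sketch of this shifter, assuming the (S1, S0) encoding listed above; B is written MSB-first as [B3, B2, B1, B0], and IR/IL are the serial inputs:

```python
# 4-bit combinational shifter: 00 passes B unchanged, 01 shifts right
# (IR fills the vacated MSB), 10 shifts left (IL fills the vacated LSB).
def shifter(s1, s0, B, IR=0, IL=0):
    B3, B2, B1, B0 = B
    if (s1, s0) == (0, 0):
        return [B3, B2, B1, B0]          # pass unchanged
    if (s1, s0) == (0, 1):
        return [IR, B3, B2, B1]          # right shift; B0 goes to serial output R
    if (s1, s0) == (1, 0):
        return [B2, B1, B0, IL]          # left shift; B3 goes to serial output L
    raise ValueError("selection 11 is unused")

print(shifter(0, 1, [1, 0, 1, 1]))         # [0, 1, 0, 1]
print(shifter(1, 0, [1, 0, 1, 1], IL=1))   # [0, 1, 1, 1]
```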
• 89. What is Pipelining?
 A technique used in advanced microprocessors whereby the microprocessor begins executing a second instruction before the first has been completed.
 A pipeline is a series of stages, where some work is done at each stage. The work is not finished until it has passed through all stages.
 With pipelining, the computer architecture allows the next instructions to be fetched while the processor is performing arithmetic operations, holding them in a buffer close to the processor until each instruction operation can be performed.
• 90. How a Pipeline Works
 The pipeline is divided into segments, and each segment can execute its operation concurrently with the other segments. Once a segment completes an operation, it passes the result to the next segment in the pipeline and fetches the next operation from the preceding segment.
• 93. Advantages/Disadvantages
 Advantages:
• More efficient use of the processor
• Quicker execution of a large number of instructions
 Disadvantages:
• Pipelining involves adding hardware to the chip
• Inability to continuously run the pipeline at full speed because of pipeline hazards, which disrupt the smooth execution of the pipeline
• 94. Instruction Pipelining
 Pipelining can occur in the instruction stream as well as in the data stream.
 Consecutive instructions are read from memory while previous instructions are executed in various pipeline stages.
 Difficulties:
• Different execution times for different pipeline stages
• Some instructions may skip some of the stages, e.g. no effective-address calculation is needed with immediate or register addressing
• Two or more stages may require access to memory at the same time, e.g. instruction fetch and operand fetch at the same time
• 95. Pipeline Stages
 Consider the following decomposition for processing instructions:
 Fetch instruction (FI) – read the instruction into a buffer
 Decode instruction (DI) – determine the opcode and operands
 Calculate operands (CO) – indirect, register indirect, etc.
 Fetch operands (FO) – fetch the operands from memory
 Execute instruction (EI) – execute
 Write operand (WO) – store the result, if applicable
 Overlap these operations to make a 6-stage pipeline.
• 96. Timing of the Instruction Pipeline – Six Stages
Reduction in instruction execution time from 54 time units to 14 time units (for a sequence of nine instructions through the six stages).
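These numbers follow from a simple timing model: an ideal k-stage pipeline completes n instructions in k + (n − 1) cycles, versus n·k cycles without pipelining. A quick check, assuming the figure's nine instructions and six stages:

```python
# Idealized pipeline timing model (ignores hazards and stalls).
def unpipelined_time(n, k):
    return n * k              # each instruction uses all k stages sequentially

def pipelined_time(n, k):
    return k + (n - 1)        # first instruction fills the pipe, then one per cycle

n, k = 9, 6
print(unpipelined_time(n, k), pipelined_time(n, k))   # 54 14
```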
• 97. Data Dependencies
• Occur when two instructions access the same register.
• RAW: Read-After-Write – true dependency
• WAR: Write-After-Read – anti-dependency
• WAW: Write-After-Write – false dependency
• The key problem with regular in-order pipelines is RAW.
• 98. Pipeline Hazards
 Structural hazard, or resource conflict – two instructions need to access the same resource at the same time.
 Data hazard, or data dependency – an instruction uses the result of a previous instruction before it is ready.
 A hazard occurs exactly when an instruction tries to read a register in its ID stage that an earlier instruction intends to write in its WO stage.
 Control hazard, or branch difficulty – the location of the next instruction depends on a previous instruction.
 Conditional branches break the pipeline: what was fetched in advance is useless if the branch is taken.
 A pipeline implementation needs to detect and resolve hazards.
• 99. Structural Hazard
• Occurs when, for example, the IF stage requires a memory access for an instruction fetch and the MEM stage needs a memory access for an operand fetch at the same time.
• Solutions:
 Stalling (waiting/delaying) – the conflicting instruction is delayed by one clock cycle
 Split cache – separate caches for instructions (code cache) and operands (data cache)
• 100. Data Hazard
• Data hazards occur when data is used before it is ready.
Read After Write (RAW): InstrJ tries to read an operand before InstrI writes it.
• Caused by a "dependence" (in compiler nomenclature); this hazard results from an actual need for communication.
Execution order: InstrI then InstrJ
I: add r1,r2,r3
J: sub r4,r1,r3
• 101. Data Hazard (cont.)
Write After Read (WAR): InstrJ tries to write an operand before InstrI reads it.
– Gets the wrong operand.
– Called an "anti-dependence" by compiler writers; it results from reuse of the name "r1".
Execution order: InstrI then InstrJ
I: sub r4,r1,r3
J: add r1,r2,r3
K: mul r6,r1,r7
• 102. Data Hazard (cont.)
Write After Write (WAW): InstrJ tries to write an operand before InstrI writes it.
– Leaves the wrong result (InstrI's, not InstrJ's).
• Called an "output dependence" by compiler writers; this also results from reuse of the name "r1".
Execution order: InstrI then InstrJ
I: sub r1,r4,r3
J: add r1,r2,r3
K: mul r6,r1,r7
• 103. Data Hazard (cont.)
• Solutions for data hazards:
– Stalling
– Forwarding: connect the new value directly to the next stage
– Reordering
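As a rough illustration of how a pipeline decides that it must stall or forward, the sketch below compares the registers written by an older instruction with those read by a newer one; the instruction records themselves are purely illustrative:

```python
# RAW-hazard detection between two instructions, each described by the
# registers it reads and writes (illustrative encoding, not a real ISA).
def raw_hazard(older, newer):
    """True if `newer` reads a register that `older` writes (read-after-write)."""
    return bool(set(older["writes"]) & set(newer["reads"]))

i1 = {"op": "add", "writes": ["r1"], "reads": ["r2", "r3"]}   # add r1, r2, r3
i2 = {"op": "sub", "writes": ["r4"], "reads": ["r1", "r3"]}   # sub r4, r1, r3

if raw_hazard(i1, i2):
    # A real pipeline would either stall i2 or forward r1 from i1's later stage.
    print("RAW hazard on", set(i1["writes"]) & set(i2["reads"]))
```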
• 104. Data Hazard – Stalling
(Pipeline diagram: add $s0,$t0,$t1 followed by sub $t2,$s0,$t3. The sub is stalled, with bubbles inserted, so that $s0 is written back by the add before the sub reads it.)
• 105. Data Hazard – Forwarding
• Key idea: connect the new value directly to the next stage.
• $s0 is still read, but the stale value is ignored in favor of the new result.
(Pipeline diagram: the new value of $s0 produced by add $s0,$t0,$t1 is routed directly to sub $t2,$s0,$t3, so no stall is needed.)
• 106. Data Hazard
This is another representation of the stall:
LW R1, 0(R2)    IF ID EX MEM WB
SUB R4, R1, R5     IF ID stall EX MEM WB
AND R6, R1, R7        IF stall ID EX MEM WB
OR R8, R1, R9            stall IF ID EX MEM WB
• 107. Data Hazard – Reordering
• Consider a program segment:
1. t = 5
2. x = y + z
3. p = x + 1
• Instruction 3 depends on instruction 2, but instructions 2 and 3 have no dependency on instruction 1.
• After reordering, the program segment becomes:
1. x = y + z
2. t = 5
3. p = x + 1
(Pipeline diagrams: before reordering, instruction 3 must wait for the result of instruction 2; after reordering, the independent instruction t = 5 fills the gap and the pipelined execution proceeds without stalling.)
• 108. Control Hazard
• Caused by branch instructions – unconditional and conditional branches.
 Unconditional – the branch is always taken.
 Conditional – the branch may or may not be taken.
• In a pipelined processor, the following actions are critical:
 Timely detection of a branch instruction
 Early calculation of the branch address
 Early testing of the branch condition for conditional branch instructions
  • 109. Control Hazard (cont.) • Branch Not Taken
  • 110. Control Hazard (cont.) • Branch in a Pipeline – Flushed Pipeline
• 111. Dealing with Branches
• Delayed branches
• Branch prediction
• Multiple streams
• Prefetch branch target
• Loop buffer
• 112. Delayed Branch
 Delayed branching is used with RISC machines.
 It requires some clever rearrangement of instructions – a burden on programmers, but it can increase performance.
 Most RISC machines don't flush the pipeline in case of a branch; this is called the delayed branch.
 This means that if we take a branch, we still continue to execute whatever is currently in the pipeline, at a minimum the next instruction.
 Benefit: simplifies the hardware quite a bit.
 But we need to make sure it is safe to execute the remaining instructions in the pipeline.
 A simple solution that gets the same behavior as a flushed pipeline: insert NOP (no-operation) instructions after a branch.
 The instruction slot after the branch is called the delay slot.
• 113. Delayed Branch (cont.)
 Normal vs. delayed branch.
 One delay slot – the next instruction is always in the pipeline. The "normal" path contains an implicit "NOP" instruction as the pipeline gets flushed; the delayed branch requires an explicit NOP instruction placed in the code!
• 114. Delayed Branch (cont.)
 Optimized delayed branch.
 We can optimize this code by rearrangement: since we always add 1 to A, that instruction can be used to fill the delay slot.
• 115. Branch Prediction
 Predict never taken:
 Assume that the jump will not happen; always fetch the next instruction.
 Used by the 68020 & VAX 11/780 (the VAX will not prefetch after a branch if a page fault would result).
 Predict always taken:
 Assume that the jump will happen; always fetch the target instruction.
 Studies indicate branches are taken around 60% of the time in most programs.
• 116. Branch Prediction (cont.)
 Predict by opcode:
 Some types of branch instructions are more likely to result in a jump than others (e.g. LOOP vs. JUMP).
 Can achieve up to about 75% success.
 Taken/not-taken switch – 1-bit branch predictor:
 Based on previous history: if a branch was taken last time, predict it will be taken again; if it was not taken last time, predict it will not be taken again.
 Good for loops.
 A single bit can record the history of the previous result; this bit must somehow be stored with each branch instruction.
 More bits could be used to remember a more elaborate history.
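A behavioural sketch of the 1-bit (taken/not-taken switch) predictor described above, with a made-up branch address and outcome sequence:

```python
# 1-bit branch predictor: each branch remembers its last outcome and predicts
# the same outcome next time. Addresses and outcomes are illustrative.
class OneBitPredictor:
    def __init__(self):
        self.history = {}                              # branch address -> last outcome

    def predict(self, branch_addr):
        return self.history.get(branch_addr, False)    # default: predict not taken

    def update(self, branch_addr, taken):
        self.history[branch_addr] = taken

p = OneBitPredictor()
outcomes = [True, True, True, False, True]             # e.g. a short loop
correct = 0
for taken in outcomes:
    correct += (p.predict(0x400) == taken)
    p.update(0x400, taken)
print(correct, "of", len(outcomes), "predicted correctly")
```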
• 117. Performance Measures
• The most important measure of the performance of a computer is how quickly it can execute programs.
• The performance of a computer is affected by the design of its hardware, the compiler and its machine language instructions, the instruction set, the implementation language, etc.
• The computer user is always interested in reducing the execution time, also referred to as the response time.
• A reduction in response time increases the throughput.
• Throughput: the total amount of work done in a given time.
• 118. 1. The System Clock
• Processor circuits are controlled by a timing signal called a clock.
• The clock defines regular time intervals, called clock cycles.
• To execute a machine instruction, the processor divides the action to be performed into a sequence of basic steps, such that each step can be completed in one clock cycle.
• The constant cycle time (in nanoseconds) is denoted by t.
• The clock rate is given by f = 1/t, measured in cycles per second (CPS); the electrical unit for this measurement is the hertz (Hz).
• For example, a cycle time of t = 2 ns gives a clock rate of f = 1/(2 × 10⁻⁹ s) = 500 MHz.
• 121. 3. Speedup
• The increase in speed of a parallel system compared with a uniprocessor system: Speedup(N) = T(1) / T(N)
• where T(N) is the execution time of the program running on N processors, and T(1) is the time taken by the best serial implementation of the program measured on one processor.
• 122. 4. Efficiency
• Efficiency(N) = Speedup(N) / N, where Speedup(N) is the speedup measured on N processors.
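A small worked example of these two measures, using made-up timings:

```python
# Speedup and efficiency from measured execution times (illustrative numbers).
def speedup(t1, tn):
    return t1 / tn               # Speedup(N) = T(1) / T(N)

def efficiency(t1, tn, n):
    return speedup(t1, tn) / n   # Efficiency(N) = Speedup(N) / N

T1, T8 = 120.0, 20.0             # seconds on 1 processor and on 8 processors
print(speedup(T1, T8))           # 6.0
print(efficiency(T1, T8, 8))     # 0.75
```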
  • 124. 6. Amdahl’s Law • Gene Amdahl [AMDA67] • Potential speed up of program using multiple processors • Concluded that: —Code needs to be parallelizable —Speed up is bound, giving diminishing returns for more processors • Task dependent —Servers gain by maintaining multiple connections on multiple processors —Databases can be split into parallel tasks • Gene Amdahl [AMDA67] • Potential speed up of program using multiple processors • Concluded that: —Code needs to be parallelizable —Speed up is bound, giving diminishing returns for more processors • Task dependent —Servers gain by maintaining multiple connections on multiple processors —Databases can be split into parallel tasks
  • 126. 6. Amdahl’s Law • For program running on single processor —Fraction f of code infinitely parallelizable with no scheduling overhead —Fraction (1-f) of code inherently serial —T is total execution time for program on single processor —N is number of processors that fully exploit parallel portions of code • Conclusions —f small, parallel processors has little effect —N ->∞, speedup bound by 1/(1 – f) – Diminishing returns for using more processors • For program running on single processor —Fraction f of code infinitely parallelizable with no scheduling overhead —Fraction (1-f) of code inherently serial —T is total execution time for program on single processor —N is number of processors that fully exploit parallel portions of code • Conclusions —f small, parallel processors has little effect —N ->∞, speedup bound by 1/(1 – f) – Diminishing returns for using more processors
  • 127. 6. Amdahl’s Law Fig. Speed-up vs number of processors for Amdahl’s law
  • 128. 6. Amdahl’s Law Exercise
  • 129. 6. Amdahl’s Law Exercise (?)