2. WHAT IS A DSP PROCESSOR?
• A digital signal processor (DSP) is a type of processor designed to process real-time data, such as sampled audio or sensor signals.
• DSP operations such as convolution and correlation require many multiplications of array elements.
• In real-time operation, each of these computations must be completed before the next input sample arrives.
• Most DSP algorithms involve repetitive arithmetic (multiply and add), multiple memory accesses, and heavy data flow through the CPU.
• Performing these functions efficiently requires a specialized DSP architecture.
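To see why these operations are so multiply-heavy, the sketch below computes a direct convolution and a cross-correlation as plain sums of products. The function names `convolve` and `cross_correlate` are illustrative and not from any DSP library, and the correlation covers only non-negative lags. Every pass through an inner loop is one multiply-accumulate, which is the operation DSP hardware is built to speed up.

```python
def convolve(x, h):
    """Full linear convolution y[n] = sum_k h[k] * x[n - k].

    Returns the output and the number of multiply-accumulates performed.
    """
    n_out = len(x) + len(h) - 1
    y = [0] * n_out
    macs = 0
    for n in range(n_out):
        acc = 0  # running sum, like a DSP accumulator
        for k in range(len(h)):
            if 0 <= n - k < len(x):
                acc += h[k] * x[n - k]  # one multiply-accumulate
                macs += 1
        y[n] = acc
    return y, macs


def cross_correlate(x, y):
    """Cross-correlation r[l] = sum_n x[n + l] * y[n] for lags 0 .. len(x) - len(y)."""
    r = []
    for lag in range(len(x) - len(y) + 1):
        acc = 0
        for n in range(len(y)):
            acc += x[n + lag] * y[n]  # one multiply-accumulate
        r.append(acc)
    return r


y, macs = convolve([1, 2, 3], [1, 1])
print(y, macs)
print(cross_correlate([1, 2, 3, 4], [1, 1]))
```

For a signal of length N and a filter of length M, the convolution needs N × M multiply-accumulates, so the work grows quickly with filter length.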
3. DSP BLOCKS
• The internal hardware of a digital signal processor consists of many blocks:
1. CPU
2. Arithmetic Logic Unit (ALU)
3. Accumulators
4. Barrel shifter
5. Multiplier unit
6. Compare, Select and Store Unit (CSSU)
7. Memory cache
8. DMA controller
5. DSP MEMORY ARCHITECTURE
• DSP memory architecture is of three types:
1. Von Neumann Architecture
2. Harvard Architecture
3. Super Harvard Architecture (SHARC)
6. VON NEUMANN ARCHITECTURE
• Von Neumann architecture contains a single memory and a single bus for transferring data into and out
of the central processing unit (CPU).
• Multiplying two numbers requires at least three clock cycles, one to transfer each of three binary words (the program instruction and the two operands) over the bus from memory to the CPU. We don't count the time to transfer the result back to memory, because we assume that it remains in the CPU for further manipulation (such as the sum of products in an FIR filter).
• The Von Neumann design is quite satisfactory when all of the required tasks can be executed serially, one after another.
7. HARVARD ARCHITECTURE
• Harvard architecture has separate memories for data and program instructions, with a separate bus for each. Since the buses operate independently, program instructions and data can be fetched at the same time.
• For a multiply, the instruction can arrive over the program memory bus while the operands arrive over the data memory bus, so the computation is faster than with Von Neumann architecture.
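A rough way to compare the two designs is to count how many words each bus must carry per multiply. The model below is a hypothetical sketch, not a cycle-accurate simulator: it assumes each bus moves one word per clock cycle, that separate buses work in parallel, and, for the Harvard case, that the instruction travels on the program bus while both operands travel on the data bus. It ignores computation time and pipelining.

```python
# Simplified bus-transfer model (an assumption, not a cycle-accurate simulator):
# each bus moves one word per clock cycle and separate buses run in parallel.

def cycles_per_multiply(words_per_bus):
    """Cycles needed to fetch everything for one multiply.

    Buses work in parallel, so the busiest bus sets the cycle count.
    """
    return max(words_per_bus.values())


def fir_transfer_cycles(n_taps, words_per_bus):
    """Memory-transfer cycles for an n_taps FIR filter (one multiply per tap).

    Ignores computation time and any pipelining overlap.
    """
    return n_taps * cycles_per_multiply(words_per_bus)


# One multiply needs three words: the instruction and two operands.
VON_NEUMANN = {"shared_bus": 3}              # all three words share one bus
HARVARD = {"program_bus": 1, "data_bus": 2}  # instruction vs. both operands

for name, buses in (("Von Neumann", VON_NEUMANN), ("Harvard", HARVARD)):
    print(name, cycles_per_multiply(buses), fir_transfer_cycles(100, buses))
```

Under these assumptions a 100-tap FIR filter needs about 300 transfer cycles on the Von Neumann machine and about 200 on the Harvard machine. The data bus still carries two operands per multiply, which is the bottleneck the Super Harvard design targets.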
8. SUPER HARVARD ARCHITECTURE (SHARC)
• SHARC® is a contraction of the longer term Super Harvard ARChitecture, used for a family of DSPs from Analog Devices.
• SHARC DSPs extend the Harvard architecture by adding an instruction cache and a dedicated I/O controller.
INSTRUCTION CACHE
• DSP algorithms generally spend most of their execution time in loops. This means that the same set of program instructions continually passes from program memory to the CPU. The SHARC DSPs exploit this by including an instruction cache in the CPU: a small memory that holds about 32 of the most recent program instructions. On later executions of the loop, the program instructions can be pulled from the instruction cache, leaving the program memory bus free. This means that all of the memory-to-CPU transfers for one multiply can be accomplished in a single cycle: the data sample over the data memory bus, the filter coefficient over the program memory bus, and the instruction from the instruction cache.
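The effect of the instruction cache can be sketched with a small simulation. The class below is a hypothetical model, not the SHARC's actual cache design: it assumes a fully associative cache of 32 entries with least-recently-used (LRU) replacement. For the bus count, it assumes the sample travels over the data memory bus and the filter coefficient over the program memory bus, with the instruction also needing the program bus on a cache miss.

```python
from collections import OrderedDict


class InstructionCache:
    """Hypothetical fully associative instruction cache with LRU replacement."""

    def __init__(self, capacity=32):
        self.capacity = capacity
        self.lines = OrderedDict()  # cached instruction addresses, oldest first
        self.hits = 0
        self.misses = 0

    def fetch(self, address):
        """Fetch one instruction; return True on a cache hit."""
        if address in self.lines:
            self.lines.move_to_end(address)  # mark as most recently used
            self.hits += 1
            return True
        self.misses += 1                     # must use the program memory bus
        self.lines[address] = True
        if len(self.lines) > self.capacity:
            self.lines.popitem(last=False)   # evict least recently used
        return False


def run_loop(cache, start, length, iterations):
    """Fetch a loop body of `length` instructions `iterations` times."""
    for _ in range(iterations):
        for address in range(start, start + length):
            cache.fetch(address)


def sharc_cycles_per_multiply(instruction_cached):
    """Bus cycles for one multiply: sample on the data bus, coefficient on
    the program bus, plus the instruction on the program bus on a miss."""
    program_bus_words = 1 if instruction_cached else 2
    data_bus_words = 1
    return max(program_bus_words, data_bus_words)


cache = InstructionCache(capacity=32)
run_loop(cache, start=0x100, length=7, iterations=10)
print(cache.hits, cache.misses)
```

In this model a 7-instruction loop run 10 times gives 7 misses (the first pass) and 63 hits, so most multiplies need only one bus cycle. A loop body longer than the cache, say 40 instructions, misses on every fetch under LRU, which shows why such a cache helps only when the loop fits inside it.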
I/O CONTROLLER
• The SHARC DSPs provide both serial and parallel communication ports, and these are very high-speed connections. For example, at a 40 MHz clock speed, there are two serial ports that operate at 40 Mbits/second each, while six parallel ports each provide a 40 Mbytes/second data transfer. When all six parallel ports are used together, the data transfer rate is 240 Mbytes/second.
• These fast ports keep data moving into and out of the processor, so that I/O does not limit execution speed.
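The port figures above can be turned into a quick feasibility check. This sketch assumes 1 Mbyte = 10^6 bytes and ignores protocol overhead; the stream being checked (channel count, sample rate, bytes per sample) is a hypothetical application, and the function names are illustrative.

```python
MBYTE = 1_000_000  # assumption: 1 Mbyte = 10**6 bytes

SERIAL_PORTS, SERIAL_MBITS_EACH = 2, 40        # Mbits/second per serial port
PARALLEL_PORTS, PARALLEL_MBYTES_EACH = 6, 40   # Mbytes/second per parallel port


def parallel_bytes_per_second(ports_used=PARALLEL_PORTS):
    """Combined parallel-port throughput, ignoring protocol overhead."""
    return ports_used * PARALLEL_MBYTES_EACH * MBYTE


def serial_bytes_per_second():
    """Combined serial-port throughput (8 bits per byte)."""
    return SERIAL_PORTS * SERIAL_MBITS_EACH * 1_000_000 // 8


def can_stream(channels, sample_rate_hz, bytes_per_sample, ports_used=PARALLEL_PORTS):
    """True if a hypothetical multichannel stream fits the parallel ports."""
    needed = channels * sample_rate_hz * bytes_per_sample
    return needed <= parallel_bytes_per_second(ports_used)


print(parallel_bytes_per_second(), serial_bytes_per_second())
print(can_stream(8, 1_000_000, 4), can_stream(64, 1_000_000, 4))
```

For example, 8 channels sampled at 1 MHz with 4-byte samples need 32 Mbytes/second, well within the 240 Mbytes/second total, while 64 such channels would need 256 Mbytes/second and exceed it. The two serial ports together give 80 Mbits/second, or 10 Mbytes/second.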
10. PIPELINING
• Pipelining is a technique that allows two or more operations to overlap during execution.
• DSP algorithms are repetitive, which makes them well suited to pipelining.
• It ensures a steady flow of instructions to the CPU and increases system performance.
• In a three-stage pipeline, each instruction still takes three clock cycles, but in every cycle the processor is working on up to three different instructions.
• Pipelining increases the load on system memory: the number of memory accesses per clock cycle grows with the number of pipeline stages.
• In Harvard architecture, the separation of data and instruction memories promotes pipelining, because an instruction fetch and a data access can take place in the same cycle.
• It also allows better utilisation of the arithmetic unit.
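The overlap can be visualised by listing which instruction occupies which stage in each cycle. This sketch assumes an ideal three-stage pipeline (fetch, decode, execute) with no stalls or branches; real DSP pipelines may have more stages and must handle hazards. The function names are illustrative.

```python
STAGES = ("fetch", "decode", "execute")  # assumed three-stage pipeline


def pipeline_schedule(n_instructions, stages=STAGES):
    """Map each clock cycle (1-based) to the (instruction, stage) pairs active in it.

    Ideal pipeline: one instruction enters per cycle, no stalls or branches.
    """
    schedule = {}
    for i in range(n_instructions):
        for s, stage in enumerate(stages):
            cycle = i + s + 1  # instruction i reaches stage s after i + s cycles
            schedule.setdefault(cycle, []).append((i + 1, stage))
    return schedule


def total_cycles(n_instructions, n_stages, pipelined):
    """Cycles to complete n_instructions with or without pipelining."""
    if n_instructions == 0:
        return 0
    if pipelined:
        return n_instructions + n_stages - 1  # fill the pipe, then one per cycle
    return n_instructions * n_stages          # each instruction runs alone


for cycle, active in sorted(pipeline_schedule(4).items()):
    print(f"cycle {cycle}: {active}")
```

Under these assumptions, 10 instructions finish in 12 cycles instead of 30, and from cycle 3 to cycle 10 three instructions are in flight at once, each in a different stage.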
12. MULTIPLIER-ACCUMULATOR UNIT (MAC)
• DSP operations involve many time-consuming multiplications and additions.
• For fast real-time operation, a dedicated multiplier-accumulator (MAC) unit using fixed-point or floating-point arithmetic is essential.
• The MAC unit consists of a multiplier with a pair of input registers that hold its operands, and a 32-bit product register that holds the result of each multiplication.
• The output of the product register is connected to a double-precision accumulator, where the products are accumulated.
• Floating-point MACs allow fast computation with a wide dynamic range, which keeps overflow and scaling errors to a minimum.
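Why the accumulator is wider than the product register can be shown with a fixed-point sketch. It assumes 16-bit signed inputs, a 32-bit product register, and a 40-bit accumulator (32 bits plus 8 guard bits), with two's-complement wraparound and no saturation. These widths are illustrative rather than taken from a specific device, and `wrap` and `mac` are hypothetical helper names.

```python
def wrap(value, bits):
    """Two's-complement wraparound to `bits` bits (no saturation)."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def mac(samples, coeffs, acc_bits=40):
    """Sum of products with 16-bit inputs, a 32-bit product register and
    an accumulator of `acc_bits` bits."""
    acc = 0
    for x, h in zip(samples, coeffs):
        if not (-32768 <= x <= 32767 and -32768 <= h <= 32767):
            raise ValueError("inputs must fit in 16-bit signed registers")
        product = wrap(x * h, 32)            # 16 x 16 product always fits in 32 bits
        acc = wrap(acc + product, acc_bits)  # accumulate in the wide register
    return acc


print(mac([32767] * 4, [32767] * 4))               # 40-bit accumulator
print(mac([32767] * 4, [32767] * 4, acc_bits=32))  # 32-bit accumulator wraps
```

Four products of 32767 × 32767 sum to 4,294,705,156. The 40-bit accumulator holds this exactly, while a 32-bit accumulator would wrap around to -262,140.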
14. MULTIPLIER/ADDER UNIT
• The multiplier/adder block consists of several elements:
1. A multiplier, an adder and signed/unsigned input control
2. Control logic
3. A zero detector, a rounder and overflow logic
• The multiplier/adder unit performs a 17 × 17-bit multiplication and a 40-bit addition in a single instruction cycle.
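The behaviour of such a unit can be sketched in Python. The model below is loosely based on the Texas Instruments TMS320C54x multiplier/adder, and every detail is an assumption: 16-bit operands are sign-extended (signed) or zero-extended (unsigned) to 17 bits, which is why a 17 × 17-bit multiplier can handle both; rounding adds 2^15 and clears the low 16 bits; and the overflow logic optionally saturates to the 32-bit range. The function `multiply_add` and its parameters are hypothetical.

```python
ACC_BITS = 40  # assumed accumulator width


def extend17(value16, signed):
    """Extend a raw 16-bit word to a 17-bit signed operand."""
    value16 &= 0xFFFF
    if signed and value16 & 0x8000:
        return value16 - 0x10000  # sign extension
    return value16                # zero extension keeps unsigned values positive


def to_acc(value):
    """Wrap to a 40-bit two's-complement accumulator."""
    value &= (1 << ACC_BITS) - 1
    return value - (1 << ACC_BITS) if value >> (ACC_BITS - 1) else value


def multiply_add(acc, a16, b16, a_signed=True, b_signed=True,
                 round_result=False, saturate=False):
    """One multiply/add step; returns (new accumulator, zero flag)."""
    a = extend17(a16, a_signed)
    b = extend17(b16, b_signed)
    result = acc + a * b                          # 17 x 17-bit product, 40-bit addition
    if round_result:
        result = (result + (1 << 15)) & ~0xFFFF   # rounder: round, clear low 16 bits
    if saturate:                                  # overflow logic: clamp to 32-bit range
        result = max(-(1 << 31), min((1 << 31) - 1, result))
    result = to_acc(result)
    return result, result == 0                    # zero detector flag


print(multiply_add(0, 0xFFFF, 0xFFFF))
print(multiply_add(0, 0xFFFF, 0xFFFF, a_signed=False, b_signed=False))
```

For example, the raw word 0xFFFF times itself gives 1 when both operands are treated as signed (-1 × -1) but 4,294,836,225 when both are unsigned (65535 × 65535); with saturation enabled, the unsigned result is clamped to 2,147,483,647.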