Wednesday, August 29, 2007

Digital Signal Processor

* RISC for data manipulation and DSP for math processing

* Harvard Arch
Separate memories for data and program instructions, with separate buses for each.

* Super Harvard Arch
Harvard Arch + Instruction cache + I/O DMA

The filter coefficients are placed in program memory, while the input signal is kept in data memory. For one multiply-accumulate step, one input sample must pass over the data memory bus, but two values must pass over the program memory bus (the program instruction and the coefficient). The first time through a loop, the program instructions must be passed over the program memory bus. This results in slower operation because of the conflict with the coefficients that must also be fetched along this path. However, on additional executions of the loop, the program instructions can be pulled from the instruction cache. This means that all of the memory-to-CPU transfers can be accomplished in a single cycle: the sample from the input signal comes over the data memory bus, the coefficient comes over the program memory bus, and the program instruction comes from the instruction cache. In the jargon of the field, this efficient transfer of data is called a high memory-access bandwidth.
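The bus conflict described above can be illustrated with a toy cycle count. This is only a sketch under stated assumptions: the loop body is taken to be a single multiply-accumulate instruction, each bus is assumed to move one value per cycle, and the function name `fir_output_with_bus_count` is made up for this example. It does not model any real processor.

```python
def fir_output_with_bus_count(coeffs, samples, use_cache=True):
    """Compute one FIR output and count cycles in a toy bus model.

    Assumptions (illustrative only): the loop body is one
    multiply-accumulate instruction, and each bus carries one value
    per cycle.
    """
    acc = 0
    cycles = 0
    cached = False
    for c, x in zip(coeffs, samples):
        data_bus = 1      # the sample travels over the data memory bus
        program_bus = 1   # the coefficient travels over the program memory bus
        if not (use_cache and cached):
            # The instruction must also come over the program memory bus.
            program_bus += 1
            cached = True
        # The slower bus decides how long this iteration takes.
        cycles += max(data_bus, program_bus)
        acc += c * x
    return acc, cycles


coeffs = [1, 2, 3, 4]
samples = [1, 1, 1, 1]
with_cache = fir_output_with_bus_count(coeffs, samples, use_cache=True)
without_cache = fir_output_with_bus_count(coeffs, samples, use_cache=False)
```

In this model only the first pass through the loop pays for the instruction fetch when the cache is enabled, so the cached loop approaches one cycle per tap.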

To improve throughput, an I/O controller with direct memory access (DMA) is connected to data memory. This is how signals enter and exit the system.

* Circular Buffer and Data Address Generator
There are two Data Address Generators (DAGs), one for program memory and one for data memory. These control the addresses sent to the program and data memories, specifying where the information is to be read from or written to. DSPs are designed to operate with circular buffers, and benefit from the extra hardware to manage them efficiently. This avoids spending precious CPU clock cycles on keeping track of where the data are stored.
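What a DAG does in hardware can be sketched in software as modulo addressing. The class below is a minimal illustration, not a model of any particular DSP; the names `CircularBuffer`, `push`, and `delayed` are chosen for this example.

```python
class CircularBuffer:
    """Fixed-length buffer whose write pointer wraps around (modulo addressing)."""

    def __init__(self, length):
        self.data = [0] * length
        self.index = 0  # position of the next write, which holds the oldest sample

    def push(self, sample):
        """Overwrite the oldest sample and advance the pointer with wraparound."""
        self.data[self.index] = sample
        self.index = (self.index + 1) % len(self.data)

    def delayed(self, k):
        """Return the sample written k pushes ago (k = 0 is the newest)."""
        return self.data[(self.index - 1 - k) % len(self.data)]


buf = CircularBuffer(4)
for sample in [1, 2, 3, 4, 5, 6]:
    buf.push(sample)
```

On a general-purpose CPU the `%` operations cost instructions on every access; the point of the DAG is that this pointer update and wraparound happen in dedicated hardware alongside the arithmetic.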

DSP algorithms often use many circular buffers. Some algorithms are best carried out in stages. For instance, IIR filters are more stable if implemented as a cascade of biquads (stages containing two poles and up to two zeros each). Multiple stages require multiple circular buffers for the fastest operation. The DAGs are also designed to efficiently carry out the Fast Fourier transform. In this mode, the DAGs are configured to generate bit-reversed addresses into the circular buffers, a necessary part of the FFT algorithm. In addition, an abundance of circular buffers greatly simplifies DSP code generation, both for the human programmer and for compilers of high-level languages such as C.
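Bit-reversed addressing, as mentioned above, can be shown in a few lines. This is a plain software sketch of the index sequence such an addressing mode would produce for a power-of-two buffer length; the function name `bit_reversed_indices` is made up for this example.

```python
def bit_reversed_indices(n):
    """Return the indices 0..n-1 with their binary digits reversed.

    n must be a power of two, as in a radix-2 FFT.
    """
    if n < 1 or n & (n - 1):
        raise ValueError("n must be a power of two")
    bits = n.bit_length() - 1  # number of address bits
    result = []
    for i in range(n):
        rev = 0
        for b in range(bits):
            if i & (1 << b):
                # Bit b of i becomes bit (bits - 1 - b) of the reversed index.
                rev |= 1 << (bits - 1 - b)
        result.append(rev)
    return result


order = bit_reversed_indices(8)
```

For a length of 8, index 1 (binary 001) maps to 4 (binary 100), index 3 (011) maps to 6 (110), and so on. Doing this reordering in the address generator saves the separate shuffling pass a software FFT would otherwise need.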

* ALU, MAC and Barrel Shifter
An extended-precision accumulator is built into the multiplier to reduce the round-off error that builds up over many fixed-point operations, especially in IIR filters. The multiplier is built from combinational logic and can execute a multiply-accumulate (MAC) operation in one cycle.
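The benefit of a wide accumulator can be seen by comparing two ways of summing fixed-point products. The sketch below assumes a Q15-style format (15 fractional bits), which is an assumption of this example and not something stated in the notes. The function names are also made up, and Python's unbounded integers mean overflow is not modelled.

```python
def mac_narrow(coeffs, samples, frac_bits=15):
    """Truncate every product back to the working precision before adding."""
    acc = 0
    for c, x in zip(coeffs, samples):
        acc += (c * x) >> frac_bits  # low bits are discarded at every step
    return acc


def mac_wide(coeffs, samples, frac_bits=15):
    """Accumulate full-width products and truncate only once at the end."""
    acc = 0
    for c, x in zip(coeffs, samples):
        acc += c * x  # the wide accumulator keeps all product bits
    return acc >> frac_bits


coeffs = [100] * 8
samples = [100] * 8
narrow = mac_narrow(coeffs, samples)
wide = mac_wide(coeffs, samples)
```

Each product here is 10000, which is below one unit of the 15-bit fraction, so truncating per step throws every contribution away, while the wide accumulator keeps the sum of 80000 and rounds only at the end.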

In some cases, shadow registers are provided for all of the CPU's key registers. These are duplicate registers that can be switched with their counterparts in a single clock cycle. They are used for fast context switching, that is, the ability to handle interrupts quickly. When an interrupt occurs in a traditional microprocessor, all the internal data must be saved before the interrupt can be handled. This usually involves pushing all of the occupied registers onto the stack, one at a time. In comparison, a DSP with shadow registers handles an interrupt by moving the internal data into the shadow registers in a single clock cycle. When the interrupt routine is completed, the registers are just as quickly restored.
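The difference between the two context-switching schemes can be put in rough numbers with a toy model. Everything here is an assumption made for illustration: the `ToyCPU` class, the register count of 8, and the cost of one cycle per stack push or pop and one cycle per bank swap.

```python
class ToyCPU:
    """Toy register file with a stack and a shadow bank (illustrative only)."""

    def __init__(self, num_registers=8):
        self.regs = list(range(1, num_registers + 1))
        self.shadow = [0] * num_registers
        self.stack = []
        self.cycles = 0

    def save_to_stack(self):
        """Push every register, one per cycle."""
        for value in self.regs:
            self.stack.append(value)
            self.cycles += 1

    def restore_from_stack(self):
        """Pop every register in reverse order, one per cycle."""
        for i in reversed(range(len(self.regs))):
            self.regs[i] = self.stack.pop()
            self.cycles += 1

    def swap_shadow(self):
        """Exchange the active and shadow banks in a single cycle."""
        self.regs, self.shadow = self.shadow, self.regs
        self.cycles += 1


# Stack-based interrupt: save, let the handler clobber registers, restore.
stack_cpu = ToyCPU()
stack_cpu.save_to_stack()
stack_cpu.regs = [0] * 8
stack_cpu.restore_from_stack()

# Shadow-register interrupt: swap in, let the handler clobber, swap back.
shadow_cpu = ToyCPU()
shadow_cpu.swap_shadow()
shadow_cpu.regs = [99] * 8
shadow_cpu.swap_shadow()
```

Both CPUs end with their original register contents, but in this model the stack version spends 16 cycles on saving and restoring while the shadow version spends 2.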
