Showing posts with label DSP. Show all posts
Showing posts with label DSP. Show all posts

Monday, November 19, 2007

Energy and Power Spectral Desity

Energy and/or power are the key property of one signal in DSP domain. The various transformations, like Fourier trans, DCT trans, wavelet trans, etc. are used to study the energy and/or power of the signal in essence.

* Energy signals
- Deterministic or random signals.
- Square integrable or square summable.
- The energy spectral density would be derived from its Fourier transformation.

* Power signals
- Deterministic or random signals.
- Not square integrable or square summable.
- The power spectral density would be the Fourier transformation of its autocorrelation function.

* Property for ESD and PSD
- Non negative
- The area under the energy spectral density curve is equal to the area under the square of the magnitude of the signal, no matter if the signal is continuous or discrete. The total power in a power spectral density being equal to the corresponding MEAN total signal power, which is the autocorrelation function at zero lag.

Sunday, September 23, 2007

Fixed Point Arithmetic in DSP

* Rules for Fixed-Point Arithmetic
- Unsigned wordlength: U(a, b) is a + b
Range: [0, 2**a - 2**(-b)]
- Signed wordlength: A(a, b) is a + b + 1(sign)
Range: [-2**a, 2**a - 2**(-b)]
- Addition
Scale must be the same for all the operands
X(e, f) + Y(e, f) = Z(e+1, f)
- Multiplication
X(a, b) * Y(e, f) = Z(a+e, b+f) Note: exclude the sign bit
- Shifting to divide/multiple by a power of two, or to change the scaling
1) multiply/divide by a power of two: X(a, b) >> n = X(a, b)
where if n greater than zero, right shift and if n less than zero, left shift
2) modifying scale: X(a, b) >> n = X(a+n, b-n)
Note these operations might result in a loss of precision or overflow.

* How to scale
- For A(a, b) scale, the fixed point num would be X = (x + 2**(-b-1)) * 2**b
- Round is better than truncation for scaling in terms of error. Always add 2**(b-1) when scaling to b
- The maximal value of output for FIR
Ymax = Xmax * sum(abs(bi)) where bi is the coefficients of FIR. So sum(abs(bi)) is generally one.
- Scale the coefficients for FIR
For signed number with M bit wordlength, in order to represent the maximal value of the coefficients, S1 = floor(log2(2**(M-1)-1)/max(abs(bi)))
In order to make the final sum not overflow, S2 = A - L - ceiling(log2(sum(abs(bi)))), where A is the length of accumulator, L is the bitwidth of inputs.
So the final scale for the coefficients of FIR should be S = min(S1, S2)

Wednesday, August 29, 2007

Digital Signal Processing

*Compared to IIR filters, FIR filters offer the following advantages:
- They can easily be designed to be "linear phase" (and usually are). Put simply, linear-phase filters delay the input signal, but don’t distort its phase. A FIR filter is linear-phase if (and only if) its coefficients are symmetrical around the center coefficient. FIR filter which has N taps, the delay is: (N - 1) / (2 * Fs), where Fs is the sampling frequency. So the center of FIR (1-D or 2-D) generally determines the exact output.
- They could be designed effectively by inverse-transforming desired freq response and windowing. And they are simple to implement. On most DSP microprocessors, the FIR calculation can be done by looping a single instruction.
- They are suited to multi-rate applications. By multi-rate, we mean either "decimation" (reducing the sampling rate), "interpolation" (increasing the sampling rate), or both. Whether decimating or interpolating, the use of FIR filters allows some of the calculations to be omitted, thus providing an important computational efficiency. In contrast, if IIR filters are used, each output must be individually calculated, even if it that output will discarded (so the feedback will be incorporated into the filter).
- They have desirable numeric properties. In practice, all DSP filters must be implemented using "finite-precision" arithmetic, that is, a limited number of bits. The use of finite-precision arithmetic in IIR filters can cause significant problems like overflow/fluctuation due to the use of feedback, but FIR filters have no feedback, so they can usually be implemented using fewer bits, and the designer has fewer practical problems to solve related to non-ideal arithmetic.
- They can be implemented using fractional arithmetic. Unlike IIR filters, it is always possible to implement a FIR filter using coefficients with magnitude of less than 1.0. (The overall gain of the FIR filter can be adjusted at its output, if desired.) This is an important consideration when using fixed-point DSP's, because it makes the implementation much simpler.

- Compared to IIR filters, FIR filters sometimes have the disadvantage that they require more memory and/or calculation to achieve a given filter response characteristic. Also, certain responses are not practical to implement with FIR filters.

Reference: DSPGuru Fir FAQ

Digital Signal Processor

* RISC for data manipulation and DSP for math processing

* Harvard Arch
Separate memories for data and program instructions, with separate buses for each.

* Super Harvard Arch
Harvard Arch + Instruction cache + I/O DMA

The filter coefficients are put in program memory, while the input signal in data memory. For one operation of addition, one input signal value needs to be over the data memory bus, but two values over the program memory bus (the program instruction and the coefficient). The first time through a loop, the program instructions must be passed over the program memory bus. This results in slower operation because of the conflict with the coefficients that must also be fetched along this path. However, on additional executions of the loop, the program instructions can be pulled from the instruction cache. This means that all of the memory to CPU information transfers can be accomplished in a single cycle: the sample from the input signal comes over the data memory bus, the coefficient comes over the program memory bus, and the program instruction comes from the instruction cache. In the jargon of the field, this efficient transfer of data is called a high memory-access bandwidth.

In order to improve throughput, I/O controller, DMA, is connected to data memory. This is how the signals enter and exit the system.

* Circular Buffer and Data Address Generator
Data Address Generator (DAG), one for each of the program and data memory. These control the addresses sent to the program and data memories, specifying where the information is to be read from or written to. DSPs are designed to operate with circular buffers, and benefit from the extra hardware to manage them efficiently. This avoids needing to use precious CPU clock cycles to keep track of how the data are stored.

There are many circular buffers in DSP. Some DSP algorithms are best carried out in stages. For instance, IIR filters are more stable if implemented as a cascade of biquads (a stage containing two poles and up to two zeros). Multiple stages require multiple circular buffers for the fastest operation. The DAGs are also designed to efficiently carry out the Fast Fourier transform. In this mode, the DAGs are configured to generate bit-reversed addresses into the circular buffers, a necessary part of the FFT algorithm. In addition, an abundance of circular buffers greatly simplifies DSP code generation- both for the human programmer as well as high-level language compilers, such as C.

* ALU, MAC and Barrel Shifter
A long bit accumulator is built into the multiplier to reduce the round-off error associated with multiple fixed-point math operations, esp. for IIR. The multiplier is designed with combination logic and could execute MAC operation in one cycle.

In some cases, shadow registers could be used for all the CPU's key registers. These are duplicate registers that can be switched with their counterparts in a single clock cycle. They are used for fast context switching, the ability to handle interrupts quickly. When an interrupt occurs in traditional microprocessors, all the internal data must be saved before the interrupt can be handled. This usually involves pushing all of the occupied registers onto the stack, one at a time. In comparison, an interrupt is handled by moving the internal data into the shadow registers in a single clock cycle. When the interrupt routine is completed, the registers are just as quickly restored.