Sunday, September 23, 2007

Fixed Point Arithmetic in DSP

* Rules for Fixed-Point Arithmetic
- Unsigned wordlength: U(a, b) is a + b
Range: [0, 2**a - 2**(-b)]
- Signed wordlength: A(a, b) is a + b + 1(sign)
Range: [-2**a, 2**a - 2**(-b)]
- Addition
Scale must be the same for all the operands
X(e, f) + Y(e, f) = Z(e+1, f)
- Multiplication
X(a, b) * Y(e, f) = Z(a+e, b+f) Note: exclude the sign bit
- Shifting to divide/multiple by a power of two, or to change the scaling
1) multiply/divide by a power of two: X(a, b) >> n = X(a, b)
where if n greater than zero, right shift and if n less than zero, left shift
2) modifying scale: X(a, b) >> n = X(a+n, b-n)
Note these operations might result in a loss of precision or overflow.

* How to scale
- For A(a, b) scale, the fixed point num would be X = (x + 2**(-b-1)) * 2**b
- Round is better than truncation for scaling in terms of error. Always add 2**(b-1) when scaling to b
- The maximal value of output for FIR
Ymax = Xmax * sum(abs(bi)) where bi is the coefficients of FIR. So sum(abs(bi)) is generally one.
- Scale the coefficients for FIR
For signed number with M bit wordlength, in order to represent the maximal value of the coefficients, S1 = floor(log2(2**(M-1)-1)/max(abs(bi)))
In order to make the final sum not overflow, S2 = A - L - ceiling(log2(sum(abs(bi)))), where A is the length of accumulator, L is the bitwidth of inputs.
So the final scale for the coefficients of FIR should be S = min(S1, S2)

No comments: