[mus422] bit optimization equation

Wed Feb 10 22:14:00 PST 2010

Hello Music 422 Class,

For HW5 (question 4), the most convenient form of the optimal bit
allocation equation can be found at the bottom of page 217:

R_b^{opt} = (P / K_p) + ln(10) / (20 ln(2)) *
                    [ SMR_b - 1/K_p  sum_{passed b}  (N_b  SMR_b) ]

[ and on the lecture slides from last Friday in an equivalent but
   more explicit summation indexing form:

   R_b = R + ln(10)/(20 ln(2)) [ SMR_b - 1/K sum_{c=0}^{B-1} N_c SMR_c ]
    where R     = average bits per mantissa,
          K     = number of spectral bins (lines) in the block (such as 512),
          N_c   = number of bins (lines) in subband c,
          B     = total number of subblocks (subbands) in the block (such as 25),
          SMR_b = maximum signal to mask ratio in subblock (band) b.
 ]

And where (for the equation on page 217):

R_b^{opt} is the optimal mantissa bit allocation for
sub-block b.  It is generally a non-integer value
(and can be negative), so it has to be converted to an
integer.  Negative allocations should be set to zero by
borrowing from positive allocations in other subblocks.
Allocations of 1 should also be set to zero and the freed
bits given to other subblocks, since mid-tread quantization
requires a minimum of two bits in the mantissa: having one
bit is equivalent to storing just the sign bit.
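
As a rough illustration of the integer conversion described above, here
is a minimal Python sketch.  The function name to_integer_bits, the use
of floor rounding, and the policy of borrowing from whichever subblock
currently has the largest allocation are all assumptions made for this
example; neither the book nor the homework prescribes them.

```python
import math

def to_integer_bits(r_opt, n_bins, pool):
    """Convert real-valued per-band allocations to integers (hypothetical helper).

    r_opt  : real-valued mantissa bits per bin for each subblock
    n_bins : number of bins in each subblock
    pool   : total mantissa bits available (assumed >= 0)
    """
    # Round down and clamp negative allocations to zero.
    bits = [max(0, math.floor(r)) for r in r_opt]
    # A 1-bit mantissa holds only the sign, so give those bits away.
    bits = [0 if b == 1 else b for b in bits]

    def used():
        # Total mantissa bits spent: bits per bin times bins per subblock.
        return sum(n * b for n, b in zip(n_bins, bits))

    # Borrow (assumed policy): while over budget, take one bit per bin
    # from the subblock that currently has the largest allocation.
    while used() > pool:
        i = max(range(len(bits)), key=lambda j: bits[j])
        bits[i] -= 1
        if bits[i] == 1:
            bits[i] = 0
    return bits

print(to_integer_bits([5.7, -1.2, 1.4, 3.0], [4, 4, 8, 16], 64))  # prints [4, 0, 0, 3]
```

Any bits left over after this pass are not redistributed in the sketch;
a real allocator would probably hand them to the subblocks that need
them most.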

P = bit pool: number of available bits (for mantissas only,
not including scale bits or other block encoding pieces),
defined at the bottom of page 207.

K_p = number of spectral bins which are not allocated
zero mantissa bits (defined sort of on page 207;
"p" meaning "passed" and not related to "P", the bit pool).

P/K_p = average number of mantissa bits per
spectral bin (average for all spectral bins not
allocated zero bits).

ln(10)/(20 ln(2)) is equal to log_2(10) / 20
or in decimal value:  0.166096
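
A quick numerical check of that constant, as a small Python sketch (the
variable names are just for this example).  Its reciprocal is
20 log_10(2), about 6.02, so the factor amounts to one mantissa bit for
roughly every 6.02 dB of SMR.

```python
import math

# The two equivalent ways of writing the constant.
factor_ln = math.log(10) / (20 * math.log(2))
factor_log2 = math.log2(10) / 20

# Reciprocal: how many dB of SMR correspond to one bit.
db_per_bit = 1 / factor_ln

print(round(factor_ln, 6), round(factor_log2, 6), round(db_per_bit, 4))
```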

---------------

SMR_b is the signal to mask ratio, which (by comparison with
the preceding equation on page 217) is defined as:

SMR_b = 20 log_10(|x_max_b|) - 20 log_10(M_b)
or equivalently:
SMR_b = 10 log_10 (x_max_b^2 / M_b^2 )

This is the dB value of the maximum
amplitude in a subblock minus the dB value of the
masking level for the subblock.  The two dB values
have to use the same reference (most likely
the dB_SPL scale).  If you are starting from
amplitude values rather than dB values, then a
reference of 1 (dividing amplitudes by one before
taking the log) will work.
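
To make the two forms of SMR_b concrete, here is a minimal Python
sketch starting from linear amplitudes with a reference of 1.  The
function names are made up for this example, and both inputs are
assumed to be nonzero.

```python
import math

def smr_db(x_max, mask):
    """SMR in dB as a difference of two dB values (reference of 1)."""
    return 20 * math.log10(abs(x_max)) - 20 * math.log10(mask)

def smr_db_from_powers(x_max, mask):
    """The same SMR written as 10 log10 of a power ratio."""
    return 10 * math.log10(x_max ** 2 / mask ** 2)

# A peak amplitude 100 times the masking amplitude gives about 40 dB.
print(smr_db(0.5, 0.005), smr_db_from_powers(0.5, 0.005))
```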

For the SMR_b equation above:

b is an index into the subblocks.

|x_max_b| is the maximum FFT spectral bin magnitude
in the subblock.  Marina can correct me on this point,
since it looks like you are supposed to be using
the MDCT spectral values instead of FFT values,
according to the documentation for CalcSMRs in HW4.

M_b is the amplitude of the masking curve (calculated
from your masking model) for subblock b.  Marina
mentioned in class on Friday that M_b can be defined
conservatively as the minimum masking amplitude
in the subblock, or you could also define it in different
ways.
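
A sketch of how the per-subblock values could be collected in Python,
using the conservative choice of the minimum masking amplitude for M_b.
The function name band_smrs and the band_edges argument (bin indices
where each subblock starts, plus a final end index) are assumptions for
this example and are not the CalcSMRs interface from HW4.  All
amplitudes are assumed to be linear, positive, and on the same
reference.

```python
import math

def band_smrs(magnitudes, mask_amps, band_edges):
    """Return one SMR (in dB) per subblock (hypothetical helper)."""
    smrs = []
    for lo, hi in zip(band_edges[:-1], band_edges[1:]):
        # |x_max_b|: largest spectral magnitude in the subblock.
        x_max = max(abs(x) for x in magnitudes[lo:hi])
        # M_b: conservative choice, the lowest masking amplitude in the subblock.
        m_b = min(mask_amps[lo:hi])
        smrs.append(20 * math.log10(x_max / m_b))
    return smrs

# Two subblocks of two bins each.
spectrum = [1.0, 0.2, 0.1, 0.05]
masking = [0.1, 0.01, 0.1, 0.05]
print(band_smrs(spectrum, masking, [0, 2, 4]))  # about 40 dB and 6.02 dB
```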

-----------

Back to the components in the last equation on page 217:

sum_{passed b} is a summation over all subblocks
which are not assigned zero bits for mantissas.

N_b is the number of frequency bins in subblock b.
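
Putting the pieces of the page 217 equation together, here is one
possible Python sketch.  The loop that drops subblocks with
non-positive allocations from the "passed" set and then recomputes K_p
is an assumption about how to apply the equation, and the name
optimal_bits is made up for this example.  The result is still
real-valued and would need the integer conversion afterwards.

```python
import math

BITS_PER_DB = math.log(10) / (20 * math.log(2))  # about 0.166096

def optimal_bits(smr, n_bins, pool):
    """Real-valued mantissa bits per bin for each subblock (hypothetical helper).

    smr    : SMR_b in dB for each subblock
    n_bins : N_b, number of bins in each subblock
    pool   : P, mantissa bits available for the block
    """
    passed = list(range(len(smr)))
    alloc = {}
    while passed:
        # K_p: number of bins in subblocks that still receive bits.
        k_p = sum(n_bins[b] for b in passed)
        mean_smr = sum(n_bins[b] * smr[b] for b in passed) / k_p
        alloc = {b: pool / k_p + BITS_PER_DB * (smr[b] - mean_smr)
                 for b in passed}
        keep = [b for b in passed if alloc[b] > 0]
        if len(keep) == len(passed):
            break
        # Assumed policy: drop non-positive subblocks and recompute.
        passed = keep
    result = [0.0] * len(smr)
    for b in passed:
        result[b] = alloc[b]
    return result

bits = optimal_bits([30.0, 10.0, -40.0], [2, 2, 4], 16)
print(bits)  # the third subblock ends up with zero bits
```

In this example the bits actually spent, sum of N_b * R_b over the
passed subblocks, come out equal to the pool P, which is a handy sanity
check on an implementation.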

Terminology notes:

"bin" is one spectral value in an FFT or MDCT
spectrum, also called a spectral "line".

"block" is the first half of a full MDCT array.
So a signal transform of size 1024 will yield
a block size of half that (512).

"subblock" is also called "subband" as well
as "scale factor band", which is a collection of
spectral bins that share a single scale
value during the block floating-point
quantization process (we are cutting up the
block into 25 subblocks).

---------------

Another way of expressing the bottom equation on page 217
is found just above it on the same page; an equivalent
equation was shown on the lecture slides last Friday:

R_b = R + 1/2 log_2 [ (x_max_b)^2 /(M_b)^2 ] -
     1 / (2 K) sum_{c=0}^{B-1} [ N_c * log_2[ (x_max_c)^2 / (M_c)^2 ] ]

Where R_b     = optimized bit allocation for subblock (band) b
      R       = average bit allocation per bin
      x_max_b = max abs amplitude in FFT (MDCT?) subblock b
      M_b     = masking level for subblock b
      N_c     = bin count in subblock c
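
As a sanity check that the log_2 form and the SMR form from the lecture
slides agree, here is a small Python sketch.  The function names are
made up for this example, K is taken to be the sum of the N_c, and no
"passed" subblock handling is done (all B subblocks are included).

```python
import math

def bits_from_smr(smr, n_bins, avg_bits):
    """R_b = R + ln(10)/(20 ln(2)) [SMR_b - 1/K sum_c N_c SMR_c]."""
    k = sum(n_bins)
    factor = math.log(10) / (20 * math.log(2))
    mean_smr = sum(n * s for n, s in zip(n_bins, smr)) / k
    return [avg_bits + factor * (s - mean_smr) for s in smr]

def bits_from_amplitudes(x_max, mask, n_bins, avg_bits):
    """R_b = R + 1/2 log2(x_b^2/M_b^2) - 1/(2K) sum_c N_c log2(x_c^2/M_c^2)."""
    k = sum(n_bins)
    log_ratios = [math.log2(x ** 2 / m ** 2) for x, m in zip(x_max, mask)]
    mean_term = sum(n * r for n, r in zip(n_bins, log_ratios)) / (2 * k)
    return [avg_bits + 0.5 * r - mean_term for r in log_ratios]

x_max = [1.0, 0.5, 0.1]
mask = [0.01, 0.05, 0.1]
n_bins = [4, 4, 8]
smr = [20 * math.log10(x / m) for x, m in zip(x_max, mask)]

print(bits_from_smr(smr, n_bins, 3.0))
print(bits_from_amplitudes(x_max, mask, n_bins, 3.0))
```

Both calls should print the same allocations up to rounding error, and
the bin-weighted average of R_b should come back to R.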

-=+Craig