[mus422] Huffman Coding

Fri Mar 5 10:25:34 PST 2010

Jieun,

I just wanted to follow up on our discussion yesterday on Huffman coding.
As you may already know, Layers I and II do not do any Huffman coding but
Layer III Huffman encodes quantized mantissas using an approach similar to
that used in AAC.  I will describe below what Layer III does and then
summarize the differences between the Layer III and AAC approaches. 

In general, you can design your Huffman coding following their general
guidelines, but you should base it on your coder's characteristics. I would
strongly recommend not using their tables (you need to build your own
based on your coder structure).  (Also, you don't necessarily need to
follow their spectral subdivision.  The documentation should serve as a
guide to their approach, but you cannot apply it verbatim to your
coder.)  I will also distribute in class today some general notes on
Huffman coding that may help.
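
As a rough illustration of building your own table, here is a minimal
Python sketch of the textbook Huffman construction run on symbol counts
gathered from your own quantizer.  The function name and the training
data are hypothetical, and a real coder would collect statistics per
codebook region and probably code pairs or quadruples rather than single
values.

```python
import heapq
from collections import Counter
from itertools import count


def build_huffman_code(symbols):
    """Return {symbol: bitstring} built from the observed symbol counts."""
    freqs = Counter(symbols)
    if not freqs:
        return {}
    if len(freqs) == 1:
        # Degenerate case: a single symbol still needs a 1-bit code.
        return {next(iter(freqs)): "0"}
    tiebreak = count()  # unique integers so the heap never compares dicts
    heap = [(n, next(tiebreak), {sym: ""}) for sym, n in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        # Merge the two least frequent subtrees, prefixing their codes.
        n0, _, codes0 = heapq.heappop(heap)
        n1, _, codes1 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in codes0.items()}
        merged.update({s: "1" + c for s, c in codes1.items()})
        heapq.heappush(heap, (n0 + n1, next(tiebreak), merged))
    return heap[0][2]


# Hypothetical training data: mantissa magnitudes from your own coder.
training_magnitudes = [0] * 50 + [1] * 25 + [2] * 15 + [3] * 10
table = build_huffman_code(training_magnitudes)
```

With these counts the most frequent magnitude (0) gets a 1-bit code and
the two rarest magnitudes get 3-bit codes.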

In Layer III, the spectral lines are divided into up to 5 groups that can
use different Huffman codebooks.  The basic idea is that the highest
frequency lines aren't passed at all (zero bits) and the number of bits
allocated gets higher as the frequency gets lower.  This leads to using
different codebooks that are geared to mantissa magnitudes that get higher
as the line frequencies get lower.   (In all cases, the Huffman encoding
applies only to the mantissa's magnitude and Huffman codes are followed by
sign bits for each non-zero magnitude.)  
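
To make the magnitude-plus-sign idea concrete, here is a small Python
sketch that encodes one pair of signed mantissas.  The toy codebook and
the function name are made up for illustration, and using "1" for a
negative sign is an assumption of this sketch.

```python
def encode_pair(x, y, magnitude_codes):
    """One Huffman code for (|x|, |y|), then a sign bit per non-zero value."""
    bits = magnitude_codes[(abs(x), abs(y))]
    for value in (x, y):
        if value != 0:
            # Assumed convention: "1" means negative, "0" means positive.
            bits += "1" if value < 0 else "0"
    return bits


# Hypothetical toy codebook for magnitude pairs (prefix-free by construction).
toy_codes = {(0, 0): "0", (0, 1): "100", (1, 0): "101", (1, 1): "11"}

bits_a = encode_pair(-1, 0, toy_codes)  # code "101" plus one sign bit
bits_b = encode_pair(1, -1, toy_codes)  # code "11" plus two sign bits
bits_c = encode_pair(0, 0, toy_codes)   # code "0", no sign bits
```

Zero values cost no sign bit, which is part of why coding magnitudes and
signs separately works well when many lines quantize to zero.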

The frequency lines are broken into 3 sections: the zeroes section, which
includes all the highest frequency lines that aren't given any bits and so
don't need encoding; the ones section, which includes the next set of lines
(moving downward in frequency) that are only quantized as values {-1,0,1};
and the "bigvalues" section, which includes all other lines.

Lines in the "ones" section are encoded in groups of 4 using one of 2
Huffman Tables (one of which doesn't actually Huffman code -- it assumes
equal probabilities of all outcomes). Lines in the bigvalues section can
be broken into up to 3 groups, each of which can use its own Huffman
codebook to encode pairs of values.  (Note: to make the pairing work out,
the zeroes section must include an even number of lines and the ones
section must include a multiple of 4 lines -- a last odd zero line becomes
part of the ones section and any loose lines in the ones section get
included in the bigvalues section.)  The encoder can try different choices
of how to split the bigvalues region as well as deciding which codebook to
use in each region and then can send that information to the decoder.
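
The sectioning and the alignment rule in the note above can be sketched
as follows.  This is only an illustration in Python: the function name
and the example data are hypothetical, and it assumes the quantized lines
are ordered from low to high frequency and that their total count is even
(so the bigvalues section also pairs up).

```python
def partition_lines(q):
    """Split quantized lines q (low to high frequency) into three sections.

    Returns (end_big, end_ones) so that
      bigvalues = q[:end_big], ones = q[end_big:end_ones],
      zeroes = q[end_ones:].
    """
    n = len(q)
    # Zeroes section: the trailing run of zeros at the highest frequencies.
    end_ones = n
    while end_ones > 0 and q[end_ones - 1] == 0:
        end_ones -= 1
    if (n - end_ones) % 2:
        end_ones += 1  # an odd leftover zero line joins the ones section
    # Ones section: the run just below with magnitudes of at most 1.
    end_big = end_ones
    while end_big > 0 and abs(q[end_big - 1]) <= 1:
        end_big -= 1
    # Loose lines (not a full group of 4) join the bigvalues section.
    end_big += (end_ones - end_big) % 4
    return end_big, end_ones


# Hypothetical quantized spectrum with 14 lines.
q = [5, -3, 2, 2, 3, 4, 0, 1, 0, 1, -1, 0, 0, 0]
end_big, end_ones = partition_lines(q)
bigvalues, ones, zeroes = q[:end_big], q[end_big:end_ones], q[end_ones:]
```

In this example there are three trailing zeros, so one of them moves into
the ones section, and two loose small lines move into bigvalues, leaving
8 bigvalues lines, 4 ones lines and 2 zero lines.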

The bigvalues codebooks are of 2 types:  books that only allow mantissas
up to a specified maximum absolute value and books that are capped at
mantissa magnitudes equal to 15 but which have an "escape code" allowing
bigger magnitudes to be handled.  (Note: in AAC there is the pulse method
to handle bigger values in the non-escape codebooks but this method isn't
included in Layer III.)  

The way the larger amplitudes are handled is that each escape-code book
has a specified number of bits that are read after the escape code is
received, and the number represented in those bits is added to 15 to
give the mantissa magnitude.  (The escape code is 15 in these tables, so
a magnitude of exactly 15 is sent as the escape code followed by extra
bits equal to zero.  The bits after the escape code tell how much higher
than 15 the actual magnitude was.)  Clearly, values much higher than 15
need more bits to describe, and so the main difference among the escape
code tables is the number of bits that follow each escape code.  (Note:
AAC has changed the
syntax following the escape code and follows it with a sequence of 1 bits
followed by a zero bit that tells how many bits are needed to describe the
large magnitude and then those bits follow.  Due to that change in escape
syntax, AAC only has one escape code table.)
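
The two escape mechanisms can be compared side by side with a short
Python sketch.  The function names are hypothetical.  The first function
follows the ISO/IEC 11172-3 convention, where the escape symbol is 15 and
the extra bits hold the magnitude minus 15; the second follows the AAC
escape sequence as I understand it, where the escape symbol is 16 and N
one bits plus a zero bit announce an (N+4)-bit word whose value is added
to 2^(N+4).

```python
def encode_escape_fixed(magnitude, extra_bits, escape=15):
    """Layer III style: a fixed number of extra bits follows the escape code.

    Returns (table_symbol, extra_bit_string).
    """
    if extra_bits < 1:
        raise ValueError("escape tables carry at least one extra bit")
    if magnitude < escape:
        return magnitude, ""
    extra = magnitude - escape
    if extra >= (1 << extra_bits):
        raise ValueError("magnitude too large for this escape table")
    return escape, format(extra, "0{}b".format(extra_bits))


def encode_escape_aac(magnitude, escape=16):
    """AAC style: the escape sequence itself says how many bits follow."""
    if magnitude < escape:
        return magnitude, ""
    n = magnitude.bit_length() - 1      # floor(log2(magnitude)), at least 4
    prefix = "1" * (n - 4) + "0"        # N ones, then a zero separator
    word = format(magnitude - (1 << n), "0{}b".format(n))
    return escape, prefix + word


fixed_small = encode_escape_fixed(7, 4)    # below the escape: no extra bits
fixed_large = encode_escape_fixed(20, 4)   # 20 - 15 = 5 in 4 bits
aac_large = encode_escape_aac(37)          # 37 = 32 + 5, so N = 1
```

The fixed scheme always spends the same number of extra bits, so the
encoder must pick a table wide enough for the largest magnitude in the
region, while the AAC scheme spends more bits only on the lines that
need them.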

That's basically it -- AAC builds upon this basic method in a number of
ways.  Firstly, AAC also Huffman encodes scale factors (or more
accurately, differences in frequency-adjacent scale factors).  Secondly,
AAC offers a bit more flexibility in how lines are grouped into sets using
different codebooks -- AAC allows any groupings of adjacent critical bands
to get their own codebooks.  Thirdly, AAC has an additional way to handle
magnitudes beyond the table maxima (the pulse method) that applies to the
non-escape code tables so they can be used even when a few lines have
magnitudes beyond the maximum for the codebook table.  Finally, the escape
code syntax was made more flexible so each instance of an escape code
tells how many bits are needed to describe the large magnitude so all the
extra escape code tables were eliminated.  (Another minor difference is
that Layer III only Huffman encodes mantissa magnitudes, appending sign
bits to the codes as needed, while AAC also includes a few Huffman
codebooks that encode signed mantissas.)
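
As a small illustration of coding differences between frequency-adjacent
scale factors, here is a Python sketch.  The function names and the
example values are hypothetical, and sending the first scale factor
as-is is a simplification of this sketch rather than the AAC syntax.

```python
from itertools import accumulate


def scale_factor_deltas(scale_factors):
    """First scale factor as-is, then differences between neighbours."""
    if not scale_factors:
        return []
    first = scale_factors[0]
    diffs = [b - a for a, b in zip(scale_factors, scale_factors[1:])]
    return [first] + diffs


def scale_factors_from_deltas(deltas):
    """Inverse operation: a running sum restores the original values."""
    return list(accumulate(deltas))


# Hypothetical scale factors for consecutive bands, low to high frequency.
sf = [60, 62, 61, 61, 58]
deltas = scale_factor_deltas(sf)
restored = scale_factors_from_deltas(deltas)
```

Because neighbouring scale factors tend to be close, the differences
cluster around zero, which is exactly the kind of peaked distribution
that a Huffman table codes efficiently.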

Marina Bosi

Consulting Professor, Department of Music

Stanford University

Center for Computer Research in Music and Acoustics

The Knoll,  660 Lomita Court

Stanford, California 94305-8180, USA

http://ccrma.stanford.edu

mbosi at stanford.edu
