Codec Considerations - The Nuts and Bolts of MP3 Compression

      Date posted: April 30, 2001

MP3: Codec Considerations

Or, How to Fit a Bowling Ball Through a Garden Hose

Read the Main Story MP3: The Death Knell of High End Audio?
MP3 Hardware Reviews: RCA Lyra, Audio ReQuest ARQ-1

MP3 History and Technology:

     In 1987 a large German research and development organization, the Fraunhofer Institut, started work on a perceptual coding scheme for use with Digital Audio Broadcasting (DAB). The most powerful algorithm it developed was eventually standardized as ISO-MPEG Audio Layer-3, now abbreviated MP3 (Layers 1 and 2 are essentially less efficient versions of the same approach). The goal of the algorithm was to drastically shrink the data size of digital audio with minimal subjective loss of sound quality, something that could not be done merely by reducing sample rate and bit depth. Whereas standard CD audio requires a fixed data rate of about 1,411 kb/s (44.1 kHz × 16 bits × 2 channels), MP3 typically operates at less than a tenth of that, the de facto standard being 128 kb/s (MPEG-1 Layer 3 allows rates from 32 kb/s to 320 kb/s).
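The CD figure follows directly from the format's parameters. The short Python sketch below reproduces that arithmetic and the resulting compression ratio at 128 kb/s; the helper name `pcm_kbps` is our own, and "kb" here means 1,000 bits.

```python
def pcm_kbps(sample_rate_hz: int, bits_per_sample: int, channels: int) -> float:
    """Uncompressed PCM data rate in kilobits per second (1 kb = 1000 bits)."""
    return sample_rate_hz * bits_per_sample * channels / 1000.0

cd_rate = pcm_kbps(44100, 16, 2)   # CD audio: 44.1 kHz, 16-bit, stereo
mp3_rate = 128.0                   # the common MP3 rate cited in the text

print(f"CD audio: {cd_rate:.1f} kb/s")                    # CD audio: 1411.2 kb/s
print(f"Compression ratio: {cd_rate / mp3_rate:.1f}:1")    # Compression ratio: 11.0:1
print(f"MP3 share of CD rate: {mp3_rate / cd_rate:.1%}")   # MP3 share of CD rate: 9.1%
```

At 128 kb/s the encoder keeps roughly one bit in eleven, which is why simply trimming sample rate or word length cannot get there without obvious damage.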

      So how do you throw away the vast majority of the data and still end up with something that sounds largely like the original? The key is what’s known as perceptual coding: essentially, a bag of psychoacoustic tricks designed around the way human hearing works. The most important technique exploits what’s called the masking effect. Imagine a noisy LP. During soft passages the noise can be obtrusive; you might even hear groove noise and tape hiss. In other words, you’re hearing right down to the noise floor of the recording and the playback system. During loud passages, however, these problems become much less noticeable, or even inaudible, because they are drowned out, or masked, by louder sounds. The clicks, pops, and hiss are still very much there, but your ear can’t distinguish them from the louder signal laid over them. The theory behind the perceptual coding used in MP3 is that when a loud and a soft signal occur simultaneously at roughly the same frequency, the loud one masks the soft one, so space can be saved by not coding the softer signal at all.
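To make the masking idea concrete, here is a deliberately crude Python model. It is not MP3’s psychoacoustic model, which is far more elaborate. It assumes a common closed-form approximation of the Bark (critical-band) scale and a triangular "spreading" threshold whose offset and slopes (`offset_db`, `slope_up`, `slope_down`) are illustrative values of our own choosing. A quieter tone counts as masked when it falls below the threshold the louder tone casts around its own frequency.

```python
import math

def hz_to_bark(f_hz: float) -> float:
    """Closed-form approximation of the critical-band (Bark) scale."""
    return 13.0 * math.atan(0.00076 * f_hz) + 3.5 * math.atan((f_hz / 7500.0) ** 2)

def masking_threshold_db(masker_hz: float, masker_db: float, probe_hz: float,
                         offset_db: float = 6.0, slope_up: float = 10.0,
                         slope_down: float = 25.0) -> float:
    """Toy triangular spreading function; all constants are illustrative only.

    The threshold peaks just below the masker's level and falls off linearly
    with distance in Bark: gently above the masker, steeply below it.
    """
    dz = hz_to_bark(probe_hz) - hz_to_bark(masker_hz)
    slope = slope_up if dz >= 0 else slope_down
    return masker_db - offset_db - slope * abs(dz)

def is_masked(masker_hz: float, masker_db: float,
              probe_hz: float, probe_db: float) -> bool:
    """True if the quieter probe tone sits under the masker's threshold."""
    return probe_db < masking_threshold_db(masker_hz, masker_db, probe_hz)

# Hypothetical levels: an 80 dB masker at 1 kHz and probes 20 dB quieter.
print(is_masked(1000, 80, 1100, 60))   # True  (close in frequency: hidden)
print(is_masked(1000, 80, 6000, 60))   # False (far in frequency: audible)
```

With these made-up constants, a tone 20 dB below a 1 kHz masker is hidden at 1.1 kHz but clearly audible at 6 kHz, because the threshold drops away as the frequency gap, measured in Bark, grows.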


     Say, for instance, you have a steady-state tone at 1 kHz and another at 1.1 kHz, but 20 dB lower in level than the first. Theoretically, the second tone will be inaudible to humans as a distinct sound, and the MP3 codec will toss it away. A tone at 6 kHz, 20 dB down, however, would be coded, since its very different pitch makes it distinctly audible as a separate sound. What the codec is doing during loud passages, then, is essentially raising the noise floor of the recording by throwing away bit depth, band by band, whenever it thinks it can get away with it. The extra quantization noise created by allocating fewer bits on the fly is masked by the very high-level signals that make the process possible in the first place.
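The link between bit depth and noise floor can be measured directly. The sketch below quantizes a sine wave with a plain uniform quantizer and reports the signal-to-noise ratio; the 997 Hz tone, 48 kHz sample rate and 0.9 amplitude are arbitrary assumptions. Each bit removed raises the noise floor by roughly 6 dB. A real MP3 encoder makes this trade separately in each frequency band, which this toy ignores.

```python
import math

def quantize(x: float, bits: int) -> float:
    """Round x (nominally in [-1, 1]) to a uniform grid with step 2 / 2**bits."""
    step = 2.0 / (2 ** bits)
    return round(x / step) * step

def snr_db(bits: int, amplitude: float = 0.9, freq_hz: float = 997.0,
           rate_hz: int = 48000, n: int = 48000) -> float:
    """Signal-to-quantization-noise ratio of a sine quantized to `bits` bits."""
    sig_pow = 0.0
    noise_pow = 0.0
    for i in range(n):
        x = amplitude * math.sin(2 * math.pi * freq_hz * i / rate_hz)
        err = quantize(x, bits) - x        # the added quantization noise
        sig_pow += x * x
        noise_pow += err * err
    return 10 * math.log10(sig_pow / noise_pow)

for b in (16, 12, 8, 4):
    print(f"{b:2d} bits: {snr_db(b):5.1f} dB SNR")
```

Going from 12 bits to 8 bits costs about 24 dB of SNR, which is exactly the noise the codec hopes loud music will mask.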

     If tossing away “masked” data isn’t enough, the MP3 codec can also perform various levels of what’s called Joint Stereo coding to shrink data size further. Here the codec takes advantage of the fact that humans have trouble localizing sounds at the frequency extremes. The source of deep bass, for instance, is much harder to pinpoint than midrange frequencies, which is exactly why sub/satellite speaker systems can get away with using only one subwoofer. MP3 exploits this with a process called Intensity Stereo, in which the codec stores the upper frequency bands as a single channel plus information on how to distribute it between left and right. Another Joint Stereo tool is Mid/Side Stereo. Instead of left and right signals, a middle (L+R) and a side (L-R) signal are encoded; when the two channels are similar, the side signal is small and can be coded with fewer bits. On playback, the MP3 player reconstructs the left and right channels.
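Mid/Side coding is easy to sketch. The Python below converts left/right samples to mid and side signals and back, scaling by 1/√2 so the transform is exactly invertible; the test signal (a right channel that is a slightly quieter copy of the left) is a made-up stand-in for a centred vocal. When the channels are this similar, almost all the energy lands in the mid signal, leaving a side signal that needs very few bits.

```python
import math

SQRT2 = math.sqrt(2.0)

def ms_encode(left, right):
    """Turn left/right samples into mid (sum) and side (difference) signals."""
    mid = [(x_l + x_r) / SQRT2 for x_l, x_r in zip(left, right)]
    side = [(x_l - x_r) / SQRT2 for x_l, x_r in zip(left, right)]
    return mid, side

def ms_decode(mid, side):
    """Invert ms_encode: L = (M + S) / sqrt(2), R = (M - S) / sqrt(2)."""
    left = [(m + s) / SQRT2 for m, s in zip(mid, side)]
    right = [(m - s) / SQRT2 for m, s in zip(mid, side)]
    return left, right

def energy(xs):
    return sum(x * x for x in xs)

# Hypothetical signal: right channel is the left channel at 95% level.
left = [math.sin(2 * math.pi * 440 * n / 44100) for n in range(4410)]
right = [0.95 * x for x in left]

mid, side = ms_encode(left, right)
print(f"side/mid energy ratio: {energy(side) / energy(mid):.5f}")  # 0.00066
left2, right2 = ms_decode(mid, side)
print(max(abs(a - b) for a, b in zip(left + right, left2 + right2)))
```

The transform itself loses nothing (the reconstruction error is only floating-point rounding); the saving comes later, when the small side signal is quantized coarsely.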


     MP3’s remaining tools aren’t psychoacoustic in nature, but they are nonetheless crucial to the algorithm’s effectiveness. One is the bit reservoir: frames that need fewer bits than their share leave the surplus behind, so the codec can draw on extra capacity during short, difficult passages that would otherwise force an audible drop in quality. The other is Huffman coding. Like the compression in computer “zip” utilities (which use Huffman coding as one of their stages), Huffman coding is lossless: rather than discarding anything, it replaces frequently occurring values with short codes and reserves longer codes for rare ones. For instance, if the value 0 turns up far more often than any other, it might be stored in a single bit, while an unusual value might take several. The decoder, using the same code table, reconstructs the original values exactly during playback.
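The bit reservoir amounts to simple bookkeeping, sketched below as a toy model. Every frame receives the same fixed budget (as at a constant bit rate), unused bits carry forward up to a cap, and a demanding frame may spend what earlier frames saved. The budget, cap and per-frame demands are made-up numbers; the real MP3 bitstream implements the reservoir by letting a frame's data begin inside space left unused by earlier frames, with its own limits.

```python
def simulate_reservoir(demands, budget, cap):
    """Return the bits each frame actually gets under a toy reservoir model.

    demands: bits each frame would like to spend (hypothetical numbers)
    budget:  fixed bits allotted to every frame at a constant bit rate
    cap:     largest surplus that may be carried forward (0 = no reservoir)
    """
    reservoir = 0
    granted = []
    for need in demands:
        available = budget + reservoir            # own share plus savings
        used = min(need, available)
        granted.append(used)
        reservoir = min(cap, available - used)    # leftover carries forward
    return granted

demands = [300, 300, 300, 700]   # three easy frames, then a sudden transient
print(simulate_reservoir(demands, budget=400, cap=600))  # [300, 300, 300, 700]
print(simulate_reservoir(demands, budget=400, cap=0))    # [300, 300, 300, 400]
```

With the reservoir, the transient gets all 700 bits it asks for; without one, it is squeezed into the 400-bit frame budget and quality suffers exactly where the music is hardest to code.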

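Huffman coding itself can be sketched in a few lines of Python. One caveat: MP3 does not build a new code for each file. It selects among fixed tables defined in the standard, whereas this sketch derives a code from the symbol counts of the data in hand, which is the textbook form of the technique. The list of "quantized values" is invented for illustration.

```python
import heapq
from collections import Counter
from itertools import count

def build_huffman_code(symbols):
    """Build a prefix-free code table {symbol: bitstring} from symbol counts."""
    freq = Counter(symbols)
    if len(freq) == 1:                        # degenerate case: one symbol only
        return {next(iter(freq)): "0"}
    tiebreak = count()                        # keeps heap from comparing dicts
    heap = [(n, next(tiebreak), {s: ""}) for s, n in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        n1, _, codes1 = heapq.heappop(heap)   # merge the two rarest subtrees
        n2, _, codes2 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in codes1.items()}
        merged.update({s: "1" + c for s, c in codes2.items()})
        heapq.heappush(heap, (n1 + n2, next(tiebreak), merged))
    return heap[0][2]

def encode(symbols, table):
    return "".join(table[s] for s in symbols)

def decode(bits, table):
    reverse = {c: s for s, c in table.items()}
    out, current = [], ""
    for b in bits:
        current += b
        if current in reverse:                # prefix-free: first match is right
            out.append(reverse[current])
            current = ""
    return out

# Hypothetical quantized values: mostly zeros, as in quiet or masked bands.
values = [0] * 50 + [1] * 20 + [-1] * 20 + [2] * 6 + [-3] * 4
table = build_huffman_code(values)
bits = encode(values, table)
assert decode(bits, table) == values          # lossless round trip
print(len(table[0]), len(bits))               # prints: 1 190
```

The common value 0 gets a one-bit code, and the 100 values fit in 190 bits instead of the 300 a fixed 3-bit code would need.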
Pretenders to the throne:

      MP3 may be one of the oldest and the most widely used compression formats for audio on the internet, but it’s by no means the only one. In fact, there are probably dozens by now, but the most widely used besides MP3 appear to be Windows Media Audio (WMA) from Microsoft and MPEG-2 AAC (Advanced Audio Coding), co-developed by Fraunhofer and currently used in the Liquid Audio format.


Indeed, it is AAC that holds the most promise for those with complaints about MP3 sound quality. Essentially a souped-up version of MP3, AAC allows for up to 48 channels of audio and 15 low-frequency enhancement channels, making it suitable for any number of surround applications. New “tricks” such as a backward-adaptive prediction system and temporal noise shaping also make it significantly more efficient than MP3. More importantly, for audiophiles anyway, it will support sampling rates up to 96 kHz. There is no indication, however, that it will be able to exceed bit rates of 384 kb/s, still less than a third of the data throughput of a CD. With Fraunhofer claiming that the codec is subjectively transparent at 96 kb/s, why indeed would they bother with higher bit rates?
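To put the 384 kb/s ceiling in perspective, the sketch below compares it with uncompressed CD audio and with uncompressed 96 kHz stereo. The 24-bit word length for the 96 kHz case is our assumption (the text gives only the sample rate), and `pcm_kbps` is a hypothetical helper.

```python
def pcm_kbps(sample_rate_hz: int, bits_per_sample: int, channels: int) -> float:
    """Uncompressed PCM data rate in kilobits per second (1 kb = 1000 bits)."""
    return sample_rate_hz * bits_per_sample * channels / 1000.0

aac_ceiling = 384.0                    # highest AAC rate discussed in the text
cd = pcm_kbps(44100, 16, 2)            # 1411.2 kb/s
hires = pcm_kbps(96000, 24, 2)         # 4608.0 kb/s (24-bit is an assumption)

print(f"{aac_ceiling / cd:.1%} of CD")                       # 27.2% of CD
print(f"{aac_ceiling / hires:.1%} of 96 kHz/24-bit stereo")  # 8.3% of 96 kHz/24-bit stereo
```

Even at its top rate, then, AAC carrying 96 kHz material would be discarding well over nine tenths of the original data.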

      Thankfully, even the mainstream consumer electronics press doesn’t agree. In a recent Stereo Review’s Sound & Vision shootout, AAC handily beat MP3 and WMA but was not considered equal to the original CD source. For my thoughts on MP3 sound quality at various bit rates, have a look at my review of the Audio ReQuest ARQ-1 home MP3 player.

Aaron Marshall
