MP3 is a way of compressing a sound file to approximately one-tenth of its normal size. Prior to MP3, it was difficult and time-consuming for the average computer user to send sound files through the Internet. In 1993, the Fraunhofer Institute, a research institute in Germany, combined many different patented algorithms to come up with a way of reducing the size of audio files while still maintaining listenable quality. They created MP3 in partnership with Thomson Multimedia SA of France, a company that is in the business of owning patents.
Since Fraunhofer is a non-profit institute with an ethic of supporting the rapid development of innovations into products, they did not keep MP3 a secret. The Moving Picture Experts Group (MPEG) was impressed with MP3 and made it an official standard for sound compression. Anyone can obtain the source code from the International Organization for Standardization (ISO), use this source code to build their own MP3 player or encoder, and make improvements on the MP3 formula. However, if they distribute the players or encoders, they must pay substantial royalties to Fraunhofer/Thomson.
Fraunhofer/Thomson also expects royalty payments from entities that sell more than $100,000 worth of MP3-encoded songs or material. They have also stated that they may begin charging fees for webcasts in the future. Currently, Fraunhofer/Thomson does not charge royalties on distribution of free MP3s, MP3 streaming and broadcasting, sales of MP3s that generate less than $100,000, distribution of free MP3 players, or distribution of limited-use demo-version encoders.
To date (April 2001), Fraunhofer/Thomson is interested only in parties that make substantial income from MP3. Fraunhofer’s commitment to encouraging technological development and the financial impracticality of pursuing small-time users or developers seem to be moderating factors. The liberal attitude of Fraunhofer/Thomson and the open specifications of MP3 have been the pivotal factors in its widespread popularity.
MP3 is commonly referred to as a “codec,” which is short for “compressor/decompressor” (or “coder/decoder”). MP3 processes the sound through an algorithm to create a compressed file, and then this file, when played, is decompressed and made again into an audio signal. The MP3 codec works through two types of compression, known as “lossy” and “lossless.” (A technical explanation of how the MP3 codec works can be found on an inner page of the Fraunhofer site. A more extensive and easy-to-understand explanation can be found online in an excerpt from Scot Hacker’s book MP3: The Definitive Guide, published by O’Reilly.)
“Lossy” compression permanently deletes certain sound material. “Lossless” compression does not cause the file to lose any content. The “lossy” compression techniques are based on psychoacoustic principles. “Lossy” compression takes out frequencies that the average human being supposedly cannot hear. It takes out very high and very low frequencies that lie above and below the average human threshold of hearing. It also removes frequencies that are so low in volume that they are covered by other frequencies. These are called “masked” tones, because they are masked by the louder frequencies. There are two types of “masking.” The first is “simultaneous masking,” which means that if a louder tone and a softer tone happen at the same time, you do not hear the softer one. The second is “temporal masking,” which means that when a soft tone occurs only milliseconds away from a loud tone, you do not hear the soft tone.
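The two deletion rules described above (out-of-range frequencies and simultaneously masked tones) can be illustrated with a toy sketch. This is not how a real MP3 encoder is implemented: real encoders analyze short frames of audio in frequency bands with a far more detailed psychoacoustic model. The `Tone` class, the 20 Hz to 20 kHz limits, the frequency-closeness ratio, and the 20 dB margin are all assumptions chosen for illustration, and temporal masking is left out entirely.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Tone:
    frequency_hz: float
    level_db: float

# Nominal limits of human hearing (an assumption; they vary by listener).
HEARING_LOW_HZ = 20.0
HEARING_HIGH_HZ = 20_000.0

# Hypothetical masking rule: a tone is treated as masked when another tone
# within a 1.2x frequency ratio is at least 20 dB louder.
MASKING_FREQUENCY_RATIO = 1.2
MASKING_MARGIN_DB = 20.0

def is_masked(tone, others):
    """Return True if some nearby, much louder tone covers this one."""
    for other in others:
        if other is tone:
            continue
        high = max(tone.frequency_hz, other.frequency_hz)
        low = min(tone.frequency_hz, other.frequency_hz)
        close_in_frequency = high / low <= MASKING_FREQUENCY_RATIO
        much_louder = other.level_db - tone.level_db >= MASKING_MARGIN_DB
        if close_in_frequency and much_louder:
            return True
    return False

def keep_audible(tones):
    """Drop out-of-range tones, then drop simultaneously masked tones."""
    in_range = [t for t in tones
                if HEARING_LOW_HZ <= t.frequency_hz <= HEARING_HIGH_HZ]
    return [t for t in in_range if not is_masked(t, in_range)]

tones = [
    Tone(15, 60),      # below the assumed hearing range
    Tone(1000, 70),    # loud tone
    Tone(1100, 40),    # soft tone close to the loud one
    Tone(5000, 40),    # soft tone far from the loud one
    Tone(22000, 50),   # above the assumed hearing range
]
kept = keep_audible(tones)
```

In this sketch the 1100 Hz tone is discarded because the nearby 1000 Hz tone is 30 dB louder, while the equally soft 5000 Hz tone survives because nothing loud sits near it.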
MP3 also offers the option of using “joint stereo” in the higher audible frequencies. “Joint stereo” combines the high frequencies from the left and right tracks into a single track. Again, this relies on a psychoacoustic principle: humans find it difficult to determine the physical location of very high and very low frequencies.
The person encoding the MP3 file can determine, to some degree, how much material the codec takes out by selecting the bit rate, measured in kilobits per second (commonly written as “kbps”). Low bit rates give the smallest file sizes, but there is obvious deterioration of the sound quality. High bit rates give better sound quality, but the files are somewhat larger.
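The relationship between bit rate and file size is simple arithmetic, sketched below. The CD figures used for comparison (44,100 samples per second, 16 bits per sample, two channels) are the standard CD audio parameters rather than something stated in this article, and the function names are made up for illustration. The sketch also ignores file headers and tags, so real files will differ slightly.

```python
CD_SAMPLE_RATE_HZ = 44_100   # samples per second, per channel
CD_BITS_PER_SAMPLE = 16
CD_CHANNELS = 2

def cd_bit_rate_kbps():
    """Bit rate of uncompressed CD audio in kilobits per second."""
    return CD_SAMPLE_RATE_HZ * CD_BITS_PER_SAMPLE * CD_CHANNELS / 1000

def file_size_megabytes(bit_rate_kbps, duration_seconds):
    """Approximate size of audio data at a constant bit rate."""
    total_bits = bit_rate_kbps * 1000 * duration_seconds
    return total_bits / 8 / 1_000_000   # bits -> bytes -> megabytes

def compression_ratio(mp3_kbps):
    """How many times smaller the MP3 is than CD audio of equal length."""
    return cd_bit_rate_kbps() / mp3_kbps

four_minutes = 4 * 60
cd_size = file_size_megabytes(cd_bit_rate_kbps(), four_minutes)
mp3_128_size = file_size_megabytes(128, four_minutes)
mp3_32_size = file_size_megabytes(32, four_minutes)
```

Under these assumptions, a four-minute track takes roughly 42 MB as CD audio and a little under 4 MB at 128 kbps, a ratio of about 11 to 1, which is consistent with the "one-tenth" figure often quoted for MP3.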
The “lossless” compression that MP3 uses is called “Huffman coding.” Huffman coding assigns short bit patterns to values that occur often and longer bit patterns to values that occur rarely, so the same data is stored in fewer bits without losing anything. A related MP3 feature, usually called the bit reservoir, lets a sparse passage that does not need all of its allotted space lend the remainder to denser passages. Huffman coding can typically reduce the file size by as much as 20%.
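The principle of Huffman coding can be shown with a minimal sketch. It builds its own code table from the data purely for illustration; real MP3 encoders instead select among fixed code tables defined by the standard. The function names and the sample values are assumptions, not part of any MP3 library.

```python
import heapq
from collections import Counter

def build_huffman_codes(symbols):
    """Map each symbol to a bit string; frequent symbols get shorter codes."""
    counts = Counter(symbols)
    if not counts:
        return {}
    if len(counts) == 1:
        # A single distinct symbol still needs a one-bit code.
        return {next(iter(counts)): "0"}
    # Heap entries are (count, tie_breaker, {symbol: code_so_far}).
    # The unique tie_breaker keeps Python from ever comparing the dicts.
    heap = [(count, i, {sym: ""})
            for i, (sym, count) in enumerate(counts.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        count_a, _, codes_a = heapq.heappop(heap)
        count_b, _, codes_b = heapq.heappop(heap)
        # Merge the two rarest groups, prefixing one with 0 and one with 1.
        merged = {sym: "0" + code for sym, code in codes_a.items()}
        merged.update({sym: "1" + code for sym, code in codes_b.items()})
        heapq.heappush(heap, (count_a + count_b, next_id, merged))
        next_id += 1
    return heap[0][2]

def encode(symbols, codes):
    return "".join(codes[sym] for sym in symbols)

def decode(bits, codes):
    """Works because no Huffman code is a prefix of another."""
    reverse = {code: sym for sym, code in codes.items()}
    decoded, current = [], ""
    for bit in bits:
        current += bit
        if current in reverse:
            decoded.append(reverse[current])
            current = ""
    return decoded

# Made-up quantized values in which zero dominates.
samples = [0] * 12 + [1] * 5 + [-1] * 4 + [2] * 2 + [7]
codes = build_huffman_codes(samples)
bits = encode(samples, codes)
```

Here the most common value, 0, receives a one-bit code, and the encoded stream is shorter than it would be with a fixed three bits per value, yet `decode(bits, codes)` recovers the samples exactly. That exact recovery is what makes this stage lossless.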
A tradition has evolved during the past century for pop music to be created and produced to cover the imperfections of radio airplay. Ideally, popular music is thick with harmonies and free of silences. Thus popular music can stand the psychoacoustic deletions made by MP3 very well. Conversely, contemporary music often emphasizes subtleties such as tone color, room acoustics, harmonic intricacies, extreme ranges, and sparse voicings, all of which are affected by lossy compression.
The following is a comparison of a sound file at regular CD quality and an MP3 file at 128 kbps. You will hear the CD-quality passage first, then the MP3, and then the sequence repeats. This sound bite is from Alfred Zimmerlin’s “Quintett für Klarinette, 2 Violinen, Viola und Violoncello” (1989-90), available on Edition Wandelweiser Records (EWR 9605) (used with permission). (Please note: Your computer must be able to play AIFF soundfiles in order for you to hear these examples. These soundfiles are not compressed.)
The following example features a sound bite encoded at 32 kbps. Notice the deterioration of sound quality due to the lower bit rate.
Here is a comparison of loud white noise. First you will hear the regular sound file, then the MP3 at 128 kbps, and finally the MP3 at 32 kbps, and then the sequence repeats.
If one subtracts an MP3 file from the original audio file, a considerable amount of audible material remains. Following is an example, again using the Zimmerlin sound bite. Please note that there are probably some artifacts from the subtraction process, but this example should give a rough idea of what the remnants of an MP3 file at 128 kbps sound like. The volume of the subtraction was very low, probably because the sound is largely made up of “masked tones,” so I raised the volume of the example. Note the emphasis of extreme high and low frequencies, probably remnants of the filtering. (Special thanks to Scott Wilson for creating the following subtraction.)
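The subtraction itself is a sample-by-sample difference between two time-aligned signals, followed by a volume boost. The sketch below is not the procedure used to make the example above, which is not documented here. It stands in for the MP3 round trip with a crude quantizer (`crude_lossy_copy`, a hypothetical helper), since encoding real MP3 data is beyond a short example. With real files, the decoded MP3 would first have to be aligned with the original, because encoders and decoders typically add a short delay; misalignment is one likely source of subtraction artifacts.

```python
import math

def subtract(original, decoded):
    """Sample-by-sample difference of two time-aligned signals."""
    if len(original) != len(decoded):
        raise ValueError("signals must be the same length and time-aligned")
    return [a - b for a, b in zip(original, decoded)]

def rms_db(signal):
    """Average level in decibels relative to a full-scale value of 1.0."""
    mean_square = sum(x * x for x in signal) / len(signal)
    if mean_square == 0:
        return float("-inf")
    return 10 * math.log10(mean_square)

def normalize(signal, peak=1.0):
    """Raise (or lower) the volume so the loudest sample reaches `peak`."""
    largest = max(abs(x) for x in signal)
    if largest == 0:
        return list(signal)
    return [x * peak / largest for x in signal]

def crude_lossy_copy(signal, step=0.05):
    """Hypothetical stand-in for an MP3 encode/decode round trip."""
    return [round(x / step) * step for x in signal]

# One second of a loud 440 Hz tone plus a very quiet 3000 Hz tone.
sample_rate = 8000
original = [
    0.5 * math.sin(2 * math.pi * 440 * n / sample_rate)
    + 0.01 * math.sin(2 * math.pi * 3000 * n / sample_rate)
    for n in range(sample_rate)
]
decoded = crude_lossy_copy(original)
residual = subtract(original, decoded)
boosted = normalize(residual)
```

As in the example above, the residual here is far quieter than the original, so it has to be boosted before it can be auditioned. Comparing `rms_db(original)` with `rms_db(residual)` gives a rough measure of how much was removed.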
It is quite interesting that there is so much audible material left after the subtraction. This material, according to psychoacoustic experts, is what we are not supposed to be able to hear.