I was trying to get the sample rate of some audio data. Due to licensing issues and some other restrictions, I can’t use existing libraries to parse the MPEG header, so I decided to write a simple parser that reads the sample rate directly from the header.

The audio data I will mainly be working with is MP3 and M4A.

MPEG Format

MPEG is a family of standards for coding audio-visual information, mainly used for audio and video compression. MPEG audio comes in 3 versions, each with 3 layers. The table below lists every version and layer combination.

Version   Layer   Description
1         1       MPEG-1 Layer 1
1         2       MPEG-1 Layer 2
1         3       MPEG-1 Layer 3
2         1       MPEG-2 Layer 1
2         2       MPEG-2 Layer 2
2         3       MPEG-2 Layer 3
2.5       1       MPEG-2.5 Layer 1
2.5       2       MPEG-2.5 Layer 2
2.5       3       MPEG-2.5 Layer 3

The format commonly known as MP3 is MPEG-1 Layer 3; MP3 files with lower sample rates use MPEG-2 or MPEG-2.5 Layer 3.

MPEG Header

The key to parsing the MPEG audio header is understanding its structure. The header is 4 bytes (32 bits) long, and each letter below stands for one bit:

AAAAAAAA AAABBCCD EEEEFFGH IIJJKLMM
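To make the layout concrete, here is a minimal Python sketch that treats the 4 header bytes as one big-endian 32-bit integer and extracts each lettered field with a shift and a mask. The function name `parse_header` and the dictionary keys are my own choices for illustration, not part of any library.

```python
def parse_header(header: bytes) -> dict:
    """Split a 4-byte MPEG audio frame header into the fields of
    AAAAAAAA AAABBCCD EEEEFFGH IIJJKLMM (hypothetical helper)."""
    if len(header) < 4:
        raise ValueError("an MPEG audio frame header is 4 bytes long")
    # Read the header as one big-endian 32-bit integer.
    word = int.from_bytes(header[:4], "big")
    return {
        "sync": (word >> 21) & 0x7FF,              # A: 11 bits
        "version_id": (word >> 19) & 0b11,         # B: 2 bits
        "layer": (word >> 17) & 0b11,              # C: 2 bits
        "protection": (word >> 16) & 0b1,          # D: 1 bit
        "bitrate_index": (word >> 12) & 0xF,       # E: 4 bits
        "sample_rate_index": (word >> 10) & 0b11,  # F: 2 bits
        "padding": (word >> 9) & 0b1,              # G: 1 bit
        "private": (word >> 8) & 0b1,              # H: 1 bit
        "channel_mode": (word >> 6) & 0b11,        # I: 2 bits
        "mode_extension": (word >> 4) & 0b11,      # J: 2 bits
        "copyright": (word >> 3) & 0b1,            # K: 1 bit
        "original": (word >> 2) & 0b1,             # L: 1 bit
        "emphasis": word & 0b11,                   # M: 2 bits
    }
```

For example, `parse_header(bytes([0xFF, 0xFB, 0xB0, 0x00]))` decodes the header bytes used later in this post (the fourth byte, 0x00, is a made-up placeholder).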

Each group of letters is one field:

A: Frame sync

11 bits, always set to 1111 1111 111 in binary. The sync fills the whole first byte and the top 3 bits of the second byte, so the first two header bytes read as a 16-bit value between 0xFFE0 and 0xFFFF in hexadecimal.
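A small Python check for the sync pattern, as a sketch: it tests the first byte against 0xFF and masks the top 3 bits of the second byte. The name `is_frame_sync` is hypothetical. The assertion at the end confirms that this mask test is equivalent to the 0xFFE0 to 0xFFFF range test for every 16-bit value.

```python
def is_frame_sync(b0: int, b1: int) -> bool:
    """True if two consecutive bytes begin with the 11 set frame sync bits."""
    # 11 ones = all 8 bits of the first byte + the top 3 bits of the second.
    return b0 == 0xFF and (b1 & 0xE0) == 0xE0


# The mask test agrees with the 16-bit range test for every possible value.
assert all(
    is_frame_sync(v >> 8, v & 0xFF) == (0xFFE0 <= v <= 0xFFFF)
    for v in range(0x10000)
)
```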

B: MPEG Audio version ID

2 bits. It indicates the MPEG version. The following table shows the version ID and the corresponding MPEG version.

ID   MPEG version
00   MPEG Version 2.5
01   Reserved
10   MPEG Version 2
11   MPEG Version 1

C: Layer description

2 bits. It indicates the MPEG audio layer. The following table shows the layer description and the corresponding MPEG layer.

ID   Layer
00   Reserved
01   Layer III
10   Layer II
11   Layer I
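The two lookup tables above translate directly into dictionaries. This is a minimal sketch; `VERSIONS`, `LAYERS` and `describe` are illustrative names, and reserved IDs are represented as `None` so they can be rejected.

```python
# Field B (2-bit version ID) -> MPEG version; None marks the reserved value.
VERSIONS = {0b00: "2.5", 0b01: None, 0b10: "2", 0b11: "1"}

# Field C (2-bit layer description) -> layer; None marks the reserved value.
LAYERS = {0b00: None, 0b01: "III", 0b10: "II", 0b11: "I"}


def describe(version_id: int, layer_id: int) -> str:
    """Human-readable name for a version ID / layer ID pair."""
    version = VERSIONS[version_id & 0b11]
    layer = LAYERS[layer_id & 0b11]
    if version is None or layer is None:
        raise ValueError("reserved version or layer value")
    return f"MPEG-{version} Layer {layer}"
```

For instance, `describe(0b11, 0b01)` gives `"MPEG-1 Layer III"`, the usual MP3 case.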

D: Protection bit

1 bit. It shows whether the audio frame has CRC protection. If the bit is set to 0, the frame is protected by a 16-bit CRC that immediately follows the header.

E: Bitrate index

4 bits. Stores the bitrate index. The mapping from index to actual bitrate depends on the MPEG version and layer; the full tables are widely published online.

F: Sampling rate frequency index

2 bits. This is the field I was looking for. The table below maps the index to the sampling rate for each MPEG version.

Bits   MPEG 1     MPEG 2     MPEG 2.5
00     44100 Hz   22050 Hz   11025 Hz
01     48000 Hz   24000 Hz   12000 Hz
10     32000 Hz   16000 Hz   8000 Hz
11     Reserved   Reserved   Reserved

G: Padding bit

1 bit. If the bit is set to 1, the frame is padded with one extra slot (4 bytes for Layer I, 1 byte for Layers II and III).

H: Private bit

1 bit. Free for application-specific use; decoders ignore it.

I: Channel mode

2 bits. The following table shows the mapping of index to channel mode.

Bits   Mode
00     Stereo
01     Joint stereo (stereo)
10     Dual channel (stereo)
11     Single channel (mono)
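Since field I occupies the two most significant bits of the 4th header byte, the channel mode, and from it the number of channels, can be read as in this sketch. `CHANNEL_MODES` and `channel_count` are hypothetical names; the channel counts follow the stereo/mono notes in the table.

```python
# Field I (channel mode) -> (mode name, number of channels).
CHANNEL_MODES = {
    0b00: ("Stereo", 2),
    0b01: ("Joint stereo", 2),
    0b10: ("Dual channel", 2),
    0b11: ("Single channel", 1),
}


def channel_count(fourth_byte: int) -> int:
    """Number of channels encoded in the 4th header byte (IIJJKLMM)."""
    mode = (fourth_byte >> 6) & 0b11  # top 2 bits = field I
    return CHANNEL_MODES[mode][1]
```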

J: Mode extension (only for joint stereo)

2 bits.

K: Copyright

1 bit. 0 for no copyright, 1 for copyright.

L: Original

1 bit. 0 for copy, 1 for original.

M: Emphasis

2 bits.

In summary, to get the sample rate of MP3 audio we first find the frame sync bits, then read the MPEG version ID, and finally read the sampling rate frequency index and look it up in the column for that version.
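Those three steps fit in one short function. This is a sketch that assumes `header` already starts at a frame header; `SAMPLE_RATES` and `mp3_sample_rate` are illustrative names, and the rates are copied from the sampling rate table.

```python
# Sampling rates in Hz per version ID (field B), indexed by field F.
# Index 0b11 is reserved, so each row holds only three entries.
SAMPLE_RATES = {
    0b11: (44100, 48000, 32000),  # MPEG 1
    0b10: (22050, 24000, 16000),  # MPEG 2
    0b00: (11025, 12000, 8000),   # MPEG 2.5
}


def mp3_sample_rate(header: bytes) -> int:
    """Sample rate in Hz from a 4-byte MPEG audio frame header."""
    if len(header) < 4:
        raise ValueError("need at least 4 bytes")
    # Step 1: all 11 frame sync bits must be set.
    if header[0] != 0xFF or (header[1] & 0xE0) != 0xE0:
        raise ValueError("no frame sync at start of header")
    # Step 2: version ID = bits 4-3 of the 2nd byte (bit 0 = least significant).
    version_id = (header[1] >> 3) & 0b11
    # Step 3: sampling rate index = bits 3-2 of the 3rd byte.
    index = (header[2] >> 2) & 0b11
    if version_id not in SAMPLE_RATES or index == 0b11:
        raise ValueError("reserved version or sampling rate index")
    return SAMPLE_RATES[version_id][index]
```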

Example

Finding the Frame Sync Bits

This step tells us where the MPEG header starts. I think it is the most important step, but also the biggest hurdle. As mentioned above, the frame sync bits are always 1111 1111 111 in binary, so we can find them by looking for two bytes that form a value between 0xFFE0 and 0xFFFF in hexadecimal.

[Image: hex dump of the example MP3 data showing the frame header]
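A byte-by-byte scan is one way to do this. The sketch below, with the hypothetical name `find_frame_sync`, returns the offset of the first candidate header. Because a 0xFF byte followed by a byte with its top 3 bits set can also occur by chance elsewhere in a file, it additionally rejects candidates whose version, layer or sampling rate index is one of the reserved values from the tables above. That extra filter is my own assumption for robustness, not a complete validity check.

```python
from typing import Optional


def find_frame_sync(data: bytes, start: int = 0) -> Optional[int]:
    """Offset of the first plausible MPEG audio frame header, or None."""
    # Stop early enough that a full 4-byte header fits in the buffer.
    for i in range(start, len(data) - 3):
        b1, b2 = data[i + 1], data[i + 2]
        if data[i] != 0xFF or (b1 & 0xE0) != 0xE0:
            continue  # the 11 sync bits are not all set here
        # Skip false syncs that decode to reserved field values.
        version_id = (b1 >> 3) & 0b11
        layer = (b1 >> 1) & 0b11
        rate_index = (b2 >> 2) & 0b11
        if version_id == 0b01 or layer == 0b00 or rate_index == 0b11:
            continue
        return i
    return None
```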

Finding the MPEG Version ID

In this example, the first 2 bytes of the MPEG header are FF FB, which is 1111 1111 1111 1011 in binary. Removing the 11 frame sync bits leaves 1 1011. The first 2 of these bits are the MPEG version ID, 11, so according to the table above this is MPEG version 1. The next 2 bits, 01, indicate Layer III, and the last bit, 1, means the frame has no CRC protection.
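The same bit arithmetic in Python, using the FF FB bytes from the example: shifting right by 5 keeps the 11 sync bits, and masking with 0b11111 keeps the 5 bits that remain.

```python
first_two = 0xFFFB  # first two header bytes from the example

sync = first_two >> 5        # top 11 bits: the frame sync
rest = first_two & 0b11111   # the 5 bits left after removing the sync
version_id = rest >> 3       # B: first 2 of the remaining bits
layer = (rest >> 1) & 0b11   # C: next 2 bits
protection = rest & 0b1      # D: last bit

print(f"{sync:011b} {rest:05b}")      # 11111111111 11011
print(version_id, layer, protection)  # 3 1 1
```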

Finding the Sampling Rate Frequency Index

The 3rd byte of the MPEG header contains the bitrate index and the sampling rate frequency index. In this example it is 0xB0, which is 1011 0000 in binary. Counting from the left, the sampling rate frequency index is the 5th and 6th bits, which are 00. According to the MPEG 1 column of the table above, the sample rate is 44100 Hz.
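Decoding the 0xB0 byte the same way: the top 4 bits are the bitrate index and the next 2 bits are the sampling rate index, which is then looked up in the MPEG 1 column (the `MPEG1_RATES` name is only for this illustration).

```python
third = 0xB0  # third header byte from the example: 1011 0000

bitrate_index = third >> 4        # E: bits 1-4 from the left
rate_index = (third >> 2) & 0b11  # F: bits 5-6 from the left
padding = (third >> 1) & 0b1      # G: bit 7
private = third & 0b1             # H: bit 8

MPEG1_RATES = {0b00: 44100, 0b01: 48000, 0b10: 32000}
print(bitrate_index, rate_index, MPEG1_RATES[rate_index])  # 11 0 44100
```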

M4A

For M4A files, the process is much simpler. All we need to do is find the moov atom, then find the mdhd box inside it (nested as moov/trak/mdia/mdhd), and finally read the sample rate from that box.
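A sketch of how the nested boxes can be walked. Each box starts with a 4-byte big-endian size and a 4-byte type; a size of 1 means a 64-bit size follows the type, and a size of 0 means the box runs to the end of its parent. The helper names `iter_boxes` and `find_box` are hypothetical, and taking the first match at each level assumes the first `trak` is the audio track, which is typical for an audio-only M4A file but not guaranteed.

```python
import struct
from typing import Iterator, Optional, Tuple


def iter_boxes(data: bytes, start: int = 0,
               end: Optional[int] = None) -> Iterator[Tuple[bytes, int, int]]:
    """Yield (type, payload_start, box_end) for each box in data[start:end]."""
    end = len(data) if end is None else end
    pos = start
    while pos + 8 <= end:
        size, box_type = struct.unpack(">I4s", data[pos:pos + 8])
        header_len = 8
        if size == 1:  # 64-bit size stored right after the type
            if pos + 16 > end:
                break
            size = struct.unpack(">Q", data[pos + 8:pos + 16])[0]
            header_len = 16
        elif size == 0:  # box extends to the end of the enclosing range
            size = end - pos
        if size < header_len or pos + size > end:
            break  # malformed or truncated box
        yield box_type, pos + header_len, pos + size
        pos += size


def find_box(data: bytes, path: Tuple[bytes, ...]) -> Optional[Tuple[int, int]]:
    """Payload (start, end) of the box at a path such as
    (b"moov", b"trak", b"mdia", b"mdhd"), using the first match per level."""
    start, end = 0, len(data)
    for name in path:
        for box_type, payload_start, box_end in iter_boxes(data, start, end):
            if box_type == name:
                start, end = payload_start, box_end
                break
        else:
            return None  # no box with this type at this level
    return start, end
```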

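The mdhd (media header) box stores the track’s timescale, which for an audio track is normally the sample rate. The timescale is a 32-bit big-endian integer in the 13th to 16th bytes after the “mdhd” bytes (for a version 0 box). For sample rates below 65536 the upper two bytes are zero, so the value effectively sits in the 15th and 16th bytes. In the example below, the sample rate is 0x2EE0 (marked in pink), which is 12000 Hz in decimal.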

[Image: hex dump of the example M4A file’s mdhd box, with the sample rate bytes marked in pink]
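Following the same approach of locating the “mdhd” bytes, here is a minimal sketch that reads the full 32-bit timescale. It assumes the first occurrence of `b"mdhd"` in the data is the box type (a plain byte search could in principle match unrelated data) and handles both box versions: version 0 stores 4-byte creation and modification times, version 1 stores 8-byte ones, which shifts the timescale further along. The name `mdhd_timescale` is hypothetical.

```python
import struct


def mdhd_timescale(data: bytes) -> int:
    """32-bit timescale from the first 'mdhd' box found in data."""
    pos = data.find(b"mdhd")
    if pos < 0:
        raise ValueError("no mdhd box found")
    body = pos + 4        # first byte after the "mdhd" type
    version = data[body]  # 1-byte version, followed by 3 bytes of flags
    # Skip version/flags, then the creation and modification times:
    # 4 bytes each for version 0, 8 bytes each for version 1.
    offset = body + 4 + (8 if version == 0 else 16)
    (timescale,) = struct.unpack(">I", data[offset:offset + 4])
    return timescale
```

For a version 0 box, the offset works out to the 13th to 16th bytes after “mdhd”, matching the description above.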