AVI formats

Published by nikigui

Why another AVI file format documentation?

Even though the AVI file format has been around for more than 10 years, no available
documentation both describes the format itself and covers the issues caused by flawed
demuxers and flawed decoders, and how to circumvent them.


The goal of this document is not only to explain the AVI format as it is defined on
paper, but also how to use it when working with flawed muxers, flawed demuxers, and
flawed decompressors.


Layout of an AVI file

A RIFF-List where dwFourCC = 'AVI ' shall be called a 'RIFF-AVI-List', a RIFF-List

where dwFourCC = 'AVIX' shall be called a 'RIFF-AVIX-List'.

Every AVI file has the following layout:

RIFF AVI // mandatory

{ RIFF AVIX } // only for Open-DML files

Unlike what a uint32 suggests, the limit for the size of those lists is not 4 GB, but

• for AVI 1.0: size(RIFF-AVI) < 2 GB

• for Open-DML:

    size(RIFF-AVI) < 1 GB (!!) (assumed to be 2 GB by some muxing applications, like VirtualDub!)

    size(RIFF-AVIX) < 2 GB

As Windows XP insists on reading the entire first RIFF-AVI-List if no Legacy Index (see
page 12) is found, and as that Legacy Index causes overhead, it is recommended to create
RIFF-AVI-Lists as small as possible.
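
The naming and the size limits above can be sketched in C as follows. This is only an illustration under stated assumptions: the function and type names are hypothetical, 1 GB is taken to mean 2^30 bytes, and the limit is applied to the size field as stored in the 12-byte list header ('RIFF', little-endian uint32 size, FourCC).

```c
#include <stdint.h>
#include <string.h>

/* Kind of top-level RIFF list, derived from the FourCC at byte offset 8. */
typedef enum { RIFF_UNKNOWN, RIFF_AVI, RIFF_AVIX } riff_kind;

/* Read a little-endian 32-bit value (RIFF stores sizes little-endian). */
uint32_t read_le32(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Classify a 12-byte header: 'RIFF', size, then 'AVI ' or 'AVIX'.
 * *size is only written when the header starts with 'RIFF'. */
riff_kind classify_riff(const unsigned char hdr[12], uint32_t *size)
{
    if (memcmp(hdr, "RIFF", 4) != 0)
        return RIFF_UNKNOWN;
    *size = read_le32(hdr + 4);
    if (memcmp(hdr + 8, "AVI ", 4) == 0)
        return RIFF_AVI;
    if (memcmp(hdr + 8, "AVIX", 4) == 0)
        return RIFF_AVIX;
    return RIFF_UNKNOWN;
}

/* Return 1 if the list size respects the limits quoted above, else 0.
 * Assumption: 1 GB = 2^30 bytes. */
int riff_size_ok(riff_kind kind, uint32_t size, int is_open_dml)
{
    const uint32_t GB = 1024u * 1024u * 1024u;
    switch (kind) {
    case RIFF_AVI:  return size < (is_open_dml ? GB : 2u * GB);
    case RIFF_AVIX: return is_open_dml && size < 2u * GB;
    default:        return 0;
    }
}
```

A muxer could call riff_size_ok() before appending a chunk and start a new RIFF-AVIX-List once the check would fail.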



Audio types requiring special attention

5.1 MP3

wFormatTag = 0x0055

An MP3 audio stream consists of inseparable frames. MP3 decoders should be able to handle

partial frames, but it is nevertheless recommended to store entire MP3 frames in the AVI


The strf chunk is an MPEGLAYER3WAVEFORMAT structure, which is an extension to the WAVEFORMATEX structure:


typedef struct mpeglayer3waveformat_tag {
    WAVEFORMATEX wfx;             /* base structure; wfx.wFormatTag = 0x0055 */
    WORD         wID;
    DWORD        fdwFlags;
    WORD         nBlockSize;
    WORD         nFramesPerBlock;
    WORD         nCodecDelay;
} MPEGLAYER3WAVEFORMAT;

This is only valid for MP3 ('MPEG Layer 3'), not for MP1 or MP2 ('MPEG Layer 1 / 2').

If the MP3 stream has a variable bitrate, then you need to convince DirectShow to seek

properly. See section 5.4 (page 20) for more details on VBR audio streams in AVI files.

Unfortunately, whoever came up with the idea didn't think enough about it: It is possible

to create MP3 audio frames larger than 1152 bytes if the sample rate is 32 kHz or less. After

reading and understanding section 5.4, you'll see why such audio frames render an MP3

stream unplayable if nBlockSize is set to 1152, which is usually done for MP3. Using a

larger value would resolve this issue. However, some programs read an MP3 stream as VBR

if and only if this value is exactly 1152. In other words, low sample rates in combination

with high bitrates are a problem for MP3 VBR streams in AVI files.
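
To see when a frame outgrows an nBlockSize of 1152, the frame size can be computed directly. The sketch below assumes MPEG-1 Layer III only (sample rates of 32, 44.1 and 48 kHz), where one frame occupies 144 * bitrate / sample rate bytes plus an optional padding byte; the function names are hypothetical.

```c
#include <stdint.h>

/* Size in bytes of one MPEG-1 Layer III frame.
 * bitrate_bps:    bitrate in bits per second (e.g. 320000)
 * sample_rate_hz: 32000, 44100 or 48000
 * padding:        0 or 1, the padding bit of the frame header
 * Assumption: MPEG-1 only; lower sample rates use a different factor. */
uint32_t mp3_frame_bytes(uint32_t bitrate_bps, uint32_t sample_rate_hz, int padding)
{
    return 144u * bitrate_bps / sample_rate_hz + (padding ? 1u : 0u);
}

/* Return 1 if a frame of this size does not fit into nBlockSize bytes. */
int mp3_frame_exceeds_block(uint32_t frame_bytes, uint16_t n_block_size)
{
    return frame_bytes > n_block_size;
}
```

With these assumptions, a 320 kbit/s frame at 32 kHz comes out at 1440 bytes and exceeds 1152, while the same bitrate at 44.1 kHz stays at 1044 or 1045 bytes.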

5.2 AC3

wFormatTag = 0x2000

Muxing AC3 into AVI is far more problematic than most other audio formats. The reason

is that a lot of decoders (software as well as hardware) are severely b0rked.

An AC3 stream consists, like MP3, of individual, inseparable frames. It is required that

any audio chunk of an AVI file contains a few (complete!) AC3 frames. Otherwise, some

AC3 decoders will miscalculate the duration of a chunk. As the audio stream is considered

the master stream for playback in DirectShow, this miscalculation will lead to jerky video



Overhead of AVI files

This section describes how to predict the overhead of an AVI file before muxing. Note: In
the case of low overhead AVI files, the wording in this section is not applicable. Basically, one
video/audio frame causes about 8-9 bytes of overhead in low overhead AVI files.

8.1 General

The overhead of AVI files depends on the number of CHUNKs in the file. Other structures have
only very little influence on the total overhead. Each CHUNK causes the following amount of
overhead:


• 8 Bytes chunk header (all AVI types)

• 16 Bytes for entry in Legacy Index (see page 12) (AVI 1.0 and the RIFF-AVI-List of hybrid files)

• 8 Bytes for entry in Standard Index (see page 15) (Open-DML)

That means, each CHUNK causes an amount of overhead of 16, 24 or 32 bytes.
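
The three contributions can be added up per file type as sketched below. The enum and function names are hypothetical, and the hybrid case is simplified: it assumes every chunk is listed in both indexes, whereas in a real hybrid file only the chunks inside the RIFF-AVI-List carry a Legacy Index entry.

```c
#include <stdint.h>

/* File flavours distinguished in the list above (names are illustrative). */
typedef enum { AVI_1_0, AVI_OPEN_DML, AVI_HYBRID } avi_type;

/* Bytes of overhead caused by a single chunk. */
uint32_t overhead_per_chunk(avi_type type)
{
    uint32_t bytes = 8;                     /* chunk header, always present */
    if (type == AVI_1_0 || type == AVI_HYBRID)
        bytes += 16;                        /* Legacy Index entry */
    if (type == AVI_OPEN_DML || type == AVI_HYBRID)
        bytes += 8;                         /* Standard Index entry */
    return bytes;
}

/* Estimated total chunk-related overhead for a whole file. */
uint64_t chunk_overhead_bytes(avi_type type, uint64_t chunk_count)
{
    return (uint64_t)overhead_per_chunk(type) * chunk_count;
}
```

Under this model an AVI 1.0 file pays 24 bytes per chunk, an Open-DML file 16 and a hybrid file up to 32.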

8.2 Getting number of CHUNKS

8.2.1 Video

The easier part is the video stream: Each video frame takes one CHUNK.

8.2.2 Audio

The number of chunks for an audio stream depends on its format and packing. For specific

formats, where very special packing is required or considered normal (see page 19), the

overhead can be calculated easily from the settings. Otherwise, more precise information on

the muxing settings is needed.

8.2.3 Examples

Video: 3 hours, 25 fps (= 3,600,000 ms / 40 ms = 90,000 frames per hour)

Audio: 2x MP3-VBR (with 1 frame per CHUNK and 24 ms per frame)

Audio: 2x AC3 (with 4 frames per CHUNK and 32ms per frame)

-> Video: 270,000 CHUNKs

-> Audio: 2*3*150,000 + 2*3*28,125 = 1,068,750 CHUNKs

-> Sum = 1,338,750 CHUNKs
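
The chunk count of a single stream follows from its duration, the frame duration and the number of frames packed into each chunk. The helper below is a minimal sketch with a hypothetical name; it works in whole milliseconds, which matches the figures used in this example but is only an approximation for frame durations that are not integral.

```c
#include <stdint.h>

/* Number of chunks one stream produces.
 * duration_ms:      length of the stream in milliseconds
 * frame_ms:         duration of one frame in milliseconds (must be > 0)
 * frames_per_chunk: frames the muxer packs into one chunk (must be > 0)
 * Rounds up, because a final partially filled chunk is still a chunk. */
uint64_t stream_chunks(uint64_t duration_ms, uint32_t frame_ms,
                       uint32_t frames_per_chunk)
{
    uint64_t chunk_ms = (uint64_t)frame_ms * frames_per_chunk;
    return (duration_ms + chunk_ms - 1) / chunk_ms;
}
```

For one hour (3,600,000 ms) this gives 90,000 chunks for 25 fps video, 150,000 for an MP3 stream with one 24 ms frame per chunk, and 28,125 for an AC3 stream with four 32 ms frames per chunk. Each value then has to be multiplied by the number of hours and summed over all streams.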


Published in mpg news and knowledge
