Reformat container doc

- split the doc into sections for simple and extended format and move
  example layouts to the end.
- use ASCII tables to describe chunk formats
- attempt to consistently use MUST/SHOULD, etc.
- remove bold from most terms, but add them to definition lists which
  allow for the styling to be changed.

Change-Id: I93c1cd33bde9ccf0b265b202ec4182ce98fd6b48
2025-08-11 02:20:33 +02:00 · 2012-02-07 15:06:32 -08:00
parent 85b6ff6897
commit e9a7d145e7
1 changed files with 458 additions and 381 deletions
--- a/doc/webp-container-spec.txt
+++ b/doc/webp-container-spec.txt
@@ -13,7 +13,7 @@ end of this file.
 WebP Container Specification
 ============================

-_Working Draft, v0.1, 20111004_
+_Working Draft, v0.2, 20120207_


 * TOC placeholder
@@ -27,8 +27,8 @@ WebP is a still image format that uses the VP8 key frame encoding, and
 possibly other encodings in the future, to compress image data in a
 lossy way. The VP8 encoding should make it more efficient than currently
 used formats. It is optimized for fast image transfer over the network
-(e.g., for websites). However, it also aims for feature parity (like
-Color Profile, XMP Metadata, Animation, etc.) with other formats. This
+(e.g., for websites). However, it also aims for feature parity
+(color profile, XMP metadata, animation, etc.) with other formats. This
 document describes the structure of a WebP file.

 The first version of WebP handled only the basic use case: a file
@@ -57,6 +57,10 @@ Files not using these new features are backward compatible with the
 original format. Use of these features will produce files that are not
 compatible with older programs.

+The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
+"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
+document are to be interpreted as described in [RFC 2119][].
+

 Terminology &amp; Basics
 ------------------------
@@ -64,7 +68,7 @@ Terminology &amp; Basics
 A WebP file contains either a still image (i.e., an encoded matrix of
 pixels) or an animation (see below), with possibly a color profile,
 metadata, etc. In case we need to refer only to the matrix of pixels,
-we will call it the **_canvas_** of the image.
+we will call it the _canvas_ of the image.

 The canvas of an image is built from one or multiple tiles. Each tile
 is a separately encoded VP8 key frame (other encodings are possible in
@@ -74,42 +78,69 @@ of the file:  they are not supposed to be exposed to the user.

 Below are additional terms used throughout this document:

-Code that reads WebP files is referred to as a **_reader_**, while
-code that writes them is referred to as a **_writer_**.
+Code that reads WebP files is referred to as a _reader_, while
+code that writes them is referred to as a _writer_.

-A 16-bit, little-endian, unsigned integer will be denoted as
-**_uint16_**.
+_uint16_

-A 32-bit, little-endian, unsigned integer will be denoted as
-**_uint32_**.
+: A 16-bit, little-endian, unsigned integer.

-The basic element of a RIFF file is a **_chunk_**. It consists of:
+_uint32_

-  * 4 ASCII characters that will be called the **_chunk tag_**.
+: A 32-bit, little-endian, unsigned integer.

-  * uint32 with the size of the chunk content (that will be denoted as
-    **_ckSize_**).
+The basic element of a RIFF file is a _chunk_. It consists of:

-  * _ckSize_ bytes of content.
+     0                   1                   2                   3
+     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+    |                         Chunk FourCC                          |
+    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+    |                          Chunk Size                           |
+    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+    |                         Chunk Payload                         |
+    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

-  * If _ckSize_ is odd, a single padding byte that **SHOULD** be `0`.
+Chunk FourCC: 32 bits

-A chunk with a tag "ABCD" will be also called a **_chunk of type_**
-"ABCD". Note that, in this specification, all chunk tag characters are
-in file order, not in byte order of a uint32 of any particular
+: ASCII four character code or _chunk tag_ used for chunk identification.
+
+Chunk Size: 32 bits (_uint32_)
+
+: The size of the chunk (_ckSize_) not including this field, the chunk
+  identifier and padding.
+
+Chunk Payload: _Chunk Size_ bytes
+
+: The data payload. If _Chunk Size_ is odd, a single padding byte, which
+  SHOULD be `0`, is added.
+
+_ChunkHeader('ABCD')_
+
+: This is used to describe the fourcc and size header of individual
+  chunks, where 'ABCD' is the fourcc for the chunk. This element's
+  size is 8 bytes.
+
+_chunk of type_
+
+: A chunk with the tag "ABCD" is called a _chunk of type_ "ABCD".
+
+: Note that, in this specification, all chunk tag characters are in
+  file order, not in byte order of a uint32 of any particular
  architecture.

-Note that the padding **MUST** be added to the last chunk of the file.
+_list of chunks_

-A **_list of chunks_** is a concatenation of multiple chunks. We will
-refer to the first chunk as having _position_ 0, the second as position
-1, etc. By _chunk with index 0 among "ABCD"_ we mean the first chunk
-among the chunks of type "ABCD" in the list, the _chunk with index 1
-among "ABCD"_ is the second such chunk, etc.
+: A concatenation of multiple chunks.

-A WebP file **MUST** begin with a single chunk with a tag "RIFF". All
-other defined chunks are contained within this chunk. The file **SHOULD
-NOT** contain anything after it.
+: We will refer to the first chunk as having _position_ 0, the second
+  as position 1, etc. By _chunk with index 0 among "ABCD"_ we mean
+  the first chunk among the chunks of type "ABCD" in the list, the
+  _chunk with index 1 among "ABCD"_ is the second such chunk, etc.
+
+A WebP file MUST begin with a single chunk with a tag 'RIFF'. All
+other defined chunks are contained within this chunk. The file SHOULD
+NOT contain anything after it.
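
The chunk layout described above (FourCC, little-endian size, payload, optional pad byte) can be walked with a short loop. The following is a minimal Python sketch, not part of the specification; the helper name `iter_chunks` and the demo bytes are hypothetical, and error handling is kept to a single bounds check.

```python
import struct

def iter_chunks(buf, offset=0, end=None):
    """Yield (tag, payload) for each chunk found in buf[offset:end]."""
    end = len(buf) if end is None else end
    while offset + 8 <= end:
        tag = buf[offset:offset + 4]  # four ASCII bytes, in file order
        (ck_size,) = struct.unpack_from("<I", buf, offset + 4)  # little-endian uint32
        start = offset + 8
        if start + ck_size > end:
            raise ValueError("chunk payload runs past the end of the data")
        yield tag, buf[start:start + ck_size]
        # An odd-sized payload is followed by one padding byte.
        offset = start + ck_size + (ck_size & 1)

# Hypothetical two-chunk list: "ABCD" (3 bytes + pad) followed by "EFGH" (2 bytes).
demo = (b"ABCD" + struct.pack("<I", 3) + b"xyz" + b"\x00"
        + b"EFGH" + struct.pack("<I", 2) + b"hi")
chunks = list(iter_chunks(demo))
```

Because the size field excludes the padding byte, the loop advances by `ck_size` rounded up to the next even number rather than by `ck_size` itself.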

 The maximum size of RIFF's _ckSize_ is 2^32 minus 10 bytes. The size
 of the whole file is at most 4GiB minus 2 bytes.
@@ -118,115 +149,115 @@ of the whole file is at most 4GiB minus 2 bytes.
 larger than 1GiB or 2GiB. If you are using an existing library, check
 that it handles large files correctly.

-The first four bytes of the RIFF chunk contents (i.e., bytes 8-11 of the
-file) **MUST** be the ASCII string "WEBP". They are followed by a list
-of chunks. Note that as the size of any chunk is even, the size of the
-RIFF chunk is also even.
-
-The contents of the chunks in that list will be described in the
-following sections.
+The first four bytes of the RIFF chunk contents (i.e., bytes 8-11 of the file)
+MUST be the ASCII string "WEBP". They are followed by a list of chunks. As the
+size of any chunk is even, the size of the RIFF chunk is also even.  The
+contents of the chunks in that list will be described in the following sections.

 **Note:** RIFF has a convention that all-uppercase chunks are standard
 chunks that apply to any RIFF file format, while chunks specific to a
-file format are all-lowercase. WebP doesn't follow this convention.
+file format are all lowercase. WebP does not follow this convention.


-Single-image WebP Files
-----------------------
+WebP file header
+----------------

-First, we will describe a subset of WebP files:  files containing only
-one image. Later, we will define multi-image files, which contain
-several images.
+     0                   1                   2                   3
+     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+    |      'R'      |      'I'      |      'F'      |      'F'      |
+    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+    |                           File Size                           |
+    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+    |      'W'      |      'E'      |      'B'      |      'P'      |
+    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

+'RIFF': 32 bits

-### Chunks Layout
+: The ASCII characters 'R' 'I' 'F' 'F'.

-This section describes which chunks may appear in a single-image WebP
-file, and their order. The contents of these chunks will be described
-in subsequent sections.
+File Size: 32 bits (_uint32_)

-The first chunk inside the RIFF chunk **MUST** have a tag of "VP8 "
-(note that the fourth character is a space, and is significant) or
-"VP8X". Other tags for the first chunk **MAY** be introduced by future
-specifications if new encodings are added. This tag of the first chunk
-determines which of the two possible layouts is used.
+: The size of the file in bytes starting at offset 8.

-**Rationale:**  We fix the possible tags of the first chunk so that it
-is possible to introduce other codecs, to keep the "WEBP" signature at
-the beginning of the RIFF chunk while still being able to check the
-codec used by the image by inspecting the byte stream at a fixed
-position.
+'WEBP': 32 bits

-The two possible layouts will be called _images without special layout_
-and _images with special layout_.
+: The ASCII characters 'W' 'E' 'B' 'P'.
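
As an illustration of the 12-byte header above, the sketch below checks the two signatures and returns the _File Size_ field. It is only a sketch under stated assumptions: the function name `parse_webp_header` and the sample bytes are hypothetical, and no attempt is made to validate the size against the actual file length.

```python
import struct

def parse_webp_header(buf):
    """Check the 'RIFF' and 'WEBP' signatures and return the File Size field."""
    if len(buf) < 12:
        raise ValueError("a WebP file header needs at least 12 bytes")
    # 4 ASCII bytes, one little-endian uint32, 4 ASCII bytes.
    riff, file_size, webp = struct.unpack_from("<4sI4s", buf, 0)
    if riff != b"RIFF" or webp != b"WEBP":
        raise ValueError("missing 'RIFF' or 'WEBP' signature")
    return file_size

# Hypothetical header whose RIFF payload holds only the 'WEBP' signature,
# so the size counted from offset 8 is 4.
header = b"RIFF" + struct.pack("<I", 4) + b"WEBP"
file_size = parse_webp_header(header)
```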

+Simple file format
+------------------
+Simple WebP file header:

-#### Images Without Special Layout
+     0                   1                   2                   3
+     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+    |                    WebP file header (12 bytes)                |
+    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+    |                      ChunkHeader('VP8 ')                      |
+    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+    |                           VP8 data                            |
+    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

-If the first subchunk of RIFF has the tag "VP8 ", the file contains an
-_image without special layout_.
+VP8 data: _Chunk Size_ bytes

-This layout **SHOULD** be used if the image doesn't require advanced
+: VP8 bitstream data.
+
+The content of a 'VP8 ' chunk (note the last character is a space) MUST be one
+VP8 key frame (with optional padding).
+
+The current [VP8 Data Format and Decoding Guide][vp8spec] can be found
+at the IETF website, <http://www.ietf.org/>.
+
+The VP8 specification describes how to decode the image into Y'CbCr
+format. To convert to RGB, Rec. 601 SHOULD be used.
+
+This layout SHOULD be used if the image does not require advanced
 features: color profiles, XMP metadata, animation or tiling. Files with
 this layout are smaller and supported by older software.

-Such images consist of:
+Extended file format
+--------------------

-  * A "VP8 " chunk with the bitstream of the single tile.
+**Note:** Older readers may not support files using the extended format.

-**Example:** An example layout of such a file is as follows:
+An extended format file consists of:

-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-RIFF/WEBP
-+- VP8 (bitstream of the single tile of the image)
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+  * A 'VP8X' chunk with information about features used in the file.

+  * An optional 'ICCP' chunk with color profile.

-#### Images With Special Layout
-
-If the first subchunk of RIFF has the tag "VP8X", the file contains an
-_image with special layout_.
-
-**Note:**  Older readers may not support images with special layout.
-
-Such an image consists of:
-
-  * A "VP8X" chunk with information about features used in the file.
-
-  * An optional "ICCP" chunk with color profile.
-
-  * An optional "LOOP" chunk with animation control data.
+  * An optional 'LOOP' chunk with animation control data.

  * Data for all the frames.

-  * An optional "META" chunk with XMP metadata.
+  * An optional 'META' chunk with XMP metadata.

  * Some other chunk types may be defined by future specifications and
    placed anywhere in the file.

-As will be described in the "VP8X" chunk description, by checking a
+As will be described in the 'VP8X' chunk description, by checking a
 flag one can distinguish animated and non-animated images. A
 non-animated image has exactly one frame. An animated one may have
 multiple frames. Data for each frame consists of:

-  * An optional "FRM " (fourth character is a significant space) chunk
-    with animation frame metadata. It **MUST** be present in animated
-    images at the beginning of data for that frame. It **MUST NOT** be
+  * An optional 'FRM ' (fourth character is a significant space) chunk
+    with animation frame metadata. It MUST be present in animated
+    images at the beginning of data for that frame. It MUST NOT be
    present in non-animated images.

-  * An optional "TILE" chunk with tile position metadata. It **MUST** be
+  * An optional 'TILE' chunk with tile position metadata. It MUST be
    present at the beginning of data for an image that's represented as
    multiple tile images.

-  * An optional "ALPH" chunk with alpha bitstream of the tile. It **MUST** be
-    present for an image containing transparency. It **MUST NOT** be present
+  * An optional 'ALPH' chunk with alpha bitstream of the tile. It MUST be
+    present for an image containing transparency. It MUST NOT be present
    in non-transparent images.

-  * A "VP8 " chunk with the bitstream of the tile.
+  * A 'VP8 ' chunk with the bitstream of the tile.

-All chunks **MUST** be placed in the same order as listed above (except
-for unknown chunks, which **MAY** appear anywhere). If a chunk appears
-in the wrong place, the file is invalid, but readers **MAY** parse the
+All chunks SHOULD be placed in the same order as listed above (except
+for unknown chunks, which MAY appear anywhere). If a chunk appears
+in the wrong place, the file is invalid, but readers MAY parse the
 file, ignoring the chunks that come too late.

 **Rationale:** Setting the order of chunks should allow quicker file
@@ -235,49 +266,303 @@ position, a decoder can choose to stop searching for it.  The rule of
 ignoring late chunks should make programs that need to do a full search
 give the same results as the ones stopping early.

-**Example:** An example layout of a non-animated, tiled image without
-transparency may look as follows:
+Extended WebP file header:

-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-RIFF/WEBP
-+- VP8X (descriptions of features used)
-+- ICCP (color profile)
-+- TILE (First tile parameters)
-+- VP8 (bitstream - first tile)
-+- TILE (Second tile parameters)
-+- VP8 (bitstream - second tile)
-+- TILE (third tile parameters)
-+- VP8 (bitstream - third tile)
-+- TILE (fourth tile parameters)
-+- VP8 (bitstream - fourth tile)
-+- META (XMP metadata)
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+     0                   1                   2                   3
+     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+    |                   WebP file header (12 bytes)                 |
+    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+    |                      ChunkHeader('VP8X')                      |
+    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+    |  Rsrv |M|I|A|T|                   Reserved                    |
+    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+    |                          Canvas Width                         |
+    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+    |                          Canvas Height                        |
+    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

-**Example:** An example layout of an animated image with transparency may look
-as follows:
+Tiling (T): 1 bit

-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-RIFF/WEBP
-+- VP8X (descriptions of features used)
-+- LOOP (animation control parameters)
-+- FRM (first animation frame parameters)
-+- ALPH (alpha bitstream - first image frame)
-+- VP8 (bitstream - first image frame)
-+- FRM (second animation frame parameters)
-+- ALPH (alpha bitstream - second image frame)
-+- VP8 (bitstream - second image frame)
-+- META (XMP metadata)
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+: Set if the image is represented by tiles.

+Animation (A): 1 bit
+
+: Set if the file is an animation. Data in 'LOOP' and 'FRM ' chunks
+  should be used to control the animation.
+
+ICC profile (I): 1 bit
+
+: Set if the file contains an 'ICCP' chunk.
+
+Metadata (M): 1 bit
+
+: Set if the file contains a 'META' chunk.
+
+Reserved (Rsrv): 4 bits
+
+: SHOULD be `0`.
+
+Reserved: 16 bits
+
+: SHOULD be `0`.
+
+Canvas Width: 32 bits
+
+: Width of the canvas in pixels.
+
+Canvas Height: 32 bits
+
+: Height of the canvas in pixels.
+
+Future specifications MAY add more fields. If a chunk of larger size is found,
+programs MUST ignore the extra bytes but SHOULD preserve them when modifying
+the file.
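
A reader might decode the 'VP8X' payload roughly as follows. This is a hedged sketch: it assumes the first 32 bits are a little-endian flags word in which T, A, I and M occupy bits 0 to 3, which is the numbering used by the previous draft's flags field and matches the diagram when read most-significant-bit first. Note that the diagram draws the second reserved field as 24 bits wide while the field list says 16; the sketch simply ignores the reserved bits. The name `parse_vp8x` and the dictionary keys are hypothetical.

```python
import struct

def parse_vp8x(payload):
    """Decode feature flags and canvas size from a 'VP8X' chunk payload."""
    if len(payload) < 12:
        raise ValueError("'VP8X' payload needs at least 12 bytes")
    # Extra trailing bytes from future specifications are ignored.
    flags, width, height = struct.unpack_from("<III", payload, 0)
    return {
        "tiling": bool(flags & 0x1),       # T (assumed bit 0)
        "animation": bool(flags & 0x2),    # A (assumed bit 1)
        "icc_profile": bool(flags & 0x4),  # I (assumed bit 2)
        "metadata": bool(flags & 0x8),     # M (assumed bit 3)
        "canvas_width": width,
        "canvas_height": height,
    }

# Hypothetical payload: animation and ICC profile flags set, 640x480 canvas.
info = parse_vp8x(struct.pack("<III", 0b0110, 640, 480))
```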
+
+### Chunks
+
+#### Animation
+
+Loop Chunk:
+
+     0                   1                   2                   3
+     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+    |                      ChunkHeader('LOOP')                      |
+    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+    |          Loop Count           |
+    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+
+Loop Count: 16 bits (_uint16_)
+
+: The number of times to loop the animation. `0` means infinitely.
+
+For images that are animations, this chunk contains the global
+parameters of the animation.
+
+This chunk MUST appear if the _Animation_ flag in chunk VP8X is set.
+If the _Animation_ flag is not set and this chunk is present, it
+SHOULD be ignored.
+
+Per-frame parameters of the animation:
+
+     0                   1                   2                   3
+     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+    |                      ChunkHeader('FRM ')                      |
+    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+    |                            Frame X                            |
+    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+    |                            Frame Y                            |
+    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+    |                          Frame Width                          |
+    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+    |                          Frame Height                         |
+    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+    |         Frame Duration        |
+    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+
+Frame X: 32 bits (_uint32_)
+
+: The X coordinate of the upper left corner of the frame.
+
+Frame Y: 32 bits (_uint32_)
+
+: The Y coordinate of the upper left corner of the frame.
+
+Frame Width: 32 bits (_uint32_)
+
+: The width of the frame.
+
+Frame Height: 32 bits (_uint32_)
+
+: The height of the frame.
+
+Frame Duration: 16 bits (_uint16_)
+
+: Time to wait before displaying the next frame, in 1 millisecond units.
+
+Notes for frames containing VP8 data:
+
+  * _Frame X_ and _Frame Y_ values MUST be divisible by `32`.
+
+    **Rationale:** This ensures that pixels on U and V planes are aligned to a
+    16-byte boundary (even after a rotation), which may help with vector
+    instructions on some architectures. This also makes the tiles align to
+    16-pixel macroblock boundaries.
+
+  * _Frame Width_ MUST be divisible by `16` or
+    `Frame X + Frame Width == Canvas Width` MUST be true.
+
+  * _Frame Height_ MUST be divisible by `16` or
+    `Frame Y + Frame Height == Canvas Height` MUST be true.
+
+    **Rationale:** The width and height constraints simplify the handling of
+    macroblocks that are on the edge of a tile. VP8 decoders can overwrite
+    pixels outside the boundary in such a macroblock, and this guarantees they
+    won't overwrite any data.
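
The three constraints above can be checked mechanically. The sketch below is illustrative only; the function name `frame_violations` and its message strings are hypothetical, and it covers just the VP8 rules listed in the notes.

```python
def frame_violations(frame_x, frame_y, frame_w, frame_h, canvas_w, canvas_h):
    """Return a list of the VP8 frame constraints that the values break."""
    problems = []
    if frame_x % 32 or frame_y % 32:
        problems.append("Frame X and Frame Y must be divisible by 32")
    # Width: a multiple of 16, or the frame touches the right canvas edge.
    if frame_w % 16 and frame_x + frame_w != canvas_w:
        problems.append("Frame Width must be divisible by 16 or reach the canvas edge")
    # Height: a multiple of 16, or the frame touches the bottom canvas edge.
    if frame_h % 16 and frame_y + frame_h != canvas_h:
        problems.append("Frame Height must be divisible by 16 or reach the canvas edge")
    return problems

# An 18-pixel-wide frame at x=32 is fine on a 50-pixel canvas (32 + 18 == 50),
# but not when placed at x=16.
ok = frame_violations(32, 0, 18, 16, 50, 16)
bad = frame_violations(16, 0, 18, 16, 50, 16)
```

The same check applies to tiles if the tile position and the width and height read from the VP8 data are passed in place of the frame values.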
+
+#### Tiling
+
+     0                   1                   2                   3
+     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+    |                      ChunkHeader('TILE')                      |
+    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+    |                         Tile Canvas X                         |
+    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+    |                         Tile Canvas Y                         |
+    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+    |                           Tile Data                           |
+    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+
+Tile Canvas X: 32 bits (_uint32_)
+
+: X coordinate of the upper left corner of the tile.
+
+Tile Canvas Y: 32 bits (_uint32_)
+
+: Y coordinate of the upper left corner of the tile.
+
+Tile Data: _Chunk Size_ - `8` bytes
+
+: VP8 data.
+
+This chunk contains information about a single tile and describes the
+bitstream chunk that follows it.
+
+Notes for tiles containing VP8 data:
+
+  * _Tile Canvas X_ and _Tile Canvas Y_ values MUST be
+    divisible by `32`.
+
+  * The _Tile Width_ and _Tile Height_ can be extracted from the VP8 data.
+    See 'Section 9' in the [VP8 RFC][vp8spec].
+
+  * The width of a tile MUST be divisible by `16` or
+    `Tile Canvas X + Tile Width == Canvas Width` MUST be true.
+
+  * The height of a tile MUST be divisible by `16` or
+    `Tile Canvas Y + Tile Height == Canvas Height` MUST be true.
+
+
+#### Alpha
+
+     0                   1                   2                   3
+     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+    |                      ChunkHeader('ALPH')                      |
+    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+    |   F   |   C   |    Reserved   |        Alpha Bitstream        |
+    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+
+Filtering method (F): 4 bits
+
+: The filtering method used:
+
+  * `0`: None.
+  * `1`: Horizontal filter.
+  * `2`: Vertical filter.
+  * `3`: Gradient filter.
+
+Compression method (C): 4 bits
+
+: The compression method used:
+
+  * `0`: No compression.
+  * `1`: Backward reference counts encoded with arithmetic encoder.
+
+Reserved: 8 bits
+
+: SHOULD be `0`.
+
+Alpha bitstream: _Chunk Size_ - `2` bytes
+
+: Encoded alpha bitstream.
+
+This optional chunk contains encoded alpha data for a single tile.
+Either **ALL or NONE** of the tiles must contain this chunk.
+
+The alpha channel can be encoded either losslessly or with lossy
+preprocessing (quantization). After the optional preprocessing, the
+alpha values are encoded with a lossless compression method like
+zlib. Work is in progress to improve the compression gain further by
+exploring alternate compression methods and hence, the bitstream for
+the Alpha-chunk is still experimental and expected to change.
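
For illustration, the two-byte 'ALPH' header could be split as in the sketch below. The bit packing is an assumption: the sketch reads the diagram most-significant-bit first, so F is taken as the high nibble of the first byte and C as the low nibble. Since the text calls the bitstream experimental, treat this as a reading aid rather than a decoder. The names `parse_alph`, `FILTERS` and `COMPRESSION` are hypothetical.

```python
FILTERS = {0: "none", 1: "horizontal", 2: "vertical", 3: "gradient"}
COMPRESSION = {0: "none", 1: "backward references, arithmetic coded"}

def parse_alph(payload):
    """Split an 'ALPH' payload into filtering, compression and bitstream."""
    if len(payload) < 2:
        raise ValueError("'ALPH' payload needs at least 2 header bytes")
    filtering = payload[0] >> 4      # assumed: F is the high nibble
    compression = payload[0] & 0x0F  # assumed: C is the low nibble
    # payload[1] is the reserved byte; the bitstream is everything after it.
    return {
        "filtering": FILTERS.get(filtering, "unknown"),
        "compression": COMPRESSION.get(compression, "unknown"),
        "bitstream": bytes(payload[2:]),
    }

# Hypothetical payload: vertical filter (2), compression method 1, 3 data bytes.
alpha = parse_alph(bytes([0x21, 0x00]) + b"abc")
```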
+
+#### Color profile
+
+     0                   1                   2                   3
+     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+    |                      ChunkHeader('ICCP')                      |
+    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+    |  Compression  |                Color Profile                  |
+    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+
+Compression: 8 bits
+
+: Compression method used:
+
+  * `0`: None.
+  * `1`: Deflate/inflate.
+
+Color Profile: _Chunk Size_ - `1` bytes
+
+: ICC profile.
+
+There SHOULD be at most one 'ICCP' chunk.
+See <http://www.color.org> for specifications.
+
+If this chunk is not present, sRGB SHOULD be assumed.
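
Reading the profile bytes out of an 'ICCP' payload might look like the sketch below. The text only says "Deflate/inflate", so the sketch assumes a zlib-wrapped deflate stream; a raw deflate stream would need different `zlib` parameters. The helper name `read_profile` and the sample bytes are hypothetical. The 'META' chunk uses the same one-byte compression prefix, so the same approach would apply there.

```python
import zlib

def read_profile(payload):
    """Return the ICC profile bytes stored in an 'ICCP' chunk payload."""
    if len(payload) < 1:
        raise ValueError("'ICCP' payload needs a compression byte")
    method, data = payload[0], bytes(payload[1:])
    if method == 0:
        return data                   # stored uncompressed
    if method == 1:
        return zlib.decompress(data)  # assumes a zlib-wrapped deflate stream
    raise ValueError("unknown compression method %d" % method)

# Hypothetical profile bytes, stored once uncompressed and once deflated.
profile = b"example ICC profile bytes"
plain = read_profile(b"\x00" + profile)
inflated = read_profile(b"\x01" + zlib.compress(profile))
```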
+
+#### Metadata
+
+     0                   1                   2                   3
+     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+    |                      ChunkHeader('META')                      |
+    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+    |  Compression  |                 XMP Metadata                  |
+    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+
+Compression: 8 bits
+
+: Compression method used:
+
+  * `0`: None.
+  * `1`: Deflate/inflate.
+
+XMP Metadata: _Chunk Size_ - `1` bytes
+
+: XMP metadata.
+
+There SHOULD be at most one such chunk. If there are more such chunks, readers
+MAY ignore all except the first one.
+
+XMP packets are XML text as specified in the [XMP Specification Part
+1][xmpspec]. The chunk tag is different from the one specified by Adobe
+for WAV and AVI (also RIFF formats), because we have the option of
+compression.
+
+Additional guidance about handling metadata can be found in the
+Metadata Working Group's [Guidelines for Handling Metadata][metadata].
+Note that the sections of the document about reconciliation of EXIF,
+XMP and IPTC-IIM don't apply to WebP. As WebP supports only XMP, no
+reconciliation is necessary.
+
+#### Other Chunks
+
+A file MAY contain other chunks. Readers SHOULD ignore these chunks. Writers
+SHOULD preserve them in their original order.

 ### Assembling the Canvas from Tiles and Animation

-Contents of the chunks will be described in subsequent sections. Here we
-provide an overview of how they are used to assemble the canvas. The
-notation _VP8X.canvasWidth_ means the field in the "VP8X"
-described as _canvasWidth_.
+Here we provide an overview of how 'TILE' chunks and 'FRM '/'LOOP' chunks are
+used to assemble the canvas. The notation _VP8X.field_ means the field in
+the 'VP8X' chunk with the same description.

-Decoding a non-animated canvas **MUST** be equivalent to the following
+Decoding a non-animated canvas MUST be equivalent to the following
 pseudocode:

 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -301,7 +586,7 @@ for chunk in data_for_all_frames:
 canvas contains the decoded canvas.
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

-Decoding an animated canvas **MUST** be equivalent to the following
+Decoding an animated canvas MUST be equivalent to the following
 pseudocode:

 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -338,253 +623,45 @@ for LOOP.loop = 0, ..., LOOP.loopCount-1
 canvas contains the decoded canvas.
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

-As described earlier, if an assert related to chunk ordering fails, the
-reader **MAY** ignore the badly-ordered chunks instead of failing to
-decode the file.
-
-
-### Bitstream Chunks (VP8)
-
-These chunks contain compressed image data. Currently, the only allowed
-bitstream is VP8, using "VP8 " (note the significant fourth-character
-space) as its tag. We will refer to all chunks with this tag as
-**_bitstream chunks_**. As described earlier, images without special
-layout have a single bitstream chunk as the first subchunk of RIFF,
-while images with special layout may contain several of them, one for
-each tile.
-
-The content of a "VP8 " chunk **MUST** be one VP8 key frame (with
-optional padding.  See below).
-
-The current [VP8 Data Format and Decoding Guide][vp8spec] can be found
-at the IETF website, <http://www.ietf.org/>. Note that the VP8 frame
-header contains the VP8 frame width and height. That is assumed to be
-the width and height of the tile.
-
-The VP8 specification describes how to decode the image into Y'CbCr
-format. To convert to RGB, Rec. 601 **SHOULD** be used.
-
-For compatibility with older readers, if the size of the frame is odd,
-writers **SHOULD** append a padding byte (preferably `0`) inside the
-chunk contents, making the chunk's _ckSize_ even. Newer readers
-**MUST** support odd-sized bitstream chunks.
-
-
-### VP8X Chunk (Special Layout)
-
-As described earlier, a chunk with tag "VP8X", is the first chunk of
-images with special layout. It is used to enable advanced features of
-WebP.
-
-The content of the chunk is as follows:
-
-  * **uint32** flags. The following bits are currently used (with `0`
-    being the least significant bit):
-
-    * bit 0: _hasTile_:  Set if the image is represented by Tiles.
-
-    * bit 1: _hasAnimation_:  Set if the file is an animation. Data in
-      "LOOP" and "FRM " chunks should be used to control the animation.
-
-    * bit 2: _hasIccp_:  Set if the file contains an "ICCP" chunk with a
-      color profile. If a file contains an "ICCP" chunk but this bit is
-      not set, the error is flagged while constructing the
-      Mux-Container.
-
-    * bit 3: _hasMetadata_:  Set if the file contains a "META" chunk
-      with a XMP metadata. If a file contains an "META" chunk but this
-      bit is not set, the error is flagged while constructing the
-      Mux-Container.
-
-Future specifications **MAY** define other bits in flags. Bits not
-defined by this specification **MUST** be preserved when modifying the
-file.
-
-  * **uint32** _canvasWidth_:  Width of the canvas in pixels (after the
-    optional rotation or symmetry; see below).
-
-  * **uint32** _canvasHeight_:  Height of the canvas in pixels (after
-    the optional rotation or symmetry; see below).
-
-Future specifications **MAY** add more fields. If a chunk of larger size
-is found, programs **MUST** ignore the extra bytes but **MUST** preserve
-them when modifying the file.
-
-
-### LOOP Chunk (Global Animation Parameters)
-
-For images that are animations, this chunk contains the global
-parameters of the animation.
-
-This chunk **MUST** appear if the _hasAnimation_ flag in chunk VP8X is
-set. If the _hasAnimation_ flag is not set and this chunk is present,
-it **MUST** be ignored.
-
-The content of the chunk is as follows:
-
-  * **uint16** _loopCount_:  For animations, the number of times to loop
-    the animation. `0` means infinitely.
-
-Future specifications **MAY** add more fields. If a chunk of larger
-size is found, programs **MUST** ignore the extra bytes but **MUST**
-preserve them when modifying the file.
-
-
-### FRM Chunk (Per-frame Animation Parameters)
-
-For images that are animations, these chunks contain the per-frame
-parameters of the animation.
-
-The content of the chunk is as follows:
-
-  * **uint32** _frameX_:  X coordinate of the upper left corner of the
-    frame. For images using the VP8 codec, this value **MUST** be
-    divisible by `32`. Other codecs **MAY** specify other constraints.
-    Described in more detail later.
-
-  * **uint32** _frameY_:  Y coordinate of the upper left corner of the
-    frame. For images using the VP8 codec, this value **MUST** be
-    divisible by `32`. Other codecs **MAY** specify other constraints.
-    Described in more detail later.
-
-  * **uint32** _frameWidth_:  Width of the frame. For images using the
-    VP8 codec, this value **MUST** be divisible by `16`, or be such that
-    _frameX + frameWidth == canvasWidth_. Other codecs **MAY** specify
-    other constraints. Described in more detail later.
-
-  * **uint32** _frameHeight_:  Height. For images using the VP8 codec,
-    this value **MUST** be divisible by `16`, or be such that _frameY +
-    frameHeight == canvasHeight_. Other codecs **MAY** specify other
-    constraints. Described in more detail later.
-
-  * **uint16** _frameDuration_:  Time to wait before displaying the next
-    tile, in 1ms units.
-
-**Rationale:**  The requirement for corner coordinates to be divisible
-by `32` means that pixels on U and V planes are aligned to a 16-byte
-boundary (even after a rotation), which may help with vector
-instructions on some architectures. This makes the tiles also align to
-16-pixel macroblock boundaries.
-
-**Rationale:**  The requirement for the width and height to be
-divisible by `16` or touching the edge of the canvas simplifies the
-handling of macroblocks that are on the edge of a tile.  VP8 decoders
-can overwrite pixels outside the boundary in such a macroblock, and this
-guarantees they won't overwrite any data.
-
-Future specifications **MAY** add more fields. If a chunk of larger
-size is found, programs **MUST** ignore the extra bytes but **MUST**
-preserve them when modifying the file.
-
-
-### TILE Chunks (Tile Parameters)
-
-This chunk contains information about a single tile and describes the
-bitstream chunk that follows it.
-
-The contents of such a chunk are as follows:
-
-  * **uint32** _tileCanvasX_:  X coordinate of the upper left corner of
-    the tile. For VP8 tiles, this value **MUST** be divisible by `32`.
-    Other codecs **MAY** specify other constraints.
-
-  * **uint32** _tileCanvasY_:  Y coordinate of the upper left corner of
-    the tile. For VP8 tiles, this value **MUST** be divisible by `32`.
-    Other codecs **MAY** specify other constraints.
-
-Future specifications **MAY** add more fields. If a chunk of larger size
-is found, programs **MUST** ignore the extra bytes but **MUST** preserve
-them when modifying the file.
-
-As described earlier, the TILE chunk is followed by VP8 data. From that
-chunk we can read the height and width of the tile.  These we denote as
-_tileWidth_ and _tileHeight_. In the case of VP8, we have the following
-constraints:
-
-  * The width of a tile **MUST** be divisible by `16`, or _tileCanvasX +
-    tileWidth == canvasWidth_ **MUST** be true.
-
-  * The height of a tile **MUST** be divisible by `16`, or
-    _tileCanvasY + tileHeight == canvasHeight_ **MUST** be true.
-
-
-### ALPH Chunks (Alpha Bitstreams)
-
-This optional chunk contains encoded alpha data for a single tile. Either
-**ALL or NONE** of the tiles must contain this chunk.
-
-The alpha channel can be encoded either losslessly or with lossy preprocessing
-(quantization). After the optional preprocessing, the alpha values are encoded
-with a lossless compression method like zlib. Work is in progress to improve the
-compression gain further by exploring alternate compression methods and hence,
-the bit-stream for the Alpha-chunk is still experimental and expected to change.
-
-The contents of such a chunk are as follows:
-
-  * Byte 0 lower nibble: The _compression method_ used. Currently two methods
-    are supported:
-
-    * 0 --> No compression
-
-    * 1 --> Backward reference counts encoded with arithmetic encoder.
-
-  * Byte 0 upper nibble: The _filtering method_ used. Currently the following
-    methods are supported:
-
-    * 0 --> No filter
-
-    * 1 --> Horizontal filter
-
-    * 2 --> Vertical filter
-
-    * 3 --> Gradient filter
-
-  * Byte 1:  _Reserved_. **Should** be 0.
-
-  * Byte 2 onwards:  _Encoded alpha bitstream_.
-
-
-### ICCP Chunk (Color Profile)
-
-An optional "ICCP" chunk contains an ICC profile. There **SHOULD** be
-at most one such chunk. The first byte of the chunk is the compression
-type. Two values are currently defined:  a value of `0` means no
-compression, while a value of `1` means deflate/inflate compression. It
-is followed by a compressed or non-compressed ICC profile.  See
-<http://www.color.org> for specifications.
-
-The color profile can be a v2 or v4 profile. If this chunk is missing,
-sRGB **SHOULD** be assumed.
-
-
-### META Chunk (Compressed XMP Metadata)
-
-Such a chunk (if present) contains XMP metadata. There **SHOULD** be at
-most one such chunk. If there are more such chunks, readers **SHOULD**
-ignore all except the first one. The first byte specifies compression
-type. Two values are currently defined:  a value of `0` means no
-compression, while a value of `1` means deflate/inflate compression. It
-is followed by a compressed or non-compressed XMP metadata packet.
-
-XMP packets are XML text as specified in the [XMP Specification Part
-1][xmpspec]. The chunk tag is different from the one specified by Adobe
-for WAV and AVI (also RIFF formats), because we have the option of
-compression.
-
-Additional guidance about handling metadata can be found in the
-Metadata Working Group's [Guidelines for Handling Metadata][metadata].
-Note that the sections of the document about reconciliation of EXIF,
-XMP and IPTC-IIM don't apply to WebP.  As WebP supports only XMP, no
-reconciliation is necessary.
-
-
-### Other Chunks
-
-A file **MAY** contain other chunks, defined in some future
-specification. Such chunks **MUST** be ignored, but preserved. Writers
-**SHOULD** try to preserve them in their original order.
-
+As described earlier, if an assertion related to chunk ordering fails, a reader
+MAY ignore the badly ordered chunks instead of failing to decode the file.
+
+Example file layouts
+--------------------
+
+A non-animated, tiled image without transparency may look as follows:
+
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+RIFF/WEBP
+- VP8X (descriptions of features used)
+- ICCP (color profile)
+- TILE (first tile parameters)
+- VP8 (bitstream - first tile)
+- TILE (second tile parameters)
+- VP8 (bitstream - second tile)
+- TILE (third tile parameters)
+- VP8 (bitstream - third tile)
+- TILE (fourth tile parameters)
+- VP8 (bitstream - fourth tile)
+- META (XMP metadata)
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+An animated image with transparency may look as follows:
+
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+RIFF/WEBP
+- VP8X (descriptions of features used)
+- LOOP (animation control parameters)
+- FRM (first animation frame parameters)
+- ALPH (alpha bitstream - first image frame)
+- VP8 (bitstream - first image frame)
+- FRM (second animation frame parameters)
+- ALPH (alpha bitstream - second image frame)
+- VP8 (bitstream - second image frame)
+- META (XMP metadata)
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

 [vp8spec]:  http://tools.ietf.org/html/rfc6386
 [xmpspec]:  http://www.adobe.com/content/dam/Adobe/en/devnet/xmp/pdfs/XMPSpecificationPart1.pdf
 [metadata]: http://www.metadataworkinggroup.org/pdf/mwg_guidance.pdf
+[rfc 2119]: http://tools.ietf.org/html/rfc2119
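
Reviewer note: the example layouts added by this patch can be checked against a real file by listing its top-level chunks. The following is a minimal sketch, not part of the specification. It assumes only the generic RIFF rules (a four-character tag, a little-endian 32-bit payload size, and one padding byte after odd-sized payloads). The helper names `list_chunks`, `make_chunk`, and `make_webp` are hypothetical, and the payloads in the usage example are zero-filled placeholders rather than valid chunk contents.

```python
import struct


def list_chunks(data):
    """Return (tag, payload_size) for each top-level chunk of a RIFF/WEBP buffer."""
    if len(data) < 12 or data[0:4] != b"RIFF" or data[8:12] != b"WEBP":
        raise ValueError("not a RIFF/WEBP container")
    (riff_size,) = struct.unpack_from("<I", data, 4)
    # The RIFF size counts everything after the first 8 bytes.
    end = min(len(data), 8 + riff_size)
    chunks = []
    pos = 12
    while pos + 8 <= end:
        tag = data[pos:pos + 4].decode("ascii", errors="replace")
        (size,) = struct.unpack_from("<I", data, pos + 4)
        chunks.append((tag, size))
        # Skip header, payload, and the padding byte after an odd-sized payload.
        pos += 8 + size + (size & 1)
    return chunks


def make_chunk(tag, payload):
    """Build one RIFF chunk (hypothetical helper, used only for the demo)."""
    pad = b"\x00" if len(payload) & 1 else b""
    return tag + struct.pack("<I", len(payload)) + payload + pad


def make_webp(chunks):
    """Wrap already-built chunks in a RIFF/WEBP header (hypothetical helper)."""
    body = b"WEBP" + b"".join(chunks)
    return b"RIFF" + struct.pack("<I", len(body)) + body


if __name__ == "__main__":
    # Placeholder payloads: sizes are arbitrary and contents are not valid data.
    demo = make_webp([
        make_chunk(b"VP8X", bytes(8)),
        make_chunk(b"ICCP", bytes(3)),
        make_chunk(b"TILE", bytes(8)),
        make_chunk(b"VP8 ", bytes(5)),
        make_chunk(b"META", bytes(1)),
    ])
    for tag, size in list_chunks(demo):
        print("- %s (%d bytes)" % (tag, size))
```

Note that the three-letter chunk name VP8 is stored as the four-byte tag `VP8 ` with a trailing space. The sketch stops at the first truncated chunk header instead of raising, which is a design choice for inspection tooling and not a requirement taken from the text.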