libwebp/doc/webp-container-spec.txt

<!--

Although you may be viewing an alternate representation, this document
is sourced in Markdown, a light-duty markup scheme, and is optimized for
the [kramdown](http://kramdown.rubyforge.org/) transformer.

See the accompanying README. External link targets are referenced at the
end of this file.

-->


WebP Container Specification
============================

_Working Draft, v0.4, 20120613_


* TOC placeholder
{:toc}


Introduction
------------

WebP is an image format that uses either (i) the VP8 key frame encoding
to compress image data in a lossy way, or (ii) the WebP lossless encoding
(and possibly other encodings in the future). These encoding schemes should
make it more efficient than currently used formats. It is optimized for fast
image transfer over the network (e.g., for websites). However, it also aims
for feature parity (color profile, metadata, animation, etc.) with other
formats. This document describes the structure of a WebP file.

The WebP container (i.e., RIFF container for WebP) allows feature support over
and above the basic use case of WebP (i.e., a file containing a single image
encoded as a VP8 key frame). The WebP container provides additional support
for:

  * **Lossless compression.** An image can be losslessly compressed, using the
    WebP Lossless Format.

  * **Transparency.** An image may have transparency, i.e., an alpha channel
    for each frame/tile.

  * **Metadata.** An image can have metadata stored in any of the popular
    metadata formats.

  * **Color profiles.** An image can have an ICC profile characterizing a color
    input or output device.

  * **Animation.** An image may have pauses between frames, making it
    an animation.

  * **Tiling.** A single VP8 frame has an inherent limitation for width
    or height of 2^14 pixels, and a 512 KiB limit on the size of the first
    compressed partition. To support larger images, we support images
    that are composed of multiple tiles, each encoded as a separate VP8
    frame. All tiles form logically a single image:  they have common
    metadata, color profile, etc. Tiling may also improve efficiency for
    larger images, e.g., grass can be encoded differently than sky.

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in [RFC 2119][].


Terminology &amp; Basics
------------------------

A WebP file contains either a still image (i.e., an encoded matrix of
pixels) or an animation (see below), with possibly a color profile,
metadata, etc. In case we need to refer only to the matrix of pixels,
we will call it the _canvas_ of the image.

The canvas of an image is built from one or multiple tiles. Each tile
is a separately encoded VP8 key frame or a WebP lossless bitstream. Building
an image from several tiles allows the format to overcome the size limitations
of a single VP8 frame / WebP lossless bitstream. Tiles are an internal detail
of the file:  they are not supposed to be exposed to the user.

Below are additional terms used throughout this document:

Code that reads WebP files is referred to as a _reader_, while
code that writes them is referred to as a _writer_.

_uint16_

: A 16-bit, little-endian, unsigned integer.

_uint24_

: A 24-bit, little-endian, unsigned integer.

_uint32_

: A 32-bit, little-endian, unsigned integer.

_1-based_
: An unsigned integer field storing values offset by `-1`. e.g., Such a field
would store value _25_ as _24_.

The basic element of a RIFF file is a _chunk_. It consists of:

     0                   1                   2                   3
     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                         Chunk FourCC                          |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                          Chunk Size                           |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                         Chunk Payload                         |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Chunk FourCC: 32 bits

: ASCII four character code or _chunk tag_ used for chunk identification.

Chunk Size: 32 bits (_uint32_)

: The size of the chunk (_ckSize_) not including this field, the chunk
  identifier and padding.

Chunk Payload: _Chunk Size_ bytes

: The data payload. If _Chunk Size_ is odd a single padding byte that
  SHOULD be `0` is added.

_ChunkHeader('ABCD')_

: This is used to describe the fourcc and size header of individual
  chunks, where 'ABCD' is the fourcc for the chunk. This element's
  size is 8 bytes.

: Note that, in this specification, all chunk tag characters are in
  file order, not in byte order of a uint32 of any particular
  architecture.

_list of chunks_

: A concatenation of multiple chunks.

: We will refer to the first chunk as having _position_ 0, the second
  as position 1, etc. By _chunk with index 0 among "ABCD"_ we mean
  the first chunk among the chunks of type "ABCD" in the list, the
  _chunk with index 1 among "ABCD"_ is the second such chunk, etc.

A WebP file MUST begin with a single chunk with a tag 'RIFF'. All
other defined chunks are contained within this chunk. The file SHOULD
NOT contain anything after it.

The maximum size of RIFF's _ckSize_ is 2^32 minus 10 bytes. The size
of the whole file is at most 4GiB minus 2 bytes.

**Note:** some RIFF libraries are said to have bugs when handling files
larger than 1GiB or 2GiB. If you are using an existing library, check
that it handles large files correctly.

The first four bytes of the RIFF chunk contents (i.e., bytes 8-11 of the file)
MUST be the ASCII string "WEBP". They are followed by a list of chunks. As the
size of any chunk is even, the size of the RIFF chunk is also even.  The
contents of the chunks in that list will be described in the following sections.

**Note:** RIFF has a convention that all-uppercase chunks are standard
chunks that apply to any RIFF file format, while chunks specific to a
file format are all lowercase. WebP does not follow this convention.


WebP file header
----------------

     0                   1                   2                   3
     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |      'R'      |      'I'      |      'F'      |      'F'      |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                           File Size                           |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |      'W'      |      'E'      |      'B'      |      'P'      |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

'RIFF': 32 bits

: The ASCII characters 'R' 'I' 'F' 'F'.

File Size: 32 bits (_uint32_)

: The size of the file in bytes starting at offset 8.

'WEBP': 32 bits

: The ASCII characters 'W' 'E' 'B' 'P'.

Simple file format (lossy)
--------------------------

This layout SHOULD be used if the image requires _lossy_ encoding and does not
require transparency or other advanced features provided by the extended format.
Files with this layout are smaller and supported by older software.

Simple WebP (lossy) file header:

     0                   1                   2                   3
     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                    WebP file header (12 bytes)                |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                          VP8 chunk                            |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

VP8 chunk:

     0                   1                   2                   3
     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                      ChunkHeader('VP8 ')                      |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                           VP8 data                            |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

VP8 data: _Chunk Size_ bytes

: VP8 bitstream data.

The VP8 bitstream format specification can be found at [VP8 Data Format and
Decoding Guide][vp8spec]. Note that the VP8 frame header contains the VP8 frame
width and height. That is assumed to be the width and height of the canvas.

The VP8 specification describes how to decode the image into Y'CbCr
format. To convert to RGB, Rec. 601 SHOULD be used.

Simple file format (lossless)
-----------------------------

**Note:** Older readers may not support files using the lossless format.

This layout SHOULD be used if the image requires _lossless_ encoding (with an
optional transparency channel) and does not require advanced features provided
by the extended format.

Simple WebP (lossless) file header:

     0                   1                   2                   3
     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                    WebP file header (12 bytes)                |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                          VP8L chunk                           |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

VP8L chunk:

     0                   1                   2                   3
     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                      ChunkHeader('VP8L')                      |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                           VP8L data                           |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

VP8L data: _Chunk Size_ bytes

: VP8L bitstream data.

The current specification of the VP8L bitstream can be found at
[WebP Lossless Bitstream Format][webpllspec]. Note that the VP8L header
contains the VP8L image width and height. That is assumed to be the width
and height of the canvas.

Extended file format
--------------------

**Note:** Older readers may not support files using the extended format.

An extended format file consists of:

  * A 'VP8X' chunk with information about features used in the file.

  * An optional 'ICCP' chunk with color profile.

  * Optionally, some other unknown chunk types that may be defined by future
    specifications.

  * An optional 'LOOP' chunk with animation control data.

  * Data for all the frames.

  * An optional 'META' chunk with metadata.

A file MUST contain at least one frame. As will be described in the 'VP8X'
chunk description, by checking a flag one can distinguish animated and
non-animated images. A non-animated image has exactly one frame. An animated
one may have multiple frames. Data for each frame consists of:

  * An optional 'FRM ' (fourth character is a significant space) chunk
    with animation frame metadata. It MUST be present in animated
    images at the beginning of data for that frame. It MUST NOT be
    present in non-animated images.

  * An optional 'TILE' chunk with tile position metadata. It MUST be
    present at the beginning of each tile for a frame that is represented as
    multiple tile images.

  * An optional 'ALPH' chunk with alpha bitstream of the frame/tile.

  * A 'VP8 ' or a 'VP8L' chunk containing compressed image data of the
    frame/tile.

  * An optional unknown chunk type that may be defined by future
    specifications.

All chunks SHOULD be placed in the same order as listed above. If a chunk
appears in the wrong place, the file is invalid, but readers MAY parse the
file, ignoring the chunks that come too late.

**Rationale:** Setting the order of chunks should allow quicker file
parsing. For example, if an ICCP chunk does not appear in its required
position, a decoder can choose to stop searching for it.  The rule of
ignoring late chunks should make programs that need to do a full search
give the same results as the ones stopping early.

Extended WebP file header:

     0                   1                   2                   3
     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                   WebP file header (12 bytes)                 |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                      ChunkHeader('VP8X')                      |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |  R  |L|M|I|A|T|                   Reserved                    |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |          Canvas Width Minus One               |             ...
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    ...  Canvas Height Minus One    |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Tiling (T): 1 bit

: Set if the image is represented by tiles.

Animation (A): 1 bit

: Set if the file is an animation. Data in 'LOOP' and 'FRM ' chunks
  should be used to control the animation.

ICC profile (I): 1 bit

: Set if the file contains an 'ICCP' chunk.

Metadata (M): 1 bit

: Set if the file contains a 'META' chunk.

Alpha (L): 1 bit

: Set if the file contains some (or all) images with transparency information
("alpha").

Rotation and Symmetry (R): 3 bits

: Specify an isometry to be applied to every bitstream chunk decoded.

The table below specifies into what coordinates a point (x,y) in the original
coordinate system has to be transformed into:

| Value |         Name               |     New coordinates                    |
|:------|:--------------------------:|:--------------------------------------: |
|   0   | Identify                   |     (x,y)                               |
|-------
|   1   | Horizontal symmetry        |     (x, CanvasHeight-1-y)               |
|-------
|   2   | Vertical symmetry          |     (CanvasWidth-1-x, y)                |
|-------
|   3   | Rotation 180 degrees       |     (CanvasWidth-1-x, CanvasHeight-1-y) |
|-------
|   4   | Diagonal symmetry 1        |     (y, x)                              |
|-------
|   5   | Rotation clockwise         |     (CanvasHeight-1-y, x)               |
|-------
|   6   | Rotation counter-clockwise |     (y, CanvasWidth-1-x)                |
|-------
|   7   | Diagonal symmetry 2        |     (CanvasHeight-1-y, CanvasWidth-1-x) |
|-------
{: rules="groups"}

Reserved: 24 bits

: SHOULD be `0`.

Canvas Width Minus One: 24 bits

: _1-based_ width of the canvas in pixels.
  The actual canvas width is '1 + Canvas Width Minus One'

Canvas Height Minus One: 24 bits

: _1-based_ height of the canvas in pixels.
  The actual canvas height is '1 + Canvas Height Minus One'

The product of _Canvas Width_ and _Canvas Height_ MUST be at most `2^32 - 1`.

Future specifications MAY add more fields.

### Chunks

#### Animation

Loop Chunk:

     0                   1                   2                   3
     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                      ChunkHeader('LOOP')                      |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |          Loop Count           |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Loop Count: 16 bits (_uint16_)

: The number of times to loop the animation. `0` means infinitely.

For images that are animations, this chunk contains the global
parameters of the animation.

This chunk MUST appear if the _Animation_ flag in chunk VP8X is set.
If the _Animation_ flag is not set and this chunk is present, it
SHOULD be ignored.

Frame chunk:

     0                   1                   2                   3
     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                      ChunkHeader('FRM ')                      |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                        Frame X                |             ...
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    ...          Frame Y            |   Frame Width Minus One     ...
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    ...             |           Frame Height Minus One              |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |            Frame Duration Minus One           |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Frame X: 24 bits (_uint24_)

: The X coordinate of the upper left corner of the frame is `Frame X * 2`

Frame Y: 24 bits (_uint24_)

: The Y coordinate of the upper left corner of the frame is `Frame Y * 2`

Frame Width Minus One: 24 bits (_uint24_)

: The _1-based_ width of the frame.
  The frame width is '1 + Frame Width Minus One'

Frame Height Minus One: 24 bits (_uint24_)

: The _1-based_ height of the frame.
  The frame height is '1 + Frame Height Minus One'

Frame Duration Minus One: 24 bits (_uint24_)

: The _1-based_ time to wait before displaying the next frame, in 1 millisecond
  units. The actual duration is '1 + Frame Duration Minus One' milliseconds.

For images that are animations, this chunk contains information about a single
frame, and describes the optional alpha chunk and the bitstream chunk that
follow it. If the _Animation flag_ is not set and this chunk is present,
it SHOULD be ignored.


#### Tiling

     0                   1                   2                   3
     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                      ChunkHeader('TILE')                      |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                      Tile X                   |             ...
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    ...           Tile Y            |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Tile X: 24 bits (_uint24_)

: The X coordinate of the upper left corner of the tile is `Tile X * 2`

Tile Y: 24 bits (_uint24_)

: The Y coordinate of the upper left corner of the tile is `Tile Y * 2`

For images that contain tiling, this chunk contains information about a single
tile and describes the optional alpha chunk and the bitstream chunk that follow
it. If the _Tile flag_ is not set and this chunk is present, it SHOULD be
ignored.


#### Alpha

     0                   1                   2                   3
     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                      ChunkHeader('ALPH')                      |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |Rsv| P | F | C |     Alpha Bitstream...                        |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Compression method (C): 2 bits

: The compression method used:

  * `0`: No compression.
  * `1`: Compressed using the WebP lossless format.

Filtering method (F): 2 bits

: The filtering method used:

  * `0`: None.
  * `1`: Horizontal filter.
  * `2`: Vertical filter.
  * `3`: Gradient filter.

For each pixel, filtering is performed using the following calculations.
Assume the alpha values surrounding the current `X` position are labeled as:

     C | B |
    ---+---+
     A | X |

We seek to compute the alpha value at position `X`. First, a prediction is
made depending on the filtering method:

  * Method `0`: predictor = 0
  * Method `1`: predictor = A
  * Method `2`: predictor = B
  * Method `3`: predictor = clip(A + B - C)

where `clip(v)` is equal to:

  * 0    if v < 0
  * 255  if v > 255
  * v    otherwise

The final value is derived by adding the decompressed value `X` to the
predictor and using modulo-256 arithmetic to wrap the [256-511] range
into the [0-255] one:

`alpha = (predictor + X) % 256`

There are special cases for left-most and top-most pixel positions:

  * Top-left value at location (0,0) uses 0 as predictor value. Otherwise,
  * For horizontal or gradient filtering methods, the left-most pixels at
    location (0, y) are predicted using the location (0, y-1) just above.
  * For vertical or gradient filtering methods, the top-most pixels at
    location (x, 0) are predicted using the location (x-1, 0) on the left.


Pre-processing (P): 2 bits

: These INFORMATIVE bits are used to signal the pre-processing that has
been performed during compression. The decoder can use this information to
e.g. dither the values or smooth the gradients prior to display.

  * `0`: no pre-processing
  * `1`: level reduction

Decoders are not required to use this information in any specified way.

Reserved (Rsv): 2 bits

: SHOULD be `0`.

Alpha bitstream: _Chunk Size_ - `1` bytes

: Encoded alpha bitstream.

This optional chunk contains encoded alpha data for a single frame/tile.
For images with transparency, some of the frames/tiles may contain this chunk.
However, a frame/tile containing a 'VP8L' chunk SHOULD NOT contain this chunk.
**Rationale**: the transparency information of a frame/tile is already part of
the 'VP8L' chunk.

The alpha channel data is stored as uncompressed raw data (when
compression method is '0') or compressed using the lossless format
(when the compression method is '1').

  * Raw data: consists of a byte sequence of length width * height,
    containing all the 8-bit transparency values in scan order.

  * Lossless format compression: the byte sequence is a compressed
    image-stream (as described in the [WebP Lossless Bitstream Format]
    [webpllspec]) of implicit dimension width x height. That is, this
    image-stream does NOT contain any headers describing the image dimension.

    **Rationale**: the dimension is already known from other sources,
    so storing it again would be redundant and error-prone.

    Once the image-stream is decoded into ARGB color values, following
    the process described in the lossless format specification, the
    transparency information must be extracted from the *green* channel
    of the ARGB quadruplet.

    **Rationale**: the green channel is allowed extra transformation
    steps in the specification -- unlike the other channels -- that can
    improve compression.

#### Bitstream (VP8/VP8L)

This chunk contains compressed image data. As described earlier, images
with a simple file format (lossy/lossless) have a single bitstream chunk
as the first subchunk of RIFF, while images with extended file format may
contain several of them, one for each frame/tile.

A bitstream chunk may be either (i) a VP8 chunk, using "VP8 " (note the
significant fourth-character space) as its tag _or_ (ii) a VP8L chunk , using
"VP8L" as its tag.

The formats of VP8 and VP8L chunks are as described in sections
[Simple file format (lossy)](#simple-file-format-lossy)
and [Simple file format (lossless)](#simple-file-format-lossless) respectively.

#### Color profile

     0                   1                   2                   3
     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                      ChunkHeader('ICCP')                      |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                       Color Profile                           |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Color Profile: _Chunk Size_ bytes

: ICC profile.

This chunk MUST appear before data for all the frames.

There SHOULD be at most one such chunk. If there are more such chunks, readers
MAY ignore all except the first one.
See <http://www.color.org> for specifications.

If this chunk is not present, sRGB SHOULD be assumed.

#### Metadata

     0                   1                   2                   3
     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                      ChunkHeader('META')                      |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                             Metadata                          |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Metadata: _Chunk Size_ bytes

: image metadata.

This chunk MUST appear after data for all the frames.

There SHOULD be at most one such chunk. If there are more such chunks, readers
MAY ignore all except the first one.

Additional guidance about handling metadata can be found in the
Metadata Working Group's [Guidelines for Handling Metadata][metadata].

#### Unknown Chunks

A file MAY contain other unknown chunks. Readers SHOULD be ignore these chunks.
Writers SHOULD preserve them in their original order.

### Assembling the Canvas from Tiles and Animation

Here we provide an overview of how 'TILE' chunks and 'FRM '/'LOOP' chunks are
used to assemble the canvas. The notation _VP8X.field_ means the field in
the 'VP8X' chunk with the same description.

Decoding a _non-animated_ canvas MUST be equivalent to the following
pseudocode:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
assert not VP8X.flags.haveAnimation
canvas ← new black image of size
         VP8X.canvasWidth x VP8X.canvasHeight.
tile_params.tileX = tile_params.tileY = 0
for chunk in data_for_all_frames:
    if chunk.tag == "TILE":
        assert No other TILE chunk after the last Bitstream chunk
        assert No ALPH chunk after the last Bitstream chunk
        tile_params = chunk
        assert VP8X.canvasWidth >=
            tile_params.tileX + tile_params.tileWidth
        assert VP8X.canvasHeight >=
            tile_params.tileY + tile_params.tileHeight
    if chunk.tag == "ALPH":
        assert No other ALPH chunk after the last Bitstream chunk
        tile_params.alpha = alpha_data
    if chunk.tag == "VP8 " OR chunk.tag == "VP8L":
        render image in chunk on canvas with top-left corner in
            (tile_params.tileX, tile_params.tileY).
    Ignore unknown chunks
canvas contains the decoded canvas.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Decoding an _animated_ canvas MUST be equivalent to the following
pseudocode:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
assert VP8X.flags.haveAnimation
canvas ← new black image of size
         VP8X.canvasWidth x VP8X.canvasHeight.
loop_count ← LOOP.loopCount
if loop_count == 0:
    loop_count = ∞
frame_params ← nil
for LOOP.loop = 0, ..., LOOP.loopCount-1
    assert First chunk in data_for_all_frames is FRM
    for chunk in data_for_all_frames:
        if chunk.tag == "FRM ":
            assert No other FRM chunk after the last Bitstream chunk
            assert No ALPH chunk after the last Bitstream chunk
            frame_params = chunk
            assert VP8X.canvasWidth >=
                frame_params.frameX + frame_params.frameWidth
            assert VP8X.canvasHeight >=
                frame_params.frameY + frame_params.frameHeight
        if chunk.tag == "ALPH":
            assert No other ALPH chunk after the last Bitstream chunk
            frame_params.alpha = alpha_data
        if chunk.tag == "VP8 " OR chunk.tag == "VP8L":
            render image in chunk on canvas with top-left corner in
                (frame_params.frameX, frame_params.frameY). Show the contents
                of the image for frame_params.frameDuration * 1ms.
        Ignore unknown chunks
canvas contains the decoded canvas.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

As described earlier, if an assert related to chunk ordering fails, the reader
MAY ignore the badly-ordered chunks instead of failing to decode the file.

Example file layouts
--------------------

A tiled image without transparency may look as follows:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
RIFF/WEBP
+- VP8X (descriptions of features used)
+- ICCP (color profile)
+- TILE (First tile parameters)
+- VP8 (bitstream - first tile)
+- TILE (Second tile parameters)
+- VP8 (bitstream - second tile)
+- TILE (third tile parameters)
+- VP8 (bitstream - third tile)
+- TILE (fourth tile parameters)
+- VP8 (bitstream - fourth tile)
+- META (metadata)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

An animated image with transparency may look as follows:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
RIFF/WEBP
+- VP8X (descriptions of features used)
+- LOOP (animation control parameters)
+- FRM (first animation frame parameters)
+- ALPH (alpha bitstream - first frame)
+- VP8 (bitstream - first frame)
+- FRM (second animation frame parameters)
+- ALPH (alpha bitstream - second frame)
+- VP8 (bitstream - second frame)
+- META (metadata)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

A losslessly encoded non-animated non-tiled image may
look as follows:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
RIFF/WEBP
+- VP8X (descriptions of features used)
+- ICCP (color profile)
+- XYZW (unknown chunk)
+- VP8L (lossless bitstream)
+- META (metadata)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

An animated image may have a mix of lossy and lossless
bitstreams as follows:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
RIFF/WEBP
+- VP8X (descriptions of features used)
+- FRM (first animation frame parameters)
+- VP8 (lossy bitstream - first frame)
+- FRM (second animation frame parameters)
+- VP8L (lossless bitstream - second frame)
+- ABCD (unknown chunk)
+- FRM (third animation frame parameters)
+- VP8 (lossy bitstream - third frame)
+- EFGH (unknown chunk)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

[vp8spec]:  http://tools.ietf.org/html/rfc6386
[webpllspec]: https://gerrit.chromium.org/gerrit/gitweb?p=webm/libwebp.git;a=blob;f=doc/webp-lossless-bitstream-spec.txt;hb=master
[metadata]: http://www.metadataworkinggroup.org/pdf/mwg_guidance.pdf
[rfc 2119]: http://tools.ietf.org/html/rfc2119