diff --git a/doc/webp-container-spec.txt b/doc/webp-container-spec.txt index aac6743c..9be9930f 100644 --- a/doc/webp-container-spec.txt +++ b/doc/webp-container-spec.txt @@ -13,7 +13,7 @@ end of this file. WebP Container Specification ============================ -_Working Draft, v0.1, 20111004_ +_Working Draft, v0.2, 20120207_ * TOC placeholder @@ -27,13 +27,13 @@ WebP is a still image format that uses the VP8 key frame encoding, and possibly other encodings in the future, to compress image data in a lossy way. The VP8 encoding should make it more efficient than currently used formats. It is optimized for fast image transfer over the network -(e.g., for websites). However, it also aims for feature parity (like -Color Profile, XMP Metadata, Animation, etc.) with other formats. This +(e.g., for websites). However, it also aims for feature parity +(color profile, XMP metadata, animation, etc.) with other formats. This document describes the structure of a WebP file. -The first version of WebP handled only the basic use case: a file +The first version of WebP handled only the basic use case: a file containing a single image (being one VP8 key frame), with no metadata. -The use of a RIFF container permits additional feature support. This +The use of a RIFF container permits additional feature support. This document describes additional support for: * **Metadata and color profiles.** We specify chunks that can contain @@ -57,6 +57,10 @@ Files not using these new features are backward compatible with the original format. Use of these features will produce files that are not compatible with older programs. +The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", +"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this +document are to be interpreted as described in [RFC 2119][]. 
+ Terminology & Basics ------------------------ @@ -64,7 +68,7 @@ Terminology & Basics A WebP file contains either a still image (i.e., an encoded matrix of pixels) or an animation (see below), with possibly a color profile, metadata, etc. In case we need to refer only to the matrix of pixels, -we will call it the **_canvas_** of the image. +we will call it the _canvas_ of the image. The canvas of an image is built from one or multiple tiles. Each tile is a separately encoded VP8 key frame (other encodings are possible in @@ -74,210 +78,491 @@ of the file: they are not supposed to be exposed to the user. Below are additional terms used throughout this document: -Code that reads WebP files is referred to as a **_reader_**, while -code that writes them is referred to as a **_writer_**. +Code that reads WebP files is referred to as a _reader_, while +code that writes them is referred to as a _writer_. -A 16-bit, little-endian, unsigned integer will be denoted as -**_uint16_**. +_uint16_ -A 32-bit, little-endian, unsigned integer will be denoted as -**_uint32_**. +: A 16-bit, little-endian, unsigned integer. -The basic element of a RIFF file is a **_chunk_**. It consists of: +_uint32_ - * 4 ASCII characters that will be called the **_chunk tag_**. +: A 32-bit, little-endian, unsigned integer. - * uint32 with the size of the chunk content (that will be denoted as - **_ckSize_**). +The basic element of a RIFF file is a _chunk_. It consists of: - * _ckSize_ bytes of content. + 0 1 2 3 + 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + | Chunk FourCC | + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + | Chunk Size | + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + | Chunk Payload | + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ - * If _ckSize_ is odd, a single padding byte that **SHOULD** be `0`. 
+Chunk FourCC: 32 bits -A chunk with a tag "ABCD" will be also called a **_chunk of type_** -"ABCD". Note that, in this specification, all chunk tag characters are -in file order, not in byte order of a uint32 of any particular -architecture. +: ASCII four character code or _chunk tag_ used for chunk identification. -Note that the padding **MUST** be added to the last chunk of the file. +Chunk Size: 32 bits (_uint32_) -A **_list of chunks_** is a concatenation of multiple chunks. We will -refer to the first chunk as having _position_ 0, the second as position -1, etc. By _chunk with index 0 among "ABCD"_ we mean the first chunk -among the chunks of type "ABCD" in the list, the _chunk with index 1 -among "ABCD"_ is the second such chunk, etc. +: The size of the chunk (_ckSize_) not including this field, the chunk + identifier and padding. -A WebP file **MUST** begin with a single chunk with a tag "RIFF". All -other defined chunks are contained within this chunk. The file **SHOULD -NOT** contain anything after it. +Chunk Payload: _Chunk Size_ bytes -The maximum size of RIFF's _ckSize_ is 2^32 minus 10 bytes. The size +: The data payload. If _Chunk Size_ is odd a single padding byte that + SHOULD be `0` is added. + +_ChunkHeader('ABCD')_ + +: This is used to describe the fourcc and size header of individual + chunks, where 'ABCD' is the fourcc for the chunk. This element's + size is 8 bytes. + +_chunk of type_ + +: A chunk with a tag "ABCD". + +: Note that, in this specification, all chunk tag characters are in + file order, not in byte order of a uint32 of any particular + architecture. + +_list of chunks_ + +: A concatenation of multiple chunks. + +: We will refer to the first chunk as having _position_ 0, the second + as position 1, etc. By _chunk with index 0 among "ABCD"_ we mean + the first chunk among the chunks of type "ABCD" in the list, the + _chunk with index 1 among "ABCD"_ is the second such chunk, etc. 
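The chunk layout defined above (FourCC, little-endian size, payload, optional padding byte) can be sketched as a small iterator. This is only an illustration under stated assumptions: the function name `iter_chunks` is hypothetical and not part of the specification, and the input is assumed to be an in-memory byte string holding a list of chunks.

```python
import struct
from typing import Iterator, Tuple


def iter_chunks(data: bytes, offset: int = 0) -> Iterator[Tuple[bytes, bytes]]:
    """Yield (fourcc, payload) pairs from a list of chunks (hypothetical helper)."""
    while offset < len(data):
        if offset + 8 > len(data):
            raise ValueError("truncated chunk header")
        # The chunk tag is read in file order, as four raw bytes.
        fourcc = data[offset:offset + 4]
        # Chunk Size is a little-endian uint32; it excludes the 8-byte
        # header and the padding byte.
        (size,) = struct.unpack_from("<I", data, offset + 4)
        start = offset + 8
        end = start + size
        if end > len(data):
            raise ValueError("truncated chunk payload")
        yield fourcc, data[start:end]
        # An odd-sized payload is followed by a single padding byte.
        offset = end + (size & 1)
```

Keeping the padding logic in one place means callers only ever see the unpadded payload, which matches the definition of _Chunk Size_ given above.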
+ +A WebP file MUST begin with a single chunk with a tag 'RIFF'. All +other defined chunks are contained within this chunk. The file SHOULD +NOT contain anything after it. + +The maximum size of RIFF's _ckSize_ is 2^32 minus 10 bytes. The size of the whole file is at most 4GiB minus 2 bytes. **Note:** some RIFF libraries are said to have bugs when handling files larger than 1GiB or 2GiB. If you are using an existing library, check that it handles large files correctly. -The first four bytes of the RIFF chunk contents (i.e., bytes 8-11 of the -file) **MUST** be the ASCII string "WEBP". They are followed by a list -of chunks. Note that as the size of any chunk is even, the size of the -RIFF chunk is also even. - -The contents of the chunks in that list will be described in the -following sections. +The first four bytes of the RIFF chunk contents (i.e., bytes 8-11 of the file) +MUST be the ASCII string "WEBP". They are followed by a list of chunks. As the +size of any chunk is even, the size of the RIFF chunk is also even. The +contents of the chunks in that list will be described in the following sections. **Note:** RIFF has a convention that all-uppercase chunks are standard chunks that apply to any RIFF file format, while chunks specific to a -file format are all-lowercase. WebP doesn't follow this convention. +file format are all lowercase. WebP does not follow this convention. -Single-image WebP Files ------------------------ +WebP file header +---------------- -First, we will describe a subset of WebP files: files containing only -one image. Later, we will define multi-image files, which contain -several images. 
+ 0 1 2 3 + 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + | 'R' | 'I' | 'F' | 'F' | + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + | File Size | + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + | 'W' | 'E' | 'B' | 'P' | + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +'RIFF': 32 bits -### Chunks Layout +: The ASCII characters 'R' 'I' 'F' 'F'. -This section describes which chunks may appear in a single-image WebP -file, and their order. The contents of these chunks will be described -in subsequent sections. +File Size: 32 bits (_uint32_) -The first chunk inside the RIFF chunk **MUST** have a tag of "VP8 " -(note that the fourth character is a space, and is significant) or -"VP8X". Other tags for the first chunk **MAY** be introduced by future -specifications if new encodings are added. This tag of the first chunk -determines which of the two possible layouts is used. +: The size of the file in bytes starting at offset 8. -**Rationale:** We fix the possible tags of the first chunk so that it -is possible to introduce other codecs, to keep the "WEBP" signature at -the beginning of the RIFF chunk while still being able to check the -codec used by the image by inspecting the byte stream at a fixed -position. +'WEBP': 32 bits -The two possible layouts will be called _images without special layout_ -and _images with special layout_. +: The ASCII characters 'W' 'E' 'B' 'P'. 
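As a rough illustration of the 12-byte header described above, a reader might check the two tags and extract the File Size field as follows. The function name `parse_webp_header` is an assumption made for this sketch and does not come from the specification or from any library.

```python
import struct


def parse_webp_header(data: bytes) -> int:
    """Validate the 12-byte WebP file header and return the File Size field.

    Hypothetical helper: File Size counts the bytes starting at offset 8.
    """
    if len(data) < 12:
        raise ValueError("need at least 12 bytes")
    # 'RIFF' tag, little-endian uint32 size, 'WEBP' signature.
    riff, file_size, webp = struct.unpack_from("<4sI4s", data, 0)
    if riff != b"RIFF":
        raise ValueError("missing 'RIFF' tag")
    if webp != b"WEBP":
        raise ValueError("missing 'WEBP' signature")
    return file_size
```

A caller could compare the returned value plus 8 against the actual file length to detect truncated files; whether to reject such files is left to the reader.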
+Simple file format +------------------ +Simple WebP file header: -#### Images Without Special Layout + 0 1 2 3 + 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + | WebP file header (12 bytes) | + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + | ChunkHeader('VP8 ') | + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + | VP8 data | + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ -If the first subchunk of RIFF has the tag "VP8 ", the file contains an -_image without special layout_. +VP8 data: _Chunk Size_ bytes -This layout **SHOULD** be used if the image doesn't require advanced +: VP8 bitstream data. + +The content of a 'VP8 ' chunk (note the last character is a space) MUST be one +VP8 key frame (with optional padding). + +The current [VP8 Data Format and Decoding Guide][vp8spec] can be found +at the IETF website, . + +The VP8 specification describes how to decode the image into Y'CbCr +format. To convert to RGB, Rec. 601 SHOULD be used. + +This layout SHOULD be used if the image does not require advanced features: color profiles, XMP metadata, animation or tiling. Files with this layout are smaller and supported by older software. -Such images consist of: +Extended file format +-------------------- - * A "VP8 " chunk with the bitstream of the single tile. +**Note:** Older readers may not support files using the extended format. -**Example:** An example layout of such a file is as follows: +An extended format file consists of: -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -RIFF/WEBP -+- VP8 (bitstream of the single tile of the image) -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + * A 'VP8X' chunk with information about features used in the file. + * An optional 'ICCP' chunk with color profile. 
-#### Images With Special Layout - -If the first subchunk of RIFF has the tag "VP8X", the file contains an -_image with special layout_. - -**Note:** Older readers may not support images with special layout. - -Such an image consists of: - - * A "VP8X" chunk with information about features used in the file. - - * An optional "ICCP" chunk with color profile. - - * An optional "LOOP" chunk with animation control data. + * An optional 'LOOP' chunk with animation control data. * Data for all the frames. - * An optional "META" chunk with XMP metadata. + * An optional 'META' chunk with XMP metadata. * Some other chunk types may be defined by future specifications and placed anywhere in the file. -As will be described in the "VP8X" chunk description, by checking a +As will be described in the 'VP8X' chunk description, by checking a flag one can distinguish animated and non-animated images. A non-animated image has exactly one frame. An animated one may have multiple frames. Data for each frame consists of: - * An optional "FRM " (fourth character is a significant space) chunk - with animation frame metadata. It **MUST** be present in animated - images at the beginning of data for that frame. It **MUST NOT** be + * An optional 'FRM ' (fourth character is a significant space) chunk + with animation frame metadata. It MUST be present in animated + images at the beginning of data for that frame. It MUST NOT be present in non-animated images. - * An optional "TILE" chunk with tile position metadata. It **MUST** be + * An optional 'TILE' chunk with tile position metadata. It MUST be present at the beginning of data for an image that's represented as multiple tile images. - * An optional "ALPH" chunk with alpha bitstream of the tile. It **MUST** be - present for an image containing transparency. It **MUST NOT** be present + * An optional 'ALPH' chunk with alpha bitstream of the tile. It MUST be + present for an image containing transparency. 
It MUST NOT be present in non-transparent images. - * A "VP8 " chunk with the bitstream of the tile. + * A 'VP8 ' chunk with the bitstream of the tile. -All chunks **MUST** be placed in the same order as listed above (except -for unknown chunks, which **MAY** appear anywhere). If a chunk appears -in the wrong place, the file is invalid, but readers **MAY** parse the +All chunks SHOULD be placed in the same order as listed above (except +for unknown chunks, which MAY appear anywhere). If a chunk appears +in the wrong place, the file is invalid, but readers MAY parse the file, ignoring the chunks that come too late. -**Rationale:** Setting the order of chunks should allow quicker file +**Rationale:** Setting the order of chunks should allow quicker file parsing. For example, if an ICCP chunk does not appear in its required position, a decoder can choose to stop searching for it. The rule of ignoring late chunks should make programs that need to do a full search give the same results as the ones stopping early. 
-**Example:** An example layout of a non-animated, tiled image without -transparency may look as follows: +Extended WebP file header: -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -RIFF/WEBP -+- VP8X (descriptions of features used) -+- ICCP (color profile) -+- TILE (First tile parameters) -+- VP8 (bitstream - first tile) -+- TILE (Second tile parameters) -+- VP8 (bitstream - second tile) -+- TILE (third tile parameters) -+- VP8 (bitstream - third tile) -+- TILE (fourth tile parameters) -+- VP8 (bitstream - fourth tile) -+- META (XMP metadata) -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + 0 1 2 3 + 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + | WebP file header (12 bytes) | + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + | ChunkHeader('VP8X') | + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + | Rsrv |M|I|A|T| Reserved | + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + | Canvas Width | + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + | Canvas Height | + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ -**Example:** An example layout of an animated image with transparency may look -as follows: +Tiling (T): 1 bit -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -RIFF/WEBP -+- VP8X (descriptions of features used) -+- LOOP (animation control parameters) -+- FRM (first animation frame parameters) -+- ALPH (alpha bitstream - first image frame) -+- VP8 (bitstream - first image frame) -+- FRM (second animation frame parameters) -+- ALPH (alpha bitstream - second image frame) -+- VP8 (bitstream - second image frame) -+- META (XMP metadata) -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +: Set if the image is represented by tiles. +Animation (A): 1 bit + +: Set if the file is an animation. 
Data in 'LOOP' and 'FRM ' chunks + should be used to control the animation. + +ICC profile (I): 1 bit + +: Set if the file contains an 'ICCP' chunk. + +Metadata (M): 1 bit + +: Set if the file contains a 'META' chunk. + +Reserved (Rsrv): 4 bits + +: SHOULD be `0`. + +Reserved: 16 bits + +: SHOULD be `0`. + +Canvas Width: 32 bits + +: Width of the canvas in pixels. + +Canvas Height: 32 bits + +: Height of the canvas in pixels. + +Future specifications MAY add more fields. If a chunk of larger size is found, +programs MUST ignore the extra bytes but SHOULD preserve them when modifying +the file. + +### Chunks + +#### Animation + +Loop Chunk: + + 0 1 2 3 + 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + | ChunkHeader('LOOP') | + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + | Loop Count | + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + +Loop Count: 16 bits (_uint16_) + +: The number of times to loop the animation. `0` means infinitely. + +For images that are animations, this chunk contains the global +parameters of the animation. + +This chunk MUST appear if the _Animation_ flag in chunk VP8X is set. +If the _Animation_ flag is not set and this chunk is present, it +SHOULD be ignored. 
+ +Per-frame parameters of the animation: + + 0 1 2 3 + 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + | ChunkHeader('FRM ') | + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + | Frame X | + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + | Frame Y | + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + | Frame Width | + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + | Frame Height | + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + | Frame Duration | + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + +Frame X: 32 bits (_uint32_) + +: The X coordinate of the upper left corner of the frame. + +Frame Y: 32 bits (_uint32_) + +: The Y coordinate of the upper left corner of the frame. + +Frame Width: 32 bits (_uint32_) + +: The width of the frame. + +Frame Height: 32 bits (_uint32_) + +: The height of the frame. + +Frame Duration: 16 bits (_uint16_) + +: Time to wait before displaying the next frame, in 1 millisecond units. + +Notes for frames containing VP8 data: + + * _Frame X_ and _Frame Y_ values MUST be divisible by `32`. + + **Rationale:** This ensures that pixels on U and V planes are aligned to a + 16-byte boundary (even after a rotation), which may help with vector + instructions on some architectures. This also makes the tiles align to + 16-pixel macroblock boundaries. + + * _Frame Width_ MUST be divisible by `16` or + `Frame X + Frame Width == Canvas Width` MUST be true. + + * _Frame Height_ MUST be divisible by `16` or + `Frame Y + Frame Height == Canvas Height` MUST be true. + + **Rationale:** The width and height constraints simplify the handling of + macroblocks that are on the edge of a tile. VP8 decoders can overwrite + pixels outside the boundary in such a macroblock, and this guarantees they + won't overwrite any data. 
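The placement constraints listed above for frames containing VP8 data could be checked along the following lines. This is a sketch only: `frame_violations` and its parameter names are hypothetical, and it covers just the four divisibility and edge rules stated in the notes.

```python
from typing import List


def frame_violations(frame_x: int, frame_y: int,
                     frame_w: int, frame_h: int,
                     canvas_w: int, canvas_h: int) -> List[str]:
    """Return the VP8 frame placement rules that are violated (hypothetical helper)."""
    problems = []
    # Corner coordinates must be multiples of 32.
    if frame_x % 32 != 0:
        problems.append("Frame X is not divisible by 32")
    if frame_y % 32 != 0:
        problems.append("Frame Y is not divisible by 32")
    # Width must be a multiple of 16 unless the frame touches the right edge.
    if frame_w % 16 != 0 and frame_x + frame_w != canvas_w:
        problems.append("Frame Width breaks the 16-pixel rule")
    # Height must be a multiple of 16 unless the frame touches the bottom edge.
    if frame_h % 16 != 0 and frame_y + frame_h != canvas_h:
        problems.append("Frame Height breaks the 16-pixel rule")
    return problems
```

For instance, a 20-pixel-wide frame at X = 32 passes on a 52-pixel-wide canvas because it ends exactly at the canvas edge, while the same frame on a 64-pixel-wide canvas does not.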
+ +#### Tiling + + 0 1 2 3 + 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + | ChunkHeader('TILE') | + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + | Tile Canvas X | + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + | Tile Canvas Y | + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + | Tile Data | + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + +Tile Canvas X: 32 bits (_uint32_) + +: X coordinate of the upper left corner of the tile. + +Tile Canvas Y: 32 bits (_uint32_) + +: Y coordinate of the upper left corner of the tile. + +Tile Data: _Chunk Size_ - `8` bytes + +: VP8 data. + +This chunk contains information about a single tile and describes the +bitstream chunk that follows it. + +Notes for tiles containing VP8 data: + + * _Tile Canvas X_ and _Tile Canvas Y_ values MUST be + divisible by `32`. + + * The _Tile Width_ and _Tile Height_ can be extracted from the VP8 data. + See 'Section 9' in the [VP8 RFC][vp8spec]. + + * The width of a tile MUST be divisible by `16` or + `Tile Canvas X + Tile Width == Canvas Width` MUST be true. + + * The height of a tile MUST be divisible by `16` or + `Tile Canvas Y + Tile Height == Canvas Height` MUST be true. + + +#### Alpha + + 0 1 2 3 + 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + | ChunkHeader('ALPH') | + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + | F | C | Reserved | Alpha Bitstream | + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + +Filtering method (F): 4 bits + +: The filtering method used: + + * `0`: None. + * `1`: Horizontal filter. + * `2`: Vertical filter. + * `3`: Gradient filter. + +Compression method (C): 4 bits + +: The compression method used: + + * `0`: No compression. 
+ * `1`: Backward reference counts encoded with arithmetic encoder. + +Reserved: 8 bits + +: SHOULD be `0`. + +Alpha bitstream: _Chunk Size_ - `2` bytes + +: Encoded alpha bitstream. + +This optional chunk contains encoded alpha data for a single tile. +Either **ALL or NONE** of the tiles must contain this chunk. + +The alpha channel can be encoded either losslessly or with lossy +preprocessing (quantization). After the optional preprocessing, the +alpha values are encoded with a lossless compression method like +zlib. Work is in progress to improve the compression gain further by +exploring alternate compression methods and hence, the bitstream for +the Alpha-chunk is still experimental and expected to change. + +#### Color profile + + 0 1 2 3 + 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + | ChunkHeader('ICCP') | + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + | Compression | Color Profile | + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + +Compression: 8 bits + +: Compression method used: + + * `0`: None. + * `1`: Deflate/inflate. + +Color Profile: _Chunk Size_ - `1` bytes + +: ICC profile. + +There SHOULD be at most one 'ICCP' chunk. +See for specifications. + +If this chunk is not present, sRGB SHOULD be assumed. + +#### Metadata + + 0 1 2 3 + 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + | ChunkHeader('META') | + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + | Compression | XMP Metadata | + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + +Compression: 8 bits + +: Compression method used: + + * `0`: None. + * `1`: Deflate/inflate. + +XMP Metadata: _Chunk Size_ - `1` bytes + +: XMP metadata. + +There SHOULD be at most one such chunk. 
If there are more such chunks, readers +MAY ignore all except the first one. + +XMP packets are XML text as specified in the [XMP Specification Part +1][xmpspec]. The chunk tag is different from the one specified by Adobe +for WAV and AVI (also RIFF formats), because we have the option of +compression. + +Additional guidance about handling metadata can be found in the +Metadata Working Group's [Guidelines for Handling Metadata][metadata]. +Note that the sections of the document about reconciliation of EXIF, +XMP and IPTC-IIM don't apply to WebP. As WebP supports only XMP, no +reconciliation is necessary. + +#### Other Chunks + +A file MAY contain other chunks. Readers SHOULD ignore these chunks. Writers +SHOULD preserve them in their original order. ### Assembling the Canvas from Tiles and Animation -Contents of the chunks will be described in subsequent sections. Here we -provide an overview of how they are used to assemble the canvas. The -notation _VP8X.canvasWidth_ means the field in the "VP8X" -described as _canvasWidth_. +Here we provide an overview of how 'TILE' chunks and 'FRM '/'LOOP' chunks are +used to assemble the canvas. The notation _VP8X.field_ means the field in +the 'VP8X' chunk with the same description. -Decoding a non-animated canvas **MUST** be equivalent to the following +Decoding a non-animated canvas MUST be equivalent to the following pseudocode: ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ @@ -301,7 +586,7 @@ for chunk in data_for_all_frames: canvas contains the decoded canvas. ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -Decoding an animated canvas **MUST** be equivalent to the following +Decoding an animated canvas MUST be equivalent to the following pseudocode: ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ @@ -338,253 +623,45 @@ for LOOP.loop = 0, ..., LOOP.loopCount-1 canvas contains the decoded canvas. 
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -As described earlier, if an assert related to chunk ordering fails, the -reader **MAY** ignore the badly-ordered chunks instead of failing to -decode the file. - - -### Bitstream Chunks (VP8) - -These chunks contain compressed image data. Currently, the only allowed -bitstream is VP8, using "VP8 " (note the significant fourth-character -space) as its tag. We will refer to all chunks with this tag as -**_bitstream chunks_**. As described earlier, images without special -layout have a single bitstream chunk as the first subchunk of RIFF, -while images with special layout may contain several of them, one for -each tile. - -The content of a "VP8 " chunk **MUST** be one VP8 key frame (with -optional padding. See below). - -The current [VP8 Data Format and Decoding Guide][vp8spec] can be found -at the IETF website, . Note that the VP8 frame -header contains the VP8 frame width and height. That is assumed to be -the width and height of the tile. - -The VP8 specification describes how to decode the image into Y'CbCr -format. To convert to RGB, Rec. 601 **SHOULD** be used. - -For compatibility with older readers, if the size of the frame is odd, -writers **SHOULD** append a padding byte (preferably `0`) inside the -chunk contents, making the chunk's _ckSize_ even. Newer readers -**MUST** support odd-sized bitstream chunks. - - -### VP8X Chunk (Special Layout) - -As described earlier, a chunk with tag "VP8X", is the first chunk of -images with special layout. It is used to enable advanced features of -WebP. - -The content of the chunk is as follows: - - * **uint32** flags. The following bits are currently used (with `0` - being the least significant bit): - - * bit 0: _hasTile_: Set if the image is represented by Tiles. - - * bit 1: _hasAnimation_: Set if the file is an animation. Data in - "LOOP" and "FRM " chunks should be used to control the animation. 
- - * bit 2: _hasIccp_: Set if the file contains an "ICCP" chunk with a - color profile. If a file contains an "ICCP" chunk but this bit is - not set, the error is flagged while constructing the - Mux-Container. - - * bit 3: _hasMetadata_: Set if the file contains a "META" chunk - with a XMP metadata. If a file contains an "META" chunk but this - bit is not set, the error is flagged while constructing the - Mux-Container. - -Future specifications **MAY** define other bits in flags. Bits not -defined by this specification **MUST** be preserved when modifying the -file. - - * **uint32** _canvasWidth_: Width of the canvas in pixels (after the - optional rotation or symmetry; see below). - - * **uint32** _canvasHeight_: Height of the canvas in pixels (after - the optional rotation or symmetry; see below). - -Future specifications **MAY** add more fields. If a chunk of larger size -is found, programs **MUST** ignore the extra bytes but **MUST** preserve -them when modifying the file. - - -### LOOP Chunk (Global Animation Parameters) - -For images that are animations, this chunk contains the global -parameters of the animation. - -This chunk **MUST** appear if the _hasAnimation_ flag in chunk VP8X is -set. If the _hasAnimation_ flag is not set and this chunk is present, -it **MUST** be ignored. - -The content of the chunk is as follows: - - * **uint16** _loopCount_: For animations, the number of times to loop - the animation. `0` means infinitely. - -Future specifications **MAY** add more fields. If a chunk of larger -size is found, programs **MUST** ignore the extra bytes but **MUST** -preserve them when modifying the file. - - -### FRM Chunk (Per-frame Animation Parameters) - -For images that are animations, these chunks contain the per-frame -parameters of the animation. - -The content of the chunk is as follows: - - * **uint32** _frameX_: X coordinate of the upper left corner of the - frame. For images using the VP8 codec, this value **MUST** be - divisible by `32`. 
Other codecs **MAY** specify other constraints. - Described in more detail later. - - * **uint32** _frameY_: Y coordinate of the upper left corner of the - frame. For images using the VP8 codec, this value **MUST** be - divisible by `32`. Other codecs **MAY** specify other constraints. - Described in more detail later. - - * **uint32** _frameWidth_: Width of the frame. For images using the - VP8 codec, this value **MUST** be divisible by `16`, or be such that - _frameX + frameWidth == canvasWidth_. Other codecs **MAY** specify - other constraints. Described in more detail later. - - * **uint32** _frameHeight_: Height. For images using the VP8 codec, - this value **MUST** be divisible by `16`, or be such that _frameY + - frameHeight == canvasHeight_. Other codecs **MAY** specify other - constraints. Described in more detail later. - - * **uint16** _frameDuration_: Time to wait before displaying the next - tile, in 1ms units. - -**Rationale:** The requirement for corner coordinates to be divisible -by `32` means that pixels on U and V planes are aligned to a 16-byte -boundary (even after a rotation), which may help with vector -instructions on some architectures. This makes the tiles also align to -16-pixel macroblock boundaries. - -**Rationale:** The requirement for the width and height to be -divisible by `16` or touching the edge of the canvas simplifies the -handling of macroblocks that are on the edge of a tile. VP8 decoders -can overwrite pixels outside the boundary in such a macroblock, and this -guarantees they won't overwrite any data. - -Future specifications **MAY** add more fields. If a chunk of larger -size is found, programs **MUST** ignore the extra bytes but **MUST** -preserve them when modifying the file. - - -### TILE Chunks (Tile Parameters) - -This chunk contains information about a single tile and describes the -bitstream chunk that follows it. 
- -The contents of such a chunk are as follows: - - * **uint32** _tileCanvasX_: X coordinate of the upper left corner of - the tile. For VP8 tiles, this value **MUST** be divisible by `32`. - Other codecs **MAY** specify other constraints. - - * **uint32** _tileCanvasY_: Y coordinate of the upper left corner of - the tile. For VP8 tiles, this value **MUST** be divisible by `32`. - Other codecs **MAY** specify other constraints. - -Future specifications **MAY** add more fields. If a chunk of larger size -is found, programs **MUST** ignore the extra bytes but **MUST** preserve -them when modifying the file. - -As described earlier, the TILE chunk is followed by VP8 data. From that -chunk we can read the height and width of the tile. These we denote as -_tileWidth_ and _tileHeight_. In the case of VP8, we have the following -constraints: - - * The width of a tile **MUST** be divisible by `16`, or _tileCanvasX + - tileWidth == canvasWidth_ **MUST** be true. - - * The height of a tile **MUST** be divisible by `16`, or - _tileCanvasY + tileHeight == canvasHeight_ **MUST** be true. - - -### ALPH Chunks (Alpha Bitstreams) - -This optional chunk contains encoded alpha data for a single tile. Either -**ALL or NONE** of the tiles must contain this chunk. - -The alpha channel can be encoded either losslessly or with lossy preprocessing -(quantization). After the optional preprocessing, the alpha values are encoded -with a lossless compression method like zlib. Work is in progress to improve the -compression gain further by exploring alternate compression methods and hence, -the bit-stream for the Alpha-chunk is still experimental and expected to change. - -The contents of such a chunk are as follows: - - * Byte 0 lower nibble: The _compression method_ used. Currently two methods - are supported: - - * 0 --> No compression - - * 1 --> Backward reference counts encoded with arithmetic encoder. - - * Byte 0 upper nibble: The _filtering method_ used. 
Currently the following
-    methods are supported:
-
-      * 0 --> No filter
-
-      * 1 --> Horizontal filter
-
-      * 2 --> Vertical filter
-
-      * 3 --> Gradient filter
-
-  * Byte 1: _Reserved_. **Should** be 0.
-
-  * Byte 2 onwards: _Encoded alpha bitstream_.
-
-
-### ICCP Chunk (Color Profile)
-
-An optional "ICCP" chunk contains an ICC profile. There **SHOULD** be
-at most one such chunk. The first byte of the chunk is the compression
-type. Two values are currently defined: a value of `0` means no
-compression, while a value of `1` means deflate/inflate compression. It
-is followed by a compressed or non-compressed ICC profile. See
- for specifications.
-
-The color profile can be a v2 or v4 profile. If this chunk is missing,
-sRGB **SHOULD** be assumed.
-
-
-### META Chunk (Compressed XMP Metadata)
-
-Such a chunk (if present) contains XMP metadata. There **SHOULD** be at
-most one such chunk. If there are more such chunks, readers **SHOULD**
-ignore all except the first one. The first byte specifies compression
-type. Two values are currently defined: a value of `0` means no
-compression, while a value of `1` means deflate/inflate compression. It
-is followed by a compressed or non-compressed XMP metadata packet.
-
-XMP packets are XML text as specified in the [XMP Specification Part
-1][xmpspec]. The chunk tag is different from the one specified by Adobe
-for WAV and AVI (also RIFF formats), because we have the option of
-compression.
-
-Additional guidance about handling metadata can be found in the
-Metadata Working Group's [Guidelines for Handling Metadata][metadata].
-Note that the sections of the document about reconciliation of EXIF,
-XMP and IPTC-IIM don't apply to WebP. As WebP supports only XMP, no
-reconciliation is necessary.
-
-
-### Other Chunks
-
-A file **MAY** contain other chunks, defined in some future
-specification. Such chunks **MUST** be ignored, but preserved. Writers
-**SHOULD** try to preserve them in their original order.
-
+As described earlier, if an assert related to chunk ordering fails, the reader
+MAY ignore the badly-ordered chunks instead of failing to decode the file.
+
+Example file layouts
+--------------------
+
+A non-animated, tiled image without transparency may look as follows:
+
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+RIFF/WEBP
++- VP8X (descriptions of features used)
++- ICCP (color profile)
++- TILE (First tile parameters)
++- VP8 (bitstream - first tile)
++- TILE (Second tile parameters)
++- VP8 (bitstream - second tile)
++- TILE (third tile parameters)
++- VP8 (bitstream - third tile)
++- TILE (fourth tile parameters)
++- VP8 (bitstream - fourth tile)
++- META (XMP metadata)
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+An animated image with transparency may look as follows:
+
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+RIFF/WEBP
++- VP8X (descriptions of features used)
++- LOOP (animation control parameters)
++- FRM (first animation frame parameters)
++- ALPH (alpha bitstream - first image frame)
++- VP8 (bitstream - first image frame)
++- FRM (second animation frame parameters)
++- ALPH (alpha bitstream - second image frame)
++- VP8 (bitstream - second image frame)
++- META (XMP metadata)
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 [vp8spec]: http://tools.ietf.org/html/rfc6386
 [xmpspec]: http://www.adobe.com/content/dam/Adobe/en/devnet/xmp/pdfs/XMPSpecificationPart1.pdf
 [metadata]: http://www.metadataworkinggroup.org/pdf/mwg_guidance.pdf
+[rfc 2119]: http://tools.ietf.org/html/rfc2119