WebP Container Specification ============================ _Working Draft, v0.7, 20121102_ * TOC placeholder {:toc} Introduction ------------ WebP is an image format that uses either (i) the VP8 key frame encoding to compress image data in a lossy way, or (ii) the WebP lossless encoding (and possibly other encodings in the future). These encoding schemes should make it more efficient than currently used formats. It is optimized for fast image transfer over the network (e.g., for websites). The WebP format has feature parity (color profile, metadata etc) with other formats as well. This document describes the structure of a WebP file. The WebP container (i.e., RIFF container for WebP) allows feature support over and above the basic use case of WebP (i.e., a file containing a single image encoded as a VP8 key frame). The WebP container provides additional support for: * **Lossless compression.** An image can be losslessly compressed, using the WebP Lossless Format. * **Metadata.** An image may have metadata stored in EXIF or XMP formats. * **Transparency.** An image may have transparency, i.e., an alpha channel. * **Color Profile.** An image may have an embedded ICC profile as described by the [International Color Consortium][iccspec]. The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in [RFC 2119][]. Terminology & Basics ------------------------ A WebP file contains a still image (i.e., an encoded matrix of pixels) and, optionally, transparency information. In case we need to refer only to the matrix of pixels, we will call it the _canvas_ of the image. Below are additional terms used throughout this document: Code that reads WebP files is referred to as a _reader_, while code that writes them is referred to as a _writer_. _uint16_ : A 16-bit, little-endian, unsigned integer. _uint24_ : A 24-bit, little-endian, unsigned integer. _uint32_ : A 32-bit, little-endian, unsigned integer. _1-based_ : An unsigned integer field storing values offset by `-1`. e.g., Such a field would store value _25_ as _24_. The basic element of a RIFF file is a _chunk_. It consists of: 0 1 2 3 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | Chunk FourCC | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | Chunk Size | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | Chunk Payload | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ Chunk FourCC: 32 bits : ASCII four character code or _chunk tag_ used for chunk identification. Chunk Size: 32 bits (_uint32_) : The size of the chunk (_ckSize_) not including this field, the chunk identifier and padding. Chunk Payload: _Chunk Size_ bytes : The data payload. If _Chunk Size_ is odd a single padding byte that SHOULD be `0` is added. _ChunkHeader('ABCD')_ : This is used to describe the fourcc and size header of individual chunks, where 'ABCD' is the fourcc for the chunk. This element's size is 8 bytes. : Note that, in this specification, all chunk tag characters are in file order, not in byte order of a uint32 of any particular architecture. _list of chunks_ : A concatenation of multiple chunks. : We will refer to the first chunk as having _position_ 0, the second as position 1, etc. By _chunk with index 0 among "ABCD"_ we mean the first chunk among the chunks of type "ABCD" in the list, the _chunk with index 1 among "ABCD"_ is the second such chunk, etc. A WebP file MUST begin with a single chunk with a tag 'RIFF'. All other defined chunks are contained within this chunk. The file SHOULD NOT contain anything after it. The maximum size of RIFF's _ckSize_ is 2^32 minus 10 bytes. The size of the whole file is at most 4GiB minus 2 bytes. **Note:** some RIFF libraries are said to have bugs when handling files larger than 1GiB or 2GiB. If you are using an existing library, check that it handles large files correctly. The first four bytes of the RIFF chunk contents (i.e., bytes 8-11 of the file) MUST be the ASCII string "WEBP". They are followed by a list of chunks. As the size of any chunk is even, the size of the RIFF chunk is also even. The contents of the chunks in that list will be described in the following sections. **Note:** RIFF has a convention that all-uppercase chunks are standard chunks that apply to any RIFF file format, while chunks specific to a file format are all lowercase. WebP does not follow this convention. WebP file header ---------------- 0 1 2 3 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | 'R' | 'I' | 'F' | 'F' | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | File Size | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | 'W' | 'E' | 'B' | 'P' | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 'RIFF': 32 bits : The ASCII characters 'R' 'I' 'F' 'F'. File Size: 32 bits (_uint32_) : The size of the file in bytes starting at offset 8. 'WEBP': 32 bits : The ASCII characters 'W' 'E' 'B' 'P'. Simple file format (lossy) -------------------------- This layout SHOULD be used if the image requires _lossy_ encoding and does not require transparency or other advanced features provided by the extended format. Files with this layout are smaller and supported by older software. Simple WebP (lossy) file format: 0 1 2 3 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | WebP file header (12 bytes) | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | VP8 chunk | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ VP8 chunk: 0 1 2 3 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | ChunkHeader('VP8 ') | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | VP8 data | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ VP8 data: _Chunk Size_ bytes : VP8 bitstream data. The VP8 bitstream format specification can be found at [VP8 Data Format and Decoding Guide][vp8spec]. Note that the VP8 frame header contains the VP8 frame width and height. That is assumed to be the width and height of the canvas. The VP8 specification describes how to decode the image into Y'CbCr format. To convert to RGB, Rec. 601 SHOULD be used. Simple file format (lossless) ----------------------------- **Note:** Older readers may not support files using the lossless format. This layout SHOULD be used if the image requires _lossless_ encoding (with an optional transparency channel) and does not require advanced features provided by the extended format. Simple WebP (lossless) file format: 0 1 2 3 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | WebP file header (12 bytes) | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | VP8L chunk | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ VP8L chunk: 0 1 2 3 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | ChunkHeader('VP8L') | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | VP8L data | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ VP8L data: _Chunk Size_ bytes : VP8L bitstream data. The current specification of the VP8L bitstream can be found at [WebP Lossless Bitstream Format][webpllspec]. Note that the VP8L header contains the VP8L image width and height. That is assumed to be the width and height of the canvas. Extended file format -------------------- **Note:** Older readers may not support files using the extended format. An extended format file consists of: * A 'VP8X' chunk with information about features used in the file. * An optional 'ICCP' chunk with color profile. * An optional 'ALPH' chunk with transparency information. * The image bitstream contained in either a 'VP8 ' or 'VP8L' chunk. * An optional 'EXIF' chunk with EXIF metadata. * An optional 'XMP ' chunk with XMP metadata. All chunks SHOULD be placed in the same order as listed above. If a chunk appears in the wrong place, the file is invalid, but readers MAY parse the file, ignoring the chunks that come too late. **Rationale:** Setting the order of chunks should allow quicker file parsing. For example, if an 'ALPH' chunk does not appear in its required position, a decoder can choose to stop searching for it. The rule of ignoring late chunks should make programs that need to do a full search give the same results as the ones stopping early. Extended WebP file header: 0 1 2 3 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | WebP file header (12 bytes) | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | ChunkHeader('VP8X') | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |Rsv|I|L|E|X| R | Reserved | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | Canvas Width Minus One | ... +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ ... Canvas Height Minus One | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ Reserved (Rsv): 2 bits : SHOULD be `0`. ICC profile (I): 1 bit : Set if the file contains an ICC profile. Alpha (L): 1 bit : Set if the file contains some (or all) images with transparency information ("alpha"). EXIF metadata (E): 1 bit : Set if the file contains EXIF metadata. XMP metadata (X): 1 bit : Set if the file contains XMP metadata. Reserved (R): 2 bits : SHOULD be `0`. Reserved: 24 bits : SHOULD be `0`. Canvas Width Minus One: 24 bits : _1-based_ width of the canvas in pixels. The actual canvas width is '1 + Canvas Width Minus One' Canvas Height Minus One: 24 bits : _1-based_ height of the canvas in pixels. The actual canvas height is '1 + Canvas Height Minus One' The product of _Canvas Width_ and _Canvas Height_ MUST be at most `2^32 - 1`. Future specifications MAY add more fields. ### Chunks #### Alpha 0 1 2 3 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | ChunkHeader('ALPH') | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |Rsv| P | F | C | Alpha Bitstream... | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ Reserved (Rsv): 2 bits : SHOULD be `0`. Pre-processing (P): 2 bits : These INFORMATIVE bits are used to signal the pre-processing that has been performed during compression. The decoder can use this information to e.g. dither the values or smooth the gradients prior to display. * `0`: no pre-processing * `1`: level reduction Filtering method (F): 2 bits : The filtering method used: * `0`: None. * `1`: Horizontal filter. * `2`: Vertical filter. * `3`: Gradient filter. For each pixel, filtering is performed using the following calculations. Assume the alpha values surrounding the current `X` position are labeled as: C | B | ---+---+ A | X | We seek to compute the alpha value at position `X`. First, a prediction is made depending on the filtering method: * Method `0`: predictor = 0 * Method `1`: predictor = A * Method `2`: predictor = B * Method `3`: predictor = clip(A + B - C) where `clip(v)` is equal to: * 0 if v < 0 * 255 if v > 255 * v otherwise The final value is derived by adding the decompressed value `X` to the predictor and using modulo-256 arithmetic to wrap the [256-511] range into the [0-255] one: `alpha = (predictor + X) % 256` There are special cases for left-most and top-most pixel positions: * Top-left value at location (0,0) uses 0 as predictor value. Otherwise, * For horizontal or gradient filtering methods, the left-most pixels at location (0, y) are predicted using the location (0, y-1) just above. * For vertical or gradient filtering methods, the top-most pixels at location (x, 0) are predicted using the location (x-1, 0) on the left. Decoders are not required to use this information in any specified way. Compression method (C): 2 bits : The compression method used: * `0`: No compression. * `1`: Compressed using the WebP lossless format. Alpha bitstream: _Chunk Size_ - `1` bytes : Encoded alpha bitstream. This optional chunk contains encoded alpha data for the image. An image containing a 'VP8L' chunk SHOULD NOT contain this chunk. **Rationale**: The transparency information of the image is already part of the 'VP8L' chunk. The alpha channel data is stored as uncompressed raw data (when compression method is '0') or compressed using the lossless format (when the compression method is '1'). * Raw data: consists of a byte sequence of length width * height, containing all the 8-bit transparency values in scan order. * Lossless format compression: the byte sequence is a compressed image-stream (as described in the [WebP Lossless Bitstream Format] [webpllspec]) of implicit dimension width x height. That is, this image-stream does NOT contain any headers describing the image dimension. **Rationale**: the dimension is already known from other sources, so storing it again would be redundant and error-prone. Once the image-stream is decoded into ARGB color values, following the process described in the lossless format specification, the transparency information must be extracted from the *green* channel of the ARGB quadruplet. **Rationale**: the green channel is allowed extra transformation steps in the specification -- unlike the other channels -- that can improve compression. #### Bitstream (VP8/VP8L) This chunk contains compressed image data. A bitstream chunk may be either (i) a VP8 chunk, using "VP8 " (note the significant fourth-character space) as its tag _or_ (ii) a VP8L chunk, using "VP8L" as its tag. The formats of VP8 and VP8L chunks are as described in sections [Simple file format (lossy)](#simple-file-format-lossy) and [Simple file format (lossless)](#simple-file-format-lossless) respectively. #### Color profile 0 1 2 3 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | ChunkHeader('ICCP') | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | Color Profile | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ Color Profile: _Chunk Size_ bytes : ICC profile. This chunk MUST appear before data for all the frames. There SHOULD be at most one such chunk. If there are more such chunks, readers MAY ignore all except the first one. See the [ICC Specification][iccspec] for details. If this chunk is not present, sRGB SHOULD be assumed. #### Metadata Metadata can be stored in 'EXIF' or 'XMP ' chunks. There SHOULD be at most one chunk of each type ('EXIF' and 'XMP '). If there are more such chunks, readers MAY ignore all except the first one. Also, a file may possibly contain both 'EXIF' and 'XMP ' chunks. The chunks are defined as follows: EXIF chunk: 0 1 2 3 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | ChunkHeader('EXIF') | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | EXIF Metadata | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ EXIF Metadata: _Chunk Size_ bytes : image metadata in EXIF format. XMP chunk: 0 1 2 3 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | ChunkHeader('XMP ') | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | XMP Metadata | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ XMP Metadata: _Chunk Size_ bytes : image metadata in XMP format. Additional guidance about handling metadata can be found in the Metadata Working Group's [Guidelines for Handling Metadata][metadata]. #### Unknown Chunks A file MAY contain other unknown chunks. Readers SHOULD ignore these chunks. Writers SHOULD preserve them in their original order. Example file layouts -------------------- A lossy encoded image with alpha may look as follows: ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ RIFF/WEBP +- VP8X (descriptions of features used) +- ALPH (alpha bitstream) +- VP8 (bitstream) ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ A losslessly encoded image may look as follows: ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ RIFF/WEBP +- VP8X (descriptions of features used) +- XYZW (unknown chunk) +- VP8L (lossless bitstream) ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ A lossless image with ICC profile and XMP metadata may look as follows: ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ RIFF/WEBP +- VP8X (descriptions of features used) +- ICCP (color profile) +- VP8L (lossless bitstream) +- XMP (metadata) ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ [vp8spec]: http://tools.ietf.org/html/rfc6386 [webpllspec]: https://gerrit.chromium.org/gerrit/gitweb?p=webm/libwebp.git;a=blob;f=doc/webp-lossless-bitstream-spec.txt;hb=master [iccspec]: http://www.color.org/icc_specs2.xalter [metadata]: http://www.metadataworkinggroup.org/pdf/mwg_guidance.pdf [rfc 2119]: http://tools.ietf.org/html/rfc2119