diff --git a/doc/webp-container-spec.txt b/doc/webp-container-spec.txt new file mode 100644 index 00000000..3597e698 --- /dev/null +++ b/doc/webp-container-spec.txt @@ -0,0 +1,328 @@ +WebP container specification - Working Draft (V0.1 Date 09/26) +Terminology +Basics +Single-image WebP files +Chunks layout +Images without special layout +Images with special layout +Assembling the canvas from tiles and animation +Bitstream chunk(s) (VP8) +VP8X chunk (special layout) +LOOP chunk (global animation parameters) +FRM chunk (per-frame animation parameters) +TILE chunks (tile parameters) +ICCP chunk (color profile) +META chunk (compressed XMP metadata) +Other chunks + +WebP container specification - Working Draft (V0.1 Date 09/26) +WebP is a still image format that uses the VP8 key frame encoding (and, possibly, other codecs in the +future) to compress image data in a lossy way. The VP8 encoding should make it more efficient than +currently used formats. It is optimized for fast image transfer over the network (e.g., for WWW sites). +However, it also aims for feature parity (like Color profile, XMP Metadata, Animation etc) with other +formats. This document describes the structure of such a file. +The first version of WebP handled only the basic use-case - a file having a single image (being one +VP8 key frame) with no metadata. However, the use of a RIFF container allowed to extend it. This +document extends it by additionally introducing support for: +● Metadata and color profiles. We specify chunks that can contain this information, like can +other popular formats. +● Tiling. A single VP8 frame has an inherent limitation for width or height of 2^14 pixels and +a 512kB limit on the size of first compressed partition. To support larger images, we support +images that are composed of multiple tiles, each encoded as a separate VP8 frame. All tiles +form logically a single image - they have common metadata, color profile etc. Tiling may also +improve efficiency for larger images - grass can be encoded differently than sky. +● Animation. An image may have pauses between frames, making it an animation. +Files not using these new features are backward compatible with the original format. Using these +features will produce files that are not compatible with older programs. + +Terminology +A WebP file contains either a still image (i.e. an encoded matrix of pixels) or an animation (see below) +with, possibly, a color profile, metadata etc. In case we need to refer only to the matrix of pixels, we + + will call is the canvas of the image. +The canvas of an image is built from one or multiple tiles. Each tile is a separately encoded VP8 key +frame (other codec are possible in the future). Building an image from several tiles allows to overcome +the size limitations of a single VP8 frame. Tiling is supposed to be an internal detail of the file - they +are not supposed to be exposed to the user. + +Basics +This section introduces basic terms used throughout the document. +Code reading WebP files will be referred to as readers, while code writing them will be referred as +writers. +A 16-bit, little-endian, unsigned integer will be denoted as uint16. +A 32-bit, little-endian, unsigned integer will be denoted as uint32. +The basic element of a RIFF file is a chunk. It consist of: +● 4 ASCII characters that will be called the chunk tag. +● uint32 with the size of the chunk content (that will be denoted as ckSize). +● ckSize bytes of content. +● If ckSize is odd, a single padding byte that SHOULD be 0. +A chunk with a tag “ABCD” will be also called a chunk of type “ABCD”. Note that, in this +specification, all chunk tag characters are in file order, not in byte order of an uint32 of any particular +architecture. +Note that the padding MUST be also added to the last chunk of the file. +A list of chunks is a concatenation of multiple chunks. We will call the first chunk as having +position 0, the second as position 1 etc. By chunk with index 0 among “ABCD” we will mean the first +chunk among the chunks of type “ABCD” in the list, the chunk with index 1 among “ABCD” is the +second such chunk, etc. +A WebP file MUST begin with a single chunk with a tag “RIFF”. All other defined chunks are +within this chunk. It SHOULD NOT contain anything after it. +The maximum size of RIFF's ckSize is 2^32 – 10 bytes (i.e. the size of the whole file is at most 4GiB +– 2 bytes). +Note: some RIFF libraries are said to have bugs when handling files larger than 1GiB or 2GiB. If +you are using an existing library, check that it handles large files correctly. +The first four bytes of the RIFF chunk contents (i.e. bytes 8-11 of the file) MUST be the ASCII +string “WEBP”. They are followed by a list of chunks. Note that as the size of any chunk is even, the +size of the RIFF chunk is also even. +The content of the chunks in that list will be described in the following sections. +Note: RIFF has a convention that all-uppercase chunks are standard chunks that apply to any +RIFF file format, while chunks specific for a file format are all-lowercase. WebP doesn’t follow this + + convention. + +Single-image WebP files +First, we will describe a subset of WebP files – files containing only one image (later, we will use it +to define multi-image files - file having several different images). + +Chunks layout +This section describes what chunks and in what order may appear in a single-image WebP file. The +content of these chunks will be described in subsequent sections. +The first chunk inside the RIFF chunk MUST be with a tag of “VP8 ” (note the space as the last +character) or “VP8X”. Other tags for the first chunk MAY be introduced by future specifications if we +add new codecs. This tag of the first chunk determines which of the two possible layouts is used. +Rationale: we fix the possible tags of the first chunk so that it is possible to introduce other codecs, to +keep the “WEBP” signature at the beginning of RIFF chunk, while still being able to check the codec +used by the image by inspecting the byte stream at a fixed position. +The two possible layouts will be called images without special layout and images with special layout. +Images without special layout + +If the first subchunk of RIFF has the tag “VP8 ”, the file contains an image without special layout. +This layout SHOULD be used if the image doesn’t require advanced features: color profiles, XMP +metadata, animation or tiling. Files with this layout are smaller and supported by older software. +Such images consist of: +● A “VP8 ” chunk with the bitstream of the single tile. +. +Example: An example layout of such a file looks as follows: +RIFF/WEBP ++- VP8 (bitstream of the single tile of the image) +Images with special layout + +If the first subchunk of RIFF has the tag “VP8X” (other tags may be introduced by future +specifications, if new codecs are added), the file contains an image with special layout. +Note: older readers are not supporting images with special layout and will fail for images having them. +Such an image consists: +● A “VP8X” chunk with information about features used in this file. +● An optional “ICCP” chunk with color profile. +● An optional “LOOP” chunk with animation control data. + + ● Data for all the frames. +● An optional “META” chunk with XMP metadata. +● Some other chunks may be defined by future specifications and placed anywhere in the file. +As will be described in the “VP8X” chunk description, by checking a flag one can distinguish animated +and non-animated images. A non-animated image has exactly one frame. An animated one may have +multiple frames. Data for each frame consists of: +● An optional “FRM ” (note the space as the last character) chunk with animation frame +metadata. It MUST be present in animated images at the beginning of data for that frame. It +MUST NOT be present in non-animated images. +● An optional “TILE” chunk with tile position metadata. It MUST be present at the beginning of +data of image that’s represented as multiple tile images. +● A “VP8 ” chunk with the bitstream of the tile. +All chunks MUST be placed in the same order as listed above (except for unknown chunks, that MAY +appear anywhere). If a chunk appears in a wrong place, the file is invalid, but readers MAY parse the +file ignoring the chunks that come too late. +Rationale: setting the order of chunks should allow to quickly stop the search for e.g., the ICCP if it +is not present in the file. The rule of ignoring late chunks should make programs that needs to do a full +search give the same results as the ones stopping early. + +Example: An example layout of a non-animated, tiled image may look as follows: +RIFF/WEBP ++- VP8X (descriptions of features used) ++- ICCP (color profile) ++- TILE (First tile parameters) ++- VP8 (bitstream - first tile) ++- TILE (Second tile parameters) ++- VP8 (bitstream - second tile) ++- TILE (third tile parameters) ++- VP8 (bitstream - third tile) ++- TILE (fourth tile parameters) ++- VP8 (bitstream - fourth tile) ++- META (XMP metadata) +Example: An example layout of an animated image may look as follows: +RIFF/WEBP ++- VP8X (descriptions of features used) ++- LOOP (animation control parameters) ++- FRM (first animation frame parameters) ++- VP8 (bitstream - first image frame) ++- FRM (second animation frame parameters) ++- VP8 (bitstream - second image frame) ++- META(XMP metadata) + + Assembling the canvas from tiles and animation +Contents of the chunks will be described in details in subsequent section. Here, we provide an overview +how they are used to assemble the canvas. The notation VP8X.canvasWidth means the field in +the “VP8X” described as canvasWidth. +Decoding a non-animated canvas MUST be equivalent to the following pseudo-code: +● assert not VP8X.flags.haveAnimation +● canvas ← new black image of size VP8X.canvasWidth x VP8X.canvasHeight. +● tile_params.tileCanvasX = tile_params.tileCanvasY = 0 +● for chunk in data_for_all_frames: +○ if chunk.tag == “TILE”: +■ assert No other TILE chunk after the last “VP8 ” chunk +■ tile_params = chunk +○ if chunk.tag == “VP8 ”: +■ +render image in chunk in canvas with top-left corner in +(tile_params.tileCanvasX, tile_params.tileCanvasY) using the isometry in +VP8X.flags.rotationAndSymmetry. +■ tile_params.tileCanvasX = tile_params.tileCanvasY = 0 +○ Ignore unknown chunks +● canvas contains the decoded canvas. +Decoding an animated canvas MUST be equivalent to the following pseudo-code: +● assert VP8X.flags.haveAnimation +● canvas ← new black image of size VP8X.canvasWidth x VP8X.canvasHeight. +● if LOOP.loopCount==0: +○ LOOP.loopCount=∞ +● current_FRM ← nil +● for LOOP.loop = 0, …, LOOP.loopCount-1 +○ assert First chunk in data_for_all_frames is a FRM +○ for chunk in data_for_all_frames: +■ if chunk.tag == “FRM ”: +● if current_FRM != nil: +○ Show the contents of canvas for +current_FRM.frameDuration*10ms. +● current_FRM = chunk +■ if chunk.tag == “VP8 ”: +● assert tile_params.tileCanvasX >= current_FRM.frameX +● assert tile_params.tileCanvasY >= current_FRM.frameY +● assert tile_params.tileCanvasX + chunk.tileWidth >= +current_FRM.frameX + current_FRM.frameWidth +● assert tile_params.tileCanvasY + chunk.tileHeight >= +current_FRM.frameX + current_FRM.frameHeight +●render image in chunk in canvas with top-left corner in +(tile_params.tileCanvasX, tile_params.tileCanvasY) using the isometry +in VP8X.flags.rotationAndSymmetry. +● tile_params.tileCanvasX = tile_params.tileCanvasY = 0 +■ Ignore unknown chunks + + ● canvas contains the decoded canvas. +As described earlier, if an assert related to chunk ordering fails, the reader MAY ignore the badly-ordered +chunks instead of failing to decode the file. + +Bitstream chunk(s) (VP8) +These chunks contain compressed image data. Currently, the only allowed bitstream is VP8 and +uses “VP8 ” (note the space as the last character) as its tag. We will refer to all chunks with this tag +as bitstream chunks. As described earlier, images without special layout have a single bitstream chunk +as the first subchunk of RIFF, while images with special layout may contain several of them - one for +each tile. +The content of a “VP8 ” chunk MUST be one VP8 key frame (with optional padding – see below). +The current draft of a VP8 specification can be found at http://tools.ietf.org/html/draft-bankoski-vp8bitstream-04. Note that the VP8 frame header contains the VP8 frame width and height. It is assumed +to be the width and height of the tile. +The VP8 specification specifies how to decode the image into Y’CbCr format. To convert to RGB, +Rec. 601 SHOULD be used. +For compatibility with older readers, if the size of the frame is odd, writers SHOULD append a padding +byte (preferably 0) inside the chunk contents, making the chunk’s ckSize even. Newer readers MUST +support odd-sized tile chunks. + +VP8X chunk (special layout) +As described earlier, a chunk with tag “VP8X”, is the first chunk of images with special layout. It is +used to enable advanced features of WebP. +The content of the chunk is as follows: +● uint32 flags. The following bits are currently used (with 0 being the least significant bit): +○ bit 0: haveTile: set if the image is represented by Tiles. +○ bit 1: haveAnimation: set if the file is an animation. Data in “LOOP” and “FRM ” +chunks should be used to control the animation. +○ bit 2: haveIccp: set if the file contains a “ICCP” chunk with a color profile. If a file +contains an “ICCP” chunk but this bit is not set, the error is flagged while constructing +the Mux-Container. +○ bit 3: haveMetadat: set if the file contains a “META” chunk with a XMP metadata. +If a file contains an “META” chunk but this bit is not set, the error is flagged while +constructing the Mux-Container. +Future specification MAY define other bits in flags. Bits not defined by this specification +MUST be preserved when modifying the file. +● uint32 canvasWidth: width of the canvas in pixels (after the optional rotation or symmetry - see +below). +● uint32 canvasHeight: height of the canvas in pixels (after the optional rotation or symmetry see below). + + Future specifications MAY add more fields. If a chunk of larger size is found, programs MUST ignore +the extra bytes but MUST preserve them when modifying the file. + +LOOP chunk (global animation parameters) +For images that are animations, this chunk contains the global parameters of the animation. +This chunks MUST appear if the haveAnimation flag in chunk VP8X is set. If the haveAnimation flag +is not set and this chunk is present, it MUST be ignored. +The content of the chunk is as follows: +● uint16 loopCount For animations, the number of times to loop this animation. 0 means infinite. +Future specifications MAY add more fields. If a chunk of larger size is found, programs MUST ignore +the extra bytes but MUST preserve them when modifying the file. + +FRM chunk (per-frame animation parameters) +For images that are animations, these chunks contain the per-frame parameters of the animation. +The content of the chunk is as follows: +● uint32 frameX: x coordinate of the upper left corner of the frame. For images using the VP8 +codec, it MUST be divisible by 32. Other codecs MAY specify other constraints. Described in +more details later. +● uint32 frameY: y coordinate of the upper left corner of the frame. For images using the VP8 +codec, it MUST be divisible by 32. Other codecs MAY specify other constraints. Described in +more details later. +● uint32 frameWidth: width of the frame. For images using the VP8 codec, it MUST be divisible +by 16 or such that frameX+frameWidth==canvasWidth. Other codecs MAY specify other +constraints. Desribed in more details later. +● uint32 frameHeight: height. For images using the VP8 codec, it MUST be divisible by 16 or +such that frameY+frameHeight==canvasHeight. Other codecs MAY specify other constraints. +Desribed in more details later. +● uint16 frameDuration Time to wait before displaying the next tile, in 1ms unit. +. +Rationale: the requirement for corner coordinates to be divisible by 32 means that pixels on U and V +planes are aligned to 16 byte boundary (even after a rotation), what may help with vector instructions +on some architectures. Also, this makes the tiles also aligned to 16-pixel macroblock boundaries. +Rationale: the requirement for the width and height to be divisible by 16 or touching the edge of +the canvas simplifies the handling of macroblocks that are on the edge of a tile - VP8 decoders can +overwrite pixels outside the boundary in such a macroblock and this guarantees they won’t overwrite +any data. +Future specifications MAY add more fields. If a chunk of larger size is found, programs MUST ignore +the extra bytes but MUST preserve them when modifying the file. + +TILE chunks (tile parameters) + + This chunk contains information about a single tile and describes the bitstream chunk that proceeds it. +The content of such a chunk is as follows: + +● uint32 tileCanvasX: x coordinate of the upper left corner of the tile. For VP8 tiles, it MUST be +divisible by 32. Other codecs MAY specify other constraints. +● uint32 tileCanvasY: y coordinate of the upper left corner of the tile. For VP8 tiles, it MUST be +divisible by 32. Other codecs MAY specify other constraints. +Future specifications MAY add more fields. If a chunk of larger size is found, programs MUST ignore +the extra bytes but MUST preserve them when modifying the file. +As described earlier, the TILE chunk is followed by a VP8 data. From that chun, we can read the height +and width of the tile, that we will denote by tileWidth and tileHeight. In the case of VP8, we have the +following constraints: +● The width of a tile MUST be divisible by 16 or there MUST be tileCanvasX+tileWidth == +canvasWidth. +● The height of a tile MUST be divisible by 16 or there MUST be tileCanvasY+tileHeight == +canvasHeight. + +ICCP chunk (color profile) +An optional “ICCP” chunk contains an ICC profile. There SHOULD be at most one such chunk. +The first byte of the chunk is the compression type. Two values are currently defined: a value of +0 means no compression, while a value of 1 means deflate/inflate compression. It is followed by a +compressed or non-compressed ICC profile - see www.color.org for specifications. +The color profile can be a v2 or v4 profile. If this chunk is missing, sRGB SHOULD be assumed. + +META chunk (compressed XMP metadata) +Such a chunk (if present) contains XMP metadata. There SHOULD be at most one such chunk. If +there are more such chunks, readers SHOULD ignore all except the first one. The first byte specifies +compression type. Two values are currently defined: a value of 0 means no compression, while a +value of 1 means deflate/inflate compression. It is followed by a compressed or non-compressed XMP +metadata packet. + +XMP packets are XML text specified in http://www.adobe.com/content/dam/Adobe/en/devnet/xmp/ +pdfs/XMPSpecificationPart1.pdf. The chunk tag is different from the one specified by Adobe for WAV +and AVI (also RIFF formats) because we have the options of compression. +Additional guidance about handling metadata can be found at: http://www.metadataworkinggroup.org/ +pdf/mwg_guidance.pdf . Note that the sections of the document about reconciliation of EXIF, XMP +and IPTC-IIM don't apply to WebP, as WebP supports only XMP, thus no reconciliation is necessary. + + Other chunks +A file MAY contain other chunks, defined in some future specification. Such chunks MUST be +ignored, but preserved. Writers SHOULD try to preserve them in the original order. + + \ No newline at end of file