WebP container specification - Working Draft (V0.1 Date 09/26)

Terminology
Basics
Single-image WebP files
Chunks layout
Images without special layout
Images with special layout
Assembling the canvas from tiles and animation
Bitstream chunk(s) (VP8)
VP8X chunk (special layout)
LOOP chunk (global animation parameters)
FRM chunk (per-frame animation parameters)
TILE chunks (tile parameters)
ICCP chunk (color profile)
META chunk (compressed XMP metadata)
Other chunks

WebP is a still image format that uses VP8 key frame encoding (and possibly other codecs in the future) to compress image data in a lossy way. VP8 encoding is expected to make WebP more efficient than currently used formats. The format is optimized for fast image transfer over the network (e.g., for websites). It also aims for feature parity with other formats (color profiles, XMP metadata, animation, etc.). This document describes the structure of a WebP file.

The first version of WebP handled only the basic use case: a file containing a single image (one VP8 key frame) with no metadata. However, because it uses a RIFF container, the format can be extended. This document extends it by adding support for:

● Metadata and color profiles. We specify chunks that can carry this information, as other popular formats do.
● Tiling. A single VP8 frame has inherent limits: its width and height must be less than 2^14 (16384) pixels, and its first compressed partition is limited to 512 KiB. To support larger images, an image may be composed of multiple tiles, each encoded as a separate VP8 frame. All tiles logically form a single image: they share the same metadata, color profile, etc. Tiling may also improve compression efficiency for large images; for example, a region of grass can be encoded differently from a region of sky.
● Animation. An image may consist of several frames displayed one after another with pauses between them, making it an animation.

Files that do not use these new features remain backward compatible with the original format. Files that use them are not compatible with older programs.

Terminology

A WebP file contains either a still image (i.e. an encoded matrix of pixels) or an animation (see below), possibly together with a color profile, metadata, etc. When we need to refer only to the matrix of pixels, we will call it the canvas of the image.

The canvas of an image is built from one or more tiles. Each tile is a separately encoded VP8 key frame (other codecs may be possible in the future). Building an image from several tiles makes it possible to overcome the size limitations of a single VP8 frame. Tiling is meant to be an internal detail of the file: tiles are not supposed to be exposed to the user.

Basics

This section introduces basic terms used throughout the document.

Code reading WebP files will be referred to as readers, while code writing them will be referred to as writers.

A 16-bit, little-endian, unsigned integer will be denoted as uint16.

A 32-bit, little-endian, unsigned integer will be denoted as uint32.

The basic element of a RIFF file is a chunk. It consists of:

● 4 ASCII characters, called the chunk tag.
● A uint32 holding the size of the chunk content, denoted ckSize.
● ckSize bytes of content.
● If ckSize is odd, a single padding byte that SHOULD be 0.

A chunk with tag “ABCD” will also be called a chunk of type “ABCD”. Note that in this specification, chunk tag characters are listed in file order, not in the byte order of a uint32 on any particular architecture.

Note that the padding byte MUST also be added to the last chunk of the file.

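The chunk structure above can be sketched as a pair of Python helpers. This is an illustrative sketch only: write_chunk and read_chunk are hypothetical names, not part of this specification or of any library, and tags are handled as raw 4-byte strings in file order.

```python
import struct


def write_chunk(tag: bytes, content: bytes) -> bytes:
    """Serialize one chunk: tag, uint32 ckSize, content, then a pad byte if ckSize is odd."""
    if len(tag) != 4:
        raise ValueError("a chunk tag is exactly 4 bytes")
    padding = b"\x00" if len(content) % 2 else b""
    return tag + struct.pack("<I", len(content)) + content + padding


def read_chunk(data: bytes, offset: int = 0) -> tuple[bytes, bytes, int]:
    """Parse the chunk at `offset` and return (tag, content, next_offset).

    next_offset skips the padding byte that follows odd-sized content.
    """
    if offset + 8 > len(data):
        raise ValueError("truncated chunk header")
    tag = data[offset:offset + 4]
    (ck_size,) = struct.unpack_from("<I", data, offset + 4)  # little-endian uint32
    start = offset + 8
    end = start + ck_size
    if end > len(data):
        raise ValueError("chunk content runs past the end of the data")
    return tag, data[start:end], end + (ck_size & 1)


# Example: an odd-sized chunk gets one trailing 0 byte.
blob = write_chunk(b"ABCD", b"xyz")
print(blob)              # b'ABCD\x03\x00\x00\x00xyz\x00'
print(read_chunk(blob))  # (b'ABCD', b'xyz', 12)
```

Because tags are compared byte for byte, the “VP8 ” tag must be written as b"VP8 " with its trailing space.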
A list of chunks is a concatenation of multiple chunks. The first chunk in the list has position 0, the second has position 1, etc. The chunk with index 0 among “ABCD” is the first of the chunks of type “ABCD” in the list, the chunk with index 1 among “ABCD” is the second such chunk, etc.

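The difference between a chunk's position in the list and its index among chunks of one type can be shown with a short sketch. iter_chunks and nth_chunk are hypothetical helper names; the sketch assumes well-formed input and stops at the first incomplete chunk header.

```python
import struct
from typing import Iterator, Optional, Tuple


def iter_chunks(data: bytes) -> Iterator[Tuple[int, bytes, bytes]]:
    """Yield (position, tag, content) for each chunk of a concatenated chunk list."""
    offset = position = 0
    while offset + 8 <= len(data):
        tag = data[offset:offset + 4]
        (ck_size,) = struct.unpack_from("<I", data, offset + 4)
        yield position, tag, data[offset + 8:offset + 8 + ck_size]
        offset += 8 + ck_size + (ck_size & 1)  # header + content + optional pad byte
        position += 1


def nth_chunk(data: bytes, tag: bytes, index: int) -> Optional[bytes]:
    """Content of the chunk with the given index among chunks of type `tag`, or None."""
    seen = 0
    for _, chunk_tag, content in iter_chunks(data):
        if chunk_tag == tag:
            if seen == index:
                return content
            seen += 1
    return None


chunks = (b"AAAA\x01\x00\x00\x00x\x00"
          b"BBBB\x02\x00\x00\x00yy"
          b"AAAA\x01\x00\x00\x00z\x00")
print(nth_chunk(chunks, b"AAAA", 1))  # b'z': position 2, index 1 among "AAAA"
```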
A WebP file MUST begin with a single chunk with tag “RIFF”. All other defined chunks are contained within this chunk. The file SHOULD NOT contain anything after it.

The maximum RIFF ckSize is 2^32 - 10 bytes (i.e. the whole file is at most 4 GiB - 2 bytes).

Note: some RIFF libraries are reported to have bugs when handling files larger than 1 GiB or 2 GiB. If you are using an existing library, check that it handles large files correctly.

The first four bytes of the RIFF chunk content (i.e. bytes 8-11 of the file) MUST be the ASCII string “WEBP”. They are followed by a list of chunks. Because every chunk, including its padding byte, occupies an even number of bytes, the ckSize of the RIFF chunk is also even.

The content of the chunks in that list is described in the following sections.

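A reader might validate the header rules above as sketched here, assuming the whole file is in memory. check_riff_header is a hypothetical name. The sketch rejects truncated files but tolerates bytes after the RIFF chunk, which the rule above only discourages (SHOULD NOT).

```python
import struct

MAX_RIFF_CK_SIZE = 2**32 - 10  # limit stated above


def check_riff_header(data: bytes) -> int:
    """Validate the RIFF/WEBP header of a whole file; return the RIFF ckSize."""
    if len(data) < 12:
        raise ValueError("too short for a RIFF/WEBP header")
    if data[0:4] != b"RIFF":
        raise ValueError("file does not start with a RIFF chunk")
    (ck_size,) = struct.unpack_from("<I", data, 4)
    if ck_size < 4 or ck_size > MAX_RIFF_CK_SIZE or ck_size % 2:
        raise ValueError("invalid RIFF ckSize")
    if data[8:12] != b"WEBP":
        raise ValueError("missing WEBP signature at bytes 8-11")
    if 8 + ck_size > len(data):
        raise ValueError("file is truncated")
    return ck_size  # anything after offset 8 + ck_size is not part of the image
```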
Note: RIFF has a convention that all-uppercase chunk tags denote standard chunks that apply to any RIFF file format, while chunks specific to one file format use all-lowercase tags. WebP does not follow this convention.

Single-image WebP files

First, we describe a subset of WebP files: files containing only one image. Later, we will use it to define multi-image files, i.e. files containing several different images.

Chunks layout

This section describes which chunks may appear in a single-image WebP file, and in what order. The content of these chunks is described in subsequent sections.

The first chunk inside the RIFF chunk MUST have the tag “VP8 ” (note the space as the last character) or “VP8X”. Future specifications MAY introduce other tags for the first chunk if new codecs are added. The tag of the first chunk determines which of the two possible layouts is used.

Rationale: fixing the possible tags of the first chunk makes it possible to introduce other codecs while keeping the “WEBP” signature at the beginning of the RIFF chunk, and still lets a reader determine the codec used by the image by inspecting the byte stream at a fixed position.

The two possible layouts will be called images without special layout and images with special layout.

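Because the first chunk tag always sits at bytes 12-15 of the file, a reader can select the layout with a single comparison. A minimal sketch, assuming the RIFF/WEBP header has already been validated; the function name and the returned labels are hypothetical.

```python
def detect_layout(data: bytes) -> str:
    """Classify a WebP file by the tag of the first chunk inside RIFF."""
    first_tag = data[12:16]  # immediately after "RIFF", ckSize and "WEBP"
    if first_tag == b"VP8 ":
        return "simple"   # image without special layout
    if first_tag == b"VP8X":
        return "special"  # image with special layout
    raise ValueError(f"unsupported first chunk {first_tag!r}")
```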
Images without special layout

If the first subchunk of RIFF has the tag “VP8 ”, the file contains an image without special layout. This layout SHOULD be used if the image does not require advanced features: color profiles, XMP metadata, animation or tiling. Files with this layout are smaller and are supported by older software.

Such images consist of:

● A “VP8 ” chunk with the bitstream of the single tile.

Example: the layout of such a file looks as follows:

RIFF/WEBP
+- VP8 (bitstream of the single tile of the image)

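As an illustration, a writer could produce a file without special layout from an already-encoded VP8 key frame as sketched below. The function name is hypothetical. The padding of odd-sized frames is placed inside the chunk content, as recommended for bitstream chunks in the “Bitstream chunk(s) (VP8)” section.

```python
import struct


def build_simple_webp(vp8_frame: bytes) -> bytes:
    """Wrap one encoded VP8 key frame as RIFF/WEBP containing a single "VP8 " chunk."""
    # Odd-sized frames get a 0 byte inside the chunk content, so ckSize is even.
    content = vp8_frame + b"\x00" if len(vp8_frame) % 2 else vp8_frame
    vp8_chunk = b"VP8 " + struct.pack("<I", len(content)) + content
    riff_ck_size = 4 + len(vp8_chunk)  # "WEBP" signature plus the chunk
    return b"RIFF" + struct.pack("<I", riff_ck_size) + b"WEBP" + vp8_chunk
```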
Images with special layout

If the first subchunk of RIFF has the tag “VP8X” (other tags may be introduced by future specifications, if new codecs are added), the file contains an image with special layout.

Note: older readers do not support images with special layout and will fail to read them.

Such an image consists of:

● A “VP8X” chunk with information about the features used in this file.
● An optional “ICCP” chunk with a color profile.
● An optional “LOOP” chunk with animation control data.
● Data for all the frames.
● An optional “META” chunk with XMP metadata.
● Other chunks, which may be defined by future specifications and placed anywhere in the file.

As described in the “VP8X” chunk section, a flag distinguishes animated from non-animated images. A non-animated image has exactly one frame; an animated one may have multiple frames. The data for each frame consists of:

● An optional “FRM ” (note the space as the last character) chunk with animation frame metadata. It MUST be present in animated images, at the beginning of the data for that frame. It MUST NOT be present in non-animated images.
● An optional “TILE” chunk with tile position metadata. It MUST be present at the beginning of the data for each tile of an image that is composed of multiple tiles.
● A “VP8 ” chunk with the bitstream of the tile.

All chunks MUST be placed in the order listed above (except for unknown chunks, which MAY appear anywhere). If a chunk appears in the wrong place, the file is invalid, but readers MAY parse the file, ignoring the chunks that come too late.

Rationale: a fixed chunk order lets a reader stop searching early, e.g. for an “ICCP” chunk that is not present in the file. The rule of ignoring late chunks ensures that programs doing a full search give the same results as programs that stop early.

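The ordering rule, including the option of ignoring late chunks, can be checked with a rank table. The sketch below assumes the chunks have already been reduced to their tags; the table and function names are hypothetical. It only checks the order of the top-level groups (VP8X, ICCP, LOOP, frame data, META) and does not validate the order of “FRM ”, “TILE” and “VP8 ” chunks within a frame.

```python
# Rank of each known tag in the special layout; unknown tags have no rank.
CHUNK_RANK = {
    b"VP8X": 0,
    b"ICCP": 1,
    b"LOOP": 2,
    b"FRM ": 3, b"TILE": 3, b"VP8 ": 3,  # data for all the frames
    b"META": 4,
}


def filter_chunk_order(tags):
    """Split chunk tags into (accepted, ignored) according to the ordering rule.

    Unknown tags are always accepted, since they MAY appear anywhere. A known
    chunk whose group comes before a group already seen arrived too late and
    is ignored, which readers MAY do instead of rejecting the file.
    """
    accepted, ignored = [], []
    highest = -1
    for tag in tags:
        rank = CHUNK_RANK.get(tag)
        if rank is None:
            accepted.append(tag)
        elif rank < highest:
            ignored.append(tag)
        else:
            highest = rank
            accepted.append(tag)
    return accepted, ignored
```

For example, an “ICCP” chunk found after “LOOP” lands in the ignored list, so a reader scanning the whole file reaches the same conclusion as one that stopped looking for “ICCP” once it saw “LOOP”.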
Example: the layout of a non-animated, tiled image may look as follows:

RIFF/WEBP
+- VP8X (descriptions of features used)
+- ICCP (color profile)
+- TILE (first tile parameters)
+- VP8 (bitstream - first tile)
+- TILE (second tile parameters)
+- VP8 (bitstream - second tile)
+- TILE (third tile parameters)
+- VP8 (bitstream - third tile)
+- TILE (fourth tile parameters)
+- VP8 (bitstream - fourth tile)
+- META (XMP metadata)

Example: the layout of an animated image may look as follows:

RIFF/WEBP
+- VP8X (descriptions of features used)
+- LOOP (animation control parameters)
+- FRM (first animation frame parameters)
+- VP8 (bitstream - first image frame)
+- FRM (second animation frame parameters)
+- VP8 (bitstream - second image frame)
+- META (XMP metadata)

Assembling the canvas from tiles and animation

The contents of the chunks are described in detail in subsequent sections. Here, we provide an overview of how they are used to assemble the canvas. The notation VP8X.canvasWidth means the field canvasWidth of the “VP8X” chunk.

Decoding a non-animated canvas MUST be equivalent to the following pseudo-code:

● assert not VP8X.flags.haveAnimation
● canvas ← new black image of size VP8X.canvasWidth x VP8X.canvasHeight.
● tile_params.tileCanvasX = tile_params.tileCanvasY = 0
● for chunk in data_for_all_frames:
  ○ if chunk.tag == “TILE”:
    ■ assert no other “TILE” chunk has appeared since the last “VP8 ” chunk
    ■ tile_params = chunk
  ○ if chunk.tag == “VP8 ”:
    ■ render the image in chunk into canvas with its top-left corner at (tile_params.tileCanvasX, tile_params.tileCanvasY), using the isometry in VP8X.flags.rotationAndSymmetry.
    ■ tile_params.tileCanvasX = tile_params.tileCanvasY = 0
  ○ Ignore unknown chunks
● canvas contains the decoded canvas.

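A Python sketch of the same loop, for illustration only. It assumes the tiles are already decoded into lists of pixel rows (VP8 decoding is out of scope), represents chunks as hypothetical (tag, payload) pairs, and handles only the identity isometry, since rotationAndSymmetry is not further specified in this document.

```python
def assemble_canvas(canvas_width, canvas_height, frame_data):
    """Composite decoded tiles onto a black canvas (non-animated case).

    frame_data: sequence of (tag, payload) pairs standing in for chunks.
      "TILE" -> payload is (tileCanvasX, tileCanvasY)
      "VP8 " -> payload is an already-decoded tile: a list of pixel rows
    Any other tag is ignored, like an unknown chunk.
    """
    canvas = [[0] * canvas_width for _ in range(canvas_height)]  # 0 = black
    tile_x = tile_y = 0
    tile_pending = False  # a TILE chunk was seen since the last "VP8 " chunk
    for tag, payload in frame_data:
        if tag == "TILE":
            if tile_pending:
                raise ValueError("two TILE chunks without a VP8 chunk in between")
            tile_x, tile_y = payload
            tile_pending = True
        elif tag == "VP8 ":
            fits = tile_y + len(payload) <= canvas_height and all(
                tile_x + len(row) <= canvas_width for row in payload)
            if not fits:
                raise ValueError("tile does not fit inside the canvas")
            for dy, row in enumerate(payload):
                canvas[tile_y + dy][tile_x:tile_x + len(row)] = row  # identity isometry only
            tile_x = tile_y = 0  # reset, as in the pseudo-code
            tile_pending = False
    return canvas
```

A “VP8 ” chunk with no preceding “TILE” chunk is drawn at (0, 0), which is why the position is reset after every tile.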
Decoding an animated canvas MUST be equivalent to the following pseudo-code:

● assert VP8X.flags.haveAnimation
● canvas ← new black image of size VP8X.canvasWidth x VP8X.canvasHeight.
● if LOOP.loopCount == 0:
  ○ LOOP.loopCount = ∞
● current_FRM ← nil
● tile_params.tileCanvasX = tile_params.tileCanvasY = 0
● for LOOP.loop = 0, …, LOOP.loopCount - 1:
  ○ assert the first chunk in data_for_all_frames is a “FRM ” chunk
  ○ for chunk in data_for_all_frames:
    ■ if chunk.tag == “FRM ”:
      ● if current_FRM != nil:
        ○ show the contents of canvas for current_FRM.frameDuration milliseconds.
      ● current_FRM = chunk
    ■ if chunk.tag == “TILE”:
      ● tile_params = chunk
    ■ if chunk.tag == “VP8 ”:
      ● assert tile_params.tileCanvasX >= current_FRM.frameX
      ● assert tile_params.tileCanvasY >= current_FRM.frameY
      ● assert tile_params.tileCanvasX + chunk.tileWidth >= current_FRM.frameX + current_FRM.frameWidth
      ● assert tile_params.tileCanvasY + chunk.tileHeight >= current_FRM.frameY + current_FRM.frameHeight
      ● render the image in chunk into canvas with its top-left corner at (tile_params.tileCanvasX, tile_params.tileCanvasY), using the isometry in VP8X.flags.rotationAndSymmetry.
      ● tile_params.tileCanvasX = tile_params.tileCanvasY = 0
    ■ Ignore unknown chunks
● canvas contains the decoded canvas.

As described earlier, if an assert related to chunk ordering fails, the reader MAY ignore the badly-ordered chunks instead of failing to decode the file.

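The loop structure of the animated case determines the order in which frames are shown. The generator below sketches only that schedule. The name playback_schedule is hypothetical, and durations are treated as milliseconds, following the definition of frameDuration in the FRM chunk section.

```python
import itertools
from typing import Iterator, Sequence, Tuple


def playback_schedule(frame_durations_ms: Sequence[int],
                      loop_count: int) -> Iterator[Tuple[int, int, int]]:
    """Yield (loop, frame_index, duration_ms) in display order.

    loop_count == 0 means the animation repeats forever (LOOP.loopCount rule).
    """
    if not frame_durations_ms:
        return  # nothing to show; avoids spinning forever when loop_count == 0
    loops = itertools.count() if loop_count == 0 else range(loop_count)
    for loop in loops:
        for index, duration in enumerate(frame_durations_ms):
            yield loop, index, duration
```

With loopCount 0 the generator never ends, so a player would consume it lazily, e.g. with itertools.islice.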
Bitstream chunk(s) (VP8)

These chunks contain compressed image data. Currently, the only allowed bitstream is VP8, which uses “VP8 ” (note the space as the last character) as its tag. We will refer to all chunks with this tag as bitstream chunks. As described earlier, images without special layout have a single bitstream chunk as the first subchunk of RIFF, while images with special layout may contain several of them, one for each tile.

The content of a “VP8 ” chunk MUST be one VP8 key frame (with optional padding, see below).

The current draft of the VP8 specification can be found at http://tools.ietf.org/html/draft-bankoski-vp8-bitstream-04. Note that the VP8 frame header contains the VP8 frame width and height. These are taken to be the width and height of the tile.

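Readers obtain the tile width and height from the VP8 key frame header. The sketch below follows the key frame header layout of the VP8 bitstream specification (published as RFC 6386): a 3-byte frame tag, the start code 9d 01 2a, then two little-endian 16-bit fields whose low 14 bits are the width and height. The function name is hypothetical, and the upper 2 bits of each field (scaling) are ignored.

```python
import struct

VP8_START_CODE = b"\x9d\x01\x2a"


def vp8_frame_size(frame: bytes) -> tuple[int, int]:
    """Return (tileWidth, tileHeight) from the header of a VP8 key frame."""
    if len(frame) < 10:
        raise ValueError("too short for a VP8 key frame header")
    frame_tag = frame[0] | (frame[1] << 8) | (frame[2] << 16)  # 24-bit little-endian
    if frame_tag & 1:  # bit 0 is 0 for key frames
        raise ValueError("not a VP8 key frame")
    if frame[3:6] != VP8_START_CODE:
        raise ValueError("missing VP8 start code")
    horizontal, vertical = struct.unpack_from("<HH", frame, 6)
    return horizontal & 0x3FFF, vertical & 0x3FFF  # low 14 bits; top 2 bits are scaling
```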
The VP8 specification describes how to decode the image into Y’CbCr format. To convert to RGB, the Rec. 601 conversion SHOULD be used.

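A per-pixel sketch of a Rec. 601 conversion. It assumes 8-bit limited-range input (Y’ in [16, 235], Cb and Cr in [16, 240]), which this document does not state explicitly. The coefficients are derived from the Rec. 601 luma weights Kr = 0.299 and Kb = 0.114 rather than hard-coded, and the function names are hypothetical.

```python
KR, KB = 0.299, 0.114  # Rec. 601 luma weights
KG = 1.0 - KR - KB


def _to_byte(v: float) -> int:
    """Round and clamp to the 0..255 range."""
    return max(0, min(255, round(v)))


def ycbcr_to_rgb(y: int, cb: int, cr: int) -> tuple[int, int, int]:
    """Convert one limited-range 8-bit Rec. 601 Y'CbCr sample to full-range RGB."""
    luma = (y - 16) * 255 / 219   # [16, 235] -> [0, 255]
    pb = (cb - 128) * 255 / 224   # [16, 240] -> [-127.5, 127.5]
    pr = (cr - 128) * 255 / 224
    r = luma + 2 * (1 - KR) * pr
    g = luma - (2 * KB * (1 - KB) / KG) * pb - (2 * KR * (1 - KR) / KG) * pr
    b = luma + 2 * (1 - KB) * pb
    return _to_byte(r), _to_byte(g), _to_byte(b)
```

The derived factors are the familiar 1.402 (Cr to R), 1.772 (Cb to B), 0.344 and 0.714 (Cb and Cr to G), scaled by 255/224 for limited-range chroma.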
For compatibility with older readers, if the size of the frame is odd, writers SHOULD append a padding byte (preferably 0) inside the chunk content, making the chunk’s ckSize even. Newer readers MUST support bitstream chunks with an odd ckSize.

VP8X chunk (special layout)

As described earlier, a chunk with tag “VP8X” is the first chunk of images with special layout. It is used to enable the advanced features of WebP.

The content of the chunk is as follows:

● uint32 flags. The following bits are currently used (bit 0 being the least significant bit):
  ○ bit 0: haveTile: set if the image is composed of tiles.
  ○ bit 1: haveAnimation: set if the file is an animation. Data in the “LOOP” and “FRM ” chunks should be used to control the animation.
  ○ bit 2: haveIccp: set if the file contains an “ICCP” chunk with a color profile. If a file contains an “ICCP” chunk but this bit is not set, an error is flagged while constructing the Mux-Container.
  ○ bit 3: haveMetadata: set if the file contains a “META” chunk with XMP metadata. If a file contains a “META” chunk but this bit is not set, an error is flagged while constructing the Mux-Container.
  Future specifications MAY define other bits in flags. Bits not defined by this specification MUST be preserved when modifying the file.
● uint32 canvasWidth: width of the canvas in pixels (after the optional rotation or symmetry - see below).
● uint32 canvasHeight: height of the canvas in pixels (after the optional rotation or symmetry - see below).

Future specifications MAY add more fields. If a chunk of larger size is found, programs MUST ignore the extra bytes but MUST preserve them when modifying the file.

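A sketch of reading and rewriting a “VP8X” chunk so that undefined flag bits and unknown trailing fields survive a round trip, as required above. The constant and function names are hypothetical.

```python
import struct

HAVE_TILE = 1 << 0
HAVE_ANIMATION = 1 << 1
HAVE_ICCP = 1 << 2
HAVE_METADATA = 1 << 3


def parse_vp8x(content: bytes) -> dict:
    """Split VP8X content into its known fields plus any unknown trailing bytes."""
    if len(content) < 12:
        raise ValueError("VP8X chunk too short")
    flags, width, height = struct.unpack_from("<III", content)
    return {
        "flags": flags,         # all 32 bits, including ones not defined yet
        "canvasWidth": width,
        "canvasHeight": height,
        "extra": content[12:],  # fields added by future specifications
    }


def build_vp8x(info: dict) -> bytes:
    """Serialize the fields back, re-appending the preserved extra bytes."""
    return struct.pack("<III", info["flags"], info["canvasWidth"],
                       info["canvasHeight"]) + info["extra"]


# Example: set the metadata bit without disturbing an unknown high bit
# or the two unknown trailing bytes.
original = struct.pack("<III", HAVE_ANIMATION | (1 << 31), 640, 480) + b"\x01\x02"
info = parse_vp8x(original)
info["flags"] |= HAVE_METADATA
updated = build_vp8x(info)
```

Setting or clearing a feature with a bitwise OR or AND on the stored flags, rather than rebuilding the word from the known bits, is what keeps bits defined by future specifications intact.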
LOOP chunk (global animation parameters)

For images that are animations, this chunk contains the global parameters of the animation.

This chunk MUST appear if the haveAnimation flag in the “VP8X” chunk is set. If the haveAnimation flag is not set and this chunk is present, it MUST be ignored.

The content of the chunk is as follows:

● uint16 loopCount: the number of times to loop the animation. 0 means infinite.

Future specifications MAY add more fields. If a chunk of larger size is found, programs MUST ignore the extra bytes but MUST preserve them when modifying the file.

FRM chunk (per-frame animation parameters)

For images that are animations, these chunks contain the per-frame parameters of the animation.

The content of the chunk is as follows:

● uint32 frameX: x coordinate of the upper left corner of the frame. For images using the VP8 codec, it MUST be divisible by 32. Other codecs MAY specify other constraints. See the rationale below.
● uint32 frameY: y coordinate of the upper left corner of the frame. For images using the VP8 codec, it MUST be divisible by 32. Other codecs MAY specify other constraints. See the rationale below.
● uint32 frameWidth: width of the frame. For images using the VP8 codec, it MUST be divisible by 16, or be such that frameX + frameWidth == canvasWidth. Other codecs MAY specify other constraints. See the rationale below.
● uint32 frameHeight: height of the frame. For images using the VP8 codec, it MUST be divisible by 16, or be such that frameY + frameHeight == canvasHeight. Other codecs MAY specify other constraints. See the rationale below.
● uint16 frameDuration: time to wait before displaying the next frame, in units of 1 ms.

Rationale: requiring the corner coordinates to be divisible by 32 means that pixels on the U and V planes are aligned to a 16-byte boundary (even after a rotation), which may help with vector instructions on some architectures. It also aligns the tiles to 16-pixel macroblock boundaries.

Rationale: requiring the width and height to be divisible by 16, or the frame to touch the edge of the canvas, simplifies handling of macroblocks on the edge of a tile: VP8 decoders may overwrite pixels outside the boundary within such a macroblock, and this rule guarantees that they do not overwrite any other data.

Future specifications MAY add more fields. If a chunk of larger size is found, programs MUST ignore the extra bytes but MUST preserve them when modifying the file.

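A sketch of parsing an “FRM ” chunk and checking the VP8 alignment rules above. The names are hypothetical. The same checks apply to tiles (see the TILE chunk section), with tileCanvasX, tileCanvasY and the tile size in place of the frame fields.

```python
import struct

FRM_FORMAT = "<IIIIH"                   # frameX, frameY, frameWidth, frameHeight, frameDuration
FRM_SIZE = struct.calcsize(FRM_FORMAT)  # 18 bytes


def parse_frm(content: bytes) -> dict:
    """Decode the known FRM fields, keeping any trailing bytes for later rewriting."""
    if len(content) < FRM_SIZE:
        raise ValueError("FRM chunk too short")
    x, y, width, height, duration = struct.unpack_from(FRM_FORMAT, content)
    return {"frameX": x, "frameY": y, "frameWidth": width,
            "frameHeight": height, "frameDuration": duration,
            "extra": content[FRM_SIZE:]}


def vp8_region_errors(x, y, width, height, canvas_width, canvas_height):
    """List violations of the VP8 alignment rules for a frame (or tile)."""
    errors = []
    if x % 32:
        errors.append("x is not divisible by 32")
    if y % 32:
        errors.append("y is not divisible by 32")
    if width % 16 and x + width != canvas_width:
        errors.append("width is not divisible by 16 and does not reach the right edge")
    if height % 16 and y + height != canvas_height:
        errors.append("height is not divisible by 16 and does not reach the bottom edge")
    return errors
```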
TILE chunks (tile parameters)

This chunk contains information about a single tile and describes the bitstream chunk that follows it.

The content of such a chunk is as follows:

● uint32 tileCanvasX: x coordinate of the upper left corner of the tile. For VP8 tiles, it MUST be divisible by 32. Other codecs MAY specify other constraints.
● uint32 tileCanvasY: y coordinate of the upper left corner of the tile. For VP8 tiles, it MUST be divisible by 32. Other codecs MAY specify other constraints.

Future specifications MAY add more fields. If a chunk of larger size is found, programs MUST ignore the extra bytes but MUST preserve them when modifying the file.

As described earlier, the TILE chunk is followed by a “VP8 ” chunk. From that chunk, we can read the width and height of the tile, which we denote by tileWidth and tileHeight. In the case of VP8, the following constraints apply:

● The width of a tile MUST be divisible by 16, or tileCanvasX + tileWidth MUST equal canvasWidth.
● The height of a tile MUST be divisible by 16, or tileCanvasY + tileHeight MUST equal canvasHeight.

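One way a writer might split a large canvas into tiles that satisfy these constraints, assumed here rather than prescribed by this document: a regular grid whose step is a multiple of 32 and below 2^14. Every tile position is then a multiple of 32, and every tile width or height is either the step (hence divisible by 16) or reaches the canvas edge. tile_grid and its default step of 4096 are hypothetical; a real encoder might also choose smaller tiles so that each frame's first partition stays within the 512 KiB limit.

```python
def tile_grid(canvas_width, canvas_height, step=4096):
    """Return (tileCanvasX, tileCanvasY, tileWidth, tileHeight) for a regular grid.

    step must be a multiple of 32 below 2^14, so positions are divisible by 32
    and every tile fits in a single VP8 frame.
    """
    if step <= 0 or step % 32 or step >= 2**14:
        raise ValueError("step must be a positive multiple of 32 below 2^14")
    tiles = []
    for y in range(0, canvas_height, step):
        for x in range(0, canvas_width, step):
            width = min(step, canvas_width - x)    # step (divisible by 16) or reaches the right edge
            height = min(step, canvas_height - y)  # step or reaches the bottom edge
            tiles.append((x, y, width, height))
    return tiles
```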
ICCP chunk (color profile)

An optional “ICCP” chunk contains an ICC profile. There SHOULD be at most one such chunk.

The first byte of the chunk is the compression type. Two values are currently defined: 0 means no compression, while 1 means deflate/inflate compression. It is followed by a compressed or uncompressed ICC profile; see www.color.org for the specifications.

The color profile can be a v2 or v4 profile. If this chunk is missing, sRGB SHOULD be assumed.

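A sketch of decoding the “ICCP” payload; the “META” chunk uses the same one-byte compression scheme. The specification says only “deflate/inflate”, so whether the stream carries a zlib wrapper (RFC 1950) or is raw deflate (RFC 1951) is an assumption: this sketch assumes a zlib wrapper, and zlib.decompress(body, -15) would handle raw deflate instead. decode_payload is a hypothetical name.

```python
import zlib


def decode_payload(content: bytes) -> bytes:
    """Return the ICC profile (or XMP packet) stored after the compression-type byte."""
    if not content:
        raise ValueError("empty chunk")
    compression, body = content[0], content[1:]
    if compression == 0:  # no compression
        return body
    if compression == 1:  # deflate; assumed here to carry a zlib header
        return zlib.decompress(body)
    raise ValueError(f"unknown compression type {compression}")
```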
META chunk (compressed XMP metadata)

This chunk, if present, contains XMP metadata. There SHOULD be at most one such chunk. If there are more, readers SHOULD ignore all except the first one. The first byte specifies the compression type. Two values are currently defined: 0 means no compression, while 1 means deflate/inflate compression. It is followed by a compressed or uncompressed XMP metadata packet.

XMP packets are XML text, specified in http://www.adobe.com/content/dam/Adobe/en/devnet/xmp/pdfs/XMPSpecificationPart1.pdf. The chunk tag differs from the one specified by Adobe for WAV and AVI (also RIFF formats) because this chunk adds the compression option.

Additional guidance about handling metadata can be found at http://www.metadataworkinggroup.org/pdf/mwg_guidance.pdf. Note that the sections of that document about reconciling EXIF, XMP and IPTC-IIM do not apply to WebP: WebP supports only XMP, so no reconciliation is necessary.

Other chunks

A file MAY contain other chunks, defined in some future specification. Readers MUST ignore such chunks, but programs modifying the file MUST preserve them. Writers SHOULD try to preserve them in their original order.

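A sketch of how a program modifying a file might update one chunk, for example the metadata, while keeping every other chunk, known or unknown, in its original position. It assumes the chunks inside RIFF have already been parsed into (tag, content) pairs; replace_chunk is a hypothetical helper.

```python
def replace_chunk(chunks, tag, new_content):
    """Return a copy of `chunks` with the first chunk of type `tag` replaced.

    Every other chunk keeps its content and its position. If no chunk of
    that type exists, the new one is appended at the end.
    """
    result, replaced = [], False
    for chunk_tag, content in chunks:
        if chunk_tag == tag and not replaced:
            result.append((chunk_tag, new_content))
            replaced = True
        else:
            result.append((chunk_tag, content))
    if not replaced:
        result.append((tag, new_content))
    return result
```

Appending at the end suits “META”, the last chunk in the special layout; other chunk types would need a position chosen by the ordering rules. A writer that adds a “META” chunk also has to set bit 3 of the VP8X flags.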
|