mirror of
https://github.com/webmproject/libwebp.git
synced 2025-01-24 05:33:56 +01:00
Initial import of container spec document, from pdftotext transform.
new file: doc/webp-container-spec.txt Change-Id: I60b97d6f0219f0041c92b6d980cd8ebae8ae4ca5
This commit is contained in:
parent
2b877cd0c8
commit
55be2cf878
328
doc/webp-container-spec.txt
Normal file
@ -0,0 +1,328 @@
|
||||
WebP container specification - Working Draft (V0.1 Date 09/26)
|
||||
Terminology
|
||||
Basics
|
||||
Single-image WebP files
|
||||
Chunks layout
|
||||
Images without special layout
|
||||
Images with special layout
|
||||
Assembling the canvas from tiles and animation
|
||||
Bitstream chunk(s) (VP8)
|
||||
VP8X chunk (special layout)
|
||||
LOOP chunk (global animation parameters)
|
||||
FRM chunk (per-frame animation parameters)
|
||||
TILE chunks (tile parameters)
|
||||
ICCP chunk (color profile)
|
||||
META chunk (compressed XMP metadata)
|
||||
Other chunks
|
||||
|
||||
WebP is a still image format that uses the VP8 key frame encoding (and, possibly, other codecs in the
future) to compress image data in a lossy way. The VP8 encoding should make it more efficient than
currently used formats. It is optimized for fast image transfer over the network (e.g., for WWW sites).
However, it also aims for feature parity with other formats (color profiles, XMP metadata, animation,
etc.). This document describes the structure of such a file.
|
||||
The first version of WebP handled only the basic use case: a file having a single image (being one
VP8 key frame) with no metadata. However, the use of a RIFF container allows the format to be extended.
This document extends it by additionally introducing support for:
● Metadata and color profiles. We specify chunks that can contain this information, as other
popular formats do.
● Tiling. A single VP8 frame has an inherent limit of 2^14 pixels on its width and height, and
a 512kB limit on the size of the first compressed partition. To support larger images, we support
images that are composed of multiple tiles, each encoded as a separate VP8 frame. All tiles
logically form a single image - they have common metadata, color profile, etc. Tiling may also
improve efficiency for larger images - grass can be encoded differently than sky.
● Animation. An image may have pauses between frames, making it an animation.
|
||||
Files not using these new features are backward compatible with the original format. Using these
|
||||
features will produce files that are not compatible with older programs.
|
||||
|
||||
Terminology
|
||||
A WebP file contains either a still image (i.e. an encoded matrix of pixels) or an animation (see below)
with, possibly, a color profile, metadata etc. When we need to refer only to the matrix of pixels, we
will call it the canvas of the image.

The canvas of an image is built from one or multiple tiles. Each tile is a separately encoded VP8 key
frame (other codecs are possible in the future). Building an image from several tiles makes it possible
to overcome the size limitations of a single VP8 frame. Tiling is supposed to be an internal detail of
the file - tiles are not supposed to be exposed to the user.
|
||||
|
||||
Basics
|
||||
This section introduces basic terms used throughout the document.
|
||||
Code reading WebP files will be referred to as readers, while code writing them will be referred to as
writers.
|
||||
A 16-bit, little-endian, unsigned integer will be denoted as uint16.
|
||||
A 32-bit, little-endian, unsigned integer will be denoted as uint32.
|
||||
The basic element of a RIFF file is a chunk. It consists of:
|
||||
● 4 ASCII characters that will be called the chunk tag.
|
||||
● uint32 with the size of the chunk content (that will be denoted as ckSize).
|
||||
● ckSize bytes of content.
|
||||
● If ckSize is odd, a single padding byte that SHOULD be 0.
|
||||
A chunk with a tag “ABCD” will also be called a chunk of type “ABCD”. Note that, in this
specification, all chunk tag characters are in file order, not in the byte order of a uint32 of any
particular architecture.

Note that the padding MUST also be added to the last chunk of the file.
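The chunk structure above can be illustrated with a minimal Python sketch. The function name read_chunk and its return shape are assumptions made for this example, not part of the specification. It reads one chunk from an in-memory buffer and skips the padding byte that follows odd-sized content.

```python
import struct

def read_chunk(buf, offset):
    """Return (tag, content, next_offset) for the chunk starting at offset.

    Hypothetical helper for illustration only.
    """
    if offset + 8 > len(buf):
        raise ValueError("truncated chunk header")
    # The tag is 4 ASCII characters in file order; ckSize is a little-endian uint32.
    tag = buf[offset:offset + 4].decode("ascii")
    (ck_size,) = struct.unpack_from("<I", buf, offset + 4)
    start = offset + 8
    end = start + ck_size
    if end > len(buf):
        raise ValueError("truncated chunk content")
    # Odd-sized content is followed by one padding byte that is not part of it.
    next_offset = end + (ck_size & 1)
    return tag, buf[start:end], next_offset

# A chunk of type "ABCD" with 3 bytes of content and one padding byte.
example = b"ABCD" + struct.pack("<I", 3) + b"xyz" + b"\x00"
tag, content, next_offset = read_chunk(example, 0)
print(tag, content, next_offset)
```

Calling read_chunk repeatedly with the returned next_offset walks a list of chunks in file order.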
|
||||
A list of chunks is a concatenation of multiple chunks. The first chunk in the list has
position 0, the second has position 1, etc. By the chunk with index 0 among “ABCD” we mean the first
chunk among the chunks of type “ABCD” in the list; the chunk with index 1 among “ABCD” is the
second such chunk, etc.
|
||||
A WebP file MUST begin with a single chunk with a tag “RIFF”. All other defined chunks are
|
||||
within this chunk. It SHOULD NOT contain anything after it.
|
||||
The maximum size of RIFF's ckSize is 2^32 – 10 bytes (i.e. the size of the whole file is at most 4GiB
|
||||
– 2 bytes).
|
||||
Note: some RIFF libraries are said to have bugs when handling files larger than 1GiB or 2GiB. If
|
||||
you are using an existing library, check that it handles large files correctly.
|
||||
The first four bytes of the RIFF chunk contents (i.e. bytes 8-11 of the file) MUST be the ASCII
|
||||
string “WEBP”. They are followed by a list of chunks. Note that as the size of any chunk is even, the
|
||||
size of the RIFF chunk is also even.
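As a rough sketch of the checks described above, a reader could validate the first 12 bytes of a file as follows. The function name check_webp_header is hypothetical, and rejecting an oversized ckSize with an exception is a choice made for this example.

```python
import struct

MAX_RIFF_CK_SIZE = 2**32 - 10  # limit stated in this section

def check_webp_header(buf):
    """Return the RIFF ckSize if buf starts with a valid WebP header.

    Hypothetical helper for illustration only.
    """
    if len(buf) < 12:
        raise ValueError("file too short for a RIFF/WEBP header")
    if buf[0:4] != b"RIFF":
        raise ValueError("first chunk tag is not RIFF")
    (riff_size,) = struct.unpack_from("<I", buf, 4)
    if riff_size > MAX_RIFF_CK_SIZE:
        raise ValueError("RIFF ckSize exceeds the allowed maximum")
    # Bytes 8-11 of the file are the first four bytes of the RIFF content.
    if buf[8:12] != b"WEBP":
        raise ValueError("missing WEBP signature")
    return riff_size

header = b"RIFF" + struct.pack("<I", 4) + b"WEBP"
print(check_webp_header(header))
```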
|
||||
The content of the chunks in that list will be described in the following sections.
|
||||
Note: RIFF has a convention that all-uppercase chunks are standard chunks that apply to any
|
||||
RIFF file format, while chunks specific for a file format are all-lowercase. WebP doesn’t follow this
|
||||
|
||||
convention.
|
||||
|
||||
Single-image WebP files
|
||||
First, we will describe a subset of WebP files – files containing only one image (later, we will use it
to define multi-image files - files having several different images).
|
||||
|
||||
Chunks layout
|
||||
This section describes which chunks may appear in a single-image WebP file, and in what order. The
content of these chunks will be described in subsequent sections.

The first chunk inside the RIFF chunk MUST have the tag “VP8 ” (note the space as the last
character) or “VP8X”. Other tags for the first chunk MAY be introduced by future specifications if we
add new codecs. The tag of the first chunk determines which of the two possible layouts is used.
|
||||
Rationale: we fix the possible tags of the first chunk so that it is possible to introduce other codecs, to
|
||||
keep the “WEBP” signature at the beginning of RIFF chunk, while still being able to check the codec
|
||||
used by the image by inspecting the byte stream at a fixed position.
|
||||
The two possible layouts will be called images without special layout and images with special layout.
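Because the tag of the first chunk sits at a fixed position (bytes 12-15 of the file, right after the “WEBP” signature), the layout can be detected without parsing anything else. The following is a minimal sketch; the function name detect_layout and the returned strings are assumptions made for illustration.

```python
def detect_layout(buf):
    """Classify a WebP file by the tag of the first chunk inside RIFF.

    Hypothetical helper; assumes the RIFF/WEBP header was already validated.
    """
    first_tag = bytes(buf[12:16])
    if first_tag == b"VP8 ":  # note the trailing space
        return "without special layout"
    if first_tag == b"VP8X":
        return "with special layout"
    raise ValueError("unsupported first chunk tag: %r" % first_tag)

simple = b"RIFF" + b"\x00" * 4 + b"WEBP" + b"VP8 "
extended = b"RIFF" + b"\x00" * 4 + b"WEBP" + b"VP8X"
print(detect_layout(simple))
print(detect_layout(extended))
```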
|
||||
Images without special layout
|
||||
|
||||
If the first subchunk of RIFF has the tag “VP8 ”, the file contains an image without special layout.
|
||||
This layout SHOULD be used if the image doesn’t require advanced features: color profiles, XMP
|
||||
metadata, animation or tiling. Files with this layout are smaller and supported by older software.
|
||||
Such images consist of:
|
||||
● A “VP8 ” chunk with the bitstream of the single tile.
|
||||
Example: An example layout of such a file looks as follows:
|
||||
RIFF/WEBP
|
||||
+- VP8 (bitstream of the single tile of the image)
|
||||
Images with special layout
|
||||
|
||||
If the first subchunk of RIFF has the tag “VP8X” (other tags may be introduced by future
|
||||
specifications, if new codecs are added), the file contains an image with special layout.
|
||||
Note: older readers do not support images with special layout and will fail on such images.
Such an image consists of:
|
||||
● A “VP8X” chunk with information about features used in this file.
|
||||
● An optional “ICCP” chunk with color profile.
|
||||
● An optional “LOOP” chunk with animation control data.
|
||||
|
||||
● Data for all the frames.
|
||||
● An optional “META” chunk with XMP metadata.
|
||||
● Some other chunks may be defined by future specifications and placed anywhere in the file.
|
||||
As will be described in the “VP8X” chunk description, by checking a flag one can distinguish animated
|
||||
and non-animated images. A non-animated image has exactly one frame. An animated one may have
|
||||
multiple frames. Data for each frame consists of:
|
||||
● An optional “FRM ” (note the space as the last character) chunk with animation frame
|
||||
metadata. It MUST be present in animated images at the beginning of data for that frame. It
|
||||
MUST NOT be present in non-animated images.
|
||||
● An optional “TILE” chunk with tile position metadata. It MUST be present at the beginning of
the data of an image that is represented as multiple tiles.
|
||||
● A “VP8 ” chunk with the bitstream of the tile.
|
||||
All chunks MUST be placed in the same order as listed above (except for unknown chunks, that MAY
|
||||
appear anywhere). If a chunk appears in a wrong place, the file is invalid, but readers MAY parse the
|
||||
file ignoring the chunks that come too late.
|
||||
Rationale: fixing the order of chunks should make it possible to quickly stop the search for, e.g., the
ICCP chunk if it is not present in the file. The rule of ignoring late chunks should make programs that
need to do a full search give the same results as the ones stopping early.
|
||||
|
||||
Example: An example layout of a non-animated, tiled image may look as follows:
|
||||
RIFF/WEBP
|
||||
+- VP8X (descriptions of features used)
|
||||
+- ICCP (color profile)
|
||||
+- TILE (First tile parameters)
|
||||
+- VP8 (bitstream - first tile)
|
||||
+- TILE (Second tile parameters)
|
||||
+- VP8 (bitstream - second tile)
|
||||
+- TILE (third tile parameters)
|
||||
+- VP8 (bitstream - third tile)
|
||||
+- TILE (fourth tile parameters)
|
||||
+- VP8 (bitstream - fourth tile)
|
||||
+- META (XMP metadata)
|
||||
Example: An example layout of an animated image may look as follows:
|
||||
RIFF/WEBP
|
||||
+- VP8X (descriptions of features used)
|
||||
+- LOOP (animation control parameters)
|
||||
+- FRM (first animation frame parameters)
|
||||
+- VP8 (bitstream - first image frame)
|
||||
+- FRM (second animation frame parameters)
|
||||
+- VP8 (bitstream - second image frame)
|
||||
+- META (XMP metadata)
|
||||
|
||||
Assembling the canvas from tiles and animation
|
||||
The contents of the chunks will be described in detail in subsequent sections. Here, we provide an
overview of how they are used to assemble the canvas. The notation VP8X.canvasWidth means the field of
the “VP8X” chunk described as canvasWidth.
|
||||
Decoding a non-animated canvas MUST be equivalent to the following pseudo-code:
|
||||
● assert not VP8X.flags.haveAnimation
● canvas ← new black image of size VP8X.canvasWidth x VP8X.canvasHeight.
● tile_params.tileCanvasX = tile_params.tileCanvasY = 0
● for chunk in data_for_all_frames:
    ○ if chunk.tag == “TILE”:
        ■ assert No other TILE chunk after the last “VP8 ” chunk
        ■ tile_params = chunk
    ○ if chunk.tag == “VP8 ”:
        ■ render image in chunk in canvas with top-left corner in
          (tile_params.tileCanvasX, tile_params.tileCanvasY) using the isometry in
          VP8X.flags.rotationAndSymmetry.
        ■ tile_params.tileCanvasX = tile_params.tileCanvasY = 0
    ○ Ignore unknown chunks
● canvas contains the decoded canvas.
|
||||
Decoding an animated canvas MUST be equivalent to the following pseudo-code:
|
||||
● assert VP8X.flags.haveAnimation
● canvas ← new black image of size VP8X.canvasWidth x VP8X.canvasHeight.
● if LOOP.loopCount==0:
    ○ LOOP.loopCount=∞
● current_FRM ← nil
● for LOOP.loop = 0, …, LOOP.loopCount-1
    ○ assert First chunk in data_for_all_frames is a FRM
    ○ for chunk in data_for_all_frames:
        ■ if chunk.tag == “FRM ”:
            ● if current_FRM != nil:
                ○ Show the contents of canvas for current_FRM.frameDuration*10ms.
            ● current_FRM = chunk
        ■ if chunk.tag == “VP8 ”:
            ● assert tile_params.tileCanvasX >= current_FRM.frameX
            ● assert tile_params.tileCanvasY >= current_FRM.frameY
            ● assert tile_params.tileCanvasX + chunk.tileWidth >=
              current_FRM.frameX + current_FRM.frameWidth
            ● assert tile_params.tileCanvasY + chunk.tileHeight >=
              current_FRM.frameY + current_FRM.frameHeight
            ● render image in chunk in canvas with top-left corner in
              (tile_params.tileCanvasX, tile_params.tileCanvasY) using the isometry
              in VP8X.flags.rotationAndSymmetry.
            ● tile_params.tileCanvasX = tile_params.tileCanvasY = 0
        ■ Ignore unknown chunks
● canvas contains the decoded canvas.
|
||||
As described earlier, if an assert related to chunk ordering fails, the reader MAY ignore the badly-ordered
|
||||
chunks instead of failing to decode the file.
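The chunk walk of the non-animated case can be sketched in Python as follows. This is only an illustration under several assumptions: chunks are given as already-parsed (tag, content) pairs, VP8 decoding and the isometry are left out, and the hypothetical function place_tiles returns the position at which each bitstream would be rendered instead of drawing it. This sketch raises an error on a badly ordered TILE chunk, although a reader MAY ignore such chunks instead.

```python
import struct

def place_tiles(chunks):
    """Return a list of (x, y, bitstream) placements in file order.

    chunks: iterable of (tag, content) pairs taken from the frame data of a
    non-animated file. Hypothetical helper for illustration only.
    """
    placements = []
    tile_x = tile_y = 0
    tile_pending = False  # a TILE chunk was seen since the last "VP8 " chunk
    for tag, content in chunks:
        if tag == "TILE":
            if tile_pending:
                raise ValueError("TILE chunk without a bitstream chunk after the previous TILE")
            # tileCanvasX and tileCanvasY are two little-endian uint32 fields.
            tile_x, tile_y = struct.unpack_from("<II", content, 0)
            tile_pending = True
        elif tag == "VP8 ":
            placements.append((tile_x, tile_y, content))
            # Tile parameters apply to one bitstream chunk only.
            tile_x = tile_y = 0
            tile_pending = False
        # Any other tag is unknown to this sketch and is ignored.
    return placements

chunks = [
    ("TILE", struct.pack("<II", 0, 0)), ("VP8 ", b"first"),
    ("TILE", struct.pack("<II", 32, 0)), ("VP8 ", b"second"),
    ("XXXX", b""),
    ("VP8 ", b"third"),
]
print(place_tiles(chunks))
```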
|
||||
|
||||
Bitstream chunk(s) (VP8)
|
||||
These chunks contain compressed image data. Currently, the only allowed bitstream is VP8, which
uses “VP8 ” (note the space as the last character) as its tag. We will refer to all chunks with this tag
as bitstream chunks. As described earlier, images without special layout have a single bitstream chunk
as the first subchunk of RIFF, while images with special layout may contain several of them - one for
each tile.
|
||||
The content of a “VP8 ” chunk MUST be one VP8 key frame (with optional padding – see below).
|
||||
The current draft of the VP8 specification can be found at
http://tools.ietf.org/html/draft-bankoski-vp8-bitstream-04. Note that the VP8 frame header contains the
VP8 frame width and height. They are assumed to be the width and height of the tile.
|
||||
The VP8 specification specifies how to decode the image into Y’CbCr format. To convert to RGB,
|
||||
Rec. 601 SHOULD be used.
|
||||
For compatibility with older readers, if the size of the frame is odd, writers SHOULD append a padding
|
||||
byte (preferably 0) inside the chunk contents, making the chunk’s ckSize even. Newer readers MUST
|
||||
support odd-sized tile chunks.
|
||||
|
||||
VP8X chunk (special layout)
|
||||
As described earlier, a chunk with tag “VP8X” is the first chunk of images with special layout. It is
used to enable advanced features of WebP.
|
||||
The content of the chunk is as follows:
|
||||
● uint32 flags. The following bits are currently used (with 0 being the least significant bit):
|
||||
○ bit 0: haveTile: set if the image is represented by Tiles.
|
||||
○ bit 1: haveAnimation: set if the file is an animation. Data in “LOOP” and “FRM ”
|
||||
chunks should be used to control the animation.
|
||||
○ bit 2: haveIccp: set if the file contains an “ICCP” chunk with a color profile. If a file
contains an “ICCP” chunk but this bit is not set, the error is flagged while constructing
the Mux-Container.
○ bit 3: haveMetadata: set if the file contains a “META” chunk with XMP metadata.
If a file contains a “META” chunk but this bit is not set, the error is flagged while
constructing the Mux-Container.
Future specifications MAY define other bits in flags. Bits not defined by this specification
MUST be preserved when modifying the file.
|
||||
● uint32 canvasWidth: width of the canvas in pixels (after the optional rotation or symmetry - see
|
||||
below).
|
||||
● uint32 canvasHeight: height of the canvas in pixels (after the optional rotation or symmetry - see
below).
|
||||
|
||||
Future specifications MAY add more fields. If a chunk of larger size is found, programs MUST ignore
|
||||
the extra bytes but MUST preserve them when modifying the file.
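The field layout above maps directly onto three little-endian uint32 values. The sketch below shows one possible way to read them. The function name parse_vp8x, the constant names and the dictionary keys are assumptions made for this example. Bytes beyond the first 12 are ignored when reading, and a program that rewrites the file would still need to keep them.

```python
import struct

# Bit positions taken from the flag list in this section.
HAVE_TILE = 1 << 0
HAVE_ANIMATION = 1 << 1
HAVE_ICCP = 1 << 2
HAVE_METADATA = 1 << 3

def parse_vp8x(content):
    """Decode the known fields of a "VP8X" chunk content (hypothetical helper)."""
    if len(content) < 12:
        raise ValueError("VP8X chunk content is too short")
    flags, canvas_width, canvas_height = struct.unpack_from("<III", content, 0)
    return {
        "flags": flags,  # kept whole so that undefined bits can be preserved
        "haveTile": bool(flags & HAVE_TILE),
        "haveAnimation": bool(flags & HAVE_ANIMATION),
        "haveIccp": bool(flags & HAVE_ICCP),
        "haveMetadata": bool(flags & HAVE_METADATA),
        "canvasWidth": canvas_width,
        "canvasHeight": canvas_height,
    }

# Flags 0b0110: animation and ICC profile; one extra trailing byte is ignored.
info = parse_vp8x(struct.pack("<III", 0b0110, 640, 480) + b"\xff")
print(info)
```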
|
||||
|
||||
LOOP chunk (global animation parameters)
|
||||
For images that are animations, this chunk contains the global parameters of the animation.
|
||||
This chunk MUST appear if the haveAnimation flag in chunk VP8X is set. If the haveAnimation flag
is not set and this chunk is present, it MUST be ignored.
|
||||
The content of the chunk is as follows:
|
||||
● uint16 loopCount: for animations, the number of times to loop this animation. 0 means infinite.
|
||||
Future specifications MAY add more fields. If a chunk of larger size is found, programs MUST ignore
|
||||
the extra bytes but MUST preserve them when modifying the file.
|
||||
|
||||
FRM chunk (per-frame animation parameters)
|
||||
For images that are animations, these chunks contain the per-frame parameters of the animation.
|
||||
The content of the chunk is as follows:
|
||||
● uint32 frameX: x coordinate of the upper left corner of the frame. For images using the VP8
codec, it MUST be divisible by 32. Other codecs MAY specify other constraints. Described in
more detail later.
● uint32 frameY: y coordinate of the upper left corner of the frame. For images using the VP8
codec, it MUST be divisible by 32. Other codecs MAY specify other constraints. Described in
more detail later.
● uint32 frameWidth: width of the frame. For images using the VP8 codec, it MUST be divisible
by 16 or such that frameX+frameWidth==canvasWidth. Other codecs MAY specify other
constraints. Described in more detail later.
● uint32 frameHeight: height of the frame. For images using the VP8 codec, it MUST be divisible
by 16 or such that frameY+frameHeight==canvasHeight. Other codecs MAY specify other constraints.
Described in more detail later.
● uint16 frameDuration: time to wait before displaying the next tile, in units of 1 ms.
|
||||
Rationale: the requirement for corner coordinates to be divisible by 32 means that pixels on the U and V
planes are aligned to a 16-byte boundary (even after a rotation), which may help with vector instructions
on some architectures. This also aligns the tiles to 16-pixel macroblock boundaries.
|
||||
Rationale: the requirement for the width and height to be divisible by 16 or touching the edge of
|
||||
the canvas simplifies the handling of macroblocks that are on the edge of a tile - VP8 decoders can
|
||||
overwrite pixels outside the boundary in such a macroblock and this guarantees they won’t overwrite
|
||||
any data.
|
||||
Future specifications MAY add more fields. If a chunk of larger size is found, programs MUST ignore
|
||||
the extra bytes but MUST preserve them when modifying the file.
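A reader could decode the “FRM ” fields and check the VP8 constraints roughly as follows. This is a sketch under assumptions: the names Frame and parse_frm are hypothetical, the canvas size is passed in by the caller (it comes from the “VP8X” chunk), and a constraint violation is reported with an exception.

```python
import struct
from collections import namedtuple

Frame = namedtuple("Frame", "x y width height duration")

def parse_frm(content, canvas_width, canvas_height):
    """Decode a "FRM " chunk content and check the VP8 constraints.

    Hypothetical helper for illustration only.
    """
    # Four uint32 fields followed by one uint16 field: 18 bytes, little-endian.
    if len(content) < 18:
        raise ValueError("FRM chunk content is too short")
    frame = Frame(*struct.unpack_from("<IIIIH", content, 0))
    if frame.x % 32 or frame.y % 32:
        raise ValueError("frame corner is not divisible by 32")
    if frame.width % 16 and frame.x + frame.width != canvas_width:
        raise ValueError("frame width is not divisible by 16 and does not reach the canvas edge")
    if frame.height % 16 and frame.y + frame.height != canvas_height:
        raise ValueError("frame height is not divisible by 16 and does not reach the canvas edge")
    return frame

# Width 68 is not divisible by 16, but 32 + 68 reaches the right edge of a 100-pixel canvas.
frame = parse_frm(struct.pack("<IIIIH", 32, 64, 68, 16, 5), 100, 200)
print(frame)
```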
|
||||
|
||||
TILE chunks (tile parameters)
|
||||
|
||||
This chunk contains information about a single tile and describes the bitstream chunk that follows it.
|
||||
The content of such a chunk is as follows:
|
||||
|
||||
● uint32 tileCanvasX: x coordinate of the upper left corner of the tile. For VP8 tiles, it MUST be
|
||||
divisible by 32. Other codecs MAY specify other constraints.
|
||||
● uint32 tileCanvasY: y coordinate of the upper left corner of the tile. For VP8 tiles, it MUST be
|
||||
divisible by 32. Other codecs MAY specify other constraints.
|
||||
Future specifications MAY add more fields. If a chunk of larger size is found, programs MUST ignore
|
||||
the extra bytes but MUST preserve them when modifying the file.
|
||||
As described earlier, the TILE chunk is followed by a VP8 bitstream chunk. From that chunk, we can read
the width and height of the tile, which we will denote by tileWidth and tileHeight. In the case of VP8,
we have the following constraints:
|
||||
● The width of a tile MUST be divisible by 16, or tileCanvasX+tileWidth MUST equal
canvasWidth.
● The height of a tile MUST be divisible by 16, or tileCanvasY+tileHeight MUST equal
canvasHeight.
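From the writer's side, one simple way to satisfy these constraints is to cut the canvas into a regular grid whose step is a multiple of 32, so that only the last column and the last row can have a size that is not divisible by 16, and those touch the canvas edge. The sketch below illustrates this. The function names tile_grid and tile_is_valid and the default step of 512 are assumptions made for this example, not something the specification prescribes.

```python
def tile_grid(canvas_width, canvas_height, tile_size=512):
    """Return a list of (x, y, width, height) tiles covering the canvas."""
    if tile_size <= 0 or tile_size % 32:
        raise ValueError("tile_size must be a positive multiple of 32")
    tiles = []
    for y in range(0, canvas_height, tile_size):
        for x in range(0, canvas_width, tile_size):
            # Tiles in the last column or row are clipped to the canvas edge.
            width = min(tile_size, canvas_width - x)
            height = min(tile_size, canvas_height - y)
            tiles.append((x, y, width, height))
    return tiles

def tile_is_valid(x, y, width, height, canvas_width, canvas_height):
    """Check the VP8 tile constraints listed above."""
    return (x % 32 == 0 and y % 32 == 0
            and (width % 16 == 0 or x + width == canvas_width)
            and (height % 16 == 0 or y + height == canvas_height))

tiles = tile_grid(100, 40, tile_size=64)
print(tiles)
print(all(tile_is_valid(*t, 100, 40) for t in tiles))
```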
|
||||
|
||||
ICCP chunk (color profile)
|
||||
An optional “ICCP” chunk contains an ICC profile. There SHOULD be at most one such chunk.
|
||||
The first byte of the chunk is the compression type. Two values are currently defined: a value of
|
||||
0 means no compression, while a value of 1 means deflate/inflate compression. It is followed by a
|
||||
compressed or non-compressed ICC profile - see www.color.org for specifications.
|
||||
The color profile can be a v2 or v4 profile. If this chunk is missing, sRGB SHOULD be assumed.
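The compression byte can be handled with a few lines of Python. This sketch rests on one significant assumption: “deflate/inflate compression” is taken to mean a zlib-wrapped stream, which is what zlib.decompress expects, and the specification does not say whether a raw deflate stream is meant instead. The function name read_profile is hypothetical.

```python
import zlib

def read_profile(content):
    """Return the ICC profile bytes stored in an "ICCP" chunk content."""
    if len(content) < 1:
        raise ValueError("ICCP chunk content is empty")
    compression = content[0]
    payload = bytes(content[1:])
    if compression == 0:
        return payload  # stored without compression
    if compression == 1:
        # Assumption: the payload is a zlib-wrapped deflate stream.
        return zlib.decompress(payload)
    raise ValueError("unknown compression type: %d" % compression)

stored = b"\x00" + b"profile-bytes"
deflated = b"\x01" + zlib.compress(b"profile-bytes")
print(read_profile(stored) == read_profile(deflated))
```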
|
||||
|
||||
META chunk (compressed XMP metadata)
|
||||
Such a chunk (if present) contains XMP metadata. There SHOULD be at most one such chunk. If
|
||||
there are more such chunks, readers SHOULD ignore all except the first one. The first byte specifies
|
||||
compression type. Two values are currently defined: a value of 0 means no compression, while a
|
||||
value of 1 means deflate/inflate compression. It is followed by a compressed or non-compressed XMP
|
||||
metadata packet.
|
||||
|
||||
XMP packets are XML text specified in
http://www.adobe.com/content/dam/Adobe/en/devnet/xmp/pdfs/XMPSpecificationPart1.pdf. The chunk tag is
different from the one specified by Adobe for WAV and AVI (also RIFF formats) because we have the option
of compression.
|
||||
Additional guidance about handling metadata can be found at:
http://www.metadataworkinggroup.org/pdf/mwg_guidance.pdf . Note that the sections of the document about
reconciliation of EXIF, XMP and IPTC-IIM don't apply to WebP, as WebP supports only XMP, thus no
reconciliation is necessary.
|
||||
|
||||
Other chunks
|
||||
A file MAY contain other chunks, defined in some future specification. Such chunks MUST be
|
||||
ignored, but preserved. Writers SHOULD try to preserve them in the original order.
|
||||
|
||||
|