mirror of https://github.com/webmproject/libwebp.git
synced 2024-12-25 13:18:22 +01:00

Re-wrap at <= 72 columns

modified: doc/webp-lossless-bitstream-spec.txt
Change-Id: Ie8d7aa907dc20d941b74455f8657d4e1b4e23bbb

parent a45adc1918
commit 2a4c6c29a0
@@ -122,18 +122,20 @@ b |= ReadBits(1) << 1;
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

We assume that each color component (e.g. alpha, red, blue and green) is
represented using an 8-bit byte. We define the corresponding type as
uint8. A whole ARGB pixel is represented by a type called uint32, an
unsigned integer consisting of 32 bits. In the code showing the behavior
of the transformations, the alpha value is codified in bits 31..24, red
in bits 23..16, green in bits 15..8 and blue in bits 7..0, but
implementations of the format are free to use another representation
internally.
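The channel layout above can be made concrete with a few accessors. This is a minimal C sketch. The names ALPHA, RED, BLUE and MakeARGB are hypothetical, and GREEN mirrors the GREEN(argb) used in the color indexing pseudocode. Mapping uint8/uint32 onto `<stdint.h>` types is also an assumption.

```c
#include <stdint.h>

/* Assumed mapping of the spec's type names onto <stdint.h> types. */
typedef uint8_t uint8;
typedef uint32_t uint32;

/* Hypothetical accessors for the layout described above:
 * alpha in bits 31..24, red in 23..16, green in 15..8, blue in 7..0. */
static uint8 ALPHA(uint32 argb) { return (uint8)(argb >> 24); }
static uint8 RED(uint32 argb) { return (uint8)(argb >> 16); }
static uint8 GREEN(uint32 argb) { return (uint8)(argb >> 8); }
static uint8 BLUE(uint32 argb) { return (uint8)argb; }

/* Packs four 8-bit components into one 32-bit ARGB value. */
static uint32 MakeARGB(uint8 a, uint8 r, uint8 g, uint8 b) {
  return ((uint32)a << 24) | ((uint32)r << 16) | ((uint32)g << 8) | b;
}
```

As the text notes, a decoder is free to store pixels differently internally. The accessors only matter for reading the pseudocode.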

Broadly, a WebP lossless image contains header data, transform
information and actual image data. Headers contain the width and height
of the image. A WebP lossless image can go through four different types
of transformation before being entropy encoded. The transform
information in the bitstream contains the data required to apply the
respective inverse transforms.


RIFF Header

@@ -143,20 +145,21 @@ The beginning of the header has the RIFF container. This consist of the
following 21 bytes:

1. String "RIFF"
2. A little-endian 32 bit value of the block length, the whole size
   of the block controlled by the RIFF header. Normally this equals
   the payload size (file size minus 8 bytes, i.e., 4 bytes for the
   'RIFF' identifier and 4 bytes for storing this value itself).
3. String "WEBP" (RIFF container name).
4. String "VP8L" (chunk tag for lossless encoded image data).
5. A little-endian 32-bit value of the number of bytes in the
   lossless stream.
6. One byte signature 0x64. Decoders need to also accept 0x65 as
   indicating a valid stream, since it has a planned future use.
   Today, a solid white image of the specified size should be shown
   for images having a 0x2f signature.
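A decoder might check these 21 bytes as in the sketch below. It is an illustration, not libwebp's parser: VP8LHeaderInfo, GetLE32 and ParseVP8LHeader are hypothetical names. The sketch also does not cross-check the two length fields against the real file size.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical holder for the fields of the 21-byte header. */
typedef struct {
  uint32_t riff_size;   /* item 2: RIFF block length */
  uint32_t stream_size; /* item 5: bytes in the lossless stream */
  uint8_t signature;    /* item 6: 0x64, 0x65 or 0x2f */
} VP8LHeaderInfo;

/* Reads a little-endian 32-bit value. */
static uint32_t GetLE32(const uint8_t* p) {
  return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
         ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Returns 1 and fills *info if data starts with an acceptable header. */
static int ParseVP8LHeader(const uint8_t* data, size_t size,
                           VP8LHeaderInfo* info) {
  if (size < 21) return 0;
  if (memcmp(data, "RIFF", 4) != 0) return 0;      /* item 1 */
  if (memcmp(data + 8, "WEBP", 4) != 0) return 0;  /* item 3 */
  if (memcmp(data + 12, "VP8L", 4) != 0) return 0; /* item 4 */
  info->riff_size = GetLE32(data + 4);
  info->stream_size = GetLE32(data + 16);
  info->signature = data[20];
  /* 0x64 and 0x65 must be accepted; 0x2f means "show a solid white
     image of the specified size". */
  return info->signature == 0x64 || info->signature == 0x65 ||
         info->signature == 0x2f;
}
```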

The first 28 bits of the bitstream specify the width and height of the
image. Width and height are decoded as 14-bit integers as follows:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
int image_width = ReadBits(14) + 1;

@@ -176,9 +179,9 @@ correlations. Transformations can make the final compression more dense.

An image can go through four types of transformations. A 1 bit indicates
the presence of a transform. Every transform is allowed to be used only
once. The transformations are used only for the main level ARGB image --
the subresolution images have no transforms, not even the 0 bit
indicating the end-of-transforms.

Typically an encoder would use these transforms to reduce the Shannon
entropy in the residual image. Also, the transform data can be decided

@@ -249,8 +252,8 @@ int block_index = (y >> size_bits) * block_xsize +
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

There are 14 different prediction modes. In each prediction mode, the
current pixel value is predicted from one or more neighboring pixels
whose values are already known.

We choose the neighboring pixels (TL, T, TR, and L) of the current pixel
(P) as follows:

@@ -266,8 +269,8 @@ X X X X X X X X X X X
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

where TL means top-left, T top, TR top-right, L left pixel.
At the time of predicting a value for P, all pixels O, TL, T, TR and L
have already been processed, and pixel P and all pixels X are unknown.

Given the above neighboring pixels, the different prediction modes are
defined as follows.

@@ -348,28 +351,28 @@ int ClampAddSubtractHalf(int a, int b) {
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

There are special handling rules for some border pixels. If there is a
prediction transform, regardless of the mode [0..13] for these pixels,
the predicted value for the left-topmost pixel of the image is
0xff000000, the L-pixel for all pixels on the top row, and the T-pixel
for all pixels on the leftmost column.

Addressing the TR-pixel for pixels on the rightmost column is
exceptional. The pixels on the rightmost column are predicted by using
the modes [0..13] just like pixels not on the border, but the leftmost
pixel on the same row as the current pixel is used as their TR-pixel.
The TR-pixel offset in memory is the same for border and non-border
pixels.
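The border rules and the TR addressing can be sketched as follows. The fourteen predictor modes are not reproduced here. Instead, the mode is passed in as a function pointer of a hypothetical type PredictorFunc. ReturnTopRight is a stand-in predictor whose only purpose is to make the TR wraparound observable.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical signature standing in for the per-mode predictors. */
typedef uint32_t (*PredictorFunc)(uint32_t left, uint32_t top,
                                  uint32_t top_left, uint32_t top_right);

/* Stand-in predictor that simply returns the TR-pixel. */
static uint32_t ReturnTopRight(uint32_t left, uint32_t top,
                               uint32_t top_left, uint32_t top_right) {
  (void)left;
  (void)top;
  (void)top_left;
  return top_right;
}

/* Predicted value for pixel (x, y) of an image stored in scan-line order
 * with xsize pixels per row. Only already-decoded pixels are read. */
static uint32_t PredictPixel(const uint32_t* argb, int xsize, int x, int y,
                             PredictorFunc predict) {
  const size_t pos = (size_t)y * xsize + x;
  if (x == 0 && y == 0) return 0xff000000u; /* left-topmost pixel */
  if (y == 0) return argb[pos - 1];         /* top row: L-pixel */
  if (x == 0) return argb[pos - xsize];     /* leftmost column: T-pixel */
  /* TR uses the same memory offset everywhere. For x == xsize - 1 it
   * lands on the leftmost pixel of the current row. */
  return predict(argb[pos - 1], argb[pos - xsize], argb[pos - xsize - 1],
                 argb[pos - xsize + 1]);
}
```

For example, in a 3-pixel-wide image, pixel (2, 1) receives argb[3] as its TR-pixel. That is the leftmost pixel of row 1.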


### Color Transform

The goal of the color transform is to decorrelate the R, G and B values
of each pixel. The color transform keeps the green (G) value as it is,
transforms red (R) based on green, and transforms blue (B) based on
green and then based on red.

As is the case for the predictor transform, first the image is divided
into blocks and the same transform mode is used for all the pixels in a
block. For each block there are three types of color transform elements.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
typedef struct {
@@ -425,17 +428,17 @@ int8 ColorTransformDelta(int8 t, int8 c) {
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The multiplication is to be done using more precision (with at least
16 bit dynamics). The sign extension property of the shift operation
does not matter here: only the lowest 8 bits are used from the result,
and there the sign extension shifting and unsigned shifting are
consistent with each other.

Now we describe the contents of color transform data so that decoding
can apply the inverse color transform and recover the original red and
blue values. The first 4 bits of the color transform data contain the
width and height of the image block in number of bits, just like the
predictor transform:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
int size_bits = ReadStream(4);
@@ -444,13 +447,13 @@ int block_height = 1 << size_bits;
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The remaining part of the color transform data contains
ColorTransformElement instances corresponding to each block of the
image. ColorTransformElement instances are treated as pixels of an image
and encoded using the methods described in section 4.

During decoding, ColorTransformElement instances of the blocks are
decoded and the inverse color transform is applied on the ARGB values of
the pixels. As mentioned earlier, that inverse color transform is just
subtracting ColorTransformElement values from the red and blue channels.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -481,9 +484,10 @@ void InverseTransform(uint8 red, uint8 green, uint8 blue,

### Subtract Green Transform

The subtract green transform subtracts green values from red and blue
values of each pixel. When this transform is present, the decoder needs
to add the green value to both red and blue. There is no data associated
with this transform. The decoder applies the inverse transform as
follows:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
void AddGreenToBlueAndRed(uint8 green, uint8 *red, uint8 *blue) {
@@ -492,63 +496,67 @@ void AddGreenToBlueAndRed(uint8 green, uint8 *red, uint8 *blue) {
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

This transform is redundant as it can be modeled using the color
transform. It is still often useful: it can extend the dynamics of the
color transform, and since there is no additional data here, it can be
coded using fewer bits than a full-blown color transform.


### Color Indexing Transform

If there are not many unique values of the pixels, then it may be more
efficient to create a color index array and replace the pixel values by
the indices to this color index array. The color indexing transform is
used to achieve that. In the context of WebP lossless, we specifically
do not call this transform a palette transform, since another slightly
similar, but more dynamic concept exists within WebP lossless encoding,
called color cache.

The color indexing transform checks for the number of unique ARGB values
in the image. If that number does not exceed a threshold (256), it
creates an array of those ARGB values, which is then used to replace the
pixel values with the corresponding index: the green channel of the
pixels is replaced with the index, all alpha values are set to 255, and
all red and blue values to 0.

The transform data contains the color table size and the entries in the
color table. The decoder reads the color indexing transform data as
follows:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
// 8 bit value for color table size
int color_table_size = ReadStream(8) + 1;
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The color table is stored using the image storage format itself. The
color table can be obtained by reading an image, without the RIFF
header, image size, and transforms, assuming a height of one pixel and
a width of color_table_size. The color table is always subtraction coded
to reduce the entropy of this image. The deltas of palette colors
typically contain much less entropy than the colors themselves, leading
to significant savings for smaller images. In decoding, every final
color in the color table can be obtained by adding the previous color
component values, by each ARGB component separately, and storing the
least significant 8 bits of the result.
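The per-component addition that undoes the subtraction coding might look like the sketch below. The function name is hypothetical. It assumes the color table has already been decoded into 32-bit ARGB values stored in order, with the first entry stored as-is.

```c
#include <stddef.h>
#include <stdint.h>

/* Replaces each entry (from the second on) by the sum of itself and the
 * previously reconstructed entry. The sum is computed separately for each
 * 8-bit ARGB component, and only its low 8 bits are kept. */
static void UndoColorTableDeltas(uint32_t* table, size_t size) {
  for (size_t i = 1; i < size; ++i) {
    const uint32_t prev = table[i - 1];
    const uint32_t delta = table[i];
    uint32_t result = 0;
    for (int shift = 0; shift < 32; shift += 8) {
      const uint32_t sum = ((prev >> shift) + (delta >> shift)) & 0xffu;
      result |= sum << shift;
    }
    table[i] = result;
  }
}
```

Each component wraps around independently, so a blue delta of 0xff added to 0x03 yields 0x02 with no carry into green.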

The inverse transform for the image is simply replacing the pixel values
(which are indices to the color table) with the actual color table
values. The indexing is done based on the green component of the ARGB
color.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
// Inverse transform
argb = color_table[GREEN(argb)];
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

When the color table is of a small size (equal to or less than 16
colors), several pixels are bundled into a single pixel. The pixel
bundling packs several (2, 4, or 8) pixels into a single pixel, reducing
the image width respectively. Pixel bundling allows for a more efficient
joint distribution entropy coding of neighboring pixels, and gives some
arithmetic-coding-like benefits to the entropy code, but it can only be
used when there is a small number of unique values.

color_table_size specifies how many pixels are combined together:

@@ -573,17 +581,17 @@ binary value.

The values are packed into the green component as follows:

* _width_bits_ = 1: for every x value where x ≡ 0 (mod 2), a green
  value at x is positioned into the 4 least-significant bits of the
  green value at x / 2, and a green value at x + 1 is positioned into
  the 4 most-significant bits of the green value at x / 2.
* _width_bits_ = 2: for every x value where x ≡ 0 (mod 4), a green
  value at x is positioned into the 2 least-significant bits of the
  green value at x / 4, and green values at x + 1 to x + 3 are
  positioned in order into the more significant bits of the green value
  at x / 4.
* _width_bits_ = 3: for every x value where x ≡ 0 (mod 8), a green
  value at x is positioned into the least-significant bit of the green
  value at x / 8, and green values at x + 1 to x + 7 are positioned in
  order into the more significant bits of the green value at x / 8.
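The packing rules can be written as a pair of bit-manipulation helpers. This is only a sketch: BundleIndices and UnbundleIndex are hypothetical names. The mapping from color_table_size to _width_bits_ is not shown in this excerpt, so it is taken as an input rather than computed.

```c
#include <stdint.h>

/* Returns the color table index of pixel x from a bundled row.
 * packed_green holds one green byte per bundled pixel; width_bits is
 * 1, 2 or 3. Pixel x sits at bit offset (x mod 2^width_bits) times the
 * index width, counting from the least-significant bit. */
static uint8_t UnbundleIndex(const uint8_t* packed_green, int width_bits,
                             int x) {
  const int bits_per_pixel = 8 >> width_bits;  /* 4, 2 or 1 */
  const int pixels_per_byte = 1 << width_bits; /* 2, 4 or 8 */
  const uint8_t packed = packed_green[x >> width_bits];
  const int shift = (x & (pixels_per_byte - 1)) * bits_per_pixel;
  return (uint8_t)((packed >> shift) & ((1 << bits_per_pixel) - 1));
}

/* Encoder direction: packs count indices into bundled green bytes.
 * packed_green must hold ceil(count / 2^width_bits) bytes. */
static void BundleIndices(const uint8_t* indices, int count, int width_bits,
                          uint8_t* packed_green) {
  const int bits_per_pixel = 8 >> width_bits;
  const int pixels_per_byte = 1 << width_bits;
  const int packed_width = (count + pixels_per_byte - 1) >> width_bits;
  for (int i = 0; i < packed_width; ++i) packed_green[i] = 0;
  for (int x = 0; x < count; ++x) {
    const int shift = (x & (pixels_per_byte - 1)) * bits_per_pixel;
    packed_green[x >> width_bits] |= (uint8_t)(indices[x] << shift);
  }
}
```

For _width_bits_ = 1, the indices 0x3 and 0xa end up in one green byte as 0xa3. The first pixel occupies the low nibble.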

@@ -591,61 +599,62 @@ Image Data
----------

Image data is an array of pixel values in scan-line order. We use image
data in five different roles: the main role, an auxiliary role related
to entropy coding, and three further roles related to transforms.

1. ARGB image.
2. Entropy image. The red and green components define the meta Huffman
   code used in a particular area of the image.
3. Predictor image. The green component defines which of the 14
   prediction modes is used within a particular square of the image.
4. Color indexing image. An array of up to 256 ARGB colors is used for
   transforming a green-only image, using the green value as an index
   to this one-dimensional array.
5. Color transformation image. Defines signed 3.5 fixed-point
   multipliers that are used to predict the red, green, blue
   components to reduce entropy.

To divide the image into multiple regions, the image is first divided
into a set of fixed-size blocks (typically 16x16 blocks). Each of these
blocks can be modeled using an entropy code, in a way where several
blocks can share the same entropy code. There is a cost in transmitting
an entropy code, and in order to minimize this cost, statistically
similar blocks can share an entropy code. The blocks sharing an entropy
code can be found by clustering their statistical properties, or by
repeatedly joining two randomly selected clusters when it reduces the
overall amount of bits needed to encode the image. [See section
_"Decoding of meta Huffman codes"_ in Chapter 5 for an explanation of
how this _entropy image_ is stored.]

Each pixel is encoded using one of three possible methods:

1. Huffman coded literals, where each channel (green, alpha, red,
   blue) is entropy-coded independently,
2. LZ77, a sequence of pixels in scan-line order copied from elsewhere
   in the image, or,
3. Color cache, using a short multiplicative hash code (color cache
   index) of a recently seen color.

In the following sections we introduce the main concepts in LZ77 prefix
coding, LZ77 entropy coding, LZ77 distance mapping, and color cache
codes. The actual details of the entropy code are described in more
detail in chapter 5.


### LZ77 prefix coding

Prefix coding divides large integer values into two parts, the prefix
code and the extra bits. The benefit of this approach is that entropy
coding is later used only for the prefix code, reducing the resources
needed by the entropy code. The extra bits are stored as they are,
without an entropy code.

This prefix code is used for coding backward reference lengths and
distances. The extra bits form an integer that is added to the lower
value of the range. Because the LZ77 lengths and distances are divided
into prefix codes and extra bits, performing the Huffman coding only on
the prefixes reduces the size of the Huffman codes to tens of values
instead of otherwise a million (distance) or several thousands (length).
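The split between prefix code and extra bits can be sketched in both directions. The sketch assumes two things: prefix codes 0..3 stand for the values 1..4 with no extra bits, and each following pair of prefix codes doubles the width of the range. That assumption agrees with the decoder line `return offset + ReadBits(extra_bits) + 1;` and with the limits stated below (24 length codes reaching 4096, 40 distance codes). The table below is only partly reproduced in this excerpt, so treat the sketch as unverified against it. ValueToPrefix and PrefixToValue are hypothetical names.

```c
#include <stdint.h>

/* Encoder direction: splits value (>= 1) into a prefix code, the number of
 * extra bits, and the extra bits themselves. */
static void ValueToPrefix(uint32_t value, int* prefix_code,
                          int* extra_bits_count, uint32_t* extra_bits) {
  const uint32_t v = value - 1;
  if (v < 4) {
    *prefix_code = (int)v;
    *extra_bits_count = 0;
    *extra_bits = 0;
    return;
  }
  int high_bit = 31;
  while (!(v >> high_bit)) --high_bit; /* position of the top set bit */
  const int second_bit = (int)((v >> (high_bit - 1)) & 1);
  *prefix_code = 2 * high_bit + second_bit;
  *extra_bits_count = high_bit - 1;
  *extra_bits = v & ((1u << (high_bit - 1)) - 1);
}

/* Decoder direction, with the extra bits passed in instead of read. */
static uint32_t PrefixToValue(int prefix_code, uint32_t extra_bits) {
  if (prefix_code < 4) return (uint32_t)prefix_code + 1;
  const int extra_bits_count = (prefix_code - 2) >> 1;
  const uint32_t offset = (uint32_t)(2 + (prefix_code & 1))
                          << extra_bits_count;
  return offset + extra_bits + 1;
}
```

Under this assumption, 4096 maps to prefix code 23 with 10 extra bits. The largest prefix code below 2^20 is 39, which is consistent with the 24 and 40 code counts.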

| Prefix code | Value range | Extra bits |
| ----------- | --------------- | ---------- |

@@ -675,21 +684,23 @@ return offset + ReadBits(extra_bits) + 1;


### LZ77 backward reference entropy coding

Backward references are tuples of length and distance. Length indicates
how many pixels in scan-line order are to be copied. The length is
codified in two steps: prefix and extra bits. Only the first 24 prefix
codes with their respective extra bits are used for length codes,
limiting the maximum length to 4096. For distances, all 40 prefix codes
are used.


### LZ77 distance mapping

The 120 smallest distance codes [1..120] are reserved for a close
neighborhood of the current pixel. The rest are pure distance codes in
scan-line order, just offset by 120. The smallest codes are coded into x
and y offsets by the following table. Each tuple shows the x and the y
coordinates in 2d offsets -- for example, the first tuple (0, 1) means 0
for no difference in x, and 1 pixel difference in y (indicating the
previous row).

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
(0, 1), (1, 0), (1, 1), (-1, 1), (0, 2), (2, 0), (1, 2), (-1, 2),
@@ -709,23 +720,25 @@ no difference in x, and 1 pixel difference in y (indicating previous row).
(-6, 7), (7, 6), (-7, 6), (8, 5), (7, 7), (-7, 7), (8, 6), (8, 7)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The distance codes that map into these tuples are converted into
scan-line order distances using the following formula:
_dist = x + y * xsize_, where _xsize_ is the width of the image in
pixels.
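Turning a distance code into a scan-line distance could look like the sketch below. Only part of the 120-entry table appears in this excerpt, so the sketch takes the table as a parameter. DistanceOffset and DistanceCodeToDistance are hypothetical names. For very narrow images the formula can produce a value below 1, and this sketch does not handle that case.

```c
#include <stdint.h>

/* One (x, y) tuple of the 120-entry neighborhood table. */
typedef struct {
  int x;
  int y;
} DistanceOffset;

/* Converts a distance code (>= 1) into a scan-line distance. table must
 * hold the 120 tuples in order; xsize is the image width in pixels. */
static int32_t DistanceCodeToDistance(uint32_t code,
                                      const DistanceOffset* table,
                                      int xsize) {
  if (code > 120) return (int32_t)(code - 120); /* plain distance */
  const DistanceOffset d = table[code - 1];
  return d.x + d.y * xsize; /* dist = x + y * xsize */
}
```

With a 100-pixel-wide image, code 1, the tuple (0, 1), gives a distance of 100: the pixel directly above. Code 4, the tuple (-1, 1), gives 99: the pixel above and to the right.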


### Color Cache Code

The color cache stores a set of colors that have been recently used in
the image. Using the color cache code, the color cache colors can be
referred to more efficiently than emitting the respective ARGB values
independently or by sending them as backward references with a length
of one pixel.

Color cache codes are coded as follows. First, there is a bit that
indicates if the color cache is used or not. If this bit is 0, no color
cache codes exist, and they are not transmitted in the Huffman code that
decodes the green symbols and the length prefix codes. However, if this
bit is 1, the color cache size is read:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
int color_cache_code_bits = ReadBits(br, 4);
@@ -737,15 +750,15 @@ _color_cache_code_bits_). The range of allowed values for
_color_cache_code_bits_ is [1..11]. Compliant decoders must indicate a
corrupted bit stream for other values.

A color cache is an array of the size _color_cache_size_. Each entry
stores one ARGB color. Colors are looked up by indexing them by
(0x1e35a7bd * _color_) >> (32 - _color_cache_code_bits_). Only one
lookup is done in a color cache; there is no conflict resolution.

At the beginning of decoding or encoding of an image, all entries in the
color cache are set to zero. At decoding time, a color cache code is
converted to the color stored at that index. The state of the color
cache is maintained by inserting every pixel, be it produced by backward
referencing or as literals, into the cache in the order they appear in
the stream.
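A minimal color cache following this description is sketched below. It assumes _color_cache_size_ is 1 << _color_cache_code_bits_, which the truncated line above appears to define. ColorCache and its helper functions are hypothetical names. The multiplication is done on uint32_t so that it wraps modulo 2^32 before the shift.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical color cache with 1 << hash_bits entries. */
typedef struct {
  uint32_t* colors;
  int hash_bits; /* color_cache_code_bits, in [1..11] */
} ColorCache;

/* Returns 0 for an out-of-range size (corrupted stream) or on allocation
 * failure. calloc gives the all-zero initial state. */
static int ColorCacheInit(ColorCache* cache, int hash_bits) {
  if (hash_bits < 1 || hash_bits > 11) return 0;
  cache->hash_bits = hash_bits;
  cache->colors = (uint32_t*)calloc((size_t)1 << hash_bits, sizeof(uint32_t));
  return cache->colors != NULL;
}

/* (0x1e35a7bd * color) >> (32 - color_cache_code_bits) */
static uint32_t ColorCacheIndex(const ColorCache* cache, uint32_t argb) {
  return (uint32_t)(0x1e35a7bdu * argb) >> (32 - cache->hash_bits);
}

/* Called for every decoded pixel, literal or copied, in stream order. */
static void ColorCacheInsert(ColorCache* cache, uint32_t argb) {
  cache->colors[ColorCacheIndex(cache, argb)] = argb;
}

/* Converts a color cache code back into a color. */
static uint32_t ColorCacheLookup(const ColorCache* cache, uint32_t code) {
  return cache->colors[code];
}

static void ColorCacheClear(ColorCache* cache) {
  free(cache->colors);
  cache->colors = NULL;
}
```

Because there is no conflict resolution, inserting a color simply overwrites whatever previously hashed to the same slot.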
|
||||
|
||||
|
||||
@ -754,29 +767,29 @@ Entropy Code
|
||||
|
||||
### Huffman coding
|
||||
|
||||
Most of the data is coded using a canonical Huffman code. This includes the
|
||||
following:
|
||||
Most of the data is coded using a canonical Huffman code. This includes
|
||||
the following:

* A combined code that defines either the value of the green
  component, a color cache code, or a prefix of the length codes,
* the data for alpha, red and blue components, and
* prefixes of the distance codes.

The Huffman codes are transmitted by sending their code lengths; the
actual symbols are implicit, and codes are assigned to them in symbol
order within each length. The Huffman code lengths are run-length
encoded using three different prefixes, and the result of this coding
is further Huffman coded.
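
Because only code lengths are transmitted, the decoder has to rebuild the codes themselves. The sketch below uses the canonical assignment from DEFLATE (RFC 1951, section 3.2.2): shorter codes come first, and codes of equal length follow increasing symbol order. Assuming WebP uses the same canonical convention, this yields the code values. The sketch does not address the bit order in which WebP reads codes from the stream. `AssignCanonicalCodes` is a hypothetical name.

```c
#include <stdint.h>

#define MAX_CODE_LENGTH 15

/* Assigns canonical Huffman code values (as integers, most significant
   bit first) from code lengths. A length of 0 marks an unused symbol.
   Returns 0 if any length is out of range, 1 otherwise. */
static int AssignCanonicalCodes(const int* lengths, int num_symbols,
                                uint32_t* codes) {
  int bl_count[MAX_CODE_LENGTH + 1] = {0};
  uint32_t next_code[MAX_CODE_LENGTH + 1] = {0};
  uint32_t code = 0;
  int len, sym;
  /* Count how many symbols use each code length. */
  for (sym = 0; sym < num_symbols; ++sym) {
    if (lengths[sym] < 0 || lengths[sym] > MAX_CODE_LENGTH) return 0;
    ++bl_count[lengths[sym]];
  }
  bl_count[0] = 0;
  /* Smallest code value for each length. */
  for (len = 1; len <= MAX_CODE_LENGTH; ++len) {
    code = (code + (uint32_t)bl_count[len - 1]) << 1;
    next_code[len] = code;
  }
  /* Hand out consecutive values within each length, in symbol order. */
  for (sym = 0; sym < num_symbols; ++sym) {
    codes[sym] = (lengths[sym] != 0) ? next_code[lengths[sym]]++ : 0;
  }
  return 1;
}
```

With lengths (3, 3, 3, 3, 3, 2, 4, 4), the codes are 010, 011, 100, 101, 110, 00, 1110 and 1111.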

### Spatially-variant Huffman coding

For every pixel (x, y) in the image, there is a definition of which
entropy code to use. First, there is an integer called 'meta Huffman
code' that can be obtained from a subresolution 2D image. This meta
Huffman code identifies a set of five Huffman codes: one for green
(along with length codes and color cache codes), one each for red,
blue and alpha, and one for distance. Each Huffman code is identified
by an integer: its position in a table.

### Decoding flow of image data
@ -798,9 +811,9 @@ Read next symbol S

### Decoding the code lengths

There are two different ways to encode the code lengths of a Huffman
code, indicated by the first bit of the code: _simple code length code_
(1) and _normal code length code_ (0).

#### Simple code length code
@ -814,9 +827,9 @@ The first bit indicates the number of codes:
|
||||
int num_symbols = ReadBits(1) + 1;
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||

The first symbol is stored either using a 1-bit code for values of 0 and
1, or using an 8-bit code for values in the range [0, 255]. The second
symbol, when present, is coded as an 8-bit code.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
int first_symbol_len_code = ReadBits(1);
@ -828,16 +841,16 @@ if (num_symbols == 2) {

Empty trees can be coded as trees that contain one 0 symbol, and can be
codified using four bits. For example, a distance tree can be empty if
there are no backward references. Similarly, alpha, red, and blue trees
can be empty if all pixels within the same meta Huffman code are
produced using the color cache.
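
The simple code length code can be read with a few calls to a bit reader. The sketch below makes three assumptions. First, it uses an LSB-first bit reader; `BitReader` and `BrReadBits` are hypothetical helpers. Second, it starts after the leading bit that selects the simple code. Third, `ReadSimpleCode` is a hypothetical name. With this layout the empty tree takes exactly four bits: the selector bit, `num_symbols - 1 = 0`, `first_symbol_len_code = 0` and the 1-bit symbol 0.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical LSB-first bit reader over a byte buffer. */
typedef struct {
  const uint8_t* data;
  size_t size;
  size_t bit_pos;
} BitReader;

/* Reads n bits (n < 32), least significant bit first.
   Bits past the end of the buffer read as 0. */
static uint32_t BrReadBits(BitReader* br, int n) {
  uint32_t value = 0;
  int i;
  for (i = 0; i < n; ++i) {
    size_t byte = br->bit_pos >> 3;
    uint32_t bit =
        (byte < br->size) ? (uint32_t)((br->data[byte] >> (br->bit_pos & 7)) & 1) : 0u;
    value |= bit << i;
    ++br->bit_pos;
  }
  return value;
}

/* Reads a simple code length code (after its selector bit) and stores
   its one or two symbols in symbols[]. Returns the number of symbols. */
static int ReadSimpleCode(BitReader* br, int symbols[2]) {
  int num_symbols = (int)BrReadBits(br, 1) + 1;
  int first_symbol_len_code = (int)BrReadBits(br, 1);
  /* 1-bit symbol (0 or 1), or 8-bit symbol in [0, 255]. */
  symbols[0] = (int)BrReadBits(br, first_symbol_len_code == 0 ? 1 : 8);
  if (num_symbols == 2) {
    symbols[1] = (int)BrReadBits(br, 8);  /* second symbol is always 8 bits */
  }
  return num_symbols;
}
```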

#### Normal code length code

The code lengths of a Huffman code are read as follows. _num_codes_
specifies the number of code lengths; the rest of the code lengths
(according to the order in _kCodeLengthCodeOrder_) are zeros.
|
||||
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
int kCodeLengthCodes = 19;
|
||||
@ -853,14 +866,19 @@ for (i = 0; i < num_codes; ++i) {
* Code length codes [0..15] indicate literal code lengths.
  * Value 0 means the corresponding symbol is not coded.
  * Values [1..15] indicate the bit length of the respective code.
* Code 16 repeats the previous non-zero value [3..6] times, i.e.,
  3 + ReadBits(2) times. If code 16 is used before a non-zero value
  has been emitted, a value of 8 is repeated.
* Code 17 emits a streak of zeros [3..10], i.e., 3 + ReadBits(3)
  times.
* Code 18 emits a streak of zeros of length [11..138], i.e.,
  11 + ReadBits(7) times.
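
The run-length scheme for code lengths can be sketched as a small expansion loop. To stay self-contained, the hypothetical `ExpandCodeLengths` does not read from the bitstream. Instead, it takes the already-decoded code length code symbols together with the values of their extra bits.

```c
/* Expands decoded code length code symbols into code lengths.
   codes[i] is a symbol in [0..18]; extra[i] is the value of the extra
   bits read right after it (used only for 16, 17 and 18).
   Returns the number of code lengths written, or -1 on invalid input. */
static int ExpandCodeLengths(const int* codes, const int* extra, int n,
                             int* lengths, int max_lengths) {
  int prev_nonzero = 8;  /* what code 16 repeats before any non-zero value */
  int pos = 0;
  int i;
  for (i = 0; i < n; ++i) {
    const int code = codes[i];
    int repeat, value;
    if (code >= 0 && code <= 15) {
      /* Literal code length; 0 marks an unused symbol. */
      if (pos >= max_lengths) return -1;
      lengths[pos++] = code;
      if (code != 0) prev_nonzero = code;
      continue;
    }
    if (code == 16) {
      repeat = 3 + extra[i];   /* 3 + ReadBits(2): [3..6] */
      value = prev_nonzero;
    } else if (code == 17) {
      repeat = 3 + extra[i];   /* 3 + ReadBits(3): [3..10] */
      value = 0;
    } else if (code == 18) {
      repeat = 11 + extra[i];  /* 11 + ReadBits(7): [11..138] */
      value = 0;
    } else {
      return -1;               /* not a valid code length code */
    }
    if (pos + repeat > max_lengths) return -1;
    while (repeat-- > 0) lengths[pos++] = value;
  }
  return pos;
}
```

For example, the sequence 16 (extra 0), 5, 16 (extra 1) produces 8, 8, 8, 5, 5, 5, 5, 5. The leading 8s come from code 16 being used before any non-zero value.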

The entropy codes for alpha, red and blue each have an alphabet of 256
symbols. The entropy code for distance prefix codes has 40 symbols. The
entropy code for green has 256 + 24 + _color_cache_size_ symbols: 256
symbols for the different green values, 24 length code prefix symbols,
and _color_cache_size_ symbols for the color cache.
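
The alphabet sizes can be collected into one helper, for example to size a table per Huffman code. `AlphabetSize` is a hypothetical name. It uses the _k_ ordering given with _huffman_code[5 * meta_code + k]_ (green, red, blue, alpha, distance) and assumes _color_cache_size_ is 0 when no color cache is present.

```c
/* Number of symbols in Huffman code k of a meta Huffman code:
   k = 0 green, 1 red, 2 blue, 3 alpha, 4 distance.
   Returns -1 for an invalid k. */
static int AlphabetSize(int k, int color_cache_size) {
  switch (k) {
    case 0:  /* green values + length prefixes + color cache codes */
      return 256 + 24 + color_cache_size;
    case 1:
    case 2:
    case 3:  /* red, blue, alpha */
      return 256;
    case 4:  /* distance prefix codes */
      return 40;
    default:
      return -1;
  }
}
```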

The meta Huffman code, specified in the next section, defines how many
Huffman codes there are. There are always 5 times the number of Huffman
@ -878,13 +896,14 @@ codes 0, 1, 2, 3 and 4 for green, alpha, red, blue and distance,
respectively. This meta Huffman code is used everywhere in the image.

If this bit is one, the meta Huffman codes are controlled by the entropy
image, where the index of the meta Huffman code is codified in the red
and green components. The index can be obtained from the uint32 value by
_((pixel >> 8) & 0xffff)_; thus there can be up to 65536 unique meta
Huffman codes. When decoding a Huffman-encoded symbol at a pixel (x, y),
one chooses the meta Huffman code corresponding to these coordinates.
However, not all bits of the coordinates are used for choosing the meta
Huffman code; the entropy image has a lower resolution than the real
image.
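
Extracting the meta Huffman code index from an entropy image pixel takes one shift and one mask. The 16-bit result is why at most 65536 meta Huffman codes can be addressed. `MetaCodeIndex` is a hypothetical name.

```c
#include <stdint.h>

/* Meta Huffman code index stored in the red (bits 23..16) and green
   (bits 15..8) components of an ARGB entropy image pixel. */
static uint32_t MetaCodeIndex(uint32_t pixel) {
  return (pixel >> 8) & 0xffff;
}
```

For the pixel 0xff123456, the index is 0x1234; the alpha and blue components are ignored.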
|
||||
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
int huffman_bits = ReadBits(4);
|
||||
@ -897,24 +916,25 @@ _huffman_bits_ gives the amount of subsampling in the entropy image.
After reading _huffman_bits_, an entropy image stream of size
_huffman_xsize_ by _huffman_ysize_ is read.

The meta Huffman code, identifying the five Huffman codes per meta
Huffman code, is coded only by the number of codes:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
// max() over the meta Huffman code indices, (pixel >> 8) & 0xffff
int num_meta_codes = max(entropy_image) + 1;
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Now, we can obtain the five Huffman codes for green, alpha, red, blue
and distance for a given (x, y) by the following expression:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
meta_codes[(entropy_image[(y >> huffman_bits) * huffman_xsize +
                          (x >> huffman_bits)] >> 8) & 0xffff]
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
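
Combining the subsampling and the index extraction, the lookup for a pixel (x, y) might be written as below. This excerpt does not show how _huffman_xsize_ is computed. The sketch assumes it is the image width divided by `1 << huffman_bits`, rounded up. `SubsampledSize` and `MetaCodeAt` are hypothetical helpers.

```c
#include <stdint.h>

/* Assumed definition: ceil(size / 2^huffman_bits). */
static int SubsampledSize(int size, int huffman_bits) {
  return (size + (1 << huffman_bits) - 1) >> huffman_bits;
}

/* Returns the meta Huffman code index that applies to pixel (x, y):
   drop the low huffman_bits bits of each coordinate, read that entropy
   image pixel, and extract its red and green components. */
static uint32_t MetaCodeAt(const uint32_t* entropy_image, int huffman_xsize,
                           int huffman_bits, int x, int y) {
  uint32_t pixel = entropy_image[(y >> huffman_bits) * huffman_xsize +
                                 (x >> huffman_bits)];
  return (pixel >> 8) & 0xffff;
}
```

For a 10 x 6 image with _huffman_bits_ = 2, this gives a 3 x 2 entropy image, and pixel (9, 5) uses entropy image entry 1 * 3 + 2 = 5.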

For _huffman_code[5 * meta_code + k]_, the code with _k_ == 0 is for
the green & length code, the code with _k_ == 4 is for the distance
code, and the codes with _k_ == 1, 2 and 3 are for red, blue and alpha,
respectively, each with an alphabet of 256 symbols.
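
The flat indexing of the Huffman code table can be sketched as follows. It uses hypothetical names for the five values of _k_, in the order green, red, blue, alpha, distance. Each meta Huffman code owns five consecutive slots, so the table holds five times as many Huffman codes as there are meta Huffman codes.

```c
/* Hypothetical names for k in huffman_code[5 * meta_code + k]. */
enum { kGreen = 0, kRed = 1, kBlue = 2, kAlpha = 3, kDistance = 4 };

/* Total number of Huffman codes: five per meta Huffman code. */
static int NumHuffmanCodes(int num_meta_codes) {
  return 5 * num_meta_codes;
}

/* Position of Huffman code k of a meta Huffman code in huffman_code[]. */
static int HuffmanCodeSlot(int meta_code, int k) {
  return 5 * meta_code + k;
}
```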

The value of k for the reference position in _meta_code_ determines the
length of the Huffman code:
@ -928,8 +948,8 @@ Overall Structure of the Format
-------------------------------

Below is an eagle's-eye view of the format in Backus-Naur form. It
does not cover all details. End-of-image (EOI) is only implicitly coded
into the number of pixels (xsize * ysize).

#### Basic structure