Mirror of https://github.com/webmproject/libwebp.git

Edit for consistency, usage and grammar.

Substantial edit, though less than 100% thorough. This makes changes
that are clearly safe, but avoids others where my domain knowledge is
incomplete and accuracy might be compromised.

modified: doc/webp-lossless-bitstream-spec.txt

Change-Id: I89361a2e1157b8d2e44a8b4f4603f65833f0c1e6

		| @@ -12,7 +12,7 @@ end of this file. | ||||
| Specification for WebP Lossless Bitstream | ||||
| ========================================= | ||||
|  | ||||
| _2012-06-08_ | ||||
| _2012-06-19_ | ||||
|  | ||||
|  | ||||
| Abstract | ||||
| @@ -26,8 +26,8 @@ itself, for storing statistical data about the images, such as the used | ||||
| entropy codes, spatial predictors, color space conversion, and color | ||||
| table. LZ77, Huffman coding, and a color cache are used for compression | ||||
| of the bulk data. Decoding speeds faster than PNG have been | ||||
| demonstrated, as well as 25 % denser compression than what can be | ||||
| achieved using today's PNG format. | ||||
| demonstrated, as well as 25% denser compression than can be achieved | ||||
| using today's PNG format. | ||||
|  | ||||
|  | ||||
| * TOC placeholder | ||||
| @@ -44,53 +44,52 @@ ARGB image | ||||
| : A two-dimensional array containing ARGB pixels. | ||||
|  | ||||
| color cache | ||||
| : A small hash-addressed array to store recently used colors | ||||
|   and to be able to recall them with shorter codes. | ||||
| : A small hash-addressed array to store recently used colors, to be able | ||||
|   to recall them with shorter codes. | ||||
|  | ||||
| color indexing image | ||||
| : A one-dimensional image of colors that can be | ||||
|   indexed using a small integer (up to 256 within WebP lossless). | ||||
| : A one-dimensional image of colors that can be indexed using a small | ||||
|   integer (up to 256 within WebP lossless). | ||||
|  | ||||
| color transform image | ||||
| : A two-dimensional subresolution image containing | ||||
|   data about correlations of color components. | ||||
| : A two-dimensional subresolution image containing data about | ||||
|   correlations of color components. | ||||
|  | ||||
| distance mapping | ||||
| : Changes LZ77 distances to have the smallest values for | ||||
|   pixels in 2d proximity. | ||||
| : Changes LZ77 distances to have the smallest values for pixels in 2D | ||||
|   proximity. | ||||
|  | ||||
| entropy image | ||||
| : A two-dimensional subresolution image indicating which | ||||
|   entropy coding should be used in a respective square in the image, | ||||
|   i.e., each pixel is a meta Huffman code. | ||||
| : A two-dimensional subresolution image indicating which entropy coding | ||||
|   should be used in a respective square in the image, i.e., each pixel | ||||
|   is a meta Huffman code. | ||||
|  | ||||
| Huffman code | ||||
| : A classic way to do entropy coding where a smaller number of | ||||
|   bits are used for more frequent codes. | ||||
| : A classic way to do entropy coding where a smaller number of bits are | ||||
|   used for more frequent codes. | ||||
|  | ||||
| LZ77 | ||||
| : Dictionary-based sliding window compression algorithm that either | ||||
|   emits symbols or describes them as sequences of past symbols. | ||||
|  | ||||
| meta Huffman code | ||||
| : A small integer (up to 16 bits) that indexes an element | ||||
|   in the meta Huffman table. | ||||
| : A small integer (up to 16 bits) that indexes an element in the meta | ||||
|   Huffman table. | ||||
|  | ||||
| predictor image | ||||
| : A two-dimensional subresolution image indicating which | ||||
|   spatial predictor is used for a particular square in the image. | ||||
| : A two-dimensional subresolution image indicating which spatial | ||||
|   predictor is used for a particular square in the image. | ||||
|  | ||||
| prefix coding | ||||
| : A way to entropy code larger integers that codes a few bits | ||||
|   of the integer using an entropy code and codifies the remaining bits | ||||
|   raw. This allows for the descriptions of the entropy codes to remain | ||||
| : A way to entropy code larger integers that codes a few bits of the | ||||
|   integer using an entropy code and codifies the remaining bits raw. | ||||
|   This allows for the descriptions of the entropy codes to remain | ||||
|   relatively small even when the range of symbols is large. | ||||
|  | ||||
| scan-line order | ||||
| : A processing order of pixels, left-to-right, top-to- | ||||
|   bottom, starting from the left-hand-top pixel, proceeding towards | ||||
|   right. Once a row is completed, continue from the left-hand column of | ||||
|   the next row. | ||||
| : A processing order of pixels, left-to-right, top-to-bottom, starting | ||||
|   from the left-hand-top pixel, proceeding to the right. Once a row is | ||||
|   completed, continue from the left-hand column of the next row. | ||||
|  | ||||
|  | ||||
| 1 Introduction | ||||
| @@ -100,15 +99,14 @@ This document describes the compressed data representation of a WebP | ||||
| lossless image. It is intended as a detailed reference for WebP lossless | ||||
| encoder and decoder implementation. | ||||
|  | ||||
| In this document, we use extensively the syntax of the C programming | ||||
| language to describe the bitstream, and assume the existence of a | ||||
| function for reading bits, `ReadBits(n)`. The bytes are read in the | ||||
| natural order of the stream containing them, and bits of each byte are | ||||
| read in the least-significant-bit-first order. When multiple bits are | ||||
| read at the same time the integer is constructed from the original data | ||||
| in the original order, the most significant bits of the returned | ||||
| integer are also the most significant bits of the original data. Thus | ||||
| the statement | ||||
| In this document, we extensively use C programming language syntax to | ||||
| describe the bitstream, and assume the existence of a function for | ||||
| reading bits, `ReadBits(n)`. The bytes are read in the natural order of | ||||
| the stream containing them, and bits of each byte are read in | ||||
| least-significant-bit-first order. When multiple bits are read at the | ||||
| same time, the integer is constructed from the original data in the | ||||
| original order. The most significant bits of the returned integer are | ||||
| also the most significant bits of the original data. Thus the statement | ||||
|  | ||||
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | ||||
| b = ReadBits(2); | ||||
| @@ -130,35 +128,32 @@ bits 23..16, green in bits 15..8 and blue in bits 7..0, but | ||||
| implementations of the format are free to use another representation | ||||
| internally. | ||||
|  | ||||
| Broadly a WebP lossless image contains header data, transform | ||||
| Broadly, a WebP lossless image contains header data, transform | ||||
| information and actual image data. Headers contain width and height of | ||||
| the image. A WebP lossless image can go through five different types of | ||||
| transformation before being entropy encoded. The transform information | ||||
| in the bitstream contains the required data to apply the respective | ||||
| in the bitstream contains the data required to apply the respective | ||||
| inverse transforms. | ||||
|  | ||||
|  | ||||
| 2 RIFF Header | ||||
| ------------- | ||||
|  | ||||
| The beginning of the header has the RIFF container. This consist of the | ||||
| The beginning of the header has the RIFF container. This consists of the | ||||
| following 21 bytes: | ||||
|  | ||||
|    1. String "RIFF" | ||||
|    2. A little-endian 32 bit value of the block length, the whole size | ||||
|       of the block controlled by the RIFF header. Normally this equals | ||||
|       the payload size (file size subtracted by 8 bytes, i.e., 4 bytes | ||||
|       for 'RIFF' identifier and 4 bytes for storing this value itself). | ||||
|       the payload size (file size minus 8 bytes: 4 bytes for the 'RIFF' | ||||
|       identifier and 4 bytes for storing the value itself). | ||||
|    3. String "WEBP" (RIFF container name). | ||||
|    4. String "VP8L" (chunk tag for lossless encoded image data). | ||||
|    5. A little-endian 32-bit value of the number of bytes in the | ||||
|       lossless stream. | ||||
|    6. One byte signature 0x64. Decoders need to accept also 0x65 as a | ||||
|       valid stream, it has a planned future use. Today, a solid white | ||||
|       image of the specified size should be shown for images having a | ||||
|       0x2f signature. | ||||
|    6. One byte signature 0x2f. | ||||
|  | ||||
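| For illustration only, a decoder might check these 21 bytes as in the | ||||
| following sketch. The buffer `data` and the helper `GetLE32` (reading a | ||||
| little-endian 32-bit value) are names used only here, not part of the | ||||
| format: | ||||
|  | ||||
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | ||||
| // Sketch: validate the RIFF container described above. | ||||
| uint32 riff_size = GetLE32(data + 4);   // whole block: file size - 8 | ||||
| uint32 vp8l_size = GetLE32(data + 16);  // bytes in the lossless stream | ||||
| int ok = !memcmp(data, "RIFF", 4) && | ||||
|          !memcmp(data + 8, "WEBP", 4) && | ||||
|          !memcmp(data + 12, "VP8L", 4) && | ||||
|          data[20] == 0x2f;              // one byte signature | ||||
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | ||||
|  | ||||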
| First 28 bits of the bitstream specify the width and height of the | ||||
| The first 28 bits of the bitstream specify the width and height of the | ||||
| image. Width and height are decoded as 14-bit integers as follows: | ||||
|  | ||||
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | ||||
| @@ -169,6 +164,21 @@ int image_height = ReadBits(14) + 1; | ||||
| The 14-bit dynamics for image size limit the maximum size of a WebP | ||||
| lossless image to 16384✕16384 pixels. | ||||
|  | ||||
| The alpha_is_used bit is a hint only, and should not impact decoding. | ||||
| It should be set to 0 when all alpha values are 255 in the picture, and | ||||
| 1 otherwise. | ||||
|  | ||||
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | ||||
| int alpha_is_used = ReadBits(1); | ||||
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | ||||
|  | ||||
| The version_number is a 3-bit code that must be discarded by the decoder | ||||
| at this time. Complying encoders write a 3-bit value 0. | ||||
|  | ||||
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | ||||
| int version_number = ReadBits(3); | ||||
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | ||||
|  | ||||
|  | ||||
| 3 Transformations | ||||
| ----------------- | ||||
| @@ -177,9 +187,9 @@ Transformations are reversible manipulations of the image data that can | ||||
| reduce the remaining symbolic entropy by modeling spatial and color | ||||
| correlations. Transformations can make the final compression more dense. | ||||
|  | ||||
| An image can go through four types of transformations. A 1 bit indicates | ||||
| the presence of a transform. Every transform is allowed to be used only | ||||
| once. The transformations are used only for the main level ARGB image -- | ||||
| An image can go through four types of transformation. A 1 bit indicates | ||||
| the presence of a transform. Each transform is allowed to be used only | ||||
| once. The transformations are used only for the main level ARGB image: | ||||
| the subresolution images have no transforms, not even the 0 bit | ||||
| indicating the end-of-transforms. | ||||
|  | ||||
| @@ -195,7 +205,7 @@ while (ReadBits(1)) {  // Transform present. | ||||
|   ... | ||||
| } | ||||
|  | ||||
| // Decode actual image data (section 4). | ||||
| // Decode actual image data (Section 4). | ||||
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | ||||
|  | ||||
| If a transform is present then the next two bits specify the transform | ||||
| @@ -211,12 +221,12 @@ enum TransformType { | ||||
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | ||||
|  | ||||
| The transform type is followed by the transform data. Transform data | ||||
| contains the required information to apply the inverse transform and | ||||
| contains the information required to apply the inverse transform and | ||||
| depends on the transform type. Next we describe the transform data for | ||||
| different types. | ||||
|  | ||||
|  | ||||
| ### Predictor transform | ||||
| ### Predictor Transform | ||||
|  | ||||
| The predictor transform can be used to reduce entropy by exploiting the | ||||
| fact that neighboring pixels are often correlated. In the predictor | ||||
| @@ -227,11 +237,11 @@ prediction to use. We divide the image into squares and all the pixels | ||||
| in a square use same prediction mode. | ||||
|  | ||||
| The first 4 bits of prediction data define the block width and height in | ||||
| number of bits. The number of block columns, _block_xsize_, is used in | ||||
| number of bits. The number of block columns, `block_xsize`, is used in | ||||
| indexing two-dimensionally. | ||||
|  | ||||
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | ||||
| int size_bits = ReadBits(4); | ||||
| int size_bits = ReadBits(3) + 2; | ||||
| int block_width = (1 << size_bits); | ||||
| int block_height = (1 << size_bits); | ||||
| #define DIV_ROUND_UP(num, den) (((num) + (den) - 1) / (den)) | ||||
| @@ -241,7 +251,8 @@ int block_xsize = DIV_ROUND_UP(image_width, 1 << size_bits); | ||||
| The transform data contains the prediction mode for each block of the | ||||
| image. All the `block_width * block_height` pixels of a block use same | ||||
| prediction mode. The prediction modes are treated as pixels of an image | ||||
| and encoded using the same techniques described in chapter 4. | ||||
| and encoded using the same techniques described in | ||||
| [Chapter 4](#image-data). | ||||
|  | ||||
| For a pixel _x, y_, one can compute the respective filter block address | ||||
| by: | ||||
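|  | ||||
| A sketch of that computation (the spec's own snippet is not shown in | ||||
| this hunk; `size_bits` and `block_xsize` are as defined above, and | ||||
| row-major block indexing is assumed): | ||||
|  | ||||
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | ||||
| // Sketch: index of the block containing pixel (x, y). | ||||
| int block_index = (y >> size_bits) * block_xsize + (x >> size_bits); | ||||
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | ||||
|  | ||||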
| @@ -258,7 +269,6 @@ whose values are already known. | ||||
| We choose the neighboring pixels (TL, T, TR, and L) of the current pixel | ||||
| (P) as follows: | ||||
|  | ||||
|  | ||||
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | ||||
| O    O    O    O    O    O    O    O    O    O    O | ||||
| O    O    O    O    O    O    O    O    O    O    O | ||||
| @@ -289,8 +299,8 @@ defined as follows. | ||||
| |  9     | Average2(T, TR)                                         | | ||||
| | 10     | Average2(Average2(L, TL), Average2(T, TR))              | | ||||
| | 11     | Select(L, T, TL)                                        | | ||||
| | 12     | ClampedAddSubtractFull(L, T, TL)                        | | ||||
| | 13     | ClampedAddSubtractHalf(Average2(L, T), TL)              | | ||||
| | 12     | ClampAddSubtractFull(L, T, TL)                          | | ||||
| | 13     | ClampAddSubtractHalf(Average2(L, T), TL)                | | ||||
|  | ||||
|  | ||||
| `Average2` is defined as follows for each ARGB component: | ||||
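|  | ||||
| (The definition itself lies outside this hunk; a sketch of the | ||||
| per-component average:) | ||||
|  | ||||
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | ||||
| // Sketch: per-component average, truncating towards zero. | ||||
| uint8 Average2(uint8 a, uint8 b) { | ||||
|   return (a + b) / 2; | ||||
| } | ||||
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | ||||
|  | ||||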
| @@ -328,7 +338,7 @@ uint32 Select(uint32 L, uint32 T, uint32 TL) { | ||||
| } | ||||
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | ||||
|  | ||||
| The function `ClampedAddSubstractFull` and `ClampedAddSubstractHalf` are | ||||
| The functions `ClampAddSubtractFull` and `ClampAddSubtractHalf` are | ||||
| performed for each ARGB component as follows: | ||||
|  | ||||
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | ||||
| @@ -383,24 +393,14 @@ typedef struct { | ||||
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | ||||
|  | ||||
| The actual color transformation is done by defining a color transform | ||||
| delta. The color transform delta depends on the `ColorTransformElement` | ||||
| which is same for all the pixels in a particular block. The delta is | ||||
| delta. The color transform delta depends on the `ColorTransformElement`, | ||||
| which is the same for all the pixels in a particular block. The delta is | ||||
| added during color transform. The inverse color transform then is just | ||||
| subtracting those deltas. | ||||
|  | ||||
| The color transform function is defined as follows: | ||||
|  | ||||
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | ||||
| /* | ||||
|  * Input: | ||||
|  * red, green, blue values of the pixel | ||||
|  * trans: Color transform element of the block where the | ||||
|  *        pixel belongs to. | ||||
|  * | ||||
|  * Output: | ||||
|  * *new_red = transformed value of red | ||||
|  * *new_blue = transformed value of blue | ||||
|  */ | ||||
| void ColorTransform(uint8 red, uint8 blue, uint8 green, | ||||
|                     ColorTransformElement *trans, | ||||
|                     uint8 *new_red, uint8 *new_blue) { | ||||
| @@ -429,7 +429,7 @@ int8 ColorTransformDelta(int8 t, int8 c) { | ||||
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | ||||
|  | ||||
| The multiplication is to be done using more precision (with at least | ||||
| 16 bit dynamics). The sign extension property of the shift operation | ||||
| 16-bit dynamics). The sign extension property of the shift operation | ||||
| does not matter here: only the lowest 8 bits are used from the result, | ||||
| and there the sign extension shifting and unsigned shifting are | ||||
| consistent with each other. | ||||
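|  | ||||
| For example, the delta can be computed with a 16-bit intermediate; a | ||||
| sketch (the truncation to `int8` keeps only the low 8 bits, so the | ||||
| choice of shift does not matter): | ||||
|  | ||||
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | ||||
| // Sketch: ColorTransformDelta with a wider intermediate type. | ||||
| int8 ColorTransformDelta(int8 t, int8 c) { | ||||
|   return (int8)(((int16)t * (int16)c) >> 5); | ||||
| } | ||||
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | ||||
|  | ||||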
| @@ -441,33 +441,26 @@ width and height of the image block in number of bits, just like the | ||||
| predictor transform: | ||||
|  | ||||
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | ||||
| int size_bits = ReadStream(4); | ||||
| int size_bits = ReadStream(3) + 2; | ||||
| int block_width = 1 << size_bits; | ||||
| int block_height = 1 << size_bits; | ||||
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | ||||
|  | ||||
| The remaining part of the color transform data contains | ||||
| ColorTransformElement instances corresponding to each block of the | ||||
| image. ColorTransformElement instances are treated as pixels of an image | ||||
| and encoded using the methods described in section 4. | ||||
| `ColorTransformElement` instances corresponding to each block of the | ||||
| image. `ColorTransformElement` instances are treated as pixels of an | ||||
| image and encoded using the methods described in | ||||
| [Chapter 4](#image-data). | ||||
|  | ||||
| During decoding ColorTransformElement instances of the blocks are | ||||
| During decoding, `ColorTransformElement` instances of the blocks are | ||||
| decoded and the inverse color transform is applied on the ARGB values of | ||||
| the pixels. As mentioned earlier that inverse color transform is just | ||||
| subtracting ColorTransformElement values from the red and blue channels. | ||||
| the pixels. As mentioned earlier, that inverse color transform is just | ||||
| subtracting `ColorTransformElement` values from the red and blue | ||||
| channels. | ||||
|  | ||||
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | ||||
| /* | ||||
|  * Input: | ||||
|  * red, blue and green values in the current state. | ||||
|  * trans: Color transform element of the corresponding to the | ||||
|  * block of the current pixel. | ||||
|  * | ||||
|  * Output: | ||||
|  * new_red, new_blue: red, blue values after inverse transform. | ||||
|  */ | ||||
| void InverseTransform(uint8 red, uint8 green, uint8 blue, | ||||
|                       ColorTransfromElement *p, | ||||
|                       ColorTransformElement *p, | ||||
|                       uint8 *new_red, uint8 *new_blue) { | ||||
|   // Applying inverse transform is just subtracting the | ||||
|   // color transform deltas | ||||
| @@ -497,32 +490,31 @@ void AddGreenToBlueAndRed(uint8 green, uint8 *red, uint8 *blue) { | ||||
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | ||||
|  | ||||
| This transform is redundant as it can be modeled using the color | ||||
| transform. This transform is still often useful, and since it can extend | ||||
| the dynamics of the color transform, and there is no additional data | ||||
| here, this transform can be coded using less bits than a full blown | ||||
| color transform. | ||||
| transform, but it is still often useful. Since it can extend the | ||||
| dynamics of the color transform and there is no additional data here, | ||||
| the subtract green transform can be coded using fewer bits than a | ||||
| full-blown color transform. | ||||
|  | ||||
|  | ||||
| ### Color Indexing Transform | ||||
|  | ||||
| If there are not many unique values of the pixels then it may be more | ||||
| efficient to create a color index array and replace the pixel values by | ||||
| the indices to this color index array. Color indexing transform is used | ||||
| to achieve that. In the context of the WebP lossless, we specifically do | ||||
| not call this transform a palette transform, since another slightly | ||||
| similar, but more dynamic concept exists within WebP lossless encoding, | ||||
| called color cache. | ||||
| If there are not many unique pixel values, it may be more efficient to | ||||
| create a color index array and replace the pixel values by the array's | ||||
| indices. The color indexing transform achieves this. (In the context of | ||||
| WebP lossless, we specifically do not call this a palette transform | ||||
| because a similar but more dynamic concept exists in WebP lossless | ||||
| encoding: color cache.) | ||||
|  | ||||
| The color indexing transform checks for the number of unique ARGB values | ||||
| in the image. If that number is below a threshold (256), it creates an | ||||
| array of those ARGB values is created which replaces the pixel values | ||||
| with the corresponding index. The green channel of the pixels are | ||||
| replaced with the index, all alpha values are set to 255, all red and | ||||
| array of those ARGB values, which is then used to replace the pixel | ||||
| values with the corresponding index: the green channel of each pixel is | ||||
| replaced with the index; all alpha values are set to 255; all red and | ||||
| blue values to 0. | ||||
|  | ||||
| The transform data contains color table size and the entries in the | ||||
| color table. The decoder reads the color indexing transform data as | ||||
| follow: | ||||
| follows: | ||||
|  | ||||
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | ||||
| // 8 bit value for color table size | ||||
| @@ -531,13 +523,13 @@ int color_table_size = ReadStream(8) + 1; | ||||
|  | ||||
| The color table is stored using the image storage format itself. The | ||||
| color table can be obtained by reading an image, without the RIFF | ||||
| header, image size, and transforms, assuming an height of one pixel, and | ||||
| a width of color_table_size. The color table is always subtraction coded | ||||
| for reducing the entropy of this image. The deltas of palette colors | ||||
| contain typically much less entropy than the colors themselves leading | ||||
| header, image size, and transforms, assuming a height of one pixel and | ||||
| a width of `color_table_size`. The color table is always | ||||
| subtraction-coded to reduce image entropy. The deltas of palette colors | ||||
| contain typically much less entropy than the colors themselves, leading | ||||
| to significant savings for smaller images. In decoding, every final | ||||
| color in the color table can be obtained by adding the previous color | ||||
| component values, by each ARGB-component separately and storing the | ||||
| component values by each ARGB component separately, and storing the | ||||
| least significant 8 bits of the result. | ||||
|  | ||||
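| Undoing the subtraction coding is a per-component running sum modulo | ||||
| 256; a sketch (assuming the decoded one-row image is already stored in | ||||
| `color_table[]`): | ||||
|  | ||||
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | ||||
| // Sketch: undo the subtraction coding of the color table by adding | ||||
| // the previous entry per ARGB component, keeping the low 8 bits. | ||||
| for (i = 1; i < color_table_size; ++i) { | ||||
|   uint32 prev = color_table[i - 1]; | ||||
|   uint32 delta = color_table[i]; | ||||
|   uint32 sum = 0; | ||||
|   for (shift = 0; shift < 32; shift += 8) { | ||||
|     sum |= (((prev >> shift) + (delta >> shift)) & 0xff) << shift; | ||||
|   } | ||||
|   color_table[i] = sum; | ||||
| } | ||||
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | ||||
|  | ||||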
| The inverse transform for the image is simply replacing the pixel values | ||||
| @@ -550,46 +542,48 @@ color. | ||||
| argb = color_table[GREEN(argb)]; | ||||
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | ||||
|  | ||||
| When the color table is of a small size (equal to or less than 16 | ||||
| colors), several pixels are bundled into a single pixel. The pixel | ||||
| bundling packs several (2, 4, or 8) pixels into a single pixel reducing | ||||
| the image width respectively. Pixel bundling allows for a more efficient | ||||
| joint distribution entropy coding of neighboring pixels, and gives some | ||||
| arithmetic coding like benefits to the entropy code, but it can only be | ||||
| used when there is a small amount of unique values. | ||||
| When the color table is small (equal to or less than 16 colors), several | ||||
| pixels are bundled into a single pixel. The pixel bundling packs several | ||||
| (2, 4, or 8) pixels into a single pixel, reducing the image width | ||||
| respectively. Pixel bundling allows for a more efficient joint | ||||
| distribution entropy coding of neighboring pixels, and gives some | ||||
| arithmetic coding-like benefits to the entropy code, but it can only be | ||||
| used when there are a small number of unique values. | ||||
|  | ||||
| color_table_size specifies how many pixels are combined together: | ||||
| `color_table_size` specifies how many pixels are combined together: | ||||
|  | ||||
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | ||||
| int width_bits = 0; | ||||
| int width_bits; | ||||
| if (color_table_size <= 2) { | ||||
|   width_bits = 3; | ||||
| } else if (color_table_size <= 4) { | ||||
|   width_bits = 2; | ||||
| } else if (color_table_size <= 16) { | ||||
|   width_bits = 1; | ||||
| } else { | ||||
|   width_bits = 0; | ||||
| } | ||||
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | ||||
|  | ||||
| The _width_bits_ has a value of 0, 1, 2 or 3. A value of 0 indicates no | ||||
| `width_bits` has a value of 0, 1, 2 or 3. A value of 0 indicates no | ||||
| pixel bundling to be done for the image. A value of 1 indicates that two | ||||
| pixels are combined together, and each pixel has a range of [0..15]. A | ||||
| value of 2 indicates that four pixels are combined together, and each | ||||
| pixel has a range of [0..3]. A value of 3 indicates that eight pixels | ||||
| are combined together and each pixels has a range of [0..1], i.e., a | ||||
| are combined together and each pixel has a range of [0..1], i.e., a | ||||
| binary value. | ||||
|  | ||||
| The values are packed into the green component as follows: | ||||
|  | ||||
|   * _width_bits_ = 1: for every x value where x ≡ 0 (mod 2), a green | ||||
|   * `width_bits` = 1: for every x value where x ≡ 0 (mod 2), a green | ||||
|     value at x is positioned into the 4 least-significant bits of the | ||||
|     green value at x / 2, a green value at x + 1 is positioned into the | ||||
|     4 most-significant bits of the green value at x / 2. | ||||
|   * _width_bits_ = 2: for every x value where x ≡ 0 (mod 4), a green | ||||
|   * `width_bits` = 2: for every x value where x ≡ 0 (mod 4), a green | ||||
|     value at x is positioned into the 2 least-significant bits of the | ||||
|     green value at x / 4, green values at x + 1 to x + 3 in order to the | ||||
|     more significant bits of the green value at x / 4. | ||||
|   * _width_bits_ = 3: for every x value where x ≡ 0 (mod 8), a green | ||||
|   * `width_bits` = 3: for every x value where x ≡ 0 (mod 8), a green | ||||
|     value at x is positioned into the least-significant bit of the green | ||||
|     value at x / 8, green values at x + 1 to x + 7 in order to the more | ||||
|     significant bits of the green value at x / 8. | ||||
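|  | ||||
| For decoding, this packing is undone by extracting the sub-pixel's bits | ||||
| from the bundled green value; a sketch (`bundled_row` holds one row of | ||||
| the bundled image and is a name used only here): | ||||
|  | ||||
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | ||||
| // Sketch: recover the color table index of pixel x from the bundled | ||||
| // image, then map it through the color table. | ||||
| int bits_per_entry = 8 >> width_bits;            // 8, 4, 2 or 1 | ||||
| int mask = (1 << bits_per_entry) - 1; | ||||
| int packed_x = x >> width_bits;                  // bundled column | ||||
| int shift = (x & ((1 << width_bits) - 1)) * bits_per_entry; | ||||
| int idx = (GREEN(bundled_row[packed_x]) >> shift) & mask; | ||||
| uint32 argb = color_table[idx]; | ||||
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | ||||
|  | ||||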
| @@ -607,12 +601,12 @@ to entropy coding, and three further roles related to transforms. | ||||
|      code used in a particular area of the image. | ||||
|   3. Predictor image. The green component defines which of the 14 values | ||||
|      is used within a particular square of the image. | ||||
|   4. Color indexing image. An array of up to 256 ARGB colors are used | ||||
|      for transforming a green-only image, using the green value as an | ||||
|      index to this one-dimensional array. | ||||
|   4. Color indexing image. An array of up to 256 ARGB colors is used for | ||||
|      transforming a green-only image, using the green value as an index | ||||
|      to this one-dimensional array. | ||||
|   5. Color transformation image. Defines signed 3.5 fixed-point | ||||
|      multipliers that are used to predict the red, green, blue | ||||
|      components to reduce entropy. | ||||
|      multipliers that are used to predict the red, green, and blue | ||||
|      components, to reduce entropy. | ||||
|  | ||||
| To divide the image into multiple regions, the image is first divided | ||||
| into a set of fixed-size blocks (typically 16x16 blocks). Each of these | ||||
| @@ -622,28 +616,29 @@ an entropy code, and in order to minimize this cost, statistically | ||||
| similar blocks can share an entropy code. The blocks sharing an entropy | ||||
| code can be found by clustering their statistical properties, or by | ||||
| repeatedly joining two randomly selected clusters when it reduces the | ||||
| overall amount of bits needed to encode the image. [See section | ||||
| _"Decoding of meta Huffman codes"_ in Chapter 5 for an explanation of | ||||
| how this _entropy image_ is stored.] | ||||
| overall amount of bits needed to encode the image. See the section | ||||
| [Decoding of Meta Huffman Codes](#decoding-of-meta-huffman-codes) in | ||||
| [Chapter 5](#entropy-code) for an explanation of how this entropy image | ||||
| is stored. | ||||
|  | ||||
| Each pixel is encoded using one of three possible methods: | ||||
|  | ||||
|   1. Huffman coded literals, where each channel (green, alpha, red, | ||||
|      blue) is entropy-coded independently, | ||||
|      blue) is entropy-coded independently; | ||||
|   2. LZ77, a sequence of pixels in scan-line order copied from elsewhere | ||||
|      in the image, or, | ||||
|      in the image; or | ||||
|   3. Color cache, using a short multiplicative hash code (color cache | ||||
|      index) of a recently seen color. | ||||
|  | ||||
| In the following sections we introduce the main concepts in LZ77 prefix | ||||
| coding, LZ77 entropy coding, LZ77 distance mapping, and color cache | ||||
| codes. The actual details of the entropy code are described in more | ||||
| detail in chapter 5. | ||||
| detail in [Chapter 5](#entropy-code). | ||||
|  | ||||
|  | ||||
| ### LZ77 prefix coding | ||||
| ### LZ77 Prefix Coding | ||||
|  | ||||
| Prefix coding divides large integer values into two parts, the prefix | ||||
| Prefix coding divides large integer values into two parts: the prefix | ||||
| code and the extra bits. The benefit of this approach is that entropy | ||||
| coding is later used only for the prefix code, reducing the resources | ||||
| needed by the entropy code. The extra bits are stored as they are, | ||||
| @@ -652,9 +647,9 @@ without an entropy code. | ||||
| This prefix code is used for coding backward reference lengths and | ||||
| distances. The extra bits form an integer that is added to the lower | ||||
| value of the range. Hence the LZ77 lengths and distances are divided | ||||
| into prefix codes and extra bits performing the Huffman coding only on | ||||
| into prefix codes and extra bits. Performing the Huffman coding only on | ||||
| the prefixes reduces the size of the Huffman codes to tens of values | ||||
| instead of otherwise a million (distance) or several thousands (length). | ||||
| instead of a million (distance) or several thousands (length). | ||||
|  | ||||
| | Prefix code | Value range     | Extra bits | | ||||
| | ----------- | --------------- | ---------- | | ||||
| @@ -676,13 +671,13 @@ The code to obtain a value from the prefix code is as follows: | ||||
| if (prefix_code < 4) { | ||||
|   return prefix_code; | ||||
| } | ||||
| uint32 extra_bits = (prefix_code - 2) >> 1; | ||||
| uint32 offset = (2 + (prefix_code & 1)) << extra_bits; | ||||
| int extra_bits = (prefix_code - 2) >> 1; | ||||
| int offset = (2 + (prefix_code & 1)) << extra_bits; | ||||
| return offset + ReadBits(extra_bits) + 1; | ||||
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | ||||
|  | ||||
|  | ||||
| ### LZ77 backward reference entropy coding | ||||
| ### LZ77 Backward Reference Entropy Coding | ||||
|  | ||||
| Backward references are tuples of length and distance. Length indicates | ||||
| how many pixels in scan-line order are to be copied. The length is | ||||
| @@ -692,13 +687,13 @@ limiting the maximum length to 4096. For distances, all 40 prefix codes | ||||
| are used. | ||||
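|  | ||||
| Copying itself is plain LZ77 over the pixels in scan-line order; a | ||||
| sketch (`pixels` and `pos` are names used only in this example): | ||||
|  | ||||
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | ||||
| // Sketch: apply a backward reference (length, dist) at position pos. | ||||
| // Overlapping copies (dist < length) re-copy already written pixels. | ||||
| for (i = 0; i < length; ++i) { | ||||
|   pixels[pos + i] = pixels[pos + i - dist]; | ||||
| } | ||||
| pos += length; | ||||
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | ||||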
|  | ||||
|  | ||||
| ### LZ77 distance mapping | ||||
| ### LZ77 Distance Mapping | ||||
|  | ||||
| 120 smallest distance codes [1..120] are reserved for a close | ||||
| neighborhood within the current pixel. The rest are pure distance codes | ||||
| in scan-line order, just offset by 120. The smallest codes are coded | ||||
| into x and y offsets by the following table. Each tuple shows the x and | ||||
| the y coordinates in 2d offsets -- for example the first tuple (0, 1) | ||||
| the y coordinates in 2D offsets -- for example the first tuple (0, 1) | ||||
| means 0 for no difference in x, and 1 pixel difference in y (indicating | ||||
| previous row). | ||||
|  | ||||
| @@ -723,15 +718,16 @@ previous row). | ||||
| The distance codes that map into these tuples are changed into | ||||
| scan-line order distances using the following formula: | ||||
| _dist = x + y * xsize_, where _xsize_ is the width of the image in | ||||
| pixels. | ||||
| pixels. If a decoder detects a computed _dist_ value smaller than 1, | ||||
| the value of 1 is used instead. | ||||
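|  | ||||
| Putting the two cases together, a decoder might map a distance code to | ||||
| a scan-line distance as in the following sketch (`kDistanceMap` stands | ||||
| for the table above and is a name used only here): | ||||
|  | ||||
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | ||||
| // Sketch: turn a decoded distance code into a scan-line distance. | ||||
| if (distance_code > 120) { | ||||
|   dist = distance_code - 120;                    // plain distance code | ||||
| } else { | ||||
|   int x_off = kDistanceMap[distance_code - 1].x; // may be negative | ||||
|   int y_off = kDistanceMap[distance_code - 1].y; | ||||
|   dist = x_off + y_off * xsize; | ||||
|   if (dist < 1) dist = 1; | ||||
| } | ||||
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | ||||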
|  | ||||
|  | ||||
| ### Color Cache Code | ||||
|  | ||||
| Color cache stores a set of colors that have been recently used in the | ||||
| image. Using the color cache code, the color cache colors can be | ||||
| referred more efficiently than emitting the respective ARGB values | ||||
| independently or by sending them as backward references with a length of | ||||
| referred to more efficiently than emitting the respective ARGB values | ||||
| independently or sending them as backward references with a length of | ||||
| one pixel. | ||||
|  | ||||
| Color cache codes are coded as follows. First, there is a bit that | ||||
| @@ -745,15 +741,15 @@ int color_cache_code_bits = ReadBits(br, 4); | ||||
| int color_cache_size = 1 << color_cache_code_bits; | ||||
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | ||||
|  | ||||
| _color_cache_code_bits_ defines the size of the color_cache by (1 << | ||||
| _color_cache_code_bits_). The range of allowed values for | ||||
| _color_cache_code_bits_ is [1..11]. Compliant decoders must indicate a | ||||
| `color_cache_code_bits` defines the size of the color_cache by (1 << | ||||
| `color_cache_code_bits`). The range of allowed values for | ||||
| `color_cache_code_bits` is [1..11]. Compliant decoders must indicate a | ||||
| corrupted bitstream for other values. | ||||
|  | ||||
| A color cache is an array of the size _color_cache_size_. Each entry | ||||
| A color cache is an array of the size `color_cache_size`. Each entry | ||||
| stores one ARGB color. Colors are looked up by indexing them by | ||||
| (0x1e35a7bd * _color_) >> (32 - _color_cache_code_bits_). Only one | ||||
| lookup is done in a color cache, there is no conflict resolution. | ||||
| (0x1e35a7bd * `color`) >> (32 - `color_cache_code_bits`). Only one | ||||
| lookup is done in a color cache; there is no conflict resolution. | ||||
|  | ||||
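| A sketch of the lookup and insertion implied by this scheme (`cache` | ||||
| and `HashColor` are names used only here): | ||||
|  | ||||
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | ||||
| // Sketch: multiplicative-hash indexing into the color cache. | ||||
| int HashColor(uint32 color) { | ||||
|   return (0x1e35a7bd * color) >> (32 - color_cache_code_bits); | ||||
| } | ||||
|  | ||||
| cache[HashColor(argb)] = argb;          // insert every produced pixel | ||||
| argb = cache[color_cache_code];         // look up by color cache code | ||||
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | ||||
|  | ||||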
| In the beginning of decoding or encoding of an image, all entries in all | ||||
| color cache values are set to zero. The color cache code is converted to | ||||
| @@ -765,33 +761,34 @@ literals, into the cache in the order they appear in the stream. | ||||
| 5 Entropy Code | ||||
| -------------- | ||||
|  | ||||
| ### Huffman coding | ||||
| ### Huffman Coding | ||||
|  | ||||
| Most of the data is coded using a canonical Huffman code. This includes | ||||
| the following: | ||||
|  | ||||
|   * A combined code that defines either the value of the green | ||||
|     component, a color cache code, or a prefix of the length codes, | ||||
|   * the data for alpha, red and blue components, and | ||||
|   * a combined code that defines either the value of the green | ||||
|     component, a color cache code, or a prefix of the length codes; | ||||
|   * the data for alpha, red and blue components; and | ||||
|   * prefixes of the distance codes. | ||||
|  | ||||
| The Huffman codes are transmitted by sending the code lengths, the | ||||
| The Huffman codes are transmitted by sending the code lengths; the | ||||
| actual symbols are implicit and done in order for each length. The | ||||
| Huffman code lengths are run-length-encoded using three different | ||||
| prefixes, and the result of this coding is further Huffman coded. | ||||
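|  | ||||
| Because symbols are implicit and assigned in order within each length, | ||||
| the codes can be rebuilt from the lengths alone; a sketch of canonical | ||||
| code assignment (names used only in this example): | ||||
|  | ||||
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | ||||
| // Sketch: canonical code assignment from code lengths. Shorter | ||||
| // lengths get smaller codes; ties are broken by symbol order. | ||||
| int code = 0; | ||||
| for (len = 1; len <= max_code_length; ++len) { | ||||
|   for (sym = 0; sym < num_symbols; ++sym) { | ||||
|     if (code_lengths[sym] == len) { | ||||
|       codes[sym] = code++; | ||||
|     } | ||||
|   } | ||||
|   code <<= 1; | ||||
| } | ||||
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | ||||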
|  | ||||
|  | ||||
| ### Spatially-variant Huffman coding | ||||
| ### Spatially-variant Huffman Coding | ||||
|  | ||||
| For every pixel (x, y) in the image, there is a definition of which | ||||
| entropy code to use. First, there is an integer called 'meta Huffman | ||||
| code' that can be obtained from a subresolution 2d image. This  | ||||
| code' that can be obtained from a subresolution 2D image. This | ||||
| meta Huffman code identifies a set of five Huffman codes, one for green | ||||
| (along with length codes and color cache codes), one for each of red, | ||||
| blue and alpha, and one for distance. The Huffman codes are identified | ||||
| by their position in a table by an integer. | ||||
|  | ||||
| ### Decoding flow of image data | ||||
|  | ||||
| ### Decoding Flow of Image Data | ||||
|  | ||||
| Read next symbol S | ||||
|  | ||||
| @@ -809,14 +806,14 @@ Read next symbol S | ||||
|      1. Use ARGB color from the color cache, at index S - (256 + 24) | ||||
|  | ||||
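| Expressed as code, the dispatch on S might look like the following | ||||
| sketch (the ranges follow from the green alphabet: 256 literal values, | ||||
| 24 length prefix codes, then the color cache codes): | ||||
|  | ||||
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | ||||
| // Sketch: dispatch on the combined green/length/cache symbol S. | ||||
| if (S < 256) { | ||||
|   // Literal: S is the green value; the red, blue and alpha | ||||
|   // components follow, each read with its own Huffman code. | ||||
| } else if (S < 256 + 24) { | ||||
|   // Backward reference: S - 256 is a length prefix code; a distance | ||||
|   // prefix code follows, then the referenced pixels are copied. | ||||
| } else { | ||||
|   // Color cache: use the ARGB color at index S - (256 + 24). | ||||
| } | ||||
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | ||||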
|  | ||||
| ### Decoding the code lengths | ||||
| ### Decoding the Code Lengths | ||||
|  | ||||
| There are two different ways to encode the code lengths of a Huffman | ||||
| code, indicated by the first bit of the code: _simple code length code_ | ||||
| (1), and _normal code length code_ (0). | ||||
|  | ||||
|  | ||||
| #### Simple code length code | ||||
| #### Simple Code Length Code | ||||
|  | ||||
| This variant can codify 1 or 2 non-zero length codes in the range of [0, | ||||
| 255]. All other code lengths are implicitly zeros. | ||||
| @@ -846,11 +843,11 @@ can be empty if all pixels within the same meta Huffman code are | ||||
| produced using the color cache. | ||||
|  | ||||
|  | ||||
| #### Normal code length code | ||||
| #### Normal Code Length Code | ||||
|  | ||||
| The code lengths of a Huffman code are read as follows. _num_codes_ | ||||
| specifies the number of code lengths, the rest of the codes lengths | ||||
| (according to the order in _kCodeLengthCodeOrder_) are zeros. | ||||
| The code lengths of a Huffman code are read as follows: `num_codes` | ||||
| specifies the number of code lengths; the rest of the code lengths | ||||
| (according to the order in `kCodeLengthCodeOrder`) are zeros. | ||||
|  | ||||
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | ||||
| int kCodeLengthCodes = 19; | ||||
| @@ -863,20 +860,20 @@ for (i = 0; i < num_codes; ++i) { | ||||
| } | ||||
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | ||||
|  | ||||
|   * Code length code [0..15] indicate literal code lengths.  | ||||
|     * Value 0 means no symbols have been coded,  | ||||
|   * Code length code [0..15] indicates literal code lengths. | ||||
|     * Value 0 means no symbols have been coded. | ||||
|     * Values [1..15] indicate the bit length of the respective code. | ||||
|   * Code 16 repeats the previous non-zero value [3..6] times, i.e., | ||||
|     3 + ReadStream(2) times.  If code 16 is used before a non-zero value | ||||
|     has been emitted, a value of 8 is repeated.  | ||||
|   * Code 17 emits a streak of zeros [3..10], i.e., 3 + ReadStream(3) | ||||
|     3 + `ReadStream(2)` times.  If code 16 is used before a non-zero | ||||
|     value has been emitted, a value of 8 is repeated. | ||||
|   * Code 17 emits a streak of zeros [3..10], i.e., 3 + `ReadStream(3)` | ||||
|     times. | ||||
|   * Code 18 emits a streak of zeros of length [11..138], i.e., | ||||
|     11 + ReadStream(7) times. | ||||
|     11 + `ReadStream(7)` times. | ||||
|  | ||||
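| Taken together, expanding these run-length codes into the final array | ||||
| of code lengths might look like this sketch (`ReadCodeLengthSymbol` is | ||||
| a placeholder for reading one symbol with the code length code, and | ||||
| `num_symbols` is the size of the alphabet being built): | ||||
|  | ||||
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | ||||
| // Sketch: expand the run-length coded code lengths. | ||||
| int symbol = 0; | ||||
| int prev_nonzero = 8;                   // default if code 16 comes first | ||||
| while (symbol < num_symbols) { | ||||
|   int code = ReadCodeLengthSymbol(); | ||||
|   if (code < 16) { | ||||
|     code_lengths[symbol++] = code;      // 0 means symbol not coded | ||||
|     if (code != 0) prev_nonzero = code; | ||||
|   } else if (code == 16) { | ||||
|     int repeat = 3 + ReadStream(2); | ||||
|     while (repeat-- > 0) code_lengths[symbol++] = prev_nonzero; | ||||
|   } else if (code == 17) { | ||||
|     int repeat = 3 + ReadStream(3); | ||||
|     while (repeat-- > 0) code_lengths[symbol++] = 0; | ||||
|   } else {                              // code == 18 | ||||
|     int repeat = 11 + ReadStream(7); | ||||
|     while (repeat-- > 0) code_lengths[symbol++] = 0; | ||||
|   } | ||||
| } | ||||
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | ||||
|  | ||||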
| The entropy codes for alpha, red and blue have a total of 256 symbols. | ||||
| The entropy code for distance prefix codes has 40 symbols. The entropy | ||||
| code for green has 256 + 24 + _color_cache_size_, 256 symbols for | ||||
| code for green has 256 + 24 + `color_cache_size`, 256 symbols for | ||||
| different green symbols, 24 length code prefix symbols, and symbols for | ||||
| the color cache. | ||||
|  | ||||
| @@ -885,7 +882,7 @@ Huffman codes there are. There are always 5 times the number of Huffman | ||||
| codes to the number of meta Huffman codes. | ||||
|  | ||||
|  | ||||
| ### Decoding of meta Huffman codes | ||||
| ### Decoding of Meta Huffman Codes | ||||
|  | ||||
| There are two ways to code the meta Huffman codes, indicated by one bit | ||||
| for the ARGB image and is an implicit zero, i.e., not present in the | ||||
| @@ -906,15 +903,15 @@ Huffman code, i.e., the entropy image is of subresolution to the real | ||||
| image. | ||||
|  | ||||
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | ||||
| int huffman_bits = ReadBits(4); | ||||
| int huffman_bits = ReadBits(3) + 2; | ||||
| int huffman_xsize = DIV_ROUND_UP(xsize, 1 << huffman_bits); | ||||
| int huffman_ysize = DIV_ROUND_UP(ysize, 1 << huffman_bits); | ||||
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | ||||
|  | ||||
| _huffman_bits_ gives the amount of subsampling in the entropy image. | ||||
| `huffman_bits` gives the amount of subsampling in the entropy image. | ||||
|  | ||||
| After reading the _huffman_bits_, an entropy image stream of size | ||||
| _huffman_xsize_, _huffman_ysize_ is read. | ||||
| After reading the `huffman_bits`, an entropy image stream of size | ||||
| `huffman_xsize`, `huffman_ysize` is read. | ||||
|  | ||||
| The meta Huffman code, identifying the five Huffman codes per meta | ||||
| Huffman code, is coded only by the number of codes: | ||||
| @@ -931,12 +928,12 @@ meta_codes[(entropy_image[(y >> huffman_bits) * huffman_xsize + | ||||
|                           (x >> huffman_bits)] >> 8) & 0xffff] | ||||
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | ||||
|  | ||||
| The _huffman_code[5 * meta_code + k]_, codes with _k_ == 0 are for the | ||||
| The `huffman_code[5 * meta_code + k]`, codes with _k_ == 0 are for the | ||||
| green & length code, _k_ == 4 for the distance code, and the codes at | ||||
| _k_ == 1, 2, and 3, are for codes of length 256 for red, blue and alpha, | ||||
| respectively. | ||||
|  | ||||
| The value of k for the reference position in _meta_code_ determines the | ||||
| The value of _k_ for the reference position in `meta_code` determines the | ||||
| length of the Huffman code: | ||||
|  | ||||
|   * k = 0; length = 256 + 24 + cache_size | ||||
| @@ -947,12 +944,12 @@ length of the Huffman code: | ||||
| 6 Overall Structure of the Format | ||||
| --------------------------------- | ||||
|  | ||||
| Below there is a eagles-eye-view into the format in Backus-Naur form. It  | ||||
| does not cover all details. End-of-image EOI is only implicitly coded | ||||
| into the number of pixels (xsize * ysize). | ||||
| Below is a view into the format in Backus-Naur form. It does not cover | ||||
| all details. End-of-image (EOI) is only implicitly coded into the number | ||||
| of pixels (xsize * ysize). | ||||
|  | ||||
|  | ||||
| #### Basic structure | ||||
| #### Basic Structure | ||||
|  | ||||
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | ||||
| <format> ::= <RIFF header><image size><image stream> | ||||
| @@ -961,7 +958,7 @@ into the number of pixels (xsize * ysize). | ||||
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | ||||
|  | ||||
|  | ||||
| #### Structure of transforms | ||||
| #### Structure of Transforms | ||||
|  | ||||
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | ||||
| <optional-transform> ::= 1-bit <transform> <optional-transform> | 0-bit | ||||
| @@ -974,11 +971,11 @@ into the number of pixels (xsize * ysize). | ||||
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | ||||
|  | ||||
|  | ||||
| #### Structure of the image data | ||||
| #### Structure of the Image Data | ||||
|  | ||||
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | ||||
| <entropy-coded image> ::= <color cache info><optional meta huffman><huffman codes> | ||||
|                           <lz77-coded image> | ||||
| <entropy-coded image> ::= <color cache info><optional meta huffman> | ||||
|                           <huffman codes><lz77-coded image> | ||||
| <optional meta huffman> ::= 1-bit value 0 | | ||||
|                             (1-bit value 1; | ||||
|                             <huffman image><meta Huffman size>) | ||||
| @@ -995,7 +992,7 @@ into the number of pixels (xsize * ysize). | ||||
|                        (<lz77-coded image> | "") | ||||
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | ||||
|  | ||||
| A possible example sequence | ||||
| A possible example sequence: | ||||
|  | ||||
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | ||||
| <RIFF header><image size>1-bit value 1<subtract-green-tx> | ||||
|   | ||||