- WEBP_MUX_INVALID_PARAMETER: was used only at one place, and that too
should actually be an assert().
- WEBP_MUX_ERROR: was never used.
Change-Id: I8883cb4dfae7a7918507501f21fced0c04dda36a
minor revision shouldn't matter, we only check major revision number.
Bumped all version numbers so that incompatibility starts *now*
Change-Id: Id06c20f03039845ae4cfb3fd121807b931d67ee4
the VP8 decode functions do not need to be public; only
GetInfo/CheckSignature need to be marked extern as they're used by
libwebpmux.
Change-Id: Id9ab4d6166b0271cf5d04563c6dac1fcc84adbdc
- Match offsets, duration, width/height for frames/tiles and enforce
some constraints.
- Note that this also means using 'int's instead of 'uint32_t's for
16-bit and 24-bit fields.
Change-Id: If0b229ad9fce296372d961104aa36731a3b1304b
could have led to some negative overflow on 32bit arch
(if it was not for the "total_size == (size_t)total_size" test)
Change-Id: I7640340b605b9c674d30dd58a1e2144707299683
the extended file format is still under development and related
libs/binaries are not fit for release
configure:
--enable/disable-libwebpmux; default is disabled.
makefile.unix:
src/mux/libwebpmux.a and examples/webpmux must be explicitly specified
Makefile.vc:
$(DIRLIB)\libwebpmux.lib and $(DIRBIN)\webpmux.exe must be explicitly
specified
Change-Id: I8246746b256010dd2a2e4de58291222d7eaf0457
The image_info_ was used only in GetImageCanvasWidthHeight(). So, now
we infer it from data there.
This removal fixes a bug: earlier, 'image_info' wasn't initialized in
the WebPMuxCreate() flow, and so the canvas width/height were being
calculated to be zero.
Also, a related refactoring: Combine CreateImageInfo() and
CreateDataFromImageInfo() into a single function CreateFrameTileData().
Change-Id: I7b0afb0d36dc6e13b9d6a1135fb027aa4e03716c
Evaluated the impact of this change over 1000 image corpus.
The compression density is up (on average) by 1.2% and encoding time has
gone down considerably from 716 ms (per file) to 146 ms (per file)
(4.9X improvement in encoding time).
Change-Id: Ida562cc0bfe18c9d6f5f00873c95f8396b480eab
Adds new methods WebPPictureARGBToYUVA() and WebPPictureYUVAToARGB()
Depending on the value of picture->use_argb_input,
the main call WebPEncode() will convert appropriately.
Note that both conversions are lossy, so it's recommended to:
* use YUVA input for lossy compression (picture->use_argb_input=0)
* use ARGB input for lossless compression (picture->use_argb_input=1)
Change-Id: I8269d607723ee8a1136b9f4999f7ff4e657bbb04
* add a real proper pointer for holding memory chunk pointer
(instead of using y and argb fields)
* polish the doc with details
* add a WebPPictureView() that extract a view from a picture
without any copy (kind of a fast-crop).
* properly snap the top-left corner for Crop/View. Previously,
the luma position was not snapped, and was off compared to the chroma.
Change-Id: I8a3620c7f5fc6f7d1f8dd89d9da167c91e237439
'Set' and 'Get' methods for images take/return a bitstream as input,
instead of separate 'image' and 'alpha' arguments.
Also,
- Make WebPDataCopy() a public API
- Use WebPData for storing data in WebPChunk.
- Fix a potential memleak.
Change-Id: I4bf5ee6b39971384cb124b5b43921c27e9aabf3e
Check for valid bounds on the 'dist' in backward reference case.
Clamp it to 1 in case of zero and negative values.
Change-Id: I78e956d4595955efa02b1f9628b475093f6ee001
Adds a test of enc->mb_h_ to VP8IteratorProgress() to avoid a division
by zero. Fixes issue #121.
Original patch from even rouault (even dot rouault at gmail dot com).
Change-Id: Ie5fcc1821860c0a9366d5aa11f3aded4f5b98ed7
This limit corresponds to default histo_bits=3 for images upto sizes 400x400.
Any image higher than this dimension will bump up the histo_bits to 4 internally.
Change-Id: Ic8ba3dcd50e9c588cbbc4a0457289086498ff4ee
Fix inequality assertion on number of palette colors.
Fix inequality assertion test in BackwardReferencesHashChainFollowChosenPath.
Change-Id: Ie3242f1bbeaf96db91b839b6732ccce2634cebf3
- Move TAG_ID to webp/mux.h
- Rename it to WebPChunkId
- Rename IDs to WEBP_CHUNK_<tag>
- Remove "name" param from ChunkInfo struct and related changes.
- Rename WebPMuxNumNamedElements to WebPMuxNumChunks().
- WebPMuxNumChunks() takes WebPChunkId as param.
Change-Id: Ic6546e4a9ab823b556cdbc600faa137076546a2b
moves the implementation to ParseHeadersInternal. this also allows
decoding to start at a VP8X sub-chunk, e.g. 'ALPH'.
Change-Id: I06791f87d90f888de32746ecb02705e4b0ff227a
Used when we don't code a Huffman Image.
-> Simplify the code quite some because we don't have to
deal with special cases of histo bits
Change-Id: I0c3f46cbf3b501e021c093e07253e7404c01ff4f
ParseVP8X was checking for presence of extra 20 bytes (after RIFF header).
This check should not be executed for non-mux (non-VP8X) images.
Change-Id: I3fc89fa098ac0a53102e7bbf6c291269817c8e47
Updated histo_bits to 3 from 4 and changed the quality threshold for inner loop for HashChainFindCopy.
Impact: 0.5%-0.8% better bpp with 15%-20% hit on encoding throughput
at default encoding settings.
Change-Id: I316ef88403148b1e19036fa0817d944eb0301255
Ensure that the lossless bit-stream doesn't allow for such cases and
safe-gaurd decoder against indefinite recursion.
Change-Id: Ia6d7f519291de8739f79a977a5800982872aae71
When importing BGRA or RGBA data for encoding, provide variants of
the WEBPImportPicture API for RGBX and BRGX data meaning the alpha
channel should be ignored.
Author: noel@chromium.org
from Chromium patch: https://chromiumcodereview.appspot.com/10496016/
Change-Id: I15fcaa4160c69a2b5549394204b6e6d7a1c5d333
VP8-lossy will now avoid writing an ALPH chunk if the
alpha values are trivial.
+ changed DumpPicture() accordingly in cwebp
+ prevented the -d option to be active with lossless
(DumpPicture wouldn't work).
Change-Id: I34fdb108a2b6207e93fa6cd00b1d2509a8e1dc4b
The new modes are
MODE_rgbA
MODE_bgrA
MODE_Argb
MODE_rgbA_4444
It's binary incompatible, since the enums changed.
While at it, i removed the now unneeded KeepAlpha methods.
-> Saved ~12k of code!
* made explicit mention that alpha_plane is persistent,
so we have access to the full alpha plane data at all time.
Incremental decoding of alpha was planned for, but not
implemented. So better not dragged this constaint for now
and make the code easier until we revisit that.
Change-Id: Idaba281a6ca819965ca062d1c23329f36d90c7ff
Change the lossless signature to 0x2f
Add 1 bit indicator for 'droppable (or trivial) alpha)'.
Add 3 bit lossless version (for future extension like yuv support).
Change the sub-resolution information to 3 bits implying range [2 .. 9]
Change-Id: Ic7b8c069240bbcd326cf5d5d4cd2dde8667851e2
Take picture and percent value storage location instead of VP8Encoder.
This will allow reuse by the lossless encoder.
Change-Id: Ic49dbc800cc3e2df60d20f4ebac277f68ed6031b
Previously, it used to assume any raw bitstream is a VP8 one.
Also,
- Factor out VP8CheckSignature() & VP8LCheckSignature().
- Use a local var for *data_ptr in ParseVP8Header() for
readability.
Change-Id: I0fa8aa177dad7865e00c8898f7e7ce76a9db19d5
frame's InitIo should not reset fancy_upsampling flag.
This flag (fancy_upsampling) is set via CustomSetup -> WebPIoInitFromOptions
and frame's InitIo() is resetting it to 0.
Change-Id: I64b54cdfba43799c0a5aa8e384575af5d6331674
CostModelBuild() and TrackBackwards() returns weren't checked
+ code clean-up
+ de-inline VP8LBackwardRefs non-critical methods
+ shuffle the .h around to group things together
+ extract some constants as #define's
+ fixed the "if (!(cc_init = ...)) {...}" constructs
+ removed some unneeded VP8L prefixes
Change-Id: Ic634cb87bc6b2033242d3e8e8731fab4c134f327
may be useful later for instance to bypass some code
if we know we don't use the Bundle+ColorMap transform.
Change-Id: I9dc70d18165b2363ad9ede763684ef3d8eba5903
we vary linearly lossless-method between 0 and 6,
and lossless-quality between 50 and 100, so that encoding
speed can go from 'quite fast' to 'rather slow'.
Impact on size is moderate, but visible.
Change-Id: I0b7917e7170eb50258afb1a4e248028cd9e9207d
- Make sure alpha flag is set in case of a lossless file with VP8X chunk.
The semantic of ALPHA_FLAG changes with this: it means the images
contain alpha (rather than ALPH chunk in particular).
- Update the mux container spec to add 1-line description of alpha
flag.
- Rename "HasLosslessImages()" to "MuxHasLosslessImages()", and other
similar function renames.
- Rename FeatureFlags to WebPFeatureFlags
- Elaborated a comment for a special case.
- A misc comment fix.
Change-Id: If212ccf4338c125b4c71c10bf281a51b3ba7ff45
Defining LOCAL_ARM_NEON = true can result in neon instructions being
used in portions unprotected by the cpu check.
This changes defines a WEBP_USE_NEON/WEBP_ANDROID_NEON pair similar to
the SSE2 code and MSVC.
Change-Id: Ifac010b06e42c73d5aca529baa2198c6796674bd
This saves ~26 bytes of headers.
* introduce new VP8LDecodeAlphaImageStream() for decoding
* use VP8LEncodeStream() for encoding
* refactor code a bit
still TODO: make the alpha-quality/enc-method user-configurable
Change-Id: I23e599bebe335cfb5868e746e076c3358ef12e71
Limit the overall number of transformations to 4 and disallow any
duplicate transform for decoding an image.
Change-Id: Ic4b0ecd553db96702e117fd073617237d95e45c0
this allows later customization of data output method.
No perf diff observed, even if ProcessRows is no longer inlined.
Change-Id: I6933a3612a9cf6c108cf2776dfde0ae80c6c07c0
+ small opportunistic fixes:
* allow NULL decoded_data to be passed to DecodeStream
and clarity (with assert()) when to do so
* AllocateAndInitRescaler() was already setting error status,
as it should. No need to do it at caller's site
Change-Id: I30867e596564a7f459a0d1ddbf6f5d312414b7fd
* RIFF header is omitted
* rename NewVP8LEncoder and DeleteVP8Encoder
* change the signature to take a "const WebPPicture*"
(it was non-const just because we were setting some error potentially)
* made the pic_ field const in VP8LEncoder too.
* trap the bitwriter::error_ too
* simplify some signatures to take WebPPicture* instead
of unneeded VP8LEncoder*
VP8LEncodeStream() will be called directly to compress alpha channel
header-less.
Change-Id: Ibceef63d2b3fbc412f0dffc38dc05c2dee6b6bbf
This is a border-case situation: the picture is not const, because
we're change its error status. But taking it non-const forces
the caller to carry a non-const picture all around the code just
in case (0.00001% of the time?) something bad happen.
This pretty much the same as making all objects non-const because
we'll eventually call delete or free() on them, which is quite a
non-const operation. Well... Better allow constness enforcement for
the remaining 99.9999% of the code.
Change-Id: I9b93892a189a50feaec1a3a518ebf488eb6ff22f
now, we only use 2 bits for the filtering method, and 2 bits
for the compression method.
There's two additional bits which are INFORMATIVE, to specify
whether the source has been pre-processed (level reduction)
during compression. This can be used at decompression time
for some post-processing (see DequantizeLevels()).
New relevant spec excerpt:
0 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| ChunkHeader('ALPH') |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|Rsv| P | F | C | Alpha Bitstream... |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Compression method (C): 2 bits
: The compression method used:
* `0`: No compression.
* `1`: Backward reference counts encoded with arithmetic encoder.
Filtering method (F): 2 bits
: The filtering method used:
* `0`: None.
* `1`: Horizontal filter.
* `2`: Vertical filter.
* `3`: Gradient filter.
Pre-processing (P): 2 bits
: These INFORMATIVE bits are used to signal the pre-processing that has
been performed during compression. The decoder can use this information to
e.g. dither the values or smooth the gradients prior to display.
* `0`: no pre-processing
* `1`: level reduction
Decoders are not required to use this information in any specified way.
Reserved (Rsv): 2 bits
: SHOULD be `0`.
Alpha bitstream: _Chunk Size_ - `1` bytes
: Encoded alpha bitstream.
This optional chunk contains encoded alpha data for a single tile.
Either **ALL or NONE** of the tiles must contain this chunk.
The alpha channel data is losslessly stored as raw data (when
compression method is '0') or compressed using the lossless format
(when the compression method is '1').
Change-Id: Ied8f5fb922707a953e6a2b601c69c73e552dda6b
will be called by alpha post-processing, although doing nothing for now.
Gradient smoothing would be nice-to-have here. Patch welcome!
Change-Id: I534cde866bdc75da22d0f0a6d1373c90e21366f3
* Method #1 is now calling the lossless encoder on the alpha plane.
Format is not final, it's just a first draft. We need ad-hoc functions.
* removed now useless utils/alpha.*
* added utils/quant_levels.h instead
* removed the TCoder code altogether
Change-Id: I636840b6129a43171b74860e0a0fc5bb1bcffc6a
- Separate out 'CHUNK_INDEX' from 'TAG_ID' (this is to help with the
situation where two different tags - "VP8 " and "VP8L" can have the
same TAG_ID -> IMAGE_ID).
- Some internal methods now take 'CHUNK_INDEX' param instea of 'TAG_ID'
as appropriate.
- Add kChunks[] entry for lossless.
- Rename WebPMuxImage.vp8_ --> WebPMuxImage.img_
- SetImage() and AddFrame/Tile() infer whether the bitstream is a
lossless one based on LOSSLESS_MAGIC_BYTE. The correct tag is stored
based on this.
Also, handle the case when GetVP8Info/GetVP8LInfo() fails.
Change-Id: I6b3bc9555cedb791b43f743b5a7770958864bb05
-> lot of simplifications ensue and we should be able to get rid of
ClearHuffmanTreeIfOnlyOneSymbol() too, in a subsequent patch.
Change-Id: Ic4c51d05e4b1970e37f94ffd85fae6a02e4a6422
Allocate big chunk of memory in GetHuffBitLengthsAndCodes, instead of allocating in a loop.
Also fixed the potential memleak.
Change-Id: Idc23ffa306f76100217304444191a8d2fef9c44a
The size written in VP8L header should be without padding.
(Also clarified this code using consts).
Change-Id: Ic6583d760c0f52ef61924ab0330c65c668a12fdc
When we are at end-of-stream, but haven't decoded all pixels, we should
return an error.
Also remove an obsolete TODO.
Change-Id: I3fb1646136e706da536d537a54d1fa487a890630
- Symbols added to the tree are valid inside HuffmanTreeBuildExplicit().
- In HuffmanTreeBuildImplicit(), make sure 'root_symbol' is
valid in case of a single symbol tree.
Change-Id: I7de5de71ff28f41e2d6228b29ed8dd4a20813e99
if histogram_image_size is reduced in when writing the histogram_image
the bit arrays would leak any remaining elements. store their element
count separately.
Change-Id: I710142a11ebd4325faec7bd65c2d2572aae19307
[Basically, the condition "src - dist < data" can be wrongly evaluated
to be false if "src < dist" due to underflow. Instead, "src - data <
dist" is the correct condition, as "src > data" is always true and so
there would never be an underflow].
Change-Id: Ic9f64bfe76a9acae97abc1fb7c1f4868e81f1eb8
CompareHuffmanTrees() and SetBitDepths()):
- Move 'tree_size' initialization and malloc for 'tree + tree_pool'
outside the loop.
- Some renames/tweaks for readability.
Change-Id: I5cb3cc942afac6e9f51a0b97c57ee897677a48a2
The current implementation doesn't take care one byte
signature and associated one byte padding (for odd sized chunk).
Change-Id: I35b81d0644818cdba38189aa48c75db5f92e68f4
The new method VP8LGetBackwardReferences hides internal
heuristics used for choosing RLE or LZ77 based refs.
- Tuned VP8LHashChainFindCopy for better compression at higher Q.
- Refactored code.
- Removed the unused method VP8LVerifyBackwardReferences.
Change-Id: Ibb7bb072bab5a49a001577a20d88226f52e6c663
arrays can be passed directly as only their members are being modified.
this also reduces the allocation for bit_codes[] by taking the
sizeof(type)=2 rather than sizeof(ptr)=4/8 in one case.
Change-Id: Idad20cead58c218b58d90b71699374fefd01cad9
add proper cpu-detection for Android targets
Fixes issue #118 (and is a better solution for #117).
based on patch by pepijn vaneeckhoudt
Change-Id: I6b00ea6d51ca658ccf6a3d55b87b99c01c6805be
* lossless_encoder: (46 commits)
split StoreHuffmanCode() into smaller functions
more consolidation: introduce VP8LHistogramSet
big code clean-up and refactoring and optimization
Some cosmetics in histogram.c
Approximate FastLog between value range [256, 8192]
Forgot to update out_bit_costs to symbol_bit_costs at one instance.
Evaluate output cluster's bit_costs once in HistogramRefine.
Simple Huffman code changes.
Lossless decoder: remove an unneeded param in ReadHuffmanCodeLengths().
Reducing emerging palette size from 11 to 9 bits.
Move GetHistImageSymbols to histogram.c
Improve predict vs no-predict heuristic.
code-moving and clean-up
reduce memory usage by allocating only one histo
Restrict histo_bits to ensure histo_image size is under 32MB
further simplification for the meta-Huffman coding
A quick pass of cleanup in backward reference code
Make transform bits a function of encode method (-m).
introduce -lossless option, protected by USE_LOSSLESS_ENCODER
Run TraceBackwards for higher qualities.
...
Conflicts:
src/enc/webpenc.c
Change-Id: I9a5d98cba0889ea91d10699466939cc283da345a
VP8LHistogramSet is container for pointers to histograms that
we can shuffle around. Allocation is one big chunk of memory.
Downside is that we don't de-allocate memory on-the-go during
HistogramRefine().
+ renamed HistogramRefine() into HistogramRemap(), so we don't
confuse with "HistogramCombine"
+ made VP8LHistogramClear() static.
Change-Id: Idf1a748a871c3b942cca5c8050072ccd82c7511d
* de-inline some function
* make VP8LBackwardRefs be more like a vectorwith max capacity
* add bit_cost_ field to VP8LHistogram
* general code simplifications
* remove some memmov() from HistogramRefine
* simplify HistogramDistance()
...
Change-Id: I16904d9fa2380e1cf4a3fdddf56ed1fcadfa25dc
Profiled data: Profiled few images and found that in the function VP8LFastLog,
90% of time table lookup is performed, while rest of time (10%) call to log
function is made. Typical lookup accounts for 10 CPU instructions and call to
log 200 instruction counts. The weighted average comes out to be 30
instructions per call. For mid qualities (25-75), this function (VP8LFastLog)
accounts for 30-50% of total CPU cycles (via call path: VP8LCOlorSpaceTransform
-> PredictionCostCrossColor -> ShannonEntropy). After this change, the log is
called less that 1% of time, with average instructions being 15 per call.
Measured the performance over 1000 files for various qualities and found
overall compression speedup between 10-15% (in quality range [0, 75]). The
compression density loss is around 0.5% (though at some qualities, compression
is little better as well).
Change-Id: I247bc6a8d4351819c871f19d65455dc23aea8650
Avoid bit_costs evaluated every time in function HistogramDistance. Also moved
VP8LInitBackwardRefs and VP8LClearBackwardRefs to backward_references.h
Change-Id: Id507f164d0fc64480aebc4a3ea3e6950ed377a60
No empty trees are codified with the simple Huffman code. The simple Huffman
code is simplified to be either a 1-bit code or 8-bit code for symbols.
Change-Id: I3e2813027b5a643862729339303d80197c497aff
This improves compression density. For example, at quality 95 on 1000 PNGs:
bpp(before) = 2.447 and bpp(after) = 2.412
Change-Id: I19c343ba05cca48a6940293721066502a5c3d693
* removed use_lz77_ field, and added cache_bits_ one.
* use more BackwardRefs params
* move code around to organize more logically
* reduce memory use on histo
...
Change-Id: I833217a1b950189cf486704049e3fe28382ce335
instead of lz77+rle
* introduce VP8LBackwardRefs structure and simplify the code by not passing
around {PixOrCopy/int} pairs.
More functions should be turned into using this struct (TODO(later)).
Change-Id: I69c5c9fa61dddd61a2abc2824d70b8606a1c55b6
* don't transmit the number of Huffman tree group explicitly
* move color-cache information before the meta-Huffman block
* also add a check that color_cache_bits is in [1..11] range, as per spec.
Change-Id: I81d7711068653b509cdbc1151d93e229c4254580
(Essentially, there was no need of a separate 'argb_palette' array. And
argb_palette[0] was never being set).
Change-Id: Id0a8c7e063d3af41e39fc9b8661611b51ccc55cd
make elements of "Multiplier" struct unsigned, so that any negative values are
automatically converted to "mod 256" values.
Change-Id: Iab4f9bacc50dcd94a557944727d9338dbb0982f7
so that it uses original values of left, top etc for prediction rather than the
predicted values of the same. Also, do some renaming in the same to make it
more readable.
Change-Id: I2fe94e35a6700bd437f5c601e2af12323bf32445
- VP8LEncAnalyze, EvalAndApplySubtractGreen, ApplyPredictFilter,
ApplyCrossColorFilter
- Added palette handling and transform buffer management in VP8LEncodeImage()
- Add Transforms (subtract Green, Predict, cross_color) to dsp/lossless.c.
These are more-or-less copied from src/lossless code.
After this Change, will implement the EncodeImageInternal() method.
Change-Id: Idf71f803c24b3b5ae3b5079b15e019721784611d
no version of msvc currently implements log2(). unconditionally define
NOT_HAVE_LOG2 in this case to simplify building libwebp sources in other
projects.
Change-Id: Ia9d985b1125553c5a8271d7e539bc1b4f898d749
src/dec/Makefile.am: add missing reference to vp8li.h
src/{dec,dsp,enc}/Makefile.am: move some headers to noinst_
Change-Id: I0e2bc69980bd8175d99ad0ab63f537ef9e425b77
src/dsp/lossless.c: In function 'VP8LInverseTransform':
src/dsp/lossless.c:312:23: warning: 'packed_pixels' may be used
uninitialized in this function [-Wuninitialized]
src/dsp/lossless.c:304:16: note: 'packed_pixels' was declared here
src/dsp/lossless.c:258:34: warning: 'm.red_to_blue_' may be used
uninitialized in this function [-Wuninitialized]
src/dsp/lossless.c:275:17: note: 'm.red_to_blue_' was declared here
src/dsp/lossless.c:257:34: warning: 'm.green_to_blue_' may be used
uninitialized in this function [-Wuninitialized]
src/dsp/lossless.c:275:17: note: 'm.green_to_blue_' was declared here
src/dsp/lossless.c:255:33: warning: 'm.green_to_red_' may be used
uninitialized in this function [-Wuninitialized]
src/dsp/lossless.c:275:17: note: 'm.green_to_red_' was declared here
patch by pepijn vaneeckhoudt
Change-Id: Iffa4764487a75479df45e772169325cd9ee60d94
- use common file organization across subdir makefiles
- append lib/source/header list variables and sort
Change-Id: I0653e1c73a4552b0c43d21f321b22b4972d6e87b
* each with their own decoder instances.
* Refactor the incremental buffer-update code a lot.
* remove br_offset_ for VP8LDecoder along the way
* make VP8GetHeaders() be used only for VP8, not VP8L bitstream
* remove VP8LInitDecoder()
* rename VP8LBitReaderResize() to VP8LBitReaderSetBuffer()
(cherry picked from commit 5529a2e6d47212a721ca4ab003215f97bd88ebb4)
Change-Id: I58f0b8abe1ef31c8b0e1a6175d2d86b863793ead
import changes from experimental 5529a2e^
and enable build in autoconf and makefile.unix; windows will be treated
separately.
Change-Id: Ie2e177a99db63190b4cd647b3edee3b4e13719e9
Pulled from 5529a2e^.
The history of this and related files is a bit entangled so rather
trying to split the changes and introduce some noise in master's history
we'll start with a fresh snapshot.
The file progression is still available in the experimental branch.
Change-Id: I490a49cb0084d556a29d7c5257c986ea695ce60c
Pulled from the parent of the current version (5529a2e^).
The history of this and related files is a bit entangled so rather
trying to split the changes and introduce some noise in master's history
we'll start with a fresh snapshot.
The file progression is still available in the experimental branch.
Change-Id: I6dae97fc381cd6c1d1640c4c565b2084a41ec955
Pulled from the current HEAD (218c32e).
The history of this and related files is a bit entangled so rather
trying to split the changes and introduce some noise in master's history
we'll start with a fresh snapshot.
The file progression is still available in the experimental branch.
Change-Id: I40538799dbf999abb9408ac83f55b897d8e22498
Pulled from the current HEAD (218c32e).
The history of this and related files is a bit entangled so rather
trying to split the changes and introduce some noise in master's history
we'll start with a fresh snapshot.
The file progression is still available in the experimental branch.
Change-Id: Ie57be21bf50ad83808c72aeb5fc706d9954d01d8
Pulled from the current HEAD (218c32e).
The history of this and related files is a bit entangled so rather
trying to split the changes and introduce some noise in master's history
we'll start with a fresh snapshot.
The file progression is still available in the experimental branch.
Change-Id: Id879be453a94d9f44ec8d47747823ca7297ae008
by packing the symbol map more efficiently.
This is mainly useful in making it harder to generate invalid bitstream:
before this change, one could code the same symbol twice. Now, it's
impossible, since we code the position using empty symbol slots, instead
of total position.
* Fix the HasOnlyLeftChild() naming while at it.
Change-Id: I63c56c80a4f04a86ac83eded1e3306329815b6c9
group like parameters together in prototypes, comments, move variable
declarations closer to first use.
Change-Id: Idd6bd87d0366d783fed83f4dd21bd7968cbe6948
_byteswap_ulong is defined in stdlib.h, release builds seem to pull it
in through a different path.
Change-Id: I510d2624150f89a4a77734bf3dc5b4db60a4ba95
- const / move declarations closer to first use
- remove unnecessary ()s
- don't return int unnecessarily with internal Init/Release
- compact some lines
Change-Id: If7ab505e417221debc356f21f075506939110a84
A call to Append/Update would index the parts_ array w/-1 as num_parts_
had yet to be set by DecodePartition0. This would cause corruption
within the VP8Decoder member.
Fixes issue #106.
Change-Id: Ib9f2811594ff19e948a66fda862a4e0a384bb9aa
If the first call to WebPIUpdate contained all of partition 0, but not
enough to decode a macroblock the test for max macro block size could
fail (4096 bytes) due to a memory buffer offset not being updated.
This change offsets the buffer in the same manner as Append().
Fixes issue #105.
Change-Id: I90ca3d0dde92eedc076933b8349d2d297f2469e8
- remove some unused functions
- move global arrays from data to read only section
- explicitly cast malloc returns; not specifically necessary, but helps
show intent
- miscellaneous formatting
Change-Id: Ib15fe5b37fe6c29c369ad928bdc3a7290cd13c84
As with the loop filter, implementing this with intrinsics is difficult
because we require subscript access for reading and writing 32 bits at a
time.
Approximately 5% decode speed improvement. This could be increased by
exposing TransformOne and rewriting TransformTwo to only handle dual
IDCTs.
Change-Id: Idd409264ab5d154a537107a1d54b419a48f7c1a8
Add a dirty_ flag to keep track of updated probabilities and the need to
recompute the level costs.
This only makes a difference for "-m 2" method which was sub-optimal.
But it's overall cleaner to have this flag.
Change-Id: I21c71201e1d07a923d97a3adf2fbbd7d67d35433
This proved being ok, even for large pictures, provided one
takes care of overflow. When an overflow is bound to occur, the
counters are renormalized.
Overall, shaves ~12k of memory.
Change-Id: I2ba21a407964fe1a34c352371cba15166e0c4548
the p0 proba was incorrectly accumulated. Merging its contribution into
the LevelCost[] was creating more problems than anything (esp. with trellis)
so let's just not.
Change-Id: I4c07bfee471085df901228d97b20a4d9606ba44e
These will report the 7x7-averaged PSNR or SSIM, using the
new internal function WebPPictureDistortion().
This is for information only. These flags have no encoding impact.
+misc opportunistic cosmetics
Change-Id: I64c0a7eca679134d39062e438886274b22bb643f
to 'clean up' the fully-transparent area and make it more compressible
new cwebp flags: -alpha_cleanup (off by default, since gain is not 100% guaranteed)
Change-Id: I74d77e1915eee146584cd61c9c1132a41db922eb
gcc-4.0 defines __PIC__ but not __pic__. This leaves the test for
__pic__ should the inverse case exist.
Fixes issue #103; build failing with:
"error: can't find a register in class 'BREG' while reloading 'asm'"
Change-Id: Ia767a733de6ce0294146f9477ff9c46f0ebe13b0
Make sure chunk->data and wpi are not leaked by
MuxAddFrameTileInternal() in case of MEMORY_ERROR in ChunkSetNth().
Change-Id: Ie20e84b92f4bdcb7c3b94520f36b20dd2e730545
Add extern to kChunks[] in muxi.h.
Fixes:
ld: duplicate symbol _kChunks in .libs/muxinternal.o and .libs/muxedit.o
Change-Id: Ibeb060f7fdec5fe904097a2443f0cda2f7ede884
MAX_LEN -> max_len
This was sub-optimal at the end of the picture, when there's
less than MAX_LEN bytes left to match.
Change-Id: I5ebe1fca4e7c112dcd34748a082d1c97f95eb099
often reduces compressed size by ~10's of bytes
+ refactored / sped-up the prediction code (gradient: ~30% faster)
Change-Id: I26bd983655dad4f85d5c5ddc20a1980f384c4dd6
.. where only 2 filtering modes are potentially
tried, instead of all of them. This is fast than the exhaustive 'best'
mode, and not much worse.
Options for cwebp are:
-alpha_filter none
-alpha_filter fast (<- default)
-alpha_filter best (<- slow)
Change-Id: I8cb90ee11b8f981811e013ea4ad5bf72ba3ea7d4
Fix bug for Alpha data in RGBA_4444 color-mode.
The Alpha data is required to be clipped [0, 15] while
converting from 8 bits to 4 bits.
Change-Id: I80705d575c35121beb9633a05ec8823435c79586
Add predictive filtering option for Alpha plane.
Valid range for filter option is [0, 5] corresponding to prediction
methods none, horizontal, vertical, gradient & paeth filter.
The prediction method 5 will try all the prediction methods (0 to 4)
and pick the prediction method that gives best compression.
Change-Id: I9244d4a9c5017501a9696c7cec5045f04c16d49b
- add check for native log2 to configure
- use a common define (NOT_HAVE_LOG2) to enable use of local library
version for non-autoconf platforms without their own version,
currently msvc and android
This uses a negative (NOT_HAVE_) to simplify the ifdef
Change-Id: Id0610eed507f8bb9c5da338918112853d5c8127a
alpha information is not to be found at RIFF chunks level, not
in the VP8 bitstream (that was a tmp hack)
Change-Id: Idd1629c696b03c26f6f30650d7216f627f1761df
This speeds things up for long message, while not damaging
the stats too much for usual-sized cases
Change-Id: I3f45e28b771d701e2e1da11eb800de18c4ed12fc
- Fix the off-by-one diff when cropping with simple-filter.
- Fix a bug in incremental decoding in case of alpha.
- In VP8FinishRow(), do not decode alpha when y_start > y_end.
- Correct output of alpha channel for MODE_ARGB.
- Correct output of alpha channel for MODE_RGBA_4444.
Change-Id: I785763a2a704b973cc742ad93ffbb53699d1fc0a
Extend WebP Encode functionality to encode Alpha data and produce
bit-stream (RIFF+VP8X+ALPH+VP8) corresponding to WebP-Alpha.
Change-Id: I983b4cd97be94a86a8e6d03b3b9c728db851bf48
Extend WebP decode functionality to extract Alpha data (support ALPH
chunk) and decode the Alpha plane using Alpha (utils/alpha) core-lib.
Change-Id: I6f0ee1b189c13caba3c1dd9b681383bfd00aa212
Add code for Alpha encoding & decoding. The alpha compression is done
via backward reference counts encoded with Arithmetic encoder (TCoder).
Also provided is lossy Alpha pre-processing option via level-quantizations
using kNN heuristic.
Change-Id: Ib6b13530c1a4ab6493edcb586ad29fe242bc1766
- Add alpha support in mux.
- Remove WebPMuxAddNamedData() and WebPMuxGetNamedData() APIs. Add WebPMuxSetImage(), WebPmuxGetImage() and WebPMuxDeleteImage() APIs instead.
- Refactor code using WebPImage struct.
- Corresponding changes in webpmux binary.
- WebPMuxSetImage()/AddFrame()/AddTile() can now work with data which starts from "RIFF...". This simplifies reading a single-image webp file and adding it as an image/frame/tile in mux.
Change-Id: I7d98a6407dfe55c84a682ef7e46bc622f5a6f8d9
For odd-sized chunk-payload, there's one Byte padding at the end to make the
next chunk start at even-address. The chunk-skip logic needs to be updated to
handle such cases.
Change-Id: I8939ed07584a6195092fbe703eb0ea711fa5f153
Changed function signature of method WebPMuxCreate and few other minor nits.
Header file has been re-organised to have declaration of set/get/Delete
methods for different use-cases (Metadata, ColorProfile etc) in one place
instead of declaring all Set methods together followed by Get & Delete.
Change-Id: I52f6dffd216b1c343423d55a5b45fa1b9b9c1347
This change adds the WebP Mux library for manipulating the WebP Mux
Container. The library document and command line tool exhibiting the
usage of this libary will follow in subsequent Git change.
Change-Id: I4cba7dd12307483185ad5a68df33af6c36c154c8
- Move common defines to dec/webpi.h
- Regularize naming and parameters of various "CheckAndSkip" functions.
Also they return VP8StatusCode for clarity.
- Move WebP header/chunk parsing functions to webpi.h
- Fix a bug in static function GetFeatures()
Change-Id: Ibc3da4ba210d860551724dc40741959750d66b89
This has been pointed as a useful information to have in the header (for
the non VP8-specs savvy ones)
Change-Id: I494b1da41dfafce882a94e3677d1cd6206bc504b
CloseHandle returns non-zero on success so earlier versions would leave
'ok' with a misleading value, though the return itself was correct.
Change-Id: I21b74a59d90f7bf1b484a55f3960962e933f577b
Was getting compiler error when I included bit_writer.h from non libwebp
directory. bit_writer.h includes vp8enci.h and that uses VP8BitWriter without
having it's definition.
Change-Id: I1ca82594292979b9eb7e60e2fffb22c16768dd30
Gathers all DSP-related function (and SSE2 implementations).
Clean-up some unwanted symbolic dependencies so that webp_encode,
webp_decode and webp_dsp are truly independent libraries.
+ opportunistic clean-up:
* remove unneeded VP8DspInitTables(), now integrated in VP8DspInit()
* make consistent use of VP8GetCPUInfo() in the various DspInit() funcs
* change OUT macro to DST
uint8_t is a gcc extension which msvc similarly supports, but for
greater compatibility, and to match the change already made in
dec/vp8i.h, update the remaining bitfield to use unsigned int.
Change-Id: Id9dca470345871e00e82893255a306dfe5d3fa29
Although it degrades quality, this option is useful to avoid the 512k
limit for partition #0.
If not enough to reach the lower bound of 4bits per macroblock header,
one should also limit the number of segments used (down to -segments 1)
See the man file for extra details.
Change-Id: Ia59ffac13176c85b809ddd6340d37b54ee9487ea
was missing from the RD-computation of intra-4x4 score.
Doesn't change anything significantly, it's just More Correct.
Change-Id: I25c5b53a810d97e6fb7f98c549fd23bbe55e1bf4
with assertions enabled the code would abort in
WebPWorkerChangeState with:
Assertion `worker->status_ >= OK'
without them the code would hang in the _cond_wait.
this change makes WebPWorkerChangeState a no-op in this case.
Change-Id: Iea855568bbdef2865ae61ab54473b3a7c230e91a
To be enabled with the flag WEBP_USE_THREAD.
For now it's only available on unix (pthread), when using Makefile.unix
Will be switched on more generally later.
In-loop filtering and output (=rescaling/yuv->rgb conversion)
is done in parallel to bitstream decoding, lagging 1 row behind.
Example:
examples/dwebp bryce.webp -v
Time to decode picture: 0.680s
examples/dwebp bryce.webp -v -mt
Time to decode picture: 0.515s
Change-Id: Ic30a897423137a3bdace9c4e30465ef758fe53f2
Add WEBP_EXTERN(type) macro which should make Windows DLL builds simpler
by allowing the signature to be changed.
Change-Id: I0cfa45dff779985680b1a38ddff30973a0d26639
RGB565 and ARGB4444 are only supported through the advanced decoding API.
ARGB being somewhat generic, there's an easy WebPDecodeARGB()
new function for convenience.
Patch by Vikas Arora (vikaas dot arora at gmail dot com)
Change-Id: Ic7b6f72bd70aca458d14e7fdd23679212430ebca
~5-10% faster.
Heavy 8bit arithmetic trickery!
Patch by Somnath Banerjee (somnath at google dot com)
Change-Id: I9fd2c511d9f631e9cf4b008c46127b49fb527b47
You can now use WebPDecBuffer, WebPBitstreamFeatures and WebPDecoderOptions
to have better control over the decoding process (and the speed/quality tradeoff).
WebPDecoderOptions allow to:
- turn fancy upsampler on/off
- turn in-loop filter on/off
- perform on-the-fly cropping
- perform on the-fly rescale
(and more to come. Not all features are implemented yet).
On-the-fly cropping and scaling allow to save quite some memory
(as the decoding operation will now scale with the output's size, not
the input's one). It saves some CPU too (since for instance,
in-loop filtering is partially turned off where it doesn't matter,
and some YUV->RGB conversion operations are ommitted too).
The scaler uses summed area, so is mainly meant to be used for
downscaling (like: for generating thumbnails or previews).
Incremental decoding works with these new options.
More doc to come soon.
dwebp is now using the new decoding interface, with the new flags:
-nofancy
-nofilter
-crop top left width height
-scale width height
Change-Id: I08baf2fa291941686f4ef70a9cc2e4137874e85e
This is a temporary workaround for issue #80.
Currently dec/dsp_sse2 is not included in Makefile.vc. Simply adding it
will cause a build error, however, as cl does not support aligning
function parameters producing, e.g.,
src\dec\dsp_sse2.c(228) : error C2719: 'q1': formal parameter with
__declspec(align('16')) won't be aligned
Change-Id: Id29e6802dd29110e59c4f6d13ffa5d4793c750a0
There was 1 unneeded sample line allocated for the filter cache in of simple filtering.
+ Add an explaining comment.
Change-Id: I775a596c8b8643e773e0eade8aa341dc23fb290f
Only builds with --enable-experimental require zlib currently.
A base install of mingw will not include the development headers and
library. libwebp itself will now build in such environments.
Additionally, remove -lz from **/Makefile.am, -lz will be added to LIBS
by AC_CHECK_LIB when necessary.
Change-Id: Iae8319cdf00162ecb7ed44661c02f40beb34f155
Use _MSC_VER as the intrinsics compile without /arch:SSE2 on x86.
Also avoids applying the same flag to all files which defeated the
purpose of the runtime cpu-detection.
Thanks to Frank B. for the suggestion!
Change-Id: Iae9933a3cee704e663d9bbd53d0fa68e8c025425
picture->error_code can be looked up for finer error diagnose.
Added readable error messages to cwebp too.
Should close bug #75 (http://code.google.com/p/webp/issues/detail?id=75)
Change-Id: I8889d06642d90702f698cd5c27441a058ddb3636
sometimes, gcc insert sse2 storeu instructions (like in VP8InitFilter())
with aligment requirements.
Bug was visible 'sometimes' in non-debug mode, when trying to use -af.
Change-Id: If3ec282bbbb9f9d0d33ca4b2c4bed46cd26fe495
we were reading past the end of the dqs[] array.
reported by Mathias Schindler (on cygwin only)
http://code.google.com/p/webp/issues/detail?id=71
Change-Id: Ib38c4c139e3cac3e8915626d63e16b403d6bbd63
+ add a simple rescaling function: WebPPictureRescale() for encoding
+ clean-up the memory managment around the alpha plane
+ fix some includes path by using "../webp/xxx.h" instead of "webp/xxx.h"
New flags for 'cwebp':
-resize <width> <height>
-444 (no effect)
-422 (no effect)
-400
Change-Id: I25a95f901493f939c2dd789e658493b83bd1abfa
This is a (minor) bitstream change: if the 'color_space' bit is set to '1'
(which is normally an undefined/invalid behaviour), we add extra data at the
end of partition #0 (so-called 'extensions')
Namely, we add the size of the extension data as 3 bytes (little-endian),
followed by a set of bits telling which extensions we're incorporating.
The data then _preceeds_ this trailing tags.
This is all experimental, and you'll need to have
'#define WEBP_EXPERIMENTAL_FEATURES' in webp/types.h to enable this code
(at your own risk! :))
Still, this hack produces almost-valid WebP file for decoders that don't
check this color_space bit. In particular, previous 'dwebp' (and for instance
Chrome) will recognize this files and decode them, but without the alpha
of course. Other decoder will just see random extra stuff at the end of
partition #0.
To experiment with the alpha-channel, you need to compile on Unix platform
and use PNGs for input/output.
If 'alpha.png' is a source with alpha channel, then you can try (on Unix):
cwebp alpha.png -o alpha.webp
dwebp alpha.webp -o test.png
cwebp now has a '-noalpha' flag to ignore any alpha information from the
source, if present.
More hacking and experimenting welcome!
Change-Id: I3c7b1fd8411c9e7a9f77690e898479ad85c52f3e
For now, SSE2 functions are compiled a-minima: only on platforms
where __SSE2__ is defined. Let's later add some autoconf-based
config to enable/disable at will.
One can disable SSE2 at run-time by hooking-up VP8GetInfo.
There is a new option "-noasm" in cwebp for that.
Output should be binary the same between C and SSE2 version. If not,
that's a bug!
patch by Christian Duvivier (cduvivier at google dot com)
Change-Id: Iae006c3cdcb7e8280e846cedb94d239dab1e42ae
to return the sum directly.
output is bitwise the same, speed up 1-2%. This is preparatory to a
more efficient SSE2 implementation.
Change-Id: I0bcdf05808c93420fbe9dcb75e5e7e55a4ae5b89
Makes things lighter at the expense of requiring the user
to be up-to-date for autotools.
patch by Jan Engelhardt (jengelh at medozas dot de)
Change-Id: Icfcab2d899828a213d9fade0dab350dacd0c070a
going down to strict -ansi c89 is quite overkill (no 'inline',
and /* */-style comments).
But with these fixes, the code compiles with the stringent flags:
-Wextra -Wold-style-definition -Wmissing-prototypes
-Wmissing-declarations and -Wdeclaration-after-statement
Change-Id: I36222f8f505bcba3d9d1309ad98b5ccb04ec17e3
WebPGetDecoderVersion() and WebPGetEncoderVersion()
will not return 0.1.2 encoded as 0x000102
dwebp and cwebp also have a new "-version" flag
Change-Id: I4fb4b5a8fc4e53681a386ff4b74fffb639fa237a
The object WebPIDecoder is available to store the
decoding state. The flow is typically:
WebPIDecoder* const idec = WebPINew(mode);
while (has_more_data) {
// ... (get additional data)
status = WebPIAppend(idec, new_data, new_data_size);
if (status != VP8_STATUS_SUSPENDED ||
break;
}
// The above call decodes the current available buffer.
// Part of the image can now be refreshed by calling to
// WebPIDecGetRGB()/WebPIDecGetYUV() etc.
}
WebPIDelete(idec);
Doing so, one can try and decode new macroblocks everytime fresh
bytes are available.
There's two operating modes: either appending fresh bytes, or
updating the whole buffer with additional data in the end.
The latter requires less memcpy()'s
main patch by Somnath Banerjee (somnath at google.com)
Change-Id: Ie81cbd0b50f175743af06b1f964de838b9a10a4a
use top_srcdir rather than top_builddir for AM_CPPFLAGS
add EXTRA_DIST to man Makefile. fixes distcheck target.
Change-Id: I308dc1c98f096de1efe188f63d040ef953598e78
non-zero dc-bit was mixed with non-zero ac-bit, preventing
finer optimization during VP8ReconstructBlock.
Depending on sparsity, i see 2-5% gain on average.
Change-Id: I7f34f18d0701c77837de3540b732e5b7d85d7c5d
(this makes initialization easier and will be helpful for incremental
decoding).
Modify ParsePartitions() to accommodate for truncated input.
Change-Id: I62f52078d6b7a2314a11880a20d9eac5b4714bd0
libvpx (and ffvp8) implementations are completely skipping the deblocking
step if loop_filter_level is 0, which is _not_ equivalent to performing
the loop-filtering with a 0 value for loop_filter_level. In the latter case
(which we followed), few pixels were modified here and there and you could
observe off-by-1 errors on few places.
This patch will reconcile the 3 implementations (since the difference
is minor, skipping the deblocking step will save CPU for virtually
no visible difference).
The spec will be made clearer about the expected behaviour:
* if the global loop_filter_level is 0, turn deblocking off.
* if it's not 0 but the local loop_filter_level ends up being 0 for whatever
reason (lf_delta, mode delta, ref delta, etc.) on a particular
macroblock, skip the deblocking too.
Change-Id: I157f1f8de463b8a76caddb3f347b7fbc7bd527d2
This will make the decoder skip the filtering process if needed,
resulting in speed-up, but also non-compliant (blocky?) output
+ Add a versioning check for VP8InitIo(), since we've adding a field to VP8Io
+ add some more error checks while at it
Change-Id: I4e9899edc24ecf8600cbb27aa4038490b7b2cef3
we'll always encode using absolute value, not relative ones.
Both methods use the same number of bits, so we'll go for the
simpler and most robust one.
+ add some extra checks about pic->u/v being NULL.
Change-Id: I98ea01a1a6b133ab3c816c0fbc50e18269bd2098
converts PNG & JPEG to WebP
This is an experimental early version, with lot of room
of later optimizations in both speed and quality.
Compile with the usual `./configure && make`
Command line example is examples/cwebp
Usage:
cwebp [options] -q quality input.png -o output.webp
where 'quality' is between 0 (poor) to 100 (very good).
Typical value is around 80.
More encoding options with 'cwebp -longhelp'
Change-Id: I577a94f6f622a0c44bdfa9daf1086ace89d45539
For very very short partitions, the initial GetByte() could
set eof_ to 1, whereas some bits were available but unread yet.
So we set eof_ to 0 last.
Change-Id: Ic6b68271bc72efa4de4e64e1f57307d1d8fb613c
by processing two rows at a time.
The [9 3 3 1] weights are decomposed as
[1 1 1 1] + [0 2 2 0] + [8 0 0 0] for better
reuse of sub-expressions, too.
Change-Id: I87ab549048ed249d38add73bb3241dfa0c583328
We only need to know _if_ there's some non-zero coeffs at all,
not the exact number of these.
+ code-style nits clean-up
Change-Id: Ice9acf00aef999d70affe4629cb56ae7709140ab
just flushing the pile:
- get rid of Put4 in dsp.c
- move LD4 enum value
- move more fields and code under the ONLY_KEYFRAME_CODE compile flag
- update test_ref.ppm reference (now with FANCY_UPSCALER)
- simplify VP8GetSignedValue()
- use HE and VE naming for prediction mode instead of H and V
- some missing c89 fixes
- code style nits
all test vectors still passing
not super-elegant, but should work without
having to go into deeper hacking (like: creating
an endian.h file, etc.)
Change-Id: I43100ec67a22393c87e11357078efffc750060f5
When FANCY_UPSCALING is defined, use a smoothing filter for upscaling
the U/V chroma fields. The filter used is a separable t[1 3 3 1] x [1 3 3 1]
filter. It can be easily changed in macros MIX_*.
The upscaling code reside on the thing shell between user and core
decoding (in webp.c), and not in the core decoder. As such, this smoothing
process can still be offloaded to GPU in some future and is not integral
part of the decoding process.
Coincidentaly: changed the way data is tranfered to user. For profile 2 (no
filtering), it used to be on a per-block basis. Now, for all profiles, we
emit rows of pixels (between 8 and 24 in height) when they are ready.
This makes the upscaling code much easier.
Will update the test vectors MD5 sums soon (as they'll be broken
after this change)
Change-Id: I2640ff12596cb8b843a4a376d7347447d9b9f778