The most common conditions are re-ordered and cached.
iter_min was recently introduced to make sure enough iterations
are made in cases where there are many matches (mostly uniform regions).
Now that those are properly analyzed, it becomes useless.
Change-Id: Id3010ee4ec66b84d602fcb926f91eb9155ad27f4
No need to find backward references for pixels in uniform regions
by looking at all pixels.
Only pixels at the same distance from the end need to be compared to.
Change-Id: I4f187e965f0667d3a929775726a412f7e69f6473
Return key/index if the query is found, and -1 otherwise.
The benefit of this is to save a hashing computation.
Change-Id: Iff056be330f5fb8204011259ac814f7677dd40fe
This is essentially a revert of a3611513d2bf465fd282d9dc45b3f72c79c232ad
and cfbcc5ece022fc74ae9b987e05c2807df0d82ec5.
Here is what happened: there was a corruption bug that eventually
got fixed by 0174d18d8b51f6c9228c70066a987c30a8132995.
But before finding the root, a3611513d2bf465fd282d9dc45b3f72c79c232ad
and cfbcc5ece022fc74ae9b987e05c2807df0d82ec5 hid the bug
by not imposing length of 1 when it was actually 2 or 3 (which does help
compression as a litteral is more efficient than an offset and a length
of size 2 or 3).
Change-Id: I6f18fc1f583a51ac9d8aab2508458264047cd493
The optimization for (len != MIN_LENGTH) actually only holds for
(len > MIN_LENGTH) but (len < MIN_LENGTH) can now happen as len can
be changed in the loop before.
Change-Id: I3f9f91a540206c80385c5fba96c3d64ab9536752
This is getting back to the old behavior which is actually better for
compression and speed with the latest patches.
Change-Id: I35884bab02589297c25d6e1e66dc5f13e05f7aa7
In case where the same offset is found in consecutive pixels,
the cost computation from one pixel can be re-used for the next.
Change-Id: Ic03c7d4ab95f3612eafc703349cfefd75273c3d7
and also recycle the malloc'd intervals
This avoids quite some malloc/free cycles during interval managment.
Change-Id: Ic2892e7c0260d0fca0e455d4728f261fb4c3800e
In a lot of cases, only one interval is used. This can cause
a lot of malloc/free cycles for only 56 bytes. By caching this
single interval and re-using it, we remove this cycle in most
frequent cases.
Change-Id: Ia22d583f60ae438c216612062316b20ecb34f029
In some cases, the hash chain for a function is filled several
times:
- GetBackwardReferences -> CalculateBestCacheSize ->
BackwardReferencesLz77 that computes the hash chain
- GetBackwardReferences ->
(not always) BackwardReferencesTraceBackwards ->
BackwardReferencesHashChainDistanceOnly that computes the hash
chain in a slightly different way
Speed and compression performance are slightly changed (+ or -)
but will be homogneized in a later patch.
Change-Id: I43f0ecc7a9312c2ed6cdba1c0fabc6c5ad91c953
Instead of comparing all the following pixels over len (which can
frequently reach the maximum MAX_LENGTH=4096 for some images),
intervals are stored and compared.
Change-Id: I0dafef6cc988dde3c1c03ae07305ac48901d60ee
The 32-bit buffers are actually rarely 64-bit aligned.
The new solution uses memcmp and is alignment agnostic.
It is also slightly faster.
Change-Id: I863003e9ee4ee8a3eed25b7b2478cb82a0ddbb20
Arrays were compared 32 bits at a time, it is now done 64 bits at a time.
Overall encoding speed-up is only of 0.2% on @skal's small PNG corpus.
It is of 3% on my initial 1.3 Mp desktop screenshot image.
Change-Id: I1acb32b437397a7bf3dcffbecbcd4b06d29c05e1
This makes the chains more efficient and a larger variety of data is tested.
0.02 % compression gain at q 100, 0.05 % at default quality. 0.8 % speedup by
callgrind.
0.16 % compression gain for lossy alpha ?!
Change-Id: I888120133352799eb14f5f602c7f40ab404bd665
a total impact of 1 % on encoding speed
This allows for performance neutral removal of the binary search
in cache bits selection. This will give a small improvement in
compression density.
Change-Id: If5d4d59460fa1924ce71af977320834a47c2054a
0.21 % compression density improvement for 1000 png corpus in
lossless mode
0.50 % compression density improvement for 1000 png corpus in
lossy mode
Change-Id: I14ee8c427ae5d3e116b0ee6695fcdea3321a319d
do not do length 2 matches far away
speedup for non compressible data by inserting two literals at a time
when no matches are found
Change-Id: Ia8e033071f4186bb8148bb2bf13ca37586734aa3
Speed-wise equivalent on x86 and ARM (maybe a tad faster, hard to tell).
Note that the two 32-bit multiples are not strictly equivalent
to the 64-bit one, since we're missing one carry propagation.
In practice, no observable difference was seen because of this
slightly different hashing result.
Change-Id: I8f2381175eae1cb20dabf149e6b27e1768fba6ab
Simplify and speedup backward references for low-effort settings by evaluating
LZ77 references only. This change speeds up compression by 10-25% at lower
(q <= 25) quality range with a slight drop (0.2%) in the compression density.
Change-Id: Ibd6f03b1a062d8ab9191786c2a425e9132e4779f
Disable costly TraceBackwards heuristic for computing the backward references
for low_effort (method=0) compression.
The TraceBackwards heuristic is already disabled for lower (q < 25) quality
range. Following is the compression data for 1000 image corpus for q >= 25.
This speeds up compression (q >= 25) by a factor of 2.5-3X with slight loss of
compression density (0.7% for lower quality range and 1.2% for higher qualities).
Change-Id: I256c9e2137c7de4083f423ea32ee12d3b0f46253