A bug fix in zlib 1.2.12 resulted in a slight slowdown (1-2%) of
deflate. This commit provides the option to #define LIT_MEM, which
uses more memory to reverse most of that slowdown. The memory for
the pending buffer and symbol buffers is increased by 25%, which
increases the total memory usage with the default parameters by
about 6%.
madler/zlib#ac8f12c97d1afd9bafa9c710f827d40a407d3266
Just in case this is the very first call to longest match, we should
instead assign the function pointer instead of the function itself. This
way, by the time it leaves the stub, the function pointer gets
reassigned. This was found incidentally while debugging something else.
Commit bc5915e2de ("Fixed unsigned integer overflow ASAN error when
hash_head > s->strstart.") removed hash_head != 0 checks in fast,
medium and slow deflate, because it improved performance [1].
Unfortunately, the attached test started failing after that.
Apparently, as the comments suggest, the code implicitly relies on
matches with the beginning of the window being skipped. So restore the
check.
[1] https://github.com/zlib-ng/zlib-ng/pull/772#issuecomment-710760300
zlib-ng/deflate_medium.c:244:47: runtime error: unsigned integer overflow: 58442 - 58452 cannot be represented in type 'unsigned int'
Co-authored-by: Mika Lindqvist <postmaster@raasu.org>
Co-authored-by: Hans Kristian Rosbach <hk-git@circlestorm.org>
This bug was reported by Danilo Ramos of Eideticom, Inc. It has
lain in wait 13 years before being found! The bug was introduced
in zlib 1.2.2.2, with the addition of the Z_FIXED option. That
option forces the use of fixed Huffman codes. For rare inputs with
a large number of distant matches, the pending buffer into which
the compressed data is written can overwrite the distance symbol
table which it overlays. That results in corrupted output due to
invalid distances, and can result in out-of-bound accesses,
crashing the application.
The fix here combines the distance buffer and literal/length
buffers into a single symbol buffer. Now three bytes of pending
buffer space are opened up for each literal or length/distance
pair consumed, instead of the previous two bytes. This assures
that the pending buffer cannot overwrite the symbol table, since
the maximum fixed code compressed length/distance is 31 bits, and
since there are four bytes of pending space for every three bytes
of symbol space.
zlib-ng used to fail when compiled with UBSan with this error:
deflate_slow.c:112:21: runtime error: unsigned integer overflow: 45871 - 45872 cannot be represented in type 'unsigned int'
The bug occurs in code added to zlib-ng under `#ifndef NOT_TWEAK_COMPILER`.
The original code of zlib contains a loop with two induction variables:
s->prev_length -= 2;
do {
if (++s->strstart <= max_insert) {
functable.insert_string(s, s->strstart, 1);
}
} while (--s->prev_length != 0);
The function insert_string is not executed when
!(++s->strstart <= max_insert)
i.e., when
!(s->strstart + 1 <= max_insert)
!(s->strstart < max_insert)
max_insert <= s->strstart
The function insert_string is executed when
++s->strstart <= max_insert
i.e., when
s->strstart + 1 <= max_insert
s->strstart < max_insert
The function is executed at most `max_insert - s->strstart` times, following the
exit condition of the do-while `(--s->prev_length != 0)`. If the loop exits
after evaluating the exit condition once, the function is executed once
independently of `max_insert - s->strstart`. The number of times the function
executes is the minimum between the number of iterations in the do-while loop
and `max_insert - s->strstart`.
The number of iterations of the loop is `mov_fwd = s->prev_length - 2`, and we
know that this is at least one as otherwise `--s->prev_length` would overflow.
The number of times the function insert_string is called is
`min(mov_fwd, max_insert - s->strstart)`
The struct contains pointers to select functions to be used by the
rest of zlib, and the init function selects what functions will be
used depending on what optimizations has been compiled in and what
instruction-sets are available at runtime.
Tests done on a haswell cpu running minigzip -6 compression of a
40M file shows a 2.5% decrease in branches, and a 25-30% reduction
in iTLB-loads. The reduction i iTLB-loads is likely mostly due to
the inability to inline functions. This also causes a slight
performance regression of around 1%, this might still be worth it
to make it much easier to implement new optimized functions for
various architectures and instruction sets.
The performance penalty will get smaller for functions that get more
alternative implementations to choose from, since there is no need
to add more branches to every call of the function.
Today insert_string has 1 branch to choose insert_string_sse
or insert_string_c, but if we also add for example insert_string_sse4
then that would have needed another branch, and it would probably
at some point hinder effective inlining too.
** Partial merge of this commit, based on a8c94e9f5a3b9d3c62182bcf84e72304a3c1a6e5
Excludes changes to fill_window_sse.c, changes to fill_window_c() in deflate.c
and several unrelated changes in the commit.
* Separate common inlines and macros to deflate_p.h
* Separate deflate_fast related code to deflate_fast.c
* Separate deflate_medium related code to deflate_medium.c
* Separate deflate_slow related code to deflate_slow.c