- Deflate_quick (level 1), no longer limit window, improves compression.
- Deflate_medium, don't check next position for levels below 5.
- Use deflate_medium instead of deflate_fast for level 3.
- Tweak level 4 to give a more predictable speed/compression tradeoff curve.
Commit bc5915e2de ("Fixed unsigned integer overflow ASAN error when
hash_head > s->strstart.") removed hash_head != 0 checks in fast,
medium and slow deflate, because it improved performance [1].
Unfortunately, the attached test started failing after that.
Apparently, as the comments suggest, the code implicitly relies on
matches with the beginning of the window being skipped. So restore the
check.
[1] https://github.com/zlib-ng/zlib-ng/pull/772#issuecomment-710760300
deflate_medium.c(204,49): warning C4244: '=': conversion from 'unsigned int' to 'uint16_t', possible loss of data
deflate_medium.c(217,59): warning C4244: '=': conversion from 'unsigned int' to 'uint16_t', possible loss of data
deflate_medium.c(238,46): warning C4244: '=': conversion from 'unsigned int' to 'uint16_t', possible loss of data
deflate_medium.c(250,56): warning C4244: '=': conversion from 'unsigned int' to 'uint16_t', possible loss of data
zlib-ng/deflate_medium.c:244:47: runtime error: unsigned integer overflow: 58442 - 58452 cannot be represented in type 'unsigned int'
Co-authored-by: Mika Lindqvist <postmaster@raasu.org>
Co-authored-by: Hans Kristian Rosbach <hk-git@circlestorm.org>
This bug was reported by Danilo Ramos of Eideticom, Inc. It has
lain in wait 13 years before being found! The bug was introduced
in zlib 1.2.2.2, with the addition of the Z_FIXED option. That
option forces the use of fixed Huffman codes. For rare inputs with
a large number of distant matches, the pending buffer into which
the compressed data is written can overwrite the distance symbol
table which it overlays. That results in corrupted output due to
invalid distances, and can result in out-of-bound accesses,
crashing the application.
The fix here combines the distance buffer and literal/length
buffers into a single symbol buffer. Now three bytes of pending
buffer space are opened up for each literal or length/distance
pair consumed, instead of the previous two bytes. This assures
that the pending buffer cannot overwrite the symbol table, since
the maximum fixed code compressed length/distance is 31 bits, and
since there are four bytes of pending space for every three bytes
of symbol space.
==4908==ERROR: MemorySanitizer: SEGV on unknown address 0x730fffffffff (pc 0x0000004b1b97 bp 0x7ffd4bf59a00 sp 0x7ffd4bf598a0 T4908)
==4908==The signal is caused by a READ memory access.
#0 0x5a0599 in fizzle_matches zlib-ng/deflate_medium.c:168:12
#1 0x59ea27 in deflate_medium zlib-ng/deflate_medium.c:296:21
#2 0x5901c5 in zng_deflate zlib-ng/deflate.c:951:18
#3 0x586955 in zng_compress2 zlib-ng/compress.c:59:15
#4 0x5861eb in LLVMFuzzerTestOneInput zlib-ng/test/fuzz/compress_fuzzer.c:18:3
#5 0x4e9b48 in fuzzer::Fuzzer::ExecuteCallback(unsigned char const*, unsigned long) /src/libfuzzer/FuzzerLoop.cpp:575:15
#6 0x4a2f66 in fuzzer::RunOneTest(fuzzer::Fuzzer*, char const*, unsigned long) /src/libfuzzer/FuzzerDriver.cpp:280:6
#7 0x4b3adb in fuzzer::FuzzerDriver(int*, char***, int (*)(unsigned char const*, unsigned long)) /src/libfuzzer/FuzzerDriver.cpp:715:9
#8 0x4a2091 in main /src/libfuzzer/FuzzerMain.cpp:20:10
#9 0x7fa3d7ff582f in __libc_start_main /build/glibc-Cl5G7W/glibc-2.23/csu/libc-start.c:291
#10 0x41ec68 in _start
The struct contains pointers to select functions to be used by the
rest of zlib, and the init function selects what functions will be
used depending on what optimizations has been compiled in and what
instruction-sets are available at runtime.
Tests done on a haswell cpu running minigzip -6 compression of a
40M file shows a 2.5% decrease in branches, and a 25-30% reduction
in iTLB-loads. The reduction i iTLB-loads is likely mostly due to
the inability to inline functions. This also causes a slight
performance regression of around 1%, this might still be worth it
to make it much easier to implement new optimized functions for
various architectures and instruction sets.
The performance penalty will get smaller for functions that get more
alternative implementations to choose from, since there is no need
to add more branches to every call of the function.
Today insert_string has 1 branch to choose insert_string_sse
or insert_string_c, but if we also add for example insert_string_sse4
then that would have needed another branch, and it would probably
at some point hinder effective inlining too.
it's speed optimization as the inner code also checks that previous hash value
is not same as new hash value. Essentially those two checks together makes the
compression a little more efficient as it can remember matches further apart.
As far as I remember from my tests, the secondary path was triggered only twice
in very long uncompressed file, but the gain in compression rate was still noticeable.
** Partial merge of this commit, based on a8c94e9f5a3b9d3c62182bcf84e72304a3c1a6e5
Excludes changes to fill_window_sse.c, changes to fill_window_c() in deflate.c
and several unrelated changes in the commit.