improves performance of inflate by up to 6% on an A-73 Hikey running at 2.36 GHz
when executing the chromium benchmark on the snappy data set. In a few cases
inflate is slower by up to 0.8%. Overall performance of inflate is better by
about 0.3%.
Performance benchmarks have so far not shown that any platform benefits from UNROLL_MORE,
although it might be beneficial on older compilers/CPUs or when compiling without optimizations.
The extra UNROLL_MORE code should be considered for removal since we never enable it,
and it will likely only serve to confuse and contribute to bitrot.
Move decrement in loop to avoid the following errors:
adler32.c:91:19: runtime error: unsigned integer overflow: 0 - 1 cannot be represented in type 'size_t' (aka 'unsigned long')
adler32.c:136:19: runtime error: unsigned integer overflow: 0 - 1 cannot be represented in type 'size_t' (aka 'unsigned long')
inflate.c:972:32: runtime error: unsigned integer overflow: 0 - 1 cannot be represented in type 'unsigned int'
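
The pattern behind these reports can be sketched as follows. This is a minimal
illustration, not the actual adler32.c or inflate.c code: sum_wrapping() and
sum_no_wrap() are hypothetical helpers. With `while (len--)` the decrement also
runs on the final test, when len is already 0, so the counter wraps around.
That is well-defined for unsigned types in C, but it is what
-fsanitize=unsigned-integer-overflow reports.

```c
#include <stddef.h>

/* Hypothetical helpers for illustration only; not taken from zlib-ng. */

/* Before: the post-decrement is evaluated even when len == 0, so the final
 * loop test computes 0 - 1 and len wraps to SIZE_MAX. */
static unsigned long sum_wrapping(const unsigned char *buf, size_t len) {
    unsigned long sum = 0;
    while (len--)
        sum += *buf++;
    return sum;
}

/* After: test first, then decrement inside the loop body, so len is never
 * decremented while it is 0. */
static unsigned long sum_no_wrap(const unsigned char *buf, size_t len) {
    unsigned long sum = 0;
    while (len) {
        len--;
        sum += *buf++;
    }
    return sum;
}
```

Both functions return the same result; only the second avoids the wraparound
on loop exit.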
Fix the following bugs as recommended by Mika Lindqvist:
arch/x86/deflate_quick.c:233:22: runtime error: unsigned integer overflow: 0 - 1 cannot be represented in type 'unsigned int'
arch/x86/fill_window_sse.c:52:28: runtime error: unsigned integer overflow: 1 - 8192 cannot be represented in type 'unsigned int'
to co-exist in an application that has been linked to something that
depends on stock zlib. Previously, that would cause random problems
since there is no way to guarantee what zlib version is being used
for each dynamically linked function.
Add the corresponding zlib-ng.h.
Tests, example and minigzip will not compile until they have been
adapted to use the correct functions as well.
Either duplicate them, so we have minigzip-ng.c for example, or add
compile-time detection in the source code.
- Split functableInit() into separate functions, one per functable member, so we don't need to initialize the full functable in multiple places in the zlib-ng code or check for NULL on every invocation.
- The optimized function for each functable member is detected on first invocation, and the functable entry is updated for subsequent invocations.
- Remove the NULL check in adler32() and adler32_z() as it is no longer needed.
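
The self-updating table described above can be sketched roughly like this. All
names here (checksum_fn, checksum_stub, cpu_has_fast_path and so on) are
hypothetical and chosen for illustration; the real functable layout and CPU
detection are not shown in this message. Each table entry starts out pointing
at a stub, and the stub picks an implementation, stores it in the table, and
forwards the call.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names throughout; this only illustrates the pattern. */
typedef uint32_t (*checksum_fn)(uint32_t sum, const unsigned char *buf, size_t len);

/* Portable fallback implementation. */
static uint32_t checksum_c(uint32_t sum, const unsigned char *buf, size_t len) {
    while (len) {
        len--;
        sum += *buf++;
    }
    return sum;
}

/* Stand-in for an optimized variant; here it simply reuses the C version. */
static uint32_t checksum_fast(uint32_t sum, const unsigned char *buf, size_t len) {
    return checksum_c(sum, buf, len);
}

/* Placeholder for run-time CPU feature detection (assumption). */
static int cpu_has_fast_path(void) {
    return 0;
}

static uint32_t checksum_stub(uint32_t sum, const unsigned char *buf, size_t len);

struct functable_s {
    checksum_fn checksum;
};

/* Every entry initially points at its own stub, so it is never NULL. */
static struct functable_s functable = { checksum_stub };

static int stub_calls = 0; /* only here to make the behavior observable */

/* First invocation: select an implementation, patch the table, forward. */
static uint32_t checksum_stub(uint32_t sum, const unsigned char *buf, size_t len) {
    stub_calls++;
    functable.checksum = cpu_has_fast_path() ? checksum_fast : checksum_c;
    return functable.checksum(sum, buf, len);
}
```

Callers always go through functable.checksum(...). Only the first call pays for
detection, and no caller needs a NULL check. This sketch does not address
thread safety.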
* add adler32_neon to main dependency checking and ARM/Windows Makefile
* split non-optimized adler32 to adler32_c so we can test/compare both without recompiling.
* add detection of default floating point ABI in gcc
NOTE: This should avoid a build error when gcc supports both ABIs but the headers for only one ABI are installed.
The checksum is calculated over the uncompressed PNG data and can be
made much faster by using SIMD. Tests on ARMv8 yielded an improvement
of about 3x (e.g. wall time dropped from 350 ms to 125 ms for a
4096x4096-byte input processed 30 times).
This yields an improvement of around 18% in image decoding in Chromium
(see https://bugs.chromium.org/p/chromium/issues/detail?id=688601).
Excessive loop unrolling is detrimental to performance. This patch
adds a preprocessor define, ADLER32_UNROLL_LESS, to reduce unrolling
factor from 16 to 8.
Update the configure script to set ADLER32_UNROLL_LESS by default on x86.
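
A minimal sketch of how such a define could select the unrolling factor,
assuming a scalar Adler-32 loop built from DO1/DO4/DO8/DO16 helper macros. The
STEP and DO_STEP macros and the adler32_sketch() function are illustrative
names, not the patch itself. Only ADLER32_UNROLL_LESS comes from the
description above.

```c
#include <stddef.h>
#include <stdint.h>

#define BASE 65521U /* largest prime smaller than 65536 */
#define NMAX 5552   /* max bytes that can be summed before reducing mod BASE */

#define DO1(buf, i)  { a += (buf)[i]; b += a; }
#define DO4(buf, i)  DO1(buf, i) DO1(buf, i + 1) DO1(buf, i + 2) DO1(buf, i + 3)
#define DO8(buf, i)  DO4(buf, i) DO4(buf, i + 4)
#define DO16(buf)    DO8(buf, 0) DO8(buf, 8)

/* The define only changes how many bytes one unrolled step consumes. */
#ifdef ADLER32_UNROLL_LESS
#  define STEP 8
#  define DO_STEP(buf) DO8(buf, 0)
#else
#  define STEP 16
#  define DO_STEP(buf) DO16(buf)
#endif

/* Illustrative function name; start with adler = 1 for a fresh checksum. */
static uint32_t adler32_sketch(uint32_t adler, const unsigned char *buf, size_t len) {
    uint32_t a = adler & 0xffff;
    uint32_t b = (adler >> 16) & 0xffff;

    while (len > 0) {
        /* Process at most NMAX bytes between modulo reductions. */
        size_t n = len < NMAX ? len : NMAX;
        len -= n;
        while (n >= STEP) { /* unrolled main loop */
            DO_STEP(buf);
            buf += STEP;
            n -= STEP;
        }
        while (n) {         /* remaining tail bytes */
            n--;
            a += *buf++;
            b += a;
        }
        a %= BASE;
        b %= BASE;
    }
    return (b << 16) | a;
}
```

Compiling with -DADLER32_UNROLL_LESS switches the main loop from 16 to 8 bytes
per iteration without changing the result.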