So a lot of alterations had to be made to keep this from being worse,
and so far it's not really better, either. I had to force inlining for
the adler routine, I had to remove the x4 load instruction because
pipelining stalled otherwise, and I had to use restrict pointers with a
copy idiom for GCC to inline a copy routine for the tail.
Still, we see a small benefit in benchmarks, particularly when they are
run with sizes at or above our window size. There's also an added
benefit that this will fix #1824.
- Remove obsolete checks
- Fix checks that are inconsistent
- Stop compiling compare256/longest_match variants that never get called
- Improve how the generic compare256 functions are handled
- Allow overriding OPTIMAL_CMP
This simplifies the code and avoids having a lot of code in the compiled library that can never get executed.
No longer does the big iron of yore that lacks SIMD optimized loads need
to search strings a byte at a time like primitive machines of the VAX
era. This guard here was mostly due to the fact that the string
comparison located the first mismatch with "count trailing zero", which
assumes an endianness. We can just conditionally use leading zeros when
on big endian and stop using the extremely naive C implementation. This
makes things a tad bit faster.
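
As a rough illustration of the idea, here is a minimal sketch of a word-at-a-time comparison that picks the bit-scan direction by byte order. The function name `compare256_sketch` is hypothetical, and the code assumes the GCC/Clang builtins `__builtin_ctzll` and `__builtin_clzll` and the `__BYTE_ORDER__` predefined macro; it is not the actual zlib-ng routine.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: returns how many leading bytes (up to 256)
 * of src0 and src1 are equal. Both buffers must hold 256 bytes. */
static uint32_t compare256_sketch(const uint8_t *src0, const uint8_t *src1) {
    uint32_t len = 0;
    while (len < 256) {
        uint64_t a, b;
        /* memcpy keeps the unaligned 8-byte loads well defined. */
        memcpy(&a, src0 + len, sizeof(a));
        memcpy(&b, src1 + len, sizeof(b));
        uint64_t diff = a ^ b;
        if (diff) {
#if defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
            /* Big endian: the first byte in memory is the most significant. */
            return len + (uint32_t)(__builtin_clzll(diff) / 8);
#else
            /* Little endian: the first byte in memory is the least significant. */
            return len + (uint32_t)(__builtin_ctzll(diff) / 8);
#endif
        }
        len += 8;
    }
    return 256;
}
```

Dividing the bit index by 8 turns the position of the first differing bit into the index of the first differing byte within the word.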
Deflate used to call the allocator 5 times during init.
- 5 calls to the external alloc function now become 1
- Handling alignment of allocated buffers is simplified
- Efforts to align the allocated buffer now need to happen only once.
- Individual buffers are ordered so that they have natural sequential alignment.
- Due to reduced losses to alignment, we allocate less memory in total.
- While doing alloc(), we now store a pointer to the corresponding free(), avoiding crashes
with applications that incorrectly set alloc/free pointers after running the init function.
- Removed need for extra padding after window, chunked reads can now go beyond the window
buffer without causing a segfault.
Co-authored-by: Ilya Leoshkevich <iii@linux.ibm.com>
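
The single-allocation scheme described above can be sketched roughly as follows. All names here (`arena_sketch`, `arena_init`, `arena_free`, `SKETCH_ALIGN`) are hypothetical, the three buffers and the 64-byte alignment are assumptions chosen for illustration, and plain malloc()/free() stand in for the application-supplied allocator.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define SKETCH_ALIGN 64  /* assumed alignment, for illustration only */

/* Round x up to the next multiple of SKETCH_ALIGN. */
static size_t align_up(size_t x) {
    return (x + (SKETCH_ALIGN - 1)) & ~(size_t)(SKETCH_ALIGN - 1);
}

typedef struct {
    void *base;               /* pointer returned by the single malloc() */
    void (*free_fn)(void *);  /* matching free(), captured at alloc time */
    uint8_t *window;
    uint16_t *prev;
    uint16_t *head;
} arena_sketch;

static int arena_init(arena_sketch *a, size_t window_bytes,
                      size_t prev_count, size_t head_count) {
    /* Lay the buffers out back to back, padding each start to SKETCH_ALIGN. */
    size_t prev_off = align_up(window_bytes);
    size_t head_off = align_up(prev_off + prev_count * sizeof(uint16_t));
    size_t total = head_off + head_count * sizeof(uint16_t);

    /* One allocation; the extra bytes let us align the start once. */
    a->base = malloc(total + SKETCH_ALIGN - 1);
    if (a->base == NULL)
        return -1;
    a->free_fn = free;

    uintptr_t start = ((uintptr_t)a->base + (SKETCH_ALIGN - 1))
                      & ~(uintptr_t)(SKETCH_ALIGN - 1);
    a->window = (uint8_t *)start;
    a->prev = (uint16_t *)(a->window + prev_off);
    a->head = (uint16_t *)(a->window + head_off);
    return 0;
}

static void arena_free(arena_sketch *a) {
    /* Use the free() stored at allocation time, not a later-set pointer. */
    a->free_fn(a->base);
    a->base = NULL;
}
```

Because only the start of the block needs slack for alignment, the padding cost is paid once rather than once per buffer.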
Currently the DFLTCC sanitizer instrumentation is limited to
MSAN-unpoisoning the parameter block. Add ASAN and MSAN checks;
also MSAN-unpoison the window.
Introduce the generic instrument_read(), instrument_write() and
instrument_read_write() macros, which are modeled after the respective
functions in the Linux kernel.
ARM64EC is a new ARM64 variant introduced in Windows 11 that uses an
ABI similar to AMD64, which allows for better interoperability with
emulated AMD64 applications. When enabled in MSVC, it defines _M_AMD64
and _M_ARM64EC, but not _M_ARM64, so we need to check for _M_ARM64EC.
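
A minimal sketch of such a check is shown below. The `SKETCH_ARCH` macro and `sketch_arch_name()` function are hypothetical names used only for illustration. The point is the ordering: the ARM64EC test has to come before any `_M_AMD64` test, because ARM64EC builds define both.

```c
/* Test ARM64 variants first: ARM64EC also defines _M_AMD64, so an
 * x86-64 check placed earlier would misclassify it. */
#if defined(_M_ARM64) || defined(_M_ARM64EC) || defined(__aarch64__)
#  define SKETCH_ARCH "arm64"
#elif defined(_M_AMD64) || defined(__x86_64__)
#  define SKETCH_ARCH "x86_64"
#else
#  define SKETCH_ARCH "other"
#endif

/* Hypothetical accessor so the result can be inspected at run time. */
static const char *sketch_arch_name(void) {
    return SKETCH_ARCH;
}
```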
Currently all the usages of __msan_unpoison() have to be guarded by
"#ifdef Z_MEMORY_SANITIZER". Simplify things by defining an empty
__msan_unpoison() when the code is compiled without MSan.
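
A minimal sketch of that pattern, assuming Clang's `__has_feature(memory_sanitizer)` is used to detect MSan (the detection logic and the `fill_sketch()` helper are assumptions for illustration, not taken from the change itself):

```c
#include <stddef.h>

/* Assumed detection: Clang reports MSan through __has_feature. */
#if defined(__has_feature)
#  if __has_feature(memory_sanitizer)
#    define Z_MEMORY_SANITIZER 1
#    include <sanitizer/msan_interface.h>
#  endif
#endif

#ifndef Z_MEMORY_SANITIZER
/* No-op stand-in so call sites need no #ifdef guard. */
#  define __msan_unpoison(a, size) do { (void)(a); (void)(size); } while (0)
#endif

/* Hypothetical call site: the unpoison call is unconditional. */
static size_t fill_sketch(unsigned char *buf, size_t len) {
    for (size_t i = 0; i < len; i++)
        buf[i] = (unsigned char)i;
    __msan_unpoison(buf, len);
    return len;
}
```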
- Zlib Compat: Move definition of z_size_t to zconf.h, so it is exported to applications.
Always defined as size_t to follow zlib 1.2.13 behavior with STDC compilers.
- Zlib-NG: Keeps internal definition of z_size_t in zbuild.h
Check for compiler support in CMake and the configure script. This
allows ALIGNED_ to be defined for more compilers so that more than
just Clang, GCC and MSVC can build the project.
Google Test uses strdup(), which makes building tests fail on a fresh
MSYS2 setup:
In file included from zlib-ng/_deps/googletest-src/googletest/include/gtest/internal/gtest-internal.h:40,
from zlib-ng/_deps/googletest-src/googletest/include/gtest/gtest.h:62,
from zlib-ng/test/test_compress.cc:17:
zlib-ng/_deps/googletest-src/googletest/include/gtest/internal/gtest-port.h: In function ‘char* testing::internal::posix::StrDup(const char*)’:
zlib-ng/_deps/googletest-src/googletest/include/gtest/internal/gtest-port.h:2046:47: error: ‘strdup’ was not declared in this scope; did you mean ‘StrDup’?
2046 | inline char* StrDup(const char* src) { return strdup(src); }
| ^~~~~~
| StrDup
Bump _POSIX_C_SOURCE to enable this function. An alternative solution
would be to define _POSIX_C_SOURCE in test/CMakeLists.txt, but having a
bigger value for zlib-ng itself should not hurt.
Include zbuild.h earlier in minideflate.c in order to make the new
setting take effect for this file.
zbuild.h is included from every .c file of zlib-ng, which forces every translation unit to parse all Windows system includes only to be able to typedef ssize_t. This change removes the windows.h include from zbuild.h; ssize_t is instead defined in-line with equivalent defines from windows.h.
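
One way such an in-line definition could look is sketched below. The type name `sketch_ssize_t` is hypothetical (chosen to avoid clashing with a real `ssize_t`), and the mapping assumes the usual Windows convention that the signed size type is a 64-bit integer on `_WIN64` and `long` otherwise.

```c
#include <stddef.h>

#if defined(_WIN32) && !defined(__MINGW32__)
/* Mirror the pointer-sized signed type without including windows.h. */
#  if defined(_WIN64)
typedef __int64 sketch_ssize_t;
#  else
typedef long sketch_ssize_t;
#  endif
#else
/* POSIX systems (and MinGW) already provide ssize_t. */
#  include <sys/types.h>
typedef ssize_t sketch_ssize_t;
#endif
```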