Commit Graph

147 Commits

Author SHA1 Message Date
Cameron Cawley
d7e121e56b Use GCC's may_alias attribute for unaligned memory access 2024-12-24 12:55:44 +01:00
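
For illustration, a minimal sketch of the may_alias technique (not the actual zlib-ng definition): a packed, may_alias struct lets GCC and Clang emit a load that tolerates misalignment and is exempt from strict-aliasing analysis.

    #include <stdint.h>

    /* packed: alignment 1, so the pointer may be misaligned;
     * may_alias: the access may alias objects of any other type. */
    typedef struct {
        uint32_t v;
    } __attribute__((packed, may_alias)) unaligned_u32;

    static inline uint32_t load_u32(const void *p) {
        return ((const unaligned_u32 *)p)->v;  /* compiler emits an unaligned-safe load */
    }
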
Mika Lindqvist
5b04d9ce04 Enable warning C4242 and treat warnings as errors for Visual C++. 2024-08-22 16:47:59 +02:00
Hans Kristian Rosbach
130055e8d1 Rewrite deflate memory allocation.
Deflate used to call the external alloc function 5 times during init.

- 5 calls to the external alloc function now become 1
- Handling alignment of the allocated buffers is simplified
  - Efforts to align the allocated buffer now need to happen only once.
  - Individual buffers are ordered so that they have natural sequential alignment.
- Due to reduced losses to alignment, we allocate less memory in total.
- While doing alloc(), we now store a pointer to the corresponding free(), avoiding crashes
  with applications that incorrectly set the alloc/free pointers after running the init function.
- Removed the need for extra padding after the window; chunked reads can now go beyond the window
  buffer without causing a segfault.

Co-authored-by: Ilya Leoshkevich <iii@linux.ibm.com>
2024-05-28 16:35:13 +02:00
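
A rough sketch of the single-allocation pattern described above; all names, offsets, and the alignment granularity are illustrative rather than the real zlib-ng layout:

    #include <stddef.h>
    #include <stdint.h>

    #define ALIGN_UP(n, a) (((n) + (a) - 1) & ~((size_t)(a) - 1))

    typedef struct {
        void (*zfree)(void *opaque, void *ptr); /* the free() matching the alloc() is stored here */
        void *opaque;
        unsigned char *window;
        uint16_t      *prev;
        unsigned char *pending_buf;
    } example_state;

    static example_state *example_alloc(void *(*zalloc)(void *, unsigned, unsigned),
                                        void (*zfree)(void *, void *), void *opaque,
                                        size_t window_size, size_t hash_size, size_t pending_size) {
        /* One call to the external allocator; buffers are ordered so each
         * starts on a naturally aligned offset. */
        size_t off_window  = ALIGN_UP(sizeof(example_state), 64);
        size_t off_prev    = ALIGN_UP(off_window + window_size, 64);
        size_t off_pending = ALIGN_UP(off_prev + hash_size * sizeof(uint16_t), 64);
        size_t total       = off_pending + pending_size;

        unsigned char *buf = (unsigned char *)zalloc(opaque, 1, (unsigned)total);
        if (buf == NULL)
            return NULL;

        example_state *s = (example_state *)buf;
        s->zfree       = zfree;   /* remembered now, so a later pointer swap by the app cannot break free */
        s->opaque      = opaque;
        s->window      = buf + off_window;
        s->prev        = (uint16_t *)(buf + off_prev);
        s->pending_buf = buf + off_pending;
        return s;
    }
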
Ilya Leoshkevich
05ef29eda5 IBM zSystems DFLTCC: Inline DFLTCC states into zlib states
Currently DFLTCC states are allocated using hook macros, complicating
memory management. Inline them into zlib states and remove the hooks.
2024-05-15 11:28:10 +02:00
Hans Kristian Rosbach
9953f12e21 Move update_hash(), insert_string() and quick_insert_string() out of functable
and remove SSE4.2 and ACLE optimizations. The functable overhead is higher
than the benefit from using optimized functions.
2024-02-23 13:34:10 +01:00
Nathan Moinvaziri
a090529ece Remove deflate_state parameter from update_hash functions. 2024-02-23 13:34:10 +01:00
Pavel P
7745c28dbe Increase alignment from 8 to 16 to avoid warnings with ms compiler
Fixing the align attribute makes the MS compiler warn: 'internal_state': Alignment specifier is less than actual alignment (16), and will be ignored.
Increasing the alignment fixes the warning.
2024-02-15 16:13:06 +01:00
Pavel P
0456bce1cc Fix deflate_state alignment with MS or clang-cl compilers
When building with clang-cl, the compiler produces the following warning:

zlib-ng/deflate.h(287,3): warning : attribute 'align' is ignored, place it after "struct" to apply attribute to type declaration [-Wignored-attributes]
zlib-ng/zbuild.h(196,34): note: expanded from macro 'ALIGNED_'

Repositioning align attribute after "struct" fixes the warning and aligns `deflate_state` correctly.
2024-02-15 16:13:06 +01:00
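
Illustrative only (the project wraps this in the ALIGNED_ macro in zbuild.h, and the snippet needs MSVC or clang-cl): the alignment specifier has to sit between the struct keyword and the tag for it to apply to the type declaration itself.

    /* Ignored by clang-cl when placed before the keyword:
     *   __declspec(align(16)) struct internal_state { ... };
     * Applied to the type when placed right after it: */
    struct __declspec(align(16)) internal_state_example {
        int example_member;
    };
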
Nathan Moinvaziri
8ef6098a65 Enable LIT_MEM by default except when WITH_REDUCED_MEM is ON. 2024-02-07 19:15:56 +01:00
Hans Wennborg
6345d05782 Fix the copy of pending_buf in deflateCopy() for the LIT_MEM case.
madler/zlib#60c31985ecdc2b40873564867e1ad2aef0b88697
2024-02-07 19:15:56 +01:00
Mark Adler
a3fb271c6e Add LIT_MEM define to use more memory for a small deflate speedup.
A bug fix in zlib 1.2.12 resulted in a slight slowdown (1-2%) of
deflate. This commit provides the option to #define LIT_MEM, which
uses more memory to reverse most of that slowdown. The memory for
the pending buffer and symbol buffers is increased by 25%, which
increases the total memory usage with the default parameters by
about 6%.

madler/zlib#ac8f12c97d1afd9bafa9c710f827d40a407d3266
2024-02-07 19:15:56 +01:00
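
A rough, hypothetical sketch of the trade-off LIT_MEM makes (the real zlib buffers are views into pending_buf and the details differ): without LIT_MEM each buffered symbol is packed into 3 bytes, while with LIT_MEM distances and literals live in separate, naturally sized arrays, which costs more memory but is cheaper to write and read back.

    #include <stdint.h>

    #ifdef LIT_MEM
    typedef struct {
        uint16_t *dist_buf;  /* one 16-bit slot per symbol */
        uint8_t  *lit_buf;   /* one 8-bit slot per symbol  */
        unsigned  sym_next;  /* symbol index               */
    } sym_buffer;

    static void tally(sym_buffer *b, unsigned dist, unsigned lc) {
        b->dist_buf[b->sym_next] = (uint16_t)dist;
        b->lit_buf[b->sym_next]  = (uint8_t)lc;
        b->sym_next++;
    }
    #else
    typedef struct {
        uint8_t  *sym_buf;   /* 3 packed bytes per symbol */
        unsigned  sym_next;  /* byte index into sym_buf   */
    } sym_buffer;

    static void tally(sym_buffer *b, unsigned dist, unsigned lc) {
        b->sym_buf[b->sym_next++] = (uint8_t)dist;
        b->sym_buf[b->sym_next++] = (uint8_t)(dist >> 8);
        b->sym_buf[b->sym_next++] = (uint8_t)lc;
    }
    #endif
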
Hans Kristian Rosbach
06895bc1b3 Move crc32 C fallbacks to arch/generic 2024-01-19 15:22:34 +01:00
Hans Kristian Rosbach
4e132cc0ec Move adler32 C fallbacks to arch/generic 2024-01-19 15:22:34 +01:00
Nathan Moinvaziri
e9a48a2ecb Simplify deflate stream/state check. 2023-08-06 10:20:43 +02:00
Nathan Moinvaziri
fa9bfeddcf Use named defines instead of hard coded numbers. 2023-02-18 20:30:55 +01:00
Nathan Moinvaziri
b047c7247f Prefix shared functions to prevent symbol conflict when linking native api against compat api. 2023-01-09 15:10:11 +01:00
Nathan Moinvaziri
e22195e5bc Don't use unaligned access for memcpy instructions due to GCC 11 assuming it is aligned in certain instances. 2022-08-17 14:41:18 +02:00
Tobias Stoeckmann
3f7b0b411d Extend GZIP conditional
If gzip support has been disabled during compilation, then also consider
gzip-related states as invalid in deflateStateCheck.

Also the gzip state definitions can be removed.

This change leads to a failure in test/example, and I am not sure
what the GZIP conditional is trying to achieve. All gzip-related
functions are still defined in zlib.h.

Alternative approach is to remove the GZIP define.
2022-06-16 14:08:44 +02:00
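
A simplified sketch of the kind of check described above (state values are placeholders; the real deflateStateCheck validates more states and fields):

    /* Placeholder state values for illustration only. */
    enum {
        INIT_STATE, GZIP_STATE, EXTRA_STATE, NAME_STATE,
        COMMENT_STATE, HCRC_STATE, BUSY_STATE, FINISH_STATE
    };

    static int deflate_state_is_valid(int status) {
        if (status == INIT_STATE || status == BUSY_STATE || status == FINISH_STATE)
            return 1;
    #ifdef GZIP
        /* gzip-specific states are only accepted when gzip support is compiled in */
        if (status == GZIP_STATE || status == EXTRA_STATE || status == NAME_STATE ||
            status == COMMENT_STATE || status == HCRC_STATE)
            return 1;
    #endif
        return 0;
    }
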
Adam Stylinski
d79984b5bc Adding avx512_vnni inline + copy elision
An interesting revelation while benchmarking all of this is that our
chunkmemset_avx seems to be slower in a lot of use cases than
chunkmemset_sse.  That will be an interesting function to attempt to
optimize.

Right now though, we're basically beating Google for all PNG decode and
encode benchmarks.  There are some variations of flags that can
basically have us trading blows, but we're about as much as 14% faster
than Chromium's zlib patches.

While we're here, add a more direct benchmark of the folded copy method
versus the explicit copy + checksum.
2022-05-23 16:13:39 +02:00
Adam Stylinski
b8269bb7d4 Added inlined AVX512 adler checksum + copy
While we're here, also simplify the "fold" signature, as reducing the
number of rebases and horizontal sums did not prove to be meaningfully
faster (slower in many circumstances).
2022-05-23 16:13:39 +02:00
Adam Stylinski
21f461e238 Adding an SSE42 optimized copy + adler checksum implementation
We are protecting its usage behind a number of preprocessor macros, as the
other methods are not yet implemented and calling this version would
implicitly bypass the faster adler implementations.

When more versions are written for faster vectorizations, the functable
entries will be populated and preprocessor macros removed. This round,
the copy + checksum is not employing as many tricks as one would hope
with a "folded" checksum routine.  The reason for this is the
particularly tricky case of dealing with unaligned buffers.  Implementations
targeting CPUs that don't carry a huge penalty for unaligned loads will
end up being much faster.

Fancier methods that minimized rebasing, while having the potential to
be faster, ended up being slower because the compiler structured the
code in a way that ended up either spilling to the stack or trampolining
out of a loop and back in it instead of just jumping over the first load
and store.

Revisiting this for AVX512, where more registers are abundant and more
advanced loads exist, may be prudent.
2022-05-23 16:13:39 +02:00
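
A plain scalar sketch of the "folded" copy + checksum idea the commit vectorizes with SSE4.2: the data is checksummed while it is being copied, so the window copy and the adler32 update share a single pass over the bytes.

    #include <stddef.h>
    #include <stdint.h>

    #define ADLER_BASE 65521u  /* largest prime smaller than 65536 */

    static uint32_t adler32_copy(uint32_t adler, uint8_t *dst, const uint8_t *src, size_t len) {
        uint32_t a = adler & 0xffff;
        uint32_t b = (adler >> 16) & 0xffff;
        for (size_t i = 0; i < len; i++) {
            dst[i] = src[i];   /* copy the byte ...                 */
            a += src[i];       /* ... and fold it into the checksum */
            b += a;
            if ((i & 0x0fff) == 0x0fff) {  /* reduce before b can overflow 32 bits */
                a %= ADLER_BASE;
                b %= ADLER_BASE;
            }
        }
        a %= ADLER_BASE;
        b %= ADLER_BASE;
        return (b << 16) | a;
    }
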
Ilya Leoshkevich
9be98893aa Use PREFIX() for some of the Z_INTERNAL symbols
https://github.com/powturbo/TurboBench links zlib and zlib-ng into the
same binary, causing non-static symbol conflicts. Fix by using PREFIX()
for flush_pending(), bi_reverse(), inflate_ensure_window() and all of
the IBM Z symbols.

Note: do not use an explicit zng_, since one of the long-term goals is
to be able to link two versions of zlib-ng into the same binary for
benchmarking [1].

[1] https://github.com/zlib-ng/zlib-ng/pull/1248#issuecomment-1096648932
2022-04-27 10:37:43 +02:00
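
Roughly how a PREFIX()-style macro renames internal symbols (the real definitions live in zbuild.h and depend on ZLIB_COMPAT; this is a sketch, not the exact macro set):

    #ifdef ZLIB_COMPAT
    #  define PREFIX(x) x
    #else
    #  define PREFIX(x) zng_ ## x
    #endif

    /* A function declared as */
    void PREFIX(flush_pending)(void *strm);   /* parameter type simplified for illustration */
    /* becomes either flush_pending or zng_flush_pending, so a compat build and a
     * native build no longer export clashing non-static symbols. */
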
Nathan Moinvaziri
363a95fb9b Introduce zmemcpy to use unaligned access for architectures we know support unaligned access, otherwise use memcpy. 2022-02-10 16:10:48 +01:00
Nathan Moinvaziri
91bc814e39 Clean up crc32_fold structure and clearly define the size of the fold buffer. 2022-01-17 09:11:53 +01:00
Nathan Moinvaziri
5bc87f1581 Use memcpy for unaligned reads.
Co-authored-by: Matija Skala <mskala@gmx.com>
2022-01-08 14:33:19 +01:00
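
A minimal sketch of the memcpy-based unaligned read: the fixed-size memcpy stays well-defined C on every target, and compilers lower it to a single load where unaligned access is cheap.

    #include <stdint.h>
    #include <string.h>

    static inline uint32_t load_u32_unaligned(const void *p) {
        uint32_t v;
        memcpy(&v, p, sizeof(v));  /* typically compiles to one (possibly unaligned) load */
        return v;
    }
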
Nathan Moinvaziri
d802e8900f Move crc32 folding functions into functable. 2021-08-13 15:05:34 +02:00
Nathan Moinvaziri
d06be1bcd1 Don't define HASH_SIZE if it is already defined. 2021-06-26 08:23:26 +02:00
Nathan Moinvaziri
1c766dbf67 Setup hash functions to be switched based on compression level. 2021-06-25 20:09:14 +02:00
Nathan Moinvaziri
6948789969 Added rolling hash functions for hash table. 2021-06-25 20:09:14 +02:00
Nathan Moinvaziri
857e4f1e04 Added Z_UNUSED define for ignoring unused variables. 2021-06-18 09:16:44 +02:00
Hans Kristian Rosbach
cf9127a231 Separate MIN_MATCH into STD_MIN_MATCH and WANT_MIN_MATCH
Rename MAX_MATCH to STD_MAX_MATCH
2021-06-13 20:55:01 +02:00
Nathan Moinvaziri
1118fa8ecc Change bi_reverse to use a bit-twiddling hack for 240x speed improvement. 2021-06-13 11:38:09 +02:00
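
In the spirit of the change above, an illustrative bit-twiddling reversal of the low len bits of a code (not the exact zlib-ng implementation): reverse a full 16-bit value with masked swaps, then shift the result down, instead of looping once per bit.

    #include <stdint.h>

    static inline uint16_t reverse_bits(uint16_t code, int len) {  /* 1 <= len <= 16 */
        code = (uint16_t)(((code & 0x5555) << 1) | ((code & 0xAAAA) >> 1)); /* swap adjacent bits */
        code = (uint16_t)(((code & 0x3333) << 2) | ((code & 0xCCCC) >> 2)); /* swap bit pairs     */
        code = (uint16_t)(((code & 0x0F0F) << 4) | ((code & 0xF0F0) >> 4)); /* swap nibbles       */
        code = (uint16_t)((code << 8) | (code >> 8));                       /* swap the two bytes */
        return (uint16_t)(code >> (16 - len));
    }
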
Hans Kristian Rosbach
d81f8cf04a Add extra space in deflate internal_state struct for future expansion.
Also make the internal_state struct have a fixed size regardless of which features have been activated.
The internal_state struct is now always 6040 bytes on Linux/x86-64 and 5952 bytes on Linux/x86-32.
2021-01-27 14:07:22 +01:00
Nathan Moinvaziri
1c58894c67 Remove NIL preprocessor macro which isn't consistently enforced. 2020-09-23 17:00:11 +02:00
Hans Kristian Rosbach
de160d6585 Minor comments/whitespace cleanup 2020-08-31 13:22:54 +02:00
Hans Kristian Rosbach
e9a8fa9af3 Reorder s->block_open and s->reproducible. 2020-08-31 13:22:54 +02:00
Hans Kristian Rosbach
b55a8cad38 Remove s->method since it is always set to the same value and never read. 2020-08-31 13:22:54 +02:00
Hans Kristian Rosbach
a7cd452ca9 Move and reduce size of s->pending_buf_size 2020-08-31 13:22:54 +02:00
Nathan Moinvaziri
7cffba4dd6 Rename ZLIB_INTERNAL to Z_INTERNAL for consistency. 2020-08-31 12:33:16 +02:00
Hans Kristian Rosbach
6264b5a58d Fix more conversion warnings related to s->bi_valid, stored_len and misc. 2020-08-27 19:20:38 +02:00
Hans Kristian Rosbach
0a7acaab6e Changes to deflate's internal_state struct members:
- Change window_size from unsigned long to unsigned int
- Change block_start from long to int
- Change high_water from unsigned long to unsigned int
- Reorder to promote cache locality in hot code and decrease holes.

On x86_64 this means the struct goes from:
        /* size: 6008, cachelines: 94, members: 57 */
        /* sum members: 5984, holes: 6, sum holes: 24 */
        /* last cacheline: 56 bytes */

To:
        /* size: 5984, cachelines: 94, members: 57 */
        /* sum members: 5972, holes: 3, sum holes: 8 */
        /* padding: 4 */
        /* last cacheline: 32 bytes */
2020-08-27 19:20:38 +02:00
Hans Kristian Rosbach
cbc3962b93 Increase hash table size from 15 to 16 bits.
This gives a good performance increase, and usually also improves compression.
Make separate define HASH_SLIDE for fallback version of UPDATE_HASH.
2020-08-23 09:57:45 +02:00
Hans Kristian Rosbach
e7bb6db09a Replace hash_bits, hash_size and hash_mask with defines. 2020-08-23 09:57:45 +02:00
Nathan Moinvaziri
b0a3461245 Use unaligned 32-bit and 64-bit compare based on best match length when searching for matches.
Move TRIGGER_LEVEL to match_tpl.h since it is only used in longest match.
Use early return inside match loops instead of a cont variable.
Added back the two-variable check for platforms that don't support unaligned access.
2020-08-23 09:56:11 +02:00
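
A hedged sketch of the wide-compare idea (assumes a little-endian target and GCC/Clang builtins; the real match_tpl.h code is considerably more involved): load 64 bits at a time and use XOR plus count-trailing-zeros to find the first differing byte instead of comparing byte by byte.

    #include <stdint.h>
    #include <string.h>

    static inline uint64_t load_u64(const unsigned char *p) {
        uint64_t v;
        memcpy(&v, p, sizeof(v));
        return v;
    }

    /* Length of the common prefix of a and b, up to max_len bytes. */
    static unsigned match_length(const unsigned char *a, const unsigned char *b, unsigned max_len) {
        unsigned len = 0;
        while (len + 8 <= max_len) {
            uint64_t diff = load_u64(a + len) ^ load_u64(b + len);
            if (diff != 0)
                return len + (unsigned)(__builtin_ctzll(diff) >> 3); /* first mismatching byte (little-endian) */
            len += 8;
        }
        while (len < max_len && a[len] == b[len])
            len++;
        return len;
    }
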
NiLuJe
9fccbde10c Prevent unaligned double word access on ARMv7 in put_uint64
By implementing a (UNALIGNED_OK && !UNALIGNED64_OK) codepath.
2020-08-20 12:04:56 +02:00
Nathan Moinvaziri
a0fa24f92f Remove IPos typedef which also helps to reduce casting warnings. 2020-05-30 21:29:44 +02:00
Nathan Moinvaziri
07207681ed Simplify generic hash function using Knuth's multiplicative hash. 2020-05-24 14:32:26 +02:00
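
A sketch of a Knuth-style multiplicative hash for the match hash table (the constant is a prime close to 2^32 divided by the golden ratio; HASH_BITS here is illustrative):

    #include <stdint.h>

    #define HASH_BITS 16u

    static inline uint32_t hash_value(uint32_t val) {
        /* multiply, then keep the top HASH_BITS bits */
        return (val * 2654435761u) >> (32 - HASH_BITS);
    }
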
Nathan Moinvaziri
600dcc3012 Use 64-bit bit buffer when emitting codes. 2020-05-24 14:06:57 +02:00
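
A minimal sketch of code emission with a 64-bit bit buffer (names and the flush granularity are illustrative): bits are accumulated in one 64-bit register and written out in larger chunks, so far fewer overflow checks are needed per emitted code than with a 16-bit buffer.

    #include <stdint.h>

    typedef struct {
        uint64_t bit_buf;   /* pending bits, LSB first         */
        int      bit_count; /* number of valid bits in bit_buf */
        uint8_t *out;       /* next output byte                */
    } bit_writer;

    static void send_bits(bit_writer *w, uint32_t value, int length) { /* length <= 16 */
        w->bit_buf |= (uint64_t)value << w->bit_count;
        w->bit_count += length;
        if (w->bit_count >= 48) {          /* flush 6 bytes at a time */
            for (int i = 0; i < 6; i++)
                *w->out++ = (uint8_t)(w->bit_buf >> (8 * i));
            w->bit_buf >>= 48;
            w->bit_count -= 48;
        }
    }
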
Hans Kristian Rosbach
6884f3715c Remove several NOT_TWEAK_COMPILER checks and their legacy code. 2020-05-06 10:00:11 +02:00
Nathan Moinvaziri
e3c858c2c7 Split tree emitting code into its own source header to be included by both trees.c and deflate_quick.c so that their functions can be statically linked for performance reasons. 2020-05-06 09:39:52 +02:00