51 Commits

Author SHA1 Message Date
Mika Lindqvist
43b2703435 Fix shift overflow in inflate and send_code. 2025-02-08 21:43:51 +01:00
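The commit title above does not spell out the faulty expression, so what follows is only a generic illustration of this class of bug, not the actual patch. In C, a shift count equal to or larger than the width of the (promoted) operand is undefined behavior, as is a left shift that overflows a signed type; widening the operand before shifting avoids both. The helper name `append_bits` and its parameters are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: append the low `len` bits of `code` to a 64-bit
 * bit buffer `hold` that already contains `bits` valid bits. */
static uint64_t append_bits(uint64_t hold, unsigned bits, uint32_t code, unsigned len) {
    assert(len <= 32 && bits + len <= 64);
    if (len == 0)
        return hold;                     /* also avoids a shift by 64 when bits == 64 */
    uint64_t mask = ((uint64_t)1 << len) - 1;
    /* Widen to 64 bits BEFORE shifting; `code << bits` on a 32-bit type
     * would be undefined for bits >= 32. */
    return hold | (((uint64_t)code & mask) << bits);
}
```

With the cast in place, a call such as `append_bits(0, 40, 3, 2)` is well defined even though the shift count exceeds 31.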
Hans Kristian Rosbach
18af70057a Reorder 'inflate_state' struct to improve cache-locality of variables
needed by inffast (from 6 cachelines to 1).
Also fill in some unnecessary holes.
2024-10-08 15:51:12 +02:00
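As a rough sketch of the idea behind commit 18af70057a, the struct below groups the fields that a fast decode loop would touch at the start of the struct so that they share a single 64-byte cache line. The struct name, the field list, and the 64-byte line size are assumptions made for illustration; they are not copied from the real `inflate_state`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_CACHE_LINE 64   /* assumed cache line size */

struct demo_inflate_state {
    /* Hot group: read on every iteration of the fast decode loop. */
    uint8_t    *window;
    uint64_t    hold;
    const void *lencode;
    const void *distcode;
    uint32_t    wsize;
    uint32_t    whave;
    uint32_t    wnext;
    uint32_t    bits;
    uint32_t    lenbits;
    uint32_t    distbits;
    /* Cold group: everything the fast loop does not need. */
    uint32_t    mode;
    uint32_t    flags;
    uint8_t     scratch[256];
};

/* Compile-time check that the hot group ends within the first cache line. */
static_assert(offsetof(struct demo_inflate_state, mode) <= DEMO_CACHE_LINE,
              "hot fields must fit in one cache line");

/* Run-time form of the same check, convenient for tests. */
static int demo_hot_group_fits_one_line(void) {
    return offsetof(struct demo_inflate_state, mode) <= DEMO_CACHE_LINE;
}
```

The check only guarantees a single line if the struct itself is allocated on a cache-line boundary; ordering same-sized members together also avoids the padding holes the commit message mentions.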
Hans Kristian Rosbach
a5c20ed67e Add variable 'wbufsize' to track window buffer including padding, to allow
the chunkset code to spill garbage data into the padding area if available.
2024-10-08 15:51:12 +02:00
Hans Kristian Rosbach
39e9c86ec0 Don't use 'dmax' and 'sane' variables unless their checks have been compiled in. 2024-10-08 15:51:12 +02:00
Hans Kristian Rosbach
e024068dac Revert "Split chunkcopy_safe to allow the first part to be inlined more often."
This reverts commit 6b8efe7868.

New and improved chunkcopy_safe is coming soon.
2024-09-17 14:12:24 +02:00
Hans Kristian Rosbach
6b8efe7868 Split chunkcopy_safe to allow the first part to be inlined more often. 2024-09-13 12:48:43 +02:00
Mika Lindqvist
5b04d9ce04 Enable warning C4242 and treat warnings as errors for Visual C++. 2024-08-22 16:47:59 +02:00
Hans Kristian Rosbach
037c6f84b5 Simplify inflate window management now that there is no need to
worry about failed allocs other than during init.
2024-05-30 13:59:40 +02:00
Hans Kristian Rosbach
63e1d460aa Rewrite inflate memory allocation.
Inflate used to allocate state during init, but window would be allocated
when/if needed and could be resized and that required a new free/alloc round.

- Now, we allocate state and a 32K window during init, allowing the latency cost
  of allocs to be done during init instead of at one or more times later.
- Total memory allocation is about the same when requesting a 32K window, but
  if no window or a smaller window was requested, then it is an increase.
- While doing alloc(), we now store pointer to corresponding free(), avoiding crashes
  with applications that incorrectly set alloc/free pointers after running init function.
- After init has succeeded, inflate will no longer possibly fail due to a failing malloc.

Co-authored-by: Ilya Leoshkevich <iii@linux.ibm.com>
2024-05-28 16:35:13 +02:00
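The allocation scheme described in commit 63e1d460aa might look roughly like the sketch below: state and a 32K window are obtained once during init, and the matching free function is remembered at that moment so later changes to the caller's pointers cannot cause a mismatched free. All names here (`demo_inflate`, `demo_init`, `demo_end`) are hypothetical, and placing state and window in one block is an assumption of this sketch, not something the commit message states.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define DEMO_WSIZE 32768u   /* 32K window, as in the commit message */

typedef void *(*alloc_fn)(size_t);
typedef void  (*free_fn)(void *);

struct demo_inflate {
    free_fn        free_at_init;  /* deallocator captured when we allocated */
    unsigned char *window;        /* points just past the state struct */
    size_t         wsize;
};

/* Allocate state and window up front; after this succeeds, decoding
 * itself never needs to allocate again. */
static struct demo_inflate *demo_init(alloc_fn a, free_fn f) {
    assert(a != NULL && f != NULL);
    unsigned char *buf = a(sizeof(struct demo_inflate) + DEMO_WSIZE);
    if (buf == NULL)
        return NULL;              /* the only point where allocation can fail */
    struct demo_inflate *s = (struct demo_inflate *)buf;
    s->free_at_init = f;
    s->window = buf + sizeof *s;
    s->wsize = DEMO_WSIZE;
    return s;
}

/* Release using the pointer stored at init, not whatever the caller
 * may have assigned afterwards. */
static void demo_end(struct demo_inflate *s) {
    if (s != NULL)
        s->free_at_init(s);
}
```

A caller would pair `demo_init(malloc, free)` with a later `demo_end(s)`.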
Ilya Leoshkevich
05ef29eda5 IBM zSystems DFLTCC: Inline DFLTCC states into zlib states
Currently DFLTCC states are allocated using hook macros, complicating
memory management. Inline them into zlib states and remove the hooks.
2024-05-15 11:28:10 +02:00
Hans Kristian Rosbach
06895bc1b3 Move crc32 C fallbacks to arch/generic 2024-01-19 15:22:34 +01:00
Hans Kristian Rosbach
4e132cc0ec Move adler32 C fallbacks to arch/generic 2024-01-19 15:22:34 +01:00
Nathan Moinvaziri
b047c7247f Prefix shared functions to prevent symbol conflict when linking native api against compat api. 2023-01-09 15:10:11 +01:00
Nathan Moinvaziri
d43822b9a7 zlib 1.2.12 2022-06-13 15:58:03 +02:00
Adam Stylinski
d79984b5bc Adding avx512_vnni inline + copy elision
Interesting revelation while benchmarking all of this is that our
chunkmemset_avx seems to be slower in a lot of use cases than
chunkmemset_sse.  That will be an interesting function to attempt to
optimize.

Right now though, we're basically beating Google in all PNG decode and
encode benchmarks.  There are some variations of flags that can
basically have us trading blows, but we're as much as 14% faster
than Chromium's zlib patches.

While we're here, add a more direct benchmark of the folded copy method
versus the explicit copy + checksum.
2022-05-23 16:13:39 +02:00
Adam Stylinski
b8269bb7d4 Added inlined AVX512 adler checksum + copy
While we're here, also simplify the "fold" signature, as reducing the
number of rebases and horizontal sums did not prove to be meaningfully
faster (slower in many circumstances).
2022-05-23 16:13:39 +02:00
Adam Stylinski
b1389ac2d5 Create adler32_fold_c* functions
These are very simple wrappers that do nothing clever but serve as a
shim interface for implementing versions which do cleverly track the
number of scalar sums performed so that we can minimize rebasing and
also have an efficient copy elision.

This serves as the baseline as each vectorization gets its own commit.
That way the PR will be bisectable.
2022-05-23 16:13:39 +02:00
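A baseline wrapper of the kind commit b1389ac2d5 describes could be as simple as the following: compute a scalar Adler-32 over the source, then copy it, with nothing clever in between. The `demo_` names and the exact signature are assumptions for illustration and are not taken from the zlib-ng sources.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DEMO_ADLER_BASE 65521u   /* largest prime below 65536 */

/* Plain scalar Adler-32; reduces modulo BASE on every byte for clarity,
 * not for speed. */
static uint32_t demo_adler32(uint32_t adler, const uint8_t *buf, size_t len) {
    uint32_t a = adler & 0xffffu;
    uint32_t b = (adler >> 16) & 0xffffu;
    for (size_t i = 0; i < len; i++) {
        a = (a + buf[i]) % DEMO_ADLER_BASE;
        b = (b + a) % DEMO_ADLER_BASE;
    }
    return (b << 16) | a;
}

/* Shim: checksum, then copy. Vectorized versions would fuse both steps. */
static uint32_t demo_adler32_fold_copy(uint32_t adler, uint8_t *dst,
                                       const uint8_t *src, size_t len) {
    assert(len == 0 || (dst != NULL && src != NULL));
    adler = demo_adler32(adler, src, len);
    memcpy(dst, src, len);
    return adler;
}
```

Starting from the conventional initial value 1, passing the nine bytes of "Wikipedia" yields the well-known checksum 0x11E60398 and leaves a copy in `dst`.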
Ilya Leoshkevich
9be98893aa Use PREFIX() for some of the Z_INTERNAL symbols
https://github.com/powturbo/TurboBench links zlib and zlib-ng into the
same binary, causing non-static symbol conflicts. Fix by using PREFIX()
for flush_pending(), bi_reverse(), inflate_ensure_window() and all of
the IBM Z symbols.

Note: do not use an explicit zng_, since one of the long-term goals is
to be able to link two versions of zlib-ng into the same binary for
benchmarking [1].

[1] https://github.com/zlib-ng/zlib-ng/pull/1248#issuecomment-1096648932
2022-04-27 10:37:43 +02:00
Adam Stylinski
8550a90de4 Leverage inline CRC + copy
This brings back a bit of the performance that may have been sacrificed
by reverting the reorganized inflate window. Doing a copy at the same
time as a CRC is basically free.
2022-03-31 16:11:15 +02:00
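The claim in commit 8550a90de4 that copying alongside a CRC is nearly free can be pictured with a fused loop in which each byte is loaded once, stored to the destination, and folded into the checksum. The sketch below uses a simple bitwise CRC-32 rather than the folding implementation the commit refers to; the function name `demo_crc32_copy` is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-32 (reflected polynomial 0xEDB88320) fused with a copy:
 * one pass over the input serves both purposes. */
static uint32_t demo_crc32_copy(uint32_t crc, uint8_t *dst,
                                const uint8_t *src, size_t len) {
    assert(len == 0 || (dst != NULL && src != NULL));
    crc = ~crc;
    for (size_t i = 0; i < len; i++) {
        uint8_t byte = src[i];
        dst[i] = byte;                       /* the copy */
        crc ^= byte;                         /* the checksum update */
        for (int k = 0; k < 8; k++)
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return ~crc;
}
```

Called with an initial value of 0 on the ASCII string "123456789", this returns the standard CRC-32 check value 0xCBF43926.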
Nathan Moinvaziri
6c4beb611d Revert "Reorganize inflate window layout"
This reverts commit dc3b60841d.
2022-03-23 11:30:35 +01:00
Nathan Moinvaziri
91bc814e39 Clean up crc32_fold structure and clearly define the size of the fold buffer. 2022-01-17 09:11:53 +01:00
Jim Kukunas
dc3b60841d Reorganize inflate window layout
This commit significantly improves inflate performance by reorganizing the window buffer into a contiguous window and pending output buffer. The goal of this layout is to reduce branching, improve cache locality, and enable the use of crc folding with gzip input.

The window buffer is allocated as a multiple of the user-selected window size. In this commit, a factor of 2 is utilized.

The layout of the window buffer is divided into two sections. The first section, window offset [0, wsize), is reserved for history that has already been output. The second section, window offset [wsize, 2 * wsize), is reserved for buffering pending output that hasn't been flushed to the user's output buffer yet.

The history section grows downwards, towards the window offset of 0. The pending output section grows upwards, towards the end of the buffer. As a result, all of the possible distance/length data that may need to be copied is contiguous. This removes the need to stitch together output from 2 separate buffers.

In the case of gzip input, crc folding is used to copy the pending output to the user's buffers.

Co-authored-by: Nathan Moinvaziri <nathan@nathanm.com>
2021-12-02 09:26:32 +01:00
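The two-section layout described in commit dc3b60841d can be sketched as follows, assuming a buffer of `2 * wsize` bytes in which history ends at offset `wsize` and pending output starts there. Because history and pending output are adjacent, a match copy never has to stitch two buffers together. The struct and function names are hypothetical, and the sketch does not check how much history is actually valid.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct demo_window {
    uint8_t *buf;      /* 2 * wsize bytes */
    size_t   wsize;    /* history lives in [0, wsize) */
    size_t   pending;  /* bytes buffered in [wsize, wsize + pending) */
};

/* Copy a (dist, len) match into the pending section. The source is always
 * contiguous with the destination because history ends where pending begins. */
static int demo_copy_match(struct demo_window *w, size_t dist, size_t len) {
    assert(w != NULL && w->buf != NULL);
    size_t out = w->wsize + w->pending;
    if (dist == 0 || dist > w->wsize || len > w->wsize - w->pending)
        return -1;
    for (size_t i = 0; i < len; i++)       /* bytewise, so overlaps repeat data */
        w->buf[out + i] = w->buf[out + i - dist];
    w->pending += len;
    return 0;
}

/* Hand pending bytes to the caller, then slide the buffer down so the most
 * recent wsize bytes again end at offset wsize. */
static size_t demo_flush(struct demo_window *w, uint8_t *user_out) {
    size_t n = w->pending;
    memcpy(user_out, w->buf + w->wsize, n);
    memmove(w->buf, w->buf + n, w->wsize);
    w->pending = 0;
    return n;
}
```

In the commit's design the flush step for gzip input would also fold the CRC while copying; this sketch leaves that out.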
Hans Kristian Rosbach
234b282cb9 Small formatting changes in inflate.c, inflate.h and inffast.c 2020-10-24 15:51:46 +02:00
Nathan Moinvaziri
7cffba4dd6 Rename ZLIB_INTERNAL to Z_INTERNAL for consistency. 2020-08-31 12:33:16 +02:00
Nathan Moinvaziri
10be7c55f6 Only calculate inflate chunk size once and store it for future use for performance. 2020-06-28 11:16:05 +02:00
Mark Adler
33ce336b82 Don't bother computing check value after successful inflateSync().
inflateSync() is used to skip invalid deflate data, which means
that the check value that was being computed is no longer useful.
This commit turns off the check value computation, and furthermore
allows a successful return if the compressed data terminated in a
graceful manner. This commit also fixes a bug in the case that
inflateSync() is used before a header is ever processed. In that
case, there is no knowledge of a trailer, so the remainder is
treated as raw.
2019-10-22 09:57:38 +02:00
Hans Kristian Rosbach
4bc6ffa41a Deduplicate inflate's fixedtables(), and no longer inline the inffixed tables.
This also reduces the library size by 4120 bytes or ~2.9%.
2019-08-06 09:39:26 +02:00
Ilya Leoshkevich
b7f659f2fa Introduce inflate_ensure_window, make bi_reverse and flush_pending ZLIB_INTERNAL 2019-05-23 12:44:59 +02:00
Mika Lindqvist
aff0fc6e3c Adapt code to support PREFIX macros and update build scripts 2018-01-31 10:45:29 +01:00
Mark Adler
2a51c84f6c zlib 1.2.9 2017-02-09 11:39:40 +01:00
Mark Adler
6d55bd6a78 Do a more thorough check of the state for every stream call.
This verifies that the state has been initialized, that it is the
expected type of state, deflate or inflate, and that at least the
first several bytes of the internal state have not been clobbered.
2017-02-01 12:22:21 +01:00
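A check along the lines of commit 6d55bd6a78 might be sketched like this: the stream must exist, its allocator pointers must be set, the state must point back at the stream that owns it, and the mode must lie in the range valid for this kind of state. The types, the function name, and the numeric mode range are all hypothetical stand-ins chosen for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Arbitrary nonzero range, so a zeroed or deflate-style state fails the test. */
enum { DEMO_MODE_FIRST = 100, DEMO_MODE_LAST = 130 };

struct demo_stream;

struct demo_state {
    struct demo_stream *strm;   /* back-pointer to the owning stream */
    int                 mode;   /* current decoder mode */
};

struct demo_stream {
    void *(*zalloc)(size_t);
    void  (*zfree)(void *);
    struct demo_state *state;
};

/* Returns 0 if the stream looks like a live inflate stream, 1 otherwise. */
static int demo_state_check(const struct demo_stream *strm) {
    if (strm == NULL || strm->zalloc == NULL || strm->zfree == NULL)
        return 1;
    const struct demo_state *s = strm->state;
    if (s == NULL || s->strm != strm ||
        s->mode < DEMO_MODE_FIRST || s->mode > DEMO_MODE_LAST)
        return 1;               /* uninitialized, wrong type, or clobbered */
    return 0;
}
```

Every public entry point would call such a check first and return an error code instead of dereferencing a corrupted state.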
Mark Adler
9efc516587 Add option to not compute or check check values.
The undocumented (except in these commit comments) function
inflateValidate(strm, check) can be called after an inflateInit(),
inflateInit2(), or inflateReset2() with check equal to zero to
turn off the check value (CRC-32 or Adler-32) computation and
comparison. Calling with check not equal to zero turns checking
back on. This should only be called immediately after the init or
reset function. inflateReset() does not change the state, so a
previous inflateValidate() setting will remain in effect.

This also turns off validation of the gzip header CRC when
present.

This should only be used when a zlib or gzip stream has already
been checked, and repeated decompressions of the same stream no
longer need to be validated.
2017-01-31 11:01:32 +01:00
Mark Adler
2dd8ad328a Correct the size of the inflate state in the comments. 2017-01-31 10:55:05 +01:00
Mika Lindqvist
9c3a280877 Type cleanup. 2015-12-14 11:00:22 +02:00
Hans Kristian Rosbach
8145354217 Revert "Replace 'unsigned long' with most suitable fixed-size type."
This commit was cherry-picked before it was finished, resulting in a few
problems with gcc on 64-bit Windows.

This reverts commit edd7a72e05.

Conflicts:
	arch/x86/crc_folding.c
	arch/x86/fill_window_sse.c
	deflate.c
	deflate.h
	match.c
	trees.c
2015-06-05 18:35:25 +02:00
Hans Kristian Rosbach
2c22452ff2 Style cleanup for inflate code 2015-05-25 23:00:54 +02:00
Mika Lindqvist
edd7a72e05 Replace 'unsigned long' with most suitable fixed-size type. 2015-05-23 22:06:25 +02:00
Hans Kristian Rosbach
62c6d5ec70 Replace unsigned short with uint16_t
Conflicts:
	inflate.h
	inftrees.c
	inftrees.h
	match.c
2015-05-22 22:15:29 +03:00
hansr
0db1040667 Remove FAR definition
Remove a few leftovers from the legacy OS support removal
2014-10-09 13:55:20 +02:00
Mark Adler
d004b04783 zlib 1.2.3.5 2011-09-09 23:26:49 -07:00
Mark Adler
f6194ef39a zlib 1.2.3.4 2011-09-09 23:26:40 -07:00
Mark Adler
639be99788 zlib 1.2.3.3 2011-09-09 23:26:29 -07:00
Mark Adler
b1c19ca6d8 zlib 1.2.3.1 2011-09-09 23:25:27 -07:00
Mark Adler
0484693e17 zlib 1.2.2.2 2011-09-09 23:24:33 -07:00
Mark Adler
9811b53dd9 zlib 1.2.2.1 2011-09-09 23:24:24 -07:00
Mark Adler
4b5a43a219 zlib 1.2.0.5 2011-09-09 23:22:37 -07:00
Mark Adler
086e982175 zlib 1.2.0.4 2011-09-09 23:22:30 -07:00
Mark Adler
8e34b3a802 zlib 1.2.0.2 2011-09-09 23:22:10 -07:00
Mark Adler
7c2a874e50 zlib 1.2.0 2011-09-09 23:21:47 -07:00
Mark Adler
913afb9174 zlib 0.79 2011-09-09 22:52:17 -07:00