Commit Graph

32 Commits

Author SHA1 Message Date
Cameron Cawley
d7e121e56b Use GCC's may_alias attribute for unaligned memory access 2024-12-24 12:55:44 +01:00
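A minimal sketch of the idea behind this commit, assuming GCC or Clang. The type and function names (`unaligned_u32`, `load_u32`, `load_u32_memcpy`) are hypothetical and are not taken from the repository; the packed-struct form is one common way to apply `may_alias`, not necessarily the one the commit uses.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical wrapper type (GCC/Clang only): __packed__ lowers the
 * alignment requirement to 1, __may_alias__ exempts accesses through
 * this type from strict-aliasing assumptions. */
typedef struct {
    uint32_t val;
} __attribute__((__packed__, __may_alias__)) unaligned_u32;

/* Read 4 bytes from an arbitrarily aligned address. */
static inline uint32_t load_u32(const void *p) {
    assert(p != NULL);
    return ((const unaligned_u32 *)p)->val;
}

/* Portable reference version: memcpy into a properly aligned local. */
static inline uint32_t load_u32_memcpy(const void *p) {
    uint32_t v;
    assert(p != NULL);
    memcpy(&v, p, sizeof(v));
    return v;
}
```

Both helpers should return the same value for any readable address; the attribute-based version just gives the compiler the same information without going through a `memcpy` call.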
Adam Stylinski
94aacd8bd6 Try to simplify the inflate loop by collapsing most cases to chunksets 2024-10-23 21:20:11 +02:00
Adam Stylinski
b80eb4c6ec Simplify chunking in the copy ladder here
As it turns out, trying to peel off the remainder with so many branches
inflated the code size so much that this function wouldn't inline
without some fairly aggressive optimization flags. Only catching
vector-sized chunks here makes the loop body small enough, and having
the byte-by-byte copy idiom at the bottom gives the compiler some
flexibility to optimize the tail as it sees fit.
2024-09-26 08:22:03 +02:00
Hans Kristian Rosbach
e024068dac Revert "Split chunkcopy_safe to allow the first part to be inlined more often."
This reverts commit 6b8efe7868.

New and improved chunkcopy_safe is coming soon.
2024-09-17 14:12:24 +02:00
Hans Kristian Rosbach
6b8efe7868 Split chunkcopy_safe to allow the first part to be inlined more often. 2024-09-13 12:48:43 +02:00
Hans Kristian Rosbach
63e1d460aa Rewrite inflate memory allocation.
Inflate used to allocate state during init, but window would be allocated
when/if needed and could be resized and that required a new free/alloc round.

- Now, we allocate state and a 32K window during init, allowing the latency cost
  of allocs to be done during init instead of at one or more times later.
- Total memory allocation is about the same when requesting a 32K window, but
  if no window or a smaller window was requested, then it is an increase.
- While doing alloc(), we now store pointer to corresponding free(), avoiding crashes
  with applications that incorrectly set alloc/free pointers after running init function.
- After init has succeeded, inflate will no longer possibly fail due to a failing malloc.

Co-authored-by: Ilya Leoshkevich <iii@linux.ibm.com>
2024-05-28 16:35:13 +02:00
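The allocation scheme described in this commit could be sketched roughly as follows. Everything here is hypothetical (`demo_state`, `demo_init`, `demo_end`, and placing state and window in a single buffer are assumptions for illustration); the point is only that the `free` pointer is captured at allocation time and the window is reserved up front.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef void *(*alloc_fn)(size_t);
typedef void (*free_fn)(void *);

/* Hypothetical state: remembers which free() matches its allocation. */
typedef struct {
    free_fn zfree;          /* captured when the buffer was allocated */
    unsigned char *window;  /* points into the same allocation */
    size_t wsize;
} demo_state;

/* Allocate state plus a 32K window in one step during init. */
static demo_state *demo_init(alloc_fn zalloc, free_fn zfree) {
    const size_t wsize = 32768;
    unsigned char *buf;
    demo_state *s;

    assert(zalloc != NULL && zfree != NULL);
    buf = zalloc(sizeof(demo_state) + wsize);
    if (buf == NULL)
        return NULL;        /* the only point where allocation can fail */
    s = (demo_state *)buf;
    s->zfree = zfree;
    s->window = buf + sizeof(demo_state);
    s->wsize = wsize;
    return s;
}

/* Release using the stored pointer, not whatever the caller has set since. */
static void demo_end(demo_state *s) {
    if (s != NULL)
        s->zfree(s);
}
```

Because the window already exists after `demo_init` succeeds, later processing in this sketch never needs to allocate, which mirrors the "no failing malloc after init" point above.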
Ilya Leoshkevich
7a55ec9aca Prepare DFLTCC changes for new malloc system 2024-05-28 16:35:13 +02:00
Ilya Leoshkevich
05ef29eda5 IBM zSystems DFLTCC: Inline DFLTCC states into zlib states
Currently DFLTCC states are allocated using hook macros, complicating
memory management. Inline them into zlib states and remove the hooks.
2024-05-15 11:28:10 +02:00
Vladislav Shchapov
fe0a6407da Explicitly indicate functions are conditionally dispatched
Signed-off-by: Vladislav Shchapov <vladislav@shchapov.ru>
2024-03-06 23:32:15 +01:00
Fabian Vogt
e5a4a8ae6b Handle complete overlap in chunkcopy_safe
Fixes #1525
2023-06-28 17:22:11 +02:00
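A simplified sketch of the case this commit handles, assuming both pointers refer to the same buffer. `copy_overlap_aware` is a hypothetical name, and the byte-by-byte fallback is a stand-in for whatever the real function does for partial overlap.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy len bytes from `from` to `out`, tolerating overlap.
 * Assumes out and from point into the same buffer. */
static unsigned char *copy_overlap_aware(unsigned char *out,
                                         const unsigned char *from,
                                         size_t len) {
    int olap_src, olap_dst;

    assert(out != NULL && from != NULL);
    olap_src = from >= out && from < out + len;
    olap_dst = out >= from && out < from + len;

    /* No overlap at all: plain memcpy is ideal. */
    if (!(olap_src || olap_dst)) {
        memcpy(out, from, len);
        return out + len;
    }

    /* Complete overlap: source == destination, nothing to copy. */
    if (out == from)
        return out + len;

    /* Partial overlap: forward byte copy, which repeats the pattern
     * when the source sits just behind the destination. */
    while (len--)
        *out++ = *from++;
    return out;
}
```

Without the `out == from` check, a loop that sizes its steps by the distance between the two pointers would see a distance of zero and make no progress.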
Dougall Johnson
6a74e9294f Inflate: add fast-path for literals 2023-02-24 13:24:49 +01:00
Nathan Moinvaziri
aa1109bb2e Use arch-specific versions of inflate_fast.
This should reduce the cost of indirection that occurs when calling functable
chunk copying functions inside inflate_fast. It should also allow the compiler
to optimize the inflate fast path for the specific architecture.
2023-02-05 17:51:46 +01:00
Pavel P
7659b386a6 Make sure inflate_p.h is fully guarded by header guard 2023-01-20 00:29:32 +01:00
Ilya Leoshkevich
3eab3173ac IBM zSystems DFLTCC: Support inflate with small window
There is no hardware control for DFLTCC window size, and because of
that supporting small windows for deflate is not trivial: one has to
make sure that DFLTCC does not emit large distances, which most likely
entails somehow trimming the window and/or input in order to make sure
that whave + avail_in <= wsize.

But inflate is much easier: one only has to allocate enough space. Do
that in dfltcc_alloc_window(), and also introduce ZCOPY_WINDOW() in
order to copy everything, not just what the software implementation
cares about.

After this change, software and hardware window formats no longer
match: the software will use wbits and wsize, and the hardware will use
HB_BITS and HB_SIZE. Unlike deflate, inflate does not switch between
software and hardware implementations mid-stream, which leaves only
inflateSetDictionary() and inflateGetDictionary() interesting.
2022-12-11 12:03:12 +01:00
Nathan Moinvaziri
e22195e5bc Don't use unaligned access for memcpy instructions due to GCC 11 assuming it is aligned in certain instances. 2022-08-17 14:41:18 +02:00
Nathan Moinvaziri
7092c2be68 Fixed inflate size conversion warning in chunkcopy_safe.
inflate_p.h(142,27): warning C4244: 'function': conversion from 'uint64_t' to 'size_t', possible loss of data
2022-06-13 15:58:25 +02:00
Adam Stylinski
ef0cf5ca17 Improved chunkset substantially where it's heavily used
For most realistic use cases, this doesn't make a ton of difference.
However, for things which are highly compressible and enjoy very large
run length encodes in the window, this is a huge win.

We leverage a permutation table to swizzle the contents of the memory
chunk into a vector register and then splat that over memory with a fast
copy loop.

In essence, where this helps, it helps a lot.  Where it doesn't, it does
no measurable damage to the runtime.

This commit also simplifies a chunkcopy_safe call for determining a
distance.  Using labs is enough to give the same behavior as before,
with the added benefit that no predication is required _and_, most
importantly, static analysis by GCC's string fortification can't throw a
fit because it conveys better to the compiler that the input into
builtin_memcpy will always be in range.
2022-05-23 16:13:29 +02:00
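The distance computation mentioned in the last paragraph of this commit might look like the sketch below. `copy_by_distance` is a hypothetical name, and `llabs` is used here instead of `labs` so the sketch does not depend on the width of `long`. Taking the absolute pointer difference gives the largest span that can be passed to `memcpy` without the two ranges overlapping.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Copy len bytes between possibly overlapping regions of one buffer by
 * issuing memcpy calls no longer than the distance between the pointers. */
static unsigned char *copy_by_distance(unsigned char *out,
                                       const unsigned char *from,
                                       size_t len) {
    size_t non_olap;

    assert(out != from);  /* a zero distance would never make progress */
    /* Absolute distance: no branch on which pointer comes first. */
    non_olap = (size_t)llabs((long long)(from - out));

    while (len > 0) {
        size_t tocopy = non_olap < len ? non_olap : len;
        memcpy(out, from, tocopy);  /* ranges are disjoint by construction */
        out += tocopy;
        from += tocopy;
        len -= tocopy;
    }
    return out;
}
```

Since `tocopy` never exceeds the pointer distance, each `memcpy` stays within its defined, non-overlapping contract, which is the property the commit message says is easier for the compiler's static analysis to see.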
Ilya Leoshkevich
c592b1b332 IBM Z DFLTCC: Split deflate and inflate states
Currently deflate and inflate both use a common state struct. There are
several variables in this struct that we don't need for inflate, and
more may be coming in the future. Therefore split them in two separate
structs. This in turn requires splitting ZALLOC_STATE and ZCOPY_STATE
macros.
2022-04-28 12:01:57 +02:00
Nathan Moinvaziri
41faa0843d Fixed clang signed/unsigned warning in chunkcopy_safe.
inflate_p.h:159:18: warning: comparison of integers of different signs: 'int32_t' (aka 'int') and 'size_t' (aka 'unsigned long') [-Wsign-compare]
        tocopy = MIN(non_olap_size, len);
                 ^   ~~~~~~~~~~~~~  ~~~
zbuild.h:74:24: note: expanded from macro 'MIN'
#define MIN(a, b) ((a) > (b) ? (b) : (a))
                    ~  ^  ~
2022-03-29 13:14:18 +02:00
Nathan Moinvaziri
d979b89e00 Fixed MSVC warnings in chunkcopy_safe.
inflate_p.h(244,18): warning C4018: '>': signed/unsigned mismatch
inflate_p.h(234,38): warning C4244: 'initializing': conversion from '__int64' to 'int', possible loss of data
inffast.c
inflate_p.h(244,18): warning C4018: '>': signed/unsigned mismatch
inflate_p.h(234,38): warning C4244: 'initializing': conversion from '__int64' to 'int', possible loss of data
inflate.c
inflate_p.h(244,18): warning C4018: '>': signed/unsigned mismatch
inflate_p.h(234,38): warning C4244: 'initializing': conversion from '__int64' to 'int', possible loss of data
2022-03-27 19:17:21 +02:00
Nathan Moinvaziri
6c4beb611d Revert "Reorganize inflate window layout"
This reverts commit dc3b60841d.
2022-03-23 11:30:35 +01:00
Nathan Moinvaziri
b5af5013d9 Revert "Add back original version of inflate_fast for use with inflateBack."
This reverts commit 2d2dde43b1.
2022-03-23 11:30:35 +01:00
Nathan Moinvaziri
097f789fa2 Revert "DFLTCC update for window optimization from Jim & Nathan"
This reverts commit b4ca25afab.
2022-03-23 11:30:35 +01:00
Adam Stylinski
49a6bb5d41 Speed up chunkcopy and memset
This was found to have a significant impact on a highly compressible PNG
for both the encode and decode.  Some deltas show performance improving
as much as 60%+.

For the scenarios where "dist" does not evenly divide our chunk size,
we simply repeat the bytes as many times as will fit into our vector
registers.  We then copy the entire vector and advance by the largest
multiple of dist that fits in the chunk size.

If dist happens to be 1, there's no reason to not just call memset from
libc (this is likely to be just as fast if not faster).
2022-03-16 11:42:19 +01:00
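A scalar sketch of the scheme described in this commit, with a plain byte array standing in for the vector register. `CHUNK_SIZE` and `pattern_fill` are hypothetical names. Unlike the description above, this version stores only the whole-repetition part of the chunk on each step rather than the full chunk, so it never writes past `out + len`.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define CHUNK_SIZE 16  /* hypothetical; stands in for the vector width */

/* Write len bytes at out by repeating the dist bytes that precede it. */
static unsigned char *pattern_fill(unsigned char *out, size_t dist, size_t len) {
    const unsigned char *from;
    unsigned char chunk[CHUNK_SIZE];
    size_t adv, i;

    assert(dist >= 1 && dist <= CHUNK_SIZE);
    from = out - dist;

    /* dist == 1 is a run of one byte: libc memset handles it well. */
    if (dist == 1) {
        memset(out, *from, len);
        return out + len;
    }

    /* Repeat the pattern as many whole times as fit into the chunk. */
    adv = (CHUNK_SIZE / dist) * dist;
    for (i = 0; i < adv; i++)
        chunk[i] = from[i % dist];

    /* Each step advances by a multiple of dist, keeping the pattern in phase. */
    while (len >= adv) {
        memcpy(out, chunk, adv);
        out += adv;
        len -= adv;
    }
    memcpy(out, chunk, len);  /* tail shorter than one step */
    return out + len;
}
```

For example, with a 3-byte pattern and a 16-byte chunk, the chunk holds five repetitions (15 bytes) and each step advances by 15.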
Nathan Moinvaziri
5b0ffa63f2 Added checks and comments to ensure that when using raw mode no checksumming takes place. 2021-12-09 14:28:34 +01:00
Ilya Leoshkevich
b4ca25afab DFLTCC update for window optimization from Jim & Nathan
Stop relying on software and hardware inflate window formats being the
same and act the way we already do for deflate: provide and implement
window-related hooks.

Another possibility would be to use an in-line history buffer (by not
setting HBT_CIRCULAR), but this would require an extra memmove().

Also fix a couple corner cases in the software implementation of
inflateGetDictionary() and inflateSetDictionary().
2021-12-02 09:26:32 +01:00
Nathan Moinvaziri
2d2dde43b1 Add back original version of inflate_fast for use with inflateBack. 2021-12-02 09:26:32 +01:00
Jim Kukunas
dc3b60841d Reorganize inflate window layout
This commit significantly improves inflate performance by reorganizing the window buffer into a contiguous window and pending output buffer. The goal of this layout is to reduce branching, improve cache locality, and enable the use of crc folding with gzip input.

The window buffer is allocated as a multiple of the user-selected window size. In this commit, a factor of 2 is utilized.

The layout of the window buffer is divided into two sections. The first section, window offset [0, wsize), is reserved for history that has already been output. The second section, window offset [wsize, 2 * wsize), is reserved for buffering pending output that hasn't been flushed to the user's output buffer yet.

The history section grows downwards, towards the window offset of 0. The pending output section grows upwards, towards the end of the buffer. As a result, all of the possible distance/length data that may need to be copied is contiguous. This removes the need to stitch together output from 2 separate buffers.

In the case of gzip input, crc folding is used to copy the pending output to the user's buffers.

Co-authored-by: Nathan Moinvaziri <nathan@nathanm.com>
2021-12-02 09:26:32 +01:00
Hans Kristian Rosbach
6ce39348ba inflate: add SET_BAD macro, to make inflate.c a little cleaner. 2020-10-24 15:51:46 +02:00
Mika Lindqvist
28e5e73f34 Reintroduce support for ZLIB_CONST in compat mode. (#704) 2020-08-23 09:58:57 +02:00
Nathan Moinvaziri
feff87a53e Fixed compiler warning when using BITS macro. 2020-02-07 18:42:03 +01:00
Hans Kristian Rosbach
d8eedcfa3e Deduplicate common inflate/inflatefast/inflateBack macros into inflate_p.h 2019-08-06 09:39:26 +02:00