Commit Graph

11 Commits

Author SHA1 Message Date
Adam Stylinski
0ed5ac8289 Make an AVX512 inflate fast with low cost masked writes
This takes advantage of the fact that on AVX512 architectures, masked
moves are incredibly cheap. There are many places where we have to
fall back to the safe C implementation of chunkcopy_safe because of the
assumed overwriting that occurs. We're able to sidestep most of the
branching needed here by simply controlling the bounds of our writes
with a mask.
2024-11-20 22:14:44 +01:00
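The masked-write idea in the commit above can be sketched as follows. This is a minimal illustration, not the code from the commit: the function name copy_up_to_64 is hypothetical, the 64-byte vector width is an assumption, and the memcpy branch exists only so the sketch builds on compilers or targets without AVX512BW.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#if defined(__AVX512F__) && defined(__AVX512BW__)
#include <immintrin.h>
#endif

/* Hypothetical helper: copy len bytes (len <= 64) from src to dst
 * without reading or writing anything past the first len bytes. */
void copy_up_to_64(uint8_t *dst, const uint8_t *src, size_t len) {
    assert(len <= 64);
#if defined(__AVX512F__) && defined(__AVX512BW__)
    /* Set the low len bits of the mask; one bit per byte lane. */
    __mmask64 mask = (len >= 64) ? ~(__mmask64)0
                                 : (((__mmask64)1 << len) - 1);
    /* Masked-out lanes are not loaded and not stored, so the bounds of
     * the copy are controlled by the mask instead of by a branch. */
    __m512i v = _mm512_maskz_loadu_epi8(mask, src);
    _mm512_mask_storeu_epi8(dst, mask, v);
#else
    /* Portable fallback so the sketch still compiles without AVX512BW. */
    memcpy(dst, src, len);
#endif
}
```

With a helper of this shape, a caller near the end of the output buffer can issue one vector-sized copy and let the mask clip it, rather than branching to a byte-wise safe path.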
Adam Stylinski
94aacd8bd6 Try to simplify the inflate loop by collapsing most cases to chunksets 2024-10-23 21:20:11 +02:00
Hans Kristian Rosbach
dae668dbff Reorder variables in inflate functions to reduce padding holes
due to variable alignment requirements.
2024-10-10 13:22:50 +02:00
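The effect of alignment padding described in the commit above is easiest to observe with struct members, so the sketch below uses two hypothetical structs rather than the local variables the commit actually reorders. The member names are invented for illustration, and the exact sizes depend on the target ABI.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout: small members separated by a pointer, so the
 * compiler typically inserts padding after each single byte. */
struct scattered {
    uint8_t  flag;
    void    *ptr;
    uint8_t  bits;
};

/* Same members, largest alignment first, small members grouped. */
struct grouped {
    void    *ptr;
    uint8_t  flag;
    uint8_t  bits;
};

/* Grouping never makes this layout larger, whatever the pointer size. */
static_assert(sizeof(struct grouped) <= sizeof(struct scattered),
              "grouped layout should not be larger");

size_t scattered_size(void) { return sizeof(struct scattered); }
size_t grouped_size(void)   { return sizeof(struct grouped); }
```

On a typical 64-bit ABI the first struct is likely to occupy 24 bytes and the second 16, though this is not guaranteed by the C standard.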
Hans Kristian Rosbach
a5c20ed67e Add variable 'wbufsize' to track window buffer including padding, to allow
the chunkset code to spill garbage data into the padding area if available.
2024-10-08 15:51:12 +02:00
Hans Kristian Rosbach
39e9c86ec0 Don't use 'dmax' and 'sane' variables unless their checks have been compiled in. 2024-10-08 15:51:12 +02:00
Adam Stylinski
3297953f81 Compute the "safe" distance properly
The safe pointer that is computed is an exclusive, not inclusive, bound.
While we were probably rarely, if ever, bitten by this, it still makes
sense to apply the limit properly.
2024-10-08 12:43:01 +02:00
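The difference between the two kinds of bound in the commit above comes down to one byte. The sketch below is generic pointer arithmetic under assumed names (room_exclusive, room_inclusive); it is not the expression changed by the commit.

```c
#include <assert.h>
#include <stddef.h>

/* Bytes writable when `end` points one past the last valid byte. */
size_t room_exclusive(const unsigned char *out, const unsigned char *end) {
    assert(out <= end);
    return (size_t)(end - out);
}

/* Bytes writable when `last` points at the last valid byte itself. */
size_t room_inclusive(const unsigned char *out, const unsigned char *last) {
    assert(out <= last);
    return (size_t)(last - out) + 1;
}
```

Treating an exclusive bound as inclusive overstates the remaining room by one byte, which is the kind of off-by-one a limit check on the safe pointer has to avoid.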
Nathan Moinvaziri
72c50edd26 Don't use chunkunroll for inflateBack
If the output buffer and the window buffer are the same
memory allocation, we cannot make the assumptions that chunkunroll
does, that it is okay to overwrite the output buffer.
2024-09-11 10:31:56 +02:00
Dougall Johnson
6a74e9294f Inflate: add fast-path for literals 2023-02-24 13:24:49 +01:00
Dougall Johnson
3cebd47211 Inflate: refill unconditionally 2023-02-24 13:24:49 +01:00
Nathan Moinvaziri
fa9bfeddcf Use named defines instead of hard-coded numbers. 2023-02-18 20:30:55 +01:00
Nathan Moinvaziri
aa1109bb2e Use arch-specific versions of inflate_fast.
This should reduce the cost of indirection that occurs when calling functable
chunk copying functions inside inflate_fast. It should also allow the compiler
to optimize the inflate fast path for the specific architecture.
2023-02-05 17:51:46 +01:00