zlib-ng

mirror of https://github.com/GerbilSoft/zlib-ng.git synced 2025-06-18 11:35:35 -04:00

Author	SHA1	Message	Date
Adam Stylinski	0ed5ac8289	Make an AVX512 inflate fast with low cost masked writes. This takes advantage of the fact that on AVX512 architectures, masked moves are incredibly cheap. There are many places where we have to fall back to the safe C implementation of chunkcopy_safe because of the assumed overwriting that occurs. We're able to sidestep most of the branching needed here by simply controlling the bounds of our writes with a mask.	2024-11-20 22:14:44 +01:00
Adam Stylinski	94aacd8bd6	Try to simplify the inflate loop by collapsing most cases to chunksets	2024-10-23 21:20:11 +02:00
Hans Kristian Rosbach	dae668dbff	Reorder variables in inflate functions to reduce padding holes due to variable alignment requirements.	2024-10-10 13:22:50 +02:00
Hans Kristian Rosbach	a5c20ed67e	Add variable 'wbufsize' to track window buffer including padding, to allow the chunkset code to spill garbage data into the padding area if available.	2024-10-08 15:51:12 +02:00
Hans Kristian Rosbach	39e9c86ec0	Don't use 'dmax' and 'sane' variables unless their checks have been compiled in.	2024-10-08 15:51:12 +02:00
Adam Stylinski	3297953f81	Compute the "safe" distance properly. The safe pointer that is computed is an exclusive, not inclusive, bound. While we were probably rarely, if ever, bitten by this, it still makes sense to apply the limit properly.	2024-10-08 12:43:01 +02:00
Nathan Moinvaziri	72c50edd26	Don't use chunkunroll for inflateBack. If the output buffer and the window buffer are the same memory allocation, we cannot make the assumptions that chunkunroll does, that it is okay to overwrite the output buffer.	2024-09-11 10:31:56 +02:00
Dougall Johnson	6a74e9294f	Inflate: add fast-path for literals	2023-02-24 13:24:49 +01:00
Dougall Johnson	3cebd47211	Inflate: refill unconditionally	2023-02-24 13:24:49 +01:00
Nathan Moinvaziri	fa9bfeddcf	Use named defines instead of hard coded numbers.	2023-02-18 20:30:55 +01:00
Nathan Moinvaziri	aa1109bb2e	Use arch-specific versions of inflate_fast. This should reduce the cost of indirection that occurs when calling functable chunk copying functions inside inflate_fast. It should also allow the compiler to optimize the inflate fast path for the specific architecture.	2023-02-05 17:51:46 +01:00

11 Commits
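
The masked-write idea in 0ed5ac8289, combined with the exclusive "safe" bound from 3297953f81, can be illustrated with a portable scalar sketch. This is not the zlib-ng code: `CHUNK_SIZE`, `lane_mask`, `masked_store`, and `bounded_chunk_write` are hypothetical names, and the byte loop only stands in for what a single masked vector store would do on AVX512 hardware. The point it tries to show is that a mask derived from the remaining room lets a full-width write be issued without a separate "near the end" branch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical: bytes in one 512-bit vector. */
#define CHUNK_SIZE 64

/* Build a 64-bit lane mask with the low `len` bits set (len in 0..64). */
static uint64_t lane_mask(size_t len) {
    assert(len <= CHUNK_SIZE);
    /* Shifting a 64-bit value by 64 is undefined, so handle it explicitly. */
    return len == CHUNK_SIZE ? UINT64_MAX : ((UINT64_C(1) << len) - 1);
}

/* Scalar stand-in for a masked vector store: only lanes whose mask bit
 * is set are written; all other destination bytes are left untouched. */
static void masked_store(uint8_t *dst, const uint8_t *src, uint64_t mask) {
    for (size_t i = 0; i < CHUNK_SIZE; i++) {
        if (mask & (UINT64_C(1) << i))
            dst[i] = src[i];
    }
}

/* Write up to one chunk at `out` without touching any byte at or past
 * `safe`. `safe` is treated as an exclusive bound (one past the last
 * writable byte). Returns the number of bytes actually written. */
static size_t bounded_chunk_write(uint8_t *out, const uint8_t *chunk,
                                  const uint8_t *safe) {
    size_t room = out < safe ? (size_t)(safe - out) : 0;
    size_t len = room < CHUNK_SIZE ? room : CHUNK_SIZE;
    masked_store(out, chunk, lane_mask(len));
    return len;
}
```

With 10 bytes of room left, `bounded_chunk_write` writes exactly 10 bytes and leaves everything from `safe` onward alone. With 64 or more bytes of room, it writes the whole chunk.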