zlib-ng

mirror of https://github.com/GerbilSoft/zlib-ng.git synced 2025-06-19 12:05:39 -04:00

Author	SHA1	Message	Date
Cameron Cawley	d7e121e56b	Use GCC's may_alias attribute for unaligned memory access	2024-12-24 12:55:44 +01:00
Adam Stylinski	94aacd8bd6	Try to simplify the inflate loop by collapsing most cases to chunksets	2024-10-23 21:20:11 +02:00
Adam Stylinski	b80eb4c6ec	Simplify chunking in the copy ladder here. As it turns out, trying to peel off the remainder with so many branches caused the code size to inflate a bit too much, so that this function wouldn't inline without some fairly aggressive optimization flags. Only catching vector sized chunks here makes the loop body small enough, and having the byte by byte copy idiom at the bottom gives the compiler some flexibility, so it is likely to do something there.	2024-09-26 08:22:03 +02:00
Hans Kristian Rosbach	e024068dac	Revert "Split chunkcopy_safe to allow the first part to be inlined more often." This reverts commit `6b8efe7868`. New and improved chunkcopy_safe is coming soon.	2024-09-17 14:12:24 +02:00
Hans Kristian Rosbach	6b8efe7868	Split chunkcopy_safe to allow the first part to be inlined more often.	2024-09-13 12:48:43 +02:00
Hans Kristian Rosbach	63e1d460aa	Rewrite inflate memory allocation. Inflate used to allocate state during init, but window would be allocated when/if needed and could be resized and that required a new free/alloc round. - Now, we allocate state and a 32K window during init, allowing the latency cost of allocs to be done during init instead of at one or more times later. - Total memory allocation is about the same when requesting a 32K window, but if no window or a smaller window was requested, then it is an increase. - While doing alloc(), we now store pointer to corresponding free(), avoiding crashes with applications that incorrectly set alloc/free pointers after running init function. - After init has succeeded, inflate will no longer possibly fail due to a failing malloc. Co-authored-by: Ilya Leoshkevich <iii@linux.ibm.com>	2024-05-28 16:35:13 +02:00
Ilya Leoshkevich	7a55ec9aca	Prepare DFLTCC changes for new malloc system	2024-05-28 16:35:13 +02:00
Ilya Leoshkevich	05ef29eda5	IBM zSystems DFLTCC: Inline DFLTCC states into zlib states. Currently DFLTCC states are allocated using hook macros, complicating memory management. Inline them into zlib states and remove the hooks.	2024-05-15 11:28:10 +02:00
Vladislav Shchapov	fe0a6407da	Explicitly indicate functions are conditionally dispatched. Signed-off-by: Vladislav Shchapov <vladislav@shchapov.ru>	2024-03-06 23:32:15 +01:00
Fabian Vogt	e5a4a8ae6b	Handle complete overlap in chunkcopy_safe. Fixes #1525	2023-06-28 17:22:11 +02:00
Dougall Johnson	6a74e9294f	Inflate: add fast-path for literals	2023-02-24 13:24:49 +01:00
Nathan Moinvaziri	aa1109bb2e	Use arch-specific versions of inflate_fast. This should reduce the cost of indirection that occurs when calling functable chunk copying functions inside inflate_fast. It should also allow the compiler to optimize the inflate fast path for the specific architecture.	2023-02-05 17:51:46 +01:00
Pavel P	7659b386a6	Make sure inflate_p.h is fully guarded by header guard	2023-01-20 00:29:32 +01:00
Ilya Leoshkevich	3eab3173ac	IBM zSystems DFLTCC: Support inflate with small window. There is no hardware control for DFLTCC window size, and because of that supporting small windows for deflate is not trivial: one has to make sure that DFLTCC does not emit large distances, which most likely entails somehow trimming the window and/or input in order to make sure that whave + avail_in <= wsize. But inflate is much easier: one only has to allocate enough space. Do that in dfltcc_alloc_window(), and also introduce ZCOPY_WINDOW() in order to copy everything, not just what the software implementation cares about. After this change, software and hardware window formats no longer match: the software will use wbits and wsize, and the hardware will use HB_BITS and HB_SIZE. Unlike deflate, inflate does not switch between software and hardware implementations mid-stream, which leaves only inflateSetDictionary() and inflateGetDictionary() interesting.	2022-12-11 12:03:12 +01:00
Nathan Moinvaziri	e22195e5bc	Don't use unaligned access for memcpy instructions due to GCC 11 assuming it is aligned in certain instances.	2022-08-17 14:41:18 +02:00
Nathan Moinvaziri	7092c2be68	Fixed inflate size conversion warning in chunkcopy_safe. inflate_p.h(142,27): warning C4244: 'function': conversion from 'uint64_t' to 'size_t', possible loss of data	2022-06-13 15:58:25 +02:00
Adam Stylinski	ef0cf5ca17	Improved chunkset substantially where it's heavily used. For most realistic use cases, this doesn't make a ton of difference. However, for things which are highly compressible and enjoy very large run length encodes in the window, this is a huge win. We leverage a permutation table to swizzle the contents of the memory chunk into a vector register and then splat that over memory with a fast copy loop. In essence, where this helps, it helps a lot. Where it doesn't, it does no measurable damage to the runtime. This commit also simplifies a chunkcopy_safe call for determining a distance. Using labs is enough to give the same behavior as before, with the added benefit that no predication is required _and_, most importantly, static analysis by GCC's string fortification can't throw a fit because it conveys better to the compiler that the input into builtin_memcpy will always be in range.	2022-05-23 16:13:29 +02:00
Ilya Leoshkevich	c592b1b332	IBM Z DFLTCC: Split deflate and inflate states. Currently deflate and inflate both use a common state struct. There are several variables in this struct that we don't need for inflate, and more may be coming in the future. Therefore split them in two separate structs. This in turn requires splitting ZALLOC_STATE and ZCOPY_STATE macros.	2022-04-28 12:01:57 +02:00
Nathan Moinvaziri	41faa0843d	Fixed clang signed/unsigned warning in chunkcopy_safe. inflate_p.h:159:18: warning: comparison of integers of different signs: 'int32_t' (aka 'int') and 'size_t' (aka 'unsigned long') [-Wsign-compare] tocopy = MIN(non_olap_size, len); ^ ~~~~~~~~~~~~~ ~~~ zbuild.h:74:24: note: expanded from macro 'MIN' #define MIN(a, b) ((a) > (b) ? (b) : (a)) ~ ^ ~	2022-03-29 13:14:18 +02:00
Nathan Moinvaziri	d979b89e00	Fixed MSVC warnings in chunkcopy_safe. inflate_p.h(244,18): warning C4018: '>': signed/unsigned mismatch inflate_p.h(234,38): warning C4244: 'initializing': conversion from '__int64' to 'int', possible loss of data inffast.c inflate_p.h(244,18): warning C4018: '>': signed/unsigned mismatch inflate_p.h(234,38): warning C4244: 'initializing': conversion from '__int64' to 'int', possible loss of data inflate.c inflate_p.h(244,18): warning C4018: '>': signed/unsigned mismatch inflate_p.h(234,38): warning C4244: 'initializing': conversion from '__int64' to 'int', possible loss of data	2022-03-27 19:17:21 +02:00
Nathan Moinvaziri	6c4beb611d	Revert "Reorganize inflate window layout" This reverts commit `dc3b60841d`.	2022-03-23 11:30:35 +01:00
Nathan Moinvaziri	b5af5013d9	Revert "Add back original version of inflate_fast for use with inflateBack." This reverts commit `2d2dde43b1`.	2022-03-23 11:30:35 +01:00
Nathan Moinvaziri	097f789fa2	Revert "DFLTCC update for window optimization from Jim & Nathan" This reverts commit `b4ca25afab`.	2022-03-23 11:30:35 +01:00
Adam Stylinski	49a6bb5d41	Speed up chunkcopy and memset. This was found to have a significant impact on a highly compressible PNG for both the encode and decode. Some deltas show performance improving as much as 60%+. For the scenarios where the "dist" is not an even modulus of our chunk size, we simply repeat the bytes as many times as possible into our vector registers. We then copy the entire vector and then advance the quotient of our chunksize divided by our dist value. If dist happens to be 1, there's no reason to not just call memset from libc (this is likely to be just as fast if not faster).	2022-03-16 11:42:19 +01:00
Nathan Moinvaziri	5b0ffa63f2	Added checks and comments to ensure that when using raw mode no checksumming takes place.	2021-12-09 14:28:34 +01:00
Ilya Leoshkevich	b4ca25afab	DFLTCC update for window optimization from Jim & Nathan. Stop relying on software and hardware inflate window formats being the same and act the way we already do for deflate: provide and implement window-related hooks. Another possibility would be to use an in-line history buffer (by not setting HBT_CIRCULAR), but this would require an extra memmove(). Also fix a couple corner cases in the software implementation of inflateGetDictionary() and inflateSetDictionary().	2021-12-02 09:26:32 +01:00
Nathan Moinvaziri	2d2dde43b1	Add back original version of inflate_fast for use with inflateBack.	2021-12-02 09:26:32 +01:00
Jim Kukunas	dc3b60841d	Reorganize inflate window layout. This commit significantly improves inflate performance by reorganizing the window buffer into a contiguous window and pending output buffer. The goal of this layout is to reduce branching, improve cache locality, and enable the use of crc folding with gzip input. The window buffer is allocated as a multiple of the user-selected window size. In this commit, a factor of 2 is utilized. The layout of the window buffer is divided into two sections. The first section, window offset [0, wsize), is reserved for history that has already been output. The second section, window offset [wsize, 2 * wsize), is reserved for buffering pending output that hasn't been flushed to the user's output buffer yet. The history section grows downwards, towards the window offset of 0. The pending output section grows upwards, towards the end of the buffer. As a result, all of the possible distance/length data that may need to be copied is contiguous. This removes the need to stitch together output from 2 separate buffers. In the case of gzip input, crc folding is used to copy the pending output to the user's buffers. Co-authored-by: Nathan Moinvaziri <nathan@nathanm.com>	2021-12-02 09:26:32 +01:00
Hans Kristian Rosbach	6ce39348ba	inflate: add SET_BAD macro, to make inflate.c a little cleaner.	2020-10-24 15:51:46 +02:00
Mika Lindqvist	28e5e73f34	Reintroduce support for ZLIB_CONST in compat mode. (#704)	2020-08-23 09:58:57 +02:00
Nathan Moinvaziri	feff87a53e	Fixed compiler warning when using BITS macro.	2020-02-07 18:42:03 +01:00
Hans Kristian Rosbach	d8eedcfa3e	Deduplicate common inflate/inflatefast/inflateBack macros into inflate_p.h	2019-08-06 09:39:26 +02:00
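
The pattern-fill idea described in commit 49a6bb5d41 above (repeat the `dist` bytes as many times as fit in one chunk, copy that chunk repeatedly, and fall back to memset when `dist` is 1) can be sketched roughly as follows. This is an illustration only, not zlib-ng's code: `CHUNK_SIZE` and `pattern_fill` are hypothetical names, a plain byte array stands in for the vector register, and each step copies only the whole repetitions rather than the entire vector, so no slack space past the output is assumed.

```c
#include <stddef.h>
#include <string.h>

#define CHUNK_SIZE 16 /* hypothetical chunk width in bytes (assumption) */

/* Fill out[0..len) by repeating the dist bytes that sit just before out.
 * Hypothetical helper for illustration; requires dist >= 1. */
static void pattern_fill(unsigned char *out, size_t dist, size_t len) {
    const unsigned char *from = out - dist;
    unsigned char chunk[CHUNK_SIZE];
    size_t step, i;

    if (dist == 1) {
        /* A one-byte pattern is just a memset. */
        memset(out, *from, len);
        return;
    }
    if (dist >= CHUNK_SIZE) {
        /* Pattern does not fit in one chunk: byte-by-byte overlapping copy. */
        for (i = 0; i < len; i++)
            out[i] = from[i];
        return;
    }

    /* Pack as many whole repetitions of the pattern as fit in the chunk. */
    step = (CHUNK_SIZE / dist) * dist;
    for (i = 0; i < step; i++)
        chunk[i] = from[i % dist];

    /* Copy whole repetitions; every copy starts at pattern phase 0. */
    while (len >= step) {
        memcpy(out, chunk, step);
        out += step;
        len -= step;
    }
    /* Remaining bytes are a prefix of the packed chunk. */
    memcpy(out, chunk, len);
}
```

With a 3-byte pattern and a 16-byte chunk, each step writes 15 bytes (five repetitions), which keeps every copy aligned to the start of the pattern.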

32 Commits