Reduce development burden by getting rid of NMake files that are manually
kept up to date. For continued NMake support please generate NMake project
files using CMake.
The version currently in the generic implementation for 32768-byte
buffers uses the stack. It manages to autovectorize, but unfortunately
the trips through the stack hurt its performance on the CPUs that need
this the most. This version is explicitly SIMD vectorized and avoids
trips to the stack. In my testing it's ~10% faster than the "small"
variant, and about 42% faster than the "32768" variant.
Defines changing meaning:
X86_SSE42 used to mean the compiler supports crc asm fallback.
X86_SSE42_CRC_INTRIN used to mean compiler supports SSE4.2 intrinsics.
X86_SSE42 now means compiler supports SSE4.2 intrinsics.
This therefore also fixes the adler32_sse42 checks, since those depend
on SSE4.2 intrinsics but were mistakenly checking the X86_SSE42 define.
Now the X86_SSE42 define actually means what it appears to.
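A minimal sketch of what the new guard implies (the define comes from
this change; the function around it is hypothetical):

    #ifdef X86_SSE42
    #  include <nmmintrin.h>   /* SSE4.2 intrinsics */
    #  include <stdint.h>
    /* e.g. the hardware CRC32C instruction is reachable under this guard */
    static inline uint32_t crc32c_byte(uint32_t crc, uint8_t b) {
        return _mm_crc32_u8(crc, b);
    }
    #endif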
This should reduce the cost of indirection that occurs when calling functable
chunk copying functions inside inflate_fast. It should also allow the compiler
to optimize the inflate fast path for the specific architecture.
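To make the indirection concrete (simplified, hypothetical names, not
zlib-ng's actual symbols): an indirect call through the functable cannot
be inlined into the hot loop, while a compile-time-selected
implementation can be:

    #include <stdint.h>
    #include <string.h>

    /* stand-in for one architecture's chunk copy */
    static uint8_t *chunkcopy_impl(uint8_t *out, const uint8_t *from,
                                   unsigned len) {
        memcpy(out, from, len);
        return out + len;
    }

    /* before: an opaque call through a function pointer per chunk */
    struct functable_s {
        uint8_t *(*chunkcopy)(uint8_t *out, const uint8_t *from, unsigned len);
    };
    static struct functable_s functable = { chunkcopy_impl };
    #define CHUNKCOPY_INDIRECT(o, f, l) functable.chunkcopy((o), (f), (l))

    /* after: resolved at compile time per architecture, so the call can
       be inlined and optimized together with the inflate_fast loop */
    #define CHUNKCOPY_DIRECT(o, f, l) chunkcopy_impl((o), (f), (l))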
Before this patch, `zlib.rc` was compiled using a manual command [1] when
using the MinGW (and MSYS/Cygwin) toolchains. This method ignores
`CMAKE_RC_FLAGS` and offers no other way to pass a custom flag, breaking
the build in cases where a custom `windres` option is required. E.g.
`--target=` or `-I` on some platforms and configurations, in particular
with `llvm-windres`.
This patch deletes the special case for these toolchains and lets CMake
compile the `.rc` file the default way used for all Windows targets.
I'm not entirely sure why this special case was added back in 2011. My
suspicion is the need to pass `-DGCC_WINDRES`. That could be solved much
more simply by adding this line for the targets that require it:
    set(CMAKE_RC_FLAGS "${CMAKE_RC_FLAGS} -DGCC_WINDRES")
But the `.rc` line protected by `GCC_WINDRES` these days works just fine
with `windres`. Moreover, that protected line contains obsolete flags
from the 16-bit era, which have had no effect for a long time, as
documented here:
<https://docs.microsoft.com/windows/win32/menurc/common-resource-attributes>
So, this patch deletes `GCC_WINDRES` from the project entirely.
[1] dc5a43e
Use the interleaved method of Kadatch and Jenkins in order to make
use of pipelined instructions through multiple ALUs in a single
core. This also speeds up and simplifies the combination of CRCs,
and updates the functions to pre-calculate and use an operator for
CRC combination.
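A minimal scalar sketch of the combination math, using zlib's reflected
polynomial (illustrative names; the real code additionally pre-computes
the per-length operator once so it can be reused):

    #include <stddef.h>
    #include <stdint.h>

    #define CRC_POLY 0xedb88320U    /* reflected CRC-32 polynomial */

    /* GF(2) polynomial multiplication modulo CRC_POLY; a must be nonzero */
    static uint32_t multmodp(uint32_t a, uint32_t b) {
        uint32_t m = (uint32_t)1 << 31, p = 0;
        for (;;) {
            if (a & m) {
                p ^= b;
                if ((a & (m - 1)) == 0)
                    break;
            }
            m >>= 1;
            b = (b & 1) ? (b >> 1) ^ CRC_POLY : (b >> 1);
        }
        return p;
    }

    /* x^(8n) mod P by square-and-multiply; in this reflected
       representation x^k is bit 31-k, so x^8 == 1 << 23 */
    static uint32_t x8nmodp(size_t n) {
        uint32_t x = (uint32_t)1 << 23;   /* x^8 */
        uint32_t p = (uint32_t)1 << 31;   /* x^0 == 1 */
        while (n) {
            if (n & 1)
                p = multmodp(x, p);
            x = multmodp(x, x);
            n >>= 1;
        }
        return p;
    }

    /* crc(A||B): shift crc(A) past 8*len(B) zero bits, then xor crc(B).
       Pre-calculating op = x8nmodp(len2) is what makes repeated
       combinations at a fixed length cheap. */
    static uint32_t crc32_combine_sketch(uint32_t crc1, uint32_t crc2,
                                         size_t len2) {
        return multmodp(x8nmodp(len2), crc1) ^ crc2;
    }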
Co-authored-by: Nathan Moinvaziri <nathan@nathanm.com>
An interesting revelation while benchmarking all of this is that our
chunkmemset_avx seems to be slower than chunkmemset_sse in a lot of use
cases. That will be an interesting function to try to optimize.
Right now, though, we're basically beating Google on all PNG decode and
encode benchmarks. Some combinations of flags basically have us trading
blows, but we're as much as 14% faster than Chromium's zlib patches.
While we're here, add a more direct benchmark of the folded copy method
versus the explicit copy + checksum.
While we're here, also simplify the "fold" signature, as reducing the
number of rebases and horizontal sums did not prove meaningfully faster
(it was slower in many circumstances).
We are guarding its usage behind a lot of preprocessor macros, as the
other methods are not yet implemented and calling this version would
implicitly bypass the faster adler implementations.
When more versions are written for faster vectorizations, the functable
entries will be populated and the preprocessor macros removed. This
round, the copy + checksum does not employ as many tricks as one would
hope for in a "folded" checksum routine. The reason is the particularly
tricky case of dealing with unaligned buffers. Implementations targeting
CPUs that don't carry a huge penalty for unaligned loads can be made
much faster.
Fancier methods that minimized rebasing, while having the potential to
be faster, ended up being slower because the compiler structured the
code in a way that either spilled to the stack or trampolined out of the
loop and back into it instead of simply jumping over the first load and
store.
Revisiting this for AVX512, where more registers are abundant and more
advanced loads exist, may be prudent.
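For reference, the essence of a folded copy in plain scalar form
(illustrative names; the vectorized versions keep the same structure,
just with SIMD partial sums):

    #include <stddef.h>
    #include <stdint.h>

    #define ADLER_BASE 65521U   /* largest prime below 65536 */
    #define ADLER_NMAX 5552     /* max bytes before s1/s2 must be reduced */

    static uint32_t adler32_copy_fold(uint32_t adler, uint8_t *dst,
                                      const uint8_t *src, size_t len) {
        uint32_t s1 = adler & 0xffff;
        uint32_t s2 = (adler >> 16) & 0xffff;
        while (len > 0) {
            size_t n = len < ADLER_NMAX ? len : ADLER_NMAX;
            len -= n;
            while (n--) {
                *dst++ = *src;     /* the copy rides along with the sums */
                s1 += *src++;
                s2 += s1;
            }
            s1 %= ADLER_BASE;      /* the "rebase": deferred reduction */
            s2 %= ADLER_BASE;
        }
        return (s2 << 16) | s1;
    }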
These are very simple wrappers that do nothing clever, but they serve
as a shim interface for implementing versions that cleverly track the
number of scalar sums performed, so that we can minimize rebasing and
also elide copies efficiently.
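A sketch of such a shim (hypothetical names, assuming a plain adler32
implementation exists elsewhere):

    #include <stddef.h>
    #include <stdint.h>
    #include <string.h>

    /* assumed to exist: the ordinary scalar checksum */
    extern uint32_t adler32_c(uint32_t adler, const uint8_t *buf, size_t len);

    static uint32_t adler32_fold_copy_c(uint32_t adler, uint8_t *dst,
                                        const uint8_t *src, size_t len) {
        memcpy(dst, src, len);              /* nothing clever: copy... */
        return adler32_c(adler, dst, len);  /* ...then checksum */
    }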
This serves as the baseline as each vectorization gets its own commit.
That way the PR will be bisectable.
This is useful when zlib-ng is embedded into another library,
such as ITK: https://itk.org/
Closes #1025.
Co-authored-by: Mika Lindqvist <postmaster@raasu.org>