Adam Stylinski
94aacd8bd6
Try to simplify the inflate loop by collapsing most cases to chunksets
2024-10-23 21:20:11 +02:00
Vladislav Shchapov
af8169a724
Replace conditional call to functable.force_init with macro FUNCTABLE_INIT
...
Signed-off-by: Vladislav Shchapov <vladislav@shchapov.ru>
2024-03-06 23:32:15 +01:00
Vladislav Shchapov
c694bcdaf6
Add option to disable runtime CPU detection
...
Signed-off-by: Vladislav Shchapov <vladislav@shchapov.ru>
2024-03-06 23:32:15 +01:00
Vladislav Shchapov
fe0a6407da
Explicitly indicate functions are conditionally dispatched
...
Signed-off-by: Vladislav Shchapov <vladislav@shchapov.ru>
2024-03-06 23:32:15 +01:00
Hans Kristian Rosbach
9953f12e21
Move update_hash(), insert_string() and quick_insert_string() out of functable
...
and remove SSE4.2 and ACLE optimizations. The functable overhead is higher
than the benefit from using optimized functions.
2024-02-23 13:34:10 +01:00
Nathan Moinvaziri
a090529ece
Remove deflate_state parameter from update_hash functions.
2024-02-23 13:34:10 +01:00
Vladislav Shchapov
ac25a2ea6a
Split CPU features checks and CPU-specific function prototypes and reduce include-dependencies.
...
Signed-off-by: Vladislav Shchapov <vladislav@shchapov.ru>
2024-02-22 20:11:46 +01:00
Pavel P
0ec36a4732
Fix include paths
...
zlib-ng shouldn't require arch/generic to be in the include path
2024-02-18 10:12:20 +01:00
Nathan Moinvaziri
379eda2e80
Remove type declarations for z_stream/zng_stream from cpu_features.
2024-01-30 20:50:05 +01:00
Hans Kristian Rosbach
06895bc1b3
Move crc32 C fallbacks to arch/generic
2024-01-19 15:22:34 +01:00
Hans Kristian Rosbach
4e132cc0ec
Move adler32 C fallbacks to arch/generic
2024-01-19 15:22:34 +01:00
Vladislav Shchapov
9d486b5073
Atomic functable
...
Signed-off-by: Vladislav Shchapov <vladislav@shchapov.ru>
2023-12-25 20:47:24 +01:00
Vladislav Shchapov
0c32ad4237
Add forced initialization of the functable, because deflate captures function pointers from the functable
...
Signed-off-by: Vladislav Shchapov <vladislav@shchapov.ru>
2023-12-21 16:12:00 +01:00
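A minimal sketch of how an atomic, force-initialized function table could work, assuming C11 `<stdatomic.h>`. The names `ft_adler32`, `adler32_stub`, `functable_force_init` and `call_adler32` are hypothetical and do not reproduce zlib-ng's actual functable layout. Each entry starts out pointing at a stub; the first call through the table, or an explicit force-init, picks an implementation and publishes it with a release store. Forcing initialization matters when a caller (the commit above names deflate) copies a pointer out of the table before ever calling it, since it would otherwise capture the stub.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical entry type; not zlib-ng's exact prototype. */
typedef uint32_t (*adler32_fn)(uint32_t adler, const uint8_t *buf, size_t len);

/* Portable scalar fallback: standard Adler-32 with modulus 65521. */
static uint32_t adler32_c(uint32_t adler, const uint8_t *buf, size_t len) {
    uint32_t a = adler & 0xffff, b = adler >> 16;
    for (size_t i = 0; i < len; i++) {
        a = (a + buf[i]) % 65521;
        b = (b + a) % 65521;
    }
    return (b << 16) | a;
}

static uint32_t adler32_stub(uint32_t adler, const uint8_t *buf, size_t len);

/* The table entry initially points at the stub. */
static _Atomic(adler32_fn) ft_adler32 = adler32_stub;

/* Choose implementations and publish them. A real build would consult
 * CPU feature detection here; this sketch always picks the C fallback. */
void functable_force_init(void) {
    atomic_store_explicit(&ft_adler32, adler32_c, memory_order_release);
}

/* The first call through the table lands here, initializes, then forwards. */
static uint32_t adler32_stub(uint32_t adler, const uint8_t *buf, size_t len) {
    functable_force_init();
    adler32_fn fn = atomic_load_explicit(&ft_adler32, memory_order_acquire);
    return fn(adler, buf, len);
}

/* Callers always dispatch through the table. */
uint32_t call_adler32(uint32_t adler, const uint8_t *buf, size_t len) {
    adler32_fn fn = atomic_load_explicit(&ft_adler32, memory_order_acquire);
    return fn(adler, buf, len);
}
```

After the first `call_adler32` (or an explicit `functable_force_init()`), the entry holds the selected implementation, so any pointer copied from the table from then on is the real function rather than the stub.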
Hans Kristian Rosbach
9db6a98894
Sort functable alphabetically
2023-02-17 15:11:25 +01:00
Pavel P
3e75a5c981
Correct inflate_fast function signature
2023-02-08 15:22:22 +01:00
Nathan Moinvaziri
c72cd309ca
Remove unused chunk memory functions from functable.
2023-02-05 17:51:46 +01:00
Nathan Moinvaziri
aa1109bb2e
Use arch-specific versions of inflate_fast.
...
This should reduce the cost of indirection that occurs when calling functable
chunk copying functions inside inflate_fast. It should also allow the compiler
to optimize the inflate fast path for the specific architecture.
2023-02-05 17:51:46 +01:00
Cameron Cawley
1ab443812a
Use size_t instead of uint64_t for len in all adler32 functions
2023-01-22 00:58:12 +01:00
Cameron Cawley
23e4305932
Use size_t instead of uint64_t for len in all crc32 functions
2023-01-22 00:58:12 +01:00
Pavel P
e49d55848f
Fix compilation error where crc32_fold type matches field name in struct functable_s
...
If functable.h is compiled as C++, the compiler issues the following error (VS 2022):
```
zlib-ng/functable.h(20,49): error C2327: 'functable_s::crc32_fold': is not a type name, static, or enumerator
```
The error occurs on line 20 because crc32_fold is declared as a struct member on the previous line. Using `struct crc32_fold_s` instead of `crc32_fold` fixes the error.
2023-01-21 22:27:16 +01:00
Nathan Moinvaziri
2ca4a77761
Use fixed-width uint8_t for crc32 and adler32 function declarations.
2022-06-24 15:12:00 +02:00
Nathan Moinvaziri
5f370cd887
Use uint64_t instead of size_t for len in adler32 to be consistent with crc32.
2022-06-24 15:12:00 +02:00
Nathan Moinvaziri
7e243e4436
Fix MSVC possible loss of data warning in crc32_pclmulqdq by converting len types to use uint64_t.
...
arch\x86\crc32_fold_pclmulqdq.c(604,43): warning C4244: 'function': conversion from 'uint64_t' to 'size_t', possible loss of data
2022-06-24 15:12:00 +02:00
Adam Stylinski
b8269bb7d4
Added inlined AVX512 adler checksum + copy
...
While we're here, also simplify the "fold" signature, as reducing the
number of rebases and horizontal sums did not prove to be meaningfully
faster (slower in many circumstances).
2022-05-23 16:13:39 +02:00
Adam Stylinski
b1389ac2d5
Create adler32_fold_c* functions
...
These are very simple wrappers that do nothing clever but serve as a
shim interface for implementing versions which do cleverly track the
number of scalar sums performed so that we can minimize rebasing and
also have an efficient copy elision.
This serves as the baseline as each vectorization gets its own commit.
That way the PR will be bisectable.
2022-05-23 16:13:39 +02:00
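The commit describes the baseline as thin wrappers that copy and checksum separately, leaving the clever part (tracking scalar sums to minimize rebasing) to the vectorized versions. A minimal sketch of such a baseline, assuming a `(adler, dst, src, len)` signature for illustration; `adler32_fold_copy_c` here is a stand-in, not a copy of zlib-ng's code. The `ADLER_NMAX` chunking is the classic zlib technique of deferring the modulo ("rebase") until the 32-bit sums could overflow.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ADLER_BASE 65521u /* largest prime below 2^16 */
#define ADLER_NMAX 5552   /* most bytes summable before 32-bit overflow is possible */

/* Scalar Adler-32 that reduces ("rebases") only once per ADLER_NMAX bytes. */
static uint32_t adler32_c(uint32_t adler, const uint8_t *buf, size_t len) {
    uint32_t a = adler & 0xffff, b = adler >> 16;
    while (len > 0) {
        size_t n = len < ADLER_NMAX ? len : ADLER_NMAX;
        len -= n;
        while (n--) {
            a += *buf++;
            b += a;
        }
        a %= ADLER_BASE;
        b %= ADLER_BASE;
    }
    return (b << 16) | a;
}

/* Baseline "fold + copy" shim: nothing clever, just copy then checksum. */
uint32_t adler32_fold_copy_c(uint32_t adler, uint8_t *dst, const uint8_t *src, size_t len) {
    memcpy(dst, src, len);
    return adler32_c(adler, dst, len);
}
```

Because the running value is fully reduced at the end of every call, feeding a buffer in pieces and chaining the results gives the same checksum as one call over the whole buffer, which is what lets optimized variants be swapped in behind the same interface.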
Adam Stylinski
8550a90de4
Leverage inline CRC + copy
...
This brings back a bit of the performance that may have been sacrificed
by reverting the reorganized inflate window. Doing a copy at the same
time as a CRC is basically free.
2022-03-31 16:11:15 +02:00
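The point of this commit is that the copy piggybacks on loads the checksum already performs. A minimal scalar sketch of the idea, assuming a bitwise CRC-32 for readability; zlib-ng's real implementations are table-driven or use carry-less multiply folding such as crc32_fold_pclmulqdq. The name `crc32_copy_sketch` is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Reflected CRC-32 (polynomial 0xEDB88320, as used by gzip and zlib)
 * computed bit by bit, storing each byte to dst in the same pass.
 * Pass 0 as the initial crc, or a previous result to continue a stream. */
uint32_t crc32_copy_sketch(uint32_t crc, uint8_t *dst, const uint8_t *src, size_t len) {
    crc = ~crc;
    for (size_t i = 0; i < len; i++) {
        uint8_t byte = src[i];
        dst[i] = byte;            /* the copy reuses the byte already loaded */
        crc ^= byte;
        for (int k = 0; k < 8; k++)
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return ~crc;
}
```

For the standard check string "123456789", starting from 0, the result is 0xCBF43926, and the destination buffer ends up holding the same nine bytes.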
Adam Stylinski
49a6bb5d41
Speed up chunkcopy and memset
...
This was found to have a significant impact on a highly compressible PNG
for both the encode and decode. Some deltas show performance improving
as much as 60%+.
For the scenarios where the "dist" does not evenly divide our chunk
size, we simply repeat the bytes as many times as possible into our
vector registers. We then copy the entire vector and advance by the
largest multiple of dist that fits in a chunk (the quotient of our
chunksize divided by dist, times dist).
If dist happens to be 1, there's no reason not to just call memset from
libc (this is likely to be just as fast if not faster).
2022-03-16 11:42:19 +01:00
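A minimal sketch of the pattern-tiling approach described above, assuming a 16-byte array as a stand-in for a vector register instead of real SIMD intrinsics. `chunkmemset_sketch` and `CHUNK` are hypothetical names, and the sketch only handles `1 <= dist <= CHUNK`.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CHUNK 16 /* stand-in for one 128-bit vector register */

/* Write len bytes at out, repeating the dist-byte pattern that ends at out.
 * Sketch assumptions: 1 <= dist <= CHUNK, and out has CHUNK bytes of
 * writable slack past out + len (the last chunk store may overshoot). */
uint8_t *chunkmemset_sketch(uint8_t *out, size_t dist, size_t len) {
    if (dist == 1) {                      /* one repeated byte: defer to libc */
        memset(out, out[-1], len);
        return out + len;
    }
    uint8_t chunk[CHUNK];
    const uint8_t *pattern = out - dist;
    for (size_t i = 0; i < CHUNK; i++)    /* tile the pattern across the chunk */
        chunk[i] = pattern[i % dist];
    /* Advance only by whole pattern repeats so each store starts in phase. */
    size_t adv = (CHUNK / dist) * dist;
    for (size_t done = 0; done < len; done += adv)
        memcpy(out + done, chunk, CHUNK);
    return out + len;
}
```

Because every store begins at a multiple of dist from the start of the run, overlapping stores write identical bytes, which is why copying the full chunk while advancing by less is safe. With dist = 3 and a 16-byte chunk, each step advances 15 bytes.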
Nathan Moinvaziri
784c563465
Group together functable definitions that use deflate_state.
2022-01-23 16:39:48 +01:00
Nathan Moinvaziri
0911015e48
Use fixed width types in compare256 definition.
2022-01-23 16:39:48 +01:00
Nathan Moinvaziri
66506ace8d
Convert compare258 to compare256 and move the 2-byte check into deflate_quick. This prevents having multiple compare258 functions with 2-byte checks.
2022-01-16 17:30:15 +01:00
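A minimal sketch of the split described in this commit: a plain compare256 measures up to 256 matching bytes, and the caller checks the first two bytes itself, so the total never exceeds 2 + 256 = 258, deflate's maximum match length. Both `compare256_c` and `match_len_sketch` are illustrative byte-at-a-time stand-ins, not zlib-ng's optimized versions.

```c
#include <stdint.h>

/* Return how many leading bytes of src0 and src1 match, up to 256.
 * Both buffers must have at least 256 readable bytes. */
uint32_t compare256_c(const uint8_t *src0, const uint8_t *src1) {
    uint32_t len = 0;
    while (len < 256 && src0[len] == src1[len])
        len++;
    return len;
}

/* Hypothetical caller: do the 2-byte check here, then let compare256
 * measure the rest. Matches shorter than 2 bytes are reported as 0. */
uint32_t match_len_sketch(const uint8_t *a, const uint8_t *b) {
    if (a[0] != b[0] || a[1] != b[1])
        return 0;
    return 2 + compare256_c(a + 2, b + 2);
}
```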
Nathan Moinvaziri
d802e8900f
Move crc32 folding functions into functable.
2021-08-13 15:05:34 +02:00
Nathan Moinvaziri
ef416b7e27
Separate fast-zlib matching algorithm into its own longest_match variant.
2021-06-25 20:09:14 +02:00
Nathan Moinvaziri
5998d5b632
Added update_hash to build hash incrementally.
2021-06-25 20:09:14 +02:00
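Building the hash incrementally means each new byte is folded into the previous hash value instead of rehashing from scratch. A minimal sketch using the classic zlib shift-and-XOR scheme with a 15-bit table; this is an assumption for illustration, and zlib-ng's own update_hash may use a different hash function. `update_hash_sketch` and the `HASH_*` constants are hypothetical names.

```c
#include <stdint.h>

#define HASH_BITS  15
#define HASH_MASK  ((1u << HASH_BITS) - 1)
#define HASH_SHIFT ((HASH_BITS + 2) / 3) /* after 3 updates a byte is shifted out */

/* Fold one more byte into the running hash h (zlib-style rolling update). */
static inline uint32_t update_hash_sketch(uint32_t h, uint8_t c) {
    return ((h << HASH_SHIFT) ^ c) & HASH_MASK;
}
```

With a shift of 5 and a 15-bit mask, each byte is shifted past the mask after three more updates, so the hash always depends on exactly the last three bytes: rolling over "xbcd" yields the same value as hashing "bcd" from zero.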
Hans Kristian Rosbach
93ae5483d8
Fix numerous sign-conversion warnings in compare256/compare258 and longest_match related code.
2020-08-31 13:22:54 +02:00
Nathan Moinvaziri
7cffba4dd6
Rename ZLIB_INTERNAL to Z_INTERNAL for consistency.
2020-08-31 12:33:16 +02:00
Nathan Moinvaziri
a540c3f963
Add optional support for thread-local storage. (#733)
2020-08-23 09:59:38 +02:00
Hans Kristian Rosbach
0cd1818e86
Remove return value from insert_string, since it is always ignored and quick_insert_string is being used instead.
2020-08-21 09:46:03 +02:00
Nathan Moinvaziri
9ee4f8a100
Fix many possible-loss-of-data warnings where the insert_string and quick_insert_string functions are used on Windows.
2020-08-14 22:20:50 +02:00
Nathan Moinvaziri
e40d88adc9
Split memcopy by architecture.
...
Use uint8_t[8] struct on big-endian machines for speed.
2020-06-28 11:16:05 +02:00
Nathan Moinvaziri
a0fa24f92f
Remove IPos typedef which also helps to reduce casting warnings.
2020-05-30 21:29:44 +02:00
Nathan Moinvaziri
c97a965f18
Convert compare258 to static and convert longest_match to a template.
2020-05-24 13:53:25 +02:00
Nathan Moinvaziri
9bd28d9381
Abstracted out architecture specific implementations of 258 byte comparison to compare258.
2020-05-24 13:53:25 +02:00
Nathan Moinvaziri
e09d131b5a
Standardize fill_window implementations and abstract out slide_hash_neon for ARM.
2020-05-01 00:21:18 +02:00
Nathan Moinvaziri
69bbb0d823
Standardize insert_string functionality across architectures. Added unaligned conditionally compiled code for insert_string and quick_insert_string. Unify sse42 crc32 assembly between insert_string and quick_insert_string. Modified quick_insert_string to work across architectures.
2020-04-30 10:01:46 +02:00
Hans Kristian Rosbach
4cee5dcdfe
Add slide_hash to functable, and enable the sse2-optimized version.
...
Add necessary code to cmake and configure.
Fix slide_hash_sse2 to compile with zlib-ng.
2019-09-04 08:53:36 +02:00
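When the deflate window slides down by wsize bytes, every position stored in the hash tables must be rebased, and entries that would point before the new window become 0 (NIL). A minimal scalar sketch, assuming 16-bit table entries as in zlib's `Pos`; `slide_hash_c` is an illustrative name. The `m >= wsize ? m - wsize : 0` step is exactly an unsigned saturating subtract, which an SSE2 version can apply to eight entries at a time with `_mm_subs_epu16`.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint16_t Pos; /* hash-chain entries are 16-bit window positions */

/* Rebase every stored position after the window slides by wsize bytes;
 * positions that fall out of the window are cleared to 0 (NIL). */
void slide_hash_c(Pos *table, size_t entries, unsigned wsize) {
    for (size_t i = 0; i < entries; i++) {
        Pos m = table[i];
        table[i] = (Pos)(m >= wsize ? m - wsize : 0);
    }
}
```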
Mika Lindqvist
16b6cda67b
Make functable thread-local.
2018-09-17 11:28:28 +02:00
Daniel Black
9ec0d91a01
wrap crc32 in functable (#145)
...
* wrap crc32 in functable
* change internal crc32 api to use uint64_t rather than size_t for length
2018-02-16 11:41:44 +01:00
Hans Kristian Rosbach
eb7fd8a1b0
Make sure we don't export internal functions
2017-08-17 11:24:46 +02:00
Mika Lindqvist
5adc2052eb
Lazily initialize functable members. (#108)
...
- Split the functableInit() function into separate functions for each functable member, so we don't need to initialize the full functable in multiple places in the zlib-ng code, or to check for NULL on every invocation.
- Optimized function for each functable member is detected on first invocation and the functable item is updated for subsequent invocations.
- Remove NULL check in adler32() and adler32_z() as it is no longer needed.
2017-05-03 19:14:57 +02:00
Hans Kristian Rosbach
a7c7119009
- Add adler32 to functable
...
- Add missing call to functableinit from inflateinit
- Fix external direct calls to adler32 functions without calling functableinit
2017-04-24 12:47:24 +02:00