Commit Graph

8 Commits

Author SHA1 Message Date
Adam Stylinski
50e9ca06e2 Fold a copy into the adler32 function for UPDATEWINDOW for neon
So a lot of alterations had to be done to make this not worse and
so far, it's not really better, either. I had to force inlining for
the adler routine, I had to remove the x4 load instruction otherwise
pipelining stalled, and I had to use restrict pointers with a copy
idiom for GCC to inline a copy routine for the tail.

Still, we see a small benefit in benchmarks, particularly when done
with size of our window or larger. There's also an added benefit that
this will fix #1824.
2025-03-05 22:17:55 +01:00
Cameron Cawley
721c488aff Rename most ACLE references to ARMv8 2025-02-12 13:54:30 +01:00
Adam Stylinski
785444de08 Fix native detection of CRC instruction
It's unclear if raspberry pi OS's shipped GCC doesn't properly detect
ACLE or not (/proc/cpuinfo claims to support AES), but in any case, the
preprocessor macro for that flag is not defined with -march=native on a
raspberry pi 5. Unfortunately that means when built "WITH_NATIVE", we do
not get a fast CRC function.  The CRC32 preprocessor macro _IS_ defined,
and the auto detection when built without NATIVE support does properly
get dispatched to. Since we only need the scalar CRC32 and not the polynomial
stuff anyhow, let's make it be an || condition and not a && one.
2024-12-01 16:05:15 +01:00
Adam Stylinski
94aacd8bd6 Try to simply the inflate loop by collapsing most cases to chunksets 2024-10-23 21:20:11 +02:00
Vladislav Shchapov
c694bcdaf6 Add option to disable runtime CPU detection
Signed-off-by: Vladislav Shchapov <vladislav@shchapov.ru>
2024-03-06 23:32:15 +01:00
Hans Kristian Rosbach
9953f12e21 Move update_hash(), insert_string() and quick_insert_string() out of functable
and remove SSE4.2 and ACLE optimizations. The functable overhead is higher
than the benefit from using optimized functions.
2024-02-23 13:34:10 +01:00
Nathan Moinvaziri
a090529ece Remove deflate_state parameter from update_hash functions. 2024-02-23 13:34:10 +01:00
Vladislav Shchapov
ac25a2ea6a Split CPU features checks and CPU-specific function prototypes and reduce include-dependencies.
Signed-off-by: Vladislav Shchapov <vladislav@shchapov.ru>
2024-02-22 20:11:46 +01:00