Commit Graph

495 Commits

Author SHA1 Message Date
Mika Lindqvist
2c98ece180 [configure] Add support for RISC-V ZBC extension
2025-05-28 13:32:14 +02:00
Sam Russell
41d72b9d24 Fix 32bit large chorba 2025-05-28 13:31:37 +02:00
Mika Lindqvist
f90c01107f [WebAssembly] Fix stack overflow in crc32_chorba_118960_nondestructive. 2025-05-27 14:45:52 +02:00
yintong
830995ff78 riscv: add bash configure script and related ci support for riscv
2025-05-01 22:59:25 +02:00
Pavel P
dd15e04991 Remove unnecessary extern 2025-04-28 21:27:13 +02:00
Pavel P
f09f7791bf Match function declaration for chorba_small_nondestructive_sse2 2025-04-28 21:27:13 +02:00
yintong
10b51fa592 riscv: add crc32 optimization using zbc extension
2025-04-27 18:23:50 +02:00
Adam Stylinski
46fc33f39d SSE4.1 optimized chorba
This is ~25-30% faster than the SSE2 variant on a core2 quad. The main reason
for this has to do with the fact that, while incurring far fewer shifts,
an entirely separate stack buffer has to be managed that is the size of
the L1 cache on most CPUs. This was one of the main reasons the 32k
specialized function was slower for the scalar counterpart, despite auto
vectorizing. The auto vectorized loop was setting up the stack buffer at
unaligned offsets, which is detrimental to performance pre-nehalem.
Additionally, we were losing a fair bit of time to the zero
initialization, which we are now doing more selectively.

There are a ton of loads and stores happening, and for sure we are bound
on the fill buffer + store forwarding. An SSE2 version of this code is
probably possible by simply replacing the shifts with unpacks with zero
and the palignr's with shufpd's. I'm just not sure it'll be all that worth
it, though. We are gating against SSE4.1 not because we are using specifically
a 4.1 instruction but because that marks when Wolfdale came out and palignr
became a lot faster.
2025-04-15 14:11:12 +02:00
Detlef Riekenberg
5a232688e1 port: Use __cpuid only, when available.
Add a fallback, when __cpuid is not available
2025-04-15 14:08:46 +02:00
Hans Kristian Rosbach
00a3168d5d Add AVX512 version of compare256
Improve the speed of sub-16 byte matches by first using a
128-bit intrinsic, after that use only 512-bit intrinsics.
This requires us to overlap on the last run, but this is cheaper than
processing the tail using a 256-bit and then a 128-bit run.

Change the benchmark steps so they hit the chunk boundaries of one function
or the other less often; this gives fairer benchmarks.
2025-04-14 23:28:38 +02:00
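A scalar sketch of the overlapping final run described in 00a3168d5d. It assumes a 256-byte comparison split into one 16-byte block, three 64-byte blocks, and a final 64-byte block that re-reads bytes 192..207. The block sizes and offsets are illustrative guesses, and `block_mismatch` is a hypothetical stand-in for the vector compare, not the commit's code:

```c
#include <stdint.h>

/* Hypothetical stand-in for one vector compare: returns the index of the
 * first differing byte within the block, or n if the block matches. */
static unsigned block_mismatch(const uint8_t *a, const uint8_t *b, unsigned n) {
    for (unsigned i = 0; i < n; i++) {
        if (a[i] != b[i])
            return i;
    }
    return n;
}

/* Returns the number of leading bytes (0..256) that are equal. */
static unsigned compare256_sketch(const uint8_t *a, const uint8_t *b) {
    /* One narrow block first, so very short matches exit early. */
    unsigned m = block_mismatch(a, b, 16);
    if (m != 16)
        return m;

    /* Wide blocks at offsets 16, 80 and 144 cover bytes 16..207. */
    for (unsigned off = 16; off < 208; off += 64) {
        m = block_mismatch(a + off, b + off, 64);
        if (m != 64)
            return off + m;
    }

    /* 48 bytes remain. Instead of a narrower tail, run one more wide block
     * starting at 192; bytes 192..207 are re-read but already known equal. */
    m = block_mismatch(a + 192, b + 192, 64);
    return 192 + m;
}
```

The overlap costs 16 redundant byte comparisons but keeps every step after the first at a single width.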
Adam Stylinski
724dc0cfb4 Explicit SSE2 vectorization of Chorba CRC method
The version that's currently in the generic implementation for 32768
byte buffers leverages the stack. It manages to autovectorize but
unfortunately the trips to the stack hurt its performance for CPUs which
need this the most. This version is explicitly SIMD vectorized and
doesn't use trips to the stack.  In my testing it's ~10% faster than the
"small" variant, and about 42% faster than the "32768" variant.
2025-03-28 20:43:59 +01:00
Icenowy Zheng
2bba7e8468 riscv: chunkset_rvv: fix SIGSEGV in CHUNKCOPY
The chunkset_tpl comment allows negative dist (out - from) as long as
the length is smaller than the absolute value of dist (i.e. memory does
not overlap). However this case is currently broken in the RVV override
of CHUNKCOPY -- it compares dist (which is a ptrdiff_t, a value that
should be of the same size with size_t but signed) with the result of
sizeof (which is a size_t), and this triggers the implicit conversion
from signed to unsigned (thus losing negative values).

As it's promised to be not overlapping when dist is negative, just use a
giant memcpy() call to copy everything.

Signed-off-by: Icenowy Zheng <uwu@icenowy.me>
2025-03-28 15:30:21 +01:00
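The signed/unsigned pitfall described in 2bba7e8468 can be reproduced in isolation. This is a minimal sketch, not the RVV code itself: `chunk_t` and both helper names are hypothetical, and the explicit `(size_t)` cast only spells out the conversion the compiler applies implicitly when a `ptrdiff_t` is compared with a `sizeof` result:

```c
#include <stddef.h>

/* Hypothetical 16-byte chunk type standing in for the real vector chunk. */
typedef struct { unsigned char bytes[16]; } chunk_t;

/* Buggy check: dist is converted to size_t before the comparison, so a
 * negative distance wraps to a huge positive value and passes the test. */
static int dist_covers_chunk_buggy(ptrdiff_t dist) {
    return (size_t)dist >= sizeof(chunk_t);
}

/* Signed check: convert the sizeof result instead, so negative distances
 * stay negative and can be handled as their own case. */
static int dist_covers_chunk_signed(ptrdiff_t dist) {
    return dist >= (ptrdiff_t)sizeof(chunk_t);
}
```

With `dist = -64` the buggy check reports that the distance covers a chunk, while the signed check does not.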
Adam Stylinski
18b933b88a Fix a bug on the 32k and greater chorba specializations
In testing a SIMD vectorization for this, I wrote a gtest which stumbled
onto the fact that this had a bug on big endian. Before the initial CRC
had been mixed in it needed to be byte swapped.
2025-03-28 15:29:11 +01:00
Nathan Moinvaziri
d0652423c1 Disable MSVC optimizations for AVX512 GET_CHUNK_MAG #1883
MSVC compiler (VS 17.11.x) incorrectly optimizes the GET_CHUNK_MAG code on
older versions. Appears to be resolved in VS 17.13.2. The compiler would
optimize the code in such a way that it would cause a decompression failure.
It only happens when /Os flag is set.
2025-03-26 20:06:38 +01:00
Hans Kristian Rosbach
fd0d263ced [CI] Instead of selecting the most recent tag, select the highest version number. 2025-03-18 21:04:58 +01:00
Eddy S.
5f1c7303ab fix the url of the s390x actions worker patch
gaplib changed their patch name scheme with 1a5e012.
2025-03-17 12:23:16 +01:00
Adam Stylinski
50e9ca06e2 Fold a copy into the adler32 function for UPDATEWINDOW for neon
So a lot of alterations had to be done to make this not worse and
so far, it's not really better, either. I had to force inlining for
the adler routine, I had to remove the x4 load instruction otherwise
pipelining stalled, and I had to use restrict pointers with a copy
idiom for GCC to inline a copy routine for the tail.

Still, we see a small benefit in benchmarks, particularly when done
with size of our window or larger. There's also an added benefit that
this will fix #1824.
2025-03-05 22:17:55 +01:00
Hans Kristian Rosbach
9d4af458ea Make Chorba configurable, and add a few missing header files to CMake config.
Add CI run without chorba enabled.
2025-02-18 23:59:16 +01:00
Hans Kristian Rosbach
c1796e2145 Use OPTIMAL_CMP instead of BRAID_W to test for optimal size for Chorba. 2025-02-18 23:59:16 +01:00
Hans Kristian Rosbach
f411580733 Clean up internal crc32 function handling.
Mark crc32_c and crc32_braid functions as internal, and remove prefix.
Reorder contents of generic_functions, and remove Z_INTERNAL hints from declarations.
Add test/benchmark output to indicate whether Chorba is used.
2025-02-18 23:59:16 +01:00
Hans Kristian Rosbach
ed30965e29 Replace DO1/DO8 macros 2025-02-18 23:59:16 +01:00
Hans Kristian Rosbach
5fb2a1c493 Move Chorba defines 2025-02-18 23:59:16 +01:00
Hans Kristian Rosbach
8648ffef49 Clean up crc32_braid.
- Rename N and W to BRAID_N and BRAID_W
- Remove override capabilities for BRAID_N and BRAID_W
- Fix formatting in crc32_braid_tbl.h
- Make makecrct not rely on crc32_braid_p.h
2025-02-18 23:59:16 +01:00
Sam Russell
b33ba962c2 implement chorba algorithm 2025-02-15 14:31:50 +01:00
Cameron Cawley
7ea78f12c8 Provide an inline asm fallback for the ARMv8 intrinsics 2025-02-12 13:54:30 +01:00
Cameron Cawley
721c488aff Rename most ACLE references to ARMv8 2025-02-12 13:54:30 +01:00
Adam Stylinski
287c4dce22 Fix an unfortunate bug with Visual Studio 2015
Evidently this instruction, despite the intrinsic having a register operand,
is a memory-register instruction. There seems to be no alignment requirement
for the source operand. Because of this, compilers when not optimized are doing
the unaligned load and then dumping back to the stack to do the broadcasting load.
In doing this, MSVC seems to be dumping to the stack with an aligned move at an
unaligned address, causing a segfault.  GCC does not seem to make this mistake, as
it stashes to an aligned address.

If we're on Visual Studio 2015, let's just do the longer 9 cycle sequence of a 128
bit load followed by a vinserti128. This _should_ fix this (issue #1861).
2025-02-04 20:02:39 +01:00
Hans Kristian Rosbach
a0fa24710c s390x: Add workaround to install custom Clang 19.1.5 rpms to actions-runner
image in order to avoid the VX compiler bug in older clang versions.
2025-01-27 12:39:53 +01:00
Vladislav Shchapov
69a60bfc18 Rename "arch/power/fallback_builtins.h" to avoid possible conflict with "fallback_builtins.h" in zlib-ng sources directory
Signed-off-by: Vladislav Shchapov <vladislav@shchapov.ru>
2025-01-27 12:38:30 +01:00
Eduard Stefes
5e3510e314 Disable CRC32-VX Extension for some Clang versions
We have to disable the CRC32-VX implementation for some Clang versions
(18 <= version < 19.1.2) that generate bad code for the IBM S390 VGFMA intrinsics.
2025-01-25 17:25:17 +01:00
Hans Kristian Rosbach
212563db62 Improve image/container rebuild script to work properly under cron. 2025-01-19 16:53:46 +01:00
Dmitry Kurtaev
9064a25f11 Workaround error G6E97C40B
Warning as an error with GCC from Ubuntu 24.04:
```
/home/runner/work/dotnet_riscv/dotnet_riscv/runtime/src/native/external/zlib-ng/arch/riscv/riscv_features.c(25,33): error G6E97C40B: suggest parentheses around ‘&&’ within ‘||’ [-Wparentheses] [/home/runner/work/dotnet_riscv/dotnet_riscv/runtime/src/native/libs/build-native.proj]
```
2025-01-19 16:11:15 +01:00
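For reference, the kind of expression that triggers `-Wparentheses` looks like the sketch below. The function and its parameters are hypothetical and are not taken from `riscv_features.c`; the point is only that `&&` binds tighter than `||`, so adding the inner parentheses silences the warning without changing the result:

```c
/* Hypothetical helper: is version major.minor at least req_major.req_minor? */
static int version_at_least(int major, int minor, int req_major, int req_minor) {
    /* Written as `major > req_major || major == req_major && minor >= req_minor`
     * this evaluates identically, but GCC suggests parentheses around the
     * && operand. The explicit grouping below keeps the build warning-free. */
    return major > req_major || (major == req_major && minor >= req_minor);
}
```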
Hans Kristian Rosbach
bf05e882b8 Continued cleanup of old UNALIGNED_OK checks
- Remove obsolete checks
- Fix checks that are inconsistent
- Stop compiling compare256/longest_match variants that never get called
- Improve how the generic compare256 functions are handled.
- Allow overriding OPTIMAL_CMP

This simplifies the code and avoids having a lot of code in the compiled library that can never get executed.
2024-12-26 22:14:46 +01:00
Hans Kristian Rosbach
1aeb2915a0 Rename functions to get rid of old and now misleading "unaligned" naming 2024-12-26 22:14:46 +01:00
Cameron Cawley
d7e121e56b Use GCC's may_alias attribute for unaligned memory access 2024-12-24 12:55:44 +01:00
Adam Stylinski
06bba67470 Fix unaligned access in ACLE based crc32
This fixes a rightful complaint from the alignment sanitizer that we
alias memory in an unaligned fashion. A nice added bonus is that this
improves performance a tiny bit on the larger buffers, perhaps due to
loops that idiomatically decrement a count and increment a single buffer
pointer rather than the maze of conditional pointer reassignments.

While here, let's write a unit test just for this. Since this is the only
variant that accesses memory in a potentially unaligned fashion that doesn't
explicitly go byte by byte or use intrinsics that don't require alignment,
we'll enable it only for this function for now. Adding more tests later if
need be should be possible. For everything else not crc, we're relying on
ubsan to hopefully catch things by chance.
2024-12-23 14:06:35 +01:00
Hans Kristian Rosbach
87d8e95408 Update s390x actions-runner docker 2024-12-22 15:41:08 +01:00
Adam Stylinski
04d1b75819 Make big endians first class citizens again
No longer do the big iron of yore which lack SIMD optimized loads need
to search strings a byte at a time like primitive machines of the VAX
era. This guard here was mostly due to the fact that the string
comparison was searched with "count trailing zero", which assumes an
endianness.  We can just conditionally use leading zeros when on big
endian and stop using the extremely naive C implementation. This makes
things a tad bit faster.
2024-12-21 13:16:08 +01:00
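The endianness dependence mentioned in 04d1b75819 can be sketched as follows. This assumes a GCC or Clang compatible compiler (for `__builtin_ctzll`, `__builtin_clzll` and the `__BYTE_ORDER__` macros) and treats the target as little endian when those macros are absent; `first_diff_byte` is a hypothetical name, not a zlib-ng function:

```c
#include <stdint.h>
#include <string.h>

/* Returns the index (0..7) of the first differing byte between two 8-byte
 * blocks, or 8 if the blocks are equal. */
static unsigned first_diff_byte(const uint8_t *a, const uint8_t *b) {
    uint64_t x, y;
    memcpy(&x, a, sizeof(x));   /* safe unaligned loads */
    memcpy(&y, b, sizeof(y));

    uint64_t diff = x ^ y;      /* non-zero bits mark differing bytes */
    if (diff == 0)
        return 8;

#if defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
    /* Big endian: the first byte in memory is the most significant one. */
    return (unsigned)__builtin_clzll(diff) / 8;
#else
    /* Little endian: the first byte in memory is the least significant one. */
    return (unsigned)__builtin_ctzll(diff) / 8;
#endif
}
```

Dividing the bit count by 8 converts the position of the first differing bit into a byte index.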
Icenowy Zheng
dbccbd17a9 adler32_rvv: Fix some overflow problems
There are currently some overflow problems in adler32_rvv
implementation, which can lead to wrong results for some input, and
these problems could be easily exhibited when running `git fsck` with
zlib-ng substituting the system zlib on a big git repository.

These problems and the solutions are the following:

- When the input data is long enough, the v_buf32_accu can overflow too.
  Add it to the modulo code that happens per ~NMAX bytes.
- When the vector data is reduced to scalar values, the resulting scalar
  value (and the processed length) may cause the calculation of sum2
  to overflow. Add mod BASE to all these reductions and to the initial
  calculation of sum2.
- When the remaining data is less than vl bytes, the code falls back to a
  scalar implementation; however, the sum2 and adler values have just been
  reduced from vectors and can be big enough to make sum2 overflow
  in the scalar code. Reduce them modulo BASE before the scalar code to
  prevent such overflow (because vl is surely much smaller than NMAX).

Signed-off-by: Icenowy Zheng <uwu@icenowy.me>
2024-12-21 13:14:59 +01:00
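The deferred-modulo idea behind NMAX that dbccbd17a9 relies on can be shown with a plain scalar Adler-32. This is a minimal sketch, not the RVV implementation: `adler32_sketch` is a hypothetical name, and the constants are the standard zlib values (BASE = 65521, NMAX = 5552):

```c
#include <stddef.h>
#include <stdint.h>

#define BASE 65521U  /* largest prime smaller than 65536 */
#define NMAX 5552    /* max bytes summed before 32-bit sum2 could overflow */

static uint32_t adler32_sketch(uint32_t adler, const uint8_t *buf, size_t len) {
    uint32_t sum1 = adler & 0xffff;
    uint32_t sum2 = (adler >> 16) & 0xffff;

    while (len > 0) {
        /* Accumulate at most NMAX bytes without any modulo. */
        size_t n = len < NMAX ? len : NMAX;
        len -= n;
        while (n--) {
            sum1 += *buf++;
            sum2 += sum1;
        }
        /* Reduce both sums so the next block starts below BASE again. */
        sum1 %= BASE;
        sum2 %= BASE;
    }
    return (sum2 << 16) | sum1;
}
```

The overflow guarantee only holds if both sums are below BASE when a block starts, which is why values freshly reduced from wide vector accumulators need a modulo before they enter a loop like this.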
Hans Kristian Rosbach
509f6b5818 Since we long ago made unaligned reads safe (by using memcpy or intrinsics),
it is time to replace the UNALIGNED_OK checks that have since really only been
used to select the optimal comparison sizes for the arch instead.
2024-12-21 00:46:48 +01:00
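The memcpy approach to safe unaligned reads mentioned above is commonly written as in the sketch below. `load_u32_unaligned` is a hypothetical name rather than the zlib-ng helper; optimizing compilers typically turn the fixed-size `memcpy` into a single load on targets that allow it:

```c
#include <stdint.h>
#include <string.h>

/* Reads a 32-bit value in native byte order from an address that may not be
 * 4-byte aligned. Dereferencing a cast pointer here would be undefined
 * behavior; copying through memcpy is well defined. */
static uint32_t load_u32_unaligned(const void *p) {
    uint32_t v;
    memcpy(&v, p, sizeof(v));
    return v;
}
```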
Hans Kristian Rosbach
037ab0fd35 Revert "Since we long ago made unaligned reads safe (by using memcpy or intrinsics),"
This reverts commit 80fffd72f3.
It was mistakenly pushed to develop instead of going through a PR and the appropriate reviews.
2024-12-17 23:09:31 +01:00
Hans Kristian Rosbach
80fffd72f3 Since we long ago made unaligned reads safe (by using memcpy or intrinsics),
it is time to replace the UNALIGNED_OK checks that have since really only been
used to select the optimal comparison sizes for the arch instead.
2024-12-17 23:02:32 +01:00
Adam Stylinski
43d74a223b Improve pipelining for AVX512 chunking
For reasons that aren't quite so clear, using the masked writes here
did not pipeline very well. Either setting up the mask stalled things
or masked moves have issues overlapping regular moves. Simply putting
the masked moves behind a branch that is rarely taken seemed to do the
trick in improving the ILP. While here, put masked loads behind the same
branch in case there were ever a hazard for overreading.
2024-12-10 22:17:14 +01:00
Adam Stylinski
7020cb3f74 Enable AVX2 functions to be built with BMI2 instructions
While these are technically different instructions, no such CPU exists
that has AVX2 that doesn't have BMI2. Enabling BMI2 allows us to
eliminate several flag stalls by having flagless versions of shifts, and
allows us to not clobber and move around GPRs so much in scalar code.
There's usually a sizeable benefit for enabling it. Since we're building
with BMI2 for AVX2 functions, let's also just make sure the CPU claims
to support it (just to cover our bases).
2024-12-07 22:32:29 +01:00
Adam Stylinski
785444de08 Fix native detection of CRC instruction
It's unclear if raspberry pi OS's shipped GCC doesn't properly detect
ACLE or not (/proc/cpuinfo claims to support AES), but in any case, the
preprocessor macro for that flag is not defined with -march=native on a
raspberry pi 5. Unfortunately that means when built "WITH_NATIVE", we do
not get a fast CRC function.  The CRC32 preprocessor macro _IS_ defined,
and the auto detection when built without NATIVE support does properly
get dispatched to. Since we only need the scalar CRC32 and not the polynomial
stuff anyhow, let's make it be an || condition and not a && one.
2024-12-01 16:05:15 +01:00
Pavel P
3c11f65f41 Remove unused HAVE_CHUNKMEMSET_1 define 2024-12-01 16:04:58 +01:00
Adam Stylinski
0ed5ac8289 Make an AVX512 inflate fast with low cost masked writes
This takes advantage of the fact that on AVX512 architectures, masked
moves are incredibly cheap. There are many places where we have to
fallback to the safe C implementation of chunkcopy_safe because of the
assumed overwriting that occurs. We're able to sidestep most of the branching
needed here by simply controlling the bounds of our writes with a mask.
2024-11-20 22:14:44 +01:00
Adam Stylinski
94aacd8bd6 Try to simplify the inflate loop by collapsing most cases to chunksets 2024-10-23 21:20:11 +02:00
Adam Stylinski
e874b34e1a Make chunkset_avx2 half chunk aware
This gives us appreciable gains on a number of fronts.  The first being
we're inlining a pretty hot function that was getting dispatched to
regularly. Another is that we're able to do a safe lagged copy of a
distance that is smaller, so CHUNKCOPY gets its teeth back here for
smaller sizes, without having to do another dispatch to a function.

We're also now doing two overlapping writes at once and letting the CPU
do its store forwarding. This was an enhancement @dougallj had suggested
a while back.

Additionally, the "half chunk mag" here is fundamentally less
complicated because it doesn't require synthesizing cross lane permutes
with a blend operation, so we can optimistically do that first if the
len is small enough that a full 32 byte chunk doesn't make any sense.
2024-10-12 13:21:03 +02:00
Adam Stylinski
b52e703417 Simplify avx2 chunkset a bit
Put length 16 in the length checking ladder and take care of it there
since it's also a simple case to handle. We kind of went out of our way
to pretend 128 bit vectors didn't exist when using avx2 but this can be
handled in a single instruction. Strangely the intrinsic uses vector
register operands but the instruction itself assumes a memory operand
for the source. This also means we don't have to handle this case in our
"GET_CHUNK_MAG" function.
2024-10-12 13:21:03 +02:00