Commit Graph

67 Commits

Author SHA1 Message Date
yintong
10b51fa592 riscv: add crc32 optimization using zbc extension
2025-04-27 18:23:50 +02:00
Adam Stylinski
46fc33f39d SSE4.1 optimized chorba
This is ~25-30% faster than the SSE2 variant on a Core 2 Quad. The main reason
for this is that, while incurring far fewer shifts, an entirely separate stack
buffer has to be managed that is the size of the L1 cache on most CPUs. This
was one of the main reasons the 32k specialized function was slower for the
scalar counterpart, despite auto-vectorizing. The auto-vectorized loop was
setting up the stack buffer at unaligned offsets, which is detrimental to
performance pre-Nehalem. Additionally, we were losing a fair bit of time to
the zero initialization, which we are now doing more selectively.

There are a ton of loads and stores happening, and we are surely bound on the
fill buffer + store forwarding. An SSE2 version of this code is probably
possible by simply replacing the shifts with unpacks with zero and the palignr
instructions with shufpd. I'm just not sure it'll be all that worth it,
though. We are gating on SSE4.1 not because we specifically use an SSE4.1
instruction but because it marks when Wolfdale came out and palignr became a
lot faster.
2025-04-15 14:11:12 +02:00
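To make the palignr/shufpd remark concrete, here is a scalar model of what PALIGNR (`_mm_alignr_epi8(a, b, n)`) computes. It is a sketch for illustration only; the function name `alignr_model` is hypothetical and does not come from zlib-ng.

```c
#include <stdint.h>
#include <string.h>

/* Scalar model of PALIGNR, i.e. _mm_alignr_epi8(a, b, n):
 * treat a:b as one 32-byte value (a in the high half, b in the low half),
 * shift it right by n bytes, and keep the low 16 bytes. Positions that fall
 * beyond the 32-byte concatenation become zero.
 * alignr_model is a hypothetical helper, not a zlib-ng function. */
static void alignr_model(uint8_t out[16], const uint8_t a[16],
                         const uint8_t b[16], unsigned n)
{
    uint8_t cat[32];
    memcpy(cat, b, 16);      /* low 16 bytes come from b */
    memcpy(cat + 16, a, 16); /* high 16 bytes come from a */
    for (unsigned i = 0; i < 16; i++) {
        unsigned src = i + n;
        out[i] = (src < 32) ? cat[src] : 0;
    }
}
```

With `n == 8` the result is the high qword of `b` followed by the low qword of `a`, which is exactly what `_mm_shuffle_pd(b, a, 1)` selects once the operands are cast to `__m128d`. That is presumably why shufpd can stand in for palignr, but only when the byte offsets are multiples of 8.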
Cameron Cawley
231c4b3a64 Use -Wa,-march with older ARM toolchains 2025-02-12 13:54:30 +01:00
Cameron Cawley
7ea78f12c8 Provide an inline asm fallback for the ARMv8 intrinsics 2025-02-12 13:54:30 +01:00
Cameron Cawley
721c488aff Rename most ACLE references to ARMv8 2025-02-12 13:54:30 +01:00
Adam Stylinski
7020cb3f74 Enable AVX2 functions to be built with BMI2 instructions
While these are technically separate extensions, no CPU exists that has
AVX2 but lacks BMI2. Enabling BMI2 allows us to eliminate several flag
stalls by using the flagless versions of shifts, and means we clobber and
move around GPRs less in scalar code. There's usually a sizeable benefit
to enabling it. Since we're building with BMI2 for AVX2 functions, let's
also make sure the CPU claims to support it (just to cover our bases).
2024-12-07 22:32:29 +01:00
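The "make sure the CPU claims to support it" check amounts to testing two bits of CPUID leaf 7. The sketch below decodes an already-obtained EBX value; the bit positions follow the Intel SDM, while the helper name `avx2_bmi2_usable` is hypothetical. A real check must also confirm OS support for YMM state via XGETBV.

```c
#include <stdbool.h>
#include <stdint.h>

/* CPUID.(EAX=7, ECX=0):EBX feature bits, per the Intel SDM. */
#define CPUID7_EBX_BMI1 (1u << 3)
#define CPUID7_EBX_AVX2 (1u << 5)
#define CPUID7_EBX_BMI2 (1u << 8)

/* Hypothetical helper: only report the AVX2 path (compiled with BMI2)
 * as usable when the CPU advertises both AVX2 and BMI2. */
static bool avx2_bmi2_usable(uint32_t leaf7_ebx)
{
    const uint32_t need = CPUID7_EBX_AVX2 | CPUID7_EBX_BMI2;
    return (leaf7_ebx & need) == need;
}
```

With GCC or Clang, the raw EBX value can be fetched with `__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx)` from `<cpuid.h>`.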
Adam Stylinski
0ed5ac8289 Make an AVX512 inflate fast with low cost masked writes
This takes advantage of the fact that on AVX512 architectures, masked
moves are incredibly cheap. There are many places where we have to
fall back to the safe C implementation of chunkcopy_safe because of the
assumed overwriting that occurs. We're able to sidestep most of the branching
needed here by simply controlling the bounds of our writes with a mask.
2024-11-20 22:14:44 +01:00
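A sketch of "controlling the bounds of our writes with a mask": build a byte mask whose low `len` bits are set, then only write the selected bytes. Here the masked store is modelled in scalar C. On AVX-512BW hardware the equivalent store is `_mm512_mask_storeu_epi8(dst, mask, vec)`. Both helper names are hypothetical and not taken from zlib-ng.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: 64-bit mask with the low `len` bits set,
 * clamped to a full 64-byte vector. */
static uint64_t tail_mask64(size_t len)
{
    return len >= 64 ? UINT64_MAX : (((uint64_t)1 << len) - 1);
}

/* Scalar model of a masked 64-byte store: bytes whose mask bit is clear
 * are never written, so no branch is needed to stop at dst + len. */
static void masked_store64_model(uint8_t *dst, const uint8_t src[64], uint64_t mask)
{
    for (unsigned i = 0; i < 64; i++)
        if (mask & ((uint64_t)1 << i))
            dst[i] = src[i];
}
```

The design point is that the length check moves out of the control flow and into data (the mask), so a single store handles both full and partial chunks.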
Alexander Smorkalov
4549279dbf Fixed false positive HAVE_ARMV6_INTRIN value on old ARM platforms. 2024-09-11 12:40:39 +02:00
Ilya Leoshkevich
f858914696 IBM zSystems: Hardcode HWCAP_S390_VXRS
Compiling zlib-ng with glibc 2.17 (minimum version still supported by
crosstool-ng) fails due to the lack of HWCAP_S390_VX - it was
introduced in glibc 2.23.

Strictly speaking, this is a problem with the feature detection logic
in cmake. However, it's not worth disabling the s390x vectorized CRC32
if the hwcap constant is missing and the compiler intrinsics are
available.

So fix by hardcoding the constant. It's a part of the kernel ABI,
which does not change.
2024-08-16 11:52:11 +02:00
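The hardcoding described above, combined with the older aliasing of HWCAP_S390_VX (a later entry in this log), can be sketched as a layered fallback. The value 2048 (bit 11) is the Linux kernel's s390 HWCAP bit for the vector facility; the helper `s390_has_vector` is hypothetical.

```c
#include <stdbool.h>

/* Prefer the libc-provided name; otherwise alias the older glibc spelling;
 * otherwise hardcode the kernel ABI value (bit 11, i.e. 2048). */
#ifndef HWCAP_S390_VXRS
#  ifdef HWCAP_S390_VX
#    define HWCAP_S390_VXRS HWCAP_S390_VX
#  else
#    define HWCAP_S390_VXRS (1UL << 11)
#  endif
#endif

/* Hypothetical helper: test the vector-facility bit in an AT_HWCAP value. */
static bool s390_has_vector(unsigned long hwcap)
{
    return (hwcap & HWCAP_S390_VXRS) != 0;
}
```

On Linux the `hwcap` argument would typically come from `getauxval(AT_HWCAP)` in `<sys/auxv.h>`.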
Un1q32
c5b4b35106 Improved ACLE check (#1727)
Co-authored-by: Cameron Cawley <ccawley2011@gmail.com>
2024-06-13 13:23:29 +02:00
Mika Lindqvist
93b870fbef Add test for checking if -march=native needs -mfpu=neon for 32-bit ARM. 2024-02-24 14:40:52 +01:00
Mika Lindqvist
ca0e4634e1 Fix PCLMULQDQ support for IntelLLVM. 2024-02-21 11:52:25 +01:00
Mika T. Lindqvist
9d945f0d71 Fix xsave intrinsic test for clang, and gcc 8.2 or later, and icc. 2024-02-18 10:10:45 +01:00
Vladislav Shchapov
00e06ab5e1 Allow overwrite NATIVEFLAG value by option NATIVE_ARCH_OVERRIDE.
Signed-off-by: Vladislav Shchapov <vladislav@shchapov.ru>
2024-02-18 10:08:45 +01:00
Mika Lindqvist
598128f5d1 Fix regression caused by 2fa631e029
* POWER8/9 feature checks were enabled even if the toolchain didn't support AT_HWCAP2
* Add detection if we need to include <linux/auxvec.h>
2024-01-30 20:49:32 +01:00
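A sketch of guarding POWER8/9 feature checks on the toolchain actually providing AT_HWCAP2, so that a missing definition disables the checks instead of breaking the build. The PPC_FEATURE2_* values match the Linux powerpc uapi headers; the helper names are hypothetical. Per the commit, some older toolchains only expose AT_HWCAP2 through `<linux/auxvec.h>`, which this sketch does not handle.

```c
#include <stdbool.h>
#if defined(__linux__)
#  include <sys/auxv.h> /* getauxval(); normally also brings in AT_HWCAP2 */
#endif

#ifndef PPC_FEATURE2_ARCH_2_07
#  define PPC_FEATURE2_ARCH_2_07 0x80000000UL /* ISA 2.07 (POWER8) */
#endif
#ifndef PPC_FEATURE2_ARCH_3_00
#  define PPC_FEATURE2_ARCH_3_00 0x00800000UL /* ISA 3.00 (POWER9) */
#endif

/* Hypothetical helper: returns 0 (no features) when AT_HWCAP2 is unknown.
 * The value is only meaningful when running on a POWER target. */
static unsigned long read_hwcap2(void)
{
#if defined(__linux__) && defined(AT_HWCAP2)
    return getauxval(AT_HWCAP2);
#else
    return 0;
#endif
}

static bool has_power8(unsigned long hwcap2) { return (hwcap2 & PPC_FEATURE2_ARCH_2_07) != 0; }
static bool has_power9(unsigned long hwcap2) { return (hwcap2 & PPC_FEATURE2_ARCH_3_00) != 0; }
```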
Vladislav Shchapov
1aa53f40fc Improve x86 intrinsics dependencies.
Signed-off-by: Vladislav Shchapov <vladislav@shchapov.ru>
2024-01-25 10:21:49 +01:00
Vladislav Shchapov
44e6bfcc5b Remove unused macro X86_MASK_INTRIN.
Signed-off-by: Vladislav Shchapov <vladislav@shchapov.ru>
2024-01-25 10:21:49 +01:00
Mika Lindqvist
b7fc54ef87 Make sure uqsub16 mnemonic doesn't get optimized away. 2023-12-25 20:44:58 +01:00
Hans Kristian Rosbach
0b080ede77 Always run CMake tests without LTO. 2023-12-24 16:01:42 +01:00
Yoshiki Matsuda
1003ae6b6a Fix clang-cl warnings 2023-11-28 10:25:13 +01:00
Hajin Jang
f9228d8475 Support llvm-mingw toolchain
zlib-ng requires some patches to compile with llvm-mingw.
1. Add -Wno-pedantic-ms-format only if the toolchain is MinGW GCC.
- llvm-mingw does not support it, causing the build to break.
2. Include arm_neon.h instead of arm64_neon.h (aarch64 only).
- arm64_neon.h is MSVC only.
- GCC and Clang do not have arm64_neon.h on aarch64, only arm_neon.h.
- Also applied to configure and detect-intrinsics.cmake
2023-09-28 00:15:12 +02:00
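The header rule from item 2 can be modelled as a small function so it can be exercised on any host; the real decision is made by the preprocessor, as shown in the comment. The enum and function names are hypothetical, and excluding clang-cl via `!defined(__clang__)` is an assumption, not something the commit states.

```c
#include <stdbool.h>

enum neon_header { NEON_HDR_ARM_NEON, NEON_HDR_ARM64_NEON };

/* Model of the choice. A preprocessor form might look like:
 *   #if defined(_MSC_VER) && !defined(__clang__) && defined(_M_ARM64)
 *   #  include <arm64_neon.h>
 *   #else
 *   #  include <arm_neon.h>
 *   #endif
 * (the !defined(__clang__) test is an assumption) */
static enum neon_header neon_header_for(bool is_msvc, bool is_aarch64)
{
    if (is_msvc && is_aarch64)
        return NEON_HDR_ARM64_NEON; /* MSVC-only header */
    return NEON_HDR_ARM_NEON;       /* GCC, Clang, llvm-mingw */
}
```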
Nathan Moinvaziri
31497b545c Don't run test intrinsic code with native flag in CMake.
Native flag should already determine what code will run on the architecture.
This appears to have just been an extra run check with limited benefits. Any
compiler that compiles code not available on the native platform is buggy and
not our problem.
2023-09-19 17:32:07 +02:00
Deniz Bahadir
3eb7cd2d8a Match CMAKE_GENERATOR_TOOLSET variable case-insensitive
The Visual Studio CMake generator allows selecting different toolsets.
One of these toolsets is Clang-Cl.

The generator accepts the toolset name case-insensitively, so it
could be "ClangCl", but also "Clangcl" or "clangcl" or ...
CMake stores this value verbatim in the CMAKE_GENERATOR_TOOLSET variable.
Therefore, this variable must be matched case-insensitively,
which is what this commit does.

fixes: #1576

Signed-off-by: Deniz Bahadir <deniz@code.bahadir.email>
2023-09-16 11:12:01 +02:00
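In CMake itself this kind of match is usually done by normalizing with `string(TOLOWER ...)` before comparing. The same idea in C, as a minimal sketch with a hypothetical helper name:

```c
#include <ctype.h>
#include <stdbool.h>

/* Hypothetical helper: ASCII case-insensitive string equality. */
static bool equals_ignore_case(const char *a, const char *b)
{
    while (*a && *b) {
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
            return false;
        a++;
        b++;
    }
    return *a == *b; /* equal only if both strings end together */
}
```

For example, `equals_ignore_case("Clangcl", "ClangCL")` is true, so every spelling the generator accepts maps to the same toolset.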
Cameron Cawley
16fe1f885e Add ARMv6 version of slide_hash 2023-09-16 11:11:18 +02:00
Cameron Cawley
1c1e728637 Use GCC cpuid intrinsics with MinGW 2023-09-16 11:08:25 +02:00
Nathan Moinvaziri
7ecbaa25fc Use consistent NEON_AVAILABLE variable across CMake/configure. 2023-09-13 11:55:01 +02:00
Harmen Stoppels
ca2d4e5adc cast _xgetbv to int to silence conversion warning 2023-09-13 11:54:42 +02:00
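`_xgetbv(0)` returns the 64-bit XCR0 register, but the state-component bits that AVX and AVX-512 checks need all sit in the low 32 bits, so narrowing the result (as the cast to int does) loses nothing for those checks. The bit layout follows the Intel SDM; the helper names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* XCR0 state-component bits (Intel SDM, XSAVE feature set). */
#define XCR0_SSE       (1u << 1) /* XMM registers */
#define XCR0_AVX       (1u << 2) /* upper halves of YMM registers */
#define XCR0_OPMASK    (1u << 5) /* AVX-512 mask registers k0-k7 */
#define XCR0_ZMM_HI256 (1u << 6) /* upper halves of ZMM0-15 */
#define XCR0_HI16_ZMM  (1u << 7) /* ZMM16-31 */

/* Hypothetical helper: OS saves/restores XMM and YMM state (needed for AVX/AVX2). */
static bool os_saves_ymm(uint32_t xcr0_lo)
{
    const uint32_t need = XCR0_SSE | XCR0_AVX;
    return (xcr0_lo & need) == need;
}

/* Hypothetical helper: OS also saves opmask and full ZMM state (needed for AVX-512). */
static bool os_saves_zmm(uint32_t xcr0_lo)
{
    const uint32_t need = XCR0_SSE | XCR0_AVX | XCR0_OPMASK |
                          XCR0_ZMM_HI256 | XCR0_HI16_ZMM;
    return (xcr0_lo & need) == need;
}
```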
Harmen Stoppels
120fe069d3 Do the same for detect-intrinsics.cmake 2023-09-13 11:54:42 +02:00
Nathan Moinvaziri
ca7573297a Clean up extra whitespaces at line endings in check_rvv_intrinsics. 2023-08-13 17:53:01 +02:00
Nathan Moinvaziri
c7d98c239a Remove inert check for HAVE_ACLE_FLAG in check_acle_compiler_flag. 2023-08-13 17:53:01 +02:00
Hans Kristian Rosbach
4894be9c93 Move check_c_source_compile_or_run cmake macro to the only place it is used. 2023-08-06 10:17:24 +02:00
Hans Kristian Rosbach
2167377c46 Clean up SSE4.2 support, and no longer use asm fallback or gcc builtin.
Defines changing meaning:
X86_SSE42 used to mean the compiler supports crc asm fallback.
X86_SSE42_CRC_INTRIN used to mean compiler supports SSE4.2 intrinsics.

X86_SSE42 now means compiler supports SSE4.2 intrinsics.

This therefore also fixes the adler32_sse42 checks, since those were depending
on SSE4.2 intrinsics but were mistakenly checking the X86_SSE42 define.
Now the X86_SSE42 define actually means what it appears to.
2023-08-06 10:17:24 +02:00
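With the new meaning, X86_SSE42 can gate SSE4.2 intrinsics directly. The sketch below uses the SSE4.2 CRC32 instruction, which computes CRC-32C (Castagnoli, not zlib's CRC-32), as the example intrinsic, with a bitwise fallback. Requiring `__SSE4_2__` (set by GCC/Clang under `-msse4.2`) in addition to X86_SSE42 is an assumption of this sketch, and `crc32c_update` is a hypothetical name.

```c
#include <stddef.h>
#include <stdint.h>

#if defined(X86_SSE42) && defined(__SSE4_2__)
#  include <nmmintrin.h> /* SSE4.2 intrinsics: _mm_crc32_u8 */
#endif

/* Hypothetical helper: CRC-32C with the usual pre/post inversion. */
static uint32_t crc32c_update(uint32_t crc, const uint8_t *buf, size_t len)
{
    crc = ~crc;
#if defined(X86_SSE42) && defined(__SSE4_2__)
    /* Intrinsic path: only compiled when the define says it is available. */
    while (len--)
        crc = _mm_crc32_u8(crc, *buf++);
#else
    /* Bitwise fallback over the reflected polynomial 0x82F63B78. */
    while (len--) {
        crc ^= *buf++;
        for (int k = 0; k < 8; k++)
            crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
    }
#endif
    return ~crc;
}
```

Both paths produce the standard CRC-32C check value 0xE3069283 for the input "123456789".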
David Korth
8976caa3f0 Handle ARM64EC as ARM64.
ARM64EC is a new ARM64 variant introduced in Windows 11 that uses an
ABI similar to AMD64, which allows for better interoperability with
emulated AMD64 applications. When enabled in MSVC, it defines _M_AMD64
and _M_ARM64EC, but not _M_ARM64, so we need to check for _M_ARM64EC.
2023-07-16 12:42:38 +02:00
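Because ARM64EC defines _M_AMD64 but not _M_ARM64, the order of the checks matters: _M_ARM64EC must be tested before falling through to the x86-64 branch. The sketch models MSVC's predefined macros as booleans so the logic can be tested anywhere; the enum and function names are hypothetical. The preprocessor equivalent would be `#if defined(_M_ARM64) || defined(_M_ARM64EC)` ahead of `#elif defined(_M_AMD64)`.

```c
#include <stdbool.h>

enum target_arch { ARCH_OTHER, ARCH_X86_64, ARCH_ARM64 };

/* Hypothetical model of MSVC's architecture macros:
 * ARM64EC sets _M_ARM64EC and _M_AMD64 but not _M_ARM64. */
static enum target_arch classify_msvc_target(bool m_arm64, bool m_arm64ec, bool m_amd64)
{
    if (m_arm64 || m_arm64ec) /* check ARM64EC before AMD64 */
        return ARCH_ARM64;
    if (m_amd64)
        return ARCH_X86_64;
    return ARCH_OTHER;
}
```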
Mika T. Lindqvist
7cda3bf660 Use endianess-specific built-in function for gcc < 12 on PowerPC64
* Add support for cross-compiling using clang 13 and later for PowerPC64 little-endian and big-endian
* Fix detection for availability of Power9 intrinsics
2023-06-23 19:43:34 +02:00
Hans Kristian Rosbach
362945baec Fix the same AVX512 error in CMake. 2023-05-13 22:57:47 +02:00
Hans Kristian Rosbach
f2da905287 Fix AVX512-VNNI compile flags. 2023-05-13 20:15:00 +02:00
Alex Chiang
c3cdf434f3 Add supporting RISC-V cross compilation workflows
Add RISC-V cross-compilation test
Enable RVV support at compile time
2023-05-12 16:57:32 +02:00
Cameron Cawley
b1aafe5c67 Clean up SSE4.2 detection 2023-04-15 15:22:36 +02:00
Cameron Cawley
b09215f75a Enable use of _mm_shuffle_epi8 on machines without SSE4.1 2023-04-01 17:27:49 +02:00
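`_mm_shuffle_epi8` (PSHUFB) belongs to SSSE3, not SSE4.1, so it is usable on CPUs that stop at SSSE3. Its per-byte semantics, modelled in scalar C for illustration (the function name is hypothetical):

```c
#include <stdint.h>

/* Scalar model of PSHUFB (_mm_shuffle_epi8): each output byte selects
 * src[idx & 0x0F], or becomes 0 when bit 7 of the index byte is set.
 * out must not alias src. */
static void pshufb_model(uint8_t out[16], const uint8_t src[16], const uint8_t idx[16])
{
    for (int i = 0; i < 16; i++)
        out[i] = (idx[i] & 0x80) ? 0 : src[idx[i] & 0x0F];
}
```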
Georgiy Manuilov
a4d9d697b3 Enable using AVX512 intrinsics with GCC <9
Replace missing '_mm512_set_epi8' with
'_mm512_set_epi32' in test code for configuring;
Add fallback for '-mtune=cascadelake' flag used
when AVX512 is enabled.
2023-03-28 20:36:19 +02:00
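Replacing `_mm512_set_epi8` with `_mm512_set_epi32` means packing each group of four bytes into one 32-bit element. Both intrinsics list arguments from the highest element down to element 0, and on little-endian x86 the 32-bit element j holds bytes 4j to 4j+3 with the lowest byte first. A hypothetical helper for that packing:

```c
#include <stdint.h>

/* Hypothetical helper: pack bytes e0..e3 (e0 lowest) into one 32-bit
 * little-endian element, matching how _mm512_set_epi8 lays them out. */
static uint32_t pack4_le(uint8_t e0, uint8_t e1, uint8_t e2, uint8_t e3)
{
    return (uint32_t)e0 | ((uint32_t)e1 << 8) |
           ((uint32_t)e2 << 16) | ((uint32_t)e3 << 24);
}
```

So the last four arguments `..., e3, e2, e1, e0` of `_mm512_set_epi8` correspond to the last argument `(int)pack4_le(e0, e1, e2, e3)` of `_mm512_set_epi32` (the conversion to int wraps modulo 2^32 on GCC and Clang).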
Ilya Leoshkevich
b8c2114d51 IBM zSystems: Use HWCAP_S390_VXRS
glibc defines HWCAP_S390_VX and, since v2.33, its alias
HWCAP_S390_VXRS; musl has only HWCAP_S390_VXRS.

Use the common HWCAP_S390_VXRS, define it as HWCAP_S390_VX if
necessary.
2023-03-10 13:14:09 +01:00
Mika Lindqvist
b892331cf7 Fix MinGW build
* Add detection of XSAVE intrinsics
2023-02-02 17:34:12 +01:00
Dimitri Papadopoulos
9119de005b Fix typo found by codespell 2023-02-02 16:44:00 +01:00
Piotr Kubaj
0a59b4e745 Add FreeBSD/powerpc* support to cmake/detect-intrinsics.cmake 2023-01-13 20:23:15 +01:00
Vladislav Shchapov
b57e10d316 Fix AVX2 detect
Signed-off-by: Vladislav Shchapov <vladislav@shchapov.ru>
2022-10-11 21:25:02 +02:00
Hans Kristian Rosbach
6490b70c48 vpclmulqdq compilation fails without avx512f also enabled 2022-10-09 11:36:03 +02:00
Shawn Hoffman
ece74eec32 msvc/armv7: disable crc32_acle
MSVC targeting 32-bit ARM supports
only ARMv7 and lacks these intrinsics
2022-09-26 20:09:53 +02:00
Shawn Hoffman
8098fde200 fix ACLE detection on msvc/arm64 2022-09-05 11:26:37 +02:00
Mika Lindqvist
c62b35ffac [ARM] We need to include NEON headers when testing for -mfpu=neon.
* If -mfpu is already specified in C_FLAGS, it can disable NEON support.
2022-06-02 12:25:24 +02:00
Matheus Castanho
02d10b252c Implement power9 version of compare256.
Co-authored-by: Nathan Moinvaziri <nathan@nathanm.com>
2022-05-07 14:06:42 +02:00
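For reference, compare256 returns how many leading bytes two 256-byte windows have in common, which a vectorized (e.g. POWER9) version must reproduce. A scalar sketch with a hypothetical name, assuming both pointers address at least 256 readable bytes:

```c
#include <stdint.h>

/* Hypothetical scalar reference: length of the matching prefix (0..256). */
static uint32_t compare256_ref(const uint8_t *src0, const uint8_t *src1)
{
    uint32_t len = 0;
    while (len < 256 && src0[len] == src1[len])
        len++;
    return len;
}
```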