LIBXSMM is a library for specialized dense and sparse matrix 
operations as well as for deep learning primitives such as small 
convolutions. The library targets Intel Architecture with
Intel SSE, Intel AVX, Intel AVX2, Intel AVX512 (with VNNI and
Bfloat16), and Intel AMX (Advanced Matrix Extensions) supported by
future Intel processors code-named Sapphire Rapids. Code generation
is mainly based on Just-In-Time (JIT) code specialization for 
compiler-independent performance across matrix multiplications, matrix
transpose/copy, sparse functionality, and deep learning.
LIBXSMM is suitable for "build once and deploy everywhere",
i.e., no special compiler target flags are needed to exploit the available
performance. Supported GEMM datatypes are: FP64, FP32, bfloat16, 
int16, and int8.

NOTE: the library does not support 32-bit architectures (64-bit only).