DBCSR is a library designed to efficiently perform sparse matrix-matrix multiplication, among other operations. It is parallelized with MPI and OpenMP and can exploit NVIDIA and AMD GPUs via CUDA and HIP. DBCSR was developed as part of CP2K, where it provides core functionality for linear-scaling electronic structure theory. It is now released as a standalone library for integration into other projects. This package requires an MPI implementation; however, it currently does not work with MPICH, so use Open MPI (openmpi) instead.

* HIP and OpenCL support is still experimental.